date:20250427

Re: [PATCH 30/61] MSA: Make MSA and microMIPS R5 unsupported

2025-04-27 Thread Sam James

Xi Ruoyao  writes:

> On Wed, 2025-04-23 at 12:43 +, Aleksandar Rakic wrote:
>> From 16b3207aed5e4846fde4f3ffa1253c65ef6ba056 Mon Sep 17 00:00:00 2001
>> From: Aleksandar Rakic 
>> Date: Wed, 23 Apr 2025 14:14:17 +0200
>> Subject: [PATCH] Make MSA and microMIPS R5 unsupported
>> 
>> There are neither platforms nor simulators for MSA and microMIPS R5, so
>> turn off this support for now.
>> 
>> gcc/ChangeLog:
>> 
>>  * config/mips/mips.cc (mips_option_override): Error out for
>>  -mmicromips -mips32r5 -mmsa.
>> 
>> Cherry-picked 1009d6ff7a8d3b56e0224a6b193c5a7b3c29aa5f
>> from https://github.com/MIPS/gcc
>> 
>> Signed-off-by: Matthew Fortune 
>> Signed-off-by: Faraz Shahbazker 
>> Signed-off-by: Aleksandar Rakic 
>> ---
>>  gcc/config/mips/mips.cc | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
>> index 0d3d0263f2d..23205dfb616 100644
>> --- a/gcc/config/mips/mips.cc
>> +++ b/gcc/config/mips/mips.cc
>> @@ -20414,6 +20414,7 @@ static void
>>  mips_option_override (void)
>>  {
>>    int i, regno, mode;
>> +  unsigned int is_micromips;
>>  
>>    if (OPTION_SET_P (mips_isa_option))
>>  mips_isa_option_info = &mips_cpu_info_table[mips_isa_option];
>> @@ -20434,6 +20435,7 @@ mips_option_override (void)
>>    /* Save the base compression state and process flags as though we
>>   were generating uncompressed code.  */
>>    mips_base_compression_flags = TARGET_COMPRESSION;
>> +  is_micromips = TARGET_MICROMIPS;
>>    target_flags &= ~TARGET_COMPRESSION;
>>    mips_base_code_readable = mips_code_readable;
>>  
>> @@ -20678,7 +20680,7 @@ mips_option_override (void)
>>    "-mcompact-branches=never");
>>  }
>>  
>> -  if (is_micromips && TARGET_MSA)
>> +  if (is_micromips && mips_isa_rev <= 5 && TARGET_MSA)
>
> Why not just "TARGET_MICROMIPS && mips_isa_rev <= 5 && TARGET_MSA"?
>
>>  error ("unsupported combination: %s", "-mmicromips -mmsa");
>
> And should this line be updated too like "-mmicromips -mmsa is only
> supported for MIPSr6"?
>
> Unfortunately the original patch is already applied and breaking even a
> non-bootstrapping build for MIPS.  Thus a fix is needed ASAP or we'd
> revert the original patch.

i.e. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119929#c2.

Also, Aleksandar, do you have an account on Bugzilla? It'd be useful to
be able to CC you on any MIPS-related issues with the upstreaming of
these patches. Thanks.

Re: [PATCH] gcc: For Windows x86-32, always attempt to realign stack regardless of SSE

2025-04-27 Thread Eric Botcazou

> For Windows x86-32 targets, the Microsoft ABI only guarantees that the
> stack is aligned to 4-byte boundaries. GCC knows about the default
> alignment of the stack. However, before this commit, it did not realign the
> stack unless SSE was also enabled.
> 
> When a stricter (larger) alignment is requested, it's always necessary to
> realign the stack, as Solaris does.

Yes, or else if you configure the compiler --with-fpmath=sse (which is IMO the 
right thing to do for native 32-bit x86 platforms nowadays).

>   PR target/07
>   * config/i386/cygming.h (STACK_REALIGN_DEFAULT): Copy from sol2.h.

FWIW looks good to me.

-- 
Eric Botcazou
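
To make the alignment point concrete, here is a minimal sketch of code that requests a stricter alignment than the 4 bytes the 32-bit Windows ABI guarantees. The function name aligned_local_ok is hypothetical and not part of the patch. On a target whose incoming stack is only 4-byte aligned, the compiler has to realign the stack in the prologue for the check to hold.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when a local variable that requests
   16-byte alignment really is placed on a 16-byte boundary.  */
int
aligned_local_ok (void)
{
  alignas (16) unsigned char buf[64];

  /* Touch the buffer so it is a real stack object.  */
  buf[0] = 0;
  return ((uintptr_t) buf % 16) == 0;
}
```

Calling aligned_local_ok () from main and checking for a nonzero result is enough to exercise it. A correct compiler should return 1 on every target; the interesting part is the prologue it has to emit to get there.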

[PATCH] gcc: For Windows x86-32, always attempt to realign stack regardless of SSE

2025-04-27 Thread LIU Hao


From d043b30147e00231a99012d631bfc6291340b283 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Sun, 27 Apr 2025 18:18:34 +0800
Subject: [PATCH] gcc: For Windows x86-32, always attempt to realign stack
 regardless of SSE

For Windows x86-32 targets, the Microsoft ABI only guarantees that the stack
is aligned to 4-byte boundaries. GCC knows about the default alignment of the
stack. However, before this commit, it did not realign the stack unless SSE
was also enabled.

When a stricter (larger) alignment is requested, it's always necessary to
realign the stack, as Solaris does.

Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c14
Signed-off-by: LIU Hao 

gcc/ChangeLog:

PR target/07
* config/i386/cygming.h (STACK_REALIGN_DEFAULT): Copy from sol2.h.
---
 gcc/config/i386/cygming.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 3ddcbecb22fd..d587d25a58a8 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 /* 32-bit Windows aligns the stack on a 4-byte boundary but SSE instructions
may require 16-byte alignment.  */
 #undef STACK_REALIGN_DEFAULT
-#define STACK_REALIGN_DEFAULT TARGET_SSE
+#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)
 
 /* Support hooks for SEH.  */
 #undef  TARGET_ASM_UNWIND_EMIT
-- 
2.49.0




RE: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-27 Thread Tamar Christina

> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, April 25, 2025 6:55 PM
> To: Jennifer Schmitz 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit 
> VLS
> 
> Jennifer Schmitz  writes:
> > If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
> > ptrue predicate can be replaced by neon instructions (LDR and STR),
> > thus avoiding the predicate altogether. This also enables formation of
> > LDP/STP pairs.
> >
> > For example, the test cases
> >
> > svfloat64_t
> > ptrue_load (float64_t *x)
> > {
> >   svbool_t pg = svptrue_b64 ();
> >   return svld1_f64 (pg, x);
> > }
> > void
> > ptrue_store (float64_t *x, svfloat64_t data)
> > {
> >   svbool_t pg = svptrue_b64 ();
> >   return svst1_f64 (pg, x, data);
> > }
> >
> > were previously compiled to
> > (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
> >
> > ptrue_load:
> > ptrue   p3.b, vl16
> > ld1d    z0.d, p3/z, [x0]
> > ret
> > ptrue_store:
> > ptrue   p3.b, vl16
> > st1d    z0.d, p3, [x0]
> > ret
> >
> > Now they are compiled to:
> >
> > ptrue_load:
> > ldr q0, [x0]
> > ret
> > ptrue_store:
> > str q0, [x0]
> > ret
> >
> > The implementation includes the if-statement
> > if (known_eq (BYTES_PER_SVE_VECTOR, 16)
> > && known_eq (GET_MODE_SIZE (mode), 16))
> >
> > which checks for 128-bit VLS and excludes partial modes with a
> > mode size < 128 (e.g. VNx2QI).
> 
> I think it would be better to use:
> 
> if (known_eq (GET_MODE_SIZE (mode), 16)
> && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
> 
> to defend against any partial structure modes that might be added in future.
> 

Hi Both,

Just a suggestion, so feel free to ignore it, but I do wonder if this
optimization shouldn't look at the predicate bits rather than the mode size,
since this is valid for any load where the predicate uses the lower N bits,
where N corresponds to an Adv. SIMD register size.

e.g. it should be valid for:

#include <arm_sve.h>

svfloat64_t
ptrue_load (float64_t *x)
{
  svbool_t pg = svptrue_pat_b8 (SV_VL16);
  return svld1_f64 (pg, x);
}

void
ptrue_store (float64_t *x, svfloat64_t data)
{
  svbool_t pg = svptrue_pat_b8 (SV_VL16);
  return svst1_f64 (pg, x, data);
}

In general, along with

#include <arm_sve.h>

svfloat64_t
ptrue_load (float64_t *x)
{
  svbool_t pg = svptrue_pat_b8 (SV_VL8);
  return svld1_f64 (pg, x);
}

void
ptrue_store (float64_t *x, svfloat64_t data)
{
  svbool_t pg = svptrue_pat_b8 (SV_VL8);
  return svst1_f64 (pg, x, data);
}

It just so happens that at VL128 the SV_VL16 == SV_ALL.  Looking at the
predicate bits instead would help optimize all codegen.

Thanks,
Tamar
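
As a rough illustration of the predicate-based check suggested above, the following is a simplified, portable model and not GCC's internal representation. It treats a predicate as one flag per byte lane, and the name pred_is_low_bytes is hypothetical. It reports whether exactly the lowest n_bytes lanes are active, which is the situation where a load or store could in principle use an Adv. SIMD register of that size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: PRED has one flag per byte lane of a vector that
   is VL_BYTES long.  Return true when exactly the first N_BYTES lanes
   are active and every other lane is inactive.  */
bool
pred_is_low_bytes (const bool *pred, size_t vl_bytes, size_t n_bytes)
{
  if (n_bytes > vl_bytes)
    return false;

  for (size_t i = 0; i < vl_bytes; i++)
    if (pred[i] != (i < n_bytes))
      return false;

  return true;
}
```

In this model a 256-bit vector whose first 16 lanes are active matches n_bytes == 16 (a 128-bit access) but not n_bytes == 8, regardless of the full vector length.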
> >
> > The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> > OK for mainline?
> >
> > Signed-off-by: Jennifer Schmitz 
> >
> > gcc/
> > * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
> > Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
> >
> > gcc/testsuite/
> > * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
> > * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
> > * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
> > * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
> > * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
> > ---
> >  gcc/config/aarch64/aarch64.cc | 27 ++--
> >  .../gcc.target/aarch64/sve/cond_arith_6.c |  3 +-
> >  .../aarch64/sve/ldst_ptrue_128_to_neon.c  | 36 +++
> >  .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 ---
> >  .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 ---
> >  .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 64 +--
> >  6 files changed, 102 insertions(+), 106 deletions(-)
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index f7bccf532f8..ac01149276b 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -6416,13 +6416,28 @@ aarch64_stack_protect_canary_mem
> (machine_mode mode, rtx decl_rtl,
> >  void
> >  aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
> >  {
> > -  expand_operand ops[3];
> >machine_mode mode = GET_MODE (dest);
> > -  create_output_operand (&ops[0], dest, mode);
> > -  create_input_operand (&ops[1], pred, GET_MODE(pred));
> > -  create_input_operand (&ops[2], src, mode);
> > -  temporary_volatile_ok v (true);
> > -  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
> > +  if ((MEM_P (dest) || MEM_P (src))
> > +  && known_eq (BYTES_PER_SVE_VECTOR, 16)
> > +  && known_eq (GET_MODE_SIZE (mode), 16)
> > +  && !BYTES_BIG_ENDIAN)
> > +{
> > +  rtx tmp = gen_reg_rtx (V16QImode);
> > +  emit_move_insn (tmp, lowpart_subreg (V16QImode, src, mode));
> > +  if (MEM_P (src))
> > +

[PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-04-27 Thread Josef Melcr

This patch enables constant propagation to outlined OpenMP kernels and
improves support for optimizing callback functions in general. It
implements the attribute 'callback' as found in clang, though argument
numbering is a bit different, as described below. The title says OpenMP,
but it can be used for any function which takes a callback argument, such
as pthread functions, qsort and others.

The attribute 'callback' captures the notion of a function calling one
of its arguments with some of its parameters as arguments. An OpenMP
example of such function is GOMP_parallel.
We implement the attribute with new callgraph edges called 'callback'
edges. They are imaginary edges pointing from the caller of the function
with the attribute (e.g. caller of GOMP_parallel) to the body function
itself (e.g. the outlined OpenMP body). They share their call statement
with the edge from which they are derived (direct edge caller -> GOMP_parallel
in this case). These edges allow passes such as ipa-cp to see the
hidden call site to the body function and optimize the function accordingly.

To illustrate with an example, the body of GOMP_parallel looks something
like this:

void GOMP_parallel (void (*fn) (void *), void *data, /* ... */)
{
  /* ... */
  fn (data);
  /* ... */
}


If we extend it with the attribute 'callback(1, 2)', we express that the
function calls its first argument and passes it its second argument.
This is represented in the call graph in this manner:

            direct                  indirect
caller ------------> GOMP_parallel ------------> fn
  |
  ---------------------------------------------> fn
                     callback

The direct edge is then the parent edge, with all callback edges being
the child edges.
While constant propagation is the main focus of this patch, callback
edges can be useful for different passes (for example, it improves icf
for OpenMP kernels), as they allow for address redirection.
If the outlined body function gets optimized and cloned, from body_fn to
body_fn.optimized, the callback edge allows us to replace the
address in the arguments list:

GOMP_parallel (body_fn, &data_struct, /* ... */);

becomes

GOMP_parallel (body_fn.optimized, &data_struct, /* ... */);

This redirection is possible for any function with the attribute.

This callback attribute implementation is partially compatible with
clang's implementation. Its semantics, arguments and argument indexing style are
the same, but we represent an unknown argument position with 0
(precedent set by attributes such as 'format'), while clang uses -1 or '?'.
We also allow for multiple callback attributes on the same function,
while clang only allows one.

The attribute allows us to propagate constants into body functions of
OpenMP constructs. Currently, GCC won't propagate the value 'c' into the
OpenMP body in the following example:

int a[100];
void test(int c) {
#pragma omp parallel for
  for (int i = 0; i < c; i++) {
if (!__builtin_constant_p(c)) {
  __builtin_abort();
}
a[i] = i;
  }
}
int main() {
  test(100);
  return a[5] - 5;
}

With this patch, the body function will get cloned and the constant 'c'
will get propagated.
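
For a user-level picture of the attribute, here is a minimal sketch, assuming the 'callback(1, 2)' spelling described above. The names CALLBACK, run_body, add_ten and demo are hypothetical. The macro expands to nothing on compilers that do not know the attribute, so the code builds either way.

```c
/* Use the attribute only where the compiler reports support for it.  */
#if defined (__has_attribute)
# if __has_attribute (callback)
#  define CALLBACK(fn, arg) __attribute__ ((callback (fn, arg)))
# endif
#endif
#ifndef CALLBACK
# define CALLBACK(fn, arg)
#endif

/* Hypothetical runner: calls its first argument and passes it its
   second argument, which is what callback (1, 2) expresses.  */
CALLBACK (1, 2)
void run_body (void (*fn) (void *), void *data);

void
run_body (void (*fn) (void *), void *data)
{
  fn (data);
}

/* Hypothetical body function, standing in for an outlined kernel.  */
static void
add_ten (void *data)
{
  *(int *) data += 10;
}

int
demo (void)
{
  int value = 32;
  run_body (add_ten, &value);
  return value;
}
```

Here demo () returns 42. With the attribute in place, the call in demo is the kind of site where a callback edge from demo to add_ten could be created.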

Bootstrapped and regtested on x86_64-linux. OK for master?

Thanks,
Josef Melcr

gcc/ChangeLog:

* builtin-attrs.def (0): New int list.
(ATTR_CALLBACK): Callback attribute identifier.
(DEF_CALLBACK_ATTRIBUTE): Macro for callback attribute creation.
(GOMP): Attributes for libgomp functions.
(OACC): Attribute used for oacc functions.
(ATTR_CALLBACK_GOMP_LIST): ATTR_NOTHROW_LIST but with the
callback attribute added, used for many libgomp functions.
(ATTR_CALLBACK_GOMP_TASK_HELPER_LIST): Helper list for the
construction of ATTR_CALLBACK_GOMP_TASK_LIST.
(ATTR_CALLBACK_GOMP_TASK_LIST): New attribute list for
GOMP_task, includes two callback attributes.
(ATTR_CALLBACK_OACC_LIST): Same as ATTR_CALLBACK_GOMP_LIST, used
for oacc builtins.
* cgraph.cc (cgraph_add_edge_to_call_site_hash): When hashing
callback edges, always hash the parent edge.
(cgraph_node::get_edge): Always return callback parent edge.
(cgraph_edge::set_call_stmt): Add cascade for callback edges.
(symbol_table::create_edge): Allow callback edges to share the
same call statement.
(cgraph_edge::make_callback): New method, derives a callback
edge this method is called on.
(cgraph_edge::get_callback_parent_edge): New method.
(cgraph_edge::first_callback_target): New method.
(cgraph_edge::next_callback_target): New method.
(cgraph_edge::purge_callback_children): New method.
(cgraph_edge::redirect_call_stmt_to_callee): Add callback edge
redirection, set call statements for child edges when updating
the parent's statement.
(cgraph_node::remove_callers): Remove child edges when removing
their parent.
(c

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-04-27 Thread Andrew Pinski

On Sun, Apr 27, 2025 at 2:58 AM Josef Melcr  wrote:
>
> This patch enables constant propagation to outlined OpenMP kernels and
> improves support for optimizing callback functions in general. It
> implements the attribute 'callback' as found in clang, though argument
> numbering is a bit different, as described below. The title says OpenMP,
> but it can be used for any function which takes a callback argument, such
> as pthread functions, qsort and others.
>
> The attribute 'callback' captures the notion of a function calling one
> of its arguments with some of its parameters as arguments. An OpenMP
> example of such function is GOMP_parallel.
> We implement the attribute with new callgraph edges called 'callback'
> edges. They are imaginary edges pointing from the caller of the function
> with the attribute (e.g. caller of GOMP_parallel) to the body function
> itself (e.g. the outlined OpenMP body). They share their call statement
> with the edge from which they are derived (direct edge caller -> GOMP_parallel
> in this case). These edges allow passes such as ipa-cp to see the
> hidden call site to the body function and optimize the function accordingly.
>
> To illustrate with an example, the body of GOMP_parallel looks something
> like this:
>
> void GOMP_parallel (void (*fn) (void *), void *data, /* ... */)
> {
>   /* ... */
>   fn (data);
>   /* ... */
> }
>
>
> If we extend it with the attribute 'callback(1, 2)', we express that the
> function calls its first argument and passes it its second argument.
> This is represented in the call graph in this manner:
>
>             direct                  indirect
> caller ------------> GOMP_parallel ------------> fn
>   |
>   ---------------------------------------------> fn
>                      callback
>
> The direct edge is then the parent edge, with all callback edges being
> the child edges.
> While constant propagation is the main focus of this patch, callback
> edges can be useful for different passes (for example, it improves icf
> for OpenMP kernels), as they allow for address redirection.
> If the outlined body function gets optimized and cloned, from body_fn to
> body_fn.optimized, the callback edge allows us to replace the
> address in the arguments list:
>
> GOMP_parallel (body_fn, &data_struct, /* ... */);
>
> becomes
>
> GOMP_parallel (body_fn.optimized, &data_struct, /* ... */);
>
> This redirection is possible for any function with the attribute.
>
> This callback attribute implementation is partially compatible with
> clang's implementation. Its semantics, arguments and argument indexing style 
> are
> the same, but we represent an unknown argument position with 0
> (precedent set by attributes such as 'format'), while clang uses -1 or '?'.
> We also allow for multiple callback attributes on the same function,
> while clang only allows one.
>
> The attribute allows us to propagate constants into body functions of
> OpenMP constructs. Currently, GCC won't propagate the value 'c' into the
> OpenMP body in the following example:
>
> int a[100];
> void test(int c) {
> #pragma omp parallel for
>   for (int i = 0; i < c; i++) {
> if (!__builtin_constant_p(c)) {
>   __builtin_abort();
> }
> a[i] = i;
>   }
> }
> int main() {
>   test(100);
>   return a[5] - 5;
> }
>
> With this patch, the body function will get cloned and the constant 'c'
> will get propagated.
>
> Bootstrapped and regtested on x86_64-linux. OK for master?

This seems like it could also improve code dealing with C++ lambdas.
Have you thought of that?

Thanks,
Andrew


>
> Thanks,
> Josef Melcr
>
> gcc/ChangeLog:
>
> * builtin-attrs.def (0): New int list.
> (ATTR_CALLBACK): Callback attribute identifier.
> (DEF_CALLBACK_ATTRIBUTE): Macro for callback attribute creation.
> (GOMP): Attributes for libgomp functions.
> (OACC): Attribute used for oacc functions.
> (ATTR_CALLBACK_GOMP_LIST): ATTR_NOTHROW_LIST but with the
> callback attribute added, used for many libgomp functions.
> (ATTR_CALLBACK_GOMP_TASK_HELPER_LIST): Helper list for the
> construction of ATTR_CALLBACK_GOMP_TASK_LIST.
> (ATTR_CALLBACK_GOMP_TASK_LIST): New attribute list for
> GOMP_task, includes two callback attributes.
> (ATTR_CALLBACK_OACC_LIST): Same as ATTR_CALLBACK_GOMP_LIST, used
> for oacc builtins.
> * cgraph.cc (cgraph_add_edge_to_call_site_hash): When hashing
> callback edges, always hash the parent edge.
> (cgraph_node::get_edge): Always return callback parent edge.
> (cgraph_edge::set_call_stmt): Add cascade for callback edges.
> (symbol_table::create_edge): Allow callback edges to share the
> same call statement.
> (cgraph_edge::make_callback): New method, derives a callback
> edge this method is called on.
> (cgraph_edge::get_callback_parent_edge): New method.
> (cgraph_edge::first_callback_target): New method.
> (cg

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-04-27 Thread Josef Melcr

Lambdas have crossed my mind, but I have not yet had the time to look 
thoroughly into their implementation and the issues they face. I do plan 
to look into them once I am done with some incremental improvements for 
the attribute and callback edges, as lambdas seem like a good candidate 
for this sort of thing, given their use case.



Thanks,

Josef Melcr

On 4/27/25 19:36, Andrew Pinski wrote:

This seems like it could also improve code dealing with C++ lambdas.
Have you thought of that?

Thanks,
Andrew




Re: [PATCH] cfgexpand: Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-04-27 Thread Iain Sandoe




> On 27 Apr 2025, at 00:06, Andrew Pinski  wrote:
> 
> When we have an empty function, things can go wrong with
> cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> the only thing the function does is a call to __builtin_unreachable,
> let's expand that to a __builtin_trap instead. For most targets that is
> one instruction wide so it won't hurt things that much and we get correct
> behavior for exceptions and some linkers will be better for it.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

This also works to restore bootstrap for aarch64-darwin and is preferable
to the patch I suggested (since it is narrower in application).  A couple of
typographical nits below …

thanks
Iain

> 
>   PR middle-end/109267
> 
> gcc/ChangeLog:
> 
>   * cfgexpand.cc (expand_gimple_basic_block): If the first non debug 
> statement in the
>   first (and only) basic block is a call to __builtin_unreachable change 
> it to a call
>   to __builtin_trap.

some of these lines look quite long in the patch?

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr109267-1.c: New test.
>   * gcc.dg/pr109267-2.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/cfgexpand.cc  |  8 
> gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
> gcc/testsuite/gcc.dg/pr109267-2.c | 13 +
> 3 files changed, 35 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
> create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
> 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index e84f12a5e93..e14df760b7a 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -6206,6 +6206,14 @@ expand_gimple_basic_block (basic_block bb, bool 
> disable_tail_calls)
>   basic_block new_bb;
> 
>   stmt = gsi_stmt (gsi);
> +
> +  /* If we are expanding the first (and only) bb and the only non debug
> +  statement is __builtin_unreachable call, then replace it with a trap
> +  so the function is at least one instruction in size.  */
> +  if (!nondebug_stmt_seen && bb->index == NUM_FIXED_BLOCKS
> +  && gimple_call_builtin_p (stmt, BUILT_IN_UNREACHABLE))
^^ whitespace glitch?

> + gimple_call_set_fndecl(stmt, builtin_decl_implicit (BUILT_IN_TRAP));
> +
>   if (!is_gimple_debug (stmt))
>   nondebug_stmt_seen = true;
> 
> diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c 
> b/gcc/testsuite/gcc.dg/pr109267-1.c
> new file mode 100644
> index 000..4f1da8b41e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand-details" } */
> +
> +/* PR middle-end/109267 */
> +
> +int f(void)
> +{
> +  __builtin_unreachable();
> +}
> +
> +/* This unreachable should expand as trap. */
> +
> +/* { dg-final { scan-rtl-dump-times "__builtin_trap " 1 "expand"} } */
> +/* { dg-final { scan-rtl-dump-times "__builtin_unreachable " 1 "expand"} } */
> diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c 
> b/gcc/testsuite/gcc.dg/pr109267-2.c
> new file mode 100644
> index 000..e6da4860998
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand-details" } */
> +
> +/* PR middle-end/109267 */
> +void g(void);
> +int f(int *t)
> +{
> +  g();
> +  __builtin_unreachable();
> +}
> +
> +/* This should be expanded to unreachable so it should show up twice. */
> +/* { dg-final { scan-rtl-dump-times "__builtin_unreachable " 2 "expand"} } */
> -- 
> 2.43.0
>
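
The shape of code this patch targets can be sketched as follows. It mirrors the pr109267-1.c test above, and the names only_unreachable and next_function are hypothetical. When optimizing, a compiler may emit no instructions at all for the first function, so its symbol can end up at the same address as whatever follows it.

```c
/* The whole body is a call to __builtin_unreachable, so an optimizing
   compiler may emit zero instructions for this function.  It must
   never actually be called.  */
int
only_unreachable (void)
{
  __builtin_unreachable ();
}

/* An ordinary function that may be placed right after the empty one.  */
int
next_function (void)
{
  return 42;
}
```

Compiling this with -O2 -S and inspecting the assembly for only_unreachable is one way to see whether the function body is empty or contains a trap instruction on a given compiler.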

Re: [PATCH] tailc: Improve tail recursion handling [PR119493]

2025-04-27 Thread Richard Biener

On Thu, 24 Apr 2025, Jakub Jelinek wrote:

> On Tue, Apr 01, 2025 at 11:51:49AM +0200, Jakub Jelinek wrote:
> > Here it is, ok if it passes bootstrap/regtest?  I'll queue the interdiff
> > between this patch and the previous one for GCC 16.
> 
> Here is the interdiff to improve the tail recursion handling also for
> non-musttail calls.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2025-04-24  Jakub Jelinek  
> 
>   PR tree-optimization/119493
>   * tree-tailcall.cc (find_tail_calls): Handle non-gimple_reg_type
>   arguments which aren't just passed through for tail recursions
>   even for non-musttail calls.
> 
> --- gcc/tree-tailcall.cc.jj   2025-04-01 16:47:30.373502796 +0200
> +++ gcc/tree-tailcall.cc  2025-04-01 20:08:34.578787921 +0200
> @@ -685,8 +685,7 @@ find_tail_calls (basic_block bb, struct
> ? !is_gimple_reg (param)
> : (!is_gimple_variable (param)
>|| TREE_THIS_VOLATILE (param)
> -  || may_be_aliased (param)
> -  || !gimple_call_must_tail_p (call)))
> +  || may_be_aliased (param)))
>   break;
>   }
>   }
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
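
The kind of tail recursion the ChangeLog entry describes can be sketched as below. This is an illustrative example with hypothetical names (struct state, sum_to), not a test from the patch. The recursive call passes an aggregate argument that is rebuilt on each step rather than passed through unchanged. Whether a given GCC version turns it into a loop depends on the target and options.

```c
/* Aggregate argument: a struct is not a gimple register type.  */
struct state
{
  long sum;
  long steps;
};

/* Tail-recursive sum of 1..n.  The struct argument changes on every
   call, so it is not simply passed through.  */
long
sum_to (struct state s, long n)
{
  if (n == 0)
    return s.sum;

  struct state next = { s.sum + n, s.steps + 1 };
  return sum_to (next, n - 1);
}
```

For example, sum_to ((struct state) { 0, 0 }, 10) returns 55.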

Re: [PATCH v2] gcc: do not apply store motion on loop with no exits.

2025-04-27 Thread Sam James

ywgrit  writes:

> I encountered one problem with loop-im pass.
> I compiled the program dhry2reg which belongs to 
> unixbench(https://github.com/kdlucas/byte-unixbench).
>
> The gcc used
> gcc (GCC) 12.3.0
>
> The commands executed were as follows:
> make
> ./Run -c -i 1 dhry2reg
>
> The results are shown below.
> Dhrystone 2 using register variables  0.1 lps   (10.0 s, 1 samples)
>
> System Benchmarks Partial Index          BASELINE   RESULT   INDEX
> Dhrystone 2 using register variables     116700.0      0.1     0.0
>
> System Benchmarks Index Score (Partial Only)   10.0
>
> Obviously, the "INDEX" is abnormal.
> I wrote a demo named dhry.c based on the dhry2reg logic.

It's best to file a bug report so we can:
a) discuss the validity of the testcase, and
b) reference it in review of the commit & even once it is in

.. but that said, I thought this looked familiar, and I found PR117695
which is marked as INVALID.

> [...]

FWIW, all the analysis should go in the commit message, as well as
including a testcase in the commit itself. The commit message should
also have a ChangeLog. See these links:
* https://gcc.gnu.org/contribute.html
* https://gcc.gnu.org/codingconventions.html

But these are general remarks. I can't approve the patch (that is for
others) but I'm not sure if the testcase is valid anyway.

> [...]

thanks,
sam

[PATCH] Fix name mismatch for fortran.

2025-04-27 Thread liuhongt

From: "hongtao.liu" 

The function name in afdo_string_table is step3d_t_tile, but
DECL_ASSEMBLER_NAME (edge->callee->decl) yields
__step3d_t_mod_MOD_step3d_t_tile.  It looks like the prefix is not in
the debug string table, so let's also check
afdo_string_table->get_index_by_decl (edge->callee->decl) directly.
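
As a rough illustration of the relaxed check (not the GCC code itself), the
patched condition accepts a callsite when either lookup yields the expected
index.  In the sketch below StringTable, get_index and callee_matches are
hypothetical stand-ins, and modeling the second lookup as a lookup by the
unprefixed name is an assumption made only for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the AutoFDO string table: name -> index.
using StringTable = std::map<std::string, int>;

// Returns the index of NAME, or -1 when the name is not in the table.
int get_index(const StringTable &table, const std::string &name) {
  auto it = table.find(name);
  return it == table.end() ? -1 : it->second;
}

// Models the patched condition: the callee matches when either the
// assembler name or (assumption) the unprefixed name maps to EXPECTED.
bool callee_matches(const StringTable &table,
                    const std::string &assembler_name,
                    const std::string &plain_name, int expected) {
  return get_index(table, assembler_name) == expected
         || get_index(table, plain_name) == expected;
}
```

With a table that only contains step3d_t_tile, the mangled Fortran module
name alone would fail the first lookup, and the second lookup is what lets
the callsite count through.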

Tested with autofdo enabled, the issue is fixed by the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR gcov-profile/118508
* auto-profile.cc
(autofdo_source_profile::get_callsite_total_count): Fix name
mismatch for fortran.
---
 gcc/auto-profile.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index aa4d1634f01..2d2d4a428f2 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -837,8 +837,10 @@ autofdo_source_profile::get_callsite_total_count (
 
   function_instance *s = get_function_instance_by_inline_stack (stack);
   if (s == NULL
-  || afdo_string_table->get_index (IDENTIFIER_POINTER (
- DECL_ASSEMBLER_NAME (edge->callee->decl))) != s->name ())
+  || (afdo_string_table->get_index (IDENTIFIER_POINTER (
+   DECL_ASSEMBLER_NAME (edge->callee->decl))) != s->name ()
+ && afdo_string_table->get_index_by_decl (edge->callee->decl)
+ != s->name()))
 return 0;
 
   return s->total_count ();
-- 
2.34.1

[PATCH] [autofdo] Annotate bb with all debug_stmt with location of phi in the single_succ.

2025-04-27 Thread liuhongt

From: "hongtao.liu" 

A BB that contains only debug stmts is ignored by afdo_set_bb_count,
but its count can be taken from the PHIs in its successors that have
an incoming edge from the BB (only nonzero counts are annotated).

Tested with -march=x86-64-v3 -O2 autofdo enabled, the issue in the PR
is fixed.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR gcov-profile/118581
* auto-profile.cc (autofdo_source_profile::get_count_info):
Overload the function with parameter gimple location instead
of stmt.
(afdo_set_bb_count): For !has_annotated BB, Check single
successors PHIs corresponding to the block and use those
count.
---
 gcc/auto-profile.cc | 53 ++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 2d2d4a428f2..1fa2946c065 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -303,6 +303,10 @@ public:
  in INFO and return true; otherwise return false.  */
   bool get_count_info (gimple *stmt, count_info *info) const;
 
+  /* Find count_info for a given gimple location GIMPLE_LOC. If found,
+ store the count_info in INFO and return true; otherwise return false.  */
+  bool get_count_info (location_t gimple_loc, count_info *info) const;
+
   /* Find total count of the callee of EDGE.  */
   gcov_type get_callsite_total_count (struct cgraph_edge *edge) const;
 
@@ -724,11 +728,18 @@ autofdo_source_profile::get_function_instance_by_decl (tree decl) const
 bool
 autofdo_source_profile::get_count_info (gimple *stmt, count_info *info) const
 {
-  if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus)
+  return get_count_info (gimple_location (stmt), info);
+}
+
+bool
+autofdo_source_profile::get_count_info (location_t gimple_loc,
+   count_info *info) const
+{
+  if (LOCATION_LOCUS (gimple_loc) == cfun->function_end_locus)
 return false;
 
   inline_stack stack;
-  get_inline_stack (gimple_location (stmt), &stack);
+  get_inline_stack (gimple_loc, &stack);
   if (stack.length () == 0)
 return false;
   function_instance *s = get_function_instance_by_inline_stack (stack);
@@ -1132,7 +1143,43 @@ afdo_set_bb_count (basic_block bb, const stmt_set &promoted)
 }
 
   if (!has_annotated)
-return false;
+{
+  /* For BB with all debug stmt which assign a value with constant,
+check successors PHIs corresponding to the block and
+use those counts.  */
+  edge tmp_e;
+  edge_iterator tmp_ei;
+  FOR_EACH_EDGE (tmp_e, tmp_ei, bb->succs)
+   {
+ basic_block bb_succ = tmp_e->dest;
+ for (gphi_iterator gpi = gsi_start_phis (bb_succ);
+  !gsi_end_p (gpi);
+  gsi_next (&gpi))
+   {
+ gphi *phi = gpi.phi ();
+ size_t i;
+ for (i = 0; i < gimple_phi_num_args (phi); i++)
+   {
+ edge e = gimple_phi_arg_edge (phi, i);
+ if (e->src != bb)
+   continue;
+ location_t phi_loc = gimple_phi_arg_location (phi, i);
+ inline_stack stack;
+ count_info info;
+ if (afdo_source_profile->get_count_info (phi_loc, &info)
+ && info.count != 0)
+   {
+ if (info.count > max_count)
+   max_count = info.count;
+ has_annotated = true;
+   }
+   }
+   }
+   }
+
+  if (!has_annotated)
+   return false;
+}
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 afdo_source_profile->mark_annotated (gimple_location (gsi_stmt (gsi)));
-- 
2.34.1

[PATCH] RISC-V: Implment H modifier for printing the next register name

2025-04-27 Thread Jin Ma

For RV32 inline assembly, when handling 64-bit integer data, it is
often necessary to process the lower and upper 32 bits separately.
Unfortunately, we can only output the current register name
(lower 32 bits) but not the next register name (upper 32 bits).

To address this, the modifier 'H' has been added to allow users
to handle the upper 32 bits of the data. While I believe the
modifier 'N' (representing the next register name) might be more
suitable for this functionality, 'N' is already in use.
Therefore, 'H' (representing the high register) was chosen instead.
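
For readers following along, the checks described above can be modeled
outside of GCC.  The sketch below is not the patch's code: kIntRegNames and
next_int_reg_name are hypothetical names, and the table assumes the standard
RISC-V ABI names for x0..x31.  A null result stands for the cases where the
patch reports an operand error.

```cpp
#include <cassert>
#include <cstring>

// ABI names of the 32 RISC-V integer registers x0..x31 (assumed table).
static const char *const kIntRegNames[32] = {
  "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
  "s0",   "s1", "a0", "a1", "a2", "a3", "a4", "a5",
  "a6",   "a7", "s2", "s3", "s4", "s5", "s6", "s7",
  "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

// Hypothetical model of the 'H' modifier: return the name of the register
// following REGNO, or nullptr where the patch would diagnose an error.
const char *next_int_reg_name(unsigned regno) {
  if (regno > 31)
    return nullptr;   // not an integer register
  if (regno == 31)
    return nullptr;   // x31 has no successor
  return kIntRegNames[regno + 1];
}
```

So for a 64-bit value on RV32 whose low half lives in a0, the modifier would
be expected to print a1 for the high half.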

Co-Authored-By: Dimitar Dimitrov 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add H.
* doc/extend.texi: Document for H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/modifier-H-error-1.c: New test.
* gcc.target/riscv/modifier-H-error-2.c: New test.
* gcc.target/riscv/modifier-H.c: New test.
---
 gcc/config/riscv/riscv.cc | 22 +++
 gcc/doc/extend.texi   |  1 +
 .../gcc.target/riscv/modifier-H-error-1.c | 13 +++
 .../gcc.target/riscv/modifier-H-error-2.c | 11 ++
 gcc/testsuite/gcc.target/riscv/modifier-H.c   | 22 +++
 5 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/modifier-H-error-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/modifier-H-error-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/modifier-H.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bad59e248d0..c5eec7a0136 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6879,6 +6879,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
'T' Print shift-index of inverted single-bit mask OP.
'~' Print w if TARGET_64BIT is true; otherwise not print anything.
'N'  Print register encoding as integer (0-31).
+   'H'  Print the name of the next register for integer.
 
Note please keep this list and the list in riscv.md in sync.  */
 
@@ -7174,6 +7175,27 @@ riscv_print_operand (FILE *file, rtx op, int letter)
asm_fprintf (file, "%u", (regno - offset));
break;
   }
+case 'H':
+  {
+   if (!REG_P (op))
+ {
+   output_operand_lossage ("modifier 'H' require register operand");
+   break;
+ }
+   if (REGNO (op) > 31)
+ {
+   output_operand_lossage ("modifier 'H' is for integer registers only");
+   break;
+ }
+   if (REGNO (op) == 31)
+ {
+   output_operand_lossage ("modifier 'H' cannot be applied to R31");
+   break;
+ }
+
+   fputs (reg_names[REGNO (op) + 1], file);
+   break;
+  }
 default:
   switch (code)
{
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0978c4c41b2..212d2487558 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12585,6 +12585,7 @@ The list below describes the supported modifiers and their effects for RISC-V.
 @item @code{z} @tab Print ''@code{zero}'' instead of 0 if the operand is an immediate with a value of zero.
 @item @code{i} @tab Print the character ''@code{i}'' if the operand is an immediate.
 @item @code{N} @tab Print the register encoding as integer (0 - 31).
+@item @code{H} @tab Print the name of the next register for integer.
 @end multitable
 
 @anchor{shOperandmodifiers}
diff --git a/gcc/testsuite/gcc.target/riscv/modifier-H-error-1.c b/gcc/testsuite/gcc.target/riscv/modifier-H-error-1.c
new file mode 100644
index 000..43ecff6498e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/modifier-H-error-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -O0" } */
+
+float foo ()
+{
+  float ret;
+  asm ("fld\t%H0,(a0)\n\t":"=f"(ret));
+
+  return ret;
+}
+
+/* { dg-error "modifier 'H' is for integer registers only" "" { target { "riscv*-*-*" } } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/modifier-H-error-2.c b/gcc/testsuite/gcc.target/riscv/modifier-H-error-2.c
new file mode 100644
index 000..db478b6ddf6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/modifier-H-error-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -O0 " } */
+
+void foo ()
+{
+  register int x31 __asm__ ("x31");
+  asm ("li\t%H0,1\n\t":"=r"(x31));
+}
+
+/* { dg-error "modifier 'H' cannot be applied to R31" "" { target { "riscv*-*-*" } } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/modifier-H.c b/gcc/testsuite/gcc.target/riscv/modifier-H.c
new file mode 100644
index 000..3571ea966f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/modifier-H.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+/

Re: [PATCH 30/61] MSA: Make MSA and microMIPS R5 unsupported

2025-04-27 Thread Xi Ruoyao

On Wed, 2025-04-23 at 12:43 +, Aleksandar Rakic wrote:
> From 16b3207aed5e4846fde4f3ffa1253c65ef6ba056 Mon Sep 17 00:00:00 2001
> From: Aleksandar Rakic 
> Date: Wed, 23 Apr 2025 14:14:17 +0200
> Subject: [PATCH] Make MSA and microMIPS R5 unsupported
> 
> There are no platforms nor simulators for MSA and microMIPS R5 so
> turning off this support for now.
> 
> gcc/ChangeLog:
> 
>   * config/mips/mips.cc (mips_option_override): Error out for
>   -mmicromips -mips32r5 -mmsa.
> 
> Cherry-picked 1009d6ff7a8d3b56e0224a6b193c5a7b3c29aa5f
> from https://github.com/MIPS/gcc
> 
> Signed-off-by: Matthew Fortune 
> Signed-off-by: Faraz Shahbazker 
> Signed-off-by: Aleksandar Rakic 
> ---
>  gcc/config/mips/mips.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 0d3d0263f2d..23205dfb616 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -20414,6 +20414,7 @@ static void
>  mips_option_override (void)
>  {
>    int i, regno, mode;
> +  unsigned int is_micromips;
>  
>    if (OPTION_SET_P (mips_isa_option))
>  mips_isa_option_info = &mips_cpu_info_table[mips_isa_option];
> @@ -20434,6 +20435,7 @@ mips_option_override (void)
>    /* Save the base compression state and process flags as though we
>   were generating uncompressed code.  */
>    mips_base_compression_flags = TARGET_COMPRESSION;
> +  is_micromips = TARGET_MICROMIPS;
>    target_flags &= ~TARGET_COMPRESSION;
>    mips_base_code_readable = mips_code_readable;
>  
> @@ -20678,7 +20680,7 @@ mips_option_override (void)
>     "-mcompact-branches=never");
>  }
>  
> -  if (is_micromips && TARGET_MSA)
> +  if (is_micromips && mips_isa_rev <= 5 && TARGET_MSA)

Why not just "TARGET_MICROMIPS && mips_isa_rev <= 5 && TARGET_MSA"?

>  error ("unsupported combination: %s", "-mmicromips -mmsa");

And should this line be updated too like "-mmicromips -mmsa is only
supported for MIPSr6"?

Unfortunately the original patch is already applied and breaking even a
non-bootstrapping build for MIPS.  Thus a fix is needed ASAP or we'd
revert the original patch.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Add testcase for bogus Warray-bounds warning dealing with __builtin_unreachable [PR100038]

2025-04-27 Thread Andrew Pinski

On Sat, Apr 5, 2025 at 4:56 AM Andrew Pinski  wrote:
>
> After EVRP was switched to the ranger (r12-2305-g398572c1544d8b), we are
> better at handling the case where __builtin_unreachable comes after a loop.
> Instead of removing __builtin_unreachable and having the loop become an
> infinite one, it is kept around longer and allows GCC to unroll the loop
> 2 times instead of 3 times. When GCC unrolled the loop 3 times, GCC would
> produce a bogus Warray-bounds warning for the 3rd iteration.
> This adds the testcase to make sure we don't regress on this case. It is
> originally extracted from LLVM source code too.

Ping? It would be useful to have this testcase so we don't regress the
warning; especially since this is extracted from LLVM (with asserts
turned off which IIRC is the default way of building LLVM these days).

Thanks,
Andrew

>
> PR tree-optimization/100038
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/pr100038.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr100038.C | 17 +
>  1 file changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr100038.C
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr100038.C b/gcc/testsuite/g++.dg/tree-ssa/pr100038.C
> new file mode 100644
> index 000..7024c4db2b2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr100038.C
> @@ -0,0 +1,17 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -Wextra -Wall -Warray-bounds" }
> +
> +struct SparseBitVectorElement {
> +  long Bits[2];
> +  int find_first() const;
> +};
> +
> +// we should not get an `array subscript 2 is above array bounds of`
> +// warning here because we have an unreachable at that point
> +
> +int SparseBitVectorElement::find_first() const {
> +  for (unsigned i = 0; i < 2; ++i)
> +if (Bits[i]) // { dg-bogus "is above array bounds of" }
> +  return i;
> +  __builtin_unreachable();
> +}
> --
> 2.43.0
>

[PATCH] c++: Add attribute handles_virtual_move_assign

2025-04-27 Thread Owen Avery

This patch should make it easier to selectively disable
-Wvirtual-move-assign errors by adding an attribute
for move assignment operators which marks them as handling
duplicate calls.
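
To illustrate the situation the warning is about, here is a minimal sketch.
The names Base, Left, Right, Derived and count_base_move_assigns are made up
for this example and do not come from the patch.  The C++ standard leaves it
unspecified whether a virtual base is assigned more than once by an
implicitly-defined move assignment operator, so the only thing the sketch
relies on is that the count is at least one; with GCC, both the path through
Left and the path through Right are expected to assign Base.

```cpp
#include <cassert>
#include <utility>

// Virtual base with a user-provided (non-trivial) move assignment that
// counts how often it runs on this object.
struct Base {
  int move_assign_calls = 0;
  Base &operator=(Base &&) {
    ++move_assign_calls;
    return *this;
  }
};

// Each intermediate class has a defaulted move assignment, which assigns
// its virtual base.  GCC is expected to warn here via -Wvirtual-move-assign.
struct Left : virtual Base {
  Left &operator=(Left &&) = default;
};
struct Right : virtual Base {
  Right &operator=(Right &&) = default;
};
struct Derived : Left, Right {
  Derived &operator=(Derived &&) = default;
};

// Move-assign one Derived to another and report how many times the single
// shared Base subobject of the destination was move-assigned.
int count_base_move_assigns() {
  Derived dst, src;
  dst = std::move(src);
  return dst.move_assign_calls;
}
```

A Base::operator= that tolerates being called repeatedly, like the counter
above, is the kind of operator the proposed attribute is meant to mark.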

gcc/cp/ChangeLog:

* method.cc: Include "attribs.h".
(synthesized_method_walk): Avoid outputting
-Wvirtual-move-assign when the base class' move assignment
operator has the handles_virtual_move_assign attribute.
* tree.cc
(handle_handles_virtual_move_assign): Add.
(cxx_gnu_attributes): Add handles_virtual_move_assign to the
attribute list.

gcc/ChangeLog:

* doc/extend.texi (C++-Specific Variable, Function, and Type
Attributes): Document handles_virtual_move_assign.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wvirtual-move-assign-1.C: New test.

Signed-off-by: Owen Avery 
---
 gcc/cp/method.cc  |  5 ++-
 gcc/cp/tree.cc| 28 
 gcc/doc/extend.texi   | 13 
 .../g++.dg/warn/Wvirtual-move-assign-1.C  | 32 +++
 4 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wvirtual-move-assign-1.C

diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 05c19cf0661..898f05c9b7d 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "toplev.h"
 #include "intl.h"
 #include "common/common-target.h"
+#include "attribs.h"
 
 static void do_build_copy_assign (tree);
 static void do_build_copy_constructor (tree);
@@ -2949,7 +2950,9 @@ synthesized_method_walk (tree ctype, special_function_kind sfk, bool const_p,
  && BINFO_VIRTUAL_P (base_binfo)
  && fn && TREE_CODE (fn) == FUNCTION_DECL
  && move_fn_p (fn) && !trivial_fn_p (fn)
- && vbase_has_user_provided_move_assign (BINFO_TYPE (base_binfo)))
+ && vbase_has_user_provided_move_assign (BINFO_TYPE (base_binfo))
+ && !lookup_attribute ("handles_virtual_move_assign",
+   DECL_ATTRIBUTES (fn)))
warning (OPT_Wvirtual_move_assign,
 "defaulted move assignment for %qT calls a non-trivial "
 "move assignment operator for virtual base %qT",
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 5863b6878f0..4efd5121319 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -48,6 +48,8 @@ static tree handle_init_priority_attribute (tree *, tree, tree, int, bool *);
 static tree handle_abi_tag_attribute (tree *, tree, tree, int, bool *);
 static tree handle_contract_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_dangling_attribute (tree *, tree, tree, int, bool *);
+static tree handle_handles_virtual_move_assign (tree *, tree, tree, int,
+   bool *);
 
 /* If REF is an lvalue, returns the kind of lvalue that REF is.
Otherwise, returns clk_none.  */
@@ -5234,6 +5236,8 @@ static const attribute_spec cxx_gnu_attributes[] =
 handle_abi_tag_attribute, NULL },
   { "no_dangling", 0, 1, false, true, false, false,
 handle_no_dangling_attribute, NULL },
+  { "handles_virtual_move_assign", 0, 0, false, false, false, false,
+handle_handles_virtual_move_assign, NULL },
 };
 
 const scoped_attribute_specs cxx_gnu_attribute_table =
@@ -5565,6 +5569,30 @@ handle_no_dangling_attribute (tree *node, tree name, tree args, int,
   return NULL_TREE;
 }
 
+/* Handle a "handles_virtual_move_assign" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+tree
+handle_handles_virtual_move_assign (tree *node, tree name, tree /*args*/,
+   int /*flags*/, bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL || DECL_CONSTRUCTOR_P (*node)
+  || !move_fn_p (*node))
+{
+  warning (
+   OPT_Wattributes,
+   "%qE attribute ignored; valid only for move assignment operators",
+   name);
+  *no_add_attrs = true;
+}
+  else
+{
+  *no_add_attrs = false;
+}
+
+  return NULL_TREE;
+}
+
 /* Return a new PTRMEM_CST of the indicated TYPE.  The MEMBER is the
thing pointed to by the constant.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0978c4c41b2..39b3455909d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -30412,6 +30412,19 @@ decltype(auto) foo(T&& t) @{
 @};
 @end smallexample
 
+@cindex @code{handles_virtual_move_assign} function attribute
+@item handles_virtual_move_assign
+
+If a C++ type has a default move assignment operator and virtually
+inherits from a base class with a non-trivial move assignment operator,
+the default move assignment operator may call the non-trivial assignment
+operator multiple times.  This causes GCC to emit a
+@code{virtual-move-assign} warning, even if the non-trivial assignment
+operator is written to handle this.  This attribute can be used on a
+base class' move as

Re: [PATCH] c++: Add attribute handles_virtual_move_assign

2025-04-27 Thread Owen Avery


I'm open to renaming the attribute and/or test file, of course.

Re: [PATCH v2 3/3] xtensa: Make large const_int legitimate during RTL instruction combination pass

2025-04-27 Thread Max Filippov

Hi Suwa-san,

On Thu, Apr 24, 2025 at 12:07 AM Takayuki 'January June' Suwa
 wrote:
>
> Recent gcc versions tend to convert constants for which
> TARGET_LEGITIMATE_CONSTANT_P returns false into references to literal pool
> entries during the RTL instruction combination pass for pattern matching.
>
> For example, the following pattern will currently never match unless either
> TARGET_CONST16 or TARGET_AUTO_LITPOOLS is enabled:
>
>[(set (match_operand:SI 0 "register_operand" "=a")
> (match_operator:SI 2 "boolean_operator"
> [(match_operand:SI 1 "register_operand" "r")
>  (const_int -2147483648)]))]
>
> because INT_MIN will be put into literal pool during the combination.
>
> This patch avoids the above problem in the way described in the title.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_legitimate_constant_p):
> Add a logical OR with !xtensa_split1_finished_p() to the condition
> that returns true to include the RTL instruction combination pass.
> (xtensa_emit_move_sequence): Make its behavior consistent with
> the above change.
> ---
>   gcc/config/xtensa/xtensa.cc | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)

This change results in the following regression in the gcc testsuite:

-PASS: gcc.c-torture/execute/pr68328.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
-PASS: gcc.c-torture/execute/pr68328.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
+FAIL: gcc.c-torture/execute/pr68328.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in extract_constrain_insn, at recog.cc:2783)
+FAIL: gcc.c-torture/execute/pr68328.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)

with the following diagnostics:

/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c:
In function 'bar':
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c:16:1:
error: unrecognizable insn:
(insn 7 4 8 2 (parallel [
   (asm_operands/v ("") ("") 0 [
   (const_int 1193046 [0x123456])
   (const_int 0 [0])
   ]
[
   (asm_input:SI ("g")
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c:13)
   (asm_input:SI ("g")
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c:13)
   ]
[]
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c:13)
   (clobber (mem:BLK (scratch) [0  A8]))
   ]) 
"/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c":13:3
-1
(nil))
during RTL pass: reload
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.c-torture/execute/pr68328.c:16:1:
internal compiler error: in extract_constrain_insn, at recog.cc:2783
0x1b1de5f internal_error(char const*, ...)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/diagnostic-global-context.cc:517
0x8242ea fancy_abort(char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/diagnostic.cc:1749
0x6c3448 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/rtl-error.cc:108
0x6c3464 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/rtl-error.cc:116
0x6c1ff9 extract_constrain_insn(rtx_insn*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/recog.cc:2783
0xc20fd7 check_rtl
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/lra.cc:2202
0xc2517b lra(_IO_FILE*, int)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/lra.cc:2636
0xbdbe77 do_reload
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/ira.cc:5987
0xbdbe77 execute
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/ira.cc:6175

-- 
Thanks.
-- Max

[PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

2025-04-27 Thread LIU Hao


Hello, I'm sending this patch again after GCC 15 has been released.

This patch was sent in February, but there were no comments:
https://patchwork.sourceware.org/project/gcc/patch/eca6660c-6578-4e39-8aa9-be9fdd013...@126.com/



--
Best regards,
LIU Hao


From f6c09e9397d5fe9c0dd1f7a02c90536732aed3df Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Sat, 22 Feb 2025 13:11:51 +0800
Subject: [PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

With `-masm=intel`, GCC generates registers without % prefixes. If a
user-declared symbol happens to match a register, it will confuse the
assembler. User-defined symbols should be quoted, so they are not to
be mistaken for registers or operators.

Support for quoted symbols was added in Binutils 2.26, originally
for ARM assembly, where registers are also unprefixed:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d02603dc201f80cd9d2a1f4b1a16110b1e04222b

This change is required for `@SECREL32` to work in Intel syntax when
targeting Windows, where `@` is allowed as part of a symbol. GNU AS
fails to parse a plain symbol with that suffix:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881#c79
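
The quoting rule itself is small enough to model on its own.  The following
is only a sketch of the idea, not the code in this patch: AsmDialect and
format_labelref are hypothetical names, and the function returns a string
instead of writing to the assembler output stream.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the assembler dialect selected by -masm=.
enum class AsmDialect { ATT, Intel };

// Format a reference to a user-defined label.  In AT&T syntax registers
// carry a % prefix, so the label is emitted as-is; in Intel syntax the
// label is wrapped in double quotes so it cannot be read as a register.
std::string format_labelref(AsmDialect dialect, const std::string &prefix,
                            const std::string &label) {
  std::string name = prefix + label;
  if (dialect == AsmDialect::ATT)
    return name;
  return "\"" + name + "\"";
}
```

Under this model a C function named rax is referenced as "rax" (with the
quotes) in Intel syntax, while the AT&T output is unchanged.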

gcc/config/:
PR target/53929
PR target/80881
* gcc/config/i386/i386-protos.h (ix86_asm_output_labelref): Declare new
function for quoting user-defined symbols in Intel syntax.
* gcc/config/i386/i386.cc (ix86_asm_output_labelref): Implement it.
* gcc/config/i386/i386.h (ASM_OUTPUT_LABELREF): Use it.
* gcc/config/i386/cygming.h (ASM_OUTPUT_LABELREF): Use it.
---
 gcc/config/i386/cygming.h |  5 +++--
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.cc   | 13 +
 gcc/config/i386/i386.h|  7 +++
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 3ddcbecb22fd..4a192900045a 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -247,9 +247,10 @@ do { \
 #undef ASM_OUTPUT_LABELREF
 #define  ASM_OUTPUT_LABELREF(STREAM, NAME) \
 do {   \
+  const char* prefix = "";   \
   if ((NAME)[0] != FASTCALL_PREFIX)\
-fputs (user_label_prefix, (STREAM));   \
-  fputs ((NAME), (STREAM));\
+prefix = user_label_prefix;\
+  ix86_asm_output_labelref ((STREAM), prefix, (NAME)); \
 } while (0)

 /* This does much the same in memory rather than to a stream.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bea3fd4b2e2a..3b9e28ced91c 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -198,6 +198,7 @@ extern int ix86_attr_length_vex_default (rtx_insn *, bool, bool);
 extern rtx ix86_libcall_value (machine_mode);
 extern bool ix86_function_arg_regno_p (int);
 extern void ix86_asm_output_function_label (FILE *, const char *, tree);
+extern void ix86_asm_output_labelref (FILE *, const char *, const char *);
 extern void ix86_call_abi_override (const_tree);
 extern int ix86_reg_parm_stack_space (const_tree);

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79c..d9dea86afa9c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1709,6 +1709,19 @@ ix86_asm_output_function_label (FILE *out_file, const char *fname,
 }
 }

+/* Output a user-defined label.  In AT&T syntax, registers are prefixed
+   with %, so labels require no punctuation.  In Intel syntax, registers
+   are unprefixed, so labels may clash with registers or other operators,
+   and require quoting.  */
+void
+ix86_asm_output_labelref (FILE *file, const char *prefix, const char *label)
+{
+  if (ASSEMBLER_DIALECT == ASM_ATT)
+fprintf (file, "%s%s", prefix, label);
+  else
+fprintf (file, "\"%s%s\"", prefix, label);
+}
+
 /* Implementation of call abi switching target hook. Specific to FNDECL
the specific call register sets are set.  See also
ix86_conditional_register_usage for more details.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 40b1aa4e6dfe..79a1afdde02c 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2251,6 +2251,13 @@ extern unsigned int const svr4_debugger_register_map[FIRST_PSEUDO_REGISTER];
   } while (0)
 #endif

+/* In Intel syntax, we have to quote user-defined labels that would
+   match (unprefixed) registers or operators.  */
+
+#undef ASM_OUTPUT_LABELREF
+#define ASM_OUTPUT_LABELREF(STREAM, NAME)  \
+  ix86_asm_output_labelref ((STREAM), user_label_prefix, (NAME))
+
 /* Under some conditions we need jump tables in the text section,
because the assembler cannot handle label differences between
sections.  */
--
2.48.1




RE: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-04-27 Thread Li, Pan2

Kindly ping.

Pan

-----Original Message-----
From: Li, Pan2  
Sent: Wednesday, April 16, 2025 10:57 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken ; Li, Pan2 

Subject: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

From: Pan Li 

After adding the frm register to global_regs, we may no longer need
the volatile define_insn to emit the frm restore insns.  The
cooperatively-managed global register handles this, so there is no
need to emit the volatile define_insn explicitly.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_frm_mode_set): Refactor
the frm mode set by removing fsrmsi_restore_volatile.
* config/riscv/vector-iterators.md (unspecv): Remove as
unnecessary.
* config/riscv/vector.md (fsrmsi_restore_volatile): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: Adjust
the asm dump check times.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-75.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 43 ++-
 gcc/config/riscv/vector-iterators.md  |  4 --
 gcc/config/riscv/vector.md| 13 --
 .../rvv/base/float-point-dynamic-frm-49.c |  2 +-
 .../rvv/base/float-point-dynamic-frm-50.c |  2 +-
 .../rvv/base/float-point-dynamic-frm-52.c |  2 +-
 .../rvv/base/float-point-dynamic-frm-74.c |  2 +-
 .../rvv/base/float-point-dynamic-frm-75.c |  2 +-
 8 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38f3ae7cd84..3878702e3a1 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12047,27 +12047,30 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
   if (prev_mode == riscv_vector::FRM_DYN_CALL)
 emit_insn (gen_frrmsi (backup_reg)); /* Backup frm when DYN_CALL.  */
 
-  if (mode != prev_mode)
-{
-  rtx frm = gen_int_mode (mode, SImode);
-
-  if (mode == riscv_vector::FRM_DYN_CALL
-   && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
-   /* No need to emit when prev mode is DYN already.  */
-   emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
-  else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
-   && prev_mode != riscv_vector::FRM_DYN
-   && prev_mode != riscv_vector::FRM_DYN_CALL)
-   /* No need to emit when prev mode is DYN or DYN_CALL already.  */
-   emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
-  else if (mode == riscv_vector::FRM_DYN
-   && prev_mode != riscv_vector::FRM_DYN_CALL)
-   /* Restore frm value from backup when switch to DYN mode.  */
-   emit_insn (gen_fsrmsi_restore (backup_reg));
-  else if (riscv_static_frm_mode_p (mode))
-   /* Set frm value when switch to static mode.  */
-   emit_insn (gen_fsrmsi_restore (frm));
+  if (mode == prev_mode)
+return;
+
+  if (riscv_static_frm_mode_p (mode))
+{
+  /* Set frm value when switch to static mode.  */
+  emit_insn (gen_fsrmsi_restore (gen_int_mode (mode, SImode)));
+  return;
 }
+
+  bool restore_p
+= /* No need to emit when prev mode is DYN.  */
+  (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_CALL
+   && prev_mode != riscv_vector::FRM_DYN)
+  /* No need to emit if prev mode is DYN or DYN_CALL.  */
+  || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_EXIT
+ && prev_mode != riscv_vector::FRM_DYN
+ && prev_mode != riscv_vector::FRM_DYN_CALL)
+  /* Restore frm value when switch to DYN mode.  */
+  || (mode == riscv_vector::FRM_DYN
+ && prev_mode != riscv_vector::FRM_DYN_CALL);
+
+  if (restore_p)
+emit_insn (gen_fsrmsi_restore (backup_reg));
 }
 
 /* Implement Mode switching.  */
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index f8da71b1d65..28f52481952 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -122,10 +122,6 @@ (define_c_enum "unspec" [
   UNSPEC_SF_VFNRCLIPU
 ])
 
-(define_c_enum "unspecv" [
-  UNSPECV_FRM_RESTORE_EXIT
-])
-
 ;; Subset of VI with fractional LMUL types
 (define_mode_iterator VI_FRAC [
   RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_VECTOR_ELEN_64")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 51eb64fb122..9dae11a7849 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1116,19 +1116,6 @@ (define_insn "fsrmsi_restore"
(set_attr "mode" "SI")]
  )
 
-;; The volatile fsrmsi restore is used for the exit point for the
-;; dynamic mode switchin

[PATCH v3] x86: Properly find the maximum stack slot alignment

2025-04-27 Thread H.J. Lu

On Wed, Apr 23, 2025 at 1:56 PM Uros Bizjak  wrote:

> +static void
> +ix86_find_all_reg_uses_1 (HARD_REG_SET &regset,
> +  rtx set, unsigned int regno,
> +  auto_bitmap &worklist)
> +{
> +  rtx dest = SET_DEST (set);
> +
> +  if (!REG_P (dest))
> +    return;
> +
> +  /* Reject non-Pmode modes.  */
> +  if (GET_MODE (dest) != Pmode)
> +    return;
>
> We can reject non-Pmode modes.
>
> OTOH, if the patch is OK for you, I think it is good to go forward.
>

Here is the v3 patch.  The only change I made is

      rtx set = single_set (insn);
      if (set)
        {
          ix86_find_all_reg_uses_1 (regset, set,
                                    ref_regno, worklist);
          continue;   <--- I added.
        }

      rtx pat = PATTERN (insn);
      if (GET_CODE (pat) != PARALLEL)
        continue;

OK for master?  Thanks.

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.
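
As a rough illustration of the two steps described above, here is a toy C++
model.  It is not the code from the patch: ToyInsn, SP, FP,
max_stack_alignment and example_alignment are hypothetical names, registers
are plain integers, and the propagation is flow-insensitive, whereas the
actual change works on RTL in config/i386/i386.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical register numbers for the stack and frame pointers.
constexpr int SP = 0;
constexpr int FP = 1;

// Toy instruction: either "dest = f (src)" or a memory access via "base".
struct ToyInsn
{
  enum Kind { Define, MemAccess };
  Kind kind;
  int dest;        // Define: register being set.
  int src;         // Define: register the value is derived from.
  int base;        // MemAccess: base register of the address.
  unsigned align;  // MemAccess: alignment of the access in bits.
};

unsigned
max_stack_alignment (const std::vector<ToyInsn> &insns, unsigned initial)
{
  // Step 1: collect every register defined (transitively) from SP or FP.
  std::set<int> stack_regs = { SP, FP };
  std::vector<int> worklist = { SP, FP };
  while (!worklist.empty ())
    {
      int reg = worklist.back ();
      worklist.pop_back ();
      for (const ToyInsn &insn : insns)
        if (insn.kind == ToyInsn::Define && insn.src == reg
            && stack_regs.insert (insn.dest).second)
          worklist.push_back (insn.dest);
    }

  // Step 2: take the largest alignment of any access through those registers.
  unsigned result = initial;
  for (const ToyInsn &insn : insns)
    if (insn.kind == ToyInsn::MemAccess && stack_regs.count (insn.base))
      result = std::max (result, insn.align);
  return result;
}

// r5 is derived from SP and r6 from r5, so the 256-bit access through r6
// counts; the 512-bit access through the unrelated r9 does not.
unsigned
example_alignment ()
{
  std::vector<ToyInsn> insns = {
    { ToyInsn::Define, 5, SP, 0, 0 },
    { ToyInsn::Define, 6, 5, 0, 0 },
    { ToyInsn::MemAccess, 0, 0, 6, 256 },
    { ToyInsn::MemAccess, 0, 0, 9, 512 },
  };
  unsigned align = max_stack_alignment (insns, 64);
  assert (align == 256);
  return align;
}
```

A check that only looked at accesses based directly on SP or FP would miss
the access through r6 in this model and report 64.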

gcc/

PR target/109780
PR target/109093
* config/i386/i386.cc (stack_access_data): New.
(ix86_update_stack_alignment): Likewise.
(ix86_find_all_reg_uses_1): Likewise.
(ix86_find_all_reg_uses): Likewise.
(ix86_find_max_used_stack_alignment): Also check memory accesses
from registers defined by stack or frame registers.

gcc/testsuite/

PR target/109780
PR target/109093
* g++.target/i386/pr109780-1.C: New test.
* gcc.target/i386/pr109093-1.c: Likewise.
* gcc.target/i386/pr109780-1.c: Likewise.
* gcc.target/i386/pr109780-2.c: Likewise.
* gcc.target/i386/pr109780-3.c: Likewise.


-- 
H.J.
From 2233834e398711b65c8b8eeefbf6fa830a6c2974 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 14 Mar 2023 11:41:51 -0700
Subject: [PATCH] x86: Properly find the maximum stack slot alignment

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

	PR target/109780
	PR target/109093
	* config/i386/i386.cc (stack_access_data): New.
	(ix86_update_stack_alignment): Likewise.
	(ix86_find_all_reg_uses_1): Likewise.
	(ix86_find_all_reg_uses): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	PR target/109093
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109093-1.c: Likewise.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.
	* gcc.target/i386/pr109780-3.c: Likewise.

Signed-off-by: H.J. Lu 
Co-Authored-By: Uros Bizjak 
---
 gcc/config/i386/i386.cc                    | 195 ++---
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72 
 gcc/testsuite/gcc.target/i386/pr109093-1.c |  33 
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 +++
 gcc/testsuite/gcc.target/i386/pr109780-3.c |  46 +
 6 files changed, 360 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109093-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-3.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3171d6e0ad4..dd076242177 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8473,6 +8473,123 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Data passed to ix86_update_stack_alignment.  */
+struct stack_access_data
+{
+  /* The stack access register.  */
+  const_rtx reg;
+  /* Pointer to stack alignment.  */
+  unsigned int *stack_alignment;
+};
+
+/* Update the maximum stack slot alignment from memory alignment in PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference stack slot.  Update the maximum stack slot
+     alignment if the memory is referenced by the stack access register. */
+  stack_access_data *p = (stack_access_data *) data;
+
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+    {
+      auto op = *iter;
+      if (MEM_P (op) && reg_mentioned_p (p->reg, op))
+	{
+	  unsigned int alignment = MEM_ALIGN (op);
+
+	  if (alignment > *p->stack_alignment)
+	    *p->stack_alignment = alignment;
+	  break;
+	}
+    }
+}
+
+/* Helper function for ix86_find_all_reg_uses.  */
+
+static void
+ix86_find_all_reg_uses_1 (HARD_REG_SET &regset,
+			  rtx set, unsigned int regno,
+			  auto_bitmap &worklist)
+{
+  rtx dest = SET_DEST (set);
+
+  if (!REG_P (dest))
+    return;
+
+  /* Reject non-Pmode modes.  */
+  if (GET_MODE (dest) != Pmode)
+    return;
+
+  unsigned int dst_regno = REGNO (dest);
+
+  if (TEST_HARD_REG_

Re: [PATCH] c++: Add attribute handles_virtual_move_assign

2025-04-27 Thread Sandra Loosemore


On 4/27/25 15:57, Owen Avery wrote:

This patch should make it easier to selectively disable
-Wvirtual-move-assign errors by adding an attribute
for move assignment operators which marks them as handling
duplicate calls.
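
For readers unfamiliar with the situation the warning is about, the sketch
below shows a diamond hierarchy in which the implicit move assignment of the
most derived class may assign the shared virtual base more than once.  The
type and function names are hypothetical, and the proposed attribute itself
is not used because it does not exist in released compilers.  How many times
the virtual base is assigned is unspecified by the C++ standard, so the
sketch only checks that it happens at least once.

```cpp
#include <cassert>
#include <utility>

#if defined (__GNUC__) && !defined (__clang__)
// GCC warns about the classes below by default; silence it for this demo.
#pragma GCC diagnostic ignored "-Wvirtual-move-assign"
#endif

struct Base
{
  int times_assigned = 0;
  Base () = default;
  // Non-trivial move assignment that merely counts how often it runs.
  Base &operator= (Base &&) noexcept
  {
    ++times_assigned;
    return *this;
  }
};

struct Left : virtual Base {};
struct Right : virtual Base {};

// The implicit move assignment of Diamond assigns Left and Right, and
// each of those may in turn assign the shared virtual Base subobject.
struct Diamond : Left, Right {};

int
count_base_assignments ()
{
  Diamond a, b;
  a = std::move (b);
  // At least one assignment of the virtual base must have happened.
  assert (a.times_assigned >= 1);
  return a.times_assigned;
}
```

A base class whose move assignment tolerates being run repeatedly, like the
counter here, is the kind of operator the proposed attribute is meant to
mark.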


I'm only qualified to comment on the documentation part of the patch.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0978c4c41b2..39b3455909d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -30412,6 +30412,19 @@ decltype(auto) foo(T&& t) @{
  @};
  @end smallexample
  
+@cindex @code{handles_virtual_move_assign} function attribute

+@item handles_virtual_move_assign
+
+If a C++ type has a default move assignment operator and virtually
+inherits from a base class with a non-trivial move assignment operator,
+the default move assignment operator may call the non-trivial assigment


s/assigment/assignment/


+operator multiple times.  This causes gcc to emit a


s/gcc/GCC/


+@code{virtual-move-assign} warning, even if the non-trivial assignment


I don't think the manual names warnings like that or refers to them with 
that kind of markup.  I'd just say "emit a warning" here.  Instead, you 
need to mention and cross-reference @option{-Wvirtual-move-assign} (or 
its negative form), and make corresponding changes to the docs for that 
option to link here.



+operator is written to handle this.  This attribute can be used on a
+base class' move assignment operator declaration to indicate that it


s/class'/class's/


+can handle the described situation, and that gcc should avoid emitting


s/gcc/GCC/


+a warning.
+
  @cindex @code{warn_unused} type attribute
  @item warn_unused
  


-Sandra

Re: [PATCH] c-family: Improve location for -Wunknown-pragmas in a _Pragma [PR118838]

2025-04-27 Thread Lewis Hyatt

On Mon, Apr 07, 2025 at 01:58:08PM -0400, Marek Polacek wrote:
> On Wed, Feb 12, 2025 at 08:27:37PM -0500, Lewis Hyatt wrote:
> > Hello-
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118838
> > 
> > This patch addresses the issue mentioned in the PR (another instance of
> > _Pragma string location issues). bootstrap + regtest all languages on
> > aarch64 looks good. Is it OK please for now or for stage 1?  Note, it is not
> > a regression, since this never worked in C or C++ frontends; but on the
> > other hand, r15-4505 for GCC 15 fixed some related issues, so it could be
> > nice if this one gets in along with it. Thanks!
> > 
> > -Lewis
> > 
> > -- >8 --
> > 
> > The warning for -Wunknown-pragmas is issued at the location provided by
> > libcpp to the def_pragma() callback. This location is
> > cpp_reader::directive_line, which is a location for the start of the line
> > only; it is also not a valid location in case the unknown pragma was lexed
> > from a _Pragma string. These factors make it impossible to suppress
> > -Wunknown-pragmas via _Pragma("GCC diagnostic...") directives on the same
> > source line, as in the PR and the test case. Address that by issuing the
> > warning at a better location returned by cpp_get_diagnostic_override_loc().
> > libcpp already maintains this location to handle _Pragma-related diagnostics
> > internally; it was needed also to make a publicly accessible version of it.
> > 
> > gcc/c-family/ChangeLog:
> > 
> > PR c/118838
> > * c-lex.cc (cb_def_pragma): Call cpp_get_diagnostic_override_loc()
> > to get a valid location at which to issue -Wunknown-pragmas, in case
> > it was triggered from a _Pragma.
> > 
> > libcpp/ChangeLog:
> > 
> > PR c/118838
> > * errors.cc (cpp_get_diagnostic_override_loc): New function.
> > * include/cpplib.h (cpp_get_diagnostic_override_loc): Declare.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c/118838
> > * c-c++-common/cpp/pragma-diagnostic-loc-2.c: New test.
> > * g++.dg/gomp/macro-4.C: Adjust expected output.
> > * gcc.dg/gomp/macro-4.c: Likewise.
> > * gcc.dg/cpp/Wunknown-pragmas-1.c: Likewise.
> > ---
> >  libcpp/errors.cc  | 10 +
> >  libcpp/include/cpplib.h   |  5 +
> >  gcc/c-family/c-lex.cc |  7 +-
> >  .../cpp/pragma-diagnostic-loc-2.c | 15 +
> >  gcc/testsuite/g++.dg/gomp/macro-4.C   |  8 +++
> >  gcc/testsuite/gcc.dg/cpp/Wunknown-pragmas-1.c | 22 +++
> >  gcc/testsuite/gcc.dg/gomp/macro-4.c   |  8 +++
> >  7 files changed, 57 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc-2.c
> > 
> > diff --git a/libcpp/errors.cc b/libcpp/errors.cc
> > index 9621c4b66ea..d9efb6acd30 100644
> > --- a/libcpp/errors.cc
> > +++ b/libcpp/errors.cc
> > @@ -52,6 +52,16 @@ cpp_diagnostic_get_current_location (cpp_reader *pfile)
> >  }
> >  }
> >  
> > +/* Sometimes a diagnostic needs to be generated before libcpp has been able
> > +   to generate a valid location for the current token; in that case, the
> > +   non-zero location returned by this function is the preferred one to use.  */
> > +
> > +location_t
> > +cpp_get_diagnostic_override_loc (const cpp_reader *pfile)
> > +{
> > +  return pfile->diagnostic_override_loc;
> > +}
> > +
> >  /* Print a diagnostic at the given location.  */
> >  
> >  ATTRIBUTE_CPP_PPDIAG (5, 0)
> > diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> > index 90aa3160ebf..04d4621da3c 100644
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -1168,6 +1168,11 @@ extern const char *cpp_probe_header_unit (cpp_reader *, const char *file,
> >  extern const char *cpp_get_narrow_charset_name (cpp_reader *) ATTRIBUTE_PURE;
> >  extern const char *cpp_get_wide_charset_name (cpp_reader *) ATTRIBUTE_PURE;
> >  
> > +/* Sometimes a diagnostic needs to be generated before libcpp has been able
> > +   to generate a valid location for the current token; in that case, the
> > +   non-zero location returned by this function is the preferred one to use.  */
> 
> I don't love duplicating the comment like this, it's going to get out of sync.
> 
> > +extern location_t cpp_get_diagnostic_override_loc (const cpp_reader *);
> > +
> >  /* This function reads the file, but does not start preprocessing.  It
> >     returns the name of the original file; this is the same as the
> >     input file, except for preprocessed input.  This will generate at
> > diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> > index e450c9a57f0..df84020de62 100644
> > --- a/gcc/c-family/c-lex.cc
> > +++ b/gcc/c-family/c-lex.cc
> > @@ -248,7 +248,12 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
> >  {
> >    const unsigned char *space, *name;
> >    const cpp_token *s;
> > -  location_t fe_loc = loc;
> > +
> > +

Re: [PATCH] Fix size_t in id-15.c and infoleak-net-ethtool-ioctl.c for llp64

2025-04-27 Thread Jonathan Yong


On 4/24/25 7:49 AM, Jonathan Yong wrote:

Attached patch OK for master branch?
Will push soon if there are no objections.

gcc/testsuite/ChangeLog:

     * gcc.dg/graphite/id-15.c: Use __SIZE_TYPE__ instead of
   unsigned long.
     * gcc.dg/plugin/infoleak-net-ethtool-ioctl.c: ditto.


Pushed to master branch.

Re: [PATCH] gcc: For Windows x86-32, always attempt to realign stack regardless of SSE

2025-04-27 Thread Jonathan Yong


On 4/27/25 2:49 PM, Eric Botcazou wrote:

For Windows x86-32 targets, the Microsoft ABI only guarantees that the
stack is aligned to 4-byte boundaries. GCC knows about the default
alignment of the stack. However, before this commit, it did not realign the
stack unless SSE was also enabled.

When a stricter (larger) alignment is requested, it's always necessary to
realign the stack, as Solaris does.


Yes, or else if you configure the compiler --with-fpmath=sse (which is IMO the
right thing to do for native 32-bit x86 platforms nowadays).
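
A minimal way to observe the property under discussion is to request a
larger alignment for a local object and check its address at run time.  This
is only a sketch: the function names and the 32-byte figure are chosen for
illustration and are not taken from the patch or its test suite.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when a local object that requests 32-byte alignment is
// actually placed on a 32-byte boundary of the stack.
bool
local_is_32_byte_aligned ()
{
  alignas (32) unsigned char buffer[64] = {};
  // Route the address through a volatile object so the check is done at
  // run time instead of being folded from the declared alignment.
  volatile std::uintptr_t address
    = reinterpret_cast<std::uintptr_t> (&buffer[0]);
  return address % 32 == 0;
}

void
check_alignment ()
{
  assert (local_is_32_byte_aligned ());
}
```

If the incoming stack is only guaranteed to be 4-byte aligned and the
function does not realign it, a check like this can fail.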


PR target/07
* config/i386/cygming.h (STACK_REALIGN_DEFAULT): Copy from sol2.h.


FWIW looks good to me.



Thanks, pushed to master branch.

[PATCH] RISC-V: Fix register move cost for SIBCALL_REGS/JALR_REGS

2025-04-27 Thread 曾治金

Hi, as Jeff requested 
(https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681864.html), I have 
split the riscv_register_move_cost change into a separate patch. Please 
review. Thanks.


Zhijin


From b4c581393e864619192034bd8000c7e89443c19a Mon Sep 17 00:00:00 2001
From: Zhijin Zeng
