Re: [patch, fortran] Add modular exponentiation for unsigned

2025-02-03 Thread Thomas Koenig

Hello world,

with the following patch to the failing test case

diff --git a/gcc/testsuite/gfortran.dg/unsigned_15.f90 b/gcc/testsuite/gfortran.dg/unsigned_15.f90

index da4ccd2dc17..80a7a54e380 100644
--- a/gcc/testsuite/gfortran.dg/unsigned_15.f90
+++ b/gcc/testsuite/gfortran.dg/unsigned_15.f90
@@ -6,8 +6,8 @@ program main
   unsigned :: u
   print *,1 + 2u   ! { dg-error "Operands of binary numeric operator" }
   print *,2u + 1   ! { dg-error "Operands of binary numeric operator" }
-  print *,2u ** 1  ! { dg-error "Exponentiation not valid" }
-  print *,2u ** 1u ! { dg-error "Exponentiation not valid" }
+  print *,2u ** 1  ! { dg-error "Operands of binary numeric operator" }
+  print *,2u ** 1u
   print *,1u < 2   ! { dg-error "Inconsistent types" }
   print *,int(1u) < 2
 end program main

the patch posted to

https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html

and

https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html

passes (I don't want to re-submit the whole thing).
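
For readers unfamiliar with the semantics under test, here is a rough C++ sketch of what modular exponentiation on an unsigned type amounts to: the result wraps around modulo 2^32 instead of overflowing.  The function name upow and the square-and-multiply loop are assumptions made for illustration; they are not taken from the gfortran patch.

```cpp
#include <cstdint>

// Hypothetical helper: computes base**exp modulo 2^32, relying on the
// well-defined wraparound of unsigned arithmetic in C++.
std::uint32_t upow(std::uint32_t base, std::uint32_t exp) {
  std::uint32_t result = 1;
  while (exp != 0) {
    if (exp & 1u)
      result *= base;  // multiply in the current power when the bit is set
    base *= base;      // square for the next bit; wraps modulo 2^32
    exp >>= 1;
  }
  return result;
}
```

With this sketch, upow(2, 1) corresponds to the 2u ** 1u case in the test above, and a large exponent such as upow(2, 32) wraps to zero rather than trapping.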

OK for trunk?

Best regards

Thomas



[PATCH] IBM zSystems: Do not use @PLT with larl

2025-02-03 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?


Commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local functions in
64-bit mode") made GCC call both static and non-static functions and
load both static and non-static function addresses with the @PLT
suffix.  This made it difficult for linkers to distinguish calling and
address taking instructions [1].  It is currently assumed that the
R_390_PLT32DBL relocation, corresponding to the @PLT suffix, is used
only for calling, and the R_390_PC32DBL relocation, corresponding to
the empty suffix, is used only for address taking.

Linkers need to make this distinction in order to decide whether to
ask ld.so to use canonical PLT entries.  Normally GOT entries in shared
objects contain addresses of the respective functions, with one notable
exception: when a no-pie executable calls the respective function and
also takes its address.  Such executables assume that all addresses are
known in advance, so they use addresses of the respective PLT entries.
For consistency reasons, all respective GOT entries in the process must
also use them.

When a linker sees that a no-pie executable both calls a function and
also takes its address, it creates a PLT entry and asks ld.so to
consider it canonical by setting the respective undefined symbol's
address, which is normally 0, to the address of this PLT entry.

Improve the situation by not using @PLT with larl.

Now that @PLT is not used with larl, also drop the 31-bit handling,
which was required because 31-bit PLT entries require %r12 to point to
the respective object's GOT, and this requirement is not satisfied when
calling them by pointer from another object.

Also drop the weak symbol handling, which was required because it is
not possible to load an undefined weak symbol address (0) using larl.
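
To make the call versus address-taking distinction concrete, here is a small C++ sketch.  The function names are made up for illustration, and the comments about instructions and relocations only restate the description above; the actual code a compiler emits depends on optimization level and flags.

```cpp
// A plain function that is both called and has its address taken.
int callee(int x) { return x + 1; }

// Direct call: per the description above, this is the kind of reference
// that keeps the @PLT suffix (R_390_PLT32DBL) on s390x.
int call_it(int x) { return callee(x); }

using fn_t = int (*)(int);

// Address taking: per the description above, the address would be loaded
// with larl and, with this patch, without @PLT (R_390_PC32DBL).
fn_t take_address() { return &callee; }

// Whatever relocations are used, every place that takes the address of
// the function must observe the same pointer value.
bool same_address() { return take_address() == &callee; }
```

The last helper hints at why canonical PLT entries matter: function pointer comparisons must agree across the whole process.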

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29655

gcc/ChangeLog:

* config/s390/s390.cc (print_operand): Remove the no longer
necessary 31-bit and weak symbol handling.
* config/s390/s390.md (*movdi_64): Do not use @PLT with larl.
(*movsi_larl): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/call-z10-pic-nodatarel.c: Adjust
expectations.
* gcc.target/s390/call-z10-pic.c: Likewise.
* gcc.target/s390/call-z10.c: Likewise.
* gcc.target/s390/call-z9-pic-nodatarel.c: Likewise.
* gcc.target/s390/call-z9-pic.c: Likewise.
* gcc.target/s390/call-z9.c: Likewise.
---
 gcc/config/s390/s390.cc  | 16 +++-
 gcc/config/s390/s390.md  |  8 
 .../gcc.target/s390/call-z10-pic-nodatarel.c |  6 ++
 gcc/testsuite/gcc.target/s390/call-z10-pic.c |  6 ++
 gcc/testsuite/gcc.target/s390/call-z10.c | 14 +-
 .../gcc.target/s390/call-z9-pic-nodatarel.c  |  6 ++
 gcc/testsuite/gcc.target/s390/call-z9-pic.c  |  6 ++
 gcc/testsuite/gcc.target/s390/call-z9.c  | 14 +-
 8 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 86a5f059b85..1d96df49fea 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8585,7 +8585,7 @@ print_operand_address (FILE *file, rtx addr)
 'E': print opcode suffix for branch on index instruction.
 'G': print the size of the operand in bytes.
 'J': print tls_load/tls_gdcall/tls_ldcall suffix
-'K': print @PLT suffix for call targets and load address values.
+'K': print @PLT suffix for branch targets; do not use with larl.
 'M': print the second word of a TImode operand.
 'N': print the second word of a DImode operand.
 'O': print only the displacement of a memory reference or address.
@@ -8854,19 +8854,9 @@ print_operand (FILE *file, rtx x, int code)
 call even static functions via PLT.  ld will optimize @PLT away for
 normal code, and keep it for patches.
 
-Do not indiscriminately add @PLT in 31-bit mode due to the %r12
-restriction, use UNSPEC_PLT31 instead.
-
 @PLT only makes sense for functions, data is taken care of by
--mno-pic-data-is-text-relative.
-
-Adding @PLT interferes with handling of weak symbols in non-PIC code,
-since their addresses are loaded with larl, which then always produces
-a non-NULL result, so skip them here as well.  */
-  if (TARGET_64BIT
- && GET_CODE (x) == SYMBOL_REF
- && SYMBOL_REF_FUNCTION_P (x)
- && !(SYMBOL_REF_WEAK (x) && !flag_pic))
+-mno-pic-data-is-text-relative.  */
+  if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_FUNCTION_P (x))
fprintf (file, "@PLT");
   return;
 }
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index c164ea72c78..9d495803387 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2001,7 +2

Re: [PATCH v2] c++: Properly detect calls to digest_init in build_vec_init [PR114619]

2025-02-03 Thread Jason Merrill

On 2/3/25 8:29 AM, Simon Martin wrote:

Hi Jason,

On 16 Jan 2025, at 23:28, Jason Merrill wrote:


On 10/19/24 5:09 AM, Simon Martin wrote:

We currently ICE in checking mode with cxx_dialect < 17 on the
following
valid code

=== cut here ===
struct X {
X(const X&) {}
};
extern X x;
void foo () {
new X[1]{x};
}
=== cut here ===

The problem is that cp_gimplify_expr gcc_checking_asserts that a
TARGET_EXPR is not TARGET_EXPR_ELIDING_P (or cannot be elided), while
in
this case with cxx_dialect < 17, it is TARGET_EXPR_ELIDING_P but we
have
not even tried to elide.

This patch relaxes that gcc_checking_assert to not fail when using
cxx_dialect < 17 and -fno-elide-constructors (I considered being more



clever at setting TARGET_EXPR_ELIDING_P appropriately but it looks



more
risky and not worth the extra complexity for a checking assert).


The problem is that in that case we end up with two copy constructor
calls instead of one: one built in massage_init_elt, and the other in
expand_default_init.  The result of the first copy is marked
TARGET_EXPR_ELIDING_P, so when we try to pass it to the second copy we
hit the assert.  I think the assert is catching a real bug: even with
-fno-elide-constructors we should only copy once, not twice.

That’s right, thanks for pointing me in the right direction.


This seems to be because 'digested' has the wrong value in
build_vec_init; we did just call digest_init in build_new_1, but
build_vec_init doesn't understand that.

The test to determine whether digest_init has been called is indeed
incorrect, in that it will work if BASE is a reference to the array but
not if it’s a pointer to its first element. The attached updated patch
fixes this.
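
The "copy once, not twice" expectation discussed above can be checked from user code with a counting copy constructor.  This is only a sketch: the counter, the added default constructor, and the helper name are assumptions for illustration and are not part of the testcase in the PR.

```cpp
// Variant of the testcase above with a copy counter added.
struct X {
  static int copies;
  X() = default;
  X(const X&) { ++copies; }
};
int X::copies = 0;

// Returns how many copy-constructor calls initializing the single
// array element from an lvalue took.
int copies_for_array_new() {
  X x;
  X::copies = 0;
  X* p = new X[1]{x};
  int n = X::copies;
  delete[] p;
  return n;
}
```

A conforming compiler is expected to report a single copy here; a second copy would be the symptom described in the review.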

Successfully tested on x86_64-pc-linux-gnu. OK for trunk?


OK.

Jason



Re: [PATCH] c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro

2025-02-03 Thread Jason Merrill

On 2/3/25 7:38 AM, Jakub Jelinek wrote:

On Mon, Feb 03, 2025 at 11:33:38AM +0100, Richard Biener wrote:

The first argument is supposed to be a type, not a decl.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK?

PR c++/79786
gcc/cp/
* rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.


LGTM.


OK.


--- a/gcc/cp/rtti.cc
+++ b/gcc/cp/rtti.cc
@@ -1741,7 +1741,8 @@ emit_tinfo_decl (tree decl)
/* Avoid targets optionally bumping up the alignment to improve
 vector instruction accesses, tinfo are never accessed this way.  */
  #ifdef DATA_ABI_ALIGNMENT
-  SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl))));
+  SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (TREE_TYPE (decl),
+                                            TYPE_ALIGN (TREE_TYPE (decl))));
DECL_USER_ALIGN (decl) = true;
  #endif
return true;
--
2.43.0


Jakub





RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-03 Thread Tamar Christina
Looks like a last minute change I made accidentally blocked SVE. Fixed and 
re-sending:

Hi All,

This fixes two PRs on Early break vectorization by delaying the safety checks to
vectorizable_load when the VF, VMAT and vectype are all known.

This patch does add two new restrictions:

1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
   group sizes, as they are unaligned every n % 2 iterations and so may cross
   a page unwittingly.

2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
   we cannot peel for alignment, as the alignment requirement is quite large at
   GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
   don't support it for now.
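
The page-crossing concern behind both restrictions can be sketched in a few lines of C++.  The helper names are hypothetical and the arithmetic is a simplified model, not the vectorizer's actual check; it assumes a non-zero access size.

```cpp
#include <cstddef>
#include <cstdint>

// True if the byte range [addr, addr + bytes) touches more than one page.
// Assumes bytes > 0.
constexpr bool crosses_page(std::uintptr_t addr, std::size_t bytes,
                            std::size_t page_size) {
  return addr / page_size != (addr + bytes - 1) / page_size;
}

// Alignment requirement from restriction 2: GROUP_SIZE * vectype_size.
constexpr std::size_t load_lanes_alignment(std::size_t group_size,
                                           std::size_t vectype_size) {
  return group_size * vectype_size;
}
```

For example, a group of three 16-byte vectors gives a 48-byte access; address 4080 is a multiple of 48, yet the access runs past a 4096-byte page boundary.  A 32-byte access aligned to 32 bytes cannot do that, because 32 divides the page size.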

There are other steps documented inside the code itself so that the reasoning
is next to the code.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
(-m32 and -m64) with no issues.

On arm-none-linux-gnueabihf some tests are failing to vectorize because it looks
like LOAD_LANES is often misaligned. I need to debug those a bit more to see if
it's the patch or backend.

For now I think the patch itself is fine.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to
vectorizable_load.
(vect_enhance_data_refs_alignment): Add note to comment needing
investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_peeling_alignment): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.

-- inline copy of patch --

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
 Work bound when discovering transitive relations from existing relations.
 
 @item min-pagesize
-Minimum page size for warning purposes.
+Minimum page size for warning and early break vectorization purposes.
 
 @item openacc-kernels
 Specify mode of OpenACC `kernels' constructs handling.
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
new file mode 100644
index ..5e50e56ad17515e278c05c92263af120c3ab2c21
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+#include <cstddef>
+
+struct ts1 {
+  int spans[6][2];
+};
+struct gg {
+  int t[6];
+};
+ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
+  ts1 ret;
+  for (size_t i = 0; i != t; i++) {
+if (!(i < t)) __builtin_abort();
+ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
+ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
+  }
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -55,7 +55,9 @@ int main()
  }
 }
   rephase ();
+#pragma GCC novector
   for (i = 0; i < 32; ++i)
+#pragma GCC novector
 for

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Peter Bergner
On 2/3/25 7:14 AM, Jeff Law wrote:
> On 2/3/25 2:31 AM, H.J. Lu wrote:
>> I believe the original patch should be reverted.  Then my patch isn't needed.
>
> That patch had significant improvements across the board for RISC-V.
> I wouldn't want to see it reverted without a strong explanation of why it was
> wrong.

In my opinion, the patch is not wrong, but rather has exposed latent issues
that need to be worked on and fixed.  I've asked Surya to continue working on
the fallout (see her other patches), but help from others is always appreciated.

Peter




Re: [patch, fortran] Add modular exponentiation for unsigned

2025-02-03 Thread Jerry D

On 2/3/25 11:55 AM, Thomas Koenig wrote:

Hello world,

with the following patch to the failing test case

diff --git a/gcc/testsuite/gfortran.dg/unsigned_15.f90 b/gcc/testsuite/gfortran.dg/unsigned_15.f90

index da4ccd2dc17..80a7a54e380 100644
--- a/gcc/testsuite/gfortran.dg/unsigned_15.f90
+++ b/gcc/testsuite/gfortran.dg/unsigned_15.f90
@@ -6,8 +6,8 @@ program main
    unsigned :: u
    print *,1 + 2u   ! { dg-error "Operands of binary numeric operator" }
    print *,2u + 1   ! { dg-error "Operands of binary numeric operator" }
-  print *,2u ** 1  ! { dg-error "Exponentiation not valid" }
-  print *,2u ** 1u ! { dg-error "Exponentiation not valid" }
+  print *,2u ** 1  ! { dg-error "Operands of binary numeric operator" }
+  print *,2u ** 1u
    print *,1u < 2   ! { dg-error "Inconsistent types" }
    print *,int(1u) < 2
  end program main

the patch posted to

https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html

and

https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html

passes (I don't want to re-submit the whole thing).

OK for trunk?

Best regards

 Thomas



Yes, please proceed.

Jerry


Re: [PATCH v3] c++, coroutines: Fix awaiter var creation [PR116506].

2025-02-03 Thread Jason Merrill

On 12/9/24 7:53 PM, Jason Merrill wrote:

On 12/9/24 1:52 PM, Iain Sandoe wrote:




On 9 Dec 2024, at 17:41, Jason Merrill  wrote:

On 10/31/24 4:40 AM, Iain Sandoe wrote:

This version tested on x86_64-darwin,linux, powerpc64-linux, on folly
and by Sam on wider codebases,

Why don't you need a variable to preserve o across suspensions if
it's a call returning lvalue reference?

We always need a space for the awaiter, unless it is already a
variable/parameter (or part of one).

I suspect that the simple case is not lvalue_p, but
!TREE_SIDE_EFFECTS.

That is likely where I’m going wrong - we must not generate a
variable for any case that already has one (or a parm), but we
must for any case that is a temporary.

So, I should adjust the logic to use !TREE_SIDE_EFFECTS.

Or perhaps DECL_P.  The difference would be for compound lvalues
like *p or a[n]; if the value of p or a or n could change across
suspension, the same side-effect-free lvalue expression could refer
to a different object.

Right, part of the code that was elided catered for the compound
values by making a reference to the original entity and placing that
in the frame. We restore that behaviour here.

Note that there is no point in making a reference to an xvalue (we'd
only have to save the expiring value in the frame anyway), so we just
go ahead and build that var directly.
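
The point about compound lvalues such as *p or a[n] can be illustrated without any coroutine machinery.  In the sketch below, the assignment to n stands in for something that happens across a suspension; the function name is made up for illustration.

```cpp
// Binding a reference once keeps referring to the original object,
// while re-evaluating the same lvalue expression later may not.
int rebinding_demo() {
  int a[2] = {10, 20};
  int n = 0;
  int& bound = a[n];  // reference to the original entity, a[0]
  n = 1;              // stand-in for a change "across the suspension"
  // bound still names a[0]; a[n] now names a[1].
  return bound * 100 + a[n];
}
```

This is the reason for keeping a reference to the original entity in the frame rather than re-evaluating the expression after resumption.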


Hmm, is there a defect report about this?


I don’t believe there’s any defect here.


My reading of https://eel.is/c++draft/expr#await-3 is that our 
current behavior in these testcases conforms to the WP: we evaluate 
o, do temporary materialization, then treat the result as an lvalue.  
Nothing that I can see specifies making a copy of an xvalue; it reads 
to me more like initializing a && variable, i.e.


Awaiter&& e = p.await_transform(Awaiter{}); // dangling reference

I'm only finding 2472, which doesn't cover this case.


This comment specifically relates to the final sentence of
https://eel.is/c++draft/expr#await-3.3.


IFF, as per that, we materialise a temporary for the awaiter, we know 
(in advance) that its lifetime must persist across the suspension, 
therefore it will be “promoted” to a frame entry.  We could make a 
reference to it (but that would become a frame entry reference to 
another frame entry which is a waste).  Therefore, we make it into a 
frame candidate right away.


Perhaps I’m still missing something...


3.3 materialization applies if o is a prvalue, but it's an xvalue, so it 
doesn't apply.  Temporary materialization for Awaiter{} happened 
earlier, for passing it to await_transform.  And I'd think 
flatten_await_stmt should handle preserving the temporary.


But reading more closely I see that you aren't actually making a copy of 
the object in this case, because of what you do with INDIRECT_REF_P; 
here o_type is Awaiter&&, so you do create a frame variable like my 
declaration above.


But I think messing with INDIRECT_REF_P is unnecessary; we should deal 
with glvalues the same regardless of whether they are directly 
REFERENCE_REF_P.


We certainly want to exclude the case in this testcase from the "use the 
existing entity" handling, but the lvalue_p handling in the "we need a 
var" case should also be fine for xvalues.


It seemed like this was stalled, so I went ahead and made the changes 
myself.  Applying this:


From 4c743798b1d4530b327dad7c606c610f3811fdbf Mon Sep 17 00:00:00 2001
From: Iain Sandoe 
Date: Thu, 31 Oct 2024 08:40:08 +
Subject: [PATCH] c++/coroutines: Fix awaiter var creation [PR116506]
To: gcc-patches@gcc.gnu.org

Awaiters always need to have a coroutine state frame copy since
they persist across potential suspensions.  It simplifies the later
analysis considerably to assign these early, which we do when
building co_await expressions.

The cleanups in r15-3146-g47dbd69b1 unfortunately elided some of the
processing used to cater for cases where the var is created from an
xvalue, or is a pointer/reference type.

Corrected thus.

	PR c++/116506
	PR c++/116880

gcc/cp/ChangeLog:

	* coroutines.cc (build_co_await): Ensure that xvalues are
	materialised.  Handle references/pointer values in awaiter
	access expressions.
	(is_stable_lvalue): New.
	* decl.cc (cxx_maybe_build_cleanup): Handle null arg.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/pr116506.C: New test.
	* g++.dg/coroutines/pr116880.C: New test.

Signed-off-by: Iain Sandoe 
Co-authored-by: Jason Merrill 
---
 gcc/cp/coroutines.cc   | 59 ++
 gcc/cp/decl.cc |  2 +-
 gcc/testsuite/g++.dg/coroutines/pr116506.C | 53 +++
 gcc/testsuite/g++.dg/coroutines/pr116880.C | 36 +
 4 files changed, 139 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116506.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116880.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 1dee3d25b9b..d3c7ff3bd72 100644
--- a/gcc/cp/corou

[pushed] c++: coroutines and range for [PR118491]

2025-02-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The implementation of extended range-for temporaries in r15-3840 confused
coroutines, because await_statement_walker and the like get confused by the
EXPR_STMT into thinking that the whole for-loop is a single expression
statement and try to process it accordingly.  Fixing this seems to be a
simple matter of dropping the EXPR_STMT.

PR c++/116914
PR c++/117231
PR c++/118470
PR c++/118491

gcc/cp/ChangeLog:

* semantics.cc (finish_for_stmt): Don't wrap the result of
pop_stmt_list in EXPR_STMT.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-range-for1.C: New test.
---
 gcc/cp/semantics.cc   |  1 -
 .../g++.dg/coroutines/coro-range-for1.C   | 38 +++
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/coro-range-for1.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index ad9864c3a91..73b49174de4 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1709,7 +1709,6 @@ finish_for_stmt (tree for_stmt)
 {
   tree stmt = pop_stmt_list (FOR_INIT_STMT (for_stmt));
   FOR_INIT_STMT (for_stmt) = NULL_TREE;
-  stmt = build_stmt (EXPR_LOCATION (for_stmt), EXPR_STMT, stmt);
   stmt = maybe_cleanup_point_expr_void (stmt);
   add_stmt (stmt);
 }
diff --git a/gcc/testsuite/g++.dg/coroutines/coro-range-for1.C b/gcc/testsuite/g++.dg/coroutines/coro-range-for1.C
new file mode 100644
index 000..eaf4d19e62c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/coro-range-for1.C
@@ -0,0 +1,38 @@
+// PR c++/118491
+// { dg-do compile { target c++20 } }
+
+#include <coroutine>
+
+struct task {
+  struct promise_type {
+task get_return_object() { return {}; }
+std::suspend_always initial_suspend() { return {}; }
+std::suspend_always final_suspend() noexcept { return {}; }
+std::suspend_always yield_value(double value) { return {}; }
+void unhandled_exception() { throw; }
+  };
+};
+
+task do_task() {
+  const int arr[]{1, 2, 3};
+
+  // No ICE if classic loop and not range-based one.
+  // for (auto i = 0; i < 10; ++i) {
+
+  // No ICE if these are moved out of the loop.
+  // auto x = std::suspend_always{};
+  // co_await x;
+
+  for (auto _ : arr) {
+auto bar = std::suspend_always{};
+co_await bar;
+
+// Alternatively:
+// auto bar = 42.;
+// co_yield bar;
+
+// No ICE if r-values:
+// co_await std::suspend_always{};
+// co_yield 42.;
+  }
+}

base-commit: f3a41e6cb5d70f0c94cc8273a118b8542fb5c2fa
-- 
2.48.0



[PATCH] c++: ICE on invalid 'tor with =default [PR118304]

2025-02-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In this PR we crash in maybe_delete_defaulted_fn because the switch
doesn't expect an sfk_constructor/_destructor.  But we can get there:

  struct A {
*A() = default;
  };

is invalid due to the void/void* mismatch, so we get to m_d_d_fn:

  if (!same_type_p (TREE_TYPE (TREE_TYPE (fn)),
                    TREE_TYPE (TREE_TYPE (implicit_fn))))
    maybe_delete_defaulted_fn (fn, implicit_fn);

Currently, we give no error (subject to c++/118306), but even if we
did, we should probably return early in maybe_delete_defaulted_fn.

PR c++/118304

gcc/cp/ChangeLog:

* method.cc (maybe_delete_defaulted_fn): Return early for cdtors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted70.C: New test.
---
 gcc/cp/method.cc | 10 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted70.C |  9 +
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted70.C

diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 3914bbb1ef2..99e247125c3 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -3531,10 +3531,18 @@ maybe_delete_defaulted_fn (tree fn, tree implicit_fn)
   if (DECL_ARTIFICIAL (fn) || !DECL_DEFAULTED_IN_CLASS_P (fn))
 return;
 
+  const special_function_kind kind = special_function_p (fn);
+  if (kind == sfk_constructor || kind == sfk_destructor)
+{
+  /* FIXME: This is ill-formed, and we should have given an error.
+But this is only going to be fixed in GCC 16 via c++/118306.  */
+  gcc_assert (true || seen_error ());
+  return;
+}
+
   DECL_DELETED_FN (fn) = true;
 
   auto_diagnostic_group d;
-  const special_function_kind kind = special_function_p (fn);
   tree parmtype
 = TREE_VALUE (DECL_XOBJ_MEMBER_FUNCTION_P (fn)
  ? TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (fn)))
diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted70.C b/gcc/testsuite/g++.dg/cpp0x/defaulted70.C
new file mode 100644
index 000..e269d9bc6a5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/defaulted70.C
@@ -0,0 +1,9 @@
+// PR c++/118304
+// { dg-do compile { target c++11 } }
+
+struct A {
+  *A() = default; // { dg-error "invalid" "PR118306" { xfail *-*-* } }
+ *~A() = default; // { dg-error "invalid" "PR118306" { xfail *-*-* } }
+};
+
+A a;

base-commit: 214224c4973bfb76f73a7efff29c5823eef31194
-- 
2.48.1



Re: [PING, PATCH] fortran: fix -MT/-MQ adding additional target [PR47485]

2025-02-03 Thread Vincent Vanlaer

Hi all,

Gentle ping for the patch below: 
https://gcc.gnu.org/pipermail/fortran/2024-December/061467.html


Best wishes,
Vincent

On 30/12/2024 00:19, Vincent Vanlaer wrote:

The -MT and -MQ options should replace the default target in the
generated dependency file. deps_add_target needs to be called before
cpp_read_main_file, otherwise the original object name is added.
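
Why the call order matters can be shown with a toy model.  Everything below is hypothetical: ToyDeps and the toy_ functions are stand-ins invented for this sketch, not libcpp's mkdeps API, and the model assumes that reading the main file adds a default target only when no target has been registered yet.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the dependency state; not libcpp's real mkdeps.
struct ToyDeps {
  std::vector<std::string> targets;
};

// Models an explicit -MT/-MQ target being registered.
void toy_add_target(ToyDeps& deps, const std::string& name) {
  deps.targets.push_back(name);
}

// Models reading the main file: a default target derived from the
// source name is added only if no target has been registered yet.
void toy_read_main_file(ToyDeps& deps, const std::string& source) {
  if (deps.targets.empty()) {
    std::string::size_type dot = source.rfind('.');
    deps.targets.push_back(source.substr(0, dot) + ".o");
  }
}

// Runs both steps in the chosen order and returns the resulting targets.
std::vector<std::string> targets_when(bool add_target_first) {
  ToyDeps deps;
  if (add_target_first)
    toy_add_target(deps, "obj.o");
  toy_read_main_file(deps, "test.f90");
  if (!add_target_first)
    toy_add_target(deps, "obj.o");
  return deps.targets;
}
```

In this model, registering the explicit target first leaves only obj.o, while registering it afterwards yields both the default name and obj.o, which mirrors the behaviour the patch sets out to fix.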

gcc/fortran/
PR fortran/47485
* cpp.cc: fix -MT/-MQ adding additional target instead of
  replacing the default

gcc/testsuite/
PR fortran/47485
* gfortran.dg/dependency_generation_1.f90: New test

Signed-off-by: Vincent Vanlaer 
---
  gcc/fortran/cpp.cc | 18 --
  .../gfortran.dg/dependency_generation_1.f90| 15 +++
  2 files changed, 27 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/dependency_generation_1.f90

diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
index 7c5f00cfd69..3b93d17b90f 100644
--- a/gcc/fortran/cpp.cc
+++ b/gcc/fortran/cpp.cc
@@ -96,6 +96,8 @@ struct gfc_cpp_option_data
int deps_skip_system; /* -MM */
const char *deps_filename;/* -M[M]D */
const char *deps_filename_user;   /* -MF  */
+  const char *deps_target_filename; /* -MT / -MQ  */
+  bool quote_deps_target_filename;  /* -MQ */
int deps_missing_are_generated;   /* -MG */
int deps_phony;   /* -MP */
int warn_date_time;   /* -Wdate-time */
@@ -287,6 +289,8 @@ gfc_cpp_init_options (unsigned int decoded_options_count,
gfc_cpp_option.deps_missing_are_generated = 0;
gfc_cpp_option.deps_filename = NULL;
gfc_cpp_option.deps_filename_user = NULL;
+  gfc_cpp_option.deps_target_filename = NULL;
+  gfc_cpp_option.quote_deps_target_filename = false;
  
gfc_cpp_option.multilib = NULL;

gfc_cpp_option.prefix = NULL;
@@ -439,9 +443,8 @@ gfc_cpp_handle_option (size_t scode, const char *arg, int value ATTRIBUTE_UNUSED
  
  case OPT_MQ:

  case OPT_MT:
-  gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].code = code;
-  gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].arg = arg;
-  gfc_cpp_option.deferred_opt_count++;
+  gfc_cpp_option.quote_deps_target_filename = (code == OPT_MQ);
+  gfc_cpp_option.deps_target_filename = arg;
break;
  
  case OPT_P:

@@ -593,6 +596,12 @@ gfc_cpp_init_0 (void)
  }
  
gcc_assert(cpp_in);

+
+  if (gfc_cpp_option.deps_target_filename)
+if (mkdeps *deps = cpp_get_deps (cpp_in))
+  deps_add_target (deps, gfc_cpp_option.deps_target_filename,
+  gfc_cpp_option.quote_deps_target_filename);
+
if (!cpp_read_main_file (cpp_in, gfc_source_file))
  errorcount++;
  }
@@ -635,9 +644,6 @@ gfc_cpp_init (void)
  else
cpp_assert (cpp_in, opt->arg);
}
-  else if (opt->code == OPT_MT || opt->code == OPT_MQ)
-   if (mkdeps *deps = cpp_get_deps (cpp_in))
- deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
  }
  
/* Pre-defined macros for non-required INTEGER kind types.  */

diff --git a/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
new file mode 100644
index 000..d42a257f83a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
@@ -0,0 +1,15 @@
+! This test case ensures that the -MT flag is correctly replacing the object name in the dependency file.
+! See PR 47485
+!
+! Contributed by Vincent Vanlaer 
+!
+! { dg-do preprocess }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-M" }
+! { dg-additional-options "-MF deps" }
+! { dg-additional-options "-MT obj.o" }
+
+module test
+end module
+
+! { dg-final { scan-file "deps" "obj.o:.*" } }




Re: [PATCH v2] c++: Add tree walk case to reach A pack from B in ...B> [PR118265]

2025-02-03 Thread Jason Merrill

On 2/2/25 5:26 PM, A J Ryan Solutions Ltd wrote:
This version has all the updates as per feedback from version 1. It 
makes a minor correction to the code styling, reformats the commit 
message and moves the test into the cpp1z directory.
In addition I've updated the test to conform with c++17 for better 
coverage. Andrew Pinski had put one up on the ticket to use, it would be 
c++20, I can switch it to that if there was another reason to use it 
that I've overlooked.


Pushed, thanks!

Jason



Re: [PATCH v2] c++: auto in trailing-return-type in parameter [PR117778]

2025-02-03 Thread Jason Merrill

On 1/31/25 4:23 PM, Marek Polacek wrote:

On Fri, Jan 31, 2025 at 09:34:52AM -0500, Jason Merrill wrote:

On 1/30/25 5:24 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?

-- >8 --
This PR describes a few issues, both ICE and rejects-valid, but
ultimately the problem is that we don't properly synthesize the
second auto in:

int
g (auto fp() -> auto)
{
  return fp ();
}

since r12-5860, which disabled auto_is_implicit_function_template_parm_p
in cp_parser_parameter_declaration after parsing the decl-specifier-seq.

If there is no trailing auto, there is no problem.

So we have to make sure auto_is_implicit_function_template_parm_p is
properly set when parsing the trailing auto.  A complication is that
one can write:

auto f (auto fp(auto fp2() -> auto) -> auto) -> auto;
~~~

where only the underlined auto should be synthesized.  So when we
parse a parameter-declaration-clause inside another
parameter-declaration-clause, we should not enable the flag.  We
have no flags to keep track of such nesting, but I think I can walk
current_binding_level to see if we find ourselves in such an unlikely
scenario.

PR c++/117778

gcc/cp/ChangeLog:

* parser.cc (cp_parser_late_return_type_opt): Maybe override
auto_is_implicit_function_template_parm_p.
(cp_parser_parameter_declaration): Update commentary.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-117778.C: New test.
* g++.dg/cpp2a/abbrev-fn2.C: New test.
* g++.dg/cpp2a/abbrev-fn3.C: New test.
---
   gcc/cp/parser.cc  | 24 -
   .../g++.dg/cpp1y/lambda-generic-117778.C  | 12 +
   gcc/testsuite/g++.dg/cpp2a/abbrev-fn2.C   | 49 +++
   gcc/testsuite/g++.dg/cpp2a/abbrev-fn3.C   |  7 +++
   4 files changed, 90 insertions(+), 2 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-117778.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/abbrev-fn2.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/abbrev-fn3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 44515bb9074..89c5c2721a7 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25514,6 +25514,25 @@ cp_parser_late_return_type_opt (cp_parser *parser, cp_declarator *declarator,
 /* Consume the ->.  */
 cp_lexer_consume_token (parser->lexer);
+  /* We may be in the context of parsing a parameter declaration,
+namely, its declarator.  auto_is_implicit_function_template_parm_p
+will be disabled in that case.  But for code like
+
+  int g (auto fp() -> auto);
+
+we have to re-enable the flag for the trailing auto.  However, that
+only applies for the outermost trailing auto in a parameter clause; in
+
+  int f2 (auto fp(auto fp2() -> auto) -> auto);
+
+the inner -> auto should not be synthesized.  */
+  int i = 0;
+  for (cp_binding_level *b = current_binding_level;
+  b->kind == sk_function_parms; b = b->level_chain)
+   ++i;
+  auto cleanup = make_temp_override
+   (parser->auto_is_implicit_function_template_parm_p, i == 2);


This looks like it will wrongly allow declaring an implicit template within
a function; you need a testcase with local extern declarations.


Ah right, I didn't check that so it was broken.  We should check
!current_function_decl.
  

Incidentally, it seems odd that the override in
cp_parser_parameter_declaration is before an error early exit a few lines
below, moving it after that would avoid needing to clean it up on that path.


Good point, adjusted.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
This PR describes a few issues, both ICE and rejects-valid, but
ultimately the problem is that we don't properly synthesize the
second auto in:

   int
   g (auto fp() -> auto)
   {
 return fp ();
   }

since r12-5860, which disabled auto_is_implicit_function_template_parm_p
in cp_parser_parameter_declaration after parsing the decl-specifier-seq.

If there is no trailing auto, there is no problem.

So we have to make sure auto_is_implicit_function_template_parm_p is
properly set when parsing the trailing auto.  A complication is that
one can write:

   auto f (auto fp(auto fp2() -> auto) -> auto) -> auto;
                                          ~~~~

where only the underlined auto should be synthesized.  So when we
parse a parameter-declaration-clause inside another
parameter-declaration-clause, we should not enable the flag.  We
have no flags to keep track of such nesting, but I think I can walk
current_binding_level to see if we find ourselves in such an unlikely
scenario.
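
To make the nesting check concrete, here is a toy model of the walk, not the parser code itself.  The types binding_level and scope_kind below are simplified stand-ins for cp_binding_level and its kinds, the helper names are hypothetical, and the in_function argument only approximates the !current_function_decl check mentioned in review.  My reading is that a depth of 2 corresponds to the trailing return type of a parameter's own declarator (fp above), 3 to fp2, and 1 to the function itself.

```cpp
// Toy stand-ins for GCC's scope kinds and cp_binding_level; only the two
// fields used by the walk are modelled.
enum scope_kind { sk_namespace, sk_function_parms };

struct binding_level {
  scope_kind kind;
  binding_level *level_chain;  // enclosing scope
};

// Count how many parameter scopes enclose the current position, stopping
// at the first enclosing scope that is not a parameter scope.
int nested_parm_depth (const binding_level *b)
{
  int depth = 0;
  for (; b && b->kind == sk_function_parms; b = b->level_chain)
    ++depth;
  return depth;
}

// Enable synthesis only for a depth of exactly 2 (the value the patch
// tests for) and, per the review discussion, never inside a function.
bool synthesize_trailing_auto (const binding_level *b, bool in_function)
{
  return !in_function && nested_parm_depth (b) == 2;
}
```

With a chain namespace <- parms of g <- parms of fp <- parms of fp2, only the level for fp yields true, which matches the "only the underlined auto" requirement.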

PR c++/117778

gcc/cp/ChangeLog:

* parser.cc (cp_parser_late_return_type_opt): Maybe override
auto_is_implicit_function_template_parm_p.
(cp_parser_parameter_declaration): Move a make_temp

Re: [PATCH] c++: Modularise start_cleanup_fn [PR98893]

2025-02-03 Thread Jason Merrill

On 2/1/25 5:29 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

'start_cleanup_fn' is not currently viable in modules, due to generating
functions relying on the 'start_cleanup_cnt' counter which is reset to 0
with each new TU.  This means that cleanup functions declared in a TU
will conflict with any imported cleanup functions.

This patch mitigates the problem by using the mangled name of the decl
we're destroying as part of the name of the function.  This should avoid
clashes unless the decls would have clashed anyway.
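
As a rough illustration of the naming change (a sketch only; the helper names are made up, std::string stands in for the identifier machinery, and the strip_name_encoding step is left out):

```cpp
#include <string>

// Old scheme, per the removed code: a counter that restarts at 0 in every
// TU, so two TUs both produce "__tcf_0".
std::string counter_cleanup_name (int &counter)
{
  return "__tcf_" + std::to_string (counter++);
}

// New scheme, per the patch: "__tcf" followed by the mangled name of the
// variable being destroyed.
std::string mangled_cleanup_name (const std::string &mangled)
{
  return "__tcf" + mangled;
}
```

The mangled names _ZZ3foovE1a and _ZL1b used to check this are the ones implied by the scan-assembler patterns in the new tests.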

PR c++/98893

gcc/cp/ChangeLog:

* decl.cc (start_cleanup_fn): Make name from the mangled name of
the passed-in decl.
(register_dtor_fn): Pass decl to start_cleanup_fn.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr98893_a.H: New test.
* g++.dg/modules/pr98893_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl.cc   | 18 +-
  gcc/testsuite/g++.dg/modules/pr98893_a.H |  9 +
  gcc/testsuite/g++.dg/modules/pr98893_b.C | 10 ++
  3 files changed, 28 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr98893_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr98893_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index cf5e055e146..7219543823b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -96,7 +96,7 @@ static void record_key_method_defined (tree);
  static tree create_array_type_for_decl (tree, tree, tree, location_t);
  static tree get_atexit_node (void);
  static tree get_dso_handle_node (void);
-static tree start_cleanup_fn (bool);
+static tree start_cleanup_fn (tree, bool);
  static void end_cleanup_fn (void);
  static tree cp_make_fname_decl (location_t, tree, int);
  static void initialize_predefined_identifiers (void);
@@ -10373,23 +10373,23 @@ get_dso_handle_node (void)
  }
  
  /* Begin a new function with internal linkage whose job will be simply

-   to destroy some particular variable.  OB_PARM is true if object pointer
+   to destroy some particular DECL.  OB_PARM is true if object pointer
 is passed to the cleanup function, otherwise no argument is passed.  */
  
-static GTY(()) int start_cleanup_cnt;

-
  static tree
-start_cleanup_fn (bool ob_parm)
+start_cleanup_fn (tree decl, bool ob_parm)
  {
-  char name[32];
-
push_to_top_level ();
  
/* No need to mangle this.  */

push_lang_context (lang_name_c);
  
/* Build the name of the function.  */

-  sprintf (name, "__tcf_%d", start_cleanup_cnt++);
+  gcc_checking_assert (HAS_DECL_ASSEMBLER_NAME_P (decl));
+  const char *dname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  dname = targetm.strip_name_encoding (dname);
+  char *name = ACONCAT (("__tcf", dname, NULL));
+
tree fntype = TREE_TYPE (ob_parm ? get_cxa_atexit_fn_ptr_type ()
   : get_atexit_fn_ptr_type ());
/* Build the function declaration.  */
@@ -10482,7 +10482,7 @@ register_dtor_fn (tree decl)
build_cleanup (decl);
  
/* Now start the function.  */

-  cleanup = start_cleanup_fn (ob_parm);
+  cleanup = start_cleanup_fn (decl, ob_parm);
  
/* Now, recompute the cleanup.  It may contain SAVE_EXPRs that refer

 to the original function, rather than the anonymous one.  That
diff --git a/gcc/testsuite/g++.dg/modules/pr98893_a.H b/gcc/testsuite/g++.dg/modules/pr98893_a.H
new file mode 100644
index 000..062ab6d9ccc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr98893_a.H
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+struct S {
+  ~S() {}
+};
+inline void foo() {
+  static S a[1];
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr98893_b.C b/gcc/testsuite/g++.dg/modules/pr98893_b.C
new file mode 100644
index 000..9065589bdfb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr98893_b.C
@@ -0,0 +1,10 @@
+// { dg-additional-options "-fmodules" }
+
+import "pr98893_a.H";
+static S b[1];
+int main() {
+  foo();
+}
+
+// { dg-final { scan-assembler {__tcf_ZZ3foovE1a:} } }
+// { dg-final { scan-assembler {__tcf_ZL1b:} } }




Re: [PATCH] c++: Improve contracts support in modules [PR108205]

2025-02-03 Thread Jason Merrill

On 2/1/25 7:03 AM, Nathaniel Shead wrote:

Regtested on x86_64-pc-linux-gnu (so far just "dg.exp=contract*
modules.exp=contract*"), OK for trunk if full bootstrap+regtest passes?

-- >8 --

Modules makes some assumptions about types that currently aren't
fulfilled by the types created in contracts logic.  This patch ensures
that exporting inline functions using contracts works again with
modules.

PR c++/108205

gcc/cp/ChangeLog:

* contracts.cc (get_pseudo_contract_violation_type): Give names
to generated FIELD_DECLs.
(declare_handle_contract_violation): Mark contract_violation
type as external linkage.
(build_contract_handler_call): Ensure any builtin declarations
created here aren't treated as attached to the current module.


OK, but now I'm curious why we don't need this sort of thing in rtti.cc?


gcc/testsuite/ChangeLog:

* g++.dg/modules/contracts-5_a.C: New test.
* g++.dg/modules/contracts-5_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/contracts.cc  | 27 +---
  gcc/testsuite/g++.dg/modules/contracts-5_a.C |  8 ++
  gcc/testsuite/g++.dg/modules/contracts-5_b.C | 20 +++
  3 files changed, 46 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_b.C

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 5782ec8bf29..f2b126c8d6b 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -1633,19 +1633,22 @@ get_pseudo_contract_violation_type ()
   signed char _M_continue;
 If this changes, also update the initializer in
 build_contract_violation.  */
-  const tree types[] = { const_string_type_node,
-const_string_type_node,
-const_string_type_node,
-const_string_type_node,
-const_string_type_node,
-uint_least32_type_node,
-signed_char_type_node };
+  struct field_info { tree type; const char* name; };
+  const field_info info[] = {
+   { const_string_type_node, "_M_file" },
+   { const_string_type_node, "_M_function" },
+   { const_string_type_node, "_M_comment" },
+   { const_string_type_node, "_M_level" },
+   { const_string_type_node, "_M_role" },
+   { uint_least32_type_node, "_M_line" },
+   { signed_char_type_node, "_M_continue" }
+  };
tree fields = NULL_TREE;
-  for (tree type : types)
+  for (const field_info& i : info)
{
  /* finish_builtin_struct wants fieldss chained in reverse.  */
  tree next = build_decl (BUILTINS_LOCATION, FIELD_DECL,
- NULL_TREE, type);
+ get_identifier (i.name), i.type);
  DECL_CHAIN (next) = fields;
  fields = next;
}
@@ -1737,6 +1740,7 @@ declare_handle_contract_violation ()
create_implicit_typedef (viol_name, violation);
DECL_SOURCE_LOCATION (TYPE_NAME (violation)) = BUILTINS_LOCATION;
DECL_CONTEXT (TYPE_NAME (violation)) = current_namespace;
+  TREE_PUBLIC (TYPE_NAME (violation)) = true;
pushdecl_namespace_level (TYPE_NAME (violation), /*hidden*/true);
pop_namespace ();
pop_nested_namespace (std_node);
@@ -1761,6 +1765,11 @@ static void
  build_contract_handler_call (tree contract,
 contract_continuation cmode)
  {
+  /* We may need to declare new types, ensure they are not considered
+ attached to a named module.  */
+  auto module_kind_override = make_temp_override
+(module_kind, module_kind & ~(MK_PURVIEW | MK_ATTACH | MK_EXPORTING));
+
tree violation = build_contract_violation (contract, cmode);
tree violation_fn = declare_handle_contract_violation ();
tree call = build_call_n (violation_fn, 1, build_address (violation));
diff --git a/gcc/testsuite/g++.dg/modules/contracts-5_a.C b/gcc/testsuite/g++.dg/modules/contracts-5_a.C
new file mode 100644
index 000..2ff6701ff3f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/contracts-5_a.C
@@ -0,0 +1,8 @@
+// PR c++/108205
+// Test that the implicitly declared handle_contract_violation function is
+// properly matched with a later declaration in an importing TU.
+// { dg-additional-options "-fmodules -fcontracts -fcontract-continuation-mode=on" }
+// { dg-module-cmi test }
+
+export module test;
+export inline void foo(int x) noexcept [[ pre: x != 0 ]] {}
diff --git a/gcc/testsuite/g++.dg/modules/contracts-5_b.C b/gcc/testsuite/g++.dg/modules/contracts-5_b.C
new file mode 100644
index 000..0e794b8ae45
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/contracts-5_b.C
@@ -0,0 +1,20 @@
+// PR c++/108205
+// { dg-module-do run }
+// { dg-additional-options "-fmodules -fcontracts 
-fcontract-con

Re: [PATCH] c++: bogus -Wvexing-parse with trailing-return-type [PR118718]

2025-02-03 Thread Jason Merrill

On 1/31/25 4:21 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
This warning should not warn for

   auto f1 () -> auto;

because that cannot be confused with initializing a variable.

PR c++/118718

gcc/cp/ChangeLog:

* parser.cc (warn_about_ambiguous_parse): Don't warn when a trailing
return type is present.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wvexing-parse10.C: New test.
---
  gcc/cp/parser.cc| 4 
  gcc/testsuite/g++.dg/warn/Wvexing-parse10.C | 9 +
  2 files changed, 13 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wvexing-parse10.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 44515bb9074..1da881e295b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -23617,6 +23617,10 @@ warn_about_ambiguous_parse (const cp_decl_specifier_seq *decl_specifiers,
(const_cast(declarator
  return;
  
+  /* Don't warn for auto f () -> auto.  */

+  if (declarator->u.function.late_return_type)
+return;
+
/* Don't warn when the whole declarator (not just the declarator-id!)
   was parenthesized.  That is, don't warn for int(n()) but do warn
   for int(f)().  */
diff --git a/gcc/testsuite/g++.dg/warn/Wvexing-parse10.C b/gcc/testsuite/g++.dg/warn/Wvexing-parse10.C
new file mode 100644
index 000..3fbe88b7d00
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wvexing-parse10.C
@@ -0,0 +1,9 @@
+// PR c++/118718
+// { dg-do compile { target c++14 } }
+
+void
+fn ()
+{
+  auto f1 () -> auto;
+  auto f2 (); // { dg-warning "empty parentheses" }
+}

base-commit: d6418fe22684f9335474d1fd405ade45954c069d




Re: [PATCH v11] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2025-02-03 Thread Jason Merrill

On 1/31/25 12:11 PM, Simon Martin wrote:

Hi Jason,

On 27 Jan 2025, at 16:49, Jason Merrill wrote:


On 1/27/25 10:41 AM, Simon Martin wrote:

Hi Jason,

On 17 Jan 2025, at 23:33, Jason Merrill wrote:


On 1/17/25 9:52 AM, Simon Martin wrote:

Hi Jason,

On 16 Jan 2025, at 22:49, Jason Merrill wrote:


On 10/16/24 11:43 AM, Simon Martin wrote:

As you know the patch had to be reverted due to PR117114, which
highlighted a bunch of issues with comparing DECL_VINDEXes: it might
give false positives in case of multiple inheritance (the case in
PR117114), but also if there’s single inheritance but the hierarchy
has more than two levels (another issue I found while bootstrapping
with rust enabled).


Yes, relying on DECL_VINDEX equality was wrong, sorry to mislead
you.


The attached updated patch introduces an overrides_p function, based
on the existing check_final_overrider, and uses it when the signatures
match.


That seems unnecessary.  It seems like removing that only breaks
Woverloaded-virt11.C, and making that work again only requires
bringing back the check that DECL_VINDEX (fndecl) is set (to any
value).  Or remembering that fndecl was a template, so it can't really
have the same signature as a non-template, whatever same_signature_p
says.

That’s right, only Woverloaded-virt11.C fails without the
check_final_overrider call.

Thanks for the suggestion to check whether fndecl is a template.  This
is what the updated attached patch does, successfully tested on
x86_64-pc-linux-gnu.

OK for GCC 15? And if so, thoughts on backporting to release branches
(technically it’s a regression but it’s “just” an incorrect
warning fix, so probably not worth the risk)?


Right, I wouldn't backport.


+   if (warn_overloaded_virtual == 1
+   && overrider_fndecls.elements () == num_fns)
+ /* All the fns override a base virtual.  */
+ continue;


This looks like the only use of the overrider_fndecls hash_set.  A
hash_set seems a bit overkill for checking whether everything in fns
is an overrider; keeping track of how many times the old any_override
was set should work just as well?

Yeah you’re right :-/ I’ve changed my latest patch to simply count
overriders.


+   /* fndecls hides base_fndecls[k].  */
+   auto_vec &hiders =
+ hidden_base_fndecls.get_or_insert (base_fndecls[k]);
+   if (!hiders.contains (fndecl))
+ hiders.safe_push (fndecl);


Hmm, do you think users want a full list of the overloads that don't
override?  I'd think the problem is more the overload that doesn't
exist rather than the ones that do.  The current code ends up in the
OVERLOAD handling of dump_decl that just prints scope::name.

Indeed, the full list is probably not super useful... One problem with
the current code is that for conversion operators, it will give a note
such as “note:   by 'operator’”, so I propose to keep track of at
least one of the hiders, and use it to show the note (and get a proper
“by 'virtual B::operator char()'” note for conversion operators).

Hence the updated patch, successfully tested on x86_64-pc-linux-gnu.
Ok for trunk?



+   else if (!template_p /* Template methods don't override.  */
+&& same_signature_p (fndecl, base_fndecls[k]))
+ {
+   overriden_base_fndecls.add (base_fndecls[k]);
+   ++num_overriders;
+ }


I'm concerned that this will increment num_overriders multiple times
for a single fndecl if it overrides functions in multiple bases.

Such a case is covered by the new Woverloaded-virt11.C and does not
warn, but it’s true that we don’t take the “if
(warn_overloaded_virtual == 1 && num_overriders == num_fns)” continue,
and we should - thanks.

I have updated the patch to only increment num_overriders at the end of
the loop iterating on base functions if we’ve seen at least one
overridden base function. Successfully tested on x86_64-pc-linux-gnu. OK
for trunk?
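
For readers following along, the counting change can be pictured with a small stand-alone model.  This is only a sketch: the method struct and count_overriders are hypothetical, and plain string equality stands in for same_signature_p.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a member function declaration.
struct method {
  std::string signature;
  bool is_template;
};

// Count how many of FNS override at least one function in BASE_FNS.
// Each function contributes at most 1, however many bases it overrides,
// because the counter is bumped only after the inner loop has finished.
int count_overriders (const std::vector<method> &fns,
                      const std::vector<std::string> &base_fns)
{
  int num_overriders = 0;
  for (const method &fn : fns)
    {
      bool seen_override = false;
      for (const std::string &base : base_fns)
        // Template methods don't override.
        if (!fn.is_template && fn.signature == base)
          seen_override = true;
      if (seen_override)
        ++num_overriders;
    }
  return num_overriders;
}
```

With one function that matches the same signature in two bases, the result is 1, so a comparison against the number of functions still holds.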



@@ -3402,7 +3402,8 @@ location_of (tree t)
return input_location;
 }
   else if (TREE_CODE (t) == OVERLOAD)
-t = OVL_FIRST (t);
+t = OVL_FIRST (t) != conv_op_marker ? OVL_FIRST (t)
+  : OVL_FIRST (OVL_CHAIN (t));


Please add parentheses around the ?: expression to preserve the
indentation.  OK with that tweak.


Jason



Re: [PATCH v2] c++: Don't merge friend declarations that specify default arguments [PR118319]

2025-02-03 Thread Jason Merrill

On 1/31/25 11:12 AM, Simon Martin wrote:

Hi Jason,

On 31 Jan 2025, at 16:29, Jason Merrill wrote:


On 1/31/25 9:52 AM, Simon Martin wrote:

Hi Jason,

On 9 Jan 2025, at 22:55, Jason Merrill wrote:


On 1/9/25 8:25 AM, Simon Martin wrote:

We segfault upon the following invalid code

=== cut here ===
template  struct S {
 friend void foo (int a = []{}());
};
void foo (int a) {}
int main () {
 S<0> t;
 foo ();
}
=== cut here ===

The problem is that we end up with a LAMBDA_EXPR callee in
set_flags_from_callee, and dereference its NULL_TREE
TREE_TYPE (TREE_TYPE ( )).

This patch simply sets the default argument to error_mark_node for
friend functions that do not meet the requirement in C++17
11.3.6/4.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/118319

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Inspect all friend function parameters,
and set them to error_mark_node if invalid.

gcc/testsuite/ChangeLog:

* g++.dg/parse/defarg18.C: New test.

---
gcc/cp/decl.cc| 13 +---
gcc/testsuite/g++.dg/parse/defarg18.C | 48 +++
2 files changed, 57 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/parse/defarg18.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 503ecd9387e..b2761c23d3e 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -11134,14 +11134,19 @@ grokfndecl (tree ctype,
 expression, that declaration shall be a definition..."  */
  if (friendp && !funcdef_flag)
{
+  bool has_permerrored = false;
  for (tree t = FUNCTION_FIRST_USER_PARMTYPE (decl);
   t && t != void_list_node; t = TREE_CHAIN (t))
if (TREE_PURPOSE (t))
  {
-   permerror (DECL_SOURCE_LOCATION (decl),
-  "friend declaration of %qD specifies default "
-  "arguments and isn%'t a definition", decl);
-   break;
+   if (!has_permerrored)
+ {
+   has_permerrored = true;
+   permerror (DECL_SOURCE_LOCATION (decl),
+  "friend declaration of %qD specifies default "
+  "arguments and isn%'t a definition", decl);
+ }
+   TREE_PURPOSE (t) = error_mark_node;


If we're going to unconditionally change TREE_PURPOSE, then permerror
needs to strengthen to error.  But I'd think we could leave the
current state in a non-template class, only changing the template
case.

Thanks. It’s true that setting the argument to error_mark_node is
contradictory with the fact that we accept the code with -fpermissive,
even if only under processing_template_decl, so I checked if there’s
not a better way of approaching this PR.

After a bit of investigation, I think that the real problem is that
duplicate_decls tries to merge the two declarations, even though they
don’t meet the constraint about friend functions and default
arguments.


I disagree; in this testcase the friend is the (lexically) first
declaration, the problem is that it's a non-defining friend (in a
template) that specifies default args, as addressed by your first
patch.

Fair.


I still think my earlier comments are the way forward here: leave the
non-template case alone (permerror, don't change TREE_PURPOSE), in a
template give a hard error and change to error_mark_node.

Thanks, understood. The reason I looked for another “solution” is
that it felt strange to be permissive in non-templates and stricter in
templates. For example, if we do so, we’ll regress the case I added in
defarg19.C in -fpermissive (also available at
https://godbolt.org/z/YT3dexGjM).

I’m probably splitting hairs, and I’m happy to go ahead with your
suggestion if you think it’s fine. Otherwise I’ll see if I find some
better fix.


That's fine, it's common to be stricter in templates.

Jason



[PATCH] RTEMS: Add Cortex-M33 multilib

2025-02-03 Thread Sebastian Huber
Enable use of Armv8-M instruction set.

Account for CVE-2021-35465 mitigation [PR102035].  The
-mfix-cmse-cve-2021-35465 option is enabled by default if
-mcpu=cortex-m33 is used.

gcc/

* config/arm/t-rtems: Add Cortex-M33 multilib.
---
 gcc/config/arm/t-rtems | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/t-rtems b/gcc/config/arm/t-rtems
index b2fcf572bca..797640bd4f4 100644
--- a/gcc/config/arm/t-rtems
+++ b/gcc/config/arm/t-rtems
@@ -17,8 +17,8 @@ MULTILIB_DIRNAMES += eb
 MULTILIB_OPTIONS   += mthumb
 MULTILIB_DIRNAMES  += thumb
 
-MULTILIB_OPTIONS   += march=armv5te+fp/march=armv6-m/march=armv7-a/march=armv7-a+simd/march=armv7-r/march=armv7-r+fp/mcpu=cortex-r52/mcpu=cortex-m3/mcpu=cortex-m4/mcpu=cortex-m4+nofp/mcpu=cortex-m7
-MULTILIB_DIRNAMES  += armv5te+fp   armv6-m   armv7-a   armv7-a+simd   armv7-r   armv7-r+fp   cortex-r52  cortex-m3  cortex-m4  cortex-m4+nofp  cortex-m7
+MULTILIB_OPTIONS   += march=armv5te+fp/march=armv6-m/march=armv7-a/march=armv7-a+simd/march=armv7-r/march=armv7-r+fp/mcpu=cortex-r52/mcpu=cortex-m3/mcpu=cortex-m33/mcpu=cortex-m4/mcpu=cortex-m4+nofp/mcpu=cortex-m7
+MULTILIB_DIRNAMES  += armv5te+fp   armv6-m   armv7-a   armv7-a+simd   armv7-r   armv7-r+fp   cortex-r52  cortex-m3  cortex-m33  cortex-m4  cortex-m4+nofp  cortex-m7
 
 MULTILIB_OPTIONS   += mfloat-abi=hard
 MULTILIB_DIRNAMES  += hard
@@ -33,6 +33,7 @@ MULTILIB_REQUIRED += mthumb/march=armv7-r+fp/mfloat-abi=hard
 MULTILIB_REQUIRED  += mthumb/march=armv7-r
 MULTILIB_REQUIRED  += mthumb/mcpu=cortex-r52/mfloat-abi=hard
 MULTILIB_REQUIRED  += mthumb/mcpu=cortex-m3
+MULTILIB_REQUIRED  += mthumb/mcpu=cortex-m33
 MULTILIB_REQUIRED  += mthumb/mcpu=cortex-m4/mfloat-abi=hard
 MULTILIB_REQUIRED  += mthumb/mcpu=cortex-m4+nofp
 MULTILIB_REQUIRED  += mthumb/mcpu=cortex-m7/mfloat-abi=hard
-- 
2.43.0



Re: [PATCH] RTEMS: Add Cortex-M33 multilib

2025-02-03 Thread Sebastian Huber
- On 4 Feb 2025 at 4:15, Sebastian Huber
sebastian.hu...@embedded-brains.de wrote:

> Enable use of Armv8-M instruction set.
> 
> Account for CVE-2021-35465 mitigation [PR102035].  The
> -mfix-cmse-cve-2021-35465 option is enabled by default if
> -mcpu=cortex-m33 is used.
> 
> gcc/
> 
>   * config/arm/t-rtems: Add Cortex-M33 multilib.
> ---
> gcc/config/arm/t-rtems | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)

I would like to back port this change to the GCC 13 and 14 branches.

-- 
embedded brains GmbH & Co. KG
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Court of registration: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] Fortran: different character lengths in array constructor [PR93289]

2025-02-03 Thread Harald Anlauf

On 03.02.25 at 19:31, Jerry D wrote:

On 2/3/25 2:49 AM, Richard Sandiford wrote:

Steve Kargl  writes:

On Sat, Feb 01, 2025 at 09:49:17PM +0100, Harald Anlauf wrote:

On 01.02.25 at 21:03, Steve Kargl wrote:

On Sat, Feb 01, 2025 at 07:25:51PM +0100, Harald Anlauf wrote:


the attached patch downgrades different constant character lengths in an
array constructor from a GNU to a legacy extension, so that users get a
warning with -std=gnu.  We continue to generate an error when standard
conformance is requested.

Regtested on x86_64-pc-linux-gnu (found one testcase where this
triggered... :)

OK for mainline?



My vote is 'no'.

This is either a GNU extension or an error.  It is certainly
not a legacy issue as array constructors simply cannot appear in
old moldy *legacy* codes.


legacy /= moldy.

My intention is to downgrade existing, potentially dangerous
GNU extensions (like this one) carefully to "legacy", but not
with an axe.


I would be in favor of making it a hard error.  If you believe
gfortran must be able to compile invalid source, then add an option
such as -fallow-invalid-scalar-character-entities-in-array-constructor.


I don't see why we should scare users by suddenly turning code that is
currently accepted silently, because it is a GNU extension, into a hard
error.

So why must we be so tough?



Because -std=legacy allows a whole bunch of garbage.

Instead of fixing broken code, a user will slap -std=legacy
in a Makefile and move on.  Then years from now, you'll see
-std=legacy in a whole bunch of Makefiles whether it is needed
or not.  See -maligned-double and -fallow-argument-mismatch as
poster children.


I agree that this is what will happen.  But for people running benchmarks,
it's kind-of (kind-of) a feature.  Benchmarks tend to include relatively
old code by the time that they're released, and benchmarks continue to be
relevant (or at least widely tested) after they're out of maintenance.

So it has been really useful to have -std=legacy accept old, dangerous code,
since it means that we can continue to test old benchmarks with newer
compilers.  Improving the benchmark source to avoid the dangerous constructs
would invalidate the test and make it harder to compare with historical
results.


Again, just my $0.02.


Same here, just wanted to raise the benchmark use case.

Thanks,
Richard


I think we have had a good discussion, and for the good of the order
I recommend we push this for now.  The work has been done.


Regards,

Jerry



Thanks, Jerry!

Pushed: r15-7336-gf3a41e6cb5d70f




Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Peter Bergner
On 2/3/25 3:44 AM, Richard Biener wrote:
> On Mon, Feb 3, 2025 at 10:32 AM H.J. Lu  wrote:
>> I believe the original patch should be reverted.  Then my patch isn't needed.
> 
> I'm OK with that, but it's not my call.  I do wonder why the contributor did
> not address any of the fallout.  Maybe he's gone?  Peter?

Surya (she) is working on the fallout.  In fact, one patch earlier this year
was committed and reverted due to some aarch64 fallout.  That said, Andrew
mentioned on IRC that he was interested in getting that patch back in for
aarch64 because it helps shrink-wrapping and he believes the patch itself
wasn't bad, but exposed a latent issue that was causing the bootstrap issue
on aarch64.

Surya also just recently submitted another patch to help with the original
fallout:

  [PATCH] lra: initialize allocated_hard_reg_p[] for hard regs referenced in RTL [PR118533]

...which you commented on.  She is working on them.


I disagree with H.J.'s comment.  I have said before at the Cauldron and in
some bugzillas that Surya's fix is a correct fix.  The issues encountered
here seem to be latent issues exposed by Surya's fix (read also Matz's reply)
and as such, this patch should stay.  The correct path here is to track down
those latent issues and fix those.  I've asked Surya to continue to work on
the fallout, but any help from others is greatly appreciated!

Reverting now would also cause performance regressions on Power, RISC-V and ARM.

Peter




Re: [PING, PATCH] fortran: fix -MT/-MQ adding additional target [PR47485]

2025-02-03 Thread Jerry D

On 2/3/25 2:14 PM, Vincent Vanlaer wrote:

Hi all,

Gentle ping for the patch below:
https://gcc.gnu.org/pipermail/fortran/2024-December/061467.html


Best wishes,
Vincent

On 30/12/2024 00:19, Vincent Vanlaer wrote:

The -MT and -MQ options should replace the default target in the
generated dependency file. deps_add_target needs to be called before
cpp_read_main_file, otherwise the original object name is added.
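
A simplified model of why the call order matters may help here.  This is a sketch, not libcpp code: deps_model and its members are hypothetical, the default target name test.o is made up, and it assumes that reading the main file installs a default target only when no target has been set yet.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the list of dependency targets.
struct deps_model {
  std::vector<std::string> targets;

  // Explicit target, as requested with -MT or -MQ.
  void add_target (const std::string &t) { targets.push_back (t); }

  // Default target derived from the main file; assumed to be a no-op
  // when a target is already present.
  void add_default_target (const std::string &t)
  {
    if (targets.empty ())
      targets.push_back (t);
  }
};

// Order before the patch: the main file is read first, -MT comes later.
std::vector<std::string> old_order ()
{
  deps_model d;
  d.add_default_target ("test.o");
  d.add_target ("obj.o");
  return d.targets;
}

// Order after the patch: -MT is applied before the main file is read.
std::vector<std::string> new_order ()
{
  deps_model d;
  d.add_target ("obj.o");
  d.add_default_target ("test.o");
  return d.targets;
}
```

Under that assumption, the old order ends up with two targets, while the new order leaves only the one given with -MT.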

gcc/fortran/
PR fortran/47485
* cpp.cc: fix -MT/-MQ adding additional target instead of
  replacing the default

gcc/testsuite/
PR fortran/47485
* gfortran.dg/dependency_generation_1.f90: New test

Signed-off-by: Vincent Vanlaer 
---
  gcc/fortran/cpp.cc | 18 --
  .../gfortran.dg/dependency_generation_1.f90    | 15 +++
  2 files changed, 27 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/dependency_generation_1.f90


diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
index 7c5f00cfd69..3b93d17b90f 100644
--- a/gcc/fortran/cpp.cc
+++ b/gcc/fortran/cpp.cc
@@ -96,6 +96,8 @@ struct gfc_cpp_option_data
    int deps_skip_system; /* -MM */
    const char *deps_filename;    /* -M[M]D */
    const char *deps_filename_user;   /* -MF  */
+  const char *deps_target_filename; /* -MT / -MQ  */
+  bool quote_deps_target_filename;  /* -MQ */
    int deps_missing_are_generated;   /* -MG */
    int deps_phony;   /* -MP */
    int warn_date_time;   /* -Wdate-time */
@@ -287,6 +289,8 @@ gfc_cpp_init_options (unsigned int decoded_options_count,
    gfc_cpp_option.deps_missing_are_generated = 0;
    gfc_cpp_option.deps_filename = NULL;
    gfc_cpp_option.deps_filename_user = NULL;
+  gfc_cpp_option.deps_target_filename = NULL;
+  gfc_cpp_option.quote_deps_target_filename = false;
    gfc_cpp_option.multilib = NULL;
    gfc_cpp_option.prefix = NULL;
@@ -439,9 +443,8 @@ gfc_cpp_handle_option (size_t scode, const char *arg, int value ATTRIBUTE_UNUSED
  case OPT_MQ:
  case OPT_MT:
-  gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].code = code;
-  gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].arg = arg;
-  gfc_cpp_option.deferred_opt_count++;
+  gfc_cpp_option.quote_deps_target_filename = (code == OPT_MQ);
+  gfc_cpp_option.deps_target_filename = arg;
    break;
  case OPT_P:
@@ -593,6 +596,12 @@ gfc_cpp_init_0 (void)
  }
    gcc_assert(cpp_in);
+
+  if (gfc_cpp_option.deps_target_filename)
+    if (mkdeps *deps = cpp_get_deps (cpp_in))
+  deps_add_target (deps, gfc_cpp_option.deps_target_filename,
+   gfc_cpp_option.quote_deps_target_filename);
+
    if (!cpp_read_main_file (cpp_in, gfc_source_file))
  errorcount++;
  }
@@ -635,9 +644,6 @@ gfc_cpp_init (void)
    else
  cpp_assert (cpp_in, opt->arg);
  }
-  else if (opt->code == OPT_MT || opt->code == OPT_MQ)
-    if (mkdeps *deps = cpp_get_deps (cpp_in))
-  deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
  }
    /* Pre-defined macros for non-required INTEGER kind types.  */
diff --git a/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
new file mode 100644
index 000..d42a257f83a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
@@ -0,0 +1,15 @@
+! This test case ensures that the -MT flag is correctly replacing the object name in the dependency file.
+! See PR 47485
+!
+! Contributed by Vincent Vanlaer 
+!
+! { dg-do preprocess }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-M" }
+! { dg-additional-options "-MF deps" }
+! { dg-additional-options "-MT obj.o" }
+
+module test
+end module
+
+! { dg-final { scan-file "deps" "obj.o:.*" } }




Do you have commit rights to gcc? I did not catch your original post.

Jerry


Re: [PATCH] c++: Improve contracts support in modules [PR108205]

2025-02-03 Thread Nathaniel Shead
On Mon, Feb 03, 2025 at 06:57:14PM -0500, Jason Merrill wrote:
> On 2/1/25 7:03 AM, Nathaniel Shead wrote:
> > Regtested on x86_64-pc-linux-gnu (so far just "dg.exp=contract*
> > modules.exp=contract*"), OK for trunk if full bootstrap+regtest passes?
> > 
> > -- >8 --
> > 
> > Modules makes some assumptions about types that currently aren't
> > fulfilled by the types created in contracts logic.  This patch ensures
> > that exporting inline functions using contracts works again with
> > modules.
> > 
> > PR c++/108205
> > 
> > gcc/cp/ChangeLog:
> > 
> > * contracts.cc (get_pseudo_contract_violation_type): Give names
> > to generated FIELD_DECLs.
> > (declare_handle_contract_violation): Mark contract_violation
> > type as external linkage.
> > (build_contract_handler_call): Ensure any builtin declarations
> > created here aren't treated as attached to the current module.
> 
> OK, but now I'm curious why we don't need this sort of thing in rtti.cc?
> 

Modules streaming ignores the types built for RTTI because DECL_TINFO_P
is handled specially in trees_out::decl_node (it just writes enough
information for the importer to rebuild the type itself).  But it might
be worth at least forcing global attachment just in case the types
having module attachment causes something else to go wrong; thoughts?

That said, we will definitely need something like this for the types
built for ubsan (PR98735), which I have some ideas on how to fix but
probably won't get to for GCC15 since there's some other complications
there.

Nathaniel

> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/contracts-5_a.C: New test.
> > * g++.dg/modules/contracts-5_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/contracts.cc  | 27 +---
> >   gcc/testsuite/g++.dg/modules/contracts-5_a.C |  8 ++
> >   gcc/testsuite/g++.dg/modules/contracts-5_b.C | 20 +++
> >   3 files changed, 46 insertions(+), 9 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_a.C
> >   create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_b.C
> > 
> > diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
> > index 5782ec8bf29..f2b126c8d6b 100644
> > --- a/gcc/cp/contracts.cc
> > +++ b/gcc/cp/contracts.cc
> > @@ -1633,19 +1633,22 @@ get_pseudo_contract_violation_type ()
> >signed char _M_continue;
> >  If this changes, also update the initializer in
> >  build_contract_violation.  */
> > -  const tree types[] = { const_string_type_node,
> > -const_string_type_node,
> > -const_string_type_node,
> > -const_string_type_node,
> > -const_string_type_node,
> > -uint_least32_type_node,
> > -signed_char_type_node };
> > +  struct field_info { tree type; const char* name; };
> > +  const field_info info[] = {
> > +   { const_string_type_node, "_M_file" },
> > +   { const_string_type_node, "_M_function" },
> > +   { const_string_type_node, "_M_comment" },
> > +   { const_string_type_node, "_M_level" },
> > +   { const_string_type_node, "_M_role" },
> > +   { uint_least32_type_node, "_M_line" },
> > +   { signed_char_type_node, "_M_continue" }
> > +  };
> > tree fields = NULL_TREE;
> > -  for (tree type : types)
> > +  for (const field_info& i : info)
> > {
> >   /* finish_builtin_struct wants fieldss chained in reverse.  */
> >   tree next = build_decl (BUILTINS_LOCATION, FIELD_DECL,
> > - NULL_TREE, type);
> > + get_identifier (i.name), i.type);
> >   DECL_CHAIN (next) = fields;
> >   fields = next;
> > }
> > @@ -1737,6 +1740,7 @@ declare_handle_contract_violation ()
> > create_implicit_typedef (viol_name, violation);
> > DECL_SOURCE_LOCATION (TYPE_NAME (violation)) = BUILTINS_LOCATION;
> > DECL_CONTEXT (TYPE_NAME (violation)) = current_namespace;
> > +  TREE_PUBLIC (TYPE_NAME (violation)) = true;
> > pushdecl_namespace_level (TYPE_NAME (violation), /*hidden*/true);
> > pop_namespace ();
> > pop_nested_namespace (std_node);
> > @@ -1761,6 +1765,11 @@ static void
> >   build_contract_handler_call (tree contract,
> >  contract_continuation cmode)
> >   {
> > +  /* We may need to declare new types, ensure they are not considered
> > + attached to a named module.  */
> > +  auto module_kind_override = make_temp_override
> > +(module_kind, module_kind & ~(MK_PURVIEW | MK_ATTACH | MK_EXPORTING));
> > +
> > tree violation = build_contract_violation (contract, cmode);
> > tree violation_fn = declare_handle_contract_violation ();
> > tree call = build_call_n (violation_fn, 1, build_address (violation));
> > diff --git a/gcc/testsuite/g++.dg/modules/contracts-5_a.C 
> > b/gcc/tes
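
The hunk in build_contract_handler_call above relies on a scoped override: some flag bits are cleared for the duration of the function and restored on exit. A minimal sketch of that pattern follows, assuming nothing about GCC's real make_temp_override beyond what the patch shows; ToyTempOverride, the TOY_* bit values and flags_inside_override are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical flag bits; the real MK_* values in GCC may differ.
enum : unsigned {
  TOY_PURVIEW = 1u << 0,
  TOY_ATTACH = 1u << 1,
  TOY_EXPORTING = 1u << 2,
  TOY_OTHER = 1u << 3
};

// Minimal RAII guard: installs a new value and restores the saved one
// when the guard goes out of scope.
template <typename T>
class ToyTempOverride {
  T &ref_;
  T saved_;

public:
  ToyTempOverride(T &ref, T value) : ref_(ref), saved_(ref) { ref_ = value; }
  ~ToyTempOverride() { ref_ = saved_; }
  ToyTempOverride(const ToyTempOverride &) = delete;
  ToyTempOverride &operator=(const ToyTempOverride &) = delete;
};

// Returns the flags that are visible while the override is active.
inline unsigned flags_inside_override(unsigned &kind) {
  ToyTempOverride<unsigned> guard(
      kind, kind & ~(TOY_PURVIEW | TOY_ATTACH | TOY_EXPORTING));
  return kind;
}
```

Anything declared while the guard is alive sees the masked flags; the caller's value is untouched once the function returns, including on early returns.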

Re: [PING, PATCH] fortran: fix -MT/-MQ adding additional target [PR47485]

2025-02-03 Thread Vincent Vanlaer





On 4/02/2025 01:42, Jerry D wrote:

On 2/3/25 2:14 PM, Vincent Vanlaer wrote:

Hi all,

Gentle ping for the patch below: https://gcc.gnu.org/pipermail/fortran/2024-December/061467.html


Best wishes,
Vincent

On 30/12/2024 00:19, Vincent Vanlaer wrote:

The -MT and -MQ options should replace the default target in the
generated dependency file. deps_add_target needs to be called before
cpp_read_main_file, otherwise the original object name is added.

gcc/fortran/
PR fortran/47485
* cpp.cc: fix -MT/-MQ adding additional target instead of
  replacing the default

gcc/testsuite/
PR fortran/47485
* gfortran.dg/dependency_generation_1.f90: New test

Signed-off-by: Vincent Vanlaer 
---
  gcc/fortran/cpp.cc | 18 
--

  .../gfortran.dg/dependency_generation_1.f90    | 15 +++
  2 files changed, 27 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/dependency_generation_1.f90


diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
index 7c5f00cfd69..3b93d17b90f 100644
--- a/gcc/fortran/cpp.cc
+++ b/gcc/fortran/cpp.cc
@@ -96,6 +96,8 @@ struct gfc_cpp_option_data
    int deps_skip_system; /* -MM */
    const char *deps_filename;    /* -M[M]D */
    const char *deps_filename_user;   /* -MF  */
+  const char *deps_target_filename; /* -MT / -MQ  */
+  bool quote_deps_target_filename;  /* -MQ */
    int deps_missing_are_generated;   /* -MG */
    int deps_phony;   /* -MP */
    int warn_date_time;   /* -Wdate-time */
@@ -287,6 +289,8 @@ gfc_cpp_init_options (unsigned int decoded_options_count,
    gfc_cpp_option.deps_missing_are_generated = 0;
    gfc_cpp_option.deps_filename = NULL;
    gfc_cpp_option.deps_filename_user = NULL;
+  gfc_cpp_option.deps_target_filename = NULL;
+  gfc_cpp_option.quote_deps_target_filename = false;
    gfc_cpp_option.multilib = NULL;
    gfc_cpp_option.prefix = NULL;
@@ -439,9 +443,8 @@ gfc_cpp_handle_option (size_t scode, const char *arg, int value ATTRIBUTE_UNUSED
  case OPT_MQ:
  case OPT_MT:
-  gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].code = code;
-  gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].arg = arg;
-  gfc_cpp_option.deferred_opt_count++;
+  gfc_cpp_option.quote_deps_target_filename = (code == OPT_MQ);
+  gfc_cpp_option.deps_target_filename = arg;
    break;
  case OPT_P:
@@ -593,6 +596,12 @@ gfc_cpp_init_0 (void)
  }
    gcc_assert(cpp_in);
+
+  if (gfc_cpp_option.deps_target_filename)
+    if (mkdeps *deps = cpp_get_deps (cpp_in))
+  deps_add_target (deps, gfc_cpp_option.deps_target_filename,
+   gfc_cpp_option.quote_deps_target_filename);
+
    if (!cpp_read_main_file (cpp_in, gfc_source_file))
  errorcount++;
  }
@@ -635,9 +644,6 @@ gfc_cpp_init (void)
    else
  cpp_assert (cpp_in, opt->arg);
  }
-  else if (opt->code == OPT_MT || opt->code == OPT_MQ)
-    if (mkdeps *deps = cpp_get_deps (cpp_in))
-  deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
  }
    /* Pre-defined macros for non-required INTEGER kind types.  */
diff --git a/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
new file mode 100644
index 000..d42a257f83a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
@@ -0,0 +1,15 @@
+! This test case ensures that the -MT flag is correctly replacing the object name in the dependency file.
+! See PR 47485
+!
+! Contributed by Vincent Vanlaer 
+!
+! { dg-do preprocess }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-M" }
+! { dg-additional-options "-MF deps" }
+! { dg-additional-options "-MT obj.o" }
+
+module test
+end module
+
+! { dg-final { scan-file "deps" "obj.o:.*" } }




Do you have commit rights to gcc? I did not catch your original post.

Jerry


I do not, this is my first time contributing to GCC.

Vincent
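
The ordering problem described in the commit message above can be illustrated with a toy model. This is only a sketch under the assumption, taken from the patch description, that the default target is added when the main file is read and only if no target is registered yet; ToyDeps and its members are hypothetical and are not libcpp's mkdeps API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a dependency tracker; not libcpp's real mkdeps.
struct ToyDeps {
  std::vector<std::string> targets;

  // Models an explicit -MT/-MQ target: always appended.
  void add_target(const std::string &name) { targets.push_back(name); }

  // Models the default target: only added when no target exists yet
  // (assumed behaviour, following the commit message).
  void add_default_target(const std::string &name) {
    if (targets.empty())
      targets.push_back(name);
  }
};

// Old order: the main file is read first, so the default target gets in.
inline std::vector<std::string> old_order() {
  ToyDeps d;
  d.add_default_target("test.o");
  d.add_target("obj.o");
  return d.targets;
}

// New order: the user target is registered first, so the default is skipped.
inline std::vector<std::string> new_order() {
  ToyDeps d;
  d.add_target("obj.o");
  d.add_default_target("test.o");
  return d.targets;
}
```

In this model the old order leaves two targets in the dependency file, while the new order leaves only the one requested with -MT, which is what the new test case scans for.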


Re: [PATCH] c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]

2025-02-03 Thread Jason Merrill

On 2/2/25 5:14 AM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, this pedwarn is desirable for the implicit or
explicit capturing of structured bindings in C++17, but in the case of
init-captures the initializer is just some expression and that can include
structured bindings.

So, the following patch limits the warning to non-explicit_init_p.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2025-02-02  Jakub Jelinek  

PR c++/118719
* lambda.cc (add_capture): Only pedwarn about capturing structured
binding if !explicit_init_p.

* g++.dg/cpp1z/decomp63.C: New test.

--- gcc/cp/lambda.cc.jj 2025-01-24 17:37:49.004457905 +0100
+++ gcc/cp/lambda.cc 2025-01-31 23:47:08.907034696 +0100
@@ -613,7 +613,7 @@ add_capture (tree lambda, tree id, tree
return error_mark_node;
}
  
-  if (cxx_dialect < cxx20)
+  if (cxx_dialect < cxx20 && !explicit_init_p)
{
  auto_diagnostic_group d;
  tree stripped_init = tree_strip_any_location_wrapper (initializer);
--- gcc/testsuite/g++.dg/cpp1z/decomp63.C.jj 2025-01-31 23:54:15.480699418 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp63.C 2025-01-31 23:53:02.998578507 +0100
@@ -0,0 +1,18 @@
+// PR c++/118719
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+int
+main ()
+{
+  int a[] = { 42 };
+  auto [x] = a;                            // { dg-warning "structured bindings only available with" "" { target c++14_down } }
+                                           // { dg-message "declared here" "" { target c++17_down } .-1 }
+  [=] () { int b = x; (void) b; };         // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [&] () { int b = x; (void) b; };         // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [x] () { int b = x; (void) b; };         // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [&x] () { int b = x; (void) b; };        // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [x = x] () { int b = x; (void) b; };     // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } }
+  [y = x] () { int b = y; (void) b; };     // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } }
+  [y = x * 2] () { int b = y; (void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } }
+}

Jakub
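
The distinction the patch draws can be seen in a small standalone example. It is a sketch that assumes a compiler in C++17 mode or later; doubled_first is a hypothetical helper and is not part of the new test. The init-capture's initializer is just an expression that happens to read a structured binding, so the lambda captures the new variable y rather than the binding itself.

```cpp
#include <cassert>
#include <utility>

// The initializer of an init-capture is an ordinary expression, so it may
// mention a structured binding; what gets captured is the new variable y.
inline int doubled_first() {
  auto [x, unused] = std::pair<int, int>{42, 0};
  (void) unused;
  auto f = [y = x * 2] { return y; };
  return f();
}
```

Writing `[x]` or `[=]` instead would capture the structured binding itself, which is the case the pedwarn is still meant to cover before C++20.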





[committed] i386: Fix and improve TARGET_INDIRECT_BRANCH_REGISTER handling some more

2025-02-03 Thread Uros Bizjak
gcc/ChangeLog:

* config/i386/i386.md (*sibcall_pop_memory):
Disable for TARGET_INDIRECT_BRANCH_REGISTER
* config/i386/predicates.md (call_insn_operand): Enable when
"satisfies_constraint_Bw (op)" is true, instead of open-coding
constraint here.
(sibcall_insn_operand): Ditto with "satisfies_constraint_Bs (op)".

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d6ae3ee378a..cb37b2af50b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20244,7 +20244,7 @@ (define_insn "*sibcall_pop_memory"
(plus:SI (reg:SI SP_REG)
 (match_operand:SI 2 "immediate_operand" "i")))
(unspec [(const_int 0)] UNSPEC_PEEPSIB)]
-  "!TARGET_64BIT"
+  "!TARGET_64BIT && !TARGET_INDIRECT_BRANCH_REGISTER"
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 9a9101ed374..8631588f78e 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -781,22 +781,14 @@ (define_special_predicate "call_insn_operand"
   (ior (match_test "constant_call_address_operand
 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_operand")
-   (and (not (match_test "TARGET_INDIRECT_BRANCH_REGISTER"))
-   (ior (and (not (match_test "TARGET_X32"))
- (match_operand 0 "memory_operand"))
-(and (match_test "TARGET_X32 && Pmode == DImode")
- (match_operand 0 "GOT_memory_operand"))
+   (match_test "satisfies_constraint_Bw (op)")))
 
 ;; Similarly, but for tail calls, in which we cannot allow memory references.
 (define_special_predicate "sibcall_insn_operand"
   (ior (match_test "constant_call_address_operand
 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "register_no_elim_operand")
-   (and (not (match_test "TARGET_INDIRECT_BRANCH_REGISTER"))
-   (ior (and (not (match_test "TARGET_X32"))
- (match_operand 0 "sibcall_memory_operand"))
-(and (match_test "TARGET_X32 && Pmode == DImode")
- (match_operand 0 "GOT_memory_operand"))
+   (match_test "satisfies_constraint_Bs (op)")))
 
 ;; Return true if OP is a 32-bit GOT symbol operand.
 (define_predicate "GOT32_symbol_operand"


Re: [PATCH] IBM zSystems: Do not use @PLT with larl

2025-02-03 Thread Andreas Krebbel

gcc/ChangeLog:

* config/s390/s390.cc (print_operand): Remove the no longer
necessary 31-bit and weak symbol handling.
* config/s390/s390.md (*movdi_64): Do not use @PLT with larl.
(*movsi_larl): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/call-z10-pic-nodatarel.c: Adjust
expectations.
* gcc.target/s390/call-z10-pic.c: Likewise.
* gcc.target/s390/call-z10.c: Likewise.
* gcc.target/s390/call-z9-pic-nodatarel.c: Likewise.
* gcc.target/s390/call-z9-pic.c: Likewise.
* gcc.target/s390/call-z9.c: Likewise.


Ok. Thanks!


Andreas



---
  gcc/config/s390/s390.cc  | 16 +++-
  gcc/config/s390/s390.md  |  8 
  .../gcc.target/s390/call-z10-pic-nodatarel.c |  6 ++
  gcc/testsuite/gcc.target/s390/call-z10-pic.c |  6 ++
  gcc/testsuite/gcc.target/s390/call-z10.c | 14 +-
  .../gcc.target/s390/call-z9-pic-nodatarel.c  |  6 ++
  gcc/testsuite/gcc.target/s390/call-z9-pic.c  |  6 ++
  gcc/testsuite/gcc.target/s390/call-z9.c  | 14 +-
  8 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 86a5f059b85..1d96df49fea 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8585,7 +8585,7 @@ print_operand_address (FILE *file, rtx addr)
  'E': print opcode suffix for branch on index instruction.
  'G': print the size of the operand in bytes.
  'J': print tls_load/tls_gdcall/tls_ldcall suffix
-'K': print @PLT suffix for call targets and load address values.
+'K': print @PLT suffix for branch targets; do not use with larl.
  'M': print the second word of a TImode operand.
  'N': print the second word of a DImode operand.
  'O': print only the displacement of a memory reference or address.
@@ -8854,19 +8854,9 @@ print_operand (FILE *file, rtx x, int code)
 call even static functions via PLT.  ld will optimize @PLT away for
 normal code, and keep it for patches.
  
-	 Do not indiscriminately add @PLT in 31-bit mode due to the %r12

-restriction, use UNSPEC_PLT31 instead.
-
 @PLT only makes sense for functions, data is taken care of by
--mno-pic-data-is-text-relative.
-
-Adding @PLT interferes with handling of weak symbols in non-PIC code,
-since their addresses are loaded with larl, which then always produces
-a non-NULL result, so skip them here as well.  */
-  if (TARGET_64BIT
- && GET_CODE (x) == SYMBOL_REF
- && SYMBOL_REF_FUNCTION_P (x)
- && !(SYMBOL_REF_WEAK (x) && !flag_pic))
+-mno-pic-data-is-text-relative.  */
+  if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_FUNCTION_P (x))
fprintf (file, "@PLT");
return;
  }
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index c164ea72c78..9d495803387 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2001,7 +2001,7 @@
 vlgvg\t%0,%v1,0
 vleg\t%v0,%1,0
 vsteg\t%v1,%0,0
-   larl\t%0,%1%K1"
+   larl\t%0,%1"
[(set_attr "op_type" 
"RI,RI,RI,RI,RI,RIL,RIL,RIL,RRE,RRE,RRE,RXY,RIL,RRE,RXY,
  
RXY,RR,RX,RXY,RX,RXY,RIL,SIL,*,*,RS,RS,VRI,VRR,VRS,VRS,
  VRX,VRX,RIL")
@@ -2390,7 +2390,7 @@
  (match_operand:SI 1 "larl_operand" "X"))]
"!TARGET_64BIT
 && !FP_REG_P (operands[0])"
-  "larl\t%0,%1%K1"
+  "larl\t%0,%1"
 [(set_attr "op_type" "RIL")
  (set_attr "type""larl")
  (set_attr "z10prop" "z10_fwd_A1")
@@ -11735,7 +11735,7 @@
[(set (match_operand 0 "register_operand" "=a")
  (unspec [(label_ref (match_operand 1 "" ""))] UNSPEC_MAIN_BASE))]
"GET_MODE (operands[0]) == Pmode"
-  "larl\t%0,%1%K1"
+  "larl\t%0,%1"
[(set_attr "op_type" "RIL")
 (set_attr "type""larl")
 (set_attr "z10prop" "z10_fwd_A1")
@@ -11755,7 +11755,7 @@
[(set (match_operand 0 "register_operand" "=a")
  (unspec [(label_ref (match_operand 1 "" ""))] UNSPEC_RELOAD_BASE))]
"GET_MODE (operands[0]) == Pmode"
-  "larl\t%0,%1%K1"
+  "larl\t%0,%1"
[(set_attr "op_type" "RIL")
 (set_attr "type""larl")
 (set_attr "z10prop" "z10_fwd_A1")])
diff --git a/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c b/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
index 49984614bc6..6df0c75584f 100644
--- a/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
+++ b/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
@@ -7,10 +7,8 @@
  /* { dg-final { scan-assembler {lgrl\t%r2,foo@GOTENT\n} { target lp64 } } } */
  /* { dg-final { scan-assembler {lrl\t%r2,foo@GOTENT\n} { target { ! lp64 } } } } */
  
-/* { dg-final { scan-assembler {brasl\t%r\d+,foostatic@PLT\n} { target lp64 } } } */

-/* { dg-final { scan-assembler {bra

Re: [PATCH 61/61] Fix pr54240

2025-02-03 Thread Richard Biener
On Fri, Jan 31, 2025 at 7:18 PM Aleksandar Rakic
 wrote:
>
> From: Chao-ying Fu 

OK

> gcc/testsuite/
> * gcc.target/mips/pr54240.c: Scan phiopt2.
>
> Cherry-picked 02dd052d4822ca187af075f1fb5301c954844144
> from https://github.com/MIPS/gcc
>
> Signed-off-by: Chao-ying Fu 
> Signed-off-by: Aleksandar Rakic 
> ---
>  gcc/testsuite/gcc.target/mips/pr54240.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/mips/pr54240.c b/gcc/testsuite/gcc.target/mips/pr54240.c
> index d3976f6cfef..31b793bb8c6 100644
> --- a/gcc/testsuite/gcc.target/mips/pr54240.c
> +++ b/gcc/testsuite/gcc.target/mips/pr54240.c
> @@ -27,4 +27,4 @@ NOMIPS16 int foo(S *s)
>return next->v;
>  }
>
> -/* { dg-final { scan-tree-dump "Hoisting adjacent loads" "phiopt1" } } */
> +/* { dg-final { scan-tree-dump "Hoisting adjacent loads" "phiopt2" } } */
> --
> 2.34.1


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Richard Biener
On Mon, Feb 3, 2025 at 10:32 AM H.J. Lu  wrote:
>
> On Mon, Feb 3, 2025 at 5:27 PM Richard Biener
>  wrote:
> >
> > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu  wrote:
> > >
> > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > > Author: Surya Kumari Jangala 
> > > Date:   Tue Jun 25 08:37:49 2024 -0500
> > >
> > > ira: Scale save/restore costs of callee save registers with block 
> > > frequency
> > >
> > > scales the cost of saving/restoring a callee-save hard register in 
> > > epilogue
> > > and prologue with the entry block frequency, which, if not optimizing for
> > > size, is 1, for all targets.  As the result, callee-saved registers
> > > may not be used to preserve local variable values across calls on some
> > > targets, like x86.  Add a target hook for the callee-saved register cost
> > > scale in epilogue and prologue used by IRA.  The default version of this
> > > target hook returns 1 if optimizing for size, otherwise returns the entry
> > > block frequency.  Add an x86 version of this target hook to restore the
> > > old behavior prior to the above commit.
> > >
> > > PR rtl-optimization/111673
> > > PR rtl-optimization/115932
> > > PR rtl-optimization/116028
> > > PR rtl-optimization/117081
> > > PR rtl-optimization/117082
> > > PR rtl-optimization/118497
> > > * ira-color.cc (assign_hard_reg): Call the target hook for the
> > > callee-saved register cost scale in epilogue and prologue.
> > > * target.def (ira_callee_saved_register_cost_scale): New target
> > > hook.
> > > * targhooks.cc (default_ira_callee_saved_register_cost_scale):
> > > New.
> > > * targhooks.h (default_ira_callee_saved_register_cost_scale):
> > > Likewise.
> > > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> > > New.
> > > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
> > > * doc/tm.texi: Regenerated.
> > > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
> > > New.
> > >
> > > Signed-off-by: H.J. Lu 
> > > ---
> > >  gcc/config/i386/i386.cc | 11 +++
> > >  gcc/doc/tm.texi |  8 
> > >  gcc/doc/tm.texi.in  |  2 ++
> > >  gcc/ira-color.cc|  3 +--
> > >  gcc/target.def  | 12 
> > >  gcc/targhooks.cc|  8 
> > >  gcc/targhooks.h |  1 +
> > >  7 files changed, 43 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index f89201684a8..3128973ba79 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
> > >return false;
> > >  }
> > >
> > > +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> > > +
> > > +static int
> > > +ix86_ira_callee_saved_register_cost_scale (int)
> > > +{
> > > +  return 1;
> > > +}
> > > +
> > >  /* Return true if a set of DST by the expression SRC should be allowed.
> > > This prevents complex sets of likely_spilled hard regs before split1. 
> > >  */
> > >
> > > @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p
> > >  #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
> > > ix86_preferred_output_reload_class
> > >  #undef TARGET_CLASS_LIKELY_SPILLED_P
> > >  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
> > > +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> > > +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
> > > +  ix86_ira_callee_saved_register_cost_scale
> > >
> > >  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
> > >  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > index 0de24eda6f0..9f42913a4ef 100644
> > > --- a/gcc/doc/tm.texi
> > > +++ b/gcc/doc/tm.texi
> > > @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for 
> > > given pseudo from
> > >The default version of this target hook always returns given class.
> > >  @end deftypefn
> > >
> > > +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE 
> > > (int @var{hard_regno})
> > > +A target hook which returns the callee-saved register @var{hard_regno}
> > > +cost scale in epilogue and prologue used by IRA.
> > > +
> > > +The default version of this target hook returns 1 if optimizing for
> > > +size, otherwise returns the entry block frequency.
> > > +@end deftypefn
> > > +
> > >  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
> > >  A target hook which returns true if we use LRA instead of reload pass.
> > >
> > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > index 631d04131e3..6dbe22581ca 100644
> > > --- a/gcc/doc/tm.texi.in
> > > +++ b/gcc/doc/tm.texi.in
> > > @@ -2388,6 +2388,8 @@ in the reload pass.
> > >
> > >  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > >
> > > +@hook TARGET_IRA_CALLEE_SAVED_REGIST
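
The effect of the hook described in the patch above can be sketched with a toy cost model. This is an illustration only: the function names, the cost formula and the numbers are hypothetical, and the only facts taken from the patch are that the default hook returns 1 when optimizing for size and the entry block frequency otherwise, and that the x86 version always returns 1.

```cpp
#include <cassert>

// Toy model of the default hook: 1 when optimizing for size, otherwise
// the entry block frequency.  Names are hypothetical.
inline int toy_default_cost_scale(bool optimize_size, int entry_block_freq) {
  return optimize_size ? 1 : entry_block_freq;
}

// Toy model of the x86 override in the patch, which always returns 1.
inline int toy_x86_cost_scale() { return 1; }

// Simplified prologue/epilogue cost: one save plus one restore, scaled.
inline int toy_callee_save_cost(int save_cost, int restore_cost, int scale) {
  return (save_cost + restore_cost) * scale;
}
```

With a large scale, the cost of using a callee-saved register grows and such registers become less attractive for keeping values live across calls; a scale of 1 keeps that cost small, which is the behaviour the x86 hook restores.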

Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Andreas Schwab
On Feb 02 2025, Thomas Koenig wrote:

> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
> to gcc-patches also, as normal, but got back an e-mail that it
> was too large, and that a moderator would look at it.

The mail has been accepted anyway:
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH 31/61] Improve aligned straight line memcpy

2025-02-03 Thread Jakub Jelinek
On Mon, Feb 03, 2025 at 10:36:15AM +0100, Richard Biener wrote:
> > --- a/gcc/config/mips/mips.cc
> > +++ b/gcc/config/mips/mips.cc
> > @@ -9631,7 +9631,13 @@ mips_expand_block_move (rtx dest, rtx src, rtx length, rtx alignment)
> >  {
> >if (ISA_HAS_COPY)
> >   return mips16_expand_copy (dest, src, length, alignment);
> > -  else if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER)
> > +  else if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER
> > +   /* We increase slightly the maximum number of bytes in
> > + a straight-line block if the source and destination
> > + are aligned to the register width.  */
> > +   || (!optimize_size
> > +  && INTVAL (alignment) == UNITS_PER_WORD
> > +  && INTVAL (length) <= MIPS_MAX_MOVE_MEM_STRAIGHT))

The formatting here doesn't follow the coding conventions.
Dunno if this is in the MUA say replacing a tab with spaces, but unlikely,
e.g. the || line should be indented by one tab and 7 spaces to go under
INTVAL in else if line.
See https://gcc.gnu.org/contribute.html#standards for details.

Jakub



Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Mark Wielaard
Hi Thomas,

On Sun, Feb 02, 2025 at 07:09:14PM +0100, Thomas Koenig via Gcc wrote:
> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
> to gcc-patches also, as normal, but got back an e-mail that it
> was too large, and that a moderator would look at it.

I think that was done, since the message is here:
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html
I also have it in my local gcc-patches inbox.
Didn't you receive it yourself through the list?

> Maybe the limits can be increased a bit, sometimes patches can
> be quite large, especially if they contain large test cases
> or a large number of generated files.

The problem is, as always, spam... Do you find the current limit (400K)
restricts you often from fast posting to the gcc-patches list?

> (Does anybody actually look at the messages, as promised in the e-mail?)

I think it is done multiple times each day. The current moderators are
Jeff and Marc, with help from the Sourceware volunteers monitoring
postmaster. I know some of these people, including Marc and myself
were at Fosdem this weekend. How long did you have to wait for your
message to get to the list?

Cheers,

Mark


[PATCH] tree-optimization/118717 - store commoning vs. abnormals

2025-02-03 Thread Richard Biener
When we sink common stores in cselim or the sink pass we have to
make sure to not introduce overlapping lifetimes for abnormals
used in the ref.  The easiest is to avoid sinking stmts which
reference abnormals at all which is what the following does.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/118717
* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1):
Do not common stores referencing abnormal SSA names.
* tree-ssa-sink.cc (sink_common_stores_to_bb): Likewise.

* gcc.dg/torture/pr118717.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr118717.c | 41 +
 gcc/tree-ssa-phiopt.cc  |  4 ++-
 gcc/tree-ssa-sink.cc|  4 ++-
 3 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118717.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr118717.c b/gcc/testsuite/gcc.dg/torture/pr118717.c
new file mode 100644
index 000..42dc5ec84f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118717.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+
+void jj(void);
+int ff1(void) __attribute__((__returns_twice__));
+struct s2 {
+  int prev;
+};
+typedef struct s1 {
+  unsigned interrupt_flag;
+  unsigned interrupt_mask;
+  int tag;
+  int state;
+}s1;
+int ff(void);
+static inline
+int mm(s1 *ec) {
+  if (ff())
+if (ec->interrupt_flag & ~(ec)->interrupt_mask)
+  return 0;
+}
+void ll(s1 *ec) {
+  int t = 1;
+  int state;
+  if (t)
+  {
+{
+  s1 *const _ec = ec;
+  struct s2 _tag = {0};
+  if (ff1())
+   state = ec->state;
+  else
+   state = 0;
+  if (!state)
+   mm (ec);
+  _ec->tag = _tag.prev;
+}
+if (state)
+  __builtin_exit(0);
+  }
+  jj();
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 64d3ba9e160..f67f52d2d69 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3646,7 +3646,9 @@ cond_if_else_store_replacement_1 (basic_block then_bb, basic_block else_bb,
   || else_assign == NULL
   || !gimple_assign_single_p (else_assign)
   || gimple_clobber_p (else_assign)
-  || gimple_has_volatile_ops (else_assign))
+  || gimple_has_volatile_ops (else_assign)
+  || stmt_references_abnormal_ssa_name (then_assign)
+  || stmt_references_abnormal_ssa_name (else_assign))
 return false;
 
   lhs = gimple_assign_lhs (then_assign);
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index e79762b9848..959e0d5c6be 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-eh.h"
 #include "tree-ssa-live.h"
+#include "tree-dfa.h"
 
 /* TODO:
1. Sinking store only using scalar promotion (IE without moving the RHS):
@@ -516,7 +517,8 @@ sink_common_stores_to_bb (basic_block bb)
  gimple *def = SSA_NAME_DEF_STMT (arg);
  if (! is_gimple_assign (def)
  || stmt_can_throw_internal (cfun, def)
- || (gimple_phi_arg_edge (phi, i)->flags & EDGE_ABNORMAL))
+ || (gimple_phi_arg_edge (phi, i)->flags & EDGE_ABNORMAL)
+ || stmt_references_abnormal_ssa_name (def))
{
  /* ???  We could handle some cascading with the def being
 another PHI.  We'd have to insert multiple PHIs for
-- 
2.43.0


Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Marc Poulhiès
February 3, 2025 at 11:02 AM, "Mark Wielaard" <m...@klomp.org> wrote:

> > (Does anybody actually look at the messages, as promised in the e-mail?)
> > 
> I think it is done multiple times each day. The current moderators are
> Jeff and Marc, with help from the Sourceware volunteers monitoring
> postmaster. I know some of these people, including Marc and myself
> were at Fosdem this weekend. How long did you have to wait for your
> message to get to the list?

Hello,

I usually look at the queue a few times a day (working day)... So at least in 
my case, I may not be very active during the weekends (even less so this 
weekend)...
As for unlocking too-big patches, I happen to accept the ones that are "close" 
to the limit. I think I asked last year about the big translation patches and 
someone (Joseph IIRC) told me that it was ok to accept them. Should I be more 
strict and reject anything above the limit?


Marc

PS: would be nice if git-send-email could take care of this...


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu  wrote:
>>
>> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
>> Author: Surya Kumari Jangala 
>> Date:   Tue Jun 25 08:37:49 2024 -0500
>>
>> ira: Scale save/restore costs of callee save registers with block 
>> frequency
>>
>> scales the cost of saving/restoring a callee-save hard register in epilogue
>> and prologue with the entry block frequency, which, if not optimizing for
>> size, is 1, for all targets.  As the result, callee-saved registers
>> may not be used to preserve local variable values across calls on some
>> targets, like x86.  Add a target hook for the callee-saved register cost
>> scale in epilogue and prologue used by IRA.  The default version of this
>> target hook returns 1 if optimizing for size, otherwise returns the entry
>> block frequency.  Add an x86 version of this target hook to restore the
>> old behavior prior to the above commit.
>>
>> PR rtl-optimization/111673
>> PR rtl-optimization/115932
>> PR rtl-optimization/116028
>> PR rtl-optimization/117081
>> PR rtl-optimization/117082
>> PR rtl-optimization/118497
>> * ira-color.cc (assign_hard_reg): Call the target hook for the
>> callee-saved register cost scale in epilogue and prologue.
>> * target.def (ira_callee_saved_register_cost_scale): New target
>> hook.
>> * targhooks.cc (default_ira_callee_saved_register_cost_scale):
>> New.
>> * targhooks.h (default_ira_callee_saved_register_cost_scale):
>> Likewise.
>> * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
>> New.
>> (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
>> * doc/tm.texi: Regenerated.
>> * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
>> New.
>>
>> Signed-off-by: H.J. Lu 
>> ---
>>  gcc/config/i386/i386.cc | 11 +++
>>  gcc/doc/tm.texi |  8 
>>  gcc/doc/tm.texi.in  |  2 ++
>>  gcc/ira-color.cc|  3 +--
>>  gcc/target.def  | 12 
>>  gcc/targhooks.cc|  8 
>>  gcc/targhooks.h |  1 +
>>  7 files changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
>> index f89201684a8..3128973ba79 100644
>> --- a/gcc/config/i386/i386.cc
>> +++ b/gcc/config/i386/i386.cc
>> @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
>>return false;
>>  }
>>
>> +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
>> +
>> +static int
>> +ix86_ira_callee_saved_register_cost_scale (int)
>> +{
>> +  return 1;
>> +}
>> +
>>  /* Return true if a set of DST by the expression SRC should be allowed.
>> This prevents complex sets of likely_spilled hard regs before split1.  */
>>
>> @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p
>>  #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
>> ix86_preferred_output_reload_class
>>  #undef TARGET_CLASS_LIKELY_SPILLED_P
>>  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
>> +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
>> +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
>> +  ix86_ira_callee_saved_register_cost_scale
>>
>>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 0de24eda6f0..9f42913a4ef 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for 
>> given pseudo from
>>The default version of this target hook always returns given class.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE 
>> (int @var{hard_regno})
>> +A target hook which returns the callee-saved register @var{hard_regno}
>> +cost scale in epilogue and prologue used by IRA.
>> +
>> +The default version of this target hook returns 1 if optimizing for
>> +size, otherwise returns the entry block frequency.
>> +@end deftypefn
>> +
>>  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
>>  A target hook which returns true if we use LRA instead of reload pass.
>>
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index 631d04131e3..6dbe22581ca 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -2388,6 +2388,8 @@ in the reload pass.
>>
>>  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
>>
>> +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
>> +
>>  @hook TARGET_LRA_P
>>
>>  @hook TARGET_REGISTER_PRIORITY
>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
>> index 0699b349a1a..233060e1587 100644
>> --- a/gcc/ira-color.cc
>> +++ b/gcc/ira-color.cc
>> @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>>  + ira_memory_move_cost[mode][rclass][1])
>> * saved_nregs / hard_regno_nregs (hard_r

[PATCH] c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro

2025-02-03 Thread Richard Biener
The first argument is supposed to be a type, not a decl.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK?

PR c++/79786
gcc/cp/
* rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.
---
 gcc/cp/rtti.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc
index 2dfc2e3d7c5..dcf84f17163 100644
--- a/gcc/cp/rtti.cc
+++ b/gcc/cp/rtti.cc
@@ -1741,7 +1741,8 @@ emit_tinfo_decl (tree decl)
   /* Avoid targets optionally bumping up the alignment to improve
 vector instruction accesses, tinfo are never accessed this way.  */
 #ifdef DATA_ABI_ALIGNMENT
-  SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl))));
+  SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (TREE_TYPE (decl),
+   TYPE_ALIGN (TREE_TYPE (decl))));
   DECL_USER_ALIGN (decl) = true;
 #endif
   return true;
-- 
2.43.0


Re: [PATCH 0/61] Improve Mips target

2025-02-03 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Jan 31, 2025 at 6:18 PM Aleksandar Rakic
>  wrote:
>>
>> This patch series improves the support for the mips64r6 target in GCC,
>> includes the enhancements to the general bug fixes and contains other
>> MIPS ISA and processor enablement.
>>
>> These patches are cherry-picked from the mips_rel/11_2_0/master
>> and mips_rel/9_3_0/master branches from the MIPS' repository:
>> https://github.com/MIPS/gcc .
>> Further details on the individual changes are included in the
>> respective patches.
>
> Please split up this series at least into patches that solely affect mips/
> and send patches that touch middle-end parts separately.  A 61 patches
> series is unlikely to be looked at this way.

Sorry to ask, but what about the copyright assignment/DCO side of things?
Is it ok to assume that all these patches are covered by MTI's copyright
assignment with the FSF, even though MTI didn't submit the patches
themselves?  (Genuine question, not trying to imply a particular answer.)

Thanks,
Richard


Re: [PATCH] Fortran: different character lengths in array constructor [PR93289]

2025-02-03 Thread Richard Sandiford
Steve Kargl  writes:
> On Sat, Feb 01, 2025 at 09:49:17PM +0100, Harald Anlauf wrote:
>> On 01.02.25 at 21:03, Steve Kargl wrote:
>> > On Sat, Feb 01, 2025 at 07:25:51PM +0100, Harald Anlauf wrote:
>> > > 
>> > > the attached patch downgrades different constant character lengths in an
>> > > array constructor from a GNU to a legacy extension, so that users get a
>> > > warning with -std=gnu.  We continue to generate an error when standard
>> > > conformance is requested.
>> > > 
>> > > Regtested on x86_64-pc-linux-gnu (found one testcase where this
>> > > triggered... :)
>> > > 
>> > > OK for mainline?
>> > > 
>> > 
>> > My vote is 'no'.
>> > 
>> > This is either a GNU extension or an error.  It is certainly
>> > not a legacy issue as array constructors simply cannot appear in
>> > old moldy *legacy* codes.
>> 
>> legacy /= moldy.
>> 
>> My intention is to downgrade existing, potentially dangerous
>> GNU extensions (like this one) carefully to "legacy", but not
>> with an axe.
>> 
>> > I would be in favor of making it a hard error.  If you believe
>> > gfortran must be able to compile invalid source, then add an option
>> > such as -fallow-invalid-scalar-character-entities-in-array-constructor.
>> 
>> I don't see why we shall scare users by making code that is currently
>> accepted silently, because it is a GNU extension, suddenly to a hard
>> error.
>> 
>> So why must we be so tough?
>> 
>
> Because -std=legacy allows a whole bunch of garbage.
>
> Instead of fixing broken code, a user will slap -std=legacy
> in a Makefile and move on.  Then years from now, you'll see
> -std=legacy in a whole bunch of Makefiles whether it is needed
> or not.  See -maligned-double and -fallow-argument-mismatch as
> poster children.

I agree that this is what will happen.  But for people running benchmarks,
it's kind-of (kind-of) a feature.  Benchmarks tend to include relatively
old code by the time that they're released, and benchmarks continue to be
relevant (or at least widely tested) after they're out of maintenance.

So it has been really useful to have -std=legacy accept old, dangerous code,
since it means that we can continue to test old benchmarks with newer
compilers.  Improving the benchmark source to avoid the dangerous constructs
would invalidate the test and make it harder to compare with historical
results.

> Again, just my $0.02.

Same here, just wanted to raise the benchmark use case.

Thanks,
Richard


Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Jonathan Wakely
On Mon, 3 Feb 2025 at 10:27, Marc Poulhiès wrote:
>
> I usually look at the queue a few times a day (working day)... So at least in 
> my case, I may not be very active during the weekends (even less so this 
> weekend)...
> As for unlocking too-big patches, I happen to accept the ones that are 
> "close" to the limit. I think I asked last year about the big translation 
> patches and someone (Joseph IIRC) told me that it was ok to accept them. 
> Should I be more strict and reject anything above the limit?

I think if it's a real patch, not spam, then it's OK to accept it. The
limit is there partly to stop spam with large PDF/docx/exe attachments
that we never want on the lists. The fact that the limits might also
make people think twice before sending half a megabyte of text to
hundreds of people's inboxes is a useful secondary effect IMHO :-)

Very, very few people who receive 500kB of generated code or testcases
are actually going to review all of that.

On the other hand, if patchwork and the automated CI can't handle
compressed attachments (can they?) then gzipping things causes other
problems.


Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Jakub Jelinek
On Mon, Feb 03, 2025 at 10:55:10AM +, Jonathan Wakely via Gcc wrote:
> On Mon, 3 Feb 2025 at 10:27, Marc Poulhiès wrote:
> >
> > I usually look at the queue a few times a day (working day)... So at least 
> > in my case, I may not be very active during the weekends (even less so this 
> > weekend)...
> > As for unlocking too-big patches, I happen to accept the ones that are 
> > "close" to the limit. I think I asked last year about the big translation 
> > patches and someone (Joseph IIRC) told me that it was ok to accept them. 
> > Should I be more strict and reject anything above the limit?
> 
> I think if it's a real patch, not spam, then it's OK to accept it. The

And if the sender has not tried to send it (almost) immediately split up
as a patch series or gzipped etc.  In that case letting the large patch
through would just be a waste of bandwidth.

Jakub



Re: [PATCH] arm: testsuite: Adapt mve-vabs.c to improved codegen

2025-02-03 Thread Christophe Lyon
On Sun, 2 Feb 2025 at 21:18, Thiago Jung Bauermann
 wrote:
>
> Since commit r15-491-gc290e6a0b7a9de this failure happens on
> armv8l-linux-gnueabihf and arm-eabi:
>
> Running gcc:gcc.target/arm/simd/simd.exp ...
> gcc.target/arm/simd/mve-vabs.c: memmove found 0 times
> FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3
>
> In PR target/116010, Andrew Pinski noted that
> "gcc.target/arm/simd/mve-vabs.c now calls memcpy because of the restrict
> instead of memmove. That should be a simple fix there."
>
> Therefore change the test to expect memcpy rather than memmove.
>
> Another change is that memcpy is inlined rather than called, so also change
> the test to check the optimized tree dump rather than the generated
> assembly.
>
> Tested on armv8l-linux-gnueabihf and arm-eabi.
>

LGTM, thanks.

Christophe

> gcc/testsuite/ChangeLog:
> PR target/116010
> * gcc.target/arm/simd/mve-vabs.c: Test tree dump and adjust to new
> code.
>
> Suggested-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vabs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c
> index f2f9ee349906..e85d0b18ee71 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c
> @@ -1,7 +1,7 @@
>  /* { dg-do assemble } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
> -/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> +/* { dg-additional-options "-O3 -funsafe-math-optimizations 
> -fdump-tree-optimized" } */
>
>  #include 
>  #include 
> @@ -35,10 +35,10 @@ FUNC_FLOAT(f, float, 32, 4, vabs)
>  FUNC(f, float, 16, 8, vabs)
>
>  /* Taking the absolute value of an unsigned value is a no-op, so half of the
> -   integer optimizations actually generate a call to memmove, the other ones 
> a
> +   integer optimizations actually generate a call to memcpy, the other ones a
> 'vabs'.  */
>  /* { dg-final { scan-assembler-times {vabs.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
>  /* { dg-final { scan-assembler-times {vabs.f[0-9]+\tq[0-9]+, q[0-9]+} 2 } } 
> */
>  /* { dg-final { scan-assembler-times {vldr[bhw].[0-9]+\tq[0-9]+} 5 } } */
>  /* { dg-final { scan-assembler-times {vstr[bhw].[0-9]+\tq[0-9]+} 5 } } */
> -/* { dg-final { scan-assembler-times {memmove} 3 } } */
> +/* { dg-final { scan-tree-dump-times "memcpy" 3 "optimized" } } */


Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg

2025-02-03 Thread Richard Sandiford
Jeff Law  writes:
>>> Focusing on this insn:
>>>
>>>> (insn 77 75 80 6 (parallel [
>>>>             (set (reg:DI 75 [ _32 ])
>>>>                 (plus:DI (reg:DI 73 [ _31 ])
>>>>                     (subreg:DI (reg/v:SI 41 [ __n ]) 0)))
>>>>             (clobber (scratch:SI))
>>>>         ]) "j.C":50:38 discrim 1 155 {adddi3}
>>>>      (expr_list:REG_DEAD (reg:DI 73 [ _31 ])
>>>>         (expr_list:REG_DEAD (reg/v:SI 41 [ __n ])
>>>>             (nil))))
>>>
>>> Not surprisingly we're focused on the subreg expression in there.
>>>
>>> The first checkpoint in my mind is IRA's allocation where we assign it
>>> to reg 0.
>>>
>>>
>>>> Popping a0(r41,l0)  -- assign reg 0
>>>
>>>
>>> So given the use inside a paradoxical subreg, do we consider this valid?
>>>
>>> After the discussion from last week, I'm leaning a bit more towards no
>>> than before.
>> 
>> I thought it wasn't valid.  AIUI, there are two mechanisms that try
>> to prevent it:
>> 
>> - valid_mode_changes_for_regno, which says which hard registers can
>>form all subregs required by a pseudo.  This is only used to restrict
>>class choices though, rather than forbid individual registers.
>> 
>> - This code in ira_build_conflicts:
>> 
>>/* Now we deal with paradoxical subreg cases where certain registers
>>   cannot be accessed in the widest mode.  */
>>machine_mode outer_mode = ALLOCNO_WMODE (a);
>>machine_mode inner_mode = ALLOCNO_MODE (a);
>>if (paradoxical_subreg_p (outer_mode, inner_mode))
>>  {
>>enum reg_class aclass = ALLOCNO_CLASS (a);
>>for (int j = ira_class_hard_regs_num[aclass] - 1; j >= 0; --j)
>>  {
>> int inner_regno = ira_class_hard_regs[aclass][j];
>> int outer_regno = simplify_subreg_regno (inner_regno,
>>  inner_mode, 0,
>>  outer_mode);
>> if (outer_regno < 0
>> || !in_hard_reg_set_p (reg_class_contents[aclass],
>>outer_mode, outer_regno))
>>   {
>> SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
>>   inner_regno);
>> SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj),
>>   inner_regno);
>>   }
>>  }
>>  }
>> 
>>which operates at the level of individual registers.
>> 
>> So yeah, I think the first question is why ira_build_conflicts isn't
>> kicking in for this register or (if it is) why we still get register 0.
> So pulling on this thread leads me into the code that sets up 
> ALLOCNO_WMODE in create_insn_allocnos:
>
>>   if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
>> {
>>   a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);
>>   if (outer != NULL && GET_CODE (outer) == SUBREG)
>> {
>>   machine_mode wmode = GET_MODE (outer);
>>   if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
>> ALLOCNO_WMODE (a) = wmode;
>> }
>> }
> Note how we set ALLOCNO_WMODE only at allocno creation, so it'll 
> work as intended if and only if the first reference is via a SUBREG.

Huh, yeah, I agree that that looks wrong.

> ISTM the fix here is to always do the check and set ALLOCNO_WMODE.
>
> The other bug I see is that we may potentially have paradoxicals in 
> different modes.  ie, on a 32 bit target, we could in theory have a 
> paradoxical in DI and another in TI.  So in addition to pulling that 
> code out of the conditional so that it executes every time, the 
> assignment would look like
>
> if (partial_subreg_p (ALLOCNO_WMODE (a), wmode)
>  && wmode > ALLOCNO_WMODE (a))
>ALLOCNO_WMODE (a) = wmode;
>
> Or something along those lines.

Not sure about this part though.  The construct:

  if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
ALLOCNO_WMODE (a) = wmode;

is effectively:

  ALLOCNO_WMODE (a) = MAX_SIZE (ALLOCNO_WMODE (a), wmode);

and so already picks the single widest mode, if there is one.
For things like DI vs DF, it will use the existing mode as a tie-breaker.

So ISTM that moving the code out of the "if (... == NULL)" should be
enough on its own.
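
To make both points concrete, here is a small standalone sketch. It is not GCC code: modes are modelled only by their size in bytes, `toy_narrower_than` stands in for `partial_subreg_p`, and every `toy_*` / `wmode_*` name is hypothetical. It contrasts running the check only for the first reference (as in the quoted code) with running it for every reference, and shows that the plain "assign if narrower" test already behaves as a running maximum.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in: a "mode" is just its size in bytes (SI = 4, DI = 8, TI = 16).
// A value of 0 in a reference list means "not wrapped in a subreg".
using toy_mode = int;

// Stand-in for partial_subreg_p (outer, inner): true if OUTER is narrower.
bool toy_narrower_than (toy_mode outer, toy_mode inner)
{
  return outer < inner;
}

// Shared update step: widen WMODE only if the new outer mode is strictly
// wider, so an equally wide mode keeps the existing value (the tie-breaker).
void toy_widen (toy_mode &wmode, toy_mode outer)
{
  if (outer != 0 && toy_narrower_than (wmode, outer))
    wmode = outer;
}

// Models the code as quoted: the check runs only for the first reference,
// i.e. when the allocno is created.
toy_mode wmode_first_ref_only (toy_mode natural,
                               const std::vector<toy_mode> &refs)
{
  toy_mode wmode = natural;
  if (!refs.empty ())
    toy_widen (wmode, refs.front ());
  return wmode;
}

// Models the suggested fix: the same check runs for every reference.
toy_mode wmode_every_ref (toy_mode natural,
                          const std::vector<toy_mode> &refs)
{
  toy_mode wmode = natural;
  for (toy_mode outer : refs)
    toy_widen (wmode, outer);
  return wmode;
}
```

For an SI pseudo whose references are {plain, DI subreg, TI subreg, DI subreg}, the first-reference-only version leaves the widest mode at 4 bytes, while the every-reference version ends at 16 without needing an extra `wmode > ALLOCNO_WMODE (a)` comparison.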

> And it all makes sense that you caught this.  You and another colleague 
> at ARM were trying to address this exact problem ~11 years ago ;-)

Heh, thought it sounded familiar :)

Richard


Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-03 Thread Richard Biener
On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu  wrote:
>
> On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
>  wrote:
> >
> >
> >
> > > On 02.02.2025 at 08:59, H.J. Lu wrote:
> > >
> > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > >  wrote:
> > >>
> > >>
> > >>
> > >>> On 02.02.2025 at 08:00, H.J. Lu wrote:
> > >>>
> > >>> Don't increase callee-saved register cost by 1000x, which leads to that
> > >>> callee-saved registers aren't used to preserve local variable values
> > >>> across calls, by capping the scale to 300.
> > >>
> > >>>   PR rtl-optimization/111673
> > >>>   PR rtl-optimization/115932
> > >>>   PR rtl-optimization/116028
> > >>>   PR rtl-optimization/117081
> > >>>   PR rtl-optimization/118497
> > >>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > >>>   scale to 300.
> > >>>
> > >>> Signed-off-by: H.J. Lu 
> > >>> ---
> > >>> gcc/ira-color.cc | 16 ++--
> > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > >>> index 0699b349a1a..707ff188250 100644
> > >>> --- a/gcc/ira-color.cc
> > >>> +++ b/gcc/ira-color.cc
> > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> > >>> /* We need to save/restore the hard register in
> > >>>epilogue/prologue.  Therefore we increase the cost.  */
> > >>> {
> > >>> +int scale;
> > >>> +if (optimize_size)
> > >>> +  scale = 1;
> > >>> +else
> > >>> +  {
> > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > >>> +/* Don't increase callee-saved register cost by 1000x,
> > >>> +   which leads to that callee-saved registers aren't
> > >>> +   used to preserve local variable values across calls,
> > >>> +   by capping the scale to 300.  */
> > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > >>> +  scale = 300;
> > >>
> > >> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have 
> > >> expected to scale this down to [0, 300] or is MAX a magic value?
> > >
> > > There are
> > >
> > > /* The weights for each insn varies from 0 to REG_FREQ_BASE.
> > >   This constant does not need to be high, as in infrequently executed
> > >   regions we want to count instructions equivalently to optimize for
> > >   size instead of speed.  */
> > > #define REG_FREQ_MAX 1000
> > >
> > > /* Compute register frequency from the BB frequency.  When optimizing for size,
> > >   or profile driven feedback is available and the function is never executed,
> > >   frequency is always equivalent.  Otherwise rescale the basic block
> > >   frequency.  */
> > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)       \
> > >                                || !cfun->cfg->count_max.initialized_p ()) \
> > >                               ? REG_FREQ_MAX                             \
> > >                               : ((bb)->count.to_frequency (cfun)         \
> > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)           \
> > >                               ? ((bb)->count.to_frequency (cfun)         \
> > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)           \
> > >                               : 1)
> > >
> > > 1000 is the default.  If it isn't 1000, it isn't the default.  I only want
> > > to get a more reasonable default scale, instead of 1000.   Lower
> > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> >
> > I see.  Why not adjust the above macro then?  That would be a bit more 
> > obvious.  Like use MAX/2 or so?
>
> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> Author: Surya Kumari Jangala 
> Date:   Tue Jun 25 08:37:49 2024 -0500
>
> ira: Scale save/restore costs of callee save registers with block 
> frequency
>
> uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
> I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> not as a cost scale.  Maybe the above commit should be reverted and we add
> a target hook for callee-saved register cost scale.  Each target can choose
> a proper cost scale, instead of increasing the cost by 1000x for everyone.

I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
as it doesn't seem to be used.  The comment talks about profile feedback,
but for example with -fprofile-correction or -fpartial-profile this
test looks odd.
In fact optimize_function_for_size_p should already handle this correctly.

Also REG_FREQ_FROM_BB simply documents that in this case the
frequency will be equivalent for all BBs and not any particular value.  The new
use might indeed not have the same constraints as others, instead of
a target hook making the "same value" another macro argument might be
a good first step.
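
The discontinuity raised earlier in the thread ("300 for 1000 but 999 for 999") can be shown with a small standalone sketch. This is not GCC code: `toy_freq_max` mirrors the quoted `REG_FREQ_MAX` value, `toy_cap` is the 300 from the patch, and both function names are hypothetical. The linear variant is only one possible reading of "scale this down to [0, 300]"; its floor of 1 is an assumption borrowed from the `: 1` arm of the quoted macro.

```cpp
#include <cassert>

// Illustrative constants: 1000 mirrors REG_FREQ_MAX as quoted in the thread,
// 300 is the cap proposed in the patch.
const int toy_freq_max = 1000;
const int toy_cap = 300;

// Models the patch as posted: only the exact maximum is replaced by the cap,
// every other frequency passes through unchanged.
int scale_special_case (int freq)
{
  return freq == toy_freq_max ? toy_cap : freq;
}

// One possible alternative (an assumption, not a proposal from the thread's
// patch): map [0, max] linearly onto [0, cap], never returning less than 1.
int scale_linear (int freq)
{
  int scaled = freq * toy_cap / toy_freq_max;
  return scaled > 0 ? scaled : 1;
}
```

With the special-case version a frequency of 999 yields a larger scale than a frequency of 1000, whereas the linear version stays monotonic across the whole range.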

That said - does removing the || !cfun->cfg->count

[PATCH] rtl-optimization/117611 - ICE in simplify_shift_const_1

2025-02-03 Thread Richard Biener
The following checks that we have a scalar int shift mode before
enforcing it.  As AVR shows the mode can be a signed _Accum mode
as well.
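
As a rough illustration of the difference between an asserting conversion and a checked one with an early bail-out, here is a standalone sketch. It is not GCC code: all `toy_*` names are hypothetical, the mode classes are reduced to two, and returning -1 stands in for returning NULL_RTX.

```cpp
#include <cassert>

// Hypothetical, heavily simplified mode classes.
enum class toy_class { scalar_int, fixed_point };

struct toy_machine_mode { toy_class cls; int bits; };
struct toy_scalar_int_mode { int bits; };

// Asserting conversion, in the spirit of as_a: a mismatch is a hard failure.
toy_scalar_int_mode toy_as_a (toy_machine_mode m)
{
  assert (m.cls == toy_class::scalar_int);
  return toy_scalar_int_mode{m.bits};
}

// Checked conversion, in the spirit of is_a with an out-parameter: reports a
// mismatch to the caller and writes *OUT only on success.
bool toy_is_a (toy_machine_mode m, toy_scalar_int_mode *out)
{
  if (m.cls != toy_class::scalar_int)
    return false;
  *out = toy_scalar_int_mode{m.bits};
  return true;
}

// Returns the unit width in bits, or -1 to signal "give up on this
// simplification" when the mode is not a scalar integer mode.
int toy_shift_unit_bits (toy_machine_mode shift_mode)
{
  toy_scalar_int_mode unit;
  if (!toy_is_a (shift_mode, &unit))
    return -1;
  return unit.bits;
}
```

A fixed-point mode passed to `toy_shift_unit_bits` yields -1 instead of tripping the assertion that `toy_as_a` would hit.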

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR rtl-optimization/117611
* combine.cc (simplify_shift_const_1): Bail if not
scalar int mode.

* gcc.target/avr/pr117611.c: New testcase.
---
 gcc/combine.cc  | 6 --
 gcc/testsuite/gcc.target/avr/pr117611.c | 7 +++
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/avr/pr117611.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 90828108ba4..3beeb514b81 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10635,8 +10635,10 @@ simplify_shift_const_1 (enum rtx_code code, machine_mode result_mode,
 outer_op, outer_const);
}
 
-  scalar_int_mode shift_unit_mode
-   = as_a <scalar_int_mode> (GET_MODE_INNER (shift_mode));
+  scalar_int_mode shift_unit_mode;
+  if (!is_a <scalar_int_mode> (GET_MODE_INNER (shift_mode),
+  &shift_unit_mode))
+   return NULL_RTX;
 
   /* Handle cases where the count is greater than the size of the mode
 minus 1.  For ASHIFT, use the size minus one as the count (this can
diff --git a/gcc/testsuite/gcc.target/avr/pr117611.c b/gcc/testsuite/gcc.target/avr/pr117611.c
new file mode 100644
index 000..c76093f12d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr117611.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+_Accum acc1 (_Accum x)
+{
+return x << 16;
+}
-- 
2.43.0


Re: [PATCH] Fortran: different character lengths in array constructor [PR93289]

2025-02-03 Thread Jerry D

On 2/3/25 2:49 AM, Richard Sandiford wrote:

> Steve Kargl  writes:
>> On Sat, Feb 01, 2025 at 09:49:17PM +0100, Harald Anlauf wrote:
>>> On 01.02.25 at 21:03, Steve Kargl wrote:
>>>> On Sat, Feb 01, 2025 at 07:25:51PM +0100, Harald Anlauf wrote:
>>>>>
>>>>> the attached patch downgrades different constant character lengths in an
>>>>> array constructor from a GNU to a legacy extension, so that users get a
>>>>> warning with -std=gnu.  We continue to generate an error when standard
>>>>> conformance is requested.
>>>>>
>>>>> Regtested on x86_64-pc-linux-gnu (found one testcase where this
>>>>> triggered... :)
>>>>>
>>>>> OK for mainline?
>>>>>
>>>>
>>>> My vote is 'no'.
>>>>
>>>> This is either a GNU extension or an error.  It is certainly
>>>> not a legacy issue as array constructors simply cannot appear in
>>>> old moldy *legacy* codes.
>>>
>>> legacy /= moldy.
>>>
>>> My intention is to downgrade existing, potentially dangerous
>>> GNU extensions (like this one) carefully to "legacy", but not
>>> with an axe.
>>>
>>>> I would be in favor of making it a hard error.  If you believe
>>>> gfortran must be able to compile invalid source, then add an option
>>>> such as -fallow-invalid-scalar-character-entities-in-array-constructor.
>>>
>>> I don't see why we shall scare users by making code that is currently
>>> accepted silently, because it is a GNU extension, suddenly to a hard
>>> error.
>>>
>>> So why must we be so tough?
>>>
>>
>> Because -std=legacy allows a whole bunch of garbage.
>>
>> Instead of fixing broken code, a user will slap -std=legacy
>> in a Makefile and move on.  Then years from now, you'll see
>> -std=legacy in a whole bunch of Makefiles whether it is needed
>> or not.  See -maligned-double and -fallow-argument-mismatch as
>> poster children.
>
> I agree that this is what will happen.  But for people running benchmarks,
> it's kind-of (kind-of) a feature.  Benchmarks tend to include relatively
> old code by the time that they're released, and benchmarks continue to be
> relevant (or at least widely tested) after they're out of maintenance.
>
> So it has been really useful to have -std=legacy accept old, dangerous code,
> since it means that we can continue to test old benchmarks with newer
> compilers.  Improving the benchmark source to avoid the dangerous constructs
> would invalidate the test and make it harder to compare with historical
> results.
>
>> Again, just my $0.02.
>
> Same here, just wanted to raise the benchmark use case.
>
> Thanks,
> Richard


I think we have had a good discussion, and for the sake of the good of the 
order I recommend we push this for now.  The work has been done.


Regards,

Jerry


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread H.J. Lu
On Mon, Feb 3, 2025 at 6:29 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu  wrote:
> >>
> >> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> >> Author: Surya Kumari Jangala 
> >> Date:   Tue Jun 25 08:37:49 2024 -0500
> >>
> >> ira: Scale save/restore costs of callee save registers with block 
> >> frequency
> >>
> >> scales the cost of saving/restoring a callee-save hard register in epilogue
> >> and prologue with the entry block frequency, which, if not optimizing for
> >> size, is 1, for all targets.  As the result, callee-saved registers
> >> may not be used to preserve local variable values across calls on some
> >> targets, like x86.  Add a target hook for the callee-saved register cost
> >> scale in epilogue and prologue used by IRA.  The default version of this
> >> target hook returns 1 if optimizing for size, otherwise returns the entry
> >> block frequency.  Add an x86 version of this target hook to restore the
> >> old behavior prior to the above commit.
> >>
> >> PR rtl-optimization/111673
> >> PR rtl-optimization/115932
> >> PR rtl-optimization/116028
> >> PR rtl-optimization/117081
> >> PR rtl-optimization/117082
> >> PR rtl-optimization/118497
> >> * ira-color.cc (assign_hard_reg): Call the target hook for the
> >> callee-saved register cost scale in epilogue and prologue.
> >> * target.def (ira_callee_saved_register_cost_scale): New target
> >> hook.
> >> * targhooks.cc (default_ira_callee_saved_register_cost_scale):
> >> New.
> >> * targhooks.h (default_ira_callee_saved_register_cost_scale):
> >> Likewise.
> >> * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> >> New.
> >> (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
> >> * doc/tm.texi: Regenerated.
> >> * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
> >> New.
> >>
> >> Signed-off-by: H.J. Lu 
> >> ---
> >>  gcc/config/i386/i386.cc | 11 +++
> >>  gcc/doc/tm.texi |  8 
> >>  gcc/doc/tm.texi.in  |  2 ++
> >>  gcc/ira-color.cc|  3 +--
> >>  gcc/target.def  | 12 
> >>  gcc/targhooks.cc|  8 
> >>  gcc/targhooks.h |  1 +
> >>  7 files changed, 43 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> >> index f89201684a8..3128973ba79 100644
> >> --- a/gcc/config/i386/i386.cc
> >> +++ b/gcc/config/i386/i386.cc
> >> @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
> >>return false;
> >>  }
> >>
> >> +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> >> +
> >> +static int
> >> +ix86_ira_callee_saved_register_cost_scale (int)
> >> +{
> >> +  return 1;
> >> +}
> >> +
> >>  /* Return true if a set of DST by the expression SRC should be allowed.
> >> This prevents complex sets of likely_spilled hard regs before split1.  
> >> */
> >>
> >> @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p
> >>  #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
> >> ix86_preferred_output_reload_class
> >>  #undef TARGET_CLASS_LIKELY_SPILLED_P
> >>  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
> >> +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> >> +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
> >> +  ix86_ira_callee_saved_register_cost_scale
> >>
> >>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
> >>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 0de24eda6f0..9f42913a4ef 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for 
> >> given pseudo from
> >>The default version of this target hook always returns given class.
> >>  @end deftypefn
> >>
> >> +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE 
> >> (int @var{hard_regno})
> >> +A target hook which returns the callee-saved register @var{hard_regno}
> >> +cost scale in epilogue and prologue used by IRA.
> >> +
> >> +The default version of this target hook returns 1 if optimizing for
> >> +size, otherwise returns the entry block frequency.
> >> +@end deftypefn
> >> +
> >>  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
> >>  A target hook which returns true if we use LRA instead of reload pass.
> >>
> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> >> index 631d04131e3..6dbe22581ca 100644
> >> --- a/gcc/doc/tm.texi.in
> >> +++ b/gcc/doc/tm.texi.in
> >> @@ -2388,6 +2388,8 @@ in the reload pass.
> >>
> >>  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> >>
> >> +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> >> +
> >>  @hook TARGET_LRA_P
> >>
> >>  @hook TARGET_REGISTER_PRIORITY
> >> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> >

Re: [PATCH] RX: Restrict displacement ranges in "Q" constraint

2025-02-03 Thread Yoshinori Sato
On Thu, 30 Jan 2025 00:11:01 +0900,
Jeff Law wrote:
> 
> 
> 
> On 1/29/25 3:47 AM, Yoshinori Sato wrote:
> > When using the "Q" constraint in the inline assembler, the displacement 
> > value
> > could exceed the range specified by the instruction.
> > To avoid this issue, a displacement range check is added to the "Q" 
> > constraint.
> > 
> Thanks. I've pushed this to the trunk, even though it's not a
> regression as it's limited to the rx port and fixes a clear bug.
> 
> In the future, if you could include a testcase it'd be useful.
> 
> Thanks again,
> Jeff
> 

Thanks,
The source code that caused this problem is large, so if I can make
it smaller I'll add it as a test.

-- 
Yosinori Sato


[PATCH v1 14/16] Change target_version semantics to follow ACLE specification.

2025-02-03 Thread Alfie Richards

This changes behavior of target_clones and target_version attributes
to be in line with what is specified in the Arm C Language Extension.

Notably this changes the scope and signature of multiversioned functions
to that of the default version, and changes the resolver to be
created at the implementation of the default version.

This is achieved by changing the C++ front end to no longer resolve any
non-default version decls in lookup, and by moving dispatching
for default_target sets to reuse the dispatching logic for target_clones
in multiple_target.cc.

This also fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118313
for aarch64 and riscv.

This changes the behavior of both the aarch64, and riscv targets.

gcc/ChangeLog:

* cgraphunit.cc (analyze_functions): Add dependency from default node
to non-default versions.
* ipa.cc (symbol_table::remove_unreachable_nodes): Ditto.
* multiple_target.cc (ipa_target_clone): Change logic to conditionally
dispatch target_clones and to dispatch some target_version sets.

gcc/cp/ChangeLog:

* call.cc (add_candidates): For target_version semantics don't resolve
non-default versions.
* class.cc (resolve_address_of_overloaded_function): Ditto.
* cp-gimplify.cc (cp_genericize_r): For target_version semantics don't
redirect calls to versioned functions (done later at
multiple_target.cc.)
* decl.cc (start_decl): Mangle and mark all non-default function
decls.
(start_preparsed_function): Ditto.
* typeck.cc (cp_build_function_call_vec): Add error if target has no
default implementation.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Change for new semantics.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/riscv/mv-symbols2.C: Ditto.
* g++.target/riscv/mv-symbols3.C: Ditto.
* g++.target/riscv/mv-symbols4.C: Ditto.
* g++.target/riscv/mv-symbols5.C: Ditto.
* g++.target/riscv/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols10.C: New test.
* g++.target/aarch64/mv-symbols11.C: New test.
* g++.target/aarch64/mv-symbols12.C: New test.
* g++.target/aarch64/mv-symbols14.C: New test.
* g++.target/aarch64/mv-symbols15.C: New test.
* g++.target/aarch64/mv-symbols6.C: New test.
* g++.target/aarch64/mv-symbols8.C: New test.
* g++.target/aarch64/mv-symbols9.C: New test.
---
 gcc/cgraphunit.cc |  9 
 gcc/cp/call.cc|  8 
 gcc/cp/class.cc   | 11 -
 gcc/cp/cp-gimplify.cc |  6 ++-
 gcc/cp/decl.cc| 24 ++
 gcc/cp/typeck.cc  |  8 
 gcc/ipa.cc| 11 +
 gcc/multiple_target.cc| 13 -
 gcc/testsuite/g++.target/aarch64/mv-1.C   |  4 ++
 .../g++.target/aarch64/mv-symbols10.C | 43 +
 .../g++.target/aarch64/mv-symbols11.C | 27 +++
 .../g++.target/aarch64/mv-symbols12.C | 18 +++
 .../g++.target/aarch64/mv-symbols14.C | 16 +++
 .../g++.target/aarch64/mv-symbols15.C | 16 +++
 .../g++.target/aarch64/mv-symbols2.C  | 12 ++---
 .../g++.target/aarch64/mv-symbols3.C  |  6 +--
 .../g++.target/aarch64/mv-symbols4.C  |  6 +--
 .../g++.target/aarch64/mv-symbols5.C  |  6 +--
 .../g++.target/aarch64/mv-symbols6.C  | 23 +
 .../g++.target/aarch64/mv-symbols8.C  | 48 +++
 .../g++.target/aarch64/mv-symbols9.C  | 46 ++
 .../g++.target/aarch64/mvc-symbols3.C | 12 ++---
 gcc/testsuite/g++.target/riscv/mv-symbols2.C  | 12 ++---
 gcc/testsuite/g++.target/riscv/mv-symbols3.C  |  6 +--
 gcc/testsuite/g++.target/riscv/mv-symbols4.C  |  6 +--
 gcc/testsuite/g++.target/riscv/mv-symbols5.C  |  6 +--
 gcc/testsuite/g++.target/riscv/mvc-symbols3.C | 12 ++---
 27 files changed, 368 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols10.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols11.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols12.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols14.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols15.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols6.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols8.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols9.C

diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index 82f205488e9..f7f8957e618 100644
---

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Jeff Law




On 2/3/25 2:31 AM, H.J. Lu wrote:



> > IMO at this point a new target hook should preserve existing behavior by
> > default or alternatively the original patch should be reverted as causing
> > regressions and a new patch introducing the target hook should be
> > installed in next stage1.
>
> I believe the original patch should be reverted.  Then my patch isn't needed.

That patch had significant improvements across the board for RISC-V.  I 
wouldn't want to see it reverted without a strong explanation of why it 
was wrong.




jeff



[PATCH v1 08/16] Add get_clone_versions function.

2025-02-03 Thread Alfie Richards

This is a reimplementation of get_target_clone_attr_len,
get_attr_str, and separate_attrs using string_slice and auto_vec to make
memory management and use simpler.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute): Change to use
get_clone_versions.

gcc/ChangeLog:

* tree.cc (get_clone_versions): New function.
(get_clone_attr_versions): New function.
* tree.h (get_clone_versions): New function.
(get_clone_attr_versions): New function.
---
 gcc/c-family/c-attribs.cc |  2 +-
 gcc/tree.cc   | 40 +++
 gcc/tree.h|  5 +
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index f3181e7b57c..642d724f6c6 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6129,7 +6129,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 	}
 	}
 
-  if (get_target_clone_attr_len (args) == -1)
+  if (get_clone_attr_versions (args).length () == 1)
 	{
 	  warning (OPT_Wattributes,
 		   "single %<target_clones%> attribute is ignored");
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 05f679edc09..346522d01c0 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -15299,6 +15299,46 @@ get_target_clone_attr_len (tree arglist)
   return str_len_sum;
 }
 
+/* Returns an auto_vec of string_slices containing the version strings from
+   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
+
+auto_vec<string_slice>
+get_clone_attr_versions (const tree arglist, int *default_count)
+{
+  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
+  auto_vec<string_slice> versions;
+
+  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
+  string_slice separators = string_slice (separator_str);
+
+  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
+{
+  string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE (arg)));
+  for (string_slice attr = string_slice::strtok (&str, separators);
+	   attr.is_valid (); attr = string_slice::strtok (&str, separators))
+	{
+	  attr = attr.strip ();
+	  if (attr == string_slice ("default") && default_count)
+	(*default_count)++;
+	  versions.safe_push (attr);
+	}
+}
+  return versions;
+}
+
+/* Returns an auto_vec of string_slices containing the version strings from
+   the target_clone attribute from DECL.  DEFAULT_COUNT is incremented for each
+   default version found.  */
+auto_vec<string_slice>
+get_clone_versions (const tree decl, int *default_count)
+{
+  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
+  if (!attr)
+    return auto_vec<string_slice> ();
+  tree arglist = TREE_VALUE (attr);
+  return get_clone_attr_versions (arglist, default_count);
+}
+
 void
 tree_cc_finalize (void)
 {
diff --git a/gcc/tree.h b/gcc/tree.h
index 21f3cd5525c..aea1cf078a0 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "tree-core.h"
 #include "options.h"
+#include "vec.h"
 
 /* Convert a target-independent built-in function code to a combined_fn.  */
 
@@ -7035,5 +7036,9 @@ extern unsigned fndecl_dealloc_argno (tree);
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
 extern int get_target_clone_attr_len (tree);
+auto_vec<string_slice>
+get_clone_versions (const tree, int * = NULL);
+auto_vec<string_slice>
+get_clone_attr_versions (const tree, int * = NULL);
 
 #endif  /* GCC_TREE_H  */
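
The parsing that get_clone_attr_versions performs in the patch above can be
sketched outside of GCC as follows.  This is only an illustration under
stated assumptions: std::string_view stands in for string_slice, std::vector
for auto_vec, and the names strip and split_clone_versions are hypothetical.
Empty pieces are kept here; string_slice::strtok in the series may treat
them differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for string_slice: a non-owning view into the attribute text.
using slice = std::string_view;

// Trim leading and trailing spaces and tabs from a slice.
static slice
strip (slice s)
{
  const char *ws = " \t";
  std::size_t first = s.find_first_not_of (ws);
  if (first == slice::npos)
    return slice ();
  std::size_t last = s.find_last_not_of (ws);
  return s.substr (first, last - first + 1);
}

// Split every argument string on SEPARATOR, strip each piece, and collect
// the pieces.  If DEFAULT_COUNT is non-null it is incremented once for each
// piece equal to "default".  The returned slices point into ARGS, so ARGS
// must outlive the result.
std::vector<slice>
split_clone_versions (const std::vector<std::string> &args,
                      int *default_count = nullptr, char separator = ',')
{
  // A whitespace separator would conflict with the stripping step.
  assert (separator != ' ' && separator != '\t');

  std::vector<slice> versions;
  for (const std::string &arg : args)
    {
      slice rest = arg;
      for (;;)
        {
          std::size_t pos = rest.find (separator);
          slice attr = strip (rest.substr (0, pos));
          if (attr == "default" && default_count)
            ++*default_count;
          versions.push_back (attr);
          if (pos == slice::npos)
            break;
          rest.remove_prefix (pos + 1);
        }
    }
  return versions;
}
```

Returning views rather than copies is what makes the memory management
simple: nothing is allocated per version string, at the price of tying the
result's lifetime to the attribute text.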


Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Jeff Law




On 2/2/25 11:09 AM, Thomas Koenig wrote:
> Hi,
>
> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
> to gcc-patches also, as normal, but got back an e-mail that it
> was too large, and that a moderator would look at it.
>
> Maybe the limits can be increased a bit, sometimes patches can
> be quite large, especially if they contain large test cases
> or a large number of generated files.

I do think an increase in size is probably warranted.

> (Does anybody actually look at the messages, as promised in the e-mail?)

I'd been doing this for a while, but at some point over the last few 
years I lost the password that allowed me to review this stuff.  After 
that it never bubbled up to get attention on my list.


jeff



[PATCH v1 16/16] Remove FMV beta warning.

2025-02-03 Thread Alfie Richards

This patch removes the warning for target_version and target_clones
in aarch64 as it is now spec compliant.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_process_target_version_attr):
Remove warning.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Remove option.
* g++.target/aarch64/mv-and-mvc1.C: Remove option.
* g++.target/aarch64/mv-and-mvc2.C: Remove option.
* g++.target/aarch64/mv-and-mvc3.C: Remove option.
* g++.target/aarch64/mv-and-mvc4.C: Remove option.
* g++.target/aarch64/mv-error1.C: Remove option.
* g++.target/aarch64/mv-error13.C: Remove option.
* g++.target/aarch64/mv-error2.C: Remove option.
* g++.target/aarch64/mv-error3.C: Remove option.
* g++.target/aarch64/mv-error7.C: Remove option.
* g++.target/aarch64/mv-error8.C: Remove option.
* g++.target/aarch64/mv-error9.C: Remove option.
* g++.target/aarch64/mv-pragma.C: Remove option.
* g++.target/aarch64/mv-symbols1.C: Remove option.
* g++.target/aarch64/mv-symbols10.C: Remove option.
* g++.target/aarch64/mv-symbols11.C: Remove option.
* g++.target/aarch64/mv-symbols12.C: Remove option.
* g++.target/aarch64/mv-symbols14.C: Remove option.
* g++.target/aarch64/mv-symbols15.C: Remove option.
* g++.target/aarch64/mv-symbols2.C: Remove option.
* g++.target/aarch64/mv-symbols3.C: Remove option.
* g++.target/aarch64/mv-symbols4.C: Remove option.
* g++.target/aarch64/mv-symbols5.C: Remove option.
* g++.target/aarch64/mv-symbols6.C: Remove option.
* g++.target/aarch64/mv-symbols8.C: Remove option.
* g++.target/aarch64/mv-symbols9.C: Remove option.
* g++.target/aarch64/mvc-symbols1.C: Remove option.
* g++.target/aarch64/mvc-symbols2.C: Remove option.
* g++.target/aarch64/mvc-symbols3.C: Remove option.
* g++.target/aarch64/mvc-symbols4.C: Remove option.
* g++.target/aarch64/mv-warning1.C: Removed.
* g++.target/aarch64/mvc-warning1.C: Removed.
---
 gcc/config/aarch64/aarch64.cc   | 9 -
 gcc/testsuite/g++.target/aarch64/mv-1.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc1.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc2.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc3.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc4.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error1.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error13.C   | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error2.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error3.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error7.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error8.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error9.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-pragma.C| 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols1.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols10.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols11.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols12.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols14.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols15.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols2.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols3.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols4.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols5.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols6.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols8.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols9.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-warning1.C  | 9 -
 gcc/testsuite/g++.target/aarch64/mvc-symbols1.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-symbols2.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-symbols3.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-symbols4.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-warning1.C | 6 --
 33 files changed, 30 insertions(+), 54 deletions(-)
 delete mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C
 delete mode 100644 gcc/testsuite/g++.target/aarch64/mvc-warning1.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f6cb7903d88..a2c3ba8e12e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19939,15 +19939,6 @@ aarch64_parse_fmv_features (string_slice str, aarch64_feature_flags *isa_flags,
 static bool
 aarch64_process_target_version_attr (tree args)
 {
-  static bool issued_warning = false;
-  if (!issued_warning)
-{
-  warning (OPT_Wexperimental_fmv_target,
-	   "Function Multi Versioning support is experimental, and the "
-	   "behavior is likely to change");
-  issued_warning = true;
-}
-
   if (TREE_CODE (args) == TREE_LIST)
 {
   if (TREE_CHAIN (args))
diff --git a/gcc/testsuite/g++.target/aarch64/mv-1.C b/gcc/testsuite/g++.target/aarch64/mv-1.C
index 93b8a136587..4f815e18683 

Re: [PATCH v1 08/16] Add get_clone_versions function.

2025-02-03 Thread Richard Sandiford
Alfie Richards  writes:
> This is a reimplementation of get_target_clone_attr_len,
> get_attr_str, and separate_attrs using string_slice and auto_vec to make
> memory management and use simpler.
>
> gcc/c-family/ChangeLog:
>
>   * c-attribs.cc (handle_target_clones_attribute): Change to use
>   get_clone_versions.
>
> gcc/ChangeLog:
>
>   * tree.cc (get_clone_versions): New function.
>   (get_clone_attr_versions): New function.
>   * tree.h (get_clone_versions): New function.
>   (get_clone_attr_versions): New function.
> ---
>  gcc/c-family/c-attribs.cc |  2 +-
>  gcc/tree.cc   | 40 +++
>  gcc/tree.h|  5 +
>  3 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index f3181e7b57c..642d724f6c6 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -6129,7 +6129,7 @@ handle_target_clones_attribute (tree *node, tree name, 
> tree ARG_UNUSED (args),
>   }
>   }
>  
> -  if (get_target_clone_attr_len (args) == -1)
> +  if (get_clone_attr_versions (args).length () == 1)
>   {
> warning (OPT_Wattributes,
>  "single %<target_clones%> attribute is ignored");
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 05f679edc09..346522d01c0 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -15299,6 +15299,46 @@ get_target_clone_attr_len (tree arglist)
>return str_len_sum;
>  }
>  
> +/* Returns an auto_vec of string_slices containing the version strings from
> +   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
> +
> +auto_vec<string_slice>
> +get_clone_attr_versions (const tree arglist, int *default_count)
> +{
> +  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
> +  auto_vec<string_slice> versions;
> +
> +  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
> +  string_slice separators = string_slice (separator_str);
> +
> +  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
> +{
> +  string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE 
> (arg)));
> +  for (string_slice attr = string_slice::strtok (&str, separators);
> +attr.is_valid (); attr = string_slice::strtok (&str, separators))
> + {
> +   attr = attr.strip ();
> +   if (attr == string_slice ("default") && default_count)

Do we need the explicit constructor here?  It would be nice if
attr == "default" worked.

> + (*default_count)++;
> +   versions.safe_push (attr);
> + }
> +}
> +  return versions;
> +}
> +
> +/* Returns an auto_vec of string_slices containing the version strings from
> +   the target_clone attribute from DECL.  DEFAULT_COUNT is incremented for 
> each
> +   default version found.  */
> +auto_vec<string_slice>
> +get_clone_versions (const tree decl, int *default_count)
> +{
> +  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
> +  if (!attr)
> +    return auto_vec<string_slice> ();
> +  tree arglist = TREE_VALUE (attr);
> +  return get_clone_attr_versions (arglist, default_count);
> +}
> +
>  void
>  tree_cc_finalize (void)
>  {
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 21f3cd5525c..aea1cf078a0 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #include "tree-core.h"
>  #include "options.h"
> +#include "vec.h"
>  
>  /* Convert a target-independent built-in function code to a combined_fn.  */
>  
> @@ -7035,5 +7036,9 @@ extern unsigned fndecl_dealloc_argno (tree);
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>  
>  extern int get_target_clone_attr_len (tree);
> +auto_vec<string_slice>
> +get_clone_versions (const tree, int * = NULL);
> +auto_vec<string_slice>
> +get_clone_attr_versions (const tree, int * = NULL);

Formatting nit, but: it's more usual to put declarations on a single line,
if they'd fit.

Otherwise it looks good, given that patch 13 removes the old functions.

Thanks,
Richard

>  
>  #endif  /* GCC_TREE_H  */
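
On the question of whether attr == "default" could work without the explicit
constructor: one possible approach is an operator== overload that takes a
C string.  The class below is a hypothetical, much simplified stand-in for
string_slice, not the class from the series; it only shows that the overload
is enough to keep the constructor explicit while allowing the shorter
comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for string_slice: a pointer plus a length.
class slice
{
public:
  slice (const char *str, std::size_t len) : m_str (str), m_len (len) {}

  // Kept explicit, so a plain string never converts silently.
  explicit slice (const char *str) : m_str (str), m_len (0)
  {
    assert (str != nullptr);
    m_len = std::strlen (str);
  }

  const char *begin () const { return m_str; }
  std::size_t size () const { return m_len; }

private:
  const char *m_str;
  std::size_t m_len;
};

// Two slices are equal if they have the same length and the same bytes.
inline bool
operator== (const slice &a, const slice &b)
{
  return a.size () == b.size ()
         && std::memcmp (a.begin (), b.begin (), a.size ()) == 0;
}

// Overload that lets callers write `attr == "default"` directly.
inline bool
operator== (const slice &a, const char *b)
{
  return a == slice (b);
}
```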


Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-03 Thread Jan Hubicka
> On Mon, Feb 3, 2025 at 5:21 PM Richard Biener
>  wrote:
> >
> > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu  wrote:
> > >
> > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
> > >  wrote:
> > > >
> > > >
> > > >
> > > > > On 02.02.2025 at 08:59, H.J. Lu wrote:
> > > > >
> > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > > > >  wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > > >>> On 02.02.2025 at 08:00, H.J. Lu wrote:
> > > > >>>
> > > > > >>> Cap the callee-saved register cost scale to 300 instead of
> > > > > >>> increasing the cost by 1000x, which prevents callee-saved registers
> > > > > >>> from being used to preserve local variable values across calls.
> > > > >>
> > > > >>>   PR rtl-optimization/111673
> > > > >>>   PR rtl-optimization/115932
> > > > >>>   PR rtl-optimization/116028
> > > > >>>   PR rtl-optimization/117081
> > > > >>>   PR rtl-optimization/118497
> > > > >>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > > > >>>   scale to 300.
> > > > >>>
> > > > >>> Signed-off-by: H.J. Lu 
> > > > >>> ---
> > > > >>> gcc/ira-color.cc | 16 ++--
> > > > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > > > >>>
> > > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > > > >>> index 0699b349a1a..707ff188250 100644
> > > > >>> --- a/gcc/ira-color.cc
> > > > >>> +++ b/gcc/ira-color.cc
> > > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool 
> > > > >>> retry_p)
> > > > >>> /* We need to save/restore the hard register in
> > > > >>>epilogue/prologue.  Therefore we increase the cost.  */
> > > > >>> {
> > > > >>> +int scale;
> > > > >>> +if (optimize_size)
> > > > >>> +  scale = 1;
> > > > >>> +else
> > > > >>> +  {
> > > > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > >>> +/* Don't increase callee-saved register cost by 1000x,
> > > > >>> +   which leads to that callee-saved registers aren't
> > > > >>> +   used to preserve local variable values across calls,
> > > > >>> +   by capping the scale to 300.  */
> > > > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > > > >>> +  scale = 300;
> > > > >>
> > > > >> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have 
> > > > >> expected to scale this down to [0, 300] or is MAX a magic value?
> > > > >
> > > > > There are
> > > > >
> > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > > > >   This constant does not need to be high, as in infrequently executed
> > > > >   regions we want to count instructions equivalently to optimize for
> > > > >   size instead of speed.  */
> > > > > #define REG_FREQ_MAX 1000
> > > > >
> > > > > > /* Compute register frequency from the BB frequency.  When optimizing
> > > > > >   for size, or profile driven feedback is available and the function is
> > > > > >   never executed, frequency is always equivalent.  Otherwise rescale the
> > > > > >   basic block frequency.  */
> > > > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)         \
> > > > > >                                || !cfun->cfg->count_max.initialized_p ())  \
> > > > > >                               ? REG_FREQ_MAX                               \
> > > > > >                               : ((bb)->count.to_frequency (cfun)           \
> > > > > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)             \
> > > > > >                               ? ((bb)->count.to_frequency (cfun)           \
> > > > > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)             \
> > > > > >                               : 1)
> > > > >
> > > > > 1000 is the default.  If it isn't 1000, it isn't the default.  I only 
> > > > > want
> > > > > to get a more reasonable default scale, instead of 1000.   Lower
> > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > > >
> > > > I see.  Why not adjust the above macro then?  That would be a bit more 
> > > > obvious.  Like use MAX/2 or so?
> > >
> > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > > Author: Surya Kumari Jangala 
> > > Date:   Tue Jun 25 08:37:49 2024 -0500
> > >
> > > ira: Scale save/restore costs of callee save registers with block 
> > > frequency
> > >
> > > uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
> > > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > > not as a cost scale.  Maybe the above commit should be reverted and we add
> > > a target hook for callee-saved register cost scale.  Each target can choose
> > > a proper cost scale, instead of increasing the cost by 1000x for everyone.
> >
> > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at 
> > least,
> > as it doesn't seem to be used.  The co

Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-03 Thread Jan Hubicka
> > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)         \
> > > >                                || !cfun->cfg->count_max.initialized_p ())  \
> > > >                               ? REG_FREQ_MAX                               \
> > > >                               : ((bb)->count.to_frequency (cfun)           \
> > > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)             \
> > > >                               ? ((bb)->count.to_frequency (cfun)           \
> > > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)             \
> > > >                               : 1)
> > > >
> > > > 1000 is the default.  If it isn't 1000, it isn't the default.  I only 
> > > > want
> > > > to get a more reasonable default scale, instead of 1000.   Lower
> > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > >
> > > I see.  Why not adjust the above macro then?  That would be a bit more 
> > > obvious.  Like use MAX/2 or so?
> >
> > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > Author: Surya Kumari Jangala 
> > Date:   Tue Jun 25 08:37:49 2024 -0500
> >
> > ira: Scale save/restore costs of callee save registers with block 
> > frequency
> >
> > uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
> > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > not as a cost scale.  Maybe the above commit should be reverted and we add
> > a target hook for callee-saved register cost scale.  Each target can choose
> > a proper cost scale, instead of increasing the cost by 1000x for everyone.
> 
> I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
> as it doesn't seem to be used.  The comment talks about profile feedback,

It is used by count.to_frequency, which basically computes
count/max_count * REG_FREQ_MAX.
It aborts if max_count is uninitialized rather than returning an arbitrary
value...

Honza
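
For readers following the thread, the arithmetic under discussion can be
sketched as plain functions.  This is a simplified model, not GCC code: the
function names are hypothetical, REG_FREQ_MAX is the value quoted above, the
BB_FREQ_MAX value is an assumption of this sketch, and the proportional
variant is just one reading of the suggestion to scale into [0, 300].

```cpp
#include <cassert>

constexpr int REG_FREQ_MAX = 1000;   // value quoted in the thread
constexpr int BB_FREQ_MAX = 10000;   // assumed for this sketch

// Shape of REG_FREQ_FROM_BB: when optimizing for size, or when there is no
// usable profile, return the maximum; otherwise rescale the block frequency
// from [0, BB_FREQ_MAX] to [0, REG_FREQ_MAX] and never return 0.
int
reg_freq_from_bb_freq (int bb_freq, bool size_or_no_profile)
{
  assert (bb_freq >= 0 && bb_freq <= BB_FREQ_MAX);
  if (size_or_no_profile)
    return REG_FREQ_MAX;
  int scaled = bb_freq * REG_FREQ_MAX / BB_FREQ_MAX;
  return scaled ? scaled : 1;
}

// The cap as posted: only the exact maximum is replaced by 300.
int
capped_scale_as_posted (int scale)
{
  return scale == REG_FREQ_MAX ? 300 : scale;
}

// A proportional alternative: map [0, REG_FREQ_MAX] linearly onto [0, 300].
int
capped_scale_proportional (int scale)
{
  return scale * 300 / REG_FREQ_MAX;
}
```

With these definitions the posted cap maps 1000 to 300 but leaves 999
unchanged, which is the discontinuity pointed out in review; the
proportional variant maps them to 300 and 299.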


[PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-03 Thread Tamar Christina
Hi All,

This fixes two PRs on early break vectorization by delaying the safety checks to
vectorizable_load, when the VF, VMAT and vectype are all known.

This patch does add two new restrictions:

1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
   group sizes, as they are unaligned every n % 2 iterations and so may cross
   a page unwittingly.

2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
   we cannot peel for alignment, as the alignment requirement is quite large at
   GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
   don't support it for now.
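
The page-crossing concern behind both restrictions can be illustrated with a
small sketch.  This is not the check the vectorizer performs; the function
names are hypothetical, and the 4096-byte page in the usage note is only an
example value for min-pagesize.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [addr, addr + size) touches more than one page.
// SIZE must be nonzero and PAGE_SIZE a power of two.
bool
crosses_page (std::uint64_t addr, std::uint64_t size, std::uint64_t page_size)
{
  assert (size != 0 && page_size != 0 && (page_size & (page_size - 1)) == 0);
  return (addr / page_size) != ((addr + size - 1) / page_size);
}

// True if aligning an access of SIZE bytes to SIZE guarantees that it never
// crosses a page: SIZE must be a power of two no larger than the page.
bool
alignment_prevents_page_crossing (std::uint64_t size, std::uint64_t page_size)
{
  bool pow2 = size != 0 && (size & (size - 1)) == 0;
  return pow2 && size <= page_size;
}
```

For example, a group of three 16-byte vectors reads 48 bytes per iteration;
starting from a page-aligned buffer, the access at offset 4080 spans two
4096-byte pages, whereas a 32-byte access aligned to 32 bytes never does.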

There are other steps documented inside the code itself so that the reasoning
is next to the code.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
(-m32, -m64) with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to
vectorizable_load.
(vect_enhance_data_refs_alignment): Add note to comment needing
investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_peeling_alignment): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.

---
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
e54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in 
a basic block.
 Work bound when discovering transitive relations from existing relations.
 
 @item min-pagesize
-Minimum page size for warning purposes.
+Minimum page size for warning and early break vectorization purposes.
 
 @item openacc-kernels
 Specify mode of OpenACC `kernels' constructs handling.
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc 
b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
new file mode 100644
index 
..4b859488d533bf3ba5d0e0bcf8779d9b024b2596
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+typedef decltype(sizeof(0)) size_t;
+struct ts1 {
+  int spans[6][2];
+};
+struct gg {
+  int t[6];
+};
+ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
+  ts1 ret;
+  for (size_t i = 0; i != t; i++) {
+if (!(i < t)) __builtin_abort();
+ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
+ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
+  }
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 
9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554
 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -55,7 +55,9 @@ int main()
  }
 }
   rephase ();
+#pragma GCC novector
   for (i = 0; i < 32; ++i)
+#pragma GCC novector
 for (j = 0; j < 3; ++j)
 #pragma GCC novector
   for (k = 0; k < 3; ++k)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
index 
423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b5b252b716a8ba93a

RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog guard [PR117790]

2025-02-03 Thread Tamar Christina
Ping

> -Original Message-
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:18 AM
> To: Alex Coplan ; gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Jan Hubicka 
> Subject: RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog
> guard [PR117790]
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:08 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka 
> > Subject: RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog
> > guard [PR117790]
> >
> > Ping
> >
> > > -Original Message-
> > > From: Alex Coplan 
> > > Sent: Monday, January 6, 2025 11:35 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ;
> Tamar
> > > Christina 
> > > Subject: [PATCH 3/4] vect: Ensure profile consistency when adding epilog 
> > > guard
> > > [PR117790]
> > >
> > > This patch tries to make the CFG profile consistent when adding a guard
> > > edge to skip the epilog during peeling.
> > >
> > > The changes can be summarized as follows:
> > >  - We avoid adding the guard edge entirely if the guard condition folds
> > >to false, otherwise the profile will become inconsistent since
> > >the cfgcleanup code doesn't attempt to update it on removing the dead
> > >edge.
> > >  - If the guard condition instead folds to true, we account for this by
> > >giving the skip edge 100% probability (otherwise the profile will
> > >again become inconsistent when removing the other now-dead edge).
> > >  - Finally, we use the new helper scale_loop_freqs_with_new_exit_count
> instead
> > > >of scale_loop_profile to update the epilog frequencies / probabilities.
> > >We make the assumption here that if the IV exit is taken in the vector 
> > > loop,
> > >then it will also be taken in the epilog (and not an early exit).  
> > > Since we
> > >add the guard to the vector iv exit, we know any reduction in count
> > >associated with the epilog skip should be accounted for by a reduction 
> > > in the
> > >epilog's iv exit edge count.
> > >
> > > Bootstrapped/regtested as a series on aarch64-linux-gnu, 
> > > arm-linux-gnueabihf,
> > > and x86_64-linux-gnu.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/117790
> > >   * tree-vect-loop-manip.cc (vect_do_peeling): Attempt to maintain
> > >   consistency of the CFG profile when adding an epilog skip edge.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR tree-optimization/117790
> > >   * gcc.dg/vect/vect-early-break-profile-1.c: New test.
> > > ---
> > >  .../gcc.dg/vect/vect-early-break-profile-1.c  | 10 
> > >  gcc/tree-vect-loop-manip.cc   | 48 ++-
> > >  2 files changed, 47 insertions(+), 11 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c
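
The first two cases in the summary above amount to a three-way decision on
how the guard condition folds.  A toy model of that decision follows; the
enum and function names are hypothetical, and plain doubles stand in for
GCC's profile probabilities.

```cpp
#include <cassert>
#include <optional>

// How the epilog-skip guard condition folded at compile time.
enum class guard_fold { always_false, always_true, unknown };

// Returns the probability (0.0 to 1.0) to give the skip edge, or no value
// if no guard edge should be added at all.  ESTIMATE is the probability
// used when the condition does not fold.
std::optional<double>
skip_edge_probability (guard_fold fold, double estimate)
{
  assert (estimate >= 0.0 && estimate <= 1.0);
  switch (fold)
    {
    case guard_fold::always_false:
      return std::nullopt;	// epilog is never skipped: add no edge
    case guard_fold::always_true:
      return 1.0;		// epilog is always skipped: other edge is dead
    case guard_fold::unknown:
      return estimate;
    }
  return std::nullopt;
}
```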



RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]

2025-02-03 Thread Tamar Christina
Ping

> -Original Message-
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:17 AM
> To: Alex Coplan ; 'gcc-patches@gcc.gnu.org'
> Cc: 'Richard Biener' ; 'Jan Hubicka' 
> Subject: RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly
> [PR117790]
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:07 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka 
> > Subject: RE: [PATCH 1/4] vect: Set counts of early break exit blocks 
> > correctly
> > [PR117790]
> >
> > Ping
> >
> > > -Original Message-
> > > From: Alex Coplan 
> > > Sent: Monday, January 6, 2025 11:34 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ;
> Tamar
> > > Christina 
> > > Subject: [PATCH 1/4] vect: Set counts of early break exit blocks correctly
> > > [PR117790]
> > >
> > > This adds missing code to correctly set the counts of the exit blocks we
> > > create when building the CFG for a vectorized early break loop.
> > >
> > > Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and
> > > x86_64-linux-gnu.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/117790
> > >   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > >   Set profile counts for {main,alt}_loop_exit_block.
> > > ---
> > >  gcc/tree-vect-loop-manip.cc | 10 ++
> > >  1 file changed, 10 insertions(+)



RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-02-03 Thread Tamar Christina
Ping

> -Original Message-
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:18 AM
> To: Alex Coplan ; gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Jan Hubicka 
> Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of 
> multi-exit
> loops [PR117790]
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:08 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka 
> > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of 
> > multi-
> exit
> > loops [PR117790]
> >
> > Ping
> >
> > > -Original Message-
> > > From: Alex Coplan 
> > > Sent: Monday, January 6, 2025 11:35 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ;
> Tamar
> > > Christina 
> > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of 
> > > multi-exit
> > > loops [PR117790]
> > >
> > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > multiple exits.  In particular, in the case where the expected niters
> > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > number of iterations with a call to scale_loop_frequencies, which
> > > multiplies the count of each BB by a given probability.  This
> > > transformation preserves the relationships between the counts of the BBs
> > > within the loop (and thus the edge probabilities stay the same) but this
> > > cannot possibly work for loops with multiple exits, since in order for
> > > the expected niters to reduce (and counts along exit edges to remain the
> > > same), the exit edge probabilities must increase, thus decreasing the
> > > probabilities of the internal edges, meaning that the ratios of the
> > > counts of the BBs inside the loop must change.  So we need a different
> > > approach (not a straightforward multiplicative scaling) to adjust the
> > > expected niters of a loop with multiple exits.
> > >
> > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > used to correctly scale the profile of a loop with multiple exits.  It
> > > is parameterized by a probability (with which to scale the header and
> > > therefore the expected niters) and a lambda which gives the desired
> > > counts for the exit edges.  In this patch, to make things simpler,
> > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > whether a given loop meets these criteria.  This restriction is
> > > reasonable since this patch is motivated by fixing the profile
> > > consistency for early break vectorization, and we don't currently
> > > vectorize loops with internal control flow.  We also fall back to a
> > > multiplicative scaling (the status quo) for loops that
> > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > improvement.
> > >
> > > We wrap the call to flow_scale_loop_freqs in a helper
> > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > allow overriding the desired exit edge counts.  We specialize this with
> > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > counts along exit edges fixed), which is then used to implement the
> > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > the function for loops with multiple exits.
> > >
> > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > is a variant of scale_loop_profile which assumes we want to keep the
> > > counts along exit edges of the loop fixed through both parts of the
> > > transformation (including the initial probability scale).
> > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > subsequent patch when adding a skip edge around the epilog, where the
> > > reduction of count entering the loop is mirrored by a reduced count
> > > along a given exit edge.
> > >
> > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/117790
> > >   * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > >   (flow_scale_loop_freqs): New.
> > >   (scale_loop_freqs_with_exit_counts): New.
> > >   (scale_loop_freqs_hold_exit_counts): New.
> > >   (scale_loop_profile): Refactor to use the newly-added
> > >   scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > >   correctly handle reducing the expected niters for loops with multiple
> > >   exits.
> > >   (scale_loop_freqs_with_new_exit_count): New.
> > >   (scale_loop_profile_1): New.
> > >   (scale_loop_profile_hold_exit_counts): New.
> > >   * cfgloopmanip.h (scale_loop_profile
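
The argument in the patch description above (multiplicative scaling cannot
hold exit counts fixed) can be checked with a toy model.  This is only a
sketch under stated assumptions, not the patch's implementation: ChainLoop,
scale_multiplicative and scale_flow are hypothetical names, and the loop is
assumed to be a straight chain of blocks with at most one exit edge per
block (the "no internal control flow" shape the description mentions).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: blocks form a chain header -> b1 -> ... -> latch.
// exit_count[i] is the count leaving the loop after block i (0 if none).
struct ChainLoop {
  std::vector<double> bb_count;
  std::vector<double> exit_count;
};

// Multiplicative scaling: every block count is multiplied by p.  Edge
// probabilities stay the same, so the exit edge counts shrink by p too.
ChainLoop scale_multiplicative(const ChainLoop &l, double p) {
  ChainLoop r = l;
  for (double &c : r.bb_count) c *= p;
  for (double &c : r.exit_count) c *= p;
  return r;
}

// Flow-based scaling: scale only the header by p, keep the exit counts,
// and recompute each later block as "previous block minus what exited".
ChainLoop scale_flow(const ChainLoop &l, double p) {
  ChainLoop r = l;
  r.bb_count[0] = l.bb_count[0] * p;
  for (std::size_t i = 1; i < r.bb_count.size(); ++i)
    r.bb_count[i] = r.bb_count[i - 1] - r.exit_count[i - 1];
  return r;
}
```

With a header count of 1000, an early exit taken 10 times and a latch exit
taken 90 times (so 100 entries, 10 expected iterations), halving the header
multiplicatively leaves only 50 units leaving a loop that is still entered
100 times.  The flow-based variant keeps the 100 exits and instead changes
the ratio between the two block counts, which is the point made above.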

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2025, H.J. Lu wrote:

> Author: Surya Kumari Jangala 
> Date:   Tue Jun 25 08:37:49 2024 -0500
> 
> ira: Scale save/restore costs of callee save registers with block 
> frequency
> 
> scales the cost of saving/restoring a callee-save hard register in epilogue
> and prologue with the entry block frequency, which, if not optimizing for
> size, is 1, for all targets.

This merely represents the fact that the entry block is indeed entered 
exactly once per function invocation, i.e. 1.0 in fixed point with a scale 
of 1000.  All costs in ira are (supposed to be) scaled by bb-frequency of 
the allocno/register occurrence, and hence this add_cost to cater for 
xlogue-save/restore needs to be scaled by that as well, which is what 
Surya's patch was adding.

Any fallout from that needs to be addressed on top of that, not by 
reverting it, or by introducing a hook to avoid that.  Think of this scale 
as an arbitrary value to implement pseudo-fixed-point arithmetic for 
costs.  All values need to be scaled by it.  That its value is a seemingly 
large number of 1000 is not the worry; it represents 1.0.

If the issue is for instance that callee-saved registers aren't used 
because the prologue save/restore is now deemed too expensive relative to 
the around-call-save-restore when a call-clobbered register is used, then 
either the around-call-save-restore instructions aren't correctly costed 
(perhaps also missing the scale factor?), or ties aren't broken nicely, 
in which case adding a 1 at one or the other place might be needed.


Ciao,
Michael.
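
To make the fixed-point argument concrete, here is a minimal sketch.  All
names (kFreqScale, callee_saved_cost, around_call_cost,
prefer_callee_saved) are hypothetical and this is not IRA's code; it only
assumes, as stated above, that a frequency of 1.0 is represented as 1000
and that every cost is multiplied by the frequency of the block it occurs
in.

```cpp
// Fixed-point representation of a block frequency of 1.0 (assumption
// taken from the message above).
constexpr int kFreqScale = 1000;

// Callee-saved register: one save in the prologue and one restore in the
// epilogue, both executed at the entry block frequency.
constexpr int callee_saved_cost(int mem_cost, int entry_freq = kFreqScale) {
  return 2 * mem_cost * entry_freq;
}

// Call-clobbered register: one save and one restore around the call,
// executed at the frequency of the block containing the call.
constexpr int around_call_cost(int mem_cost, int call_bb_freq) {
  return 2 * mem_cost * call_bb_freq;
}

// The comparison is only meaningful when both sides carry the same scale.
constexpr bool prefer_callee_saved(int mem_cost, int call_bb_freq) {
  return callee_saved_cost(mem_cost) < around_call_cost(mem_cost, call_bb_freq);
}
```

In this model a call in a block of frequency 250 (0.25) favours the
around-call save, a call in a loop block of frequency 4000 (4.0) favours
the callee-saved register, and a frequency of exactly 1000 gives a tie,
which is the tie-breaking case mentioned above.  Leaving the scale off one
side would compare 2 against 500 and always pick that side.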


Re: [PATCH 0/61] Improve Mips target

2025-02-03 Thread Richard Biener
On Mon, Feb 3, 2025 at 11:34 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Fri, Jan 31, 2025 at 6:18 PM Aleksandar Rakic
> >  wrote:
> >>
> >> This patch series improves the support for the mips64r6 target in GCC,
> >> includes the enhancements to the general bug fixes and contains other
> >> MIPS ISA and processor enablement.
> >>
> >> These patches are cherry-picked from the mips_rel/11_2_0/master
> >> and mips_rel/9_3_0/master branches from the MIPS' repository:
> >> https://github.com/MIPS/gcc .
> >> Further details on the individual changes are included in the
> >> respective patches.
> >
> > Please split up this series at least into patches that solely affect mips/
> > and send patches that touch middle-end parts separately.  A 61 patches
> > series is unlikely to be looked at this way.
>
> Sorry to ask, but what about the copyright assignment/DCO side of things?
> Is it ok to assume that all these patches are covered by MTI's copyright
> assignment with the FSF, even though MTI didn't submit the patches
> themselves?  (Genuine question, not trying to imply a particular answer.)

It's a good question since one of the Signed-off e-mails bounces...

Richard.

> Thanks,
> Richard


RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple exits [PR117790]

2025-02-03 Thread Tamar Christina
Ping

> -Original Message-
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:18 AM
> To: Alex Coplan ; gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Jan Hubicka 
> Subject: RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple 
> exits
> [PR117790]
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:08 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka 
> > Subject: RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple 
> > exits
> > [PR117790]
> >
> > Ping
> >
> > > -Original Message-
> > > From: Alex Coplan 
> > > Sent: Monday, January 6, 2025 11:36 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ;
> Tamar
> > > Christina 
> > > Subject: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple 
> > > exits
> > > [PR117790]
> > >
> > > This adjusts scale_profile_for_vect_loop to DTRT for loops with multiple 
> > > exits,
> > > namely using scale_loop_profile_hold_exit_counts instead and scaling the
> > > expected niters by 1 / VF.
> > >
> > > Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and
> > > x86_64-linux-gnu.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/117790
> > >   * tree-vect-loop.cc (scale_profile_for_vect_loop): Use
> > >   scale_loop_profile_hold_exit_counts instead of scale_loop_profile.  Drop
> > >   the exit edge parameter, since the code now handles multiple exits.
> > >   Adjust the caller ...
> > >   (vect_transform_loop): ... here.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR tree-optimization/117790
> > >   * gcc.dg/vect/vect-early-break-profile-2.c: New test.
> > > ---
> > >  .../gcc.dg/vect/vect-early-break-profile-2.c  | 21 +++
> > >  gcc/tree-vect-loop.cc | 21 ++-
> > >  2 files changed, 27 insertions(+), 15 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c



Re: [PATCH] c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]

2025-02-03 Thread Marek Polacek
On Sun, Feb 02, 2025 at 11:14:55AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, this pedwarn is desirable for the implicit or
> explicit capturing of structured bindings in C++17, but in the case of
> init-captures the initializer is just some expression and that can include
> structured bindings.
> 
> So, the following patch limits the warning to non-explicit_init_p.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM, sorry for missing the !explicit_init_p check.
 
> 2025-02-02  Jakub Jelinek  
> 
>   PR c++/118719
>   * lambda.cc (add_capture): Only pedwarn about capturing structured
>   binding if !explicit_init_p.
> 
>   * g++.dg/cpp1z/decomp63.C: New test.
> 
> --- gcc/cp/lambda.cc.jj   2025-01-24 17:37:49.004457905 +0100
> +++ gcc/cp/lambda.cc  2025-01-31 23:47:08.907034696 +0100
> @@ -613,7 +613,7 @@ add_capture (tree lambda, tree id, tree
>   return error_mark_node;
>   }
>  
> -  if (cxx_dialect < cxx20)
> +  if (cxx_dialect < cxx20 && !explicit_init_p)
>   {
> auto_diagnostic_group d;
> tree stripped_init = tree_strip_any_location_wrapper (initializer);
> --- gcc/testsuite/g++.dg/cpp1z/decomp63.C.jj  2025-01-31 23:54:15.480699418 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1z/decomp63.C 2025-01-31 23:53:02.998578507 
> +0100
> @@ -0,0 +1,18 @@
> +// PR c++/118719
> +// { dg-do compile { target c++11 } }
> +// { dg-options "" }
> +
> +int
> +main ()
> +{
> +  int a[] = { 42 };
> +  auto [x] = a;  // { dg-warning 
> "structured bindings only available with" "" { target c++14_down } }
> + // { dg-message "declared here" 
> "" { target c++17_down } .-1 }
> +  [=] () { int b = x; (void) b; };   // { dg-warning "captured 
> structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
> +  [&] () { int b = x; (void) b; };   // { dg-warning "captured 
> structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
> +  [x] () { int b = x; (void) b; };   // { dg-warning "captured 
> structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
> +  [&x] () { int b = x; (void) b; };  // { dg-warning "captured 
> structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
> +  [x = x] () { int b = x; (void) b; };   // { dg-warning "lambda 
> capture initializers only available with" "" { target c++11_only } }
> +  [y = x] () { int b = y; (void) b; };   // { dg-warning "lambda 
> capture initializers only available with" "" { target c++11_only } }
> +  [y = x * 2] () { int b = y; (void) b; };   // { dg-warning "lambda capture 
> initializers only available with" "" { target c++11_only } }
> +}
> 
>   Jakub
> 

Marek
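
For readers skimming the thread, the distinction the patch draws can be
shown in a few lines.  This is an illustrative sketch, not part of the
testcase; the function name init_capture_of_binding is made up, and the
code assumes a C++17 or later compiler.

```cpp
// Returns 42 through an init-capture whose initializer merely names a
// structured binding.
inline int init_capture_of_binding() {
  int a[] = { 42 };
  auto [x] = a;  // C++17 structured binding

  // Writing [x], [&x], [=] or [&] here would capture the binding itself,
  // which is the case the pedwarn is meant for before C++20.

  // Init-capture: x only appears inside an ordinary initializer
  // expression, so y is a new variable and no binding is captured.
  auto f = [y = x] { return y; };
  return f();
}
```

The last three lambdas in decomp63.C above are of this init-capture form,
and their expected diagnostics mention only lambda capture initializers.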



[RFC][PATCH v1 00/16] FMV refactor and ACLE compliance.

2025-02-03 Thread Alfie Richards
Hello,

This patch series intends to change the behavior of targets with
TARGET_HAS_FMV_TARGET_ATTRIBUTE set to false (i.e. targets that use
target_version attributes for FMV as opposed to target attributes)
to follow the behavior specified in the Arm C Language Extensions.

There is significant refactoring to FMV in the process.

Notable changes include:

* Introduction of the string_slice class.
* Refactoring FMV mangling to always use the existing hook.
  * Changing the x86 mangling of dispatched symbols.
  * Adding new members to cgraph_function_version_info and cgraph_node.
  * Specifically, adding cgraph logic earlier in the C and C++ front ends than
it was previously.
* Changing the cgraph_function_version_info to be implicitly ordered.
* Changing resolver creation for target_version to reuse the target_clones
  logic.
  * Only creating the resolver (in target_version semantics) when the default
version is implemented.
* Changing C++ symbol resolution for target_version semantics to only resolve
  default versions.
  * i.e. changing the scope and signature of the FMV function set to be
determined by default versions.

I would appreciate overall feedback on these changes, and specific thoughts on
the behavioral changes it makes to riscv FMV (as it also uses target_version
semantics) and to the mangling change for x86 (see test changes for both) from
relevant maintainers.

These changes are targeting GCC 16 stage 1.

Regression tested and bootstrapped for aarch64-none-linux-gnu and
x86_64-unknown-linux-gnu.
Cross compiled and ran the FMV tests for riscv and powerpc.

Kind regards,
Alfie Richards

Alfie Richards (16):
  Add PowerPC FMV symbol tests.
  Add x86 FMV symbol tests
  Add string_slice class.
  Remove unnecessary `record` argument from maybe_version_functions.
  Update is_function_default_version to work with target_version.
  Change function versions to be implicitly ordered.
  Add version of make_attribute supporting string_slice.
  Add get_clone_versions function.
  Add assembler_name to cgraph_function_version_info.
  Add dispatcher_resolver_function and is_target_clone to cgraph_node.
  Add clone_identifier function.
  Refactor FMV name mangling.
  Remove unused target_clone parsing code.
  Change target_version semantics to follow ACLE specification.
  Support mixing of target_clones and target_version for aarch64.
  Remove FMV beta warning.

 gcc/attribs.cc|  79 --
 gcc/attribs.h |   1 +
 gcc/c-family/c-attribs.cc |   4 +-
 gcc/c/c-decl.cc   |  20 ++
 gcc/cgraph.cc |  49 +++-
 gcc/cgraph.h  |  31 ++-
 gcc/cgraphclones.cc   |  16 +-
 gcc/cgraphunit.cc |   9 +
 gcc/config/aarch64/aarch64.cc | 233 --
 gcc/config/i386/i386-features.cc  | 123 +
 gcc/config/riscv/riscv.cc | 136 --
 gcc/config/rs6000/rs6000.cc   | 139 ---
 gcc/cp/call.cc|   8 +
 gcc/cp/class.cc   |  13 +-
 gcc/cp/cp-gimplify.cc |   6 +-
 gcc/cp/cp-tree.h  |   2 +-
 gcc/cp/decl.cc|  80 +-
 gcc/cp/typeck.cc  |   8 +
 gcc/ipa.cc|  11 +
 gcc/multiple_target.cc| 220 ++---
 gcc/testsuite/g++.target/aarch64/mv-1.C   |   6 +-
 .../g++.target/aarch64/mv-and-mvc1.C  |  38 +++
 .../g++.target/aarch64/mv-and-mvc2.C  |  29 +++
 .../g++.target/aarch64/mv-and-mvc3.C  |  41 +++
 .../g++.target/aarch64/mv-and-mvc4.C  |  38 +++
 gcc/testsuite/g++.target/aarch64/mv-error1.C  |  13 +
 gcc/testsuite/g++.target/aarch64/mv-error13.C |  13 +
 gcc/testsuite/g++.target/aarch64/mv-error2.C  |  10 +
 gcc/testsuite/g++.target/aarch64/mv-error3.C  |  13 +
 gcc/testsuite/g++.target/aarch64/mv-error7.C  |   9 +
 gcc/testsuite/g++.target/aarch64/mv-error8.C  |  21 ++
 gcc/testsuite/g++.target/aarch64/mv-error9.C  |  12 +
 gcc/testsuite/g++.target/aarch64/mv-pragma.C  |   2 +-
 .../g++.target/aarch64/mv-symbols1.C  |   2 +-
 .../g++.target/aarch64/mv-symbols10.C |  43 
 .../g++.target/aarch64/mv-symbols11.C |  27 ++
 .../g++.target/aarch64/mv-symbols12.C |  18 ++
 .../g++.target/aarch64/mv-symbols14.C |  16 ++
 .../g++.target/aarch64/mv-symbols15.C |  16 ++
 .../g++.target/aarch64/mv-symbols2.C  |  14 +-
 .../g++.target/aarch64/mv-symbols3.C  |   8 +-
 .../g++.target/aarch64/mv-symbols4.C  |   8 +-
 .../g++.target/aarch64/mv-symbols5.C  |   8 +-
 .../g++.target/aarch64/mv-symbols6.C  |  23 ++
 .../g++.target/aarch64/mv-symbols8.C  |  48 
 .../g++.targe

[PATCH v1 04/16] Remove unnecessary `record` argument from maybe_version_functions.

2025-02-03 Thread Alfie Richards

The `record` argument in maybe_version_functions was intended to allow
control over whether the relationship between versions is recorded.
However, it only exercised this if both input functions were already
marked as versioned, and this same logic is repeated in
maybe_version_functions itself, so the argument is unnecessary.

gcc/cp/ChangeLog:

* class.cc (add_method): Remove argument.
* cp-tree.h (maybe_version_functions): Ditto.
* decl.cc (decls_match): Ditto.
(maybe_version_functions): Ditto.
---
 gcc/cp/class.cc  | 2 +-
 gcc/cp/cp-tree.h | 2 +-
 gcc/cp/decl.cc   | 9 +++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index f2f81a44718..a9a80d1b4be 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -1402,7 +1402,7 @@ add_method (tree type, tree method, bool via_using)
   /* If these are versions of the same function, process and
 	 move on.  */
   if (TREE_CODE (fn) == FUNCTION_DECL
-	  && maybe_version_functions (method, fn, true))
+	  && maybe_version_functions (method, fn))
 	continue;
 
   if (DECL_INHERITED_CTOR (method))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ec976928f5f..8eba8d455be 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7114,7 +7114,7 @@ extern void determine_local_discriminator	(tree, tree = NULL_TREE);
 extern bool member_like_constrained_friend_p	(tree);
 extern bool fns_correspond			(tree, tree);
 extern int decls_match(tree, tree, bool = true);
-extern bool maybe_version_functions		(tree, tree, bool);
+extern bool maybe_version_functions		(tree, tree);
 extern bool validate_constexpr_redeclaration	(tree, tree);
 extern bool merge_default_template_args		(tree, tree, bool);
 extern tree duplicate_decls			(tree, tree,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index cf5e055e146..3b3b4481964 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1215,9 +1215,7 @@ decls_match (tree newdecl, tree olddecl, bool record_versions /* = true */)
 	  && targetm.target_option.function_versions (newdecl, olddecl))
 	{
 	  if (record_versions)
-	maybe_version_functions (newdecl, olddecl,
- (!DECL_FUNCTION_VERSIONED (newdecl)
-  || !DECL_FUNCTION_VERSIONED (olddecl)));
+	maybe_version_functions (newdecl, olddecl);
 	  return 0;
 	}
 }
@@ -1288,7 +1286,7 @@ maybe_mark_function_versioned (tree decl)
If RECORD is set to true, record function versions.  */
 
 bool
-maybe_version_functions (tree newdecl, tree olddecl, bool record)
+maybe_version_functions (tree newdecl, tree olddecl)
 {
   if (!targetm.target_option.function_versions (newdecl, olddecl))
 return false;
@@ -1311,8 +1309,7 @@ maybe_version_functions (tree newdecl, tree olddecl, bool record)
   maybe_mark_function_versioned (newdecl);
 }
 
-  if (record)
-cgraph_node::record_function_versions (olddecl, newdecl);
+  cgraph_node::record_function_versions (olddecl, newdecl);
 
   return true;
 }


[PATCH v1 07/16] Add version of make_attribute supporting string_slice.

2025-02-03 Thread Alfie Richards

gcc/ChangeLog:

* attribs.cc (make_attribute): New function overload.
* attribs.h (make_attribute): New function overload.
---
 gcc/attribs.cc | 19 ++-
 gcc/attribs.h  |  1 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 5cf45491ada..cb25845715d 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1090,7 +1090,24 @@ make_attribute (const char *name, const char *arg_name, tree chain)
   return attr;
 }
 
-
+/* Makes a function attribute of the form NAME (ARG_NAME) and chains
+   it to CHAIN.  */
+
+tree
+make_attribute (string_slice name, string_slice arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier_with_length (name.begin (), name.size ());
+  attr_arg_name = build_string (arg_name.size (), arg_name.begin ());
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
 /* Common functions used for target clone support.  */
 
 /* Comparator function to be used in qsort routine to sort attribute
diff --git a/gcc/attribs.h b/gcc/attribs.h
index 4b946390f76..e7d592c5b41 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -46,6 +46,7 @@ extern tree get_attribute_name (const_tree);
 extern tree get_attribute_namespace (const_tree);
 extern void apply_tm_attr (tree, tree);
 extern tree make_attribute (const char *, const char *, tree);
+extern tree make_attribute (string_slice, string_slice, tree);
 extern bool attribute_ignored_p (tree);
 extern bool attribute_ignored_p (const attribute_spec *const);
 extern bool any_nonignored_attribute_p (tree);


[PATCH v1 02/16] Add x86 FMV symbol tests

2025-02-03 Thread Alfie Richards

This is for testing the x86 mangling of FMV versioned function
assembly names.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv-symbols1.C: New test.
* g++.target/i386/mv-symbols2.C: New test.
* g++.target/i386/mv-symbols3.C: New test.
* g++.target/i386/mv-symbols4.C: New test.
* g++.target/i386/mv-symbols5.C: New test.
* g++.target/i386/mvc-symbols1.C: New test.
* g++.target/i386/mvc-symbols2.C: New test.
* g++.target/i386/mvc-symbols3.C: New test.
* g++.target/i386/mvc-symbols4.C: New test.
---
 gcc/testsuite/g++.target/i386/mv-symbols1.C  | 68 
 gcc/testsuite/g++.target/i386/mv-symbols2.C  | 56 
 gcc/testsuite/g++.target/i386/mv-symbols3.C  | 44 +
 gcc/testsuite/g++.target/i386/mv-symbols4.C  | 50 ++
 gcc/testsuite/g++.target/i386/mv-symbols5.C  | 56 
 gcc/testsuite/g++.target/i386/mvc-symbols1.C | 44 +
 gcc/testsuite/g++.target/i386/mvc-symbols2.C | 29 +
 gcc/testsuite/g++.target/i386/mvc-symbols3.C | 35 ++
 gcc/testsuite/g++.target/i386/mvc-symbols4.C | 23 +++
 9 files changed, 405 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols1.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols2.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols3.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols4.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols5.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols1.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols2.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols3.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols4.C

diff --git a/gcc/testsuite/g++.target/i386/mv-symbols1.C b/gcc/testsuite/g++.target/i386/mv-symbols1.C
new file mode 100644
index 000..1290299aea5
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/mv-symbols1.C
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target("arch=slm")))
+int foo ()
+{
+  return 3;
+}
+
+__attribute__((target("sse4.2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target("sse4.2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target("arch=slm")))
+int foo (int)
+{
+  return 4;
+}
+
+__attribute__((target("default")))
+int foo (int)
+{
+  return 2;
+}
+
+int bar()
+{
+  return foo ();
+}
+
+int bar(int x)
+{
+  return foo (x);
+}
+
+/* When updating any of the symbol names in these tests, make sure to also
+   update any tests for their absence in mvc-symbolsN.C */
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.arch_slm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.sse4.2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tcall\t_Z7_Z3foovv\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 1 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.arch_slm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.sse4.2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tcall\t_Z7_Z3fooii\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3fooii, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3fooii,_Z3fooi\.resolver\n" 1 } } */
diff --git a/gcc/testsuite/g++.target/i386/mv-symbols2.C b/gcc/testsuite/g++.target/i386/mv-symbols2.C
new file mode 100644
index 000..8b75565d78d
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/mv-symbols2.C
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target("arch=slm")))
+int foo ()
+{
+  return 3;
+}
+
+__attribute__((target("sse4.2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target("sse4.2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target("arch=slm")))
+int foo (int)
+{
+  return 4;
+}
+
+__attribute__((target("default")))
+int foo (int)
+{
+  return 2;
+}
+
+/* When updating any of the symbol names in these tests, make sure to also
+   update any tests for their absence in mvc-symbolsN.C */
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.arch_slm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.sse4.2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 0 } } */
+/* { dg-final { s

[PATCH v1 11/16] Add clone_identifier function.

2025-02-03 Thread Alfie Richards

This is similar to clone_function_name and its siblings but takes an
identifier tree node rather than a function declaration.

This is to be used in conjunction with the identifier node stored in
cgraph_function_version_info::assembler_name to mangle FMV functions in
later patches.

gcc/ChangeLog:

* cgraph.h (clone_identifier): New function.
* cgraphclones.cc (clone_identifier): New function.
(clone_function_name): Refactored to use clone_identifier.
---
 gcc/cgraph.h|  1 +
 gcc/cgraphclones.cc | 16 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 9561bce2c33..a4eff14ddf6 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -2627,6 +2627,7 @@ tree clone_function_name (const char *name, const char *suffix,
 tree clone_function_name (tree decl, const char *suffix,
 			  unsigned long number);
 tree clone_function_name (tree decl, const char *suffix);
+tree clone_identifier (tree decl, const char *suffix);
 
 void tree_function_versioning (tree, tree, vec *,
 			   ipa_param_adjustments *,
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 5332a433317..6b650849a63 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -557,6 +557,14 @@ clone_function_name (tree decl, const char *suffix)
   /* For consistency this needs to behave the same way as
  ASM_FORMAT_PRIVATE_NAME does, but without the final number
  suffix.  */
+  return clone_identifier (identifier, suffix);
+}
+
+/* Return a new clone of ID ending with the string SUFFIX.  */
+
+tree
+clone_identifier (tree id, const char *suffix)
+{
   char *separator = XALLOCAVEC (char, 2);
   separator[0] = symbol_table::symbol_suffix_separator ();
   separator[1] = 0;
@@ -565,15 +573,11 @@ clone_function_name (tree decl, const char *suffix)
 #else
   const char *prefix = "";
 #endif
-  char *result = ACONCAT ((prefix,
-			   IDENTIFIER_POINTER (identifier),
-			   separator,
-			   suffix,
-			   (char*)0));
+  char *result = ACONCAT (
+(prefix, IDENTIFIER_POINTER (id), separator, suffix, (char *) 0));
   return get_identifier (result);
 }
 
-
 /* Create callgraph node clone with new declaration.  The actual body will be
copied later at compilation stage.  The name of the new clone will be
constructed from the name of the original node, SUFFIX and NUM_SUFFIX.
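
As a rough picture of what the new helper computes, here is a sketch that
uses std::string instead of GCC identifier trees.  The name
clone_identifier_sketch is hypothetical, the separator is assumed to be
'.' (the real code asks symbol_table::symbol_suffix_separator ()), and the
target-dependent prefix from the surrounding #if is left out.

```cpp
#include <string>

// Hypothetical stand-in for clone_identifier: join an existing assembler
// name and a suffix with the target's suffix separator.
std::string clone_identifier_sketch(const std::string &id,
                                    const std::string &suffix,
                                    char separator = '.') {
  return id + separator + suffix;
}
```

Under these assumptions, "_Z3foov" with suffix "resolver" gives
"_Z3foov.resolver", the shape of name the x86 symbol tests in this series
scan for.  Whether the resolver name is actually built through
clone_identifier is decided in later patches not quoted here.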


[PATCH v1 13/16] Remove unused target_clone parsing code.

2025-02-03 Thread Alfie Richards

This removes the target_clone parsing code that was replaced with
get_clone_versions.

gcc/ChangeLog:

* multiple_target.cc (get_attr_str): Removed.
(separate_attrs): Removed.
* tree.cc (get_target_clone_attr_len): Removed.
* tree.h (get_target_clone_attr_len): Removed.
---
 gcc/multiple_target.cc | 61 --
 gcc/tree.cc| 26 --
 gcc/tree.h |  1 -
 3 files changed, 88 deletions(-)

diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 6aeceadbfd1..4f748a81f9b 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -177,67 +177,6 @@ create_dispatcher_calls (struct cgraph_node *node)
 }
 }
 
-/* Create string with attributes separated by TARGET_CLONES_ATTR_SEPARATOR.
-   Return number of attributes.  */
-
-static int
-get_attr_str (tree arglist, char *attr_str)
-{
-  tree arg;
-  size_t str_len_sum = 0;
-  int argnum = 0;
-
-  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
-{
-  const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
-  size_t len = strlen (str);
-  for (const char *p = strchr (str, TARGET_CLONES_ATTR_SEPARATOR);
-	   p;
-	   p = strchr (p + 1, TARGET_CLONES_ATTR_SEPARATOR))
-	argnum++;
-  memcpy (attr_str + str_len_sum, str, len);
-  attr_str[str_len_sum + len]
-	= TREE_CHAIN (arg) ? TARGET_CLONES_ATTR_SEPARATOR : '\0';
-  str_len_sum += len + 1;
-  argnum++;
-}
-  return argnum;
-}
-
-/* Return number of attributes separated by TARGET_CLONES_ATTR_SEPARATOR
-   and put them into ARGS.
-   If there is no DEFAULT attribute return -1.
-   If there is an empty string in attribute return -2.
-   If there are multiple DEFAULT attributes return -3.
-   */
-
-static int
-separate_attrs (char *attr_str, char **attrs, int attrnum)
-{
-  int i = 0;
-  int default_count = 0;
-  static const char separator_str[] = { TARGET_CLONES_ATTR_SEPARATOR, 0 };
-
-  for (char *attr = strtok (attr_str, separator_str);
-   attr != NULL; attr = strtok (NULL, separator_str))
-{
-  if (strcmp (attr, "default") == 0)
-	{
-	  default_count++;
-	  continue;
-	}
-  attrs[i++] = attr;
-}
-  if (default_count == 0)
-return -1;
-  else if (default_count > 1)
-return -3;
-  else if (i + default_count < attrnum)
-return -2;
-
-  return i;
-}
-
 /*  Creates target clone of NODE.  */
 
 static cgraph_node *
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 346522d01c0..9856f190367 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -15273,32 +15273,6 @@ get_attr_nonstring_decl (tree expr, tree *ref)
   return NULL_TREE;
 }
 
-/* Return length of attribute names string,
-   if arglist chain > 1, -1 otherwise.  */
-
-int
-get_target_clone_attr_len (tree arglist)
-{
-  tree arg;
-  int str_len_sum = 0;
-  int argnum = 0;
-
-  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
-{
-  const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
-  size_t len = strlen (str);
-  str_len_sum += len + 1;
-  for (const char *p = strchr (str, TARGET_CLONES_ATTR_SEPARATOR);
-	   p;
-	   p = strchr (p + 1, TARGET_CLONES_ATTR_SEPARATOR))
-	argnum++;
-  argnum++;
-}
-  if (argnum <= 1)
-return -1;
-  return str_len_sum;
-}
-
 /* Returns an auto_vec of string_slices containing the version strings from
ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
 
diff --git a/gcc/tree.h b/gcc/tree.h
index aea1cf078a0..df64d9cc847 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -7035,7 +7035,6 @@ extern unsigned fndecl_dealloc_argno (tree);
object or pointer.  Otherwise return null.  */
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
-extern int get_target_clone_attr_len (tree);
 auto_vec<string_slice>
 get_clone_versions (const tree, int * = NULL);
 auto_vec


[PATCH] lto/113207 - fix free_lang_data_in_type

2025-02-03 Thread Richard Biener
When we process function types we strip volatile and const qualifiers
after building a simplified type variant (which preserves those).
The qualified type handling of both isn't really compatible, so avoid
bad interaction by swapping this, first dropping const/volatile
qualifiers and then building the simplified type thereof.

LTO bootstrapped on x86_64-unknown-linux-gnu (with extra checking
as indicated in the PR), testing in progress.

I'll push this unless you have any further comments and queue the
extra checking for stage1.

PR lto/113207
* ipa-free-lang-data.cc (free_lang_data_in_type): First drop
const/volatile qualifiers from function argument types,
then build a simplified type.

* gcc.dg/pr113207.c: New testcase.
---
 gcc/ipa-free-lang-data.cc   |  3 +--
 gcc/testsuite/gcc.dg/pr113207.c | 10 ++
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr113207.c

diff --git a/gcc/ipa-free-lang-data.cc b/gcc/ipa-free-lang-data.cc
index be96d2928d7..a865332ddf1 100644
--- a/gcc/ipa-free-lang-data.cc
+++ b/gcc/ipa-free-lang-data.cc
@@ -441,9 +441,7 @@ free_lang_data_in_type (tree type, class free_lang_data_d *fld)
 different front ends.  */
   for (tree p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p))
{
- TREE_VALUE (p) = fld_simplified_type (TREE_VALUE (p), fld);
  tree arg_type = TREE_VALUE (p);
-
  if (TYPE_READONLY (arg_type) || TYPE_VOLATILE (arg_type))
{
  int quals = TYPE_QUALS (arg_type)
@@ -453,6 +451,7 @@ free_lang_data_in_type (tree type, class free_lang_data_d *fld)
  if (!fld->pset.add (TREE_VALUE (p)))
free_lang_data_in_type (TREE_VALUE (p), fld);
}
+ TREE_VALUE (p) = fld_simplified_type (TREE_VALUE (p), fld);
  /* C++ FE uses TREE_PURPOSE to store initial values.  */
  TREE_PURPOSE (p) = NULL;
}
diff --git a/gcc/testsuite/gcc.dg/pr113207.c b/gcc/testsuite/gcc.dg/pr113207.c
new file mode 100644
index 000..81f53d8fcc2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113207.c
@@ -0,0 +1,10 @@
+/* { dg-compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -fchecking" }  */
+
+typedef struct cl_lispunion *cl_object;
+struct cl_lispunion {};
+cl_object cl_error() __attribute__((noreturn));
+volatile cl_object cl_coerce_value0;
+void cl_coerce() { cl_error(); }
+void L66safe_canonical_type(cl_object volatile);
-- 
2.43.0


Re: [PATCH v1 06/16] Change function versions to be implicitly ordered.

2025-02-03 Thread Richard Sandiford
Alfie Richards  writes:
> This changes function version structures to maintain the default version
> as the first declaration in the linked data structures by giving priority
> to the set containing the default when constructing the structure.
>
> This allows for removing logic for moving the default to the first
> position which was duplicated across target specific code and enables
> easier reasoning about function sets when checking for a default.
>
> gcc/ChangeLog:
>
>   * cgraph.cc (cgraph_node::record_function_versions): Update to
>   implicitly keep default first.
>   * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
>   Remove reordering.
>   * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
>   Remove reordering.
>   * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
>   Remove reordering.
>   * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher):
>   Remove reordering.

Thanks, this is a really nice clean-up.  I see that it's already the
documented expectation:

  /* Chains all the semantically identical function versions.  The
 first function in this chain is the version_info node of the
 default function.  */
  cgraph_function_version_info *prev;

So in a sense the patch isn't changing the structures.  It's simply making
the current expectation always hold, rather than hold after a certain point.

Some comments below.

> ---
>  gcc/cgraph.cc| 39 +++---
>  gcc/config/aarch64/aarch64.cc| 37 +++-
>  gcc/config/i386/i386-features.cc | 33 -
>  gcc/config/riscv/riscv.cc| 41 +++-
>  gcc/config/rs6000/rs6000.cc  | 35 +--
>  5 files changed, 58 insertions(+), 127 deletions(-)
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index d0b19ad850e..1ea38d16e56 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -236,37 +236,58 @@ cgraph_node::delete_function_version_by_decl (tree decl)
>  void
>  cgraph_node::record_function_versions (tree decl1, tree decl2)
>  {
> -  cgraph_node *decl1_node = cgraph_node::get_create (decl1);
> -  cgraph_node *decl2_node = cgraph_node::get_create (decl2);
> +  cgraph_node *decl1_node;
> +  cgraph_node *decl2_node;
>cgraph_function_version_info *decl1_v = NULL;
>cgraph_function_version_info *decl2_v = NULL;
>cgraph_function_version_info *before;
>cgraph_function_version_info *after;
> +  cgraph_function_version_info *temp_node;
> +
> +  decl1_node = cgraph_node::get_create (decl1);
> +  decl2_node = cgraph_node::get_create (decl2);
>  
>gcc_assert (decl1_node != NULL && decl2_node != NULL);
>decl1_v = decl1_node->function_version ();
>decl2_v = decl2_node->function_version ();
>  
> -  if (decl1_v != NULL && decl2_v != NULL)
> -return;
> -

Could you go into more detail about why this return needs to be removed?
It seems like the assumption was that, if the two decls were already
versioned, they were already versions of the same thing.  For example,
we wouldn't create a set of 4 versions and a set of 2 versions and
only then merge them into a single set of 6 versions.  Is that not
the case with the new scheme?

If we could keep the return, then we could add:

  if (is_function_default_version (decl2)
  || (!decl1_v && !is_function_default_version (decl1)))
{
  std::swap (decl1, decl2);
  std::swap (decl1_v, decl2_v);
}

after it and then proceed as before, on the basis that (a) decl1_v and
decl2_v are individually canonical and (b) after the swap, any default
must be decl1 or earlier in decl1_v.  That would avoid a bit of extra
pointer chasing.
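
The invariant under discussion (default version first in the chain) can be modelled on a toy doubly linked list.  This is only a minimal sketch, assuming a simplified `version_info` struct and an `is_default` flag that stand in for cgraph_function_version_info and is_function_default_version; the names `chain_head`, `chain_tail` and `record_versions` are hypothetical.  It follows the steps of the posted hunk (walk to both heads, return if they match, swap if needed, link tail to head), not the swap-before-walk variant suggested above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for cgraph_function_version_info.
struct version_info {
  bool is_default = false;        // models is_function_default_version (decl)
  version_info *prev = nullptr;
  version_info *next = nullptr;
};

// Walk back to the first node of the chain containing V.
static version_info *chain_head (version_info *v)
{
  while (v->prev != nullptr)
    v = v->prev;
  return v;
}

// Walk forward to the last node of the chain containing V.
static version_info *chain_tail (version_info *v)
{
  while (v->next != nullptr)
    v = v->next;
  return v;
}

// Join the chains containing A and B.  If the head of B's chain is the
// default version, B's chain goes first, so a default stays at the front.
static void record_versions (version_info *a, version_info *b)
{
  version_info *before = chain_head (a);
  version_info *after = chain_head (b);

  // Same head: the two are already recorded as versions of each other.
  if (before == after)
    return;

  if (after->is_default)
    std::swap (before, after);

  before = chain_tail (before);
  before->next = after;
  after->prev = before;
}
```

In this model, recording (x, y) and then (y, d) with d marked as default yields the chain d, x, y, whichever order the declarations were seen in.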

>if (decl1_v == NULL)
>  decl1_v = decl1_node->insert_new_function_version ();
>  
>if (decl2_v == NULL)
>  decl2_v = decl2_node->insert_new_function_version ();
>  
> -  /* Chain decl2_v and decl1_v.  All semantically identical versions
> - will be chained together.  */
> +  gcc_assert (decl1_v);
> +  gcc_assert (decl2_v);
>  
>before = decl1_v;
>after = decl2_v;
>  
> +  /* Go to first after node.  */
> +  while (after->prev != NULL)
> +after = after->prev;
> +
> +  /* Go to first before node.  */
> +  while (before->prev != NULL)
> +before = before->prev;
> +
> +  /* These are already recorded as versions.  */
> +  if (before == after)
> +return;
> +
> +  /* Possibly swap to make sure the default node stays at the front.  */
> +  if (is_function_default_version (after->this_node->decl))
> +{
> +  temp_node = after;
> +  after = before;
> +  before = temp_node;
> +}
> +
> +  /* Go to last node of before.  */
>while (before->next != NULL)
>  before = before->next;
>  
> -  while (after->prev != NULL)
> -after= after->prev;
> +  /* Chain decl2_v and decl1_v.  */
>  
>before->next = after;
>after->prev = before;
> [...]
> diff 

Re: [PATCH v1 07/16] Add version of make_attribute supporting string_slice.

2025-02-03 Thread Richard Sandiford
Alfie Richards  writes:
> gcc/ChangeLog:
>
>   * attribs.cc (make_attribute): New function overload.
>   * attribs.h (make_attribute): New function overload.
> ---
>  gcc/attribs.cc | 19 ++-
>  gcc/attribs.h  |  1 +
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 5cf45491ada..cb25845715d 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -1090,7 +1090,24 @@ make_attribute (const char *name, const char *arg_name, tree chain)
>return attr;
>  }
>  
> -
> +/* Makes a function attribute of the form NAME (ARG_NAME) and chains
> +   it to CHAIN.  */
> +
> +tree
> +make_attribute (string_slice name, string_slice arg_name, tree chain)
> +{
> +  tree attr_name;
> +  tree attr_arg_name;
> +  tree attr_args;
> +  tree attr;
> +
> +  attr_name = get_identifier_with_length (name.begin (), name.size ());
> +  attr_arg_name = build_string (arg_name.size (), arg_name.begin ());
> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
> +  attr = tree_cons (attr_name, attr_args, chain);
> +  return attr;
> +}
> +

It seems to be more usual in new code to prefer initialisation over
assignment where possible, so:

  tree attr_name = get_identifier_with_length (name.begin (), name.size ());
  tree attr_arg_name = build_string (arg_name.size (), arg_name.begin ());
  tree attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
  tree attr = tree_cons (attr_name, attr_args, chain);
  return attr;

OK for GCC 16 with that change, thanks.

Richard

>  /* Common functions used for target clone support.  */
>  
>  /* Comparator function to be used in qsort routine to sort attribute
> diff --git a/gcc/attribs.h b/gcc/attribs.h
> index 4b946390f76..e7d592c5b41 100644
> --- a/gcc/attribs.h
> +++ b/gcc/attribs.h
> @@ -46,6 +46,7 @@ extern tree get_attribute_name (const_tree);
>  extern tree get_attribute_namespace (const_tree);
>  extern void apply_tm_attr (tree, tree);
>  extern tree make_attribute (const char *, const char *, tree);
> +extern tree make_attribute (string_slice, string_slice, tree);
>  extern bool attribute_ignored_p (tree);
>  extern bool attribute_ignored_p (const attribute_spec *const);
>  extern bool any_nonignored_attribute_p (tree);


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Jan Hubicka
> > I don't think we should add a new target hook unless it's providing
> > genuinely new information about the target.  Hooking into the RA to
> > brute-force a particular heuristic makes it harder to improve the RA
> > in future.
> >
> > There are already hooks that provide the costs of the relevant operations,
> > so I think we should concentrate on using those to get good results for
> > both Power and x86.
> 
> It isn't just about Power and x86.
> 
> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> Author: Surya Kumari Jangala 
> Date:   Tue Jun 25 08:37:49 2024 -0500
> 
> ira: Scale save/restore costs of callee save registers with block frequency
> 
> caused regressions on many targets, including aarch64:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116028
> 
> I don't understand why frequency can be used to scale the
> cost.

Adding such an absolute hook does not seem to make sense to me, especially
if the problem happens on multiple targets.

Let me see if I understand what is going on.  Looking at one PR, the patch
broke this testcase:

void foo (void);
void bar (void);

int
test (int a)
{
  int r;

  if (r = -a)
foo ();
  else
bar ();

  return r;
}


Here the cost model should decide whether an extra spill in the
prologue/epilogue is cheaper than an extra spill around the calls to foo()
and bar().

REG_FREQ_FROM_BB should return values in the range 1...REG_FREQ_MAX, which
express the relative frequencies of BBs compressed to 0...REG_FREQ_MAX.

Avoiding 0 prevents the cost model from thinking that spilling is
completely free.

In the case above we should have entry_block_freq = REG_FREQ_MAX (so the
epilogue/prologue cost is scaled by 1000) and the frequencies of the foo/bar
blocks should be something like REG_FREQ_MAX * p and REG_FREQ_MAX * (1-p),
where p is the probability of the conditional.

So I guess IRA ends up comparing
  - spill_cost * REG_FREQ_MAX for using callee saved register
  - spill_cost * (p + 1-p) * REG_FREQ_MAX for using caller saved register.
This correctly represents that in both cases we will end up executing the
same amount of spill code.

If we want it to choose the variant with the fewer static spills, we could
add 1 to the final costs for every spill instruction, which will bias
the cost model that way...
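
The comparison above can be checked with a toy calculation.  This is only a sketch under stated assumptions: `kRegFreqMax` is set to 1000 to match the "scaled by 1000" remark, the probability is passed in units of 1/1000, and `static_bias` is a hypothetical per-spill-site penalty modelling the "add 1" idea (one site for the prologue/epilogue pair, two sites for the spills around foo() and bar()).  None of these names come from IRA itself.

```cpp
#include <cassert>

// Assumed scale, matching the "scaled by 1000" remark in the text.
const int kRegFreqMax = 1000;

struct spill_costs {
  int callee_saved;   // save/restore in prologue/epilogue
  int caller_saved;   // save/restore around each of the two calls
};

// spill_cost:  cost of one save/restore pair.
// p_permille:  probability of the foo() branch in units of 1/1000.
// static_bias: hypothetical penalty per static spill site (0 = plain model).
static spill_costs compare_costs (int spill_cost, int p_permille,
                                  int static_bias)
{
  int freq_foo = kRegFreqMax * p_permille / 1000;
  int freq_bar = kRegFreqMax - freq_foo;

  spill_costs c;
  // One spill site, executed once per function invocation.
  c.callee_saved = spill_cost * kRegFreqMax + static_bias;
  // Two spill sites, each executed with the frequency of its block.
  c.caller_saved = spill_cost * (freq_foo + freq_bar) + 2 * static_bias;
  return c;
}
```

With a bias of 0 the two costs come out equal for any probability, which is the tie described above; with a bias of 1 the callee-saved choice wins by exactly one unit because it has one static spill site fewer.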

What would make sense would be to express to IRA that moves used in
prologue/epilogue are slightly cheaper on x86 than regular moves, since we
can use a push/pop pair, which encodes shorter than the stack frame
adjustment + regular store/load used to spill around a call.

Honza
> 
> -- 
> H.J.


[PATCH v2] c++: Properly detect calls to digest_init in build_vec_init [PR114619]

2025-02-03 Thread Simon Martin
Hi Jason,

On 16 Jan 2025, at 23:28, Jason Merrill wrote:

> On 10/19/24 5:09 AM, Simon Martin wrote:
>> We currently ICE in checking mode with cxx_dialect < 17 on the following
>> valid code
>>
>> === cut here ===
>> struct X {
>>X(const X&) {}
>> };
>> extern X x;
>> void foo () {
>>new X[1]{x};
>> }
>> === cut here ===
>>
>> The problem is that cp_gimplify_expr gcc_checking_asserts that a
>> TARGET_EXPR is not TARGET_EXPR_ELIDING_P (or cannot be elided), while in
>> this case with cxx_dialect < 17, it is TARGET_EXPR_ELIDING_P but we have
>> not even tried to elide.
>>
>> This patch relaxes that gcc_checking_assert to not fail when using
>> cxx_dialect < 17 and -fno-elide-constructors (I considered being more
>> clever at setting TARGET_EXPR_ELIDING_P appropriately but it looks more
>> risky and not worth the extra complexity for a checking assert).
>
> The problem is that in that case we end up with two copy constructor 
> calls instead of one: one built in massage_init_elt, and the other in 
> expand_default_init.  The result of the first copy is marked 
> TARGET_EXPR_ELIDING_P, so when we try to pass it to the second copy we 
> hit the assert.  I think the assert is catching a real bug: even with 
> -fno-elide-constructors we should only copy once, not twice.
That’s right, thanks for pointing me in the right direction.

> This seems to be because 'digested' has the wrong value in 
> build_vec_init; we did just call digest_init in build_new_1, but 
> build_vec_init doesn't understand that.
The test to determine whether digest_init has been called is indeed 
incorrect, in that it will work if BASE is a reference to the array but 
not if it’s a pointer to its first element. The attached updated patch 
fixes this.

Successfully tested on x86_64-pc-linux-gnu. OK for trunk?

Simon
From 578ac1a022ff039cdca45cdfca31bdfe8b571b79 Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Mon, 3 Feb 2025 11:43:14 +0100
Subject: [PATCH] c++: Properly detect calls to digest_init in build_vec_init
 [PR114619]

We currently ICE in checking mode with cxx_dialect < 17 on the following
valid code

=== cut here ===
struct X {
  X(const X&) {}
};
extern X x;
void foo () {
  new X[1]{x};
}
=== cut here ===

We trip on a gcc_checking_assert in cp_gimplify_expr due to a
TARGET_EXPR that is not TARGET_EXPR_ELIDING_P. As pointed out by Jason,
the problem is that build_vec_init does not recognize that digest_init
has been called, and we end up calling the copy constructor twice.

This happens because the detection in build_vec_init assumes that BASE
is a reference to the array, while it's a pointer to its first element
here. This patch makes sure that the detection works in both cases.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/114619

gcc/cp/ChangeLog:

* init.cc (build_vec_init): Properly determine whether
digest_init has been called.

gcc/testsuite/ChangeLog:

* g++.dg/init/no-elide4.C: New test.

---
 gcc/cp/init.cc|  3 ++-
 gcc/testsuite/g++.dg/init/no-elide4.C | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/init/no-elide4.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 3ab7f96335c..613775c5a7c 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4786,7 +4786,8 @@ build_vec_init (tree base, tree maxindex, tree init,
   tree field, elt;
   /* If the constructor already has the array type, it's been through
 digest_init, so we shouldn't try to do anything more.  */
-  bool digested = same_type_p (atype, TREE_TYPE (init));
+  bool digested = (TREE_CODE (TREE_TYPE (init)) == ARRAY_TYPE
+  && same_type_p (type, TREE_TYPE (TREE_TYPE (init))));
   from_array = 0;
 
   if (length_check)
diff --git a/gcc/testsuite/g++.dg/init/no-elide4.C b/gcc/testsuite/g++.dg/init/no-elide4.C
new file mode 100644
index 000..9377d9f0161
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/no-elide4.C
@@ -0,0 +1,11 @@
+// PR c++/114619
+// { dg-do "compile" { target c++11 } }
+// { dg-options "-fno-elide-constructors" }
+
+struct X {
+  X(const X&) {}
+};
+extern X x;
+void foo () {
+  new X[1]{x};
+}
-- 
2.44.0



[PATCH v1 05/16] Update is_function_default_version to work with target_version.

2025-02-03 Thread Alfie Richards

Notably this respects target_version semantics where an unannotated
function can be the default version.
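
The two semantics can be contrasted in a small stand-alone model.  This is only a sketch, assuming a hypothetical `decl_model` struct whose fields stand in for DECL_FUNCTION_VERSIONED and for the string arguments of the "target" and "target_version" attributes, and a boolean parameter standing in for TARGET_HAS_FMV_TARGET_ATTRIBUTE; it is not GCC code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical model of a function declaration.
struct decl_model {
  bool function_versioned;      // stands in for DECL_FUNCTION_VERSIONED
  const char *target;           // argument of "target", or nullptr if absent
  const char *target_version;   // argument of "target_version", or nullptr
};

// Models the decision described above.  HAS_FMV_TARGET_ATTRIBUTE selects
// between "target" semantics and "target_version" semantics.
static bool is_default_version (const decl_model &d,
                                bool has_fmv_target_attribute)
{
  const char *attr;
  if (has_fmv_target_attribute)
    {
      // "target" semantics: only an explicitly versioned function counts.
      if (!d.function_versioned || d.target == nullptr)
        return false;
      attr = d.target;
    }
  else
    {
      // "target_version" semantics: an unannotated function is the default.
      if (d.target_version == nullptr)
        return true;
      attr = d.target_version;
    }
  return std::strcmp (attr, "default") == 0;
}
```

The point of the model is the asymmetry: the same unannotated declaration is a default under target_version semantics and not a default under target semantics.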

gcc/ChangeLog:

* attribs.cc (is_function_default_version): Add target_version logic.
---
 gcc/attribs.cc | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 56dd18c2fa8..5cf45491ada 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1279,18 +1279,30 @@ make_dispatcher_decl (const tree decl)
   return func_decl;
 }
 
-/* Returns true if DECL is multi-versioned using the target attribute, and this
-   is the default version.  This function can only be used for targets that do
-   not support the "target_version" attribute.  */
+/* Returns true if DECL is a multiversioned default.
+   With the target attribute semantics, returns true if the function is marked
+   as default with the target version.
+   With the target_version attribute semantics, returns true if the function
+   is either not annotated, or annotated as default.  */
 
 bool
 is_function_default_version (const tree decl)
 {
-  if (TREE_CODE (decl) != FUNCTION_DECL
-  || !DECL_FUNCTION_VERSIONED (decl))
-return false;
-  tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
-  gcc_assert (attr);
+  tree attr;
+  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+{
+  if (!DECL_FUNCTION_VERSIONED (decl))
+	return false;
+  attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  if (!attr)
+	return false;
+}
+  else
+{
+  attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
+  if (!attr)
+	return true;
+}
   attr = TREE_VALUE (TREE_VALUE (attr));
   return (TREE_CODE (attr) == STRING_CST
 	  && strcmp (TREE_STRING_POINTER (attr), "default") == 0);


[PATCH v1 06/16] Change function versions to be implicitly ordered.

2025-02-03 Thread Alfie Richards

This changes function version structures to maintain the default version
as the first declaration in the linked data structures by giving priority
to the set containing the default when constructing the structure.

This allows for removing logic for moving the default to the first
position which was duplicated across target specific code and enables
easier reasoning about function sets when checking for a default.

gcc/ChangeLog:

* cgraph.cc (cgraph_node::record_function_versions): Update to
implicitly keep default first.
* config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
Remove reordering.
* config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
Remove reordering.
* config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
Remove reordering.
* config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher):
Remove reordering.
---
 gcc/cgraph.cc| 39 +++---
 gcc/config/aarch64/aarch64.cc| 37 +++-
 gcc/config/i386/i386-features.cc | 33 -
 gcc/config/riscv/riscv.cc| 41 +++-
 gcc/config/rs6000/rs6000.cc  | 35 +--
 5 files changed, 58 insertions(+), 127 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index d0b19ad850e..1ea38d16e56 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -236,37 +236,58 @@ cgraph_node::delete_function_version_by_decl (tree decl)
 void
 cgraph_node::record_function_versions (tree decl1, tree decl2)
 {
-  cgraph_node *decl1_node = cgraph_node::get_create (decl1);
-  cgraph_node *decl2_node = cgraph_node::get_create (decl2);
+  cgraph_node *decl1_node;
+  cgraph_node *decl2_node;
   cgraph_function_version_info *decl1_v = NULL;
   cgraph_function_version_info *decl2_v = NULL;
   cgraph_function_version_info *before;
   cgraph_function_version_info *after;
+  cgraph_function_version_info *temp_node;
+
+  decl1_node = cgraph_node::get_create (decl1);
+  decl2_node = cgraph_node::get_create (decl2);
 
   gcc_assert (decl1_node != NULL && decl2_node != NULL);
   decl1_v = decl1_node->function_version ();
   decl2_v = decl2_node->function_version ();
 
-  if (decl1_v != NULL && decl2_v != NULL)
-return;
-
   if (decl1_v == NULL)
 decl1_v = decl1_node->insert_new_function_version ();
 
   if (decl2_v == NULL)
 decl2_v = decl2_node->insert_new_function_version ();
 
-  /* Chain decl2_v and decl1_v.  All semantically identical versions
- will be chained together.  */
+  gcc_assert (decl1_v);
+  gcc_assert (decl2_v);
 
   before = decl1_v;
   after = decl2_v;
 
+  /* Go to first after node.  */
+  while (after->prev != NULL)
+after = after->prev;
+
+  /* Go to first before node.  */
+  while (before->prev != NULL)
+before = before->prev;
+
+  /* These are already recorded as versions.  */
+  if (before == after)
+return;
+
+  /* Possibly swap to make sure the default node stays at the front.  */
+  if (is_function_default_version (after->this_node->decl))
+{
+  temp_node = after;
+  after = before;
+  before = temp_node;
+}
+
+  /* Go to last node of before.  */
   while (before->next != NULL)
 before = before->next;
 
-  while (after->prev != NULL)
-after= after->prev;
+  /* Chain decl2_v and decl1_v.  */
 
   before->next = after;
   after->prev = before;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index be99137b052..15dd7dda48a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20630,7 +20630,6 @@ aarch64_get_function_versions_dispatcher (void *decl)
   struct cgraph_node *node = NULL;
   struct cgraph_node *default_node = NULL;
   struct cgraph_function_version_info *node_v = NULL;
-  struct cgraph_function_version_info *first_v = NULL;
 
   tree dispatch_decl = NULL;
 
@@ -20647,37 +20646,17 @@ aarch64_get_function_versions_dispatcher (void *decl)
   if (node_v->dispatcher_resolver != NULL)
 return node_v->dispatcher_resolver;
 
-  /* Find the default version and make it the first node.  */
-  first_v = node_v;
-  /* Go to the beginning of the chain.  */
-  while (first_v->prev != NULL)
-first_v = first_v->prev;
-  default_version_info = first_v;
-  while (default_version_info != NULL)
-{
-  if (get_feature_mask_for_version
-	(default_version_info->this_node->decl) == 0ULL)
-	break;
-  default_version_info = default_version_info->next;
-}
-
-  /* If there is no default node, just return NULL.  */
-  if (default_version_info == NULL)
-return NULL;
-
-  /* Make default info the first node.  */
-  if (first_v != default_version_info)
-{
-  default_version_info->prev->next = default_version_info->next;
-  if (default_version_info->next)
-	default_version_info->next->prev = default_version_info->prev;
-  first_v->prev = default_version_info;
- 

[PATCH v1 09/16] Add assembler_name to cgraph_function_version_info.

2025-02-03 Thread Alfie Richards

This adds the assembler_name member to cgraph_function_version_info
to store the base assembler name for the function to be mangled. This is
used in later patches for refactoring FMV mangling.

gcc/c/ChangeLog:

* c-decl.cc (start_decl): Record assembler_name.
(start_function): Record assembler_name.

gcc/ChangeLog:

* cgraph.cc (cgraph_node::record_function_versions): Record
assembler_name.
* cgraph.h (struct cgraph_function_version_info): Add assembler_name.

gcc/cp/ChangeLog:

* decl.cc (maybe_mark_function_versioned): Record assembler_name.
(start_decl): Record assembler_name.
(start_preparsed_function): Record assembler_name.
---
 gcc/c/c-decl.cc | 20 
 gcc/cgraph.cc   | 10 --
 gcc/cgraph.h|  3 +++
 gcc/cp/decl.cc  | 34 ++
 4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 0dcbae9b26f..daa19f360e6 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5762,6 +5762,16 @@ start_decl (struct c_declarator *declarator, struct c_declspecs *declspecs,
   && VAR_OR_FUNCTION_DECL_P (decl))
   objc_check_global_decl (decl);
 
+  /* Store the base assembler name for mangling later.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+  && lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl)))
+{
+  cgraph_node *node = cgraph_node::get_create (decl);
+  if (!node->function_version ())
+	node->insert_new_function_version ();
+  node->function_version ()->assembler_name = DECL_ASSEMBLER_NAME (decl);
+}
+
   /* Add this decl to the current scope.
  TEM may equal DECL or it may be a previous decl of the same name.  */
   if (do_push)
@@ -10863,6 +10873,16 @@ start_function (struct c_declspecs *declspecs, struct c_declarator *declarator,
 
   current_function_decl = pushdecl (decl1);
 
+  /* Store the base assembler name for mangling later.  */
+  if (TREE_CODE (decl1) == FUNCTION_DECL
+  && lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl1)))
+{
+  cgraph_node *node = cgraph_node::get_create (decl1);
+  if (!node->function_version ())
+	node->insert_new_function_version ();
+  node->function_version ()->assembler_name = DECL_ASSEMBLER_NAME (decl1);
+}
+
   if (tree access = build_attr_access_from_parms (parms, false))
     decl_attributes (&current_function_decl, access, ATTR_FLAG_INTERNAL,
 		 old_decl);
diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 1ea38d16e56..c2038be4671 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -252,10 +252,16 @@ cgraph_node::record_function_versions (tree decl1, tree decl2)
   decl2_v = decl2_node->function_version ();
 
   if (decl1_v == NULL)
-decl1_v = decl1_node->insert_new_function_version ();
+{
+  decl1_v = decl1_node->insert_new_function_version ();
+  decl1_v->assembler_name = DECL_ASSEMBLER_NAME (decl1);
+}
 
   if (decl2_v == NULL)
-decl2_v = decl2_node->insert_new_function_version ();
+{
+  decl2_v = decl2_node->insert_new_function_version ();
+  decl2_v->assembler_name = DECL_ASSEMBLER_NAME (decl2);
+}
 
   gcc_assert (decl1_v);
   gcc_assert (decl2_v);
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 065fcc742e8..d9177364b7a 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -856,6 +856,9 @@ struct GTY((for_user)) cgraph_function_version_info {
  dispatcher. The dispatcher decl is an alias to the resolver
  function decl.  */
   tree dispatcher_resolver;
+
+  /* The assembly name of the function set before version mangling.  */
+  tree assembler_name;
 };
 
 #define DEFCIFCODE(code, type, string)	CIF_ ## code,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 3b3b4481964..fdef98f8062 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1273,6 +1273,12 @@ maybe_mark_function_versioned (tree decl)
 {
   if (!DECL_FUNCTION_VERSIONED (decl))
 {
+  cgraph_node *node = cgraph_node::get_create (decl);
+  if (!node->function_version ())
+	node->insert_new_function_version ();
+  if (!node->function_version ()->assembler_name)
+	node->function_version ()->assembler_name = DECL_ASSEMBLER_NAME (decl);
+
   DECL_FUNCTION_VERSIONED (decl) = 1;
   /* If DECL_ASSEMBLER_NAME has already been set, re-mangle
 	 to include the version marker.  */
@@ -6155,6 +6161,20 @@ start_decl (const cp_declarator *declarator,
 
   was_public = TREE_PUBLIC (decl);
 
+  /* Set the assembler string for any versioned function.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+  && (lookup_attribute (TARGET_HAS_FMV_TARGET_ATTRIBUTE ? "target"
+			: "target_version",
+			DECL_ATTRIBUTES (decl))
+	  || lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl))))
+{
+  cgraph_node *node = cgraph_node::get_create (decl);
+  if (!node->function_version ())
+	node->insert_new_function_version ();
+  if (!node->function_version ()->assembler_name)
+	node->function_version ()->as

Re: [PATCH v1 05/16] Update is_function_default_version to work with target_version.

2025-02-03 Thread Richard Sandiford
Alfie Richards  writes:
> Notably this respects target_version semantics where an unannotated
> function can be the default version.
>
> gcc/ChangeLog:
>
>   * attribs.cc (is_function_default_version): Add target_version logic.

Generally looks good to me, but:

> ---
>  gcc/attribs.cc | 28 
>  1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 56dd18c2fa8..5cf45491ada 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -1279,18 +1279,30 @@ make_dispatcher_decl (const tree decl)
>return func_decl;
>  }
>  
> -/* Returns true if DECL is multi-versioned using the target attribute, and this
> -   is the default version.  This function can only be used for targets that do
> -   not support the "target_version" attribute.  */
> +/* Returns true if DECL is a multiversioned default.
> +   With the target attribute semantics, returns true if the function is marked
> +   as default with the target version.
> +   With the target_version attribute semantics, returns true if the function
> +   is either not annotated, or annotated as default.  */
>  
>  bool
>  is_function_default_version (const tree decl)
>  {
> -  if (TREE_CODE (decl) != FUNCTION_DECL
> -  || !DECL_FUNCTION_VERSIONED (decl))
> -return false;
> -  tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
> -  gcc_assert (attr);

It might be worth either preserving the FUNCTION_DECL test or turning
it into an assert.  With that change...

> +  tree attr;
> +  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
> +{
> +  if (!DECL_FUNCTION_VERSIONED (decl))
> + return false;
> +  attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
> +  if (!attr)
> + return false;

...I suppose we should also preserve the original assert here,
unless there's a specific reason not to.

Thanks,
Richard

> +}
> +  else
> +{
> +  attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
> +  if (!attr)
> + return true;
> +}
>attr = TREE_VALUE (TREE_VALUE (attr));
>return (TREE_CODE (attr) == STRING_CST
> && strcmp (TREE_STRING_POINTER (attr), "default") == 0);


[committed] hppa: Revise various millicode insn patterns to use match_operand

2025-02-03 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.
Committed to trunk.

Dave
---

hppa: Revise various millicode insn patterns to use match_operand

LRA does not correctly support hard-register input operands that
are clobbered.  This is needed to support millicode calls on hppa.
The operand setup is sometimes deleted.

This problem can be avoided by hiding hard-register input operands
using match_operand.  This also potentially allows for constraints
that specify the operand is both read and written.

2025-02-03  John David Anglin  

gcc/ChangeLog:

PR rtl-optimization/117248
* config/pa/predicates.md (r25_operand): New predicate.
(r26_operand): Likewise.
* config/pa/pa.md: Use match_operand for r25 and r26 hard
register operands in mult, div, udiv, mod and umod millicode
patterns.

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index df1b61e871f..23129940e64 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -5632,8 +5632,10 @@
(set_attr "length" "4")])
 
 (define_insn ""
-  [(set (reg:SI 29) (mult:SI (reg:SI 26) (reg:SI 25)))
-   (clobber (match_operand:SI 0 "register_operand" "=a"))
+  [(set (reg:SI 29)
+   (mult:SI (match_operand:SI 1 "r26_operand" "")
+(match_operand:SI 0 "r25_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 31))]
@@ -5645,8 +5647,10 @@
  (symbol_ref "pa_attr_length_millicode_call (insn)")))])
 
 (define_insn ""
-  [(set (reg:SI 29) (mult:SI (reg:SI 26) (reg:SI 25)))
-   (clobber (match_operand:SI 0 "register_operand" "=a"))
+  [(set (reg:SI 29)
+   (mult:SI (match_operand:SI 1 "r26_operand" "")
+(match_operand:SI 0 "r25_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 2))]
@@ -5753,8 +5757,9 @@
 
 (define_insn ""
   [(set (reg:SI 29)
-   (div:SI (reg:SI 26) (match_operand:SI 0 "div_operand" "")))
-   (clobber (match_operand:SI 1 "register_operand" "=a"))
+   (div:SI (match_operand:SI 1 "r26_operand" "")
+   (match_operand:SI 0 "div_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 31))]
@@ -5768,8 +5773,9 @@
 
 (define_insn ""
   [(set (reg:SI 29)
-   (div:SI (reg:SI 26) (match_operand:SI 0 "div_operand" "")))
-   (clobber (match_operand:SI 1 "register_operand" "=a"))
+   (div:SI (match_operand:SI 1 "r26_operand" "")
+   (match_operand:SI 0 "div_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 2))]
@@ -5800,8 +5806,9 @@
 
 (define_insn ""
   [(set (reg:SI 29)
-   (udiv:SI (reg:SI 26) (match_operand:SI 0 "div_operand" "")))
-   (clobber (match_operand:SI 1 "register_operand" "=a"))
+   (udiv:SI (match_operand:SI 1 "r26_operand" "")
+(match_operand:SI 0 "div_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 31))]
@@ -5815,8 +5822,9 @@
 
 (define_insn ""
   [(set (reg:SI 29)
-   (udiv:SI (reg:SI 26) (match_operand:SI 0 "div_operand" "")))
-   (clobber (match_operand:SI 1 "register_operand" "=a"))
+   (udiv:SI (match_operand:SI 1 "r26_operand" "")
+(match_operand:SI 0 "div_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 2))]
@@ -5844,8 +5852,10 @@
 }")
 
 (define_insn ""
-  [(set (reg:SI 29) (mod:SI (reg:SI 26) (reg:SI 25)))
-   (clobber (match_operand:SI 0 "register_operand" "=a"))
+  [(set (reg:SI 29)
+   (mod:SI (match_operand:SI 1 "r26_operand" "")
+   (match_operand:SI 0 "r25_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 31))]
@@ -5858,8 +5868,10 @@
  (symbol_ref "pa_attr_length_millicode_call (insn)")))])
 
 (define_insn ""
-  [(set (reg:SI 29) (mod:SI (reg:SI 26) (reg:SI 25)))
-   (clobber (match_operand:SI 0 "register_operand" "=a"))
+  [(set (reg:SI 29)
+   (mod:SI (match_operand:SI 1 "r26_operand" "")
+   (match_operand:SI 0 "r25_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:SI 26))
(clobber (reg:SI 25))
(clobber (reg:SI 2))]
@@ -5887,8 +5899,10 @@
 }")
 
 (define_insn ""
-  [(set (reg:SI 29) (umod:SI (reg:SI 26) (reg:SI 25)))
-   (clobber (match_operand:SI 0 "register_operand" "=a"))
+  [(set (reg:SI 29)
+   (umod:SI (match_operand:SI 1 "r26_operand" "")
+(match_operand:SI 0 "r25_operand" "")))
+   (clobber (match_operand:SI 2 "register_operand" "=a"))
(clobber (reg:S

Re: [PATCH] c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro

2025-02-03 Thread Jakub Jelinek
On Mon, Feb 03, 2025 at 11:33:38AM +0100, Richard Biener wrote:
> The first argument is supposed to be a type, not a decl.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> OK?
> 
>   PR c++/79786
> gcc/cp/
>   * rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.

LGTM.

> --- a/gcc/cp/rtti.cc
> +++ b/gcc/cp/rtti.cc
> @@ -1741,7 +1741,8 @@ emit_tinfo_decl (tree decl)
>/* Avoid targets optionally bumping up the alignment to improve
>vector instruction accesses, tinfo are never accessed this way.  */
>  #ifdef DATA_ABI_ALIGNMENT
> -  SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl))));
> +  SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (TREE_TYPE (decl),
> + TYPE_ALIGN (TREE_TYPE (decl))));
>DECL_USER_ALIGN (decl) = true;
>  #endif
>return true;
> -- 
> 2.43.0

Jakub



[PATCH v1 10/16] Add dispatcher_resolver_function and is_target_clone to cgraph_node.

2025-02-03 Thread Alfie Richards

These flags are used to make sure mangling is done correctly.

gcc/ChangeLog:

* cgraph.h (struct cgraph_node): Add dispatcher_resolver_function and
is_target_clone.
---
 gcc/cgraph.h | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index d9177364b7a..9561bce2c33 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -896,19 +896,19 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   /* Constructor.  */
   explicit cgraph_node ()
 : symtab_node (SYMTAB_FUNCTION), callees (NULL), callers (NULL),
-  indirect_calls (NULL),
-  next_sibling_clone (NULL), prev_sibling_clone (NULL), clones (NULL),
-  clone_of (NULL), call_site_hash (NULL), former_clone_of (NULL),
-  simdclone (NULL), simd_clones (NULL), ipa_transforms_to_apply (vNULL),
-  inlined_to (NULL), rtl (NULL),
-  count (profile_count::uninitialized ()),
+  indirect_calls (NULL), next_sibling_clone (NULL),
+  prev_sibling_clone (NULL), clones (NULL), clone_of (NULL),
+  call_site_hash (NULL), former_clone_of (NULL), simdclone (NULL),
+  simd_clones (NULL), ipa_transforms_to_apply (vNULL), inlined_to (NULL),
+  rtl (NULL), count (profile_count::uninitialized ()),
   count_materialization_scale (REG_BR_PROB_BASE), profile_id (0),
   unit_id (0), tp_first_run (0), thunk (false),
-  used_as_abstract_origin (false),
-  lowered (false), process (false), frequency (NODE_FREQUENCY_NORMAL),
-  only_called_at_startup (false), only_called_at_exit (false),
-  tm_clone (false), dispatcher_function (false), calls_comdat_local (false),
-  icf_merged (false), nonfreeing_fn (false), merged_comdat (false),
+  used_as_abstract_origin (false), lowered (false), process (false),
+  frequency (NODE_FREQUENCY_NORMAL), only_called_at_startup (false),
+  only_called_at_exit (false), tm_clone (false),
+  dispatcher_function (false), dispatcher_resolver_function (false),
+  is_target_clone (false), calls_comdat_local (false), icf_merged (false),
+  nonfreeing_fn (false), merged_comdat (false),
   merged_extern_inline (false), parallelized_function (false),
   split_part (false), indirect_call_target (false), local (false),
   versionable (false), can_change_signature (false),
@@ -1465,6 +1465,11 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned tm_clone : 1;
   /* True if this decl is a dispatcher for function versions.  */
   unsigned dispatcher_function : 1;
+  /* True if this decl is a resolver for function versions.  */
+  unsigned dispatcher_resolver_function : 1;
+  /* True this is part of a multiversioned set and the default version
+ comes from a target_clone attribute.  */
+  unsigned is_target_clone : 1;
   /* True if this decl calls a COMDAT-local function.  This is set up in
  compute_fn_summary and inline_call.  */
   unsigned calls_comdat_local : 1;


[PATCH v1 04/16] Remove unnecessary `record` argument from maybe_version_functions.

2025-02-03 Thread Alfie Richards

The `record` argument in maybe_version_functions was intended to allow
controlling whether the relationship of versions is recorded. However, it
was only exercised if both input functions were already marked as versioned,
and this same logic is repeated in maybe_version_functions itself, so the
argument is unnecessary.

gcc/cp/ChangeLog:

* class.cc (add_method): Remove argument.
* cp-tree.h (maybe_version_functions): Ditto.
* decl.cc (decls_match): Ditto.
(maybe_version_functions): Ditto.
---
 gcc/cp/class.cc  | 2 +-
 gcc/cp/cp-tree.h | 2 +-
 gcc/cp/decl.cc   | 9 +++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index f2f81a44718..a9a80d1b4be 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -1402,7 +1402,7 @@ add_method (tree type, tree method, bool via_using)
   /* If these are versions of the same function, process and
 	 move on.  */
   if (TREE_CODE (fn) == FUNCTION_DECL
-	  && maybe_version_functions (method, fn, true))
+	  && maybe_version_functions (method, fn))
 	continue;
 
   if (DECL_INHERITED_CTOR (method))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ec976928f5f..8eba8d455be 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7114,7 +7114,7 @@ extern void determine_local_discriminator	(tree, tree = NULL_TREE);
 extern bool member_like_constrained_friend_p	(tree);
 extern bool fns_correspond			(tree, tree);
 extern int decls_match(tree, tree, bool = true);
-extern bool maybe_version_functions		(tree, tree, bool);
+extern bool maybe_version_functions		(tree, tree);
 extern bool validate_constexpr_redeclaration	(tree, tree);
 extern bool merge_default_template_args		(tree, tree, bool);
 extern tree duplicate_decls			(tree, tree,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index cf5e055e146..3b3b4481964 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1215,9 +1215,7 @@ decls_match (tree newdecl, tree olddecl, bool record_versions /* = true */)
 	  && targetm.target_option.function_versions (newdecl, olddecl))
 	{
 	  if (record_versions)
-	maybe_version_functions (newdecl, olddecl,
- (!DECL_FUNCTION_VERSIONED (newdecl)
-  || !DECL_FUNCTION_VERSIONED (olddecl)));
+	maybe_version_functions (newdecl, olddecl);
 	  return 0;
 	}
 }
@@ -1288,7 +1286,7 @@ maybe_mark_function_versioned (tree decl)
If RECORD is set to true, record function versions.  */
 
 bool
-maybe_version_functions (tree newdecl, tree olddecl, bool record)
+maybe_version_functions (tree newdecl, tree olddecl)
 {
   if (!targetm.target_option.function_versions (newdecl, olddecl))
 return false;
@@ -1311,8 +1309,7 @@ maybe_version_functions (tree newdecl, tree olddecl, bool record)
   maybe_mark_function_versioned (newdecl);
 }
 
-  if (record)
-cgraph_node::record_function_versions (olddecl, newdecl);
+  cgraph_node::record_function_versions (olddecl, newdecl);
 
   return true;
 }


[PATCH v1 03/16] Add string_slice class.

2025-02-03 Thread Alfie Richards

The string_slice class inherits from array_slice and refers to a
substring of a character array whose memory is managed elsewhere, without
modifying the underlying array.

This is useful, for example, when referring to a substring of an
attribute in the syntax tree.

This commit also adds some minimal helper functions for string_slice,
such as strtok, strcmp, and a function to strip whitespace from the
beginning and end of a slice.
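
As a rough illustration of the idea (not the code in this patch), the
sketch below uses std::string_view as a stand-in for string_slice and
std::optional to model an "invalid" slice.  The names slice, next_token
and strip are hypothetical.  Unlike the strtok in this patch, the sketch
returns an empty token for empty input rather than an invalid slice.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for string_slice: a non-owning view of characters.
using slice = std::string_view;

// Return the next token of *rest, i.e. everything up to the first character
// found in DELIMS, and advance *rest past that delimiter.  When no delimiter
// is left, return the remainder and mark *rest as exhausted.  The underlying
// buffer is never written to.
static std::optional<slice>
next_token (std::optional<slice> *rest, slice delims)
{
  if (!rest->has_value ())
    return std::nullopt;

  slice s = **rest;
  std::size_t pos = s.find_first_of (delims);
  if (pos == slice::npos)
    {
      rest->reset ();
      return s;
    }
  *rest = s.substr (pos + 1);
  return s.substr (0, pos);
}

// Return S with leading and trailing whitespace removed from the view.
static slice
strip (slice s)
{
  while (!s.empty () && std::isspace (static_cast<unsigned char> (s.front ())))
    s.remove_prefix (1);
  while (!s.empty () && std::isspace (static_cast<unsigned char> (s.back ())))
    s.remove_suffix (1);
  return s;
}
```

Splitting an attribute string such as "sve, default ,sve2" on "," and
stripping each token would then yield "sve", "default" and "sve2"
without copying or modifying the original string.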

gcc/ChangeLog:

* vec.cc (string_slice::strtok): New method.
(strcmp): Add implementation for string_slice.
(string_slice::strip): New method.
(test_string_slice_initializers): New test.
(test_string_slice_strtok): Ditto.
(test_string_slice_strcmp): Ditto.
(test_string_slice_equality): Ditto.
(test_string_slice_invalid): Ditto.
(test_string_slice_strip): Ditto.
(vec_cc_tests): Add new tests.
* vec.h (class string_slice): New class.
(strcmp): Add implementation for string_slice.
---
 gcc/vec.cc | 157 +
 gcc/vec.h  |  38 +
 2 files changed, 195 insertions(+)

diff --git a/gcc/vec.cc b/gcc/vec.cc
index 55f5f3dd447..569dbf2a53c 100644
--- a/gcc/vec.cc
+++ b/gcc/vec.cc
@@ -176,6 +176,67 @@ dump_vec_loc_statistics (void)
   vec_mem_desc.dump (VEC_ORIGIN);
 }
 
+string_slice
+string_slice::strtok (string_slice *str, string_slice delims)
+{
+  const char *ptr = str->begin ();
+
+  /* If the input string is empty or invalid, return an invalid slice
+ as there are no more tokens to return.  */
+  if (str->empty () || !str->is_valid ())
+{
+  *str = string_slice::invalid ();
+  return string_slice::invalid ();
+}
+
+  for (; ptr < str->end (); ptr++)
+for (const char *c = delims.begin (); c < delims.end(); c++)
+  if (*ptr == *c)
+	{
+	  const char *start = str->begin ();
+	  /* Update the input string to be the remaining string.  */
+	  *str = string_slice ((ptr + 1), str->end () - ptr - 1);
+	  return string_slice (start, (size_t) (ptr - start));
+	}
+
+  /* If no deliminators between the start and end, return the whole string.  */
+  string_slice res = *str;
+  *str = string_slice::invalid ();
+  return res;
+}
+
+int
+strcmp (string_slice str1, string_slice str2)
+{
+  for (unsigned int i = 0; i < str1.size () && i < str2.size (); i++)
+{
+  if (str1[i] < str2[i])
+	return -1;
+  if (str1[i] > str2[i])
+	return 1;
+}
+
+  if (str1.size () < str2.size ())
+return -1;
+  if (str1.size () > str2.size ())
+return 1;
+  return 0;
+}
+
+string_slice
+string_slice::strip ()
+{
+  const char *start = this->begin ();
+  const char *end = this->end ();
+
+  while (start < end && ISSPACE (*start))
+start++;
+  while (end > start && ISSPACE (*(end-1)))
+end--;
+
+  return string_slice (start, end-start);
+}
+
 #if CHECKING_P
 /* Report qsort comparator CMP consistency check failure with P1, P2, P3 as
witness elements.  */
@@ -584,6 +645,96 @@ test_auto_alias ()
   ASSERT_EQ (val, 0);
 }
 
+static void
+test_string_slice_initializers ()
+{
+  string_slice str1 = string_slice ();
+  ASSERT_TRUE (str1.is_valid ());
+  ASSERT_EQ (str1.size (), 0);
+
+  string_slice str2 = string_slice ("Test string");
+  ASSERT_TRUE (str2.is_valid ());
+  ASSERT_EQ (str2.size (), 11);
+
+  string_slice str3 = string_slice ("Test string", 4);
+  ASSERT_TRUE (str3.is_valid ());
+  ASSERT_EQ (str3.size (), 4);
+}
+
+static void
+test_string_slice_strtok ()
+{
+  const char *test_string
+= "This is the test string, it \0 is for testing, 123 ,,";
+
+  string_slice test_string_slice = string_slice (test_string, 53);
+  string_slice test_delims = string_slice (",\0", 2);
+
+  ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims),
+	 string_slice ("This is the test string"));
+  ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims),
+	 string_slice (" it "));
+  ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims),
+ string_slice (" is for testing"));
+  ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims),
+	 string_slice (" 123 "));
+  ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims),
+	 string_slice (""));
+  ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims),
+	 string_slice (""));
+  ASSERT_TRUE (test_string_slice.empty ());
+  ASSERT_FALSE (string_slice::strtok (&test_string_slice, test_delims)
+.is_valid ());
+  ASSERT_FALSE (test_string_slice.is_valid ());
+}
+
+static void
+test_string_slice_strcmp ()
+{
+  ASSERT_EQ (strcmp (string_slice (), string_slice ()), 0);
+  ASSERT_EQ (strcmp (string_slice ("test"), string_slice ()), 1);
+  ASSERT_EQ (strcmp (string_slice (), string_slice ("test")), -1);
+  ASSERT_EQ (strcmp (string_slice ("test"), string_slice ("test")), 0);
+  ASSERT_EQ (strcmp (string_slice ("a"), string_slice ("b")), -1);
+  ASSERT_EQ 

[PATCH v1 15/16] Support mixing of target_clones and target_version for aarch64.

2025-02-03 Thread Alfie Richards

This patch adds support for the combination of target_clones and
target_version in the definition of a versioned function.

This patch changes is_function_default_version to consider a function
declaration annotated with target_clones containing default to be a
default version. It also changes the common_function_version hook to
consider two functions annotated with target_clones and/or
target_version to be common if their specified versions don't overlap.

This takes advantage of refactoring done in previous patches changing
how target_clones are expanded.
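
The two rules above can be sketched as follows.  This is a simplified
model, not the code in this patch: each declaration is represented by
the list of version strings it names, and overlap is tested by plain
string comparison, whereas the aarch64 hook compares versions through
its own priority logic.  The names defines_default and
versions_disjoint are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: the version strings named by one declaration, e.g.
// {"default", "sve"} for target_clones("default", "sve").
using version_list = std::vector<std::string>;

// A declaration provides the default version if "default" is among the
// versions it names.
static bool
defines_default (const version_list &versions)
{
  return std::find (versions.begin (), versions.end (), "default")
	 != versions.end ();
}

// Two declarations can belong to the same version set only if no version
// is named by both of them.
static bool
versions_disjoint (const version_list &a, const version_list &b)
{
  for (const std::string &v : a)
    if (std::find (b.begin (), b.end (), v) != b.end ())
      return false;
  return true;
}
```

Under this model, target_clones("default", "sve") and
target_version("sve2") can coexist, while target_clones("default", "sve")
and target_version("sve") cannot.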

gcc/ChangeLog:

* attribs.cc (is_function_default_version): Add logic for
target_clones defining the default version.
* config/aarch64/aarch64.cc (aarch64_common_function_versions): Add
logic for a target_clones and target_version, or two target_clones
coexisting in a version set.

gcc/c-family/ChangeLog:

* c-attribs.cc: Add support for target_version and target_clones
coexisting.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-and-mvc1.C: New test.
* g++.target/aarch64/mv-and-mvc2.C: New test.
* g++.target/aarch64/mv-and-mvc3.C: New test.
* g++.target/aarch64/mv-and-mvc4.C: New test.
---
 gcc/attribs.cc|  7 +++
 gcc/c-family/c-attribs.cc |  2 -
 gcc/config/aarch64/aarch64.cc | 46 ++-
 .../g++.target/aarch64/mv-and-mvc1.C  | 38 +++
 .../g++.target/aarch64/mv-and-mvc2.C  | 29 
 .../g++.target/aarch64/mv-and-mvc3.C  | 41 +
 .../g++.target/aarch64/mv-and-mvc4.C  | 38 +++
 gcc/testsuite/g++.target/aarch64/mv-error1.C  | 13 ++
 gcc/testsuite/g++.target/aarch64/mv-error13.C | 13 ++
 gcc/testsuite/g++.target/aarch64/mv-error2.C  | 10 
 gcc/testsuite/g++.target/aarch64/mv-error3.C  | 13 ++
 gcc/testsuite/g++.target/aarch64/mv-error7.C  |  9 
 gcc/testsuite/g++.target/aarch64/mv-error8.C  | 21 +
 gcc/testsuite/g++.target/aarch64/mv-error9.C  | 12 +
 14 files changed, 289 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc4.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error13.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error7.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error8.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error9.C

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 687e6d4143a..f877dc4f6e3 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1327,6 +1327,13 @@ is_function_default_version (const tree decl)
 }
   else
 {
+  if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl)))
+	{
+	  int num_def = 0;
+	  auto_vec versions = get_clone_versions (decl, &num_def);
+	  return num_def > 0;
+	}
+
   attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
   if (!attr)
 	return true;
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 642d724f6c6..f2cc43ad641 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -249,13 +249,11 @@ static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
   ATTR_EXCL ("always_inline", true, true, true),
   ATTR_EXCL ("target", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
 	 TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE),
-  ATTR_EXCL ("target_version", true, true, true),
   ATTR_EXCL (NULL, false, false, false),
 };
 
 static const struct attribute_spec::exclusions attr_target_version_exclusions[] =
 {
-  ATTR_EXCL ("target_clones", true, true, true),
   ATTR_EXCL (NULL, false, false, false),
 };
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 420bbba9be2..f6cb7903d88 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20671,7 +20671,51 @@ aarch64_common_function_versions (tree fn1, tree fn2)
   || TREE_CODE (fn2) != FUNCTION_DECL)
 return false;
 
-  return (aarch64_compare_version_priority (fn1, fn2) != 0);
+  if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (fn2)))
+{
+  tree temp = fn1;
+  fn1 = fn2;
+  fn2 = temp;
+}
+
+  if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (fn1)))
+{
+  auto_vec fn1_versions = get_clone_versions (fn1);
+  // fn1 is target_clone
+  if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (fn2)))
+	{
+	  auto_vec fn2_versions = get_clone_versions (fn2);
+	  for (string_slice v1 : fn1_ve

[PATCH v1 01/16] Add PowerPC FMV symbol tests.

2025-02-03 Thread Alfie Richards

This tests the mangling of function assembly names when annotated with
target_clones attributes.
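
The symbol names that the scan-assembler patterns in these tests look
for (for example _Z3foov.default and _Z3foov.cpu_power6) can be
modelled roughly as below.  This is only inferred from the expected
patterns, not taken from the GCC sources; the helper name
clone_symbol_name and the exact character replacement rule are
assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: build the assembler name of one clone from the
// mangled base name and the version string.  Characters that are not
// alphanumeric or '_' (such as the '=' in "cpu=power6") are assumed to be
// replaced by '_', and the result is appended after a '.'.
static std::string
clone_symbol_name (const std::string &base, const std::string &version)
{
  std::string suffix = version;
  for (char &c : suffix)
    if (!std::isalnum (static_cast<unsigned char> (c)) && c != '_')
      c = '_';
  return base + "." + suffix;
}
```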

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mvc-symbols1.C: New test.
* g++.target/powerpc/mvc-symbols2.C: New test.
* g++.target/powerpc/mvc-symbols3.C: New test.
* g++.target/powerpc/mvc-symbols4.C: New test.
---
 .../g++.target/powerpc/mvc-symbols1.C | 47 +++
 .../g++.target/powerpc/mvc-symbols2.C | 35 ++
 .../g++.target/powerpc/mvc-symbols3.C | 41 
 .../g++.target/powerpc/mvc-symbols4.C | 29 
 4 files changed, 152 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols3.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols4.C

diff --git a/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C b/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C
new file mode 100644
index 000..9424382bf14
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_clones("default", "cpu=power6", "cpu=power6x")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_clones("cpu=power6x", "cpu=power6", "default")))
+int foo (int)
+{
+  return 2;
+}
+
+int bar()
+{
+  return foo ();
+}
+
+int bar(int x)
+{
+  return foo (x);
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl _Z3foov\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6x\n" 0 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl _Z3fooi\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3fooi, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3fooi,_Z3fooi\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.cpu_power6\n" 0 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.cpu_power6x\n" 1 } } */
diff --git a/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C b/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C
new file mode 100644
index 000..edf54480efd
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_clones("default", "cpu=power6", "cpu=power6x")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_clones("cpu=power6x", "cpu=power6", "default")))
+int foo (int)
+{
+  return 2;
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6x\n" 0 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3fooi, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3fooi,_Z3fooi\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times

[PATCH v1 12/16] Refactor FMV name mangling.

2025-02-03 Thread Alfie Richards

This patch overhauls how FMV name mangling works. Previously, mangling
logic was duplicated in several places across both target-specific and
target-independent code. With this patch, all mangling is done in
targetm.mangle_decl_assembler_name (including for the dispatched symbol
and the dispatcher resolver).

This allows previous hacks to be removed, such as unmangling the default
version's assembler name in order to remangle all versions, the resolver
and the dispatched symbol.

This does introduce one change (shown in the test changes). Previously,
for FMV sets annotated with the target attribute, x86 set the function
name to the assembler name and remangled it. This was hard to reproduce
without resorting to hacks I wasn't comfortable with, so the mangling is
changed to append ".ifunc", which matches clang.

This change also refactors expand_target_clones to use
targetm.mangle_decl_assembler_name for mangling and to use
get_clone_versions.

gcc/ChangeLog:

* attribs.cc (make_dispatcher_decl): Refactor to use
targetm.mangle_decl_assembler_name for mangling.
* config/aarch64/aarch64.cc (aarch64_parse_fmv_features): Change to
support string_slice.
(aarch64_process_target_version_attr): Ditto.
(get_feature_mask_for_version): Ditto.
(aarch64_mangle_decl_assembler_name): Refactor to handle FMV dispatched
symbol and resolver.
(get_suffixed_assembler_name): Removed.
(make_resolver_func): Refactor to use
aarch64_mangle_decl_assembler_name for mangling.
(aarch64_generate_version_dispatcher_body): Ditto.
* config/i386/i386-features.cc (is_valid_asm_symbol): Moved from
multiple_target.cc.
(create_new_asm_name): Moved from gcc/multiple_target.cc.
(ix86_mangle_function_version_assembler_name): Refactor to handle FMV
dispatched symbol and resolver.
(ix86_mangle_decl_assembler_name): Ditto.
(ix86_get_function_versions_dispatcher): Refactor to use
ix86_mangle_decl_assembler_name for mangling.
(make_resolver_func): Ditto.
* config/riscv/riscv.cc (riscv_mangle_decl_assembler_name): Refactor to
handle FMV dispatched symbol and resolver.
(get_suffixed_assembler_name): Removed.
(make_resolver_func): Refactor to use riscv_mangle_decl_assembler_name
for mangling.
(riscv_generate_version_dispatcher_body): Ditto.
* config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): Refactor
to handle FMV dispatched symbol and resolver.
(make_resolver_func): Refactor to use
rs6000_mangle_function_version_assembler_name for mangling.
(is_valid_asm_symbol): Moved from gcc/multiple_target.cc.
(create_new_asm_name): Ditto.
(rs6000_mangle_function_version_assembler_name): Refactor to handle FMV
dispatched symbol and resolver.
* multiple_target.cc (create_dispatcher_calls): Refactored to use
targetm.mangle_decl_assembler_name for mangling.
(is_valid_asm_symbol): Moved to target specific code.
(create_new_asm_name): Ditto.
(expand_target_clones): Refactored to use
targetm.mangle_decl_assembler_name for mangling.

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Added logic to remangle FMV decls when
merging.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv-symbols1.C: Change FMV mangling.
* g++.target/i386/mv-symbols3.C: Ditto.
* g++.target/i386/mv-symbols4.C: Ditto.
* g++.target/i386/mv-symbols5.C: Ditto.
---
 gcc/attribs.cc  |  25 +++-
 gcc/config/aarch64/aarch64.cc   | 141 ---
 gcc/config/i386/i386-features.cc|  90 +---
 gcc/config/riscv/riscv.cc   |  95 ++---
 gcc/config/rs6000/rs6000.cc | 104 +-
 gcc/cp/decl.cc  |  13 ++
 gcc/multiple_target.cc  | 146 +---
 gcc/testsuite/g++.target/i386/mv-symbols1.C |  12 +-
 gcc/testsuite/g++.target/i386/mv-symbols3.C |  10 +-
 gcc/testsuite/g++.target/i386/mv-symbols4.C |  10 +-
 gcc/testsuite/g++.target/i386/mv-symbols5.C |  10 +-
 11 files changed, 375 insertions(+), 281 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index cb25845715d..687e6d4143a 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pretty-print.h"
 #include "intl.h"
 #include "gcc-urlifier.h"
+#include "cgraph.h"
 
 /* Table of the tables of attributes (common, language, format, machine)
searched.  */
@@ -1271,18 +1272,13 @@ common_function_versions (tree fn1, tree fn2)
 tree
 make_dispatcher_decl (const tree decl)
 {
-  tree func_decl;
-  char *func_name;
-  tree fn_type, func_type;
-
-  func_name = xstrdup (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  tree func_decl, 

Re: [PATCH] rtl-optimization/117611 - ICE in simplify_shift_const_1

2025-02-03 Thread Jakub Jelinek
On Mon, Feb 03, 2025 at 03:30:41PM +0100, Richard Biener wrote:
> The following checks we have a scalar int shift mode before
> enforcing it.  As AVR shows the mode can be a signed _Accum mode
> as well.
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> 
> OK if that succeeds?
> 
> Thanks,
> Richard.
> 
>   PR rtl-optimization/117611
>   * combine.cc (simplify_shift_const_1): Bail if not
>   scalar int mode.

LGTM.

>   * gcc.target/avr/pr117611.c: New testcase.

I don't see anything AVR specific here.
Move to gcc.dg/fixed-point/pr117611.c ?

> new file mode 100644
> index 000..c76093f12d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/avr/pr117611.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +_Accum acc1 (_Accum x)
> +{
> +return x << 16;
> +}
> -- 
> 2.43.0

Jakub



[PATCH] RISC-V: Fix wrong LMUL when only implicit zve32f.

2025-02-03 Thread Monk Chiang
According to Section 3.4.2, Vector Register Grouping, in the RISC-V
Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN.
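
As a worked example of that rule, the smallest usable fractional LMUL
is 1/(ELEN/SEW).  The sketch below computes that denominator; the
function name is hypothetical and it assumes SEW and ELEN are powers of
two with SEW <= ELEN.

```cpp
#include <cassert>

// Largest denominator D such that LMUL = 1/D still satisfies
// LMUL >= SEW / ELEN.  Assumes SEW and ELEN are powers of two and
// SEW <= ELEN.
constexpr int
max_fractional_lmul_denominator (int sew, int elen)
{
  return elen / sew;
}

// ELEN = 32 (e.g. zve32f): SEW 8 allows down to 1/4, SEW 16 down to 1/2,
// and SEW 32 needs LMUL >= 1.
static_assert (max_fractional_lmul_denominator (8, 32) == 4, "");
static_assert (max_fractional_lmul_denominator (16, 32) == 2, "");
static_assert (max_fractional_lmul_denominator (32, 32) == 1, "");

// ELEN = 64: SEW 8 allows down to 1/8, SEW 16 down to 1/4, SEW 32 down
// to 1/2.
static_assert (max_fractional_lmul_denominator (8, 64) == 8, "");
static_assert (max_fractional_lmul_denominator (16, 64) == 4, "");
static_assert (max_fractional_lmul_denominator (32, 64) == 2, "");
```

These values correspond to the caps of 8, 4 and 2 that get_vlmul applies
when ELEN is 64, halved when ELEN is 32.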
---
 gcc/config/riscv/riscv-v.cc   |   8 +-
 gcc/config/riscv/riscv-vector-switch.def  |  84 ++---
 .../gcc.target/riscv/rvv/autovec/pr111391-2.c |   2 +-
 .../gcc.target/riscv/rvv/base/abi-14.c|  84 ++---
 .../gcc.target/riscv/rvv/base/abi-16.c|  98 +++
 .../gcc.target/riscv/rvv/base/abi-18.c| 112 +-
 .../gcc.target/riscv/rvv/base/vsetvl_zve32f.c |  73 
 7 files changed, 268 insertions(+), 193 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl_zve32f.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9847439ca77..24f3127e71d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1730,13 +1730,15 @@ get_vlmul (machine_mode mode)
   int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
   if (size < TARGET_MIN_VLEN)
{
+ /* Follow rule LMUL >= SEW / ELEN.  */
+ int elen = TARGET_VECTOR_ELEN_64 ? 1 : 2;
  int factor = TARGET_MIN_VLEN / size;
  if (inner_size == 8)
-   factor = MIN (factor, 8);
+   factor = MIN (factor, 8 / elen);
  else if (inner_size == 16)
-   factor = MIN (factor, 4);
+   factor = MIN (factor, 4 / elen);
  else if (inner_size == 32)
-   factor = MIN (factor, 2);
+   factor = MIN (factor, 2 / elen);
  else if (inner_size == 64)
factor = MIN (factor, 1);
  else
diff --git a/gcc/config/riscv/riscv-vector-switch.def b/gcc/config/riscv/riscv-vector-switch.def
index 23744d076f9..1b0d61940a6 100644
--- a/gcc/config/riscv/riscv-vector-switch.def
+++ b/gcc/config/riscv/riscv-vector-switch.def
@@ -64,13 +64,13 @@ Encode the ratio of SEW/LMUL into the mask types.
   |BI   |RVVM1BI|RVVMF2BI|RVVMF4BI|RVVMF8BI|RVVMF16BI|RVVMF32BI|RVVMF64BI|  */
 
 /* Return 'REQUIREMENT' for machine_mode 'MODE'.
-   For example: 'MODE' = RVVMF64BImode needs TARGET_MIN_VLEN > 32.  */
+   For example: 'MODE' = RVVMF64BImode needs TARGET_VECTOR_ELEN_64.  */
 #ifndef ENTRY
 #define ENTRY(MODE, REQUIREMENT, VLMUL, RATIO)
 #endif
 
 /* Disable modes if TARGET_MIN_VLEN == 32.  */
-ENTRY (RVVMF64BI, TARGET_MIN_VLEN > 32, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F8, 64)
+ENTRY (RVVMF64BI, TARGET_VECTOR_ELEN_64, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F8, 64)
 ENTRY (RVVMF32BI, true, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F4, 32)
 ENTRY (RVVMF16BI, true, TARGET_XTHEADVECTOR ? LMUL_1 : LMUL_F2 , 16)
 ENTRY (RVVMF8BI, true, LMUL_1, 8)
@@ -85,7 +85,7 @@ ENTRY (RVVM2QI, true, LMUL_2, 4)
 ENTRY (RVVM1QI, true, LMUL_1, 8)
 ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16)
 ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32)
-ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F8, 64)
+ENTRY (RVVMF8QI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F8, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32.  */
 ENTRY (RVVM8HI, true, LMUL_8, 2)
@@ -93,7 +93,7 @@ ENTRY (RVVM4HI, true, LMUL_4, 4)
 ENTRY (RVVM2HI, true, LMUL_2, 8)
 ENTRY (RVVM1HI, true, LMUL_1, 16)
 ENTRY (RVVMF2HI, !TARGET_XTHEADVECTOR, LMUL_F2, 32)
-ENTRY (RVVMF4HI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
+ENTRY (RVVMF4HI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_BF_16.  */
 ENTRY (RVVM8BF, TARGET_VECTOR_ELEN_BF_16, LMUL_8, 2)
@@ -109,21 +109,21 @@ ENTRY (RVVM4HF, TARGET_VECTOR_ELEN_FP_16, LMUL_4, 4)
 ENTRY (RVVM2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_2, 8)
 ENTRY (RVVM1HF, TARGET_VECTOR_ELEN_FP_16, LMUL_1, 16)
 ENTRY (RVVMF2HF, TARGET_VECTOR_ELEN_FP_16 && !TARGET_XTHEADVECTOR, LMUL_F2, 32)
-ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
+ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32.  */
 ENTRY (RVVM8SI, true, LMUL_8, 4)
 ENTRY (RVVM4SI, true, LMUL_4, 8)
 ENTRY (RVVM2SI, true, LMUL_2, 16)
 ENTRY (RVVM1SI, true, LMUL_1, 32)
-ENTRY (RVVMF2SI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F2, 64)
+ENTRY (RVVMF2SI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F2, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_FP_32.  */
 ENTRY (RVVM8SF, TARGET_VECTOR_ELEN_FP_32, LMUL_8, 4)
 ENTRY (RVVM4SF, TARGET_VECTOR_ELEN_FP_32, LMUL_4, 8)
 ENTRY (RVVM2SF, TARGET_VECTOR_ELEN_FP_32, LMUL_2, 16)
 ENTRY (RVVM1SF, TARGET_VECTOR_ELEN_FP_32, LMUL_1, 32)
-ENTRY (RVVMF2SF, TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F2, 64)
+ENTRY (RVVMF2SF, TARGET_VECTOR_ELEN_FP_32 && TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F2, 64)
 
 /* Disable modes if !TARGET_VECTOR_ELEN_64.  */
 ENTRY (RVVM8DI, TARGET_VECTOR_ELE

Re: [PATCH] rtl-optimization/117611 - ICE in simplify_shift_const_1

2025-02-03 Thread Richard Biener
On Mon, 3 Feb 2025, Jakub Jelinek wrote:

> On Mon, Feb 03, 2025 at 03:30:41PM +0100, Richard Biener wrote:
> > The following checks we have a scalar int shift mode before
> > enforcing it.  As AVR shows the mode can be a signed _Accum mode
> > as well.
> > 
> > Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> > 
> > OK if that succeeds?
> > 
> > Thanks,
> > Richard.
> > 
> > PR rtl-optimization/117611
> > * combine.cc (simplify_shift_const_1): Bail if not
> > scalar int mode.
> 
> LGTM.
> 
> > * gcc.target/avr/pr117611.c: New testcase.
> 
> I don't see anything AVR specific here.
> Move to gcc.dg/fixed-point/pr117611.c ?

Done and pushed.

Richard.

> > new file mode 100644
> > index 000..c76093f12d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/avr/pr117611.c
> > @@ -0,0 +1,7 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os" } */
> > +
> > +_Accum acc1 (_Accum x)
> > +{
> > +return x << 16;
> > +}
> > -- 
> > 2.43.0
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] optabs: Fix widening optabs for vec-mode -> scalar-mode [PR116926]

2025-02-03 Thread Andrew Pinski
r15-4317-ga6f4404689f12 tried to add support for widening optabs
for vec-mode -> scalar-mode, but it misunderstood how FOR_EACH_MODE works:
the limit in this case is not inclusive. Setting the limit to from_mode
therefore means the loop is not executed at all. Fix this by setting the
limit to the next mode after from_mode.
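
To illustrate the off-by-one described above, here is a minimal sketch of a half-open mode walk. The names toy_mode, next_mode and modes_visited are hypothetical stand-ins, not GCC's machine_mode API; the loop only mimics the "limit is not inclusive" behaviour described for FOR_EACH_MODE.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a mode enumeration; GCC's real modes and
// their ordering are not modelled here.
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_END };

// Toy counterpart of "the next mode after M".
toy_mode next_mode (toy_mode m)
{
  return static_cast<toy_mode> (m + 1);
}

// Collect the modes visited when walking from FROM up to, but not
// including, LIMIT (a half-open range [FROM, LIMIT)).
std::vector<toy_mode> modes_visited (toy_mode from, toy_mode limit)
{
  std::vector<toy_mode> seen;
  for (toy_mode m = from; m != limit; m = next_mode (m))
    seen.push_back (m);
  return seen;
}

// Small self-check of the two cases discussed in the message.
void example ()
{
  // limit == from: the loop body never runs.
  assert (modes_visited (TOY_HI, TOY_HI).empty ());
  // limit == next mode after from: exactly FROM is visited.
  assert (modes_visited (TOY_HI, next_mode (TOY_HI)).size () == 1);
}
```

With an exclusive limit, visiting only from_mode requires the limit to be the mode after it, which is the shape of the one-line fix in this patch.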

Note that the original version that added the widening optabs for
vec-mode -> scalar-mode
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665021.html)
didn't have this bug; only the second version, with the suggested change
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665068.html),
did. The suggested change missed this issue with FOR_EACH_MODE.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/116926

gcc/ChangeLog:

* optabs-query.cc (find_widening_optab_handler_and_mode): Fix
limit for `vec-mode -> scalar-mode` case.

Signed-off-by: Andrew Pinski 
---
 gcc/optabs-query.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 65eeb5d8e51..f5ca98da818 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -492,7 +492,7 @@ find_widening_optab_handler_and_mode (optab op, machine_mode to_mode,
 {
   gcc_checking_assert (VECTOR_MODE_P (from_mode)
   && GET_MODE_INNER (from_mode) < to_mode);
-  limit_mode = from_mode;
+  limit_mode = GET_MODE_NEXT_MODE (from_mode).require ();
 }
   else
 gcc_checking_assert (GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode)
-- 
2.43.0



[PATCH v2] RISC-V: Fix wrong LMUL when only implicit zve32f.

2025-02-03 Thread Monk Chiang
According to Section 3.4.2, Vector Register Grouping, in the RISC-V
Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN
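
As a worked illustration of the rule LMUL >= SEW/ELEN, the sketch below checks whether a fractional LMUL of 1/denom is permitted for a given SEW and ELEN. The function name fractional_lmul_allowed is hypothetical and not part of GCC; it only restates the inequality as LMUL = 1/denom >= SEW/ELEN, i.e. denom <= ELEN/SEW.

```cpp
#include <cassert>

// Return true if LMUL = 1/DENOM satisfies LMUL >= SEW / ELEN.
// SEW and ELEN are in bits; SEW <= ELEN is assumed.
bool fractional_lmul_allowed (int sew, int elen, int denom)
{
  // 1/denom >= sew/elen  <=>  denom * sew <= elen.
  return denom * sew <= elen;
}

// Small self-check with the cases the patch changes.
void example ()
{
  // ELEN = 64: mf8 with 8-bit elements is allowed.
  assert (fractional_lmul_allowed (8, 64, 8));
  // ELEN = 32 (e.g. only zve32f): mf8 with 8-bit elements is not.
  assert (!fractional_lmul_allowed (8, 32, 8));
  // ELEN = 32: mf4 with 8-bit elements is still allowed.
  assert (fractional_lmul_allowed (8, 32, 4));
}
```

Under this reading, the mf8/8-bit, mf4/16-bit and mf2/32-bit combinations need ELEN = 64, which is consistent with the entries the patch switches to an ELEN_64 requirement.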

gcc/ChangeLog:
* config/riscv/riscv-v.cc: Add restrict for insert LMUL.
* config/riscv/riscv-vector-builtins-types.def:
Use RVV_REQUIRE_ELEN_64 to check LMUL number.
* config/riscv/riscv-vector-switch.def: Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr111391-2.c: Update test.
* gcc.target/riscv/rvv/base/abi-14.c: Update test.
* gcc.target/riscv/rvv/base/abi-16.c: Update test.
* gcc.target/riscv/rvv/base/abi-18.c: Update test.
* gcc.target/riscv/rvv/base/vsetvl_zve32f.c: New test.
---
 gcc/config/riscv/riscv-v.cc   |   8 +-
 .../riscv/riscv-vector-builtins-types.def | 322 +-
 gcc/config/riscv/riscv-vector-switch.def  |  84 ++---
 .../gcc.target/riscv/rvv/autovec/pr111391-2.c |   2 +-
 .../gcc.target/riscv/rvv/base/abi-14.c|  84 ++---
 .../gcc.target/riscv/rvv/base/abi-16.c|  98 +++---
 .../gcc.target/riscv/rvv/base/abi-18.c| 112 +++---
 .../gcc.target/riscv/rvv/base/vsetvl_zve32f.c |  73 
 8 files changed, 429 insertions(+), 354 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl_zve32f.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9847439ca77..24f3127e71d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1730,13 +1730,15 @@ get_vlmul (machine_mode mode)
   int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
   if (size < TARGET_MIN_VLEN)
{
+ /* Follow rule LMUL >= SEW / ELEN.  */
+ int elen = TARGET_VECTOR_ELEN_64 ? 1 : 2;
  int factor = TARGET_MIN_VLEN / size;
  if (inner_size == 8)
-   factor = MIN (factor, 8);
+   factor = MIN (factor, 8 / elen);
  else if (inner_size == 16)
-   factor = MIN (factor, 4);
+   factor = MIN (factor, 4 / elen);
  else if (inner_size == 32)
-   factor = MIN (factor, 2);
+   factor = MIN (factor, 2 / elen);
  else if (inner_size == 64)
factor = MIN (factor, 1);
  else
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index 6b98b93dfb6..857b63758a0 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -369,20 +369,20 @@ along with GCC; see the file COPYING3. If not see
 #define DEF_RVV_XFQF_OPS(TYPE, REQUIRE)
 #endif
 
-DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_I_OPS (vint8mf4_t, 0)
 DEF_RVV_I_OPS (vint8mf2_t, 0)
 DEF_RVV_I_OPS (vint8m1_t, 0)
 DEF_RVV_I_OPS (vint8m2_t, 0)
 DEF_RVV_I_OPS (vint8m4_t, 0)
 DEF_RVV_I_OPS (vint8m8_t, 0)
-DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_I_OPS (vint16mf2_t, 0)
 DEF_RVV_I_OPS (vint16m1_t, 0)
 DEF_RVV_I_OPS (vint16m2_t, 0)
 DEF_RVV_I_OPS (vint16m4_t, 0)
 DEF_RVV_I_OPS (vint16m8_t, 0)
-DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_I_OPS (vint32m1_t, 0)
 DEF_RVV_I_OPS (vint32m2_t, 0)
 DEF_RVV_I_OPS (vint32m4_t, 0)
@@ -392,20 +392,20 @@ DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
 
-DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint8mf4_t, 0)
 DEF_RVV_U_OPS (vuint8mf2_t, 0)
 DEF_RVV_U_OPS (vuint8m1_t, 0)
 DEF_RVV_U_OPS (vuint8m2_t, 0)
 DEF_RVV_U_OPS (vuint8m4_t, 0)
 DEF_RVV_U_OPS (vuint8m8_t, 0)
-DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint16mf2_t, 0)
 DEF_RVV_U_OPS (vuint16m1_t, 0)
 DEF_RVV_U_OPS (vuint16m2_t, 0)
 DEF_RVV_U_OPS (vuint16m4_t, 0)
 DEF_RVV_U_OPS (vuint16m8_t, 0)
-DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint32m1_t, 0)
 DEF_RVV_U_OPS (vuint32m2_t, 0)
 DEF_RVV_U_OPS (vuint32m4_t, 0)
@@ -415,21 +415,21 @@ DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
 
-DEF_RVV_F_OPS (vbfloat16mf4_t, RVV_REQUIRE_ELEN_BF_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_F_OPS (vbfloat16mf4_t, RVV_REQUIRE_ELEN_BF_16 | RVV_REQUIRE_ELEN_64)
 DEF_RVV_F_OPS (vbfloat16mf2_t, RVV_REQUIRE_ELEN_BF_16)
 DEF_RVV_F_OPS (vbfloat16m1_t,  RVV_REQUIRE_ELEN_BF_16)
 DEF_RVV_F_OPS (vbfloat16m2_t,  RVV_REQUIRE_ELEN_BF_16)
 DEF_RVV_F_OPS (vbfloat16m4_t,  RVV_REQUIRE_ELEN_BF_16)
 DEF_RVV_F_OPS (vbfloat16m8_t,  RVV_REQUIRE_ELEN_BF_16)
 
-DEF_RVV_F_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)

Re: [PATCH] RISC-V: Fix wrong LMUL when only implicit zve32f.

2025-02-03 Thread Kito Cheng
cc Robin and Ju-Zhe

On Tue, Feb 4, 2025 at 3:16 PM Monk Chiang  wrote:
>
> According to Section 3.4.2, Vector Register Grouping, in the RISC-V
> Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN
> ---
>  gcc/config/riscv/riscv-v.cc   |   8 +-
>  gcc/config/riscv/riscv-vector-switch.def  |  84 ++---
>  .../gcc.target/riscv/rvv/autovec/pr111391-2.c |   2 +-
>  .../gcc.target/riscv/rvv/base/abi-14.c|  84 ++---
>  .../gcc.target/riscv/rvv/base/abi-16.c|  98 +++
>  .../gcc.target/riscv/rvv/base/abi-18.c| 112 +-
>  .../gcc.target/riscv/rvv/base/vsetvl_zve32f.c |  73 
>  7 files changed, 268 insertions(+), 193 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl_zve32f.c
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 9847439ca77..24f3127e71d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -1730,13 +1730,15 @@ get_vlmul (machine_mode mode)
>int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
>if (size < TARGET_MIN_VLEN)
> {
> + /* Follow rule LMUL >= SEW / ELEN.  */
> + int elen = TARGET_VECTOR_ELEN_64 ? 1 : 2;
>   int factor = TARGET_MIN_VLEN / size;
>   if (inner_size == 8)
> -   factor = MIN (factor, 8);
> +   factor = MIN (factor, 8 / elen);
>   else if (inner_size == 16)
> -   factor = MIN (factor, 4);
> +   factor = MIN (factor, 4 / elen);
>   else if (inner_size == 32)
> -   factor = MIN (factor, 2);
> +   factor = MIN (factor, 2 / elen);
>   else if (inner_size == 64)
> factor = MIN (factor, 1);
>   else
> diff --git a/gcc/config/riscv/riscv-vector-switch.def 
> b/gcc/config/riscv/riscv-vector-switch.def
> index 23744d076f9..1b0d61940a6 100644
> --- a/gcc/config/riscv/riscv-vector-switch.def
> +++ b/gcc/config/riscv/riscv-vector-switch.def
> @@ -64,13 +64,13 @@ Encode the ratio of SEW/LMUL into the mask types.
>|BI   |RVVM1BI|RVVMF2BI|RVVMF4BI|RVVMF8BI|RVVMF16BI|RVVMF32BI|RVVMF64BI|  
> */
>
>  /* Return 'REQUIREMENT' for machine_mode 'MODE'.
> -   For example: 'MODE' = RVVMF64BImode needs TARGET_MIN_VLEN > 32.  */
> +   For example: 'MODE' = RVVMF64BImode needs TARGET_VECTOR_ELEN_64.  */
>  #ifndef ENTRY
>  #define ENTRY(MODE, REQUIREMENT, VLMUL, RATIO)
>  #endif
>
>  /* Disable modes if TARGET_MIN_VLEN == 32.  */
> -ENTRY (RVVMF64BI, TARGET_MIN_VLEN > 32, TARGET_XTHEADVECTOR ? LMUL_1 
> :LMUL_F8, 64)
> +ENTRY (RVVMF64BI, TARGET_VECTOR_ELEN_64, TARGET_XTHEADVECTOR ? LMUL_1 
> :LMUL_F8, 64)
>  ENTRY (RVVMF32BI, true, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F4, 32)
>  ENTRY (RVVMF16BI, true, TARGET_XTHEADVECTOR ? LMUL_1 : LMUL_F2 , 16)
>  ENTRY (RVVMF8BI, true, LMUL_1, 8)
> @@ -85,7 +85,7 @@ ENTRY (RVVM2QI, true, LMUL_2, 4)
>  ENTRY (RVVM1QI, true, LMUL_1, 8)
>  ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16)
>  ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32)
> -ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F8, 64)
> +ENTRY (RVVMF8QI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F8, 64)
>
>  /* Disable modes if TARGET_MIN_VLEN == 32.  */
>  ENTRY (RVVM8HI, true, LMUL_8, 2)
> @@ -93,7 +93,7 @@ ENTRY (RVVM4HI, true, LMUL_4, 4)
>  ENTRY (RVVM2HI, true, LMUL_2, 8)
>  ENTRY (RVVM1HI, true, LMUL_1, 16)
>  ENTRY (RVVMF2HI, !TARGET_XTHEADVECTOR, LMUL_F2, 32)
> -ENTRY (RVVMF4HI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
> +ENTRY (RVVMF4HI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
>
>  /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_BF_16.  */
>  ENTRY (RVVM8BF, TARGET_VECTOR_ELEN_BF_16, LMUL_8, 2)
> @@ -109,21 +109,21 @@ ENTRY (RVVM4HF, TARGET_VECTOR_ELEN_FP_16, LMUL_4, 4)
>  ENTRY (RVVM2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_2, 8)
>  ENTRY (RVVM1HF, TARGET_VECTOR_ELEN_FP_16, LMUL_1, 16)
>  ENTRY (RVVMF2HF, TARGET_VECTOR_ELEN_FP_16 && !TARGET_XTHEADVECTOR, LMUL_F2, 
> 32)
> -ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32 && 
> !TARGET_XTHEADVECTOR, LMUL_F4, 64)
> +ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_64 && 
> !TARGET_XTHEADVECTOR, LMUL_F4, 64)
>
>  /* Disable modes if TARGET_MIN_VLEN == 32.  */
>  ENTRY (RVVM8SI, true, LMUL_8, 4)
>  ENTRY (RVVM4SI, true, LMUL_4, 8)
>  ENTRY (RVVM2SI, true, LMUL_2, 16)
>  ENTRY (RVVM1SI, true, LMUL_1, 32)
> -ENTRY (RVVMF2SI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F2, 64)
> +ENTRY (RVVMF2SI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F2, 64)
>
>  /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_FP_32.  */
>  ENTRY (RVVM8SF, TARGET_VECTOR_ELEN_FP_32, LMUL_8, 4)
>  ENTRY (RVVM4SF, TARGET_VECTOR_ELEN_FP_32, LMUL_4, 8)
>  ENTRY (RVVM2SF, TARGET_VECTOR_ELEN_FP_32, LMUL_2, 16)
>  ENTRY (RVVM1SF, TARGET_VECTOR_ELEN_FP_32, LMUL_1, 32)
> -ENTRY (RVVMF2SF, TARGET_VECTOR_

Re: [patch, fortran] Add modular exponentiation for unsigned

2025-02-03 Thread Thomas Koenig




Regression-tested on x86_64.


Seems I didn't look closely enough, I will check and resubmit.

Best regards

Thomas



Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Jonathan Wakely
On Sun, 2 Feb 2025, 18:10 Thomas Koenig via Gcc,  wrote:

> Hi,
>
> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
> to gcc-patches also, as normal, but got back an e-mail that it
> was too large, and that a moderator would look at it.
>
> Maybe the limits can be increased a bit, sometimes patches can
> be quite large, especially if they contain large test cases
> or a large number of generated files.
>

The limits for gcc-patches are already larger than other lists, but 560kB
is pretty big. You can gzip the patch if it's too large.


> (Does anybody actually look at the messages, as promised in the e-mail?)
>


I don't know about that list. There are moderators and mod queues for other
gcc lists.


>
>


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread Richard Biener
On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu  wrote:
>
> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> Author: Surya Kumari Jangala 
> Date:   Tue Jun 25 08:37:49 2024 -0500
>
> ira: Scale save/restore costs of callee save registers with block 
> frequency
>
> scales the cost of saving/restoring a callee-save hard register in epilogue
> and prologue with the entry block frequency, which, if not optimizing for
> size, is 1, for all targets.  As a result, callee-saved registers
> may not be used to preserve local variable values across calls on some
> targets, like x86.  Add a target hook for the callee-saved register cost
> scale in epilogue and prologue used by IRA.  The default version of this
> target hook returns 1 if optimizing for size, otherwise returns the entry
> block frequency.  Add an x86 version of this target hook to restore the
> old behavior prior to the above commit.
>
> PR rtl-optimization/111673
> PR rtl-optimization/115932
> PR rtl-optimization/116028
> PR rtl-optimization/117081
> PR rtl-optimization/117082
> PR rtl-optimization/118497
> * ira-color.cc (assign_hard_reg): Call the target hook for the
> callee-saved register cost scale in epilogue and prologue.
> * target.def (ira_callee_saved_register_cost_scale): New target
> hook.
> * targhooks.cc (default_ira_callee_saved_register_cost_scale):
> New.
> * targhooks.h (default_ira_callee_saved_register_cost_scale):
> Likewise.
> * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> New.
> (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
> New.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/config/i386/i386.cc | 11 +++
>  gcc/doc/tm.texi |  8 
>  gcc/doc/tm.texi.in  |  2 ++
>  gcc/ira-color.cc|  3 +--
>  gcc/target.def  | 12 
>  gcc/targhooks.cc|  8 
>  gcc/targhooks.h |  1 +
>  7 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f89201684a8..3128973ba79 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
>return false;
>  }
>
> +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> +
> +static int
> +ix86_ira_callee_saved_register_cost_scale (int)
> +{
> +  return 1;
> +}
> +
>  /* Return true if a set of DST by the expression SRC should be allowed.
> This prevents complex sets of likely_spilled hard regs before split1.  */
>
> @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p
>  #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
> ix86_preferred_output_reload_class
>  #undef TARGET_CLASS_LIKELY_SPILLED_P
>  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
> +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
> +  ix86_ira_callee_saved_register_cost_scale
>
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 0de24eda6f0..9f42913a4ef 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for given 
> pseudo from
>The default version of this target hook always returns given class.
>  @end deftypefn
>
> +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE 
> (int @var{hard_regno})
> +A target hook which returns the callee-saved register @var{hard_regno}
> +cost scale in epilogue and prologue used by IRA.
> +
> +The default version of this target hook returns 1 if optimizing for
> +size, otherwise returns the entry block frequency.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
>  A target hook which returns true if we use LRA instead of reload pass.
>
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 631d04131e3..6dbe22581ca 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -2388,6 +2388,8 @@ in the reload pass.
>
>  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
>
> +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> +
>  @hook TARGET_LRA_P
>
>  @hook TARGET_REGISTER_PRIORITY
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 0699b349a1a..233060e1587 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>  + ira_memory_move_cost[mode][rclass][1])
> * saved_nregs / hard_regno_nregs (hard_regno,
>   mode) - 1)
> -  * (optimize_size ? 1 :
> -   

Re: Patch held up in gcc-patches due to size

2025-02-03 Thread Richard Biener
On Mon, Feb 3, 2025 at 9:55 AM Jonathan Wakely  wrote:
>
>
>
> On Sun, 2 Feb 2025, 18:10 Thomas Koenig via Gcc,  wrote:
>>
>> Hi,
>>
>> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
>> to gcc-patches also, as normal, but got back an e-mail that it
>> was too large, and that a moderator would look at it.
>>
>> Maybe the limits can be increased a bit, sometimes patches can
>> be quite large, especially if they contain large test cases
>> or a large number of generated files.
>
>
> The limits for gcc-patches are already larger than other lists, but 560kB is 
> pretty big. You can gzip the patch if it's too large.
>
>>
>> (Does anybody actually look at the messages, as promised in the e-mail?)
>
>
>
> I don't know about that list. There are moderators and mod queues for other 
> gcc lists.

I don't think we ever unlock too-large mails.  But I'm not sure the
message you get can be altered individually based on the reason for
the moderation.

Richard.

>
>>
>>


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread H.J. Lu
On Mon, Feb 3, 2025 at 5:27 PM Richard Biener
 wrote:
>
> On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu  wrote:
> >
> > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > Author: Surya Kumari Jangala 
> > Date:   Tue Jun 25 08:37:49 2024 -0500
> >
> > ira: Scale save/restore costs of callee save registers with block 
> > frequency
> >
> > scales the cost of saving/restoring a callee-save hard register in epilogue
> > and prologue with the entry block frequency, which, if not optimizing for
> > size, is 1, for all targets.  As the result, callee-saved registers
> > may not be used to preserve local variable values across calls on some
> > targets, like x86.  Add a target hook for the callee-saved register cost
> > scale in epilogue and prologue used by IRA.  The default version of this
> > target hook returns 1 if optimizing for size, otherwise returns the entry
> > block frequency.  Add an x86 version of this target hook to restore the
> > old behavior prior to the above commit.
> >
> > PR rtl-optimization/111673
> > PR rtl-optimization/115932
> > PR rtl-optimization/116028
> > PR rtl-optimization/117081
> > PR rtl-optimization/117082
> > PR rtl-optimization/118497
> > * ira-color.cc (assign_hard_reg): Call the target hook for the
> > callee-saved register cost scale in epilogue and prologue.
> > * target.def (ira_callee_saved_register_cost_scale): New target
> > hook.
> > * targhooks.cc (default_ira_callee_saved_register_cost_scale):
> > New.
> > * targhooks.h (default_ira_callee_saved_register_cost_scale):
> > Likewise.
> > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> > New.
> > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
> > * doc/tm.texi: Regenerated.
> > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
> > New.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/config/i386/i386.cc | 11 +++
> >  gcc/doc/tm.texi |  8 
> >  gcc/doc/tm.texi.in  |  2 ++
> >  gcc/ira-color.cc|  3 +--
> >  gcc/target.def  | 12 
> >  gcc/targhooks.cc|  8 
> >  gcc/targhooks.h |  1 +
> >  7 files changed, 43 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index f89201684a8..3128973ba79 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
> >return false;
> >  }
> >
> > +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> > +
> > +static int
> > +ix86_ira_callee_saved_register_cost_scale (int)
> > +{
> > +  return 1;
> > +}
> > +
> >  /* Return true if a set of DST by the expression SRC should be allowed.
> > This prevents complex sets of likely_spilled hard regs before split1.  
> > */
> >
> > @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p
> >  #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
> > ix86_preferred_output_reload_class
> >  #undef TARGET_CLASS_LIKELY_SPILLED_P
> >  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
> > +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> > +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
> > +  ix86_ira_callee_saved_register_cost_scale
> >
> >  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
> >  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index 0de24eda6f0..9f42913a4ef 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for 
> > given pseudo from
> >The default version of this target hook always returns given class.
> >  @end deftypefn
> >
> > +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE 
> > (int @var{hard_regno})
> > +A target hook which returns the callee-saved register @var{hard_regno}
> > +cost scale in epilogue and prologue used by IRA.
> > +
> > +The default version of this target hook returns 1 if optimizing for
> > +size, otherwise returns the entry block frequency.
> > +@end deftypefn
> > +
> >  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
> >  A target hook which returns true if we use LRA instead of reload pass.
> >
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 631d04131e3..6dbe22581ca 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -2388,6 +2388,8 @@ in the reload pass.
> >
> >  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> >
> > +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> > +
> >  @hook TARGET_LRA_P
> >
> >  @hook TARGET_REGISTER_PRIORITY
> > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > index 0699b349a1a..233060e1587 100644
> > --- a/gcc/ira-color.cc
> > +++ b/gcc/ira-color.cc
> > @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno

Re: [PATCH 0/61] Improve Mips target

2025-02-03 Thread Richard Biener
On Fri, Jan 31, 2025 at 6:18 PM Aleksandar Rakic
 wrote:
>
> This patch series improves the support for the mips64r6 target in GCC,
> includes the enhancements to the general bug fixes and contains other
> MIPS ISA and processor enablement.
>
> These patches are cherry-picked from the mips_rel/11_2_0/master
> and mips_rel/9_3_0/master branches from the MIPS' repository:
> https://github.com/MIPS/gcc .
> Further details on the individual changes are included in the
> respective patches.

Please split up this series at least into patches that solely affect mips/
and send patches that touch middle-end parts separately.  A 61-patch
series is unlikely to be looked at this way.

Richard.


Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-03 Thread H.J. Lu
On Mon, Feb 3, 2025 at 5:21 PM Richard Biener
 wrote:
>
> On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu  wrote:
> >
> > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
> >  wrote:
> > >
> > >
> > >
> > > > On 02.02.2025 at 08:59, H.J. Lu wrote:
> > > >
> > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > > >  wrote:
> > > >>
> > > >>
> > > >>
> > >  On 02.02.2025 at 08:00, H.J. Lu wrote:
> > > >>>
> > > >>> Don't increase callee-saved register cost by 1000x, which leads to 
> > > >>> that
> > > >>> callee-saved registers aren't used to preserve local variable values
> > > >>> across calls, by capping the scale to 300.
> > > >>
> > > >>>   PR rtl-optimization/111673
> > > >>>   PR rtl-optimization/115932
> > > >>>   PR rtl-optimization/116028
> > > >>>   PR rtl-optimization/117081
> > > >>>   PR rtl-optimization/118497
> > > >>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > > >>>   scale to 300.
> > > >>>
> > > >>> Signed-off-by: H.J. Lu 
> > > >>> ---
> > > >>> gcc/ira-color.cc | 16 ++--
> > > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > > >>> index 0699b349a1a..707ff188250 100644
> > > >>> --- a/gcc/ira-color.cc
> > > >>> +++ b/gcc/ira-color.cc
> > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool 
> > > >>> retry_p)
> > > >>> /* We need to save/restore the hard register in
> > > >>>epilogue/prologue.  Therefore we increase the cost.  */
> > > >>> {
> > > >>> +int scale;
> > > >>> +if (optimize_size)
> > > >>> +  scale = 1;
> > > >>> +else
> > > >>> +  {
> > > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > >>> +/* Don't increase callee-saved register cost by 1000x,
> > > >>> +   which leads to that callee-saved registers aren't
> > > >>> +   used to preserve local variable values across calls,
> > > >>> +   by capping the scale to 300.  */
> > > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > > >>> +  scale = 300;
> > > >>
> > > >> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have 
> > > >> expected to scale this down to [0, 300] or is MAX a magic value?
> > > >
> > > > There are
> > > >
> > > > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > > >   This constant does not need to be high, as in infrequently executed
> > > >   regions we want to count instructions equivalently to optimize for
> > > >   size instead of speed.  */
> > > > #define REG_FREQ_MAX 1000
> > > >
> > > > /* Compute register frequency from the BB frequency.  When optimizing for size,
> > > >   or profile driven feedback is available and the function is never executed,
> > > >   frequency is always equivalent.  Otherwise rescale the basic block
> > > >   frequency.  */
> > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)        \
> > > >                                || !cfun->cfg->count_max.initialized_p ()) \
> > > >                               ? REG_FREQ_MAX                              \
> > > >                               : ((bb)->count.to_frequency (cfun)          \
> > > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)            \
> > > >                               ? ((bb)->count.to_frequency (cfun)          \
> > > >                                  * REG_FREQ_MAX / BB_FREQ_MAX)            \
> > > >                               : 1)
> > > >
> > > > 1000 is the default.  If it isn't 1000, it isn't the default.  I only
> > > > want to get a more reasonable default scale, instead of 1000.   Lower
> > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > >
> > > I see.  Why not adjust the above macro then?  That would be a bit more
> > > obvious.  Like use MAX/2 or so?
> >
> > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > Author: Surya Kumari Jangala 
> > Date:   Tue Jun 25 08:37:49 2024 -0500
> >
> > ira: Scale save/restore costs of callee save registers with block 
> > frequency
> >
> > uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
> > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > not as a cost scale.  Maybe the above commit should be reverted and we add
> > a target hook for callee-saved register cost scale.  Each target can choose
> > a proper cost scale, instead of increasing the cost by 1000x for everyone.
>
> I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
> as it doesn't seem to be used.  The comment talks about profile feedback,
> but for example with -fprofile-correction or -fpartial-profile this
> test looks odd.
> In fact optimize_function_for_size_p should already handle this correctly.
>
> Also REG_FREQ_FROM_BB simply document
