date:20250429

[PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-04-29 Thread H.J. Lu

AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
avoid the following regressions with

$ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"

FAIL: gcc.dg/pr105911.c (internal compiler error: in lra_split_hard_reg_for, at lra-assigns.cc:1863)
FAIL: gcc.dg/pr105911.c (test for excess errors)
FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times vpro[lr]q 29
FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw

Tested with glibc master branch at

commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
Author: Samuel Thibault 
Date:   Sun Mar 2 15:16:45 2025 +0100

htl: move pthread_once into libc

and built Linux kernel 6.13.5 on x86-64.

PR target/119083
* config/i386/i386.cc (ix86_class_likely_spilled_p): Remove BREG.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fcdfb3f1f5c..ddefc0f88d9 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20757,7 +20757,6 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
   case AREG:
   case DREG:
   case CREG:
-  case BREG:
   case AD_REGS:
   case SIREG:
   case DIREG:
-- 
2.49.0

Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

2025-04-29 Thread Andrew MacLeod



On 4/28/25 17:26, Andrew MacLeod wrote:
I have committed this patch to trunk after bootstrap/regression 
testing again on trunk.


I'll get to gcc14/15 once I flush the current queue.

Andrew


On 4/17/25 06:44, Richard Biener wrote:
On Wed, Apr 16, 2025 at 10:55 PM Andrew MacLeod wrote:

This was a fun one!   An actual bug, and it took a while to sort out.
After chasing down some red herrings, this turns out to be an issue of
interaction between the range and value masks and intervening 
calculations.


The original patch from 11/2023 adjusts intersection so that it can
enhance subranges based on the value mask.  ie in this testcase

[irange] int [-INF, 2147483644] MASK 0xfffc VALUE 0x1

   If adjust_range() were called on this, it would eliminate the trailing
mask/value bit ranges that are invalid and turn it into:

[-INF, -3][1, 1][4, 2147483626] MASK 0xfffc VALUE 0x1

reflecting the lower bits into the range.   The problem develops because
we only apply adjust_range () during intersection in an attempt to
avoid expensive work when it isn't needed.

Unfortunately, that is what triggers this infinite loop. Ranger's cache
propagates changes, and the algorithm is designed to always improve the
range.  In this case, the first iteration through, _11 receives the
above value, [irange] int [-INF, 2147483644] MASK 0xfffc VALUE 0x1
which via the mask, excludes 0, 2 and 3.

The ensuing calculations in block 7 do not trigger a successful
intersection operation, and thus the range pairs are never expanded to
eliminate the lower ranges, and it triggers another change in values
which leads to the next iteration being less precise, but not obviously
so. [irange] int [-INF, 2147483644] MASK 0xfffd VALUE 0x0 is a
result of the calculation.   As ranges are supposed to always get better
with this algorithm, we simply compare for a difference, and this range
is different, so we replace it. It only excludes 2 and 3.

On the next iteration, the less precise range DOES trigger an
intersection operation in block 7, and that is expanded to [irange]
int [-INF, 1][4, 2147483644] MASK 0xfffd VALUE 0x0.  Using that, we can
again create the more precise range for _11 that started the cycle, and
we go on and on and on.

If we fix this so that we always expand subranges to reflect the lower
bits in a bitmask, the initial value starts with

[irange] int [-INF, -3][1, 1][4, 2147483644] MASK 0xfffc VALUE 0x1

And everything falls into place as it should.  The fix is to be
consistent about expanding those lower subranges.
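
The way a MASK/VALUE pair rules out small values can be sketched with a toy model. This is only an illustration, not GCC's irange code: the helper names are hypothetical, and the full-width masks 0xfffffffc and 0xfffffffd are an assumption, since the message prints them in a shortened form. The model also assumes that a 1 bit in the mask means "unknown" and a 0 bit means the bit is known and equal to the matching bit of VALUE, which is consistent with the exclusions listed above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a MASK/VALUE pair (not GCC's irange_bitmask class).
// Assumption: a 1 bit in mask means "unknown"; a 0 bit means the bit
// is known and must equal the matching bit of value.
bool bits_allow(std::uint32_t x, std::uint32_t mask, std::uint32_t value)
{
  return (x & ~mask) == (value & ~mask);
}

// Collect the values in [0, limit) that the known bits rule out.
std::vector<std::uint32_t>
excluded_below(std::uint32_t limit, std::uint32_t mask, std::uint32_t value)
{
  std::vector<std::uint32_t> out;
  for (std::uint32_t x = 0; x < limit; ++x)
    if (!bits_allow(x, mask, value))
      out.push_back(x);
  return out;
}
```

Under these assumptions, excluded_below (4, 0xfffffffc, 0x1) yields 0, 2 and 3, while excluded_below (4, 0xfffffffd, 0x0) yields only 2 and 3, matching the two iterations described above.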

I also added a couple of minor performance tweaks to avoid unnecessary
work, along with folding adjust_range () directly into
set_range_from_bitmask ().

I started at a 0.2% overall compilation increase (1.8% in VRP). In the
end, this patch is down to 0.6% in VRP, and only 0.08% overall, so
manageable for all the extra work.

It also causes a few ripples in the testsuite so 3 test cases also
needed adjustment:

   * gcc.dg/pr83072-2.c : With the addition of the expanded ranges, CCP
used to export a global:
  Global Exported: c_3 = [irange] int [-INF, +INF] MASK 0xfffe VALUE 0x1
and now
  Global Exported: c_3 = [irange] int [-INF, -1][1, +INF] MASK 0xfffe VALUE 0x1
which in turn enables forwprop to collapse part of the testcase much
earlier. So I turned off forwprop for the testcase.

   * gcc.dg/tree-ssa/phi-opt-value-5.c : With the expanded ranges, the CCP2
pass used to export:
  Global Exported: d_3 = [irange] int [-INF, +INF] MASK 0xfffe VALUE 0x1
and now
  Global Exported: d_3 = [irange] int [-INF, -1][1, +INF] MASK 0xfffe VALUE 0x1
which in turn makes the following comment obsolete, as the optimization
does happen earlier:
/* fdiv1 requires until later than phiopt2 to be able to detect that
 d is non-zero. to be able to remove the conditional.  */
Adjusted the testcase to expect everything to be taken care of by the
phi-opt2 pass.

   * gcc.dg/tree-ssa/vrp122.c : Previously, CCP exported:
  Global Exported: g_4 = [irange] unsigned int [0, +INF] MASK 0xfff0 VALUE 0x0
and then EVRP refined that and stored it, then the testcase tested for:
  Global Exported: g_4 = [irange] unsigned int [0, 0][16, +INF] MASK 0xfff0 VALUE 0x0
Now, CCP itself exports the expanded range, so there is nothing for VRP
to do.
Adjusted the testcase to look for the expanded range in CCP.

Now we never get into this situation where the bitmask is explicitly
applied in some places and not others.

Bootstraps on x86_64-pc-linux-gnu with no regressions. Finally.   Is
this OK for trunk, or should I hold off a little bit?
Please wait a little bit until after 15.1 is out.  It's then OK for trunk in
stage1 and backports when no issues show up.

Thanks,
Richard.


Andrew

This is now in trunk.   Attached are the patches for gcc15 and gcc14.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

Do you want me to check it in for either or both branches?

Andrew

Re: [PATCH] PR tree-optimization/95801 - infer non-zero for integral division RHS.

2025-04-29 Thread Andrew MacLeod



On 4/28/25 17:26, Andrew MacLeod wrote:
I have committed this patch to trunk after bootstrap/regression 
testing again on trunk.


I'll get to gcc14/15 once I flush the current queue.

Andrew

On 1/23/25 04:39, Richard Biener wrote:
On Wed, Jan 22, 2025 at 12:49 AM Andrew MacLeod wrote:

This patch simply adds an op2_range to operator_div which returns
non-zero if the LHS is not undefined.  This means that, given an integral
division:

 x = y / z

'z' will have a range of   [-INF, -1] [1, +INF]  after execution of the
statement.
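
The inference can be sketched with a toy model. This is a simplified illustration and not the ranger API: ToyRange and toy_div_op2_range are hypothetical names, and the range is reduced to two flags. The always1 function is the testcase from the patch, shown at source level.

```cpp
#include <cassert>

// Toy stand-in for a range: only tracks what this example needs.
struct ToyRange
{
  bool undefined;    // no possible values (e.g. unreachable code)
  bool may_be_zero;  // whether 0 is still a possible value
};

// Hypothetical analogue of the op2_range idea: if the result of
// "lhs = op1 / op2" is defined, the divisor cannot have been zero.
bool toy_div_op2_range(ToyRange &op2, const ToyRange &lhs)
{
  if (lhs.undefined)
    return false;           // nothing can be inferred
  op2.undefined = false;
  op2.may_be_zero = false;  // i.e. [-INF, -1][1, +INF]
  return true;
}

// Source-level view: once a / b has executed, b != 0 must hold.
int always1(int a, int b)
{
  if (a / b)
    return b != 0;
  return 1;
}
```

With that knowledge, both returns in always1 produce 1 for any divisor that does not invoke undefined behavior.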

This is relatively straightforward and resolves the PR, but I also get
that we might not want to proliferate an inferred range of undefined
behavior at this late stage.

OK for trunk, or defer to stage 1?  Are there any flags that need to be
checked to make this valid?

Stage 1 please.  I don't think this needs any flags.

Richard.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.

Andrew
This is now in trunk.   Attached are the patches for gcc15, gcc14, and gcc13.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

Do you want me to check it in for any or all of those branches?    I
can't go back further than gcc13 due to a lack of inferred range processing.


Andrew
From 05ea5fd870eaa632eec6a90f1ea171bc0bc1f571 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 21 Jan 2025 11:49:12 -0500
Subject: [PATCH 2/2] Infer non-zero for integral division RHS.

Adding op2_range for operator_div allows ranger to notice the divisor
is non-zero after execution.

	PR tree-optimization/95801
	gcc/
	* range-op.cc (operator_div::op2_range): New.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr95801.c: New.
---
 gcc/range-op.cc | 16 
 gcc/testsuite/gcc.dg/tree-ssa/pr95801.c | 13 +
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr95801.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index f72b4ae92cf..5c0bcdc3b37 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2415,8 +2415,11 @@ operator_widen_mult_unsigned::wi_fold (irange &r, tree type,
 class operator_div : public cross_product_operator
 {
   using range_operator::update_bitmask;
+  using range_operator::op2_range;
 public:
   operator_div (tree_code div_kind) { m_code = div_kind; }
+  bool op2_range (irange &r, tree type, const irange &lhs, const irange &,
+		  relation_trio) const;
   virtual void wi_fold (irange &r, tree type,
 		const wide_int &lh_lb,
 		const wide_int &lh_ub,
@@ -2436,6 +2439,19 @@ static operator_div op_floor_div (FLOOR_DIV_EXPR);
 static operator_div op_round_div (ROUND_DIV_EXPR);
 static operator_div op_ceil_div (CEIL_DIV_EXPR);
 
+// Set OP2 to non-zero if the LHS isn't UNDEFINED.
+bool
+operator_div::op2_range (irange &r, tree type, const irange &lhs,
+			 const irange &, relation_trio) const
+{
+  if (!lhs.undefined_p ())
+    {
+      r.set_nonzero (type);
+      return true;
+    }
+  return false;
+}
+
 bool
 operator_div::wi_op_overflows (wide_int &res, tree type,
 			   const wide_int &w0, const wide_int &w1) const
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95801.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95801.c
new file mode 100644
index 000..c3c80a045cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95801.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-options "-O2 -fdump-tree-evrp" }
+
+int always1(int a, int b) {
+    if (a / b)
+        return b != 0;
+    return 1;
+}
+
+// If b != 0 is optimized by recognizing divide by 0 cannot happen,
+// there should be no PHI node.
+
+// { dg-final { scan-tree-dump-not "PHI" "evrp" } }
-- 
2.45.0

From e68da32b60f24c52179fb239620f8fc3f160f56c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 29 Apr 2025 10:27:29 -0400
Subject: [PATCH 3/3] Infer non-zero for integral division RHS.

Adding op2_range for operator_div allows ranger to notice the divisor
is non-zero after execution.

	PR tree-optimization/95801
	gcc/
	* range-op.cc (operator_div::op2_range): New.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr95801.c: New.
---
 gcc/range-op.cc | 16 
 gcc/testsuite/gcc.dg/tree-ssa/pr95801.c | 13 +
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr95801.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 6ea0d9935eb..b776e9da170 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2334,8 +2334,11 @@ operator_widen_mult_unsigned::wi_fold (irange &r, tree type,
 
 class operator_div : public cross_product_operator
 {
+  using range_operator::op2_range;
 public:
   operator_div (tree_code div_kind) { m_code = div_kind; }
+  bool op2_range (irange &r, tree type, const irange &lhs, const irange &,
+		  relation_trio) const;
   virtual void wi_fold (irange &r, tree type,
 		const wide_int &lh_lb,
 		const wide_int &lh_ub,
@@ -2355,6 +2358,19 @@ static operator_div op_floor_div (FLOOR_DIV_EXPR);
 static operator_div op_round_

Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 11:52 PM Jonathan Wakely  wrote:

> On Tue, 29 Apr 2025 at 14:55, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Tue, Apr 29, 2025 at 2:55 PM Luc Grosheintz 
> wrote:
> >>
> >> This implements std::extents from <mdspan> according to N4950 and
> >> contains partial progress towards PR107761.
> >>
> >> If an extent changes its type, there's a precondition in the standard,
> >> that the value is representable in the target integer type. This
> >> precondition is not checked at runtime.
> >>
> >> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
> >> For rank-zero extents this precondition is always violated and results in
> >> calling __builtin_trap. For all other specializations it's checked via
> >> __glibcxx_assert.
> >>
> >> PR libstdc++/107761
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >> * include/std/mdspan (extents): New class.
> >> * src/c++23/std.cc.in: Add 'using std::extents'.
> >>
> >> Signed-off-by: Luc Grosheintz 
> >> ---
> >>  libstdc++-v3/include/std/mdspan  | 262 +++
> >>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
> >>  2 files changed, 267 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> >> index 4094a416d1e..39ced1d6301 100644
> >> --- a/libstdc++-v3/include/std/mdspan
> >> +++ b/libstdc++-v3/include/std/mdspan
> >> @@ -33,6 +33,12 @@
> >>  #pragma GCC system_header
> >>  #endif
> >>
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >>  #define __glibcxx_want_mdspan
> >>  #include 
> >>
> >> @@ -41,6 +47,262 @@
> >>  namespace std _GLIBCXX_VISIBILITY(default)
> >>  {
> >>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> +  namespace __mdspan
> >> +  {
> >> +template
> >> +  class _ExtentsStorage
> >> +  {
> >> +  public:
> >> +   static consteval bool
> >> +   _S_is_dyn(size_t __ext) noexcept
> >> +   { return __ext == dynamic_extent; }
> >> +
> >> +   template
> >> + static constexpr _IndexType
> >> + _S_int_cast(const _OIndexType& __other) noexcept
> >> + { return _IndexType(__other); }
> >> +
> >> +   static constexpr size_t _S_rank = _Extents.size();
> >> +
> >> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
> >> +   // of dynamic extents up to (and not including) __r.
> >> +   //
> >> +   // If __r is the index of a dynamic extent, then
> >> +   // _S_dynamic_index[__r] is the index of that extent in
> >> +   // _M_dynamic_extents.
> >> +   static constexpr auto _S_dynamic_index = [] consteval
> >> +   {
> >> + array __ret;
> >> + size_t __dyn = 0;
> >> + for(size_t __i = 0; __i < _S_rank; ++__i)
> >> +   {
> >> + __ret[__i] = __dyn;
> >> + __dyn += _S_is_dyn(_Extents[__i]);
> >> +   }
> >> + __ret[_S_rank] = __dyn;
> >> + return __ret;
> >> +   }();
> >> +
> >> +   static constexpr size_t _S_rank_dynamic =
> _S_dynamic_index[_S_rank];
> >> +
> >> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r]
> is the
> >> +   // index of the __r-th dynamic extent in _Extents.
> >> +   static constexpr auto _S_dynamic_index_inv = [] consteval
> >> +   {
> >> + array __ret;
> >> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
> >> +   if (_S_is_dyn(_Extents[__i]))
> >> + __ret[__r++] = __i;
> >> + return __ret;
> >> +   }();
> >> +
> >> +   static constexpr size_t
> >> +   _S_static_extent(size_t __r) noexcept
> >> +   { return _Extents[__r]; }
> >> +
> >> +   constexpr _IndexType
> >> +   _M_extent(size_t __r) const noexcept
> >> +   {
> >> + auto __se = _Extents[__r];
> >> + if (__se == dynamic_extent)
> >> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
> >> + else
> >> +   return __se;
> >> +   }
> >> +
> >> +   template
> >> + constexpr void
> >> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
> >> + {
> >> +   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
> >> + {
> >> +   size_t __di = __i;
> >> +   if constexpr (_OtherRank != _S_rank_dynamic)
> >> + __di = _S_dynamic_index_inv[__i];
> >> +   _M_dynamic_extents[__i] =
> _S_int_cast(__get_extent(__di));
> >> + }
> >> + }
> >> +
> >> +   constexpr
> >> +   _ExtentsStorage() noexcept = default;
> >> +
> >> +   template
> >> + constexpr
> >> + _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
> >> + __other) noexcept
> >> + {
> >> +   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
> >> + { return __other._M_extent(__i); });
> >> + }
> >> +
> >> +   template
> >> +
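
The index bookkeeping described in the comments on _S_dynamic_index and _S_dynamic_index_inv in the quoted patch can be sketched at run time. This is a minimal illustration, not the libstdc++ implementation: the toy_ names are hypothetical, std::vector replaces the compile-time arrays, and toy_dynamic stands in for std::dynamic_extent.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Sentinel standing in for std::dynamic_extent (assumption for this sketch).
constexpr std::size_t toy_dynamic = std::numeric_limits<std::size_t>::max();

// result[r] is the number of dynamic extents before position r.
// The result has rank + 1 entries; the last one is the dynamic rank.
std::vector<std::size_t>
toy_dynamic_index(const std::vector<std::size_t> &exts)
{
  std::vector<std::size_t> ret(exts.size() + 1);
  std::size_t dyn = 0;
  for (std::size_t i = 0; i < exts.size(); ++i)
    {
      ret[i] = dyn;
      dyn += (exts[i] == toy_dynamic);
    }
  ret[exts.size()] = dyn;
  return ret;
}

// result[k] is the position in exts of the k-th dynamic extent.
std::vector<std::size_t>
toy_dynamic_index_inv(const std::vector<std::size_t> &exts)
{
  std::vector<std::size_t> ret;
  for (std::size_t i = 0; i < exts.size(); ++i)
    if (exts[i] == toy_dynamic)
      ret.push_back(i);
  return ret;
}
```

For the extents {3, dynamic, 5, dynamic}, the forward table is {0, 0, 1, 1, 2} and the inverse table is {1, 3}, so only two values need to be stored at run time.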

[PATCH v2] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-29 Thread Jerry Zhang Jian

The Zve32x extension depends on the Zicsr extension.
Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.
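
How an implication table of this kind turns one extension into a set can be sketched as follows. This is a toy model under stated assumptions, not the code in riscv-common.cc: toy_implied_info and toy_expand are hypothetical names, and the table holds only the four zve32 entries visible in the hunk of this patch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy implication table; entries copied from the hunk in this patch.
static const std::vector<std::pair<std::string, std::string>>
toy_implied_info = {
  {"zve32x", "zicsr"},
  {"zve32x", "zvl32b"},
  {"zve32f", "zve32x"},
  {"zve32f", "zvl32b"},
};

// Expand a set of extensions until no rule adds anything new.
std::set<std::string> toy_expand(std::set<std::string> exts)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (const auto &rule : toy_implied_info)
        if (exts.count(rule.first) && exts.insert(rule.second).second)
          changed = true;
    }
  return exts;
}
```

In this model, expanding {"zve32x"} or {"zve32f"} now includes "zicsr", which is the effect the added table entry is meant to have.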

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Make Zve32x imply Zicsr.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-19.c: Set -march to rv64i_zve32x
  instead of rv64gc_zve32x so that Zicsr is not implied by g, and add
  -c to avoid an unsupported-multilib failure at test time.

Signed-off-by: Jerry Zhang Jian 
---
 gcc/common/config/riscv/riscv-common.cc|  1 +
 gcc/testsuite/gcc.target/riscv/predef-19.c | 34 ++
 2 files changed, 4 insertions(+), 31 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 15df22d5377..145a0f2bd95 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zve64f", "f"},
   {"zve64d", "d"},
 
+  {"zve32x", "zicsr"},
   {"zve32x", "zvl32b"},
   {"zve32f", "zve32x"},
   {"zve32f", "zvl32b"},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c b/gcc/testsuite/gcc.target/riscv/predef-19.c
index 2b90702192b..d1d44fec577 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-19.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
+/* { dg-options "-O2 -march=rv64i_zve32x -mabi=lp64 -c -mcmodel=medlow -misa-spec=2.2" } */
 
 int main () {
 
@@ -15,40 +15,12 @@ int main () {
 #error "__riscv_i"
 #endif
 
-#if !defined(__riscv_c)
-#error "__riscv_c"
-#endif
-
 #if defined(__riscv_e)
 #error "__riscv_e"
 #endif
 
-#if !defined(__riscv_a)
-#error "__riscv_a"
-#endif
-
-#if !defined(__riscv_m)
-#error "__riscv_m"
-#endif
-
-#if !defined(__riscv_f)
-#error "__riscv_f"
-#endif
-
-#if !defined(__riscv_d)
-#error "__riscv_d"
-#endif
-
-#if defined(__riscv_v)
-#error "__riscv_v"
-#endif
-
-#if defined(__riscv_zvl128b)
-#error "__riscv_zvl128b"
-#endif
-
-#if defined(__riscv_zvl64b)
-#error "__riscv_zvl64b"
+#if !defined(__riscv_zicsr)
+#error "__riscv_zicsr"
 #endif
 
 #if !defined(__riscv_zvl32b)
-- 
2.49.0

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 3:53 PM H.J. Lu  wrote:
>
> On Tue, Apr 29, 2025 at 9:34 PM Richard Biener
>  wrote:
> >
> > On Tue, Apr 29, 2025 at 2:33 PM H.J. Lu  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 6:46 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 12:32 PM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to return
> > > > > > > > > > true, all integer arguments smaller than int are passed as int:
> > > > > > > > >
> > > > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > > > > extern int baz (char c1);
> > > > > > > > >
> > > > > > > > > int
> > > > > > > > > foo (char c1)
> > > > > > > > > {
> > > > > > > > >   return baz (c1);
> > > > > > > > > }
> > > > > > > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > > > > .file "x.c"
> > > > > > > > > .text
> > > > > > > > > .p2align 4
> > > > > > > > > .globl foo
> > > > > > > > > .type foo, @function
> > > > > > > > > foo:
> > > > > > > > > .LFB0:
> > > > > > > > > .cfi_startproc
> > > > > > > > > movsbl 4(%esp), %eax
> > > > > > > > > movl %eax, 4(%esp)
> > > > > > > > > jmp baz
> > > > > > > > > .cfi_endproc
> > > > > > > > > .LFE0:
> > > > > > > > > .size foo, .-foo
> > > > > > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > > > > > > .section .note.GNU-stack,"",@progbits
> > > > > > > > > [hjl@gnu-tgl-3 pr14907]$
> > > > > > > > >
> > > > > > > > > But integer promotion:
> > > > > > > > >
> > > > > > > > > movsbl 4(%esp), %eax
> > > > > > > > > movl %eax, 4(%esp)
> > > > > > > > >
> > > > > > > > > > isn't necessary if incoming arguments are copied to outgoing arguments
> > > > > > > > > > directly.
> > > > > > > > >
> > > > > > > > > > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, defaulting
> > > > > > > > > > to return nullptr.  If the new target hook returns non-nullptr, use it to
> > > > > > > > > > get the outgoing small integer argument.  The x86 target hook returns the
> > > > > > > > > > value of the corresponding incoming argument as int if it can be used as
> > > > > > > > > > the outgoing argument.  If callee is a global function, we always properly
> > > > > > > > > > extend the incoming small integer arguments in callee.  If callee is a
> > > > > > > > > > local function, since DECL_ARG_TYPE has the original small integer type,
> > > > > > > > > > we will extend the incoming small integer arguments in callee if needed.
> > > > > > > > > > It is safe only if
> > > > > > > > >
> > > > > > > > > 1. Caller and callee are not nested functions.
> > > > > > > > > 2. Caller and callee use the same ABI.
> > > > > > > >
> > > > > > > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > > > > > > should apply to all of them, no?
> > > > > > >
> > > > > > > > When the arguments are passed in different registers in different ABIs,
> > > > > > > > we have to copy them anyway.
> > > > > >
> > > > > > But optimization can elide copies easily, but not easily elide
> > > > > > sign-/zero-extensions.
> > > > >
> > > > > What I meant was that caller and callee have different ABIs.
> > > > > Optimizer can't elide copies since incoming arguments and outgoing
> > > > > arguments are in different registers.  They have to be moved.
> > > > >
> > > > > > > >
> > > > > > > > > > 3. The incoming argument and the outgoing argument are in the same
> > > > > > > > > > location.
> > > > > > > > >
> > > > > > > > > Why's that?  Can't we move them but still elide the sign-/zero-extension?
> > > > > > >
> > > > > > > > If they aren't in the same locations, we have to move them anyway.
> > > > > > > > This patch tries to avoid unnecessary moves of incoming arguments to
> > > > > > > > outgoing arguments.
> > > > > > >
> > > > > > > That's not exactly how you presented it, but you conveniently used
> > > > > > > x86 stack argument passing.  That might be difficult to elide, but is
> > > > > > > also uncommon for "small integer types" - does the same issue not
> > > > > > > apply to other arguments passed on the stack as well?
> > > > >
> > > > > > It applies to both passing in registers and on stack.   It is an issue only
> > > > > > for small integer types due to sign-/zero-extensions at call sites.  My
> > > > > > patch elides sign-/zero-extensions when incoming arguments a

Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  wrote:
>
> When we have an empty function, things can go wrong with
> cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> the only thing the function does is a call to __builtin_unreachable,
> let's expand that to a __builtin_trap instead. For most targets that
> is one instruction wide so it won't hurt things that much and we get
> correct behavior for exceptions and some linkers will be better for it.
>
> The only thing I have a concern about is that some targets still
> don't define a trap instruction. I tried to emit a nop instead of
> an abort but that nop is removed during RTL DCE.
> Should we just push targets to define a trap instead?
> E.g. BPF, avr and sh are the 3 semi active targets which still don't
> have a trap defined.

Do any of those targets have the cfi_startproc/cfi_endproc issue,
or are exceptions relevant on them?

I'd say guard this with targetm.have_trap (), there's the chance that
say on avr the expansion to abort() might fail to link in a
freestanding environment.

As for the nop, if you mark it volatile does it prevail?

> The QOI idea for basic block reorder is recorded as PR 120004.
>
> Changes since v1:
> * v2: Move to final gimple cfg cleanup instead of expand and use
>   BUILT_IN_UNREACHABLE_TRAP.
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR middle-end/109267
>
> gcc/ChangeLog:
>
> * tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If the first
> non debug statement in the first (and only) basic block is a call
> to __builtin_unreachable change it to a call to __builtin_trap.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr109267-1.c: New test.
> * gcc.dg/pr109267-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
>  gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
>  gcc/tree-cfgcleanup.cc| 14 ++
>  3 files changed, 42 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c b/gcc/testsuite/gcc.dg/pr109267-1.c
> new file mode 100644
> index 000..d6df2c3b49a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +/* PR middle-end/109267 */
> +
> +int f(void)
> +{
> +  __builtin_unreachable();
> +}
> +
> +/* This unreachable should be changed to be a trap. */
> +
> +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c b/gcc/testsuite/gcc.dg/pr109267-2.c
> new file mode 100644
> index 000..6cd1419a1e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +/* PR middle-end/109267 */
> +void g(void);
> +int f(int *t)
> +{
> +  g();
> +  __builtin_unreachable();
> +}
> +
> +/* The unreachable should stay a unreachable. */
> +/* { dg-final { scan-tree-dump-not "__builtin_unreachable trap \\\(" "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "__builtin_unreachable \\\(" 1 "optimized"} } */
> diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
> index 9a8a668e12b..38a62499f93 100644
> --- a/gcc/tree-cfgcleanup.cc
> +++ b/gcc/tree-cfgcleanup.cc
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cgraph.h"
>  #include "tree-into-ssa.h"
>  #include "tree-cfgcleanup.h"
> +#include "target.h"
>
>
>  /* The set of blocks in that at least one of the following changes happened:
> @@ -1530,6 +1531,19 @@ execute_cleanup_cfg_post_optimizing (void)
>cleanup_dead_labels ();
>if (group_case_labels ())
>  todo |= TODO_cleanup_cfg;
> +
> +  basic_block bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
> +  /* If the first (and only) bb and the only non debug
> + statement is __builtin_unreachable call, then replace it with a trap
> + so the function is at least one instruction in size.  */
> +  if (!gsi_end_p (gsi)
> +  && gimple_call_builtin_p (gsi_stmt (gsi), BUILT_IN_UNREACHABLE))
> +{
> +  gimple_call_set_fndecl (gsi_stmt (gsi), builtin_decl_implicit (BUILT_IN_UNREACHABLE_TRAP));
> +  update_stmt (gsi_stmt (gsi));
> +}
> +
>if ((flag_compare_debug_opt || flag_compare_debug)
>&& flag_dump_final_insns)
>  {
> --
> 2.43.0
>

[PATCH] sreal.h: fix typo in the comment for sreal::max

2025-04-29 Thread Vojtěch Káně

---
 gcc/sreal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/sreal.h b/gcc/sreal.h
index 8700807a131..c5aef1f3a82 100644
--- a/gcc/sreal.h
+++ b/gcc/sreal.h
@@ -118,7 +118,7 @@ public:
 return min;
   }
 
-  /* Global minimum sreal can hold.  */
+  /* Global maximum sreal can hold.  */
   inline static sreal max ()
   {
 sreal max;
-- 
2.30.2

Re: [PATCH] PR tree-optimization/119471 - If the LHS does not contain zero, neither do multiply operands.

2025-04-29 Thread Richard Biener

On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod  wrote:
>
>
> On 3/28/25 10:36, Andrew MacLeod wrote:
> > On 3/28/25 03:19, Richard Biener wrote:
> >> On Fri, Mar 28, 2025 at 12:28 AM Andrew MacLeod 
> >> wrote:
> >>> This patch fixes both 119471 and the remainder of 110992.
> >>>
> >>> At issue is we do not recognize that if
> >>>
> >>> "a * b != 0" , then neither "a" nor "b" can be zero.
> >>>
> >>> This is fairly trivial with range-ops.   op1_range and op2_range for
> >>> operator_mult are taught that if the LHS does not contain zero, then
> >>> neither does either operand.
> >>>
> >>> Included are patches for trunk (gcc15), gcc14, and gcc13.  All are
> >>> basically the same few lines.
> >>>
> >>> I presume we want to wait for stage 1 to check this into trunk .
> >>>
> >>> Bootstraps with no regressions on x86_64-pc-linux-gnu on all 3
> >>> branches.  OK for gcc13 and gcc14 branches?
> >> This is OK for branches only after it was on trunk.  Since one of the
> >> PRs is a regression it's technically OK for trunk now.
> >>
> >> Richard.
> >>
> > OK, it should be perfectly safe.  Committed to trunk.
> >
> > Andrew
> >
> This patch was in trunk when gcc15 was forked, so gcc15 is already
> covered.   Attached are the patches for gcc14 and gcc13.
>
> Bootstrapped with no regressions on x86_64-pc-linux-gnu.
>
> Do you want me to check it in for either or both branches?

None of them, I think.  Only one of the PRs is marked as a regression,
but it doesn't look important enough, and we should be careful
about fixing optimization regressions "late" in the cycle (usually
OK for N.2, but not so much later).  So definitely not 13, and not 14
either unless somebody else expresses a strong opinion here.

Richard.

>
> Andrew
>
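
The property behind the quoted patch, that "a * b != 0" implies neither operand is zero, can be checked by brute force over a small grid. This is only a sanity-check sketch with hypothetical function names; it says nothing about how operator_mult implements op1_range or op2_range.

```cpp
#include <cassert>

// Brute-force check over [lo, hi] x [lo, hi]: whenever a * b != 0,
// both a and b must be non-zero.  Keep the bounds small so that the
// product cannot overflow.
bool nonzero_product_implies_nonzero_operands(int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      if (a * b != 0 && (a == 0 || b == 0))
        return false;
  return true;
}

// Source-level view of what the inference allows a compiler to fold:
// inside the branch, both comparisons are known to be true.
int both_nonzero(int a, int b)
{
  if (a * b != 0)
    return a != 0 && b != 0;
  return 1;
}
```

Under the inference, both_nonzero returns 1 on every path.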

Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

2025-04-29 Thread Richard Biener

On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod  wrote:
>
>
> On 4/28/25 17:26, Andrew MacLeod wrote:
> > I have committed this patch to trunk after bootstrap/regression
> > testing again on trunk.
> >
> > I'll get to gcc14/15 once I flush the current queue.
> >
> > Andrew
> >
> >
> > On 4/17/25 06:44, Richard Biener wrote:
> >> On Wed, Apr 16, 2025 at 10:55 PM Andrew MacLeod 
> >> wrote:
> >>> This was a fun one!   An actual bug, and it took a while to sort out.
> >>> After chasing down some red herrings, this turns out to be an issue of
> >>> interaction between the range and value masks and intervening
> >>> calculations.
> >>>
> >>> The original patch from 11/2023 adjusts intersection so that it can
> >>> enhance subranges based on the value mask, e.g. in this testcase:
> >>>
> >>> [irange] int [-INF, 2147483644] MASK 0xfffc VALUE 0x1
> >>>
> >>> If adjust_range () were called on this, it would eliminate the
> >>> trailing mask/value bit ranges that are invalid and turn it into:
> >>>
> >>> [-INF, -3][1, 1][4, 2147483626] MASK 0xfffc VALUE 0x1
> >>>
> >>> reflecting the lower bits into the range.  The problem develops
> >>> because we only apply adjust_range () during intersection in an
> >>> attempt to avoid expensive work when it isn't needed.
> >>>
> >>> Unfortunately, that is what triggers this infinite loop.  Ranger's cache
> >>> propagates changes, and the algorithm is designed to always improve the
> >>> range.  In this case, the first iteration through, _11 receives the
> >>> above value, [irange] int [-INF, 2147483644] MASK 0xfffc VALUE 0x1
> >>> which via the mask, excludes 0, 2 and 3.
> >>>
> >>> The ensuing calculations in block 7 do not trigger a successful
> >>> intersection operation, and thus the range pairs are never expanded to
> >>> eliminate the lower ranges, and it triggers another change in values
> >>> which leads to the next iteration being less precise, but not obviously
> >>> so.  [irange] int [-INF, 2147483644] MASK 0xfffd VALUE 0x0 is a
> >>> result of the calculation.  As ranges are supposed to always get better
> >>> with this algorithm, we simply compare for a difference, and this range
> >>> is different, and thus we replace it.  It only excludes 2 and 3.
> >>>
> >>> The next iteration through, the less precise range DOES trigger an
> >>> intersection operation in block 7, and when that is expanded to
> >>> [irange] int [-INF, 1][4, 2147483644] MASK 0xfffd VALUE 0x0, using
> >>> that we can again create the more precise range for _11 that started
> >>> the cycle, and we go on and on and on.
> >>>
> >>> If we fix this so that we always expand subranges to reflect the lower
> >>> bits in a bitmask, the initial value starts with
> >>>
> >>> [irange] int [-INF, -3][1, 1][4, 2147483644] MASK 0xfffc VALUE 0x1
> >>>
> >>> And everything falls into place as it should.  The fix is to be
> >>> consistent about expanding those lower subranges.
> >>>
> >>> I also added a couple of minor performance tweaks to avoid unnecessary
> >>> work, along with folding adjust_range () directly into
> >>> set_range_from_bitmask ().
> >>>
> >>> I started at a 0.2% overall compilation increase (1.8% in VRP). In the
> >>> end, this patch is down to 0.6% in VRP, and only 0.08% overall, so
> >>> manageable for all the extra work.
> >>>
> >>> It also causes a few ripples in the testsuite so 3 test cases also
> >>> needed adjustment:
> >>>
> >>> * gcc.dg/pr83072-2.c : With the addition of the expanded ranges, CCP
> >>> used to export a global:
> >>>   Global Exported: c_3 = [irange] int [-INF, +INF] MASK 0xfffe VALUE 0x1
> >>> and now exports:
> >>>   Global Exported: c_3 = [irange] int [-INF, -1][1, +INF] MASK 0xfffe VALUE 0x1
> >>> which in turn enables forwprop to collapse part of the testcase much
> >>> earlier.  So I turned off forwprop for the testcase.
> >>>
> >>> * gcc.dg/tree-ssa/phi-opt-value-5.c : With the expanded ranges, the
> >>> CCP2 pass used to export:
> >>>   Global Exported: d_3 = [irange] int [-INF, +INF] MASK 0xfffe VALUE 0x1
> >>> and now exports:
> >>>   Global Exported: d_3 = [irange] int [-INF, -1][1, +INF] MASK 0xfffe VALUE 0x1
> >>> which in turn makes the following comment obsolete, as the optimization
> >>> does happen earlier:
> >>> /* fdiv1 requires until later than phiopt2 to be able to detect that
> >>>  d is non-zero. to be able to remove the conditional.  */
> >>> Adjusted the testcase to expect everything to be taken care of by
> >>> the phi-opt2 pass.
> >>>
> >>>* gcc.dg/tree-ssa/vrp122.c : Previously, CCP exported:
> >>>  Global Exported: g_4 = [irange] unsigned int [0, +INF] MASK
> >>> 0xfff0 VALUE 0x0
> >>> and then EVRP refined that and stored it, then the testcase tested for:
> >>>  Global Exported: g_4 = [irange] unsigned int [0, 0][16, +INF] MASK
> >>> 0xfff0 VALUE 0x0
> >>> Now, CCP itself exported the expanded range, so there is nothing for
> >>> VRP
> >>> t
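
For readers following the MASK/VALUE dumps quoted above, here is a minimal sketch of how such a pair constrains a value.  It assumes the usual reading of the dump (a set mask bit means the bit is unknown, a cleared mask bit means the bit equals the corresponding VALUE bit) and it assumes 32-bit masks with only the low two bits differing, since the archived dumps show the masks in shortened form.  The names bit_info, compatible, first_iter and second_iter are made up for illustration and are not GCC's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one MASK/VALUE pair from the dumps.
// A set bit in `mask` means "unknown"; where `mask` is clear,
// the bit must equal the corresponding bit of `value`.
struct bit_info
{
  std::uint32_t mask;
  std::uint32_t value;
};

// Return true if X agrees with every known bit of B.
constexpr bool
compatible (bit_info b, std::int32_t x)
{
  return (static_cast<std::uint32_t> (x) & ~b.mask) == (b.value & ~b.mask);
}

// Assumed full-width forms of the two pairs discussed in the thread.
constexpr bit_info first_iter{0xfffffffcu, 0x1u};   // low two bits are 01
constexpr bit_info second_iter{0xfffffffdu, 0x0u};  // bit 1 is 0
```

Under these assumptions the first pair rejects 0, 2 and 3, while the second only rejects 2 and 3, which is the loss of precision the message describes.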

Re: [PATCH] PR tree-optimization/95801 - infer non-zero for integral division RHS.

2025-04-29 Thread Richard Biener

On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod  wrote:
>
>
> On 4/28/25 17:26, Andrew MacLeod wrote:
> > I have committed this patch to trunk after bootstrap/regression
> > testing again on trunk.
> >
> > I'll get to gcc14/15 once I flush the current queue.
> >
> > Andrew
> >
> > On 1/23/25 04:39, Richard Biener wrote:
> >> On Wed, Jan 22, 2025 at 12:49 AM Andrew MacLeod 
> >> wrote:
> >>> This patch simply adds an op2_range to operator_div which returns
> >>> non-zero if the LHS is not undefined.  This means that given an
> >>> integral division:
> >>>
> >>>  x = y / z
> >>>
> >>> 'z' will have a range of   [-INF, -1] [1, +INF]  after execution of the
> >>> statement.
> >>>
> >>> This is relatively straightforward and resolves the PR, but I also get
> >>> that we might not want to proliferate an inferred range of undefined
> >>> behavior at this late stage.
> >>>
> >>> OK for trunk, or defer to stage 1?  Are there any flags that need to be
> >>> checked to make this valid?
> >> Stage 1 please.  I don't think this needs any flags.
> >>
> >> Richard.
> >>
> >>> Bootstrapped on x86_64-pc-linux-gnu with no regressions.
> >>>
> >>> Andrew
> This is now in trunk.   Attached are the patches for gcc15, gcc14, and
> gcc13.
>
> Bootstrapped with no regressions on x86_64-pc-linux-gnu.
>
> Do you want me to check it in for any or all of those branches?  I
> can't go back further than gcc13 due to a lack of inferred range processing.

It's not a regression, so please don't backport at all.

Richard.

> Andrew
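
As an illustration of what the inferred range buys at the source level, here is a small sketch.  The function name div_then_test is made up; whether a given GCC version actually removes the check depends on the passes involved, so treat the comment as a possibility rather than a guarantee.

```cpp
#include <cassert>

// After `x = y / z` has executed, z == 0 would have been undefined
// behaviour, so a compiler that infers [-INF, -1][1, +INF] for z
// may fold the later test to false.
int
div_then_test (int y, int z)
{
  int x = y / z;
  if (z == 0)   // candidate for removal once z is known to be non-zero
    return -1;
  return x;
}
```

For well-defined inputs the function simply returns the quotient, so removing the branch does not change observable behaviour.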

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-04-29 Thread Jakub Jelinek

On Mon, Apr 28, 2025 at 07:27:31PM +0200, Josef Melcr wrote:
> As for the attribute, I am honestly not too sure about what to do, as clang
> is not consistent with its own indexing, be it with the unknown values or
> with 'this'.  I've tried to remain consistent with GCC's indexing style.  I
> guess I'll leave it up to you and the other maintainers to decide.  I can
> implement clang's version 1:1, put the attribute in our namespace, or rename
> it.  I don't mind either way.  Another option would be to patch clang to get
> in line with the rest of its attributes.  It seems like the best option to
> me, as it would make being consistent way easier, but it would be
> problematic, as all code using this attribute would need to be updated.

I'll ask the C/C++ FE maintainers what they think.
The attribute is after all not really OpenMP related, it is something
that can/should be used on qsort_r and the likes.

> > Another question is about GOMP_task.  Can you handle any constant
> > propagation into that?  I see you've tried to deal with it by using 2
> > callback attributes but with using 0 for the argument.  Wouldn't it be
> > better to just special case BUILT_IN_GOMP_TASK in IPA?
> Propagating into the body function is currently working.  Propagating into
> the copy function apparently still needs some work, as the example below
> causes a crash, sorry about that.  I'll fix that and add tests for GOMP_task.

I think the most important thing, at least for the first version, is to
handle propagation into the body function if the copy function is NULL.
That is quite a common case and easily handleable (though I'd think
GOMP_task should be a special case and not use the attribute at all; the
compiler should just treat it as if it had one when the cpyfn argument is
NULL).  The rest could be handled incrementally.

As for the more complex case, I'm not really sure how propagation into the
body can work while propagation into the cpyfn does not.  In that case
propagation into cpyfn is the easy part: the type of the passed-in structure
is the same as the one the cpyfn receives.  The cpyfn then fills in a
different structure and that is passed to the body.  So, either some analysis
would need to go through the cpyfn and see that we've been able to propagate
a constant here, and that the constant is then stored into this member of the
other argument and not really modified elsewhere, so we can propagate it
further.  Or we could go with extra attributes on the FIELD_DECLs of the 2
structures from omp lowering in that case: mark, for the cpyfn case, fields
which are just copied through by the cpyfn unmodified; then one can simply
find the FIELD_DECL with the same DECL_NAME/type and propagate that.

> I will look into that to see what can be done, though I'd like to introduce
> such extensions incrementally, as the patch is already large enough :)

Sure.

Jakub
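
To make the shape of the problem concrete, here is a hedged sketch of a call through a callback-taking entry point.  Everything in it is hypothetical: run_callback stands in for a runtime function such as qsort_r or GOMP_task, and no attribute is shown because its spelling is exactly what is under discussion.  The point is only that the constant stored in the payload reaches body indirectly, so the optimizer needs extra knowledge to see it.

```cpp
#include <cassert>

// Hypothetical payload handed to the callback.
struct task_data
{
  int n;
  int result;
};

// The "body function": it only sees an opaque pointer.
static void
body (void *p)
{
  task_data *d = static_cast<task_data *> (p);
  d->result = d->n * 2;
}

// Hypothetical stand-in for a runtime entry point that invokes FN (DATA).
static void
run_callback (void (*fn) (void *), void *data)
{
  fn (data);
}

// The constant 21 is stored by the caller; propagating it into body ()
// requires knowing that run_callback passes DATA through unchanged.
int
launch ()
{
  task_data d{21, 0};
  run_callback (body, &d);
  return d.result;
}
```

When a copy function sits between the caller and the body, the payload is first copied into a second structure, which is the harder case described above.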

Re: [PATCH v3] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-29 Thread Kito Cheng

It seems CI still fails:

https://github.com/ewlu/gcc-precommit-ci/issues/3282#issue-3030037257

Executing on host: /home/ewlu/precommit-08/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-newlib-stage2/gcc/xgcc -B/home/ewlu/precommit-08/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-newlib-stage2/gcc/ /home/ewlu/precommit-08/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/predef-19.c -march=rv32gc -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O0 -O2 -march=rv64i_zve32x -mabi=lp64 -mcmodel=medlow -misa-spec=2.2 -S -o predef-19.s (timeout = 600)
spawn -ignore SIGHUP /home/ewlu/precommit-08/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-newlib-stage2/gcc/xgcc -B/home/ewlu/precommit-08/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-newlib-stage2/gcc/ /home/ewlu/precommit-08/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/predef-19.c -march=rv32gc -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O0 -O2 -march=rv64i_zve32x -mabi=lp64 -mcmodel=medlow -misa-spec=2.2 -S -o predef-19.s
cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension
compiler exited with status 1
FAIL: gcc.target/riscv/predef-19.c   -O0  (test for excess errors)
Excess errors:
cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension

On Wed, Apr 30, 2025 at 11:05 AM Jerry Zhang Jian
 wrote:
>
> The Zve32x extension depends on the Zicsr extension.
> Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.
>
> gcc/ChangeLog:
> * common/config/riscv/riscv-common.cc: Add Zve32x depends on Zicsr
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/predef-19.c: set the march to rv64i_zve32x
>   instead of rv64gc_zve32x to avoid Zicsr implied by g
>
> Signed-off-by: Jerry Zhang Jian 
> ---
>  gcc/common/config/riscv/riscv-common.cc|  1 +
>  gcc/testsuite/gcc.target/riscv/predef-19.c | 34 ++
>  2 files changed, 4 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 15df22d5377..145a0f2bd95 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"zve64f", "f"},
>{"zve64d", "d"},
>
> +  {"zve32x", "zicsr"},
>{"zve32x", "zvl32b"},
>{"zve32f", "zve32x"},
>{"zve32f", "zvl32b"},
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c b/gcc/testsuite/gcc.target/riscv/predef-19.c
> index 2b90702192b..c2e12b6040c 100644
> --- a/gcc/testsuite/gcc.target/riscv/predef-19.c
> +++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
> +/* { dg-options "-O2 -march=rv64i_zve32x -mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */
>
>  int main () {
>
> @@ -15,40 +15,12 @@ int main () {
>  #error "__riscv_i"
>  #endif
>
> -#if !defined(__riscv_c)
> -#error "__riscv_c"
> -#endif
> -
>  #if defined(__riscv_e)
>  #error "__riscv_e"
>  #endif
>
> -#if !defined(__riscv_a)
> -#error "__riscv_a"
> -#endif
> -
> -#if !defined(__riscv_m)
> -#error "__riscv_m"
> -#endif
> -
> -#if !defined(__riscv_f)
> -#error "__riscv_f"
> -#endif
> -
> -#if !defined(__riscv_d)
> -#error "__riscv_d"
> -#endif
> -
> -#if defined(__riscv_v)
> -#error "__riscv_v"
> -#endif
> -
> -#if defined(__riscv_zvl128b)
> -#error "__riscv_zvl128b"
> -#endif
> -
> -#if defined(__riscv_zvl64b)
> -#error "__riscv_zvl64b"
> +#if !defined(__riscv_zicsr)
> +#error "__riscv_zicsr"
>  #endif
>
>  #if !defined(__riscv_zvl32b)
> --
> 2.49.0
>

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Fix several test failures

2025-04-29 Thread François Dumont



On 29/04/2025 08:55, Jonathan Wakely wrote:



On Mon, 28 Apr 2025, 21:37 François Dumont,  wrote:

Much better indeed, there is only the aligned_storage adaptation left.

It will simplify my big versioned namespace patch to use cxx11 abi, very
nice!

 libstdc++: [_GLIBCXX_INLINE_VERSION] Fix several test failures

 Adapt testsuite v3_target_compile to strip version namespace from
 compiler output so that dg-error and dg-warning directives do not need
 to consider it.

 Avoid an aligned_storage check as behavior has been fixed only when
 using gnu-versioned-namespace as it is an abi breaking change.

 libstdc++-v3/ChangeLog:

 * testsuite/lib/libstdc++.exp (v3_target_compile): Strip
 version namespace from compiler output.
 * testsuite/20_util/aligned_storage/value.cc
 [_GLIBCXX_INLINE_VERSION]: Avoid align_msa check.
 * testsuite/20_util/function/cons/70692.cc: Remove now
 useless __8 namespace pattern.
 * testsuite/23_containers/map/48101_neg.cc: Likewise.
 * testsuite/23_containers/multimap/48101_neg.cc: Likewise.

Ok to commit ? And maybe backports ?


OK for trunk and 15.

I think the aligned_storage part is not needed for gcc-14, right? The
rest is OK to backport for gcc-14.


Indeed, now pushed to trunk and gcc-15 and gcc-14 without the 
aligned_storage part.

Re: [PATCH] RISC-V: Minimal support for ssnpm, smnpm and smmpm extensions.

2025-04-29 Thread Kito Cheng

Hi Dongyan:

> diff --git a/gcc/testsuite/gcc.target/riscv/arch-46.c b/gcc/testsuite/gcc.target/riscv/arch-46.c
> new file mode 100644
> index ..fb2bdf72597f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/arch-46.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_ssnpm_smnpm_smmpm_sspm_supm -mabi=ilp32d" } */
> +int foo()
> +{
> +}
> +/* { missing " for " dg-error 6 ".'error: '-march=rv32gc_ssnpm': ssnpm extension supports in rv64 only " } */
> +/* { missing " for " dg-error 6 ".'error: '-march=rv32gc_smnpm': smnpm extension supports in rv64 only " } */
> +/* { missing " for " dg-error 6 ".'error: '-march=rv32gc_smmpm': smmpm extension supports in rv64 only " } */
> +/* { missing " for " dg-error 6 ".'error: '-march=rv32gc_sspm': sspm extension supports in rv64 only " } */
> +/* { missing " for " dg-error 6 ".'error: '-march=rv32gc_supm': supm extension supports in rv64 only " } */

The error messages don't seem correct; could you check this on your side again?


> --
> 2.43.0

Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-29 Thread Jonathan Wakely

On Tue, 29 Apr 2025 at 14:55, Tomasz Kaminski  wrote:
>
>
>
> On Tue, Apr 29, 2025 at 2:55 PM Luc Grosheintz  
> wrote:
>>
>> This implements std::extents from <mdspan> according to N4950 and
>> contains partial progress towards PR107761.
>>
>> If an extent changes its type, there's a precondition in the standard,
>> that the value is representable in the target integer type. This
>> precondition is not checked at runtime.
>>
>> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
>> For extents this precondition is always violated and results in
>> calling __builtin_trap. For all other specializations it's checked via
>> __glibcxx_assert.
>>
>> PR libstdc++/107761
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/mdspan (extents): New class.
>> * src/c++23/std.cc.in: Add 'using std::extents'.
>>
>> Signed-off-by: Luc Grosheintz 
>> ---
>>  libstdc++-v3/include/std/mdspan  | 262 +++
>>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>>  2 files changed, 267 insertions(+), 1 deletion(-)
>>
>> diff --git a/libstdc++-v3/include/std/mdspan 
>> b/libstdc++-v3/include/std/mdspan
>> index 4094a416d1e..39ced1d6301 100644
>> --- a/libstdc++-v3/include/std/mdspan
>> +++ b/libstdc++-v3/include/std/mdspan
>> @@ -33,6 +33,12 @@
>>  #pragma GCC system_header
>>  #endif
>>
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>>  #define __glibcxx_want_mdspan
>>  #include 
>>
>> @@ -41,6 +47,262 @@
>>  namespace std _GLIBCXX_VISIBILITY(default)
>>  {
>>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +  namespace __mdspan
>> +  {
>> +template
>> +  class _ExtentsStorage
>> +  {
>> +  public:
>> +   static consteval bool
>> +   _S_is_dyn(size_t __ext) noexcept
>> +   { return __ext == dynamic_extent; }
>> +
>> +   template
>> + static constexpr _IndexType
>> + _S_int_cast(const _OIndexType& __other) noexcept
>> + { return _IndexType(__other); }
>> +
>> +   static constexpr size_t _S_rank = _Extents.size();
>> +
>> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
>> +   // of dynamic extents up to (and not including) __r.
>> +   //
>> +   // If __r is the index of a dynamic extent, then
>> +   // _S_dynamic_index[__r] is the index of that extent in
>> +   // _M_dynamic_extents.
>> +   static constexpr auto _S_dynamic_index = [] consteval
>> +   {
>> + array __ret;
>> + size_t __dyn = 0;
>> + for(size_t __i = 0; __i < _S_rank; ++__i)
>> +   {
>> + __ret[__i] = __dyn;
>> + __dyn += _S_is_dyn(_Extents[__i]);
>> +   }
>> + __ret[_S_rank] = __dyn;
>> + return __ret;
>> +   }();
>> +
>> +   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
>> +
>> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
>> +   // index of the __r-th dynamic extent in _Extents.
>> +   static constexpr auto _S_dynamic_index_inv = [] consteval
>> +   {
>> + array __ret;
>> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
>> +   if (_S_is_dyn(_Extents[__i]))
>> + __ret[__r++] = __i;
>> + return __ret;
>> +   }();
>> +
>> +   static constexpr size_t
>> +   _S_static_extent(size_t __r) noexcept
>> +   { return _Extents[__r]; }
>> +
>> +   constexpr _IndexType
>> +   _M_extent(size_t __r) const noexcept
>> +   {
>> + auto __se = _Extents[__r];
>> + if (__se == dynamic_extent)
>> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
>> + else
>> +   return __se;
>> +   }
>> +
>> +   template
>> + constexpr void
>> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
>> + {
>> +   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
>> + {
>> +   size_t __di = __i;
>> +   if constexpr (_OtherRank != _S_rank_dynamic)
>> + __di = _S_dynamic_index_inv[__i];
>> +   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
>> + }
>> + }
>> +
>> +   constexpr
>> +   _ExtentsStorage() noexcept = default;
>> +
>> +   template
>> + constexpr
>> + _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
>> + __other) noexcept
>> + {
>> +   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
>> + { return __other._M_extent(__i); });
>> + }
>> +
>> +   template
>> + constexpr
>> + _ExtentsStorage(span __exts) noexcept
>> + {
>> +   _M_init_dynamic_extents<_Nm>(
>> + [&__exts](size_t __i) -> const _OIndexType&
>> + { return __exts[__i]; });
>> + }
>> +
>> +  private:
>> +   using _S_storage = __array_traits<_IndexType, 
>> _S_rank_dynami
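
The prefix-count table described in the comments of the quoted patch can be tried in isolation.  The sketch below is not the libstdc++ code: dyn_ext is a local stand-in for std::dynamic_extent, dynamic_index is a free function rather than a static member, and extents are passed as a plain std::array.  It only illustrates the invariant that entry r holds the number of dynamic extents before position r, and that the final entry is the dynamic rank.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Local stand-in for std::dynamic_extent (assumption for this sketch).
constexpr std::size_t dyn_ext = static_cast<std::size_t> (-1);

// For r in [0, N], ret[r] is the number of dynamic extents in exts[0..r).
// ret[N] is therefore the total number of dynamic extents.
template<std::size_t N>
  constexpr std::array<std::size_t, N + 1>
  dynamic_index (const std::array<std::size_t, N>& exts)
  {
    std::array<std::size_t, N + 1> ret{};
    std::size_t dyn = 0;
    for (std::size_t i = 0; i < N; ++i)
      {
	ret[i] = dyn;
	dyn += (exts[i] == dyn_ext);
      }
    ret[N] = dyn;
    return ret;
  }
```

For extents {3, dynamic, dynamic} the table is {0, 0, 1, 2}: the two dynamic extents are stored at positions 0 and 1 of the dynamic-extent array, and the dynamic rank is 2.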

Re: [PATCH] RISC-V: Allow different dynamic floating point mode to be merged [PR119832]

2025-04-29 Thread Robin Dapp


Although we already try to set the mode needed to FRM_DYN after a function call,
there are still some corner cases where both FRM_DYN and FRM_DYN_CALL may appear
on incoming edges.

Therefore, we use TARGET_MODE_CONFLUENCE to tell GCC that FRM_DYN, FRM_DYN_CALL,
and FRM_DYN_EXIT modes are compatible.


Just a note: Vineet is working on similar issues right now and mentioned that
this patch/hook might not be necessary.  But it's going to take some more time
until his patches are ready.  So we can go ahead here or wait a bit.


--
Regards
Robin

Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-04-29 Thread Tomasz Kaminski

Hi,

As we will be landing patches for extents, this will become a separate
patch series.
I would prefer it if you could commit per layout, and start with layout_right
(the default).
I try to provide prompt responses, so if that works better for you, you can
post a patch with only this layout first, as most of the comments will apply
to all of them.

For the general design, we have constructors that allow conversion between
rank-0 and rank-1 layouts left and right.  This is done because they
essentially represent the same layout.  I think we could benefit from that
in code by having base classes for rank-0 and rank-1 mappings:
template
_Rank0_mapping_base
{
   static_assert(_Extents::rank() == 0);

   template
   // explicit, requires goes here
   _Rank0_mapping_base(_Rank0_mapping_base);

// All members of layout_type go here
};

template
_Rank1_mapping_base
{
   static_assert(_Extents::rank() == 1);
  // Static assert for product is much simpler here, as we need to check one

   template
   // explicit, requires goes here
   _Rank1_mapping_base(_Rank1_mapping_base);

  // Call operator can also be simplified
  index_type operator()(index_type i) const // conversion happens at user side

  // constructor from strided_layout of Rank1 goes here.

// All members of layout_type go here
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0
and layout_left/right to use _Rank1_mapping_base as a base for rank() == 1:
template
struct extents {};

struct layout
{
template
struct mapping
{
// static assert that Extents must be a specialization of _Extents goes here.
}
};

template
struct layout::mapping>
: _Rank0_mapping_base>
{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit;
mapping(_Rank0_mapping_base> const&);
};

template
struct layout::mapping>
: _Rank1_mapping_base>

{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank1_mapping_base>::_Rank1_mapping_base;
// This one is implicit, allows construction from layout_right
mapping(_Rank1_mapping_base> const&);
};
};

template
requires (sizeof...(_Ext) >= 2)
struct layout::mapping>

The last one is a generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from
standard text here.

On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz 
wrote:

> Implements the parts of layout_left that don't depend on any of the
> other layouts.
>
> libstdc++/ChangeLog:
>
> * include/std/mdspan (layout_left): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 179 
>  1 file changed, 179 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 39ced1d6301..e05048a5b93 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
> +  constexpr typename _Extents::index_type
> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
> +  {
> +   typename _Extents::index_type __fwd = 1;
> +   for(size_t __i = 0; __i < __r; ++__i)
> + __fwd *= __exts.extent(__i);
> +   return __fwd;
> +  }
>
As we are inside the standard library implementation, we can do some tricks
here, and provide two functions:
// Returns the std::span(_ExtentsStorage::_Ext).substr(f, l);
// For extents forward to __static_exts
span __static_exts(size_t f, size_t l);
// Returns the
// std::span(_ExtentsStorage::_M_dynamic_extents).substr(_S_dynamic_index[f],
// _S_dynamic_index[l]);
span __dynamic_exts(Extents const& c);
Then you can make these functions friends of both extents and _ExtentsStorage.
Also add index_type members to _ExtentsStorage.

Then instead of having fwd-prod and rev-prod I would have:
template
consteval size_t __static_ext_prod(size_t f, size_t l)
{
  // multiply E != dynamic_ext from __static_exts
}
constexpr size __ext_prod(const _Extents& __exts, size_t f, size_t l)
{
   // multiply __static_ext_prod<_Extents>(f, l) and each elements of
__dynamic_exts(__exts, f, l);
}

Then fwd-prod(e, n) would be __ext_prod(e, 0, n), and rev_prod(e, n) would
be __ext_prod(e, __ext.rank() -n, n, __ext.rank())
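
A minimal sketch of the single-helper idea, under simplifying assumptions: extents are a plain std::vector of already-known sizes (no static/dynamic split), and the names ext_prod, fwd_prod and rev_prod are illustrative only.  The rev_prod bounds follow the __rev_prod loop in the quoted patch, which multiplies the extents after position r.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Product of exts[f] .. exts[l - 1]; the empty product is 1.
std::size_t
ext_prod (const std::vector<std::size_t>& exts, std::size_t f, std::size_t l)
{
  std::size_t prod = 1;
  for (std::size_t i = f; i < l; ++i)
    prod *= exts[i];
  return prod;
}

// Product of the extents before position r.
std::size_t
fwd_prod (const std::vector<std::size_t>& exts, std::size_t r)
{ return ext_prod (exts, 0, r); }

// Product of the extents after position r, as in the quoted __rev_prod.
std::size_t
rev_prod (const std::vector<std::size_t>& exts, std::size_t r)
{ return ext_prod (exts, r + 1, exts.size ()); }
```

With a half-open [f, l) interval both products become one-line wrappers, which is the simplification the suggestion is after; splitting the product into a compile-time static part and a run-time dynamic part is a separate step not shown here.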

> +
> +template
> +  constexpr typename _Extents::index_type
> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
> +  {
> +   typename _Extents::index_type __rev = 1;
> +   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
> + __rev *= __exts.extent(__i);
> +   return __rev;
> +  }
> +
>  template
>auto __build_dextents_type(integer_sequence)
> -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
> @@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  explicit extents(_Integrals...) ->
>extents()...>;
>
> +  struct layout_left
> +

RE: [PATCH] RISC-V: Allow different dynamic floating point mode to be merged [PR119832]

2025-04-29 Thread Li, Pan2

I'm somewhat surprised that this change doesn't make any of the existing FRM
tests fail (given that we have many of them).

No comments from my side.

Pan

-Original Message-
From: Kito Cheng  
Sent: Tuesday, April 29, 2025 11:35 AM
To: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; pal...@dabbelt.com; 
jeffreya...@gmail.com; rd...@ventanamicro.com; juzhe.zh...@rivai.ai; Li, Pan2 
; vine...@rivosinc.com
Cc: Kito Cheng 
Subject: [PATCH] RISC-V: Allow different dynamic floating point mode to be 
merged [PR119832]

Although we already try to set the mode needed to FRM_DYN after a function call,
there are still some corner cases where both FRM_DYN and FRM_DYN_CALL may appear
on incoming edges.

Therefore, we use TARGET_MODE_CONFLUENCE to tell GCC that FRM_DYN, FRM_DYN_CALL,
and FRM_DYN_EXIT modes are compatible.

gcc/ChangeLog:

PR target/119832
* config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): New.
(riscv_mode_confluence): New.
(TARGET_MODE_CONFLUENCE): Define to riscv_mode_confluence.

gcc/testsuite/ChangeLog:

PR target/119832
* g++.target/riscv/pr119832.C: New test.
---
 gcc/config/riscv/riscv.cc | 37 +++
 gcc/testsuite/g++.target/riscv/pr119832.C | 27 +
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/riscv/pr119832.C

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bad59e248d0..198fe72ef68 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12273,6 +12273,41 @@ riscv_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET)
 }
 }
 
+/* Return TRUE if the rounding mode is dynamic.  */
+
+static bool
+riscv_dynamic_frm_mode_p (int mode)
+{
+  return mode == riscv_vector::FRM_DYN
+|| mode == riscv_vector::FRM_DYN_CALL
+|| mode == riscv_vector::FRM_DYN_EXIT;
+}
+
+/* Implement TARGET_MODE_CONFLUENCE.  */
+
+static int
+riscv_mode_confluence (int entity, int mode1, int mode2)
+{
+  switch (entity)
+{
+case RISCV_VXRM:
+  return VXRM_MODE_NONE;
+case RISCV_FRM:
+  {
+   /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible.
+  Although we already try to set the mode needed to FRM_DYN after a
+  function call, there are still some corner cases where both FRM_DYN
+  and FRM_DYN_CALL may appear on incoming edges.  */
+   if (riscv_dynamic_frm_mode_p (mode1)
+   && riscv_dynamic_frm_mode_p (mode2))
+ return riscv_vector::FRM_DYN;
+   return riscv_vector::FRM_NONE;
+  }
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Return TRUE that an insn is asm.  */
 
 static bool
@@ -14356,6 +14391,8 @@ bool need_shadow_stack_push_pop_p ()
 #define TARGET_MODE_EMIT riscv_emit_mode_set
 #undef TARGET_MODE_NEEDED
 #define TARGET_MODE_NEEDED riscv_mode_needed
+#undef TARGET_MODE_CONFLUENCE
+#define TARGET_MODE_CONFLUENCE riscv_mode_confluence
 #undef TARGET_MODE_AFTER
 #define TARGET_MODE_AFTER riscv_mode_after
 #undef TARGET_MODE_ENTRY
diff --git a/gcc/testsuite/g++.target/riscv/pr119832.C b/gcc/testsuite/g++.target/riscv/pr119832.C
new file mode 100644
index 000..f4dc480e6d5
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/pr119832.C
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gcv -mabi=lp64 -ffast-math" } */
+
+struct ac  {
+  ~ac();
+  void u();
+};
+struct ae {
+  int s;
+  float *ag;
+};
+
+float c;
+
+void ak(ae *al, int n) {
+  ac d;
+  for (int i;i

[PATCH v3] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-29 Thread Jerry Zhang Jian

The Zve32x extension depends on the Zicsr extension.
Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add Zve32x depends on Zicsr

gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-19.c: set the march to rv64i_zve32x
  instead of rv64gc_zve32x to avoid Zicsr implied by g

Signed-off-by: Jerry Zhang Jian 
---
 gcc/common/config/riscv/riscv-common.cc|  1 +
 gcc/testsuite/gcc.target/riscv/predef-19.c | 34 ++
 2 files changed, 4 insertions(+), 31 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 15df22d5377..145a0f2bd95 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zve64f", "f"},
   {"zve64d", "d"},
 
+  {"zve32x", "zicsr"},
   {"zve32x", "zvl32b"},
   {"zve32f", "zve32x"},
   {"zve32f", "zvl32b"},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c b/gcc/testsuite/gcc.target/riscv/predef-19.c
index 2b90702192b..c2e12b6040c 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-19.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
+/* { dg-options "-O2 -march=rv64i_zve32x -mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */
 
 int main () {
 
@@ -15,40 +15,12 @@ int main () {
 #error "__riscv_i"
 #endif
 
-#if !defined(__riscv_c)
-#error "__riscv_c"
-#endif
-
 #if defined(__riscv_e)
 #error "__riscv_e"
 #endif
 
-#if !defined(__riscv_a)
-#error "__riscv_a"
-#endif
-
-#if !defined(__riscv_m)
-#error "__riscv_m"
-#endif
-
-#if !defined(__riscv_f)
-#error "__riscv_f"
-#endif
-
-#if !defined(__riscv_d)
-#error "__riscv_d"
-#endif
-
-#if defined(__riscv_v)
-#error "__riscv_v"
-#endif
-
-#if defined(__riscv_zvl128b)
-#error "__riscv_zvl128b"
-#endif
-
-#if defined(__riscv_zvl64b)
-#error "__riscv_zvl64b"
+#if !defined(__riscv_zicsr)
+#error "__riscv_zicsr"
 #endif
 
 #if !defined(__riscv_zvl32b)
-- 
2.49.0
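
To see why adding one pair is enough for zve32f as well, here is a small sketch that expands an extension through a table of implications.  It uses only the four pairs visible in the hunk above; expand_implied and its worklist are illustrative and are not how riscv-common.cc is necessarily structured.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Subset of the implication pairs shown in the hunk above.
static const std::vector<std::pair<std::string, std::string>> implied_info = {
  {"zve32x", "zicsr"},
  {"zve32x", "zvl32b"},
  {"zve32f", "zve32x"},
  {"zve32f", "zvl32b"},
};

// Return EXT plus everything it implies, directly or transitively.
std::set<std::string>
expand_implied (const std::string& ext)
{
  std::set<std::string> seen{ext};
  std::vector<std::string> worklist{ext};
  while (!worklist.empty ())
    {
      std::string cur = worklist.back ();
      worklist.pop_back ();
      for (const auto& p : implied_info)
	if (p.first == cur && seen.insert (p.second).second)
	  worklist.push_back (p.second);
    }
  return seen;
}
```

Because zve32f already implies zve32x, the new zve32x entry makes zicsr reachable from both.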

RE: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-29 Thread Li, Pan2

Thanks Robin for help.

> as I suggested initializes total with an estimate of the mode size (total = 8 
> for me) before we get to riscv_rtx_cost.  This makes the rest of the
> costs (which we assume to be relative to 4) inaccurate.

I see, that explains where the cost value 8 comes from.

> Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.

Yes, that is correct; I will resend the v3 with this change.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, April 29, 2025 9:47 PM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken ; Liu, Hongtao ; Robin Dapp 

Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

> I see, let the vec_dup enter the rtx_cost again to append the total to vmv, I 
> have a try testing.  For example with below change:
>
> +   switch (rcode)
> +   {
> + case VEC_DUPLICATE:
> +   *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS 
> (1);
> +   break;
> + case PLUS:
> +   {
> +   rtx op_0 = XEXP (x, 0);
> +   rtx op_1 = XEXP (x, 1);
> +   if (GET_CODE (op_0) == VEC_DUPLICATE
> +   || GET_CODE (op_1) == VEC_DUPLICATE)
> + *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS 
> (1);
> +   else
> + *total = COSTS_N_INSNS (1);
> +   break;
> +   }
> + default:
> +   *total = COSTS_N_INSNS (1);
> +   break;
> +   }
> +
> +   return true;
>
> For case_0, GR2VR is 0, we will have late-combine as below:
>   51   │ trying to combine definition of r135 in:
>   52   │11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
>   53   │ into:
>   54   │18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
>   55   │   REG_DEAD r146:RVVM1SI
>   56   │ successfully matched this instruction to *add_vx_rvvm1si:
>   57   │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ])
>   58   │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 
>   [ x ]) 0))
>   59   │ (reg:RVVM1SI 146)))
>   60   │ original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 
>   (weighted: 64.727273); rejecting replacement
>
>
> The vadd v, vec_dup(x) seems to have the same cost as vec_dup here. I am also
> confused about how we calculate the cost of vadd v, vec_dup(x). Can we just
> set its cost to that of vadd.vx, given that we have a define_insn_and_split
> to match the pattern and emit the vadd.vx directly? That matches the
> expression we mentioned, vadd.vv + vec_dup == vadd.vx.
> Please correct me if I have misunderstood.

Yes, that doesn't look quite correct yet.
I think the issue is that using

  *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);

as I suggested is wrong: total is already initialized with an estimate of the
mode size (total = 8 for me) before we get to riscv_rtx_cost.  This makes the
rest of the costs (which we assume to be relative to 4) inaccurate.

So try
  *total = get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
for the vec_dup case and
  *total = COSTS_N_INSNS (1) + get_vector_costs ()->regmove->GR2VR * 
  COSTS_N_INSNS (1);
for the vx case.

Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.
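
As a rough illustration of the arithmetic behind this suggestion, the following standalone C++ sketch models the two costs.  It is not GCC code: gr2vr, freq_dup and freq_add are hypothetical stand-ins for the tuning value and for the execution-frequency weights that late-combine appears to apply (the dump above reports "weighted" costs), and the acceptance rule (keep the replacement unless it is strictly more expensive) is an assumption inferred from the dumps in this thread.  Only COSTS_N_INSNS (N) == N * 4 is taken from GCC itself.

```cpp
// Simplified model of the suggested costs; not GCC code.

// Mirrors GCC's COSTS_N_INSNS (N), which is N * 4.
constexpr int costs_n_insns (int n) { return n * 4; }

// Suggested cost of a standalone vec_duplicate (GR -> VR move).
constexpr int vec_dup_cost (int gr2vr)
{
  return gr2vr * costs_n_insns (1);
}

// Suggested cost of plus (vec_duplicate (x), v), i.e. vadd.vx.
constexpr int vadd_vx_cost (int gr2vr)
{
  return costs_n_insns (1) + gr2vr * costs_n_insns (1);
}

// Cost of a plain vadd.vv.
constexpr int vadd_vv_cost () { return costs_n_insns (1); }

// Assumed acceptance rule: the combined vadd.vx replaces the add at the
// add's frequency, and is accepted unless its weighted cost is strictly
// higher than the weighted cost of vec_duplicate + vadd.vv.
constexpr bool combine_accepted (int gr2vr, double freq_dup, double freq_add)
{
  return vadd_vx_cost (gr2vr) * freq_add
	 <= vec_dup_cost (gr2vr) * freq_dup + vadd_vv_cost () * freq_add;
}
```

Under these assumptions GR2VR == 0 always allows the combination, while GR2VR > 0 rejects it whenever the vec_duplicate sits in a colder block than the add (for example hoisted out of a loop).  With equal weights both sides come out the same, so the rejection in this model relies on the frequency weighting.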

Re: [PATCH] strlen: Handle empty constructor as memset for combining with malloc to calloc [PR87900]

2025-04-29 Thread Hans-Peter Nilsson

Random-typo-spotting-mode activated:

On Sat, 19 Apr 2025, Andrew Pinski wrote:
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/calloc-10.c

> +/* zeroing out via a CONSTRUCTOR should be treated similarly as a msmet and

"memset"

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c

> +/* zeroing out via a CONSTRUCTOR should be treated similarly as a msmet and

Ditto.

brgds, H-P

Re: [PATCH] RISC-V: Allow different dynamic floating point mode to be merged [PR119832]

2025-04-29 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2025-04-29 11:35
To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong; pan2.li; 
vineetg
CC: Kito Cheng
Subject: [PATCH] RISC-V: Allow different dynamic floating point mode to be 
merged [PR119832]
Although we already try to set the mode needed to FRM_DYN after a function call,
there are still some corner cases where both FRM_DYN and FRM_DYN_CALL may appear
on incoming edges.
 
Therefore, we use TARGET_MODE_CONFLUENCE to tell GCC that FRM_DYN, FRM_DYN_CALL,
and FRM_DYN_EXIT modes are compatible.
 
gcc/ChangeLog:
 
PR target/119832
* config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): New.
(riscv_mode_confluence): New.
(TARGET_MODE_CONFLUENCE): Define to riscv_mode_confluence.
 
gcc/testsuite/ChangeLog:
 
PR target/119832
* g++.target/riscv/pr119832.C: New test.
---
gcc/config/riscv/riscv.cc | 37 +++
gcc/testsuite/g++.target/riscv/pr119832.C | 27 +
2 files changed, 64 insertions(+)
create mode 100644 gcc/testsuite/g++.target/riscv/pr119832.C
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bad59e248d0..198fe72ef68 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12273,6 +12273,41 @@ riscv_mode_needed (int entity, rtx_insn *insn, 
HARD_REG_SET)
 }
}
+/* Return TRUE if the rounding mode is dynamic.  */
+
+static bool
+riscv_dynamic_frm_mode_p (int mode)
+{
+  return mode == riscv_vector::FRM_DYN
+ || mode == riscv_vector::FRM_DYN_CALL
+ || mode == riscv_vector::FRM_DYN_EXIT;
+}
+
+/* Implement TARGET_MODE_CONFLUENCE.  */
+
+static int
+riscv_mode_confluence (int entity, int mode1, int mode2)
+{
+  switch (entity)
+{
+case RISCV_VXRM:
+  return VXRM_MODE_NONE;
+case RISCV_FRM:
+  {
+ /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible.
+Although we already try to set the mode needed to FRM_DYN after a
+function call, there are still some corner cases where both FRM_DYN
+and FRM_DYN_CALL may appear on incoming edges.  */
+ if (riscv_dynamic_frm_mode_p (mode1)
+ && riscv_dynamic_frm_mode_p (mode2))
+   return riscv_vector::FRM_DYN;
+ return riscv_vector::FRM_NONE;
+  }
+default:
+  gcc_unreachable ();
+}
+}
+
/* Return TRUE that an insn is asm.  */
static bool
@@ -14356,6 +14391,8 @@ bool need_shadow_stack_push_pop_p ()
#define TARGET_MODE_EMIT riscv_emit_mode_set
#undef TARGET_MODE_NEEDED
#define TARGET_MODE_NEEDED riscv_mode_needed
+#undef TARGET_MODE_CONFLUENCE
+#define TARGET_MODE_CONFLUENCE riscv_mode_confluence
#undef TARGET_MODE_AFTER
#define TARGET_MODE_AFTER riscv_mode_after
#undef TARGET_MODE_ENTRY
diff --git a/gcc/testsuite/g++.target/riscv/pr119832.C 
b/gcc/testsuite/g++.target/riscv/pr119832.C
new file mode 100644
index 000..f4dc480e6d5
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/pr119832.C
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gcv -mabi=lp64 -ffast-math" } */
+
+struct ac  {
+  ~ac();
+  void u();
+};
+struct ae {
+  int s;
+  float *ag;
+};
+
+float c;
+
+void ak(ae *al, int n) {
+  ac d;
+  for (int i;i

RE: Make ix86 cost of VEC_SELECT equivalent to SUBREG same as of SUBREG

2025-04-29 Thread Liu, Hongtao




> -----Original Message-----
> From: Jan Hubicka 
> Sent: Wednesday, April 30, 2025 4:11 AM
> To: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ro...@nextmovesoftware.com; ubiz...@gmail.com
> Subject: Make ix86 cost of VEC_SELECT equivalent to SUBREG same as of
> SUBREG
> 
> Hi,
> this patch (partly) solves the problem in PR119900 where changing the
> ix86_size_cost of a cheap SSE instruction from 2 bytes to 4 bytes regresses
> imagemagick with PGO (119% on core and 54% on Zen)
> 
> There is an interesting chain of problems
>  1) the train run of the SPEC2017 imagick is wrong and it does not train the
> innermost loop of morphology apply used in the ref run (another loop in the
> same function is trained instead)
>  2) rpad pass introduces XMM register with 0 which is used to break
> dependency chains in int->double conversion such as:
> 
> vcvtusi2sdq %r14, %xmm4, %xmm1
> 
> xmm1 is the 0 register.  The pass turns itself off if
> optimize_function_for_speed is false, so it only matches in the cold region
> because MorphologyApply contains other hot code.  I think the pass should
> avoid introducing 0 registers for cold code, but that would only make the
> imagick situation worse since it is mistrained.
> 
> The code produced is
> 
> 
> tmp_reg:V2DF=vec_merge(vec_duplicate(uns_float(input_reg:DI)),input_reg2:V2DF,0x1)
> output_reg:DF=subreg (tmp_reg:V2DF, 0)
> 
>  3) We have splitter translating subreg to vec_select introduced by Roger
> 
> https://gcc.gnu.org/cgit/gcc/commit/?id=faa2202ee7fcf039b2016ce5766a2927526c5f78
> 
> So after register allocation we end up with:
> xmm4:DF=vec_select(xmm4:V2DF,parallel[0])
>  4) late combine pass undoes the optimization:
> 
> trying to combine definition of r24 in:
>   388: xmm4:V2DF=vec_merge(vec_duplicate(uns_float(r14:DI)),xmm3:V2DF,0x1)
> into:
>   486: xmm4:DF=vec_select(xmm4:V2DF,parallel)
> successfully matched this instruction to *floatunsdidf2_avx512:
> (set (reg:DF 24 xmm4 [orig:168 _357 ] [168])
> (unsigned_float:DF (reg/v:DI 42 r14 [orig:151 former_height ] [151])))
> original cost = 8 + 8, replacement cost = 16; keeping replacement
> 
> Here the original cost is computed from cost->sse_op and at -Os it used to be
> 4+4 (since sse_op was incorrectly cheaper).
> 
> There are multiple problems in this chain of events, but I think the first
> problem is costing
> 
>   xmm4:DF=vec_select(xmm4:V2DF,parallel[0])
> 
> as a real SSE operation when it will often translate to nothing.  Since
> VEC_SELECT is doing the job of a subreg, I think the rtx cost should be the
> same as that of the subreg, which is 0.
> 
target_insn_cost is used to prevent the rpad optimization from being reverted
by late_combine1; it looks like that is not sufficient for size_cost.

static int
ix86_insn_cost (rtx_insn *insn, bool speed)
{
  int insn_cost = 0;
  /* Add extra cost to avoid post_reload late_combine revert
     the optimization did in pass_rpad.  */
  if (reload_completed
      && ix86_rpad_gate ()
      && recog_memoized (insn) >= 0
      && get_attr_avx_partial_xmm_update (insn)
	  == AVX_PARTIAL_XMM_UPDATE_TRUE)
    insn_cost += COSTS_N_INSNS (3);

  return insn_cost + pattern_cost (PATTERN (insn), speed);
}
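
To make the numbers in the late-combine dump quoted above concrete, here is a small standalone C++ sketch.  It is not GCC code: the helper and constant names are hypothetical, and the rule that a replacement is kept unless it costs strictly more than the two insns it replaces is an assumption read off the dump ("original cost = 8 + 8, replacement cost = 16; keeping replacement").  It shows what would change if the VEC_SELECT were costed like a SUBREG, i.e. 0, as proposed.

```cpp
// Standalone model of the cost comparison in the dump; not GCC code.

// Costs taken from the quoted dump.
constexpr int vec_merge_cost = 8;	  // definition insn (388)
constexpr int vec_select_as_sse_op = 8;	  // use insn (486), costed as sse_op
constexpr int replacement_cost = 16;	  // combined *floatunsdidf2_avx512

// Proposed cost when VEC_SELECT of the low part is treated like a SUBREG.
constexpr int vec_select_as_subreg = 0;

// Assumed acceptance rule: keep the replacement unless it is strictly more
// expensive than the definition plus the use it replaces.
constexpr bool keeps_replacement (int def_cost, int use_cost, int new_cost)
{
  return new_cost <= def_cost + use_cost;
}
```

With the current costing the comparison is 16 <= 8 + 8, so the combination is kept and the rpad optimization is undone.  With the proposed costing it becomes 16 <= 8 + 0, which fails, so under this assumed rule the replacement would be rejected.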

> Also I wonder
>  1) why we don't also translate SUBREG of other types (SF, SI, DI etc.) of a
> vector register the same way as we do DF
>  2) I think
> 
>   (define_insn "sse2_storelpd"
> [(set (match_operand:DF 0 "nonimmediate_operand"
> "=m,x,x,*f,r")
>   (vec_select:DF
> (match_operand:V2DF 1 "nonimmediate_operand" " v,x,m,m,m")
> (parallel [(const_int 0)])))]
> "TARGET_SSE2 && !(MEM_P (operands[0]) && MEM_P
> (operands[1]))"
> "@
>  %vmovlpd\t{%1, %0|%0, %1}
>  #
>  #
>  #
>  #"
> [(set_attr "type" "ssemov,ssemov,ssemov,fmov,imov")
>  (set (attr "prefix_data16")
>(if_then_else (eq_attr "alternative" "0")
>  (const_string "1")
>  (const_string "*")))
>  (set_attr "prefix" "maybe_vex")
>  (set_attr "mode" "V1DF,DF,DF,DF,DF")])
> 
>   (define_split
> [(set (match_operand:DF 0 "register_operand")
>   (vec_select:DF
> (match_operand:V2DF 1 "nonimmediate_operand")
> (parallel [(const_int 0)])))]
> "TARGET_SSE2 && reload_completed"
> [(set (match_dup 0) (match_dup 1))]
> "operands[1] = gen_lowpart (DFmode, operands[1]);")
> 
>    effectively hides from the register allocator the fact that if operand1
>    is the same as operand0, the splitter will lead to elimination of the
>    instruction.  Perhaps we should add an extra alternative "=mx" "0" and
>    increase the cost of the others?
Sounds reasonable, add ? to other alternatives?
>  3) cost of
> 
> (set (reg:DF 24 xmm4 [orig:168 _3

Re: [PATCH] PR tree-optimization/119471 - If the LHS does not contain zero, neither do multiply operands.

2025-04-29 Thread Andrew MacLeod



On 3/28/25 10:36, Andrew MacLeod wrote:

On 3/28/25 03:19, Richard Biener wrote:
On Fri, Mar 28, 2025 at 12:28 AM Andrew MacLeod  
wrote:

This patch fixes both 119471 and the remainder of 110992.

At issue is we do not recognize that if

    "a * b != 0" , then neither "a" nor "b" can be zero.

This is fairly trivial with range-ops.  op1_range and op2_range for
operator_mult are taught that if the LHS does not contain zero, then
neither does either operand.
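
The inference can be checked exhaustively on a small wrapping domain.  The C++ sketch below is a standalone model, not range-ops code: the function names and the choice of a modulus-64 domain are hypothetical and picked only to keep the check small.  It confirms that a non-zero product forces both operands to be non-zero even when the multiplication wraps, and that the converse does not hold.

```cpp
// Standalone model, not GCC code: all names here are hypothetical.

// Multiply two values in a small wrapping domain [0, modulus).
constexpr unsigned wrapping_mul (unsigned a, unsigned b, unsigned modulus)
{
  return (a * b) % modulus;
}

// True if every pair with a non-zero wrapped product has two non-zero
// operands.  This is the direction the patch adds to op1_range.
constexpr bool nonzero_lhs_implies_nonzero_operands (unsigned modulus)
{
  for (unsigned a = 0; a < modulus; ++a)
    for (unsigned b = 0; b < modulus; ++b)
      if (wrapping_mul (a, b, modulus) != 0 && (a == 0 || b == 0))
	return false;
  return true;
}

// True if every pair with a zero wrapped product has at least one zero
// operand.  With wrapping this fails, e.g. 8 * 8 == 0 modulo 64.
constexpr bool zero_lhs_implies_zero_operand (unsigned modulus)
{
  for (unsigned a = 0; a < modulus; ++a)
    for (unsigned b = 0; b < modulus; ++b)
      if (wrapping_mul (a, b, modulus) == 0 && a != 0 && b != 0)
	return false;
  return true;
}
```

Only the first direction is safe to use when the multiplication can wrap, which is why the rule is stated in terms of the LHS not containing zero.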

Included are patches for trunk (gcc15), gcc14, and gcc13.  All are
basically the same few lines.

I presume we want to wait for stage 1 to check this into trunk.

Bootstraps with no regressions on x86_64-pc-linux-gnu on all 3
branches.  OK for gcc13 and gcc14 branches?

This is OK for branches only after it was on trunk.  Since one of the
PRs is a regression it's technically OK for trunk now.

Richard.


OK, it should be perfectly safe.  Committed to trunk.

Andrew

This patch was in trunk when gcc15 was forked, so gcc15 is already 
covered.   Attached are the patches for gcc14 and gcc13.


Bootstrapped with no regressions on x86_64-pc-linux-gnu.

Do you want me to check it in for either or both branches?

Andrew

From 7a7e91e59850e61b790571524d0bc337409694a7 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 27 Mar 2025 13:50:16 -0400
Subject: [PATCH 2/3] If the LHS does not contain zero, neither do multiply
 operands.

Given ~[0,0] = op1 * op2, range-ops should determine that neither op1 nor
op2 is zero.  Add this to the operator_mult for op1_range.  op2_range
simply invokes op1_range, so both will be covered.

	PR tree-optimization/110992.c
	PR tree-optimization/119471.c
	gcc/
	* range-op.cc (operator_mult::op1_range): If the LHS does not
	contain zero, return non-zero.

	gcc/testsuite/
	* gcc.dg/pr110992.c: New.
	* gcc.dg/pr119471.c: New.
---
 gcc/range-op.cc |  8 
 gcc/testsuite/gcc.dg/pr110992.c | 18 ++
 gcc/testsuite/gcc.dg/pr119471.c | 19 +++
 3 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr110992.c
 create mode 100644 gcc/testsuite/gcc.dg/pr119471.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 97a88dc7efa..275b3ae6891 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1976,6 +1976,14 @@ operator_mult::op1_range (irange &r, tree type,
   if (op2.singleton_p (&offset) && !integer_zerop (offset))
 return range_op_handler (TRUNC_DIV_EXPR, type).fold_range (r, type,
 			   lhs, op2);
+
+  //  ~[0, 0] = op1 * op2  defines op1 and op2 as non-zero.
+  if (!contains_zero_p ((lhs)))
+   {
+ r.set_nonzero (type);
+ return true;
+   }
+
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr110992.c b/gcc/testsuite/gcc.dg/pr110992.c
new file mode 100644
index 000..05e9b9267e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110992.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+void foo (int);
+
+int f(unsigned b, short c)
+{
+  int bt = b;
+  int bt1 = bt;
+  int t = bt1 & -(c!=0);
+ // int t = bt1 * (c!=0);
+
+  if (!t) return 0;
+  foo(bt == 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "foo \\(0\\)" 1 "evrp" } } */
diff --git a/gcc/testsuite/gcc.dg/pr119471.c b/gcc/testsuite/gcc.dg/pr119471.c
new file mode 100644
index 000..4c55d85f77c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr119471.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+int fa(int a, int b)
+{
+  int c = a * b;
+  if (c != 0)
+return (a != 0);
+  return 0;
+}
+int fb(int a, int b)
+{
+  int c = a * b;
+  if (c != 0)
+return (b != 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <1" 2 "evrp" } } */
-- 
2.45.0

From 15f3ea2326f97bbde68420606a885e8d0f87b262 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 27 Mar 2025 13:44:00 -0400
Subject: [PATCH 1/3] If the LHS does not contain zero, neither do multiply
 operands.

Given ~[0,0] = op1 * op2, range-ops should determine that neither op1 nor
op2 is zero.  Add this to the operator_mult for op1_range.  op2_range
simply invokes op1_range, so both will be covered.

	PR tree-optimization/110992.c
	PR tree-optimization/119471.c
	gcc/
	* range-op.cc (operator_mult::op1_range): If the LHS does not
	contain zero, return non-zero.

	gcc/testsuite/
	* gcc.dg/pr110992.c: New.
	* gcc.dg/pr119471.c: New.
---
 gcc/range-op.cc |  7 +++
 gcc/testsuite/gcc.dg/pr110992.c | 18 ++
 gcc/testsuite/gcc.dg/pr119471.c | 19 +++
 3 files changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr110992.c
 create mode 100644 gcc/testsuite/gcc.dg/pr119471.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index d1a1cd73687..6ea0d9935eb 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2140,6 +2140,13 @@ operator_mult::op1_range (irange &r, tree type,
   wide_int offset;
   if (op2.singleton_p (offset) && offset != 0)
 return range_op_ha

Re: [PATCH v2] RISC-V: Recognized Svrsw60t59b extension

2025-04-29 Thread Kito Cheng

LGTM, but pending ratification of the spec.  Also, a minor comment: the
link seems dead; we may use
https://github.com/riscv/riscv-isa-manual/pull/1907 instead

On Fri, Mar 21, 2025 at 8:56 AM Mingzhu Yan  wrote:
>
> This patch adds support for the svrsw60t59b extension[1], enabling GCC to
> recognize and process the extension correctly at compile time.
>
> [1] https://github.com/riscv/Svrsw60t59b
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc:
> (riscv_ext_version_table) New extension.
> (riscv_ext_flag_table) Ditto.
> * config/riscv/riscv-opts.h: New mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/arch-45.c: New test.
>
> gcc/doc:
>
> * doc/invoke.texi(RISC-V Options): New extension
>
> Signed-off-by: Mingzhu Yan 
> ---
>  gcc/common/config/riscv/riscv-common.cc  | 16 +---
>  gcc/config/riscv/riscv.opt   |  2 ++
>  gcc/doc/invoke.texi  |  4 
>  gcc/testsuite/gcc.target/riscv/arch-45.c |  5 +
>  4 files changed, 20 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-45.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index b34409adf..c104cc335 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -409,10 +409,11 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"ssstateen", ISA_SPEC_CLASS_NONE, 1, 0},
>{"sstc",  ISA_SPEC_CLASS_NONE, 1, 0},
>
> -  {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
> -  {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
> -  {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
> -  {"svvptc",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svvptc",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svrsw60t59b", ISA_SPEC_CLASS_NONE, 1, 0},
>
>{"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
> @@ -1732,9 +1733,10 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>RISCV_EXT_FLAG_ENTRY ("zcmp", x_riscv_zc_subext, MASK_ZCMP),
>RISCV_EXT_FLAG_ENTRY ("zcmt", x_riscv_zc_subext, MASK_ZCMT),
>
> -  RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
> -  RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
> -  RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),
> +  RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
> +  RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
> +  RISCV_EXT_FLAG_ENTRY ("svvptc",  x_riscv_sv_subext, MASK_SVVPTC),
> +  RISCV_EXT_FLAG_ENTRY ("svrsw60t59b", x_riscv_sv_subext, MASK_SVRSW60T59B),
>
>RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
>
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 7515c8ea1..4c6387ab7 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -472,6 +472,8 @@ Mask(SVNAPOT) Var(riscv_sv_subext)
>
>  Mask(SVVPTC) Var(riscv_sv_subext)
>
> +Mask(SVRSW60T59B) Var(riscv_sv_subext)
> +
>  TargetVariable
>  int riscv_ztso_subext
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1819bcdcd..7e61106c5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -31234,6 +31234,10 @@ to @samp{zvks} and @samp{zvkg}.
>  @tab 1.0
>  @tab Page-based memory types extension.
>
> +@item svrsw60t59b
> +@tab 1.0
> +@tab PTE Reserved-for-Software Bits 60-59 extension.
> +
>  @item xcvmac
>  @tab 1.0
>  @tab Core-V multiply-accumulate extension.
> diff --git a/gcc/testsuite/gcc.target/riscv/arch-45.c 
> b/gcc/testsuite/gcc.target/riscv/arch-45.c
> new file mode 100644
> index 0..fe3ee441d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/arch-45.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_svrsw60t59b -mabi=lp64" } */
> +int foo()
> +{
> +}
> --
> 2.43.0
>

[PATCH] MIPS: Fixed the problem that the nop instruction is inserted at the wrong position after enabling '-fpatchable-function-entry='

2025-04-29 Thread Lulu Cheng

Because the MIPS function symbol is emitted by the prologue function,
the nop generation should also be done in the prologue.
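
The way the patch below divides the patch area around the function label can be sketched as follows.  This is a standalone C++ model, not GCC code: the struct and function names are hypothetical, while patch_area_size and patch_area_entry follow the names used in the patch and correspond to N and M in -fpatchable-function-entry=N,M (N nops in total, M of them before the entry label).  The two record flags model the record_p argument passed to default_print_patchable_function_entry.

```cpp
// Standalone model of how the nops are split around the label; not GCC code.

struct nop_split
{
  unsigned before_label;  // nops emitted before the function label
  unsigned after_label;	  // nops emitted after the function label
  bool record_before;	  // record_p for the area before the label
  bool record_after;	  // record_p for the area after the label
};

constexpr nop_split split_patch_area (unsigned patch_area_size,
				      unsigned patch_area_entry)
{
  return nop_split {
    // The area before the label is emitted only when patch_area_entry > 0.
    patch_area_entry,
    // The remainder goes after the label, if there is any.
    patch_area_size > patch_area_entry
      ? patch_area_size - patch_area_entry : 0,
    // The area before the label is always recorded when it exists.
    patch_area_entry > 0,
    // The area after the label is recorded only if nothing preceded it.
    patch_area_size > patch_area_entry && patch_area_entry == 0
  };
}
```

For example, N = 5 and M = 2 would give two nops before the label and three after it, with only the first area recorded; N = 3 with M omitted gives three nops after the label, recorded there.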

OK for trunk?

PR target/99217
gcc/ChangeLog:

* config/mips/mips.cc (mips_start_function_definition):
Implements the functionality of '-fpatchable-function-entry='.
(mips_print_patchable_function_entry): Define empty function.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define macro.

Change-Id: If156e635568edb6560d9230f6967e01d207d46b2
---
 gcc/config/mips/mips.cc | 32 
 1 file changed, 32 insertions(+)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 492fa285477..7478e376fdf 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -7478,6 +7478,9 @@ static void
 mips_start_function_definition (const char *name, bool mips16_p,
tree decl ATTRIBUTE_UNUSED)
 {
+  unsigned HOST_WIDE_INT patch_area_size = crtl->patch_area_size;
+  unsigned HOST_WIDE_INT patch_area_entry = crtl->patch_area_entry;
+
   if (mips16_p)
 fprintf (asm_out_file, "\t.set\tmips16\n");
   else
@@ -7490,6 +7493,10 @@ mips_start_function_definition (const char *name, bool 
mips16_p,
 fprintf (asm_out_file, "\t.set\tnomicromips\n");
 #endif
 
+  /* Emit the patching area before the entry label, if any.  */
+  if (patch_area_entry > 0)
+default_print_patchable_function_entry (asm_out_file,
+   patch_area_entry, true);
   if (!flag_inhibit_size_directive)
 {
   fputs ("\t.ent\t", asm_out_file);
@@ -7501,6 +7508,13 @@ mips_start_function_definition (const char *name, bool 
mips16_p,
 
   /* Start the definition proper.  */
   ASM_OUTPUT_FUNCTION_LABEL (asm_out_file, name, decl);
+
+  /* And the area after the label.  Record it if we haven't done so yet.  */
+  if (patch_area_size > patch_area_entry)
+default_print_patchable_function_entry (asm_out_file,
+   patch_area_size
+   - patch_area_entry,
+   patch_area_entry == 0);
 }
 
 /* End a function definition started by mips_start_function_definition.  */
@@ -23314,6 +23328,20 @@ mips_bit_clear_p (enum machine_mode mode, unsigned 
HOST_WIDE_INT m)
   return false;
 }
 
+/* define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY */
+
+/* The MIPS function start is implemented in the prologue function.
+   TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY needs to be inserted
+   before or after the function name, so this function does not
+   use a public implementation. This function is implemented in
+   mips_start_function_definition. */
+
+void
+mips_print_patchable_function_entry (FILE *file,
+unsigned HOST_WIDE_INT patch_area_size,
+bool record_p)
+{}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -23627,6 +23655,10 @@ mips_bit_clear_p (enum machine_mode mode, unsigned 
HOST_WIDE_INT m)
 #undef TARGET_DOCUMENTATION_NAME
 #define TARGET_DOCUMENTATION_NAME "MIPS"
 
+#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
+#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
+mips_print_patchable_function_entry
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-mips.h"
-- 
2.34.1

Re: [PATCH][gcc13] PR tree-optimization/117287 - Backport new assume implementation

2025-04-29 Thread Andrew MacLeod



On 3/28/25 05:25, Jakub Jelinek wrote:

On Fri, Mar 28, 2025 at 08:12:35AM +0100, Richard Biener wrote:

On Thu, Mar 27, 2025 at 8:14 PM Andrew MacLeod  wrote:

This patch backports the ASSUME support that was rewritten in GCC 15.

It's slightly more complicated than the port to GCC 14 was in that a few
classes have been rewritten. I've isolated them all to tree-assume.cc
which contains the pass.

It also has to bring in the ssa_cache and lazy_ssa_cache from gcc14,
along with some tweaks to those classes to deal with changes in the way
range_allocators worked starting in GCC14. Those changes are all at the
top of the tree-assume.cc file. The rest of the file is a carbon copy of
the GCC14 version. (well, what should be... there is an outstanding
debug output support that was never submitted I discovered)

I'm not sure if it's worth putting this in GCC13 or not, but I will
submit it and leave it to the release managers :-)  It should be low
risk, especially since assume was experimental support?

I have no strong opinion here besides questioning whether it's
necessary (as you say, assume is experimental) and the fact that
by splicing out the VRP changes to a special place further maintenance
is made more difficult.

IMO, up to you (expecting you'll fix issues if they come up), but would
like to hear a 2nd opinion from Jakub.

I'd probably apply it, it was a wrong-code issue and I'm not sure
users understand assume as experimental.
While the [[assume (...)]]; form is a C++23 feature which is experimental,
we accept that attribute even since C++11 and in C23 and in the
__attribute__((assume (...))); form everywhere and as a documented
extension.

If the ranger changes are done only when users actually use assume rather
than all the time (and only when using non-trivial assumptions, trivial
ones with no side-effects are turned into if (!x) __builtin_unreachable ()),
I think this decreases the risks.

Jakub


I've reattached the patch for gcc13.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

OK for gcc13 branch, or do we consider that "too old" now? :-)

Andrew

From 892a92002f94e2856fdee164b4d620edc69184b5 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 27 Mar 2025 10:51:16 -0400
Subject: [PATCH 1/3] backport new assume implementation and cache.

---
 gcc/Makefile.in|   1 +
 gcc/gimple-range-fold.cc   |  13 -
 gcc/gimple-range-fold.h|  12 +
 gcc/gimple-range.cc| 189 --
 gcc/gimple-range.h |  19 -
 gcc/testsuite/g++.dg/cpp23/pr117287-attr.C |  38 ++
 gcc/tree-assume.cc | 650 +
 gcc/tree-vrp.cc|  68 ---
 8 files changed, 701 insertions(+), 289 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/pr117287-attr.C
 create mode 100644 gcc/tree-assume.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 775aaa1b3c4..1d9e10127ca 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1633,6 +1633,7 @@ OBJS = \
 	ubsan.o \
 	sanopt.o \
 	sancov.o \
+	tree-assume.o \
 	tree-call-cdce.o \
 	tree-cfg.o \
 	tree-cfgcleanup.o \
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 180f349eda9..e2bb294624f 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -103,21 +103,8 @@ fur_source::register_relation (edge e ATTRIBUTE_UNUSED,
 {
 }
 
-// This version of fur_source will pick a range up off an edge.
-
-class fur_edge : public fur_source
-{
-public:
-  fur_edge (edge e, range_query *q = NULL);
-  virtual bool get_operand (vrange &r, tree expr) override;
-  virtual bool get_phi_operand (vrange &r, tree expr, edge e) override;
-private:
-  edge m_edge;
-};
-
 // Instantiate an edge based fur_source.
 
-inline
 fur_edge::fur_edge (edge e, range_query *q) : fur_source (q)
 {
   m_edge = e;
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index 68c6d7743e9..0a028e31be0 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -149,6 +149,18 @@ protected:
   relation_oracle *m_oracle;
 };
 
+// This version of fur_source will pick a range up off an edge.
+
+class fur_edge : public fur_source
+{
+public:
+  fur_edge (edge e, range_query *q = NULL);
+  virtual bool get_operand (vrange &r, tree expr) override;
+  virtual bool get_phi_operand (vrange &r, tree expr, edge e) override;
+private:
+  edge m_edge;
+};
+
 // This class uses ranges to fold a gimple statement producing a range for
 // the LHS.  The source of all operands is supplied via the fur_source class
 // which provides a range_query as well as a source location and any other
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index b4de8dd4ef9..e1f283c774c 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -729,192 +729,3 @@ disable_ranger (struct function *fun)
   fun->x_range_query = NULL;
 }
 
-// ---

[pushed] i386: Skip sub-RTXes of memory operand in ix86_update_stack_alignment

2025-04-29 Thread Uros Bizjak

Skip sub-RTXes of the memory operand if stack access register is
not mentioned in the operand.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_update_stack_alignment): Skip sub-RTXes
of the memory operand if stack access register is not mentioned in
the operand.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ae2386785af..bfd9cac215a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8495,13 +8495,18 @@ ix86_update_stack_alignment (rtx, const_rtx pat, void 
*data)
   FOR_EACH_SUBRTX (iter, array, pat, ALL)
 {
   auto op = *iter;
-  if (MEM_P (op) && reg_mentioned_p (p->reg, op))
+  if (MEM_P (op))
{
- unsigned int alignment = MEM_ALIGN (op);
+ if (reg_mentioned_p (p->reg, XEXP (op, 0)))
+   {
+ unsigned int alignment = MEM_ALIGN (op);
 
- if (alignment > *p->stack_alignment)
-   *p->stack_alignment = alignment;
- break;
+ if (alignment > *p->stack_alignment)
+   *p->stack_alignment = alignment;
+ break;
+   }
+ else
+   iter.skip_subrtxes ();
}
 }
 }

[PATCH v3] libstdc++: Cleanup and stabilize format _Spec<_CharT> and _Pres_type.

2025-04-29 Thread Tomasz Kamiński

This patch makes the following changes to _Pres_type values:
 * _Pres_esc is replaced with a separate _M_debug flag.
 * _Pres_s, _Pres_p do not overlap with _Pres_none.
 * hexadecimal presentations use the same values for pointer, integer
   and floating point types.

Instead of `_M_reserved` and `_M_reserved2` bitfields, the members of
_Spec<_CharT> are rearranged so the class contains 16 bits of tail padding.
Derived classes (like _ChronoSpec<_CharT>) can reuse the storage for initial
members. We also add _SpecBase as the base class for _Spec<_CharT> so that
it is not a C++98 POD, which allows tail padding to be reused on Itanium ABI.
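
The effect of the empty base class on tail-padding reuse can be illustrated with a standalone C++ sketch.  The types below are hypothetical stand-ins, not the libstdc++ classes: spec_base plays the role of _SpecBase, and the int/short members are chosen only so that the class ends in 16 bits of padding on typical targets.

```cpp
// Standalone illustration of tail-padding reuse; not libstdc++ code.

// Empty base class, standing in for _SpecBase.
struct spec_base { };

// A C++98 POD: the Itanium ABI does not let derived classes place members
// in its tail padding.
struct pod_spec
{
  int width;
  short flags;	// typically followed by 2 bytes of tail padding
};

// Same members, but having a base class means this is not a C++98 POD,
// so its tail padding may be reused by derived classes.
struct non_pod_spec : spec_base
{
  int width;
  short flags;
};

// Each derived class adds one 16-bit member.
struct derived_from_pod : pod_spec { short extra; };
struct derived_from_non_pod : non_pod_spec { short extra; };

// The empty base does not change the size of the class itself.
static_assert (sizeof (non_pod_spec) == sizeof (pod_spec),
	       "empty base adds no storage");
// Reusing tail padding can only make the derived class smaller.
static_assert (sizeof (derived_from_non_pod) <= sizeof (derived_from_pod),
	       "tail padding reuse never grows the derived class");
```

On an Itanium ABI target with 4-byte int and 2-byte short, one would expect derived_from_non_pod to stay at 8 bytes while derived_from_pod grows to 12; the assertions are written so that they also hold on ABIs that never reuse tail padding.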

Finally, the format enumerators are defined as enum class with unsigned
char as underlying type, followed by using enum to bring names in scope.
_Term_char was adjusted for consistency.

The '?' is changed to a separate _M_debug flag, to allow the debug format to
be independent of the presentation type and applicable to multiple
presentation types. For example it could be used to trigger memberwise or
reflection based formatting.

The _M_format_character and _M_format_character_escaped functions are merged
into a single function that handles normal and debug presentation. In
particular this would allow future support for '?c' for printing integer types
as escaped characters. _S_character_width is also folded into the merged
function.

Decoupling the _Pres_s value from _Pres_none allows it to be used for string
presentation for range formatting, and removes the need for separate _Pres_seq
and _Pres_str. This does not affect formatting of bool, as
__formatter_int::_M_parse overrides the default value of _M_type. And with the
separation of the _M_debug flag, __formatter_str::format behavior is now
agnostic to the _M_type value.

The values for integer presentation types are arranged so textual presentations
(_Pres_s, _Pres_c) are grouped together. For consistency, floating point
hexadecimal presentation uses the same values as the integer ones.

New _Pres_p and setting for _M_alt enables using the same spec to configure
formatting of uintptr_t with __formatter_int, and const void* with
__formatter_ptr. Differentiating it from _Pres_none would allow a future
formatter that requires an explicit presentation type to be specified. This
would allow std::vector to be formatted directly with the '{::p}' format spec.

The constructors for __formatter_int and __formatter_ptr from _Spec<_CharT>
now also set default presentation modes, as format functions expect them.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_ChronoSpec<_CharT>::_M_locale_specific):
Declare as bit filed in tail-padding..
* include/bits/formatfwd.h (__format::_Align): Defined as enum class
and add using enum.
* include/std/format (__format::_Pres_type, __format::_Sign)
(__format::_WidthPrec,  __format::_Arg_t): Defined as enum class and
add using enum.
(_Pres_type::_Pres_esc): Replace with _Pres_max.
(_Pres_type::_Pres_seq, _Pres_type::_Pres_str): Remove.
(__format::_Pres_type): Updated values of enumerators as described 
above.
(_Spec<_CharT>): Rearranged members to have 16bits of tail-padding.
(_Spec<_CharT>::_M_debug): Defined.
(_Spec<_CharT>::_M_reserved, _Spec<_CharT>::_M_reserved2): Removed.
(_Spec<_CharT>::_M_parse_fill_and_align, _Spec<_CharT>::_M_parse_sign)
(__format::__write_padded_as_spec): Adjusted default value checks.
(__format::_Term_char): Add using enum and rename enumerators.
(__format::__should_escape_ascii): Adjusted _Term_char uses.
(__formatter_str<_CharT>::parse): Set _Pres_s if specified and _M_debug
instead of _Pres_esc.
(__formatter_str<_CharT>::set_debug_format): Set _M_debug instead of
_Pres_esc.
(__formatter_str<_CharT>::format, __formatter_str<_CharT>::_M_format_range):
Check _M_debug instead of _Pres_esc.
(__formatter_str<_CharT>::_M_format_escaped): Adjusted _Term_char uses.
(__formatter_int<_CharT>::__formatter_int(_Spec<_CharT>)): Set _Pres_d
if default presentation type is not set.
(__formatter_int<_CharT>::_M_parse): Adjusted default value checks.
(__formatter_int<_CharT>::_M_do_parse): Set _M_debug instead of 
_Pres_esc.
(__formatter_int<_CharT>::_M_format_character): Handle escaped 
presentation.
(__formatter_int<_CharT>::_M_format_character_escaped)
(__formatter_int<_CharT>::_S_character_width): Merged into 
_M_format_character.
(__formatter_ptr<_CharT>::__formatter_ptr(_Spec<_CharT>)): Set _Pres_p
if default presentation type is not set.
(__formatter_ptr<_CharT>::parse): Add default __type parameter, store
_Pres_p, and handle _M_alt to be consistent with meaning for integers.
(__formatter_ptr<_CharT>::_M_set_default): Define.
(__format::__pack_arg_types, std::basic_format_args): Add necessary 
casts.
(formatter<_CharT,

Re: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-04-29 Thread Uros Bizjak

On Tue, Apr 29, 2025 at 9:56 AM H.J. Lu  wrote:
>
> Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy
> propagation pass can eliminate multiple __tls_get_addr calls.

__tls_get_addr needs to be called with a 16-byte aligned stack; I don't
think the compiler will correctly handle the required call alignment if
you emit the call without emit_libcall_block.

Uros.

>
> gcc/
>
> PR target/81501
> * config/i386/i386-protos.h (ix86_split_tls_local_dynamic_base_64):
> New.
> * config/i386/i386.cc (ix86_split_tls_local_dynamic_base_64): New.
> (legitimize_tls_address): Don't emit the 64-bit UNSPEC_TLS_LD_BASE
> as a call.
> * config/i386/i386.md (*tls_local_dynamic_base_64_): Renamed
> to ...
> (@tls_local_dynamic_base_call_64_): This.  Replace
> (match_operand 2) with (const_int 0).
> (@tls_local_dynamic_base_64_): Change call to unspec.
> (*tls_local_dynamic_base_64_): New.
>
> gcc/testsuite/
>
> PR target/81501
> * gcc.target/i386/pr81501-1.c: New test.
>
> OK for master?
>
> Thanks.
>
> --
> H.J.
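
For context, the kind of source the quoted patch is concerned with can be sketched as below. This is only an assumed illustration, not the pr81501-1.c testcase: counter_a, counter_b and bump_both are hypothetical names. When such code is built as position-independent code for a shared object with optimization, the compiler may use the local-dynamic TLS model, in which the module's TLS base is obtained through __tls_get_addr and each variable is then addressed by an offset from it.

```cpp
#include <cassert>

// Two thread-local variables that are local to this translation unit.
static thread_local int counter_a;
static thread_local int counter_b;

// Both accesses can share one TLS base lookup; ideally only a single
// call to obtain the base is emitted for the whole function.
int
bump_both()
{
  ++counter_a;
  ++counter_b;
  return counter_a + counter_b;
}
```

Whether one or several base lookups appear in the generated code depends on the compiler version, the target and the options used.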

Re: [PATCH v2] libstdc++: Use 'if constexpr' to slightly simplify

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 10:58 AM Jonathan Wakely  wrote:

> This will hardly make a dent in the very slow compile times for 
> but it seems worth doing anyway.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/regex_compiler.h: Replace _GLIBCXX17_CONSTEXPR
> with constexpr and disable diagnostics with pragmas.
> (_AnyMatcher::operator()): Use constexpr-if instead of tag
> dispatching. Postpone calls to _M_translate until after checking
> result of earlier calls.
> (_AnyMatcher::_M_apply): Remove both overloads.
> (_BracketMatcher::operator(), _BracketMatcher::_M_ready):
> Replace tag dispatching with 'if constexpr'.
> (_BracketMatcher::_M_apply(_CharT, true_type)): Remove.
> (_BracketMatcher::_M_apply(_CharT, false_type)): Remove second
> parameter.
> (_BracketMatcher::_M_make_cache): Remove both overloads.
> * include/bits/regex_compiler.tcc (_BracketMatcher::_M_apply):
> Remove second parameter.
> * include/bits/regex_executor.tcc: Replace _GLIBCXX17_CONSTEXPR
> with constexpr and disable diagnostics with pragmas.
> (_Executor::_M_handle_backref): Replace __glibcxx_assert with
> constexpr-if and use __builtin_unreachable for non-DFS mode
> specialization.
> (_Executor::_M_handle_accept): Mark _S_opcode_backref case as
> unreachable for non-DFS mode.
> ---
>
> Tested x86_64-linux.
>
> The v2 patch uses constexpr-if in  as well, and
> optimizes _AnyMatcher so it doesn't do all the _M_translate calls up
> front.
>
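
The idea of replacing two tag-dispatched overloads with one function that uses constexpr-if, and of returning early instead of computing everything up front, can be sketched as follows. This is a simplified illustration under assumptions: matches_any is a hypothetical free function, and the translation step performed by the real matcher is left out.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a "match any character except line
// terminators" predicate.
template<typename CharT>
bool
matches_any(CharT ch)
{
  // Each comparison is done only if the previous ones did not match.
  if (ch == CharT('\n'))
    return false;
  if (ch == CharT('\r'))
    return false;
  // The wide-character-only checks are discarded entirely for char.
  if constexpr (!std::is_same<CharT, char>::value)
    {
      if (ch == CharT(u'\u2028')) // line separator
        return false;
      if (ch == CharT(u'\u2029')) // paragraph separator
        return false;
    }
  return true;
}
```

A single function body keeps the common checks in one place, and the char instantiation never sees the Unicode separator comparisons.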

>  libstdc++-v3/include/bits/regex_compiler.h   | 67 
>  libstdc++-v3/include/bits/regex_compiler.tcc |  2 +-
>  libstdc++-v3/include/bits/regex_executor.tcc | 62 ++
>  3 files changed, 64 insertions(+), 67 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/regex_compiler.h
> b/libstdc++-v3/include/bits/regex_compiler.h
> index f24c7e3baa6..21e7065e066 100644
> --- a/libstdc++-v3/include/bits/regex_compiler.h
> +++ b/libstdc++-v3/include/bits/regex_compiler.h
> @@ -38,6 +38,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>
>  _GLIBCXX_END_NAMESPACE_CXX11
>
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
>  namespace __detail
>  {
>/**
> @@ -221,9 +223,9 @@ namespace __detail
>_CharT
>_M_translate(_CharT __ch) const
>{
> -   if _GLIBCXX17_CONSTEXPR (__icase)
> +   if constexpr (__icase)
>   return _M_traits.translate_nocase(__ch);
> -   else if _GLIBCXX17_CONSTEXPR (__collate)
> +   else if constexpr (__collate)
>   return _M_traits.translate(__ch);
> else
>   return __ch;
> @@ -285,7 +287,7 @@ namespace __detail
>bool
>_M_match_range(_CharT __first, _CharT __last, _CharT __ch) const
>{
> -   if _GLIBCXX17_CONSTEXPR (!__icase)
> +   if constexpr (!__icase)
>   return __first <= __ch && __ch <= __last;
> else
>   return this->_M_in_range_icase(__first, __last, __ch);
> @@ -376,26 +378,20 @@ namespace __detail
>
>bool
>operator()(_CharT __ch) const
> -  { return _M_apply(__ch, typename is_same<_CharT, char>::type()); }
> -
> -  bool
> -  _M_apply(_CharT __ch, true_type) const
>{
> -   auto __c = _M_translator._M_translate(__ch);
> -   auto __n = _M_translator._M_translate('\n');
> -   auto __r = _M_translator._M_translate('\r');
> -   return __c != __n && __c != __r;
> -  }
> -
> -  bool
> -  _M_apply(_CharT __ch, false_type) const
> -  {
> -   auto __c = _M_translator._M_translate(__ch);
> -   auto __n = _M_translator._M_translate('\n');
> -   auto __r = _M_translator._M_translate('\r');
> -   auto __u2028 = _M_translator._M_translate(u'\u2028');
> -   auto __u2029 = _M_translator._M_translate(u'\u2029');
> -   return __c != __n && __c != __r && __c != __u2028 && __c !=
> __u2029;
> +   const auto __c = _M_translator._M_translate(__ch);
> +   if (__c == _M_translator._M_translate('\n'))
> + return false;
> +   if (__c == _M_translator._M_translate('\r'))
> + return false;
> +   if constexpr (!is_same<_CharT, char>::value)
> + {
> +   if (__c == _M_translator._M_translate(u'\u2028')) // line sep
> + return false;
> +   if (__c == _M_translator._M_translate(u'\u2029')) // para sep
> + return false;
> + }
> +   return true;
>}
>
>_TransT _M_translator;
> @@ -441,7 +437,10 @@ namespace __detail
>operator()(_CharT __ch) const
>{
> _GLIBCXX_DEBUG_ASSERT(_M_is_ready);
> -   return _M_apply(__ch, _UseCache());
> +   if constexpr (_UseCache::value)
> + if (!(__ch & 0x80)) [[__likely__]]
> +   return _M_cache[static_cast<_UnsignedCharT>(__ch)];
> +   return _M_apply(__ch);
>}
>
>

Re: [PATCH] libstdc++: Use constexpr-if for C++11 and C++14

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 11:05 AM Jonathan Wakely  wrote:

> Replace remaining uses of _GLIBCXX17_CONSTEXPR for constexpr-if, so that
> we always use constexpr-if in C++11 and C++14. Diagnostic pragmas are
> used to suppress diagnostics.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/char_traits.h (char_traits::assign): Use
> constexpr-if unconditionally for C++11 and C++14.
> * include/bits/locale_conv.h (__do_str_codecvt): Likewise.
> * include/bits/ostream.h (basic_ostream::_S_cast_flt): Likewise.
> * include/bits/random.tcc (shuffle_order_engine::operator())
> (seed_seq::seed_seq(Iter, Iter)): Likewise.
> * include/bits/shared_ptr_base.h (_Sp_counted_base::_M_release):
> Likewise.
> * include/bits/stl_tree.h (_Rb_tree::_M_move_data): Likewise.
> * include/bits/uniform_int_dist.h
> (uniform_int_distribution::operator()): Likewise.
> * include/bits/valarray_array.h (__valarray_default_construct)
> (__valarray_fill_construct, __valarray_copy_construct)
> (__valarray_copy_construct, __valarray_destroy_elements):
> Likewise.
> * include/experimental/numeric (lcm): Likewise.
> * include/std/bit (__rotl, __rotr, __countl_zero, __countr_zero)
> (__popcount, __bit_ceil): Likewise.
> * include/std/bitset (operator>>): Likewise.
> * include/std/charconv (__to_chars_8, __to_chars_i)
> (__from_chars_alnum_to_val, from_chars): Likewise.
> * include/tr2/dynamic_bitset (__dynamic_bitset_base): Likewise.
> * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
> line number.
> ---
>
> Tested x86_64-linux.
>
LGTM
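
As a rough sketch of the pattern the patch applies, the following wraps an 'if constexpr' in diagnostic pragmas so that it can be used without warnings before C++17. The function assign_n is a hypothetical example modelled loosely on the char_traits hunk, not a copy of the library code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
// Hypothetical helper: fill n elements starting at s with the value a.
template<typename T>
T*
assign_n(T* s, std::size_t n, T a)
{
  if constexpr (sizeof(T) == 1 && std::is_trivial<T>::value)
    {
      // Byte-sized trivial types can be filled with memset.
      if (n)
        {
          unsigned char c;
          std::memcpy(&c, &a, 1);
          std::memset(s, c, n);
        }
    }
  else
    {
      // Everything else is assigned element by element.
      for (std::size_t i = 0; i < n; ++i)
        s[i] = a;
    }
  return s;
}
#pragma GCC diagnostic pop
```

The push/pop pair limits the suppression to the code that needs it, so the diagnostic state of the including translation unit is left unchanged.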

>
>  libstdc++-v3/include/bits/char_traits.h   |  5 +++-
>  libstdc++-v3/include/bits/locale_conv.h   |  7 +++--
>  libstdc++-v3/include/bits/ostream.h   |  4 +--
>  libstdc++-v3/include/bits/random.tcc  | 10 +--
>  libstdc++-v3/include/bits/shared_ptr_base.h   |  5 +++-
>  libstdc++-v3/include/bits/stl_tree.h  |  5 +++-
>  libstdc++-v3/include/bits/uniform_int_dist.h  |  7 +++--
>  libstdc++-v3/include/bits/valarray_array.h| 15 +++
>  libstdc++-v3/include/experimental/numeric |  5 +++-
>  libstdc++-v3/include/std/bit  | 27 ++-
>  libstdc++-v3/include/std/bitset   | 12 ++---
>  libstdc++-v3/include/std/charconv | 15 ++-
>  libstdc++-v3/include/tr2/dynamic_bitset   | 10 +--
>  .../26_numerics/random/pr60037-neg.cc |  2 +-
>  14 files changed, 87 insertions(+), 42 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/char_traits.h
> b/libstdc++-v3/include/bits/char_traits.h
> index 67e18e89784..5ca34669302 100644
> --- a/libstdc++-v3/include/bits/char_traits.h
> +++ b/libstdc++-v3/include/bits/char_traits.h
> @@ -284,7 +284,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>  #endif
>
> -  if _GLIBCXX17_CONSTEXPR (sizeof(_CharT) == 1 &&
> __is_trivial(_CharT))
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
> +  if _GLIBCXX_CONSTEXPR (sizeof(_CharT) == 1 && __is_trivial(_CharT))
> {
>   if (__n)
> {
> @@ -298,6 +300,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   for (std::size_t __i = 0; __i < __n; ++__i)
> __s[__i] = __a;
> }
> +#pragma GCC diagnostic pop
>return __s;
>  }
>
> diff --git a/libstdc++-v3/include/bits/locale_conv.h
> b/libstdc++-v3/include/bits/locale_conv.h
> index 076e14ff762..b08795dcaf5 100644
> --- a/libstdc++-v3/include/bits/locale_conv.h
> +++ b/libstdc++-v3/include/bits/locale_conv.h
> @@ -85,16 +85,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   return false;
> }
>
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
>// The codecvt facet will only return noconv when the types are
>// the same, so avoid instantiating basic_string::assign otherwise
> -  if _GLIBCXX17_CONSTEXPR (is_same<typename _Codecvt::intern_type,
> -  typename _Codecvt::extern_type>())
> +  if constexpr (is_same<typename _Codecvt::intern_type,
> +   typename _Codecvt::extern_type>::value)
> if (__result == codecvt_base::noconv)
>   {
> __outstr.assign(__first, __last);
> __count = __last - __first;
> return true;
>   }
> +#pragma GCC diagnostic pop
>
>__outstr.resize(__outchars);
>__count = __next - __first;
> diff --git a/libstdc++-v3/include/bits/ostream.h
> b/libstdc++-v3/include/bits/ostream.h
> index d19a76ab247..caa47bead4b 100644
> --- a/libstdc++-v3/include/bits/ostream.h
> +++ b/libstdc++-v3/include/bits/ostream.h
> @@ -499,9 +499,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif
>   __sign = __builtin_signbit(__f) ? _To(-1.0) : _To(+1.0);
>
> - if _GLIBCXX17_CONSTEXPR (__is_same(_To, double))

[pushed] i386: Allow string instructions from non-default address space [PR111657]

2025-04-29 Thread Uros Bizjak

MOVS instructions allow segment override of their source operand, e.g.:

rep movsq %gs:(%rsi), (%rdi)

where %rsi is the address of the source location (with %gs segment override)
and %rdi is the address of the destination location.

The testcase improves from (-O2 -mno-sse -mtune=generic):

xorl%eax, %eax
.L2:
movl%eax, %edx
addl$8, %eax
movq%gs:m(%rdx), %rcx
movq%rcx, (%rdi,%rdx)
cmpl$240, %eax
jb.L2
ret

to:
movl$m, %esi
movl$30, %ecx
rep movsq %gs:(%rsi), (%rdi)
ret

PR 111657

gcc/ChangeLog:

* config/i386/i386-expand.cc (alg_usable_p): Remove have_as bool
argument and add dst_as and src_as address space arguments.  Reject
libcall algorithm with dst_as and src_as in the non-default address
spaces.  Reject rep_prefix_{1,4,8}_byte algorithms with dst_as in
the non-default address space.
(decide_alg): Remove have_as bool argument and add dst_as and src_as
address space arguments.  Update calls to alg_usable_p.
(ix86_expand_set_or_cpymem): Update call to decide_alg.
* config/i386/i386.md (strmov): Do not fail if operand[3] (source)
is in the non-default address space.  Expand with gen_strmov_singleop
only when operand[1] (destination) is in the default address space.
(*strmovdi_rex_1): Determine memory operands from insn pattern.
Allow only when destination is in the default address space.
Rewrite asm template to use explicit operands.
(*strmovsi_1): Ditto.
(*strmovhi_1): Ditto.
(*strmovqi_1): Ditto.
(*rep_movdi_rex64): Ditto.
(*rep_movsi): Ditto.
(*rep_movqi): Ditto.
(*strsetdi_rex_1): Determine memory operands from insn pattern.
Allow only when destination is in the default address space.
(*strsetsi_1): Ditto.
(*strsethi_1): Ditto.
(*strsetqi_1): Ditto.
(*rep_stosdi_rex64): Ditto.
(*rep_stossi): Ditto.
(*rep_stosqi): Ditto.
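
The address-space rules listed above for alg_usable_p can be modelled in a reduced form as follows. This is only a sketch under assumptions: Alg and alg_usable are hypothetical names, the three rep_prefix variants are collapsed into one enumerator, and the checks on fixed registers and on SSE/AVX availability are left out.

```cpp
#include <cassert>

// Hypothetical, reduced model of the candidate algorithms.
enum class Alg { none, libcall, rep_prefix, loop };

// dst_generic / src_generic say whether the destination / source is in
// the default (generic) address space.
constexpr bool
alg_usable(Alg alg, bool dst_generic, bool src_generic)
{
  if (alg == Alg::none)
    return false;
  // A library call needs both operands in the default address space.
  if (alg == Alg::libcall && !(dst_generic && src_generic))
    return false;
  // String insns can override the source segment but not the
  // destination segment.
  if (alg == Alg::rep_prefix && !dst_generic)
    return false;
  return true;
}
```

For example, alg_usable(Alg::rep_prefix, true, false) is true in this model, which corresponds to the 'rep movsq %gs:(%rsi), (%rdi)' case shown earlier.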

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111657-1.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 388e65192e4..f1cc85b4531 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -8907,31 +8907,33 @@ expand_set_or_cpymem_constant_prologue (rtx dst, rtx 
*srcp, rtx destreg,
 /* Return true if ALG can be used in current context.
Assume we expand memset if MEMSET is true.  */
 static bool
-alg_usable_p (enum stringop_alg alg, bool memset, bool have_as)
+alg_usable_p (enum stringop_alg alg, bool memset,
+ addr_space_t dst_as, addr_space_t src_as)
 {
   if (alg == no_stringop)
 return false;
   /* It is not possible to use a library call if we have non-default
  address space.  We can do better than the generic byte-at-a-time
  loop, used as a fallback.  */
-  if (alg == libcall && have_as)
+  if (alg == libcall &&
+  !(ADDR_SPACE_GENERIC_P (dst_as) && ADDR_SPACE_GENERIC_P (src_as)))
 return false;
   if (alg == vector_loop)
 return TARGET_SSE || TARGET_AVX;
   /* Algorithms using the rep prefix want at least edi and ecx;
  additionally, memset wants eax and memcpy wants esi.  Don't
  consider such algorithms if the user has appropriated those
- registers for their own purposes, or if we have a non-default
- address space, since some string insns cannot override the segment.  */
+ registers for their own purposes, or if we have the destination
+ in the non-default address space, since string insns cannot
+ override the destination segment.  */
   if (alg == rep_prefix_1_byte
   || alg == rep_prefix_4_byte
   || alg == rep_prefix_8_byte)
 {
-  if (have_as)
-   return false;
   if (fixed_regs[CX_REG]
  || fixed_regs[DI_REG]
- || (memset ? fixed_regs[AX_REG] : fixed_regs[SI_REG]))
+ || (memset ? fixed_regs[AX_REG] : fixed_regs[SI_REG])
+ || !ADDR_SPACE_GENERIC_P (dst_as))
return false;
 }
   return true;
@@ -8941,8 +8943,8 @@ alg_usable_p (enum stringop_alg alg, bool memset, bool 
have_as)
 static enum stringop_alg
 decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size,
unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size,
-   bool memset, bool zero_memset, bool have_as,
-   int *dynamic_check, bool *noalign, bool recur)
+   bool memset, bool zero_memset, addr_space_t dst_as,
+   addr_space_t src_as, int *dynamic_check, bool *noalign, bool recur)
 {
   const struct stringop_algs *algs;
   bool optimize_for_speed;
@@ -8974,7 +8976,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT 
expected_size,
   for (i = 0; i < MAX_STRINGOP_ALGS; i++)
 {
   enum stringop_alg candidate = algs->size[i].alg;
-  bool usable = alg_usable_p (candidate, memset, have_as);
+  bool usable = alg_usable_p (candidate, memset, dst_as, src_as);
   a

[PATCH 0/1] Fix BZ 119317: named loops (C2y) with debug info

2025-04-29 Thread Christopher Bazley

Fixed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119317

Tested on AArch64 using the test case provided by the bug
reporter:

int fun()
{
  main:
  while(1)
continue main;
}

Without the fix, this program failed to compile:

test.c: In function ‘fun’:
test.c:5:14: error: ‘continue’ statement operand ‘main’ does not refer to a named loop
5 | continue main;
  |  ^~~~
test.c:3:3: warning: label ‘main’ defined but not used [-Wunused-label]
3 |   main:
  |   ^~~~

gcc/xgcc -B gcc -std=gnu2y -O1 -ggdb2 -Wall -c \
  test.c -o test.o

Christopher Bazley (1):
  Fix BZ 119317: named loops (C2y) with debug info

 gcc/c/c-decl.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.43.0

[PATCH v5 01/10] libstdc++: Setup internal FTM for mdspan.

2025-04-29 Thread Luc Grosheintz

Use the FTM infrastructure to create an internal feature-test macro
for partial availability of mdspan, which is then used to hide the
contents of the mdspan header when compiling against a standard prior to
C++23.
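
The way such a macro hides header contents can be sketched as follows. This is an assumed illustration only: DEMO_HAVE_MDSPAN and the demo namespace are hypothetical, and the real macro is generated from version.def rather than written by hand.

```cpp
#include <cassert>

// Hypothetical macro, defined only for C++23 and later, mirroring the
// __cplusplus check in the generated version.h hunk.
#if __cplusplus >= 202100L
# define DEMO_HAVE_MDSPAN 1L
#endif

#ifdef DEMO_HAVE_MDSPAN
// Contents that should only be visible when the feature is available.
namespace demo
{
  struct layout_tag { };
  constexpr bool available = true;
}
#else
namespace demo
{
  constexpr bool available = false;
}
#endif
```

Code built against an older standard then simply does not see the guarded declarations.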

libstdc++-v3/ChangeLog:

* include/bits/version.def: Add internal feature testing macro
__glibcxx_mdspan.
* include/bits/version.h: Regenerate.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/bits/version.def | 9 +
 libstdc++-v3/include/bits/version.h   | 9 +
 2 files changed, 18 insertions(+)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 737b3f421bf..a0b5553ed04 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -998,6 +998,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = mdspan;
+  no_stdname = true; // FIXME: remove
+  values = {
+v = 1; // FIXME: 202207
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = ssize;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 59ff0cee043..1bb97847ce6 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1115,6 +1115,15 @@
 #endif /* !defined(__cpp_lib_span) && defined(__glibcxx_want_span) */
 #undef __glibcxx_want_span
 
+#if !defined(__cpp_lib_mdspan)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_mdspan 1L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_mdspan)
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_mdspan) && defined(__glibcxx_want_mdspan) */
+#undef __glibcxx_want_mdspan
+
 #if !defined(__cpp_lib_ssize)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_ssize 201902L
-- 
2.49.0

[PATCH v5 07/10] libstdc++: Implement layout_right from mdspan.

2025-04-29 Thread Luc Grosheintz

Implement the parts of layout_left that depend on layout_right; and the
parts of layout_right that don't depend on layout_stride.
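
The index computation behind a row-major (right) layout can be sketched with a plain loop, as below. This is not the patch's implementation, which uses a recursive helper template; linear_index_right here is a hypothetical stand-alone function over std::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Row-major linear index: the last index varies fastest.
// Computes (...(i0 * e1 + i1) * e2 + i2 ...) in Horner form.
template<std::size_t N>
constexpr std::size_t
linear_index_right(const std::array<std::size_t, N>& exts,
                   const std::array<std::size_t, N>& idx)
{
  std::size_t acc = 0;
  for (std::size_t r = 0; r < N; ++r)
    acc = acc * exts[r] + idx[r];
  return acc;
}

// Usage check: extents {2, 3, 4}, index (1, 2, 3) -> 1*12 + 2*4 + 3.
inline void
check_linear_index_right()
{
  std::array<std::size_t, 3> exts{2, 3, 4};
  std::array<std::size_t, 3> idx{1, 2, 3};
  assert(linear_index_right(exts, idx) == 23);
}
```

The accumulation multiplies by each extent before adding the next index, so no explicit stride table is needed.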

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_right): New class.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 147 
 1 file changed, 147 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index e05048a5b93..583792b5269 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -330,6 +330,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class mapping;
   };
 
+  struct layout_right
+  {
+template
+  class mapping;
+  };
+
   namespace __mdspan
   {
 template
@@ -427,6 +433,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_extents(__other.extents())
{ }
 
+  template
+   requires (_Extents::rank() <= 1
+ && is_constructible_v<_Extents, _OExtents>)
+   constexpr explicit(!is_convertible_v<_OExtents, _Extents>)
+   mapping(const layout_right::mapping<_OExtents>& __other) noexcept
+   : _M_extents(__other.extents())
+   { }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -483,6 +497,139 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[no_unique_address]] extents_type _M_extents;
 };
 
+  namespace __mdspan
+  {
+template
+struct _LinearIndexRight
+{
+  template
+   static constexpr typename _Extents::index_type
+   _S_value(typename _Extents::index_type __accumulated,
+const _Extents&) noexcept
+   { return __accumulated; }
+
+  template
+   static constexpr typename _Extents::index_type
+   _S_value(typename _Extents::index_type __accumulated,
+const _Extents& __exts, typename _Extents::index_type __idx,
+_Indices... __indices) noexcept
+   {
+ // (...) * __exts[r-1] + __indices[r-1];
+ return _LinearIndexRight<_Count + 1>::_S_value(
+ __accumulated * __exts.extent(_Count) + __idx, __exts,
+ __indices...);
+   }
+};
+
+template
+  constexpr typename _Extents::index_type
+  __linear_index_right(const _Extents& __exts, _Indices... __indices)
+  {
+   return _LinearIndexRight<0>::_S_value(0, __exts, __indices...);
+  }
+  }
+
+  template
+class layout_right::mapping
+{
+  static_assert(__mdspan::__layout_extent<_Extents>,
+   "The size of extents_type is not representable as index_type.");
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_right;
+
+  constexpr
+  mapping() noexcept = default;
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  constexpr
+  mapping(const extents_type& __extents) noexcept
+  : _M_extents(__extents)
+  { }
+
+  template
+   requires (is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const mapping<_OExtents>& __other) noexcept
+   : _M_extents(__other.extents())
+   { }
+
+  template
+   requires (extents_type::rank() <= 1
+   && is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const layout_left::mapping<_OExtents>& __other) noexcept
+   : _M_extents(__other.extents())
+   { }
+
+  constexpr mapping&
+  operator=(const mapping&) noexcept = default;
+
+  constexpr const extents_type&
+  extents() const noexcept { return _M_extents; }
+
+  constexpr index_type
+  required_span_size() const noexcept
+  { return __mdspan::__fwd_prod(_M_extents, _M_extents.rank()); }
+
+  template<__mdspan::__valid_index_type... _Indices>
+   requires (sizeof...(_Indices) == extents_type::rank())
+   constexpr index_type
+   operator()(_Indices... __indices) const noexcept
+   {
+ return __mdspan::__linear_index_right(
+   _M_extents, static_cast(__indices)...);
+   }
+
+  static constexpr bool
+  is_always_unique() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_always_exhaustive() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_always_strided() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_unique() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_exhaustive() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_strided() noexcept
+  { return true; }
+
+  constexpr index_type
+  stride(rank_type __i) const noexcept
+  requires (extents_type::rank() > 0)
+  {
+   __glibcxx_assert(__i < extents_type::rank());
+   return __mdspan::__re

[PATCH v5 00/10] Implement extents and layouts from the mdspan header.

2025-04-29 Thread Luc Grosheintz

This patch series follows up on:
https://gcc.gnu.org/pipermail/libstdc++/2025-April/061078.html

As agreed, I'm appending commits that add the layouts to this patch
series. Each layout is added in a separate commit and tests are added in
the immediately following commit.

Changes since v4 to std::extents related code:
* Use else-branch with constexpr if.
* Introduce `_S_is_compatible_extents` and use it to improve the quality
  of the generated code for `operator==`.
* Make `_S_is_dynamic` consteval.
* Silence warning in two cases of `(_Counts, 0)...` about unused
  expression (left of the comma operator).
* Use `cmp_equal` to compare values with different integer types.
* Fix missing include .
* The following bugs in test code were fixed:
  - assumed that `size_t` was 64 bits wide (arm).
  - assumed that `cstdint` is imported implicitly (aarch64).
  - tested that a function pointer was not null, instead of calling
the function.
  - used an incorrect integer type.

New since v4:
* `layout_left`, `layout_right` and `layout_stride`.

In the implementation of `layout_stride::is_exhaustive`, I encountered an
issue: the standard requires that there exists a permutation of the
ranks such that the permuted layout is a `layout_left` [0]. If the size
of the extents is zero, this condition implies that `is_exhaustive`
returns false for some choices of strides, even though the layout mapping
requirements for `is_exhaustive` are always satisfied trivially, i.e.
because anything is true for every element of the empty set. One example
is `extents = {0, 0, 1}` and `strides = {1, 1, 1}`. So far, I haven't been
able to come up with a way of checking the condition other than brute-force
checking all permutations, which is why I've implemented a slightly
different check that satisfies the layout mapping requirements but also
returns true for the example above.
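
To make the permutation condition concrete, the brute-force variant mentioned above can be sketched like this. It is not the check implemented in the series; exhaustive_by_permutation is a hypothetical helper that follows the wording in [0]: some permutation p must have stride(p0) == 1 and stride(pi) == stride(pi-1) * extent(pi-1).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Brute force: try every permutation of the ranks.
template<std::size_t N>
bool
exhaustive_by_permutation(const std::array<std::size_t, N>& exts,
                          const std::array<std::size_t, N>& strides)
{
  std::array<std::size_t, N> p;
  std::iota(p.begin(), p.end(), std::size_t(0));
  do
    {
      // The first rank in the permutation must have stride 1.
      bool ok = N == 0 || strides[p[0]] == 1;
      // Each following stride must equal the previous stride times the
      // previous extent.
      for (std::size_t i = 1; ok && i < N; ++i)
        ok = strides[p[i]] == strides[p[i - 1]] * exts[p[i - 1]];
      if (ok)
        return true;
    }
  while (std::next_permutation(p.begin(), p.end()));
  return false;
}
```

For extents {0, 0, 1} with strides {1, 1, 1} this returns false, since two consecutive strides of 1 would need an extent of 1 between them and only one such extent exists.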

[0]: https://eel.is/c++draft/mdspan.layout#stride.obs-5.2

Thank you Tomasz for the excellent review of v4.

Luc Grosheintz (10):
  libstdc++: Setup internal FTM for mdspan.
  libstdc++: Add header mdspan to the build-system.
  libstdc++: Implement std::extents [PR107761].
  libstdc++: Add tests for std::extents.
  libstdc++: Implement layout_left from mdspan.
  libstdc++: Add tests for layout_left.
  libstdc++: Implement layout_right from mdspan.
  libstdc++: Add tests for layout_right.
  libstdc++: Implement layout_stride from mdspan.
  libstdc++: Add tests for layout_stride.

 libstdc++-v3/doc/doxygen/user.cfg.in  |   1 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |   9 +
 libstdc++-v3/include/precompiled/stdc++.h |   1 +
 libstdc++-v3/include/std/mdspan   | 863 ++
 libstdc++-v3/src/c++23/std.cc.in  |   6 +-
 .../mdspan/extents/class_mandates_neg.cc  |   8 +
 .../23_containers/mdspan/extents/ctor_copy.cc |  82 ++
 .../23_containers/mdspan/extents/ctor_ints.cc |  62 ++
 .../mdspan/extents/ctor_shape.cc  | 160 
 .../mdspan/extents/custom_integer.cc  |  87 ++
 .../23_containers/mdspan/extents/misc.cc  | 224 +
 .../mdspan/layouts/class_mandate_neg.cc   |  23 +
 .../23_containers/mdspan/layouts/ctors.cc | 301 ++
 .../23_containers/mdspan/layouts/mapping.cc   | 431 +
 .../23_containers/mdspan/layouts/stride.cc| 359 
 18 files changed, 2627 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/std/mdspan
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc

-- 
2.49.0

[PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-29 Thread Luc Grosheintz

This implements std::extents from the mdspan header according to N4950 and
contains partial progress towards PR107761.

When an extent is converted to a different type, the standard has a
precondition that the value is representable in the target integer type.
This precondition is not checked at runtime.

The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
For extents of rank zero this precondition is always violated, which results
in calling __builtin_trap. For all other specializations it is checked via
__glibcxx_assert.
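
The storage scheme used in the diff below keeps only the dynamic extents in an array and maps a rank to its slot through a table of running counts. A stand-alone sketch of that table follows. The names dyn_extent and dynamic_index are hypothetical; the value of dyn_extent matches std::dynamic_extent, but the patch itself computes the table inside a class template.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Marker for "extent known only at run time".
constexpr std::size_t dyn_extent = static_cast<std::size_t>(-1);

// ret[r] is the number of dynamic extents before rank r; ret[N] is the
// total number of dynamic extents.
template<std::size_t N>
constexpr std::array<std::size_t, N + 1>
dynamic_index(const std::array<std::size_t, N>& exts)
{
  std::array<std::size_t, N + 1> ret{};
  std::size_t dyn = 0;
  for (std::size_t i = 0; i < N; ++i)
    {
      ret[i] = dyn;
      dyn += exts[i] == dyn_extent;
    }
  ret[N] = dyn;
  return ret;
}
```

If rank r is dynamic, ret[r] is the position of that extent in the array of stored dynamic extents; static extents need no storage at all.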

PR libstdc++/107761

libstdc++-v3/ChangeLog:

* include/std/mdspan (extents): New class.
* src/c++23/std.cc.in: Add 'using std::extents'.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 262 +++
 libstdc++-v3/src/c++23/std.cc.in |   6 +-
 2 files changed, 267 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 4094a416d1e..39ced1d6301 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -33,6 +33,12 @@
 #pragma GCC system_header
 #endif
 
+#include 
+#include 
+#include 
+#include 
+#include 
+
 #define __glibcxx_want_mdspan
 #include 
 
@@ -41,6 +47,262 @@
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+  namespace __mdspan
+  {
+template
+  class _ExtentsStorage
+  {
+  public:
+   static consteval bool
+   _S_is_dyn(size_t __ext) noexcept
+   { return __ext == dynamic_extent; }
+
+   template
+ static constexpr _IndexType
+ _S_int_cast(const _OIndexType& __other) noexcept
+ { return _IndexType(__other); }
+
+   static constexpr size_t _S_rank = _Extents.size();
+
+   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
+   // of dynamic extents up to (and not including) __r.
+   //
+   // If __r is the index of a dynamic extent, then
+   // _S_dynamic_index[__r] is the index of that extent in
+   // _M_dynamic_extents.
+   static constexpr auto _S_dynamic_index = [] consteval
+   {
+ array __ret;
+ size_t __dyn = 0;
+ for(size_t __i = 0; __i < _S_rank; ++__i)
+   {
+ __ret[__i] = __dyn;
+ __dyn += _S_is_dyn(_Extents[__i]);
+   }
+ __ret[_S_rank] = __dyn;
+ return __ret;
+   }();
+
+   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
+
+   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
+   // index of the __r-th dynamic extent in _Extents.
+   static constexpr auto _S_dynamic_index_inv = [] consteval
+   {
+ array __ret;
+ for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
+   if (_S_is_dyn(_Extents[__i]))
+ __ret[__r++] = __i;
+ return __ret;
+   }();
+
+   static constexpr size_t
+   _S_static_extent(size_t __r) noexcept
+   { return _Extents[__r]; }
+
+   constexpr _IndexType
+   _M_extent(size_t __r) const noexcept
+   {
+ auto __se = _Extents[__r];
+ if (__se == dynamic_extent)
+   return _M_dynamic_extents[_S_dynamic_index[__r]];
+ else
+   return __se;
+   }
+
+   template
+ constexpr void
+ _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
+ {
+   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
+ {
+   size_t __di = __i;
+   if constexpr (_OtherRank != _S_rank_dynamic)
+ __di = _S_dynamic_index_inv[__i];
+   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
+ }
+ }
+
+   constexpr
+   _ExtentsStorage() noexcept = default;
+
+   template
+ constexpr
+ _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
+ __other) noexcept
+ {
+   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
+ { return __other._M_extent(__i); });
+ }
+
+   template
+ constexpr
+ _ExtentsStorage(span __exts) noexcept
+ {
+   _M_init_dynamic_extents<_Nm>(
+ [&__exts](size_t __i) -> const _OIndexType&
+ { return __exts[__i]; });
+ }
+
+  private:
+   using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
+   [[no_unique_address]] _S_storage _M_dynamic_extents;
+  };
+
+template
+  concept __valid_index_type =
+   is_convertible_v<_OIndexType, _SIndexType> &&
+   is_nothrow_constructible_v<_SIndexType, _OIndexType>;
+
+template
+  concept
+  __valid_static_extent = _Extent == dynamic_extent
+   || _Extent <= numeric_limits<_IndexType>::max();
+  }
+
+  template
+class extents
+{
+  static_assert(is_integral_v<_IndexType>, "_IndexType must be integral.");
+  static_assert(
+ (_

[PATCH v5 08/10] libstdc++: Add tests for layout_right.

2025-04-29 Thread Luc Grosheintz

Adds tests for layout_right and for the parts of layout_left that depend
on layout_right.
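
As a rough illustration of what the new stride tests check, the stride of dimension r in a row-major (layout_right) mapping is the product of all extents after r. The sketch below is hand-written for this note and is not the libstdc++ code; the helper name right_stride is an assumption.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of libstdc++: the stride of dimension r
// in a row-major mapping is the product of the extents after r.
template<std::size_t N>
constexpr std::size_t
right_stride(const std::array<std::size_t, N>& exts, std::size_t r)
{
  std::size_t s = 1;
  for (std::size_t i = r + 1; i < N; ++i)
    s *= exts[i];
  return s;
}

// Extents (3, 5, 7) give strides (35, 7, 1), the same values the
// layout_right specialization of test_stride_3d verifies.
constexpr std::array<std::size_t, 3> shape{3, 5, 7};
static_assert(right_stride(shape, 0) == 35);
static_assert(right_stride(shape, 1) == 7);
static_assert(right_stride(shape, 2) == 1);
```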

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_right.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add tests for
layout_right and the interaction with layout_left.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  1 +
 .../23_containers/mdspan/layouts/ctors.cc | 90 +++
 .../23_containers/mdspan/layouts/mapping.cc   | 20 +
 3 files changed, 111 insertions(+)

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
index 2fd27d0bd35..fdebda8bd06 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -17,5 +17,6 @@ template
   };
 
 A a_left; // { dg-error "required from" }
+A a_right;   // { dg-error "required from" }
 
 // { dg-prune-output "not representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index c7cf5501628..5c54b440083 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -116,6 +116,92 @@ namespace from_same_layout
 }
 }
 
+// ctor: mapping(layout_{right,left}::mapping)
+namespace from_left_or_right
+{
+  template
+constexpr void
+test_constructible()
+{
+  static_assert(!std::is_constructible_v<
+ typename SLayout::mapping>,
+ typename OLayout::mapping>>);
+
+  static_assert(!std::is_constructible_v<
+ typename SLayout::mapping>,
+ typename OLayout::mapping>>);
+
+  static_assert(!std::is_constructible_v<
+ typename SLayout::mapping>,
+ typename OLayout::mapping>>);
+}
+
+  template
+constexpr void
+test_convertible()
+{
+  constexpr bool expected = std::is_convertible_v<
+   typename OMapping::extents_type, typename SMapping::extents_type>;
+  static_assert(std::is_convertible_v == expected);
+}
+
+  template
+constexpr void
+test_convertible_all()
+{
+  test_convertible<
+ typename OLayout::mapping>,
+ typename SLayout::mapping>>();
+
+  test_convertible<
+ typename OLayout::mapping>,
+ typename SLayout::mapping>>();
+
+  test_convertible<
+ typename OLayout::mapping>,
+ typename SLayout::mapping>>();
+
+  test_convertible<
+ typename OLayout::mapping>,
+ typename SLayout::mapping>>();
+
+  test_convertible<
+ typename OLayout::mapping>,
+ typename SLayout::mapping>>();
+}
+
+  template
+constexpr bool
+test_ctor(Extents exts)
+{
+  typename OLayout::mapping m1(exts);
+  typename SLayout::mapping m2(m1);
+  VERIFY(m1.extents() == m2.extents());
+  return true;
+}
+
+  template
+constexpr bool
+test_ctor_all()
+{
+  test_ctor(std::extents{});
+  test_ctor(std::extents{});
+  test_ctor(std::extents{3});
+  return true;
+}
+
+  template
+constexpr void
+test_all()
+{
+  test_constructible();
+  test_convertible_all();
+
+  test_ctor_all();
+  static_assert(test_ctor_all());
+}
+}
+
 template
 constexpr void
 test_all()
@@ -128,5 +214,9 @@ int
 main()
 {
   test_all();
+  test_all();
+
+  from_left_or_right::test_all();
+  from_left_or_right::test_all();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
index 1d7d4c4e1c7..f59dd75d01f 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
@@ -239,6 +239,15 @@ template<>
 VERIFY(m.stride(1) == 3);
   }
 
+template<>
+  constexpr void
+  test_stride_2d()
+  {
+std::layout_right::mapping> m;
+VERIFY(m.stride(0) == 5);
+VERIFY(m.stride(1) == 1);
+  }
+
 template
   constexpr void
   test_stride_3d();
@@ -253,6 +262,16 @@ template<>
 VERIFY(m.stride(2) == 3*5);
   }
 
+template<>
+  constexpr void
+  test_stride_3d()
+  {
+std::layout_right::mapping m(std::dextents(3, 5, 7));
+VERIFY(m.stride(0) == 35);
+VERIFY(m.stride(1) == 7);
+VERIFY(m.stride(2) == 1);
+  }
+
 template
   constexpr bool
   test_stride_all()
@@ -367,5 +386,6 @@ int
 main()
 {
   test_all();
+  test_all();
   return 0;
 }
-- 
2.49.0

[PATCH v5 04/10] libstdc++: Add tests for std::extents.

2025-04-29 Thread Luc Grosheintz

A prior commit added std::extents; this commit adds the tests. The bulk
of them focus on the constructors, which are split into three
groups:

1. the ctor from other extents and the copy ctor,
2. the ctor from a pack of integer-like objects,
3. the ctor from shapes, i.e. span and array.

For each group, check that the ctor:
* produces an object with the expected values for extent,
* is implicit if and only if required,
* is constexpr,
* doesn't change the rank of the extent.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_copy.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_ints.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_shape.cc: New test.
* testsuite/23_containers/mdspan/extents/custom_integer.cc: New test.
* testsuite/23_containers/mdspan/extents/misc.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/extents/class_mandates_neg.cc  |   8 +
 .../23_containers/mdspan/extents/ctor_copy.cc |  82 +++
 .../23_containers/mdspan/extents/ctor_ints.cc |  62 +
 .../mdspan/extents/ctor_shape.cc  | 160 +
 .../mdspan/extents/custom_integer.cc  |  87 +++
 .../23_containers/mdspan/extents/misc.cc  | 224 ++
 6 files changed, 623 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
new file mode 100644
index 000..b654e3920a8
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++23 } }
+#include
+
+std::extents e1; // { dg-error "from here" }
+std::extents e2;// { dg-error "from here" }
+// { dg-prune-output "dynamic or representable as _IndexType" }
+// { dg-prune-output "must be integral" }
+// { dg-prune-output "invalid use of incomplete type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
new file mode 100644
index 000..a7b3a169301
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
@@ -0,0 +1,82 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+// Test the copy ctor and the ctor from other extents.
+
+constexpr auto dyn = std::dynamic_extent;
+
+// Not constructible
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+// Nothrow constructible
+static_assert(std::is_nothrow_constructible_v,
+ std::extents>);
+static_assert(std::is_nothrow_constructible_v,
+ std::extents>);
+
+// Implicit conversion
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+template
+  constexpr void
+  test_ctor(const Other& other)
+  {
+auto e = std::extents(other);
+VERIFY(e == other);
+  }
+
+constexpr int
+test_all()
+{
+  auto e0 = std::extents();
+  test_ctor(e0);
+
+  auto e1 = std::extents();
+  test_ctor(e1);
+  test_ctor(e1);
+  test_ctor(e1);
+
+  auto e2 = std::extents{1, 2, 3};
+  test_ctor(e2);
+  test_ctor(e2);
+  test_ctor(e2);
+  return true;
+}
+
+int
+main()
+{
+  test_all();
+  static_assert(test_all());
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/23_containers

[PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-04-29 Thread Luc Grosheintz

Implements the parts of layout_left that don't depend on any of the
other layouts.
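
For orientation, a column-major (layout_left) mapping computes i0 + e0 * (i1 + e1 * (i2 + ...)), which is the recursion _LinearIndexLeft expresses. The following is a minimal hand-written sketch of that formula using a loop over std::array; left_index is a hypothetical helper, not the code from the patch.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: evaluates i0 + e0 * (i1 + e1 * (i2 + ...)) by
// starting from the last index and working towards the first.
template<std::size_t N>
constexpr std::size_t
left_index(const std::array<std::size_t, N>& exts,
           const std::array<std::size_t, N>& idx)
{
  std::size_t lin = 0;
  for (std::size_t k = N; k > 0; --k)
    lin = idx[k - 1] + exts[k - 1] * lin;
  return lin;
}

// With extents (3, 5, 7) the strides are (1, 3, 15), so the index
// (1, 2, 3) maps to 1*1 + 2*3 + 3*15 = 52.
static_assert(left_index<3>({3, 5, 7}, {1, 2, 3}) == 52);
```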

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_left): New class.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 179 
 1 file changed, 179 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 39ced1d6301..e05048a5b93 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   namespace __mdspan
   {
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __fwd = 1;
+   for(size_t __i = 0; __i < __r; ++__i)
+ __fwd *= __exts.extent(__i);
+   return __fwd;
+  }
+
+template
+  constexpr typename _Extents::index_type
+  __rev_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __rev = 1;
+   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
+ __rev *= __exts.extent(__i);
+   return __rev;
+  }
+
 template
   auto __build_dextents_type(integer_sequence)
-> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
@@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 explicit extents(_Integrals...) ->
   extents()...>;
 
+  struct layout_left
+  {
+template
+  class mapping;
+  };
+
+  namespace __mdspan
+  {
+template
+  constexpr bool __is_extents = false;
+
+template
+  constexpr bool __is_extents> = true;
+
+template
+struct _LinearIndexLeft
+{
+  template
+   static constexpr typename _Extents::index_type
+   _S_value(const _Extents& __exts, typename _Extents::index_type __idx,
+_Indices... __indices) noexcept
+   {
+ return __idx + __exts.extent(_Count)
+   * _LinearIndexLeft<_Count + 1>::_S_value(__exts, __indices...);
+   }
+
+  template
+   static constexpr typename _Extents::index_type
+   _S_value(const _Extents&) noexcept
+   { return 0; }
+};
+
+template
+  constexpr typename _Extents::index_type
+  __linear_index_left(const _Extents& __exts, _Indices... __indices)
+  {
+   return _LinearIndexLeft<0>::_S_value(__exts, __indices...);
+  }
+
+template
+  consteval bool
+  __is_representable_product(array<_Tp, _Nm> __factors)
+  {
+   size_t __rest = numeric_limits<_IndexType>::max();
+   for(size_t __i = 0; __i < _Nm; ++__i)
+   {
+ if (__factors[__i] == 0)
+   return true;
+ __rest /= _IndexType(__factors[__i]);
+   }
+   return __rest > 0;
+  }
+
+template
+  consteval array
+  __static_extents_array()
+  {
+   array __exts;
+   for(size_t __i = 0; __i < _Extents::rank(); ++__i)
+ __exts[__i] = _Extents::static_extent(__i);
+   return __exts;
+  }
+
+template
+  concept __representable_size = _Extents::rank_dynamic() != 0
+  || __is_representable_product<_IndexType>(
+ __static_extents_array<_Extents>());
+
+template
+  concept __layout_extent = __representable_size<
+   _Extents, typename _Extents::index_type>;
+  }
+
+  template
+class layout_left::mapping
+{
+  static_assert(__mdspan::__layout_extent<_Extents>,
+   "The size of extents_type is not representable as index_type.");
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_left;
+
+  constexpr
+  mapping() noexcept = default;
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  constexpr
+  mapping(const extents_type& __extents) noexcept
+  : _M_extents(__extents)
+  { }
+
+  template
+   requires (is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const mapping<_OExtents>& __other) noexcept
+   : _M_extents(__other.extents())
+   { }
+
+  constexpr mapping&
+  operator=(const mapping&) noexcept = default;
+
+  constexpr const extents_type&
+  extents() const noexcept { return _M_extents; }
+
+  constexpr index_type
+  required_span_size() const noexcept
+  { return __mdspan::__fwd_prod(_M_extents, _M_extents.rank()); }
+
+  template<__mdspan::__valid_index_type... _Indices>
+   requires (sizeof...(_Indices) == extents_type::rank())
+   constexpr index_type
+   operator()(_Indices... __indices) const noexcept
+   {
+ return __mdspan::__linear_index_left(
+   _M_extents, static_cast(__indices)...);
+   }
+
+  static constexpr bool
+  is_always_unique()

[PATCH] c++: UNBOUND_CLASS_TEMPLATE context substitution [PR119981]

2025-04-29 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/15/14?

-- >8 --

In r15-123 and r14-11434 we unconditionally set processing_template_decl
when substituting the context of an UNBOUND_CLASS_TEMPLATE, in order to
handle instantiation of the dependently scoped friend declaration

  template
  template
  friend class A::B;

where the scope A remains dependent after instantiation.  But this
turns out to misbehave for the UNBOUND_CLASS_TEMPLATE in the below
testcase

  g<[]{}>::template fn

since, with the flag set, substituting the args of test3 into the lambda
causes us to defer the substitution and yield a lambda that still looks
dependent, which in turn makes g<[]{}> still dependent and not suitable
for qualified name lookup.

This patch restricts setting processing_template_decl during
UNBOUND_CLASS_TEMPLATE substitution to the case where there are multiple
levels of captured template parameters, as in the friend declaration.
(This means we need to substitute the template parameter list(s) first,
which makes sense since they lexically appear first.)

PR c++/119981
PR c++/119378

gcc/cp/ChangeLog:

* pt.cc (tsubst) : Substitute
into template parameter list first.  When substituting the
context, only set processing_template_decl if there's more
than one level of template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ15.C: New test.
---
 gcc/cp/pt.cc   | 20 +---
 gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C | 17 +
 2 files changed, 30 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e8d342f99f6d..26ed9de430c0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17181,18 +17181,24 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
 
 case UNBOUND_CLASS_TEMPLATE:
   {
-   ++processing_template_decl;
-   tree ctx = tsubst_entering_scope (TYPE_CONTEXT (t), args,
- complain, in_decl);
-   --processing_template_decl;
tree name = TYPE_IDENTIFIER (t);
+   if (name == error_mark_node)
+ return error_mark_node;
+
tree parm_list = DECL_TEMPLATE_PARMS (TYPE_NAME (t));
+   parm_list = tsubst_template_parms (parm_list, args, complain);
+   if (parm_list == error_mark_node)
+ return error_mark_node;
 
-   if (ctx == error_mark_node || name == error_mark_node)
+   if (parm_list && TMPL_PARMS_DEPTH (parm_list) > 1)
+ ++processing_template_decl;
+   tree ctx = tsubst_entering_scope (TYPE_CONTEXT (t), args,
+ complain, in_decl);
+   if (parm_list && TMPL_PARMS_DEPTH (parm_list) > 1)
+ --processing_template_decl;
+   if (ctx == error_mark_node)
  return error_mark_node;
 
-   if (parm_list)
- parm_list = tsubst_template_parms (parm_list, args, complain);
return make_unbound_class_template (ctx, name, parm_list, complain);
   }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C
new file mode 100644
index ..90160a52a6ef
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C
@@ -0,0 +1,17 @@
+// PR c++/119981
+// { dg-do compile { target c++20 } }
+
+template class P>
+struct mp_copy_if{};
+
+template
+struct g {
+  template struct fn{};
+};
+
+template
+void test3() {
+  mp_copy_if::template fn> b;
+}
+
+template void test3();
-- 
2.49.0.459.gf65182a99e

[PATCH v5 09/10] libstdc++: Implement layout_stride from mdspan.

2025-04-29 Thread Luc Grosheintz

Implements the remaining parts of layout_left and layout_right, and all
of layout_stride.
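
As a sketch of the arithmetic behind a strided mapping: the linear index is the sum of each index multiplied by its stride, which is what the fold expression in __linear_index_strides_impl computes. The helper below is written for illustration only; strided_index is a hypothetical name and the std::array interface is an assumption, not the patch's API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: linear index = sum over k of idx[k] * strides[k].
template<std::size_t N>
constexpr std::size_t
strided_index(const std::array<std::size_t, N>& strides,
              const std::array<std::size_t, N>& idx)
{
  std::size_t lin = 0;
  for (std::size_t k = 0; k < N; ++k)
    lin += idx[k] * strides[k];
  return lin;
}

// Row-major strides for extents (3, 5, 7): 2*35 + 1*7 + 4*1 = 81.
static_assert(strided_index<3>({35, 7, 1}, {2, 1, 4}) == 81);
// Column-major strides for the same extents: 2*1 + 1*3 + 4*15 = 65.
static_assert(strided_index<3>({1, 3, 15}, {2, 1, 4}) == 65);
```

Choosing the strides of layout_right or layout_left reproduces those layouts, which is why a layout_stride mapping built from their strides can be used to exercise the converting constructors.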

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_stride): New class.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 227 
 1 file changed, 227 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 583792b5269..344f89c4287 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -336,6 +336,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class mapping;
   };
 
+  struct layout_stride
+  {
+template
+  class mapping;
+  };
+
   namespace __mdspan
   {
 template
@@ -441,6 +447,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_extents(__other.extents())
{ }
 
+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other)
+   : _M_extents(__other.extents())
+   { }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -567,6 +580,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_extents(__other.extents())
{ }
 
+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
+   : _M_extents(__other.extents())
+   { }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -630,6 +650,213 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[no_unique_address]] _Extents _M_extents;
 };
 
+  namespace __mdspan
+  {
+template
+  concept __mapping_of =
+   is_same_v,
+ _Mapping>;
+
+template
+  concept __standardized_mapping = __mapping_of
+  || __mapping_of
+  || __mapping_of;
+
+template
+  concept __mapping_like = requires
+  {
+   requires __is_extents;
+   { M::is_always_strided() } -> same_as;
+   { M::is_always_exhaustive() } -> same_as;
+   { M::is_always_unique() } -> same_as;
+   bool_constant::value;
+   bool_constant::value;
+   bool_constant::value;
+  };
+
+template
+  constexpr typename _Mapping::index_type
+  __offset_impl(const _Mapping& __m, index_sequence<_Counts...>) noexcept
+  {
+   return __m(((void) _Counts, 0)...);
+  }
+
+template
+  constexpr typename _Mapping::index_type
+  __offset(const _Mapping& __m) noexcept
+  {
+   return __offset_impl(__m,
+   make_index_sequence<_Mapping::extents_type::rank()>());
+  }
+
+template
+  constexpr typename _Mapping::index_type
+  __linear_index_strides_impl(const _Mapping& __m,
+ index_sequence<_Counts...>,
+ _Indices... __indices)
+  {
+   return ((__indices * __m.stride(_Counts)) + ... + 0);
+  }
+
+template
+  constexpr typename _Mapping::index_type
+  __linear_index_strides(const _Mapping& __m,
+_Indices... __indices)
+  {
+return __linear_index_strides_impl(__m,
+  make_index_sequence<_Mapping::extents_type::rank()>(), __indices...);
+  }
+  }
+
+  template
+class layout_stride::mapping
+{
+  static_assert(__mdspan::__layout_extent<_Extents>,
+   "The size of extents_type is not representable as index_type.");
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_stride;
+
+  constexpr
+  mapping() noexcept
+  {
+   auto __stride = index_type(1);
+   for(size_t __i = extents_type::rank(); __i > 0; --__i)
+ {
+   _M_strides[__i - 1] = __stride;
+   __stride *= _M_extents.extent(__i - 1);
+ }
+  }
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  template<__mdspan::__valid_index_type _OIndexType>
+   constexpr
+   mapping(const extents_type& __exts,
+   span<_OIndexType, extents_type::rank()> __strides) noexcept
+   : _M_extents(__exts)
+   {
+ for(size_t __i = 0; __i < extents_type::rank(); ++__i)
+   _M_strides[__i] = index_type(as_const(__strides[__i]));
+   }
+
+  template<__mdspan::__valid_index_type _OIndexType>
+   constexpr
+   mapping(const extents_type& __exts,
+   const array<_OIndexType, extents_type::rank()>& __strides)
+   noexcept
+   : mapping(__exts,
+ span(__strides))
+   { }
+
+  template<__mdspan::__mapping_like _StridedMapping>
+   requires (is_constructible_v
+&& _StridedMapping::is_always_unique

[PATCH 1/1] Fix BZ 119317: named loops (C2y) with debug info

2025-04-29 Thread Christopher Bazley

Named loops (C2y) could not previously be compiled with
-O1 and -ggdb2 or higher because the label preceding
a loop (or switch) could not be found when using such
command lines.

This could be observed by compiling
gcc/gcc/testsuite/gcc.dg/c2y-named-loops-1.c with
the provoking command line (or any minimal example such
as that cited in the bug report).

The fix was simply to ignore the tree nodes inserted
for debugging information.
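
The effect of the fix can be modelled with a toy scan over statement kinds. This is a simplified sketch, not the code in c-decl.cc: the Kind enum, the count_loop_names helper and the backwards scan direction are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Toy statement kinds standing in for GCC tree codes (hypothetical).
enum class Kind { Label, CaseLabel, DebugBegin, Other };

// Count the labels that directly precede a loop, scanning backwards from
// the statement just before it.  Case labels are always skipped; debug
// markers are skipped only when skip_debug is true, which models the fix.
template<std::size_t N>
constexpr std::size_t
count_loop_names(const std::array<Kind, N>& before, bool skip_debug)
{
  std::size_t names = 0;
  for (std::size_t i = N; i > 0; --i)
    {
      const Kind k = before[i - 1];
      if (k == Kind::Label)
        ++names;
      else if (k == Kind::CaseLabel
               || (skip_debug && k == Kind::DebugBegin))
        continue;
      else
        break;
    }
  return names;
}

// A debug marker sits between the label and the loop.
constexpr std::array<Kind, 3> stmts{Kind::Other, Kind::Label,
                                    Kind::DebugBegin};
static_assert(count_loop_names(stmts, false) == 0); // label missed
static_assert(count_loop_names(stmts, true) == 1);  // label found
```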

Base commit is ae4c22ab05501940e345ee799be3aa36ffa7269a

gcc/c/ChangeLog:

* c-decl.cc (c_get_loop_names): Do not prematurely
end the search for a label that names a loop or
switch statement upon encountering a DEBUG_BEGIN_STMT.
Instead, ignore any instances of DEBUG_BEGIN_STMT.
---
 gcc/c/c-decl.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index c778c7febfa..468b5e90a2f 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -13898,7 +13898,8 @@ c_get_loop_names (tree before_labels, bool switch_p, 
tree *last_p)
  ++ret;
}
}
-  else if (TREE_CODE (stmt) != CASE_LABEL_EXPR)
+  else if (TREE_CODE (stmt) != CASE_LABEL_EXPR &&
+  TREE_CODE (stmt) != DEBUG_BEGIN_STMT)
break;
 }
   if (last)
-- 
2.43.0

[PATCH v5 06/10] libstdc++: Add tests for layout_left.

2025-04-29 Thread Luc Grosheintz

Implements a suite of tests for the currently implemented parts of
layout_left. The individual tests are templated over the layout type, to
allow reuse as more layouts are added.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: New test.
* testsuite/23_containers/mdspan/layouts/ctors.cc: New test.
* testsuite/23_containers/mdspan/layouts/mapping.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  21 +
 .../23_containers/mdspan/layouts/ctors.cc | 132 +++
 .../23_containers/mdspan/layouts/mapping.cc   | 371 ++
 3 files changed, 524 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
new file mode 100644
index 000..2fd27d0bd35
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++23 } }
+#include
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  struct A
+  {
+static constexpr size_t n = (size_t(1) << 7) - 1;
+
+typename Layout::mapping> m0;
+typename Layout::mapping> m1;
+typename Layout::mapping> m2;
+
+using extents_type = std::extents;
+typename Layout::mapping m3; // { dg-error "required from" }
+  };
+
+A a_left; // { dg-error "required from" }
+
+// { dg-prune-output "not representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
new file mode 100644
index 000..c7cf5501628
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -0,0 +1,132 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+// ctor: mapping(const extents&)
+namespace from_extents
+{
+  template
+constexpr bool
+is_implicit()
+{
+  return std::is_nothrow_constructible_v
+&& std::is_convertible_v;
+}
+
+  template
+constexpr void
+test_implicit()
+{
+  static_assert(is_implicit>,
+   std::extents>());
+
+  static_assert(!is_implicit<
+ typename Layout::mapping>,
+ std::extents>());
+}
+
+  template
+constexpr void
+test_ctor_extents(Extents exts)
+{
+  typename Layout::mapping m(exts);
+  VERIFY(m.extents() == exts);
+}
+
+  template
+constexpr bool
+test_ctor_extents_all()
+{
+  test_ctor_extents(std::extents());
+  test_ctor_extents(std::extents());
+  test_ctor_extents(std::extents(3));
+  return true;
+}
+
+  template
+constexpr void
+test_all()
+{
+  static_assert(!std::is_constructible_v<
+ typename Layout::mapping>,
+ std::extents>);
+
+  static_assert(!std::is_constructible_v<
+ typename Layout::mapping>,
+ std::extents>);
+
+  test_implicit();
+  test_ctor_extents_all();
+  static_assert(test_ctor_extents_all());
+}
+}
+
+// ctor: mapping(mapping)
+namespace from_same_layout
+{
+  template
+constexpr bool
+test_ctor_mapping()
+{
+  std::extents e;
+  typename Layout::mapping> m1(e);
+  typename Layout::mapping> m2(m1);
+
+  VERIFY(m1.extents() == m2.extents());
+  return true;
+}
+
+  template
+constexpr void
+test_constructible()
+{
+  static_assert(!std::is_constructible_v<
+ typename Layout::mapping>,
+ typename Layout::mapping>>);
+
+  static_assert(std::is_nothrow_constructible_v<
+ typename Layout::mapping>,
+ typename Layout::mapping>>);
+}
+
+  template
+constexpr void
+test_convertible()
+{
+  static_assert(std::is_convertible_v<
+ typename Layout::mapping>,
+ typename Layout::mapping>>);
+
+  static_assert(!std::is_convertible_v<
+ typename Layout::mapping>,
+ typename Layout::mapping>>);
+}
+
+  template
+constexpr void
+test_all()
+{
+  test_ctor_mapping();
+  static_assert(test_ctor_mapping());
+  test_constructible();
+  test_convertible();
+}
+}
+
+template
+constexpr void
+test_all()
+{
+  from_extents::test_all();
+  from_same_layout::test_all();
+}
+
+int
+main()
+{
+  test_all();
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
new file mode 100644
index 000..1d7d4c4e1c7
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
@@

[PATCH v5 10/10] libstdc++: Add tests for layout_stride.

2025-04-29 Thread Luc Grosheintz

Implements the tests for layout_stride and for the features of the other
two layouts that depend on layout_stride.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_stride.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add test for
layout_stride and the interaction with other layouts.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.
* testsuite/23_containers/mdspan/layouts/stride.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |   1 +
 .../23_containers/mdspan/layouts/ctors.cc |  79 
 .../23_containers/mdspan/layouts/mapping.cc   |  42 +-
 .../23_containers/mdspan/layouts/stride.cc| 359 ++
 4 files changed, 480 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
index fdebda8bd06..f9fa6212d4d 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -18,5 +18,6 @@ template
 
 A a_left; // { dg-error "required from" }
 A a_right;   // { dg-error "required from" }
+A a_stride; // { dg-error "required from" }
 
 // { dg-prune-output "not representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index 5c54b440083..fc479ae596e 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -202,12 +202,91 @@ namespace from_left_or_right
 }
 }
 
+// ctor: mapping(layout_stride::mapping)
+namespace from_stride
+{
+  template
+constexpr void
+test_constructible()
+{
+  static_assert(!std::is_constructible_v<
+ typename Layout::mapping>,
+ std::layout_stride::mapping>>);
+
+  static_assert(!std::is_constructible_v<
+ typename Layout::mapping>,
+ std::layout_stride::mapping>>);
+
+  static_assert(std::is_constructible_v<
+ typename Layout::mapping>,
+ std::layout_stride::mapping>>);
+}
+
+  template
+constexpr void
+test_convertible()
+{
+  static_assert(!std::is_convertible_v<
+ std::layout_stride::mapping>,
+ typename Layout::mapping>>);
+
+  static_assert(std::is_convertible_v<
+ std::layout_stride::mapping>,
+ typename Layout::mapping>>);
+}
+
+  template
+constexpr std::array
+strides(Mapping m)
+{
+  std::array 
s;
+  if constexpr (Mapping::extents_type::rank() > 0)
+   for(size_t i = 0; i < Mapping::extents_type::rank(); ++i)
+ s[i] = m.stride(i);
+  return s;
+}
+
+  template
+constexpr bool
+test_ctor_from_stride(Extents exts)
+{
+  typename Layout::mapping m(exts);
+  std::layout_stride::mapping m1(exts, strides(m));
+  typename Layout::mapping m2(m1);
+  VERIFY(m1.extents() == m2.extents());
+  return true;
+}
+
+  template
+constexpr bool
+test_ctor_from_stride_all()
+{
+  test_ctor_from_stride(std::extents{});
+  test_ctor_from_stride(std::extents{});
+  test_ctor_from_stride(std::extents{3, 5, 7});
+  return true;
+}
+
+  template
+constexpr void
+test_all()
+{
+  test_constructible();
+  test_convertible();
+
+  test_ctor_from_stride_all();
+  static_assert(test_ctor_from_stride_all());
+}
+}
+
 template
 constexpr void
 test_all()
 {
   from_extents::test_all();
   from_same_layout::test_all();
+  from_stride::test_all();
 }
 
 int
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
index f59dd75d01f..affa1be56d0 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
@@ -11,6 +11,7 @@ template
   {
 using M = typename Layout::mapping;
 static_assert(std::__mdspan::__is_extents);
+static_assert(std::__mdspan::__mapping_like);
 static_assert(std::copyable);
 static_assert(std::is_nothrow_move_constructible_v);
 static_assert(std::is_nothrow_move_assignable_v);
@@ -28,6 +29,8 @@ template
 
 static_assert(M::is_always_unique() && M::is_unique());
 static_assert(M::is_always_strided() && M::is_strided());
+if constexpr (!std::is_same_v)
+  static_assert(M::is_always_exhaustive() && M::is_exhaustive());
 return true;
   }
 
@@ -94,6 +97,20 @@ template
 { return Mapping(exts); }
   };
 
+template<>
+  struct Mapping3dFactory
+  {
+using Mapping = std::layout_stride::mapping>;
+
+static conste

Re: [PATCH] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-29 Thread Kito Cheng

Seems like the testcase will fail
https://github.com/ewlu/gcc-precommit-ci/issues/3278#issuecomment-2837806049

> diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c 
> b/gcc/testsuite/gcc.target/riscv/predef-19.c
> index 2b90702192b..b29e60f9b99 100644
> --- a/gcc/testsuite/gcc.target/riscv/predef-19.c
> +++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow 
> -misa-spec=2.2" } */
> +/* { dg-options "-O2 -march=rv64i_zve32x -mabi=lp64d -mcmodel=medlow 
> -misa-spec=2.2" } */

I haven't looked at the log yet, but I guess it is because the ABI is
LP64D without the D extension, since you use I rather than GC here.

>
>  int main () {
>
> @@ -15,28 +15,12 @@ int main () {
>  #error "__riscv_i"
>  #endif
>
> -#if !defined(__riscv_c)
> -#error "__riscv_c"
> -#endif
> -
>  #if defined(__riscv_e)
>  #error "__riscv_e"
>  #endif
>
> -#if !defined(__riscv_a)
> -#error "__riscv_a"
> -#endif
> -
> -#if !defined(__riscv_m)
> -#error "__riscv_m"
> -#endif
> -
> -#if !defined(__riscv_f)
> -#error "__riscv_f"
> -#endif
> -
> -#if !defined(__riscv_d)
> -#error "__riscv_d"
> +#if !defined(__riscv_zicsr)
> +#error "__riscv_zicsr"
>  #endif
>
>  #if defined(__riscv_v)
> --
> 2.49.0
>

[PATCH v6 1/2] RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.

2025-04-29 Thread Kito Cheng

From: yulong 

This version is the same as v5, but rebased onto trunk and sent out to trigger CI.

This commit adds intrinsics support for Xsfvcp extension.
Diff with V4: Delete the sifive_vector.h file.

Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

gcc/ChangeLog:

* config/riscv/constraints.md (Ou01): New constraint.
(Ou02): Ditto.
* config/riscv/generic-vector-ooo.md (vec_sf_vcp): New reservation.
* config/riscv/genrvv-type-indexer.cc (main): New type.
* config/riscv/riscv-c.cc (riscv_pragma_intrinsic): Add xsfvcp strings.
* config/riscv/riscv-vector-builtins-shapes.cc (struct sf_vcix_se_def): 
New function.
(struct sf_vcix_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_X2_U_OPS): New 
type.
(DEF_RVV_X2_WU_OPS): Ditto.
(vuint8mf8_t): Ditto.
(vuint8mf4_t): Ditto.
(vuint8mf2_t): Ditto.
(vuint8m1_t): Ditto.
(vuint8m2_t): Ditto.
(vuint8m4_t): Ditto.
(vuint16mf4_t): Ditto.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_X2_U_OPS): New 
builtins def.
(DEF_RVV_X2_WU_OPS): Ditto.
(rvv_arg_type_info::get_scalar_float_type): Ditto.
(function_instance::modifies_global_state_p): Ditto.
* config/riscv/riscv-vector-builtins.def (v_x): New base type.
(i): Ditto.
(v_i): Ditto.
(xv): Ditto.
(iv): Ditto.
(fv): Ditto.
(vvv): Ditto.
(xvv): Ditto.
(ivv): Ditto.
(fvv): Ditto.
(vvw): Ditto.
(xvw): Ditto.
(ivw): Ditto.
(fvw): Ditto.
(v_vv): Ditto.
(v_xv): Ditto.
(v_iv): Ditto.
(v_fv): Ditto.
(v_vvv): Ditto.
(v_xvv): Ditto.
(v_ivv): Ditto.
(v_fvv): Ditto.
(v_vvw): Ditto.
(v_xvw): Ditto.
(v_ivw): Ditto.
(v_fvw): Ditto.
(x2_vector): Ditto.
(scalar_float): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): New 
extension.
(required_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct rvv_arg_type_info): Ditto.
(struct function_group_info): Ditto.
* config/riscv/riscv.md: New attr.
* config/riscv/sifive-vector-builtins-bases.cc (class sf_vc): New 
function.
(BASE): New base_name.
* config/riscv/sifive-vector-builtins-bases.h: New function_base.
* config/riscv/sifive-vector-builtins-functions.def 
(REQUIRED_EXTENSIONS): New intrinsics def.
(sf_vc): Ditto.
* config/riscv/sifive-vector.md (@sf_vc_x_se): New RTL mode.
(@sf_vc_v_x_se): Ditto.
(@sf_vc_v_x): Ditto.
(@sf_vc_i_se): Ditto.
(@sf_vc_v_i_se): Ditto.
(@sf_vc_v_i): Ditto.
(@sf_vc_vv_se): Ditto.
(@sf_vc_v_vv_se): Ditto.
(@sf_vc_v_vv): Ditto.
(@sf_vc_xv_se): Ditto.
(@sf_vc_v_xv_se): Ditto.
(@sf_vc_v_xv): Ditto.
(@sf_vc_iv_se): Ditto.
(@sf_vc_v_iv_se): Ditto.
(@sf_vc_v_iv): Ditto.
(@sf_vc_fv_se): Ditto.
(@sf_vc_v_fv_se): Ditto.
(@sf_vc_v_fv): Ditto.
(@sf_vc_vvv_se): Ditto.
(@sf_vc_v_vvv_se): Ditto.
(@sf_vc_v_vvv): Ditto.
(@sf_vc_xvv_se): Ditto.
(@sf_vc_v_xvv_se): Ditto.
(@sf_vc_v_xvv): Ditto.
(@sf_vc_ivv_se): Ditto.
(@sf_vc_v_ivv_se): Ditto.
(@sf_vc_v_ivv): Ditto.
(@sf_vc_fvv_se): Ditto.
(@sf_vc_v_fvv_se): Ditto.
(@sf_vc_v_fvv): Ditto.
(@sf_vc_vvw_se): Ditto.
(@sf_vc_v_vvw_se): Ditto.
(@sf_vc_v_vvw): Ditto.
(@sf_vc_xvw_se): Ditto.
(@sf_vc_v_xvw_se): Ditto.
(@sf_vc_v_xvw): Ditto.
(@sf_vc_ivw_se): Ditto.
(@sf_vc_v_ivw_se): Ditto.
(@sf_vc_v_ivw): Ditto.
(@sf_vc_fvw_se): Ditto.
(@sf_vc_v_fvw_se): Ditto.
(@sf_vc_v_fvw): Ditto.
* config/riscv/vector-iterators.md: New iterator.
* config/riscv/vector.md: New vtype.

---
 gcc/config/riscv/constraints.md   |  10 +
 gcc/config/riscv/generic-vector-ooo.md|   4 +
 gcc/config/riscv/genrvv-type-indexer.cc   |   9 +
 gcc/config/riscv/riscv-c.cc   |   3 +-
 .../riscv/riscv-vector-builtins-shapes.cc |  48 +
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def |  40 +
 gcc/config/riscv/riscv-vector-builtins.cc | 362 +++-
 gcc/config/riscv/riscv-vector-builtins.def|  30 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   8 +
 gc

Re: [PATCH v5 02/10] libstdc++: Add header mdspan to the build-system.

2025-04-29 Thread Jonathan Wakely

On Tue, 29 Apr 2025 at 13:59, Luc Grosheintz  wrote:
>
> Creates a nearly empty header mdspan and adds it to the build-system and
> Doxygen config file.
>
> libstdc++-v3/ChangeLog:
>
> * doc/doxygen/user.cfg.in: Add .
> * include/Makefile.am: Ditto.
> * include/Makefile.in: Ditto.
> * include/precompiled/stdc++.h: Ditto.
> * include/std/mdspan: New file.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
>  libstdc++-v3/include/Makefile.am  |  1 +
>  libstdc++-v3/include/Makefile.in  |  1 +
>  libstdc++-v3/include/precompiled/stdc++.h |  1 +
>  libstdc++-v3/include/std/mdspan   | 48 +++
>  5 files changed, 52 insertions(+)
>  create mode 100644 libstdc++-v3/include/std/mdspan
>
> diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
> b/libstdc++-v3/doc/doxygen/user.cfg.in
> index 19ae67a67ba..e926c6707f6 100644
> --- a/libstdc++-v3/doc/doxygen/user.cfg.in
> +++ b/libstdc++-v3/doc/doxygen/user.cfg.in
> @@ -880,6 +880,7 @@ INPUT  = 
> @srcdir@/doc/doxygen/doxygroups.cc \
>   include/list \
>   include/locale \
>   include/map \
> + include/mdspan \
>   include/memory \
>   include/memory_resource \
>   include/mutex \
> diff --git a/libstdc++-v3/include/Makefile.am 
> b/libstdc++-v3/include/Makefile.am
> index 537774c2668..1140fa0dffd 100644
> --- a/libstdc++-v3/include/Makefile.am
> +++ b/libstdc++-v3/include/Makefile.am
> @@ -38,6 +38,7 @@ std_freestanding = \
> ${std_srcdir}/generator \
> ${std_srcdir}/iterator \
> ${std_srcdir}/limits \
> +   ${std_srcdir}/mdspan \
> ${std_srcdir}/memory \
> ${std_srcdir}/numbers \
> ${std_srcdir}/numeric \
> diff --git a/libstdc++-v3/include/Makefile.in 
> b/libstdc++-v3/include/Makefile.in
> index 7b96b2207f8..c96e981acd6 100644
> --- a/libstdc++-v3/include/Makefile.in
> +++ b/libstdc++-v3/include/Makefile.in
> @@ -396,6 +396,7 @@ std_freestanding = \
> ${std_srcdir}/generator \
> ${std_srcdir}/iterator \
> ${std_srcdir}/limits \
> +   ${std_srcdir}/mdspan \
> ${std_srcdir}/memory \
> ${std_srcdir}/numbers \
> ${std_srcdir}/numeric \
> diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
> b/libstdc++-v3/include/precompiled/stdc++.h
> index f4b312d9e47..e7d89c92704 100644
> --- a/libstdc++-v3/include/precompiled/stdc++.h
> +++ b/libstdc++-v3/include/precompiled/stdc++.h
> @@ -228,6 +228,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
> new file mode 100644
> index 000..4094a416d1e
> --- /dev/null
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -0,0 +1,48 @@
> +//  -*- C++ -*-
> +
> +// Copyright (C) 2025 Free Software Foundation, Inc.

I've just noticed that this file claims to be copyright FSF, but if
you're contributing under the https://gcc.gnu.org/dco.html terms
rather than via a copyright assignment to the FSF, then that's
incorrect.

Please see the  header for the DCO-compatible way to mention
that the header is covered by copyright without being overly precise.

Otherwise these patches look good and I'll start pushing them this
week - thanks!


> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.
> +
> +// You should have received a copy of the GNU General Public License and
> +// a copy of the GCC Runtime Library Exception along with this program;
> +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +// .
> +
> +/** @file mdspan
> + *  This is a Standard C++ Library header.
> + */
> +
> +#ifndef _GLIBCXX_MDSPAN
> +#define _GLIBCXX_MDSPAN 1
> +
> +#ifdef _GLIBCXX_SYSHDR
> +#pragma GCC system_header
> +#endif
> +
> +#define __glibcxx_want_mdspan
> +#include 
> +
> +#ifdef __glibcxx_mdspan
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> +_GLIBCXX_END_NAMESPACE_VERSION
> +}
> +#endif
> +#endif
> --
> 2.49.0
>

[PATCH v6 2/2] RISC-V: Add intrinsics testcases for SiFive Xsfvcp extensions.

2025-04-29 Thread Kito Cheng

From: yulong 

This commit adds testcases for Xsfvcp.

Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xsfvector/sf_vc_f.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vc_i.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vc_v.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vc_x.c: New test.
---
 .../gcc.target/riscv/rvv/xsfvector/sf_vc_f.c  |  88 +++
 .../gcc.target/riscv/rvv/xsfvector/sf_vc_i.c  | 132 +
 .../gcc.target/riscv/rvv/xsfvector/sf_vc_v.c  | 107 ++
 .../gcc.target/riscv/rvv/xsfvector/sf_vc_x.c  | 138 ++
 4 files changed, 465 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_v.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_x.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
new file mode 100644
index 000..7667e56a4c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
@@ -0,0 +1,88 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_xsfvcp -mabi=lp64d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sifive_vector.h"
+
+typedef _Float16 float16_t;
+typedef float float32_t;
+typedef double float64_t;
+
+/*
+** test_sf_vc_v_fv_u16mf4:
+** ...
+** vsetivli\s+zero+,0+,e16+,mf4,ta,ma+
+** sf\.vc\.v\.fv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vuint16mf4_t test_sf_vc_v_fv_u16mf4(vuint16mf4_t vs2, float16_t fs1, size_t 
vl) {
+return __riscv_sf_vc_v_fv_u16mf4(1, vs2, fs1, vl);
+}
+
+/*
+** test_sf_vc_v_fv_se_u16mf4:
+** ...
+** vsetivli\s+zero+,0+,e16+,mf4,ta,ma+
+** sf\.vc\.v\.fv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vuint16mf4_t test_sf_vc_v_fv_se_u16mf4(vuint16mf4_t vs2, float16_t fs1, size_t 
vl) {
+return __riscv_sf_vc_v_fv_se_u16mf4(1, vs2, fs1, vl);
+}
+
+/*
+** test_sf_vc_fv_se_u16mf2:
+** ...
+** vsetivli\s+zero+,0+,e16+,mf2,ta,ma+
+** sf\.vc\.fv\t[0-9]+,[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+void test_sf_vc_fv_se_u16mf2(vuint16mf2_t vs2, float16_t fs1, size_t vl) {
+__riscv_sf_vc_fv_se_u16mf2(1, 3, vs2, fs1, vl);
+}
+
+/*
+** test_sf_vc_v_fvv_u16m1:
+** ...
+** vsetivli\s+zero+,0+,e16+,m1,ta,ma+
+** sf\.vc\.v\.fvv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vuint16m1_t test_sf_vc_v_fvv_u16m1(vuint16m1_t vd, vuint16m1_t vs2, float16_t 
fs1, size_t vl) {
+return __riscv_sf_vc_v_fvv_u16m1(1, vd, vs2, fs1, vl);
+}
+
+/*
+** test_sf_vc_v_fvv_se_u16m1:
+** ...
+** vsetivli\s+zero+,0+,e16+,m1,ta,ma+
+** sf\.vc\.v\.fvv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vuint16m1_t test_sf_vc_v_fvv_se_u16m1(vuint16m1_t vd, vuint16m1_t vs2, 
float16_t fs1, size_t vl) {
+return __riscv_sf_vc_v_fvv_se_u16m1(1, vd, vs2, fs1, vl);
+}
+
+/*
+** test_sf_vc_fvv_se_u32m8:
+** ...
+** vsetivli\s+zero+,0+,e32+,m8,ta,ma+
+** sf\.vc\.fvv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+void test_sf_vc_fvv_se_u32m8(vuint32m8_t vd, vuint32m8_t vs2, float32_t fs1, 
size_t vl) {
+__riscv_sf_vc_fvv_se_u32m8(1, vd, vs2, fs1, vl);
+}
+
+
+/*
+** test_sf_vc_fvw_se_u32m2:
+** ...
+** vsetivli\s+zero+,0+,e32+,m2,ta,ma+
+** sf\.vc\.fvw\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+void test_sf_vc_fvw_se_u32m2(vuint64m4_t vd, vuint32m2_t vs2, float32_t fs1, 
size_t vl) {
+__riscv_sf_vc_fvw_se_u32m2(1, vd, vs2, fs1, vl);
+}
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
new file mode 100644
index 000..5528cc52ac7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
@@ -0,0 +1,132 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_xsfvcp -mabi=lp64d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sifive_vector.h"
+
+
+/*
+** test_sf_vc_v_i_u16m4:
+** ...
+** vsetivli\s+zero+,0+,e16+,m4,ta,ma+
+** sf\.vc\.v\.i\t[0-9]+,[0-9]+,v[0-9]+,[0-9]+
+** ...
+*/
+vuint16m4_t test_sf_vc_v_i_u16m4(size_t vl) {
+return __riscv_sf_vc_v_i_u16m4(1, 2, 4, vl);
+}
+
+/*
+** test_sf_vc_v_i_se_u16m4:
+** ...
+** vsetivli\s+zero+,0+,e16+,m4,ta,ma+
+** sf\.vc\.v\.i\t[0-9]+,[0-9]+,v[0-9]+,[0-9]+
+** ...
+*/
+vuint16m4_t test_sf_vc_v_i_se_u16m4(size_t vl) {
+return __riscv_sf_vc_v_i_se_u16m4(1, 2, 4, vl);
+}
+
+/*
+** test_sf_vc_i_se_u16mf4:
+** ...
+** vsetivli\s+zero+,0+,e16+,mf4,ta,ma+
+** sf\.vc\.i\t[0-9]+,[0-9]+,[0-9]+,[0-9]+
+** ...
+*/
+void test_sf_vc_i_se_u16mf4(size_t vl) {
+__riscv_sf_vc_i_se_u16mf4(1, 2, 3, 4, vl);
+}
+
+/*
+** test_sf_vc_v_iv_u32m2:
+** ...
+** vsetivli\s+zero+,0+,e32+,m2,ta,ma+
+** sf\.vc\.v\.iv\t[0-9]+,v[0-9]+,v[0-9]+,[0-9]+
+** ...
+*/
+vuint32m2_t test_sf_vc_v_iv_u32m2(vuint32m2_t vs2, size_t vl) {
+re

RE: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-29 Thread Li, Pan2

I see, let the vec_dup enter rtx_cost again so that its cost is added to the
total for vmv.  I gave it a try, for example with the change below:

+   switch (rcode)
+   {
+ case VEC_DUPLICATE:
+   *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+   break;
+ case PLUS:
+   {
+   rtx op_0 = XEXP (x, 0); 
+   rtx op_1 = XEXP (x, 1);
+   if (GET_CODE (op_0) == VEC_DUPLICATE
+   || GET_CODE (op_1) == VEC_DUPLICATE)
+ *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+   else
+ *total = COSTS_N_INSNS (1);
+   break;
+   }
+ default:
+   *total = COSTS_N_INSNS (1);
+   break;
+   }
+
+   return true;

For case_0 with GR2VR set to 0, late-combine gives the output below:
trying to combine definition of r135 in:
   11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
into:
   18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
      REG_DEAD r146:RVVM1SI
successfully matched this instruction to *add_vx_rvvm1si:
(set (reg:RVVM1SI 147 [ vect__6.8_16 ])
    (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0))
        (reg:RVVM1SI 146)))
original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 (weighted: 64.727273); rejecting replacement

For case_0 with GR2VR set to 1, late-combine gives the output below:
trying to combine definition of r135 in:
   11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
into:
   18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
      REG_DEAD r146:RVVM1SI
successfully matched this instruction to *add_vx_rvvm1si:
(set (reg:RVVM1SI 147 [ vect__6.8_16 ])
    (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0))
        (reg:RVVM1SI 146)))
original cost = 12 + 4 (weighted: 43.043637), replacement cost = 12 (weighted: 97.090910); rejecting replacement

For case_0 with GR2VR set to 2, late-combine gives the output below:
trying to combine definition of r135 in:
   11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
into:
   18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
      REG_DEAD r146:RVVM1SI
successfully matched this instruction to *add_vx_rvvm1si:
(set (reg:RVVM1SI 147 [ vect__6.8_16 ])
    (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0))
        (reg:RVVM1SI 146)))
original cost = 16 + 4 (weighted: 46.603637), replacement cost = 16 (weighted: 129.454546); rejecting replacement

The vadd v, vec_dup(x) seems to have the same cost as vec_dup here.  I am also
confused about how we calculate the cost of vadd v, vec_dup(x): can we just set
its cost to that of vadd.vx, given that we have a define_insn_and_split to match
the pattern and emit the vadd.vx directly?  That would also match the expression
we mentioned, vadd.vv + vec_dup == vadd.vx.
Please correct me if I have misunderstood.
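
To make the observation above concrete, here is a minimal toy model of the
switch from the experiment.  It is only a sketch: toy_code, toy_vector_cost and
the starting total of 8 are hypothetical names and values chosen for
illustration, not GCC code.  The one thing taken from GCC is the scaling of
COSTS_N_INSNS, where one instruction costs 4 units.  It shows that the
VEC_DUPLICATE branch and the PLUS-with-VEC_DUPLICATE branch add exactly the
same amount, which is why the two costs come out equal.

```cpp
// Toy stand-ins for the RTX codes handled by the switch in the experiment.
enum toy_code { TOY_REG, TOY_VEC_DUPLICATE, TOY_PLUS };

// Same scaling as GCC's COSTS_N_INSNS: one instruction costs 4 units.
constexpr int
toy_costs_n_insns (int n)
{ return n * 4; }

// Mirrors the structure of the experimental switch.  `total` is the cost
// accumulated so far (a hypothetical input); `op0` and `op1` are the codes
// of the two PLUS operands and are ignored for the other codes.
constexpr int
toy_vector_cost (toy_code code, toy_code op0, toy_code op1,
                 int gr2vr, int total)
{
  switch (code)
    {
    case TOY_VEC_DUPLICATE:
      return total + gr2vr * toy_costs_n_insns (1);
    case TOY_PLUS:
      if (op0 == TOY_VEC_DUPLICATE || op1 == TOY_VEC_DUPLICATE)
        return total + gr2vr * toy_costs_n_insns (1);
      return toy_costs_n_insns (1);
    default:
      return toy_costs_n_insns (1);
    }
}
```

With any fixed starting total, a bare vec_dup and a plus that contains a
vec_dup get the same result, so in this model the combined form is never
cheaper than the vec_dup alone.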

- cut line -

> Could you give a short example where that doesn't work so I can have a quick 
> look myself?

Sorry for the confusion: it is not that something doesn't work, I would just
like to clean this up before the next step, as it is quite complicated up to a
point.
I use the example below for testing: case_0 has only vadd.vv in the loop,
while case_1 has both vmv and vadd.vv in the loop.

#include <stdint.h>

#define T int32_t
#define OP +

void
test_vx_binary_case_0 (T * restrict out, T * restrict in, T x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = in[i] OP x;
}

void
test_vx_binary_case_1 (T * restrict out, T * restrict a, T b, unsigned n)
{
  unsigned k = 0;

  T xb = b + 3;

  while (k < n) {
    out[k + 0] = a[k + 0] OP xb;
    out[k + 1] = a[k + 1] OP xb;
    out[k + 2] = a[k + 2] OP xb;
    out[k + 3] = a[k + 3] OP xb;
    k += 4;

    xb = xb ^ 0x3f;

    out[k + 0] = a[k + 0] OP xb;
    out[k + 1] = a[k + 1] OP xb;
    out[k + 2] = a[k + 2] OP xb;
    out[k + 3] = a[k + 3] OP xb;
    k += 4;
  }
}

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, April 29, 2025 2:31 PM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken ; Liu, Hongtao ; Robin Dapp 

Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

> But this is not good enough here, if my understanding is correct.
> As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR,

But it should.  Can't we do something like:

  if (riscv_v_ext_mode_p (mode))
{

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  wrote:
> > > >
> > > > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to return
> > > > true, all integer arguments smaller than int are passed as int:
> > > >
> > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > extern int baz (char c1);
> > > >
> > > > int
> > > > foo (char c1)
> > > > {
> > > >   return baz (c1);
> > > > }
> > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > .file "x.c"
> > > > .text
> > > > .p2align 4
> > > > .globl foo
> > > > .type foo, @function
> > > > foo:
> > > > .LFB0:
> > > > .cfi_startproc
> > > > movsbl 4(%esp), %eax
> > > > movl %eax, 4(%esp)
> > > > jmp baz
> > > > .cfi_endproc
> > > > .LFE0:
> > > > .size foo, .-foo
> > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > .section .note.GNU-stack,"",@progbits
> > > > [hjl@gnu-tgl-3 pr14907]$
> > > >
> > > > But integer promotion:
> > > >
> > > > movsbl 4(%esp), %eax
> > > > movl %eax, 4(%esp)
> > > >
> > > > isn't necessary if incoming arguments are copied to outgoing arguments
> > > > directly.
> > > >
> > > > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, 
> > > > defaulting
> > > > to return nullptr.  If the new target hook returns non-nullptr, use it 
> > > > to
> > > > get the outgoing small integer argument.  The x86 target hook returns 
> > > > the
> > > > value of the corresponding incoming argument as int if it can be used as
> > > > the outgoing argument.  If callee is a global function, we always 
> > > > properly
> > > > extend the incoming small integer arguments in callee.  If callee is a
> > > > local function, since DECL_ARG_TYPE has the original small integer type,
> > > > we will extend the incoming small integer arguments in callee if needed.
> > > > It is safe only if
> > > >
> > > > 1. Caller and callee are not nested functions.
> > > > 2. Caller and callee use the same ABI.
> > >
> > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > should apply to all of them, no?
> >
> > When the arguments are passed in different registers in different ABIs,
> > we have to copy them anyway.
>
> But optimization can elide copies easily, but not easily elide
> sign-/zero-extensions.

What I meant was that caller and callee have different ABIs.
Optimizer can't elide copies since incoming arguments and outgoing
arguments are in different registers.  They have to be moved.

> > >
> > > > 3. The incoming argument and the outgoing argument are in the same
> > > > location.
> > >
> > > Why's that?  Can't we move them but still elide the sign-/zero-extension?
> >
> > If they aren't in the same locations, we have to move them anyway.
> > This patch tries to avoid unnecessary moves of incoming arguments to
> > outgoing arguments.
>
> That's not exactly how you presented it, but you conveniently used
> x86 stack argument passing.  That might be difficult to elide, but is
> also uncommon for "small integer types" - does the same issue not
> apply to other arguments passed on the stack as well?

It applies to arguments passed both in registers and on the stack.  It is an
issue only for small integer types due to sign-/zero-extensions at call sites.
My patch elides sign-/zero-extensions when incoming arguments and outgoing
arguments are unchanged and in exactly the same location, in a register or on
the stack.
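
As a small sanity check of the underlying idea, the sketch below models
re-promotion of a slot that already holds a promoted value.  The helper names
(promote_schar, repromote_slot, repromotion_is_noop) are hypothetical and the
code needs C++14 or later.  It assumes the incoming int slot was properly
sign-extended by whoever called us, which is exactly the condition under which
the extension at the call site is redundant.

```cpp
// Promotion applied when a signed char is passed as an int argument.
constexpr int
promote_schar (signed char c)
{ return static_cast<int> (c); }

// What re-promoting at the call site amounts to: convert the incoming
// int slot back to signed char and sign-extend it again.
constexpr int
repromote_slot (int slot)
{ return promote_schar (static_cast<signed char> (slot)); }

// Checks every signed char value: re-promotion never changes a slot
// that already holds a promoted value.
constexpr bool
repromotion_is_noop ()
{
  for (int v = -128; v <= 127; ++v)
    if (repromote_slot (promote_schar (static_cast<signed char> (v))) != v)
      return false;
  return true;
}
```

The check says nothing about slots whose upper bits were not produced by a
proper extension, which is why the conditions listed in the patch description
still matter.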

>
> > > > 4. The incoming argument is unchanged before call expansion.
> > >
> > > Obviously, but then IMO this reveals an issue with the design of a target 
> > > hook
> > > returning the argument register - it returns a place rather than a
> > > value.  What's
> >
> > We need the place so that we can avoid meaningless copy.
> >
> > > the limitation of implementing this without help of the target?
> >
> > Middle-end may not know what is safe and not safe, for example, we
> > can skip the hidden argument SUBREG for x32.
> >
> > > Richard.
> > >
> > > > Otherwise, using the incoming argument as the outgoing argument may 
> > > > change
> > > > values of other incoming arguments or the wrong outgoing argument value
> > > > may be used.
> > > >
> > > > gcc/
> > > >
> > > > PR middle-end/14907
> > > > * calls.cc (arg_data): Add small_integer_argument_value.
> > > > (precompute_register_parameters): Set args[i].value to
> > > > args[i].small_integer_argument_value if not nullptr.
> > > > (initialize_argument_information): Set
> > > > args[i].small_integer_argument_value to
> > > > TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE.
> > > > (store_one_arg): Set arg->value to arg->small_integer_argument_value
> > > > if not nullptr.
> > > > * target.def (get_small_integer_argument_value): New for calls.
> > > > * targhooks.cc (default_get_small_integer_argument_value): New.
> > > > * targhooks.h (default_get_small_integer_argument_

Re: [PATCH] RISC-V: Fix register move cost for SIBCALL_REGS/JALR_REGS

2025-04-29 Thread Kito Cheng

LGTM, and pushed to the trunk :)

On Mon, Apr 28, 2025 at 10:04 AM 曾治金  wrote:
>
> Hi, as Jeff requested 
> (https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681864.html), I have split 
> the change to riscv_register_move_cost into a separate patch. Please help 
> review it. Thanks.
>
> Zhijin
>
> From b4c581393e864619192034bd8000c7e89443c19a Mon Sep 17 00:00:00 2001
> From: Zhijin Zeng 
> Date: Mon, 28 Apr 2025 09:24:16 +0800
> Subject: [PATCH] RISC-V: Fix register move cost for SIBCALL_REGS/JALR_REGS
>
> SIBCALL_REGS/JALR_REGS are also subsets of GR_REGS and need to
> be taken into account in riscv_register_move_cost, otherwise it
> will return an incorrect cost.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_register_move_cost):
> ---
>  gcc/config/riscv/riscv.cc | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index bad59e248d0..c53e0dd7a7d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -9650,10 +9650,10 @@ int
>  riscv_register_move_cost (machine_mode mode,
>   reg_class_t from, reg_class_t to)
>  {
> -  bool from_is_fpr = from == FP_REGS || from == RVC_FP_REGS;
> -  bool from_is_gpr = from == GR_REGS || from == RVC_GR_REGS;
> -  bool to_is_fpr = to == FP_REGS || to == RVC_FP_REGS;
> -  bool to_is_gpr = to == GR_REGS || to == RVC_GR_REGS;
> +  bool from_is_fpr = reg_class_subset_p (from, FP_REGS);
> +  bool from_is_gpr = reg_class_subset_p (from, GR_REGS);
> +  bool to_is_fpr = reg_class_subset_p (to, FP_REGS);
> +  bool to_is_gpr = reg_class_subset_p (to, GR_REGS);
>if ((from_is_fpr && to_is_gpr) || (from_is_gpr && to_is_fpr))
>  return tune_param->fmv_cost;
>
> --
> 2.25.1
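
For readers following along, the difference between the equality test and the
subset test can be sketched with a toy model.  Everything here is an assumption
made for illustration: the toy_ register classes are bit masks over an
imaginary register file, toy_subset_p stands in for the idea behind
reg_class_subset_p, and the two cost constants are arbitrary.  None of it is
the real RISC-V backend.

```cpp
#include <cstdint>

// Hypothetical register classes modelled as bit masks over a toy
// 32-register file.  These are NOT the real RISC-V definitions.
enum toy_reg_class : std::uint32_t
{
  TOY_SIBCALL_REGS = 0x000000f0u,  // strict subset of TOY_GR_REGS
  TOY_JALR_REGS    = 0x00000ff0u,  // strict subset of TOY_GR_REGS
  TOY_GR_REGS      = 0x0000ffffu,
  TOY_FP_REGS      = 0xffff0000u
};

// True if every register in A is also in B.
constexpr bool
toy_subset_p (toy_reg_class a, toy_reg_class b)
{
  return (static_cast<std::uint32_t> (a)
          & ~static_cast<std::uint32_t> (b)) == 0;
}

constexpr int toy_fmv_cost = 8;    // hypothetical FPR <-> GPR move cost
constexpr int toy_other_cost = 2;  // hypothetical cost of any other move

// Old behaviour: only an exact class match counts as GPR or FPR.
constexpr int
old_move_cost (toy_reg_class from, toy_reg_class to)
{
  return ((from == TOY_FP_REGS && to == TOY_GR_REGS)
          || (from == TOY_GR_REGS && to == TOY_FP_REGS))
         ? toy_fmv_cost : toy_other_cost;
}

// New behaviour: any subset of the GPR or FPR class counts.
constexpr int
new_move_cost (toy_reg_class from, toy_reg_class to)
{
  return ((toy_subset_p (from, TOY_FP_REGS) && toy_subset_p (to, TOY_GR_REGS))
          || (toy_subset_p (from, TOY_GR_REGS) && toy_subset_p (to, TOY_FP_REGS)))
         ? toy_fmv_cost : toy_other_cost;
}
```

In this model a move from the FP class to the JALR-like class is charged the
cheap cost by the equality version and the FPR/GPR move cost by the subset
version, which is the kind of mismatch the patch describes.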

Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-29 Thread Jonathan Wakely

On Tue, 29 Apr 2025 at 13:54, Luc Grosheintz  wrote:
>
> This implements std::extents from  according to N4950 and
> contains partial progress towards PR107761.
>
> If an extent changes its type, there's a precondition in the standard,
> that the value is representable in the target integer type. This
> precondition is not checked at runtime.
>
> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
> For extents this precondition is always violated and results in
> calling __builtin_trap. For all other specializations it's checked via
> __glibcxx_assert.
>
> PR libstdc++/107761
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (extents): New class.
> * src/c++23/std.cc.in: Add 'using std::extents'.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan  | 262 +++
>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>  2 files changed, 267 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
> index 4094a416d1e..39ced1d6301 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -33,6 +33,12 @@
>  #pragma GCC system_header
>  #endif
>
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
>  #define __glibcxx_want_mdspan
>  #include 
>
> @@ -41,6 +47,262 @@
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +  namespace __mdspan
> +  {
> +template
> +  class _ExtentsStorage
> +  {
> +  public:
> +   static consteval bool
> +   _S_is_dyn(size_t __ext) noexcept
> +   { return __ext == dynamic_extent; }
> +
> +   template
> + static constexpr _IndexType
> + _S_int_cast(const _OIndexType& __other) noexcept
> + { return _IndexType(__other); }
> +
> +   static constexpr size_t _S_rank = _Extents.size();
> +
> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
> +   // of dynamic extents up to (and not including) __r.
> +   //
> +   // If __r is the index of a dynamic extent, then
> +   // _S_dynamic_index[__r] is the index of that extent in
> +   // _M_dynamic_extents.
> +   static constexpr auto _S_dynamic_index = [] consteval
> +   {
> + array __ret;
> + size_t __dyn = 0;
> + for(size_t __i = 0; __i < _S_rank; ++__i)
> +   {
> + __ret[__i] = __dyn;
> + __dyn += _S_is_dyn(_Extents[__i]);
> +   }
> + __ret[_S_rank] = __dyn;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
> +
> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
> +   // index of the __r-th dynamic extent in _Extents.
> +   static constexpr auto _S_dynamic_index_inv = [] consteval
> +   {
> + array __ret;
> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
> +   if (_S_is_dyn(_Extents[__i]))
> + __ret[__r++] = __i;
> + return __ret;
> +   }();
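
As an aside for readers, the two lookup tables described in the comments above
can be sketched as free functions.  This is only an illustration of the
technique, assuming C++17; dyn, dynamic_index, dynamic_index_inv and the
example extents are hypothetical names, not the patch's code.

```cpp
#include <array>
#include <cstddef>

// Stand-in for std::dynamic_extent.
constexpr std::size_t dyn = static_cast<std::size_t> (-1);

// ret[r] is the number of dynamic extents strictly before position r;
// ret[N] is the total number of dynamic extents.
template<std::size_t N>
  constexpr std::array<std::size_t, N + 1>
  dynamic_index (const std::array<std::size_t, N>& exts)
  {
    std::array<std::size_t, N + 1> ret{};
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i)
      {
        ret[i] = count;
        count += (exts[i] == dyn);
      }
    ret[N] = count;
    return ret;
  }

// ret[r] is the position in exts of the r-th dynamic extent.
// NDyn must equal the number of dynamic extents in exts.
template<std::size_t NDyn, std::size_t N>
  constexpr std::array<std::size_t, NDyn>
  dynamic_index_inv (const std::array<std::size_t, N>& exts)
  {
    std::array<std::size_t, NDyn> ret{};
    for (std::size_t i = 0, r = 0; i < N; ++i)
      if (exts[i] == dyn)
        ret[r++] = i;
    return ret;
  }

// Example: extents {3, dyn, 5, dyn}.
constexpr std::array<std::size_t, 4> example{3, dyn, 5, dyn};
constexpr auto idx = dynamic_index (example);
constexpr auto inv = dynamic_index_inv<idx[4]> (example);
```

For the example extents, the second position is the first dynamic extent and
the fourth position is the second one, so the forward table maps a position to
its slot in the dynamic storage and the inverse table maps the slot back.
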
> +
> +   static constexpr size_t
> +   _S_static_extent(size_t __r) noexcept
> +   { return _Extents[__r]; }
> +
> +   constexpr _IndexType
> +   _M_extent(size_t __r) const noexcept
> +   {
> + auto __se = _Extents[__r];
> + if (__se == dynamic_extent)
> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
> + else
> +   return __se;
> +   }
> +
> +   template
> + constexpr void
> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
> + {
> +   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
> + {
> +   size_t __di = __i;
> +   if constexpr (_OtherRank != _S_rank_dynamic)
> + __di = _S_dynamic_index_inv[__i];
> +   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
> + }
> + }
> +
> +   constexpr
> +   _ExtentsStorage() noexcept = default;
> +
> +   template
> + constexpr
> + _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
> + __other) noexcept
> + {
> +   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
> + { return __other._M_extent(__i); });
> + }
> +
> +   template
> + constexpr
> + _ExtentsStorage(span __exts) noexcept
> + {
> +   _M_init_dynamic_extents<_Nm>(
> + [&__exts](size_t __i) -> const _OIndexType&
> + { return __exts[__i]; });
> + }
> +
> +  private:
> +   using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
> +   [[no_unique_address]] _S_storage _M_dynamic_extents;
> +  };
> +
> +template
> +  concept __valid_index_type =
> +   is_convertible_v<_OIndexType, _SIndexType> &&
> +   is_nothrow_

Re: [PATCH] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-04-29 Thread LIU Hao


On 2025-4-29 13:03, LIU Hao wrote:
This fixes a long-standing issue that GCC used to assume 16-byte stack alignment on i686-w64-mingw32, 
which is not always the case for callbacks from system libraries.




CC Zeb Figura

This patch looks a bit risky. The overall effect of `__attribute__((__force_align_arg_pointer__))` seems 
to be that it realigns ESP to the _preferred_ alignment, which is changed by this patch.


If we change it to 4 bytes, then incoming alignment == preferred alignment == 4, and suddenly 
`__attribute__((__force_align_arg_pointer__))` no longer has an effect, which may break code that relies 
on it to fix stack alignment for SSE.


Maybe we shouldn't apply this patch. That leaves the 8-byte alignment issue unresolved 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c17). But does it cause harm in practice? `double` on 
x86-32 requires only 4-byte alignment, despite the performance pitfalls. It might be an issue if an atomic 
int64 is stored on the stack and is accessed with CMPXCHG8B.
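
To make the concern concrete, here is a minimal C sketch of the realignment arithmetic, assuming the attribute behaves as described above (it rounds ESP down to the preferred boundary). The helper names `realign_stack` and `realign_has_effect` are hypothetical and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack pointer down to ALIGN bytes (ALIGN must be a power of
   two).  This models the "and $-ALIGN, %esp" step of a realigning
   prologue; it is an illustration, not GCC's implementation.  */
static uintptr_t
realign_stack (uintptr_t sp, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return sp & ~(align - 1);
}

/* Hypothetical predicate: realignment only does any work when the
   preferred boundary is stricter than what the caller guarantees.  */
static int
realign_has_effect (uintptr_t incoming_align, uintptr_t preferred_align)
{
  return preferred_align > incoming_align;
}
```

With incoming and preferred alignment both equal to 4, `realign_stack (0x1004, 4)` leaves the pointer unchanged, so a function that relies on the attribute to get 16-byte-aligned SSE slots would no longer get them.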






9005-i386-cygming-Decrease-default-preferred-stack-bounda.patch

 From 1b92f8105dbece1694dd3ab398cfb5e3ce2c15d9 Mon Sep 17 00:00:00 2001
From: LIU Hao
Date: Tue, 29 Apr 2025 10:43:06 +0800
Subject: [PATCH] i386/cygming: Decrease default preferred stack boundary for
  32-bit targets

This commit decreases the default preferred stack boundary to 4.

In i386-options.cc, there's

ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;

which sets the default incoming stack boundary to this value, if it's not
overridden by other options or attributes.

Previously, GCC preferred 16-byte alignment like other platforms, unless
`-miamcu` was specified. However, the Microsoft x86 ABI only requires the
stack be aligned to 4-byte boundaries. Callback functions from MSVC code may
break this assumption by GCC (see reference below), causing local variables
to be misaligned.

Reference:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c9
Signed-off-by: LIU Hao

gcc/ChangeLog:

PR 07
* config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override
definition from i386.h.
---
  gcc/config/i386/cygming.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 3ddcbecb22fd..b8c396d35793 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -28,6 +28,10 @@ along with GCC; see the file COPYING3.  If not see
  #undef TARGET_SEH
  #define TARGET_SEH  (TARGET_64BIT_MS_ABI && flag_unwind_tables)
  
+#undef PREFERRED_STACK_BOUNDARY_DEFAULT
+#define PREFERRED_STACK_BOUNDARY_DEFAULT \
+  (TARGET_64BIT ? 128 : MIN_STACK_BOUNDARY)
+
  /* Win64 with SEH cannot represent DRAP stack frames.  Disable its use.
 Force the use of different mechanisms to allocate aligned local data.  */
  #undef MAX_STACK_ALIGNMENT
-- 2.49.0




--
Best regards,
LIU Hao



[PATCH] tree-optimization/119997 - &ptr->field no longer subject to PRE

2025-04-29 Thread Richard Biener

The following makes PRE handle &ptr->field the same as VN by
treating it as a POINTER_PLUS_EXPR when possible and thus as
'nary'.  To facilitate this the patch splits out vn_pp_nary_for_addr
and adds const overloads for vec::last.  The patch also avoids
handling an effective zero offset as POINTER_PLUS_EXPR.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
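
As an illustration of the equivalence the patch relies on, the following C sketch (not part of the patch) computes the same address once as a field access and once as a base pointer plus a constant byte offset. The function names are made up for this example; `struct X` mirrors the new testcase.

```c
#include <assert.h>
#include <stddef.h>

struct X { int a[2]; };

/* Address of p->a[1] written as a field access (the ADDR_EXPR form).  */
static int *
addr_via_field (struct X *p)
{
  return &p->a[1];
}

/* The same address written as base pointer plus a constant byte offset
   (the POINTER_PLUS_EXPR form).  */
static int *
addr_via_offset (struct X *p)
{
  return (int *) ((char *) p + offsetof (struct X, a) + 1 * sizeof (int));
}
```

Both functions return the same pointer for any valid `p`, which is why value numbering the two forms identically lets PRE see them as one expression.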

PR tree-optimization/119997
* vec.h (vec::last): Provide const overload.
(vec::last): Likewise.
* tree-ssa-sccvn.h (vn_pp_nary_for_addr): Declare.
* tree-ssa-sccvn.cc (vn_pp_nary_for_addr): Split out from ...
(vn_reference_lookup): ... here.
(vn_reference_insert): ... and duplicate here.  Do not handle
zero offset as POINTER_PLUS_EXPR.
* tree-ssa-pre.cc (compute_avail): Implement
ADDR_EXPR-as-POINTER_PLUS_EXPR special casing.

* gcc.dg/tree-ssa/ssa-pre-35.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-35.c | 15 
 gcc/tree-ssa-pre.cc| 27 
 gcc/tree-ssa-sccvn.cc  | 81 +++---
 gcc/tree-ssa-sccvn.h   |  1 +
 gcc/vec.h  | 11 +++
 5 files changed, 93 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-35.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-35.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-35.c
new file mode 100644
index 000..1b49445fc3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-35.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+
+void bar (int *);
+
+struct X { int a[2]; };
+void foo (struct X *p, int b)
+{
+  if (b)
+bar ((int *)p + 1);
+  bar (&p->a[1]);
+}
+
+/* We should PRE and hoist &p->a[1] as (int *)p + 1.  */
+/* { dg-final { scan-tree-dump "HOIST inserted: 1" "pre" } } */
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index ecf45d29e76..f6c531e4892 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -4133,6 +4133,33 @@ compute_avail (function *fun)
  vec operands
= vn_reference_operands_for_lookup (rhs1);
  vn_reference_t ref;
+
+ /* We handle &MEM[ptr + 5].b[1].c as
+POINTER_PLUS_EXPR.  */
+ if (operands[0].opcode == ADDR_EXPR
+ && operands.last ().opcode == SSA_NAME)
+   {
+ tree ops[2];
+ if (vn_pp_nary_for_addr (operands, ops))
+   {
+ vn_nary_op_t nary;
+ vn_nary_op_lookup_pieces (2, POINTER_PLUS_EXPR,
+   TREE_TYPE (rhs1), ops,
+   &nary);
+ operands.release ();
+ if (nary && !nary->predicated_values)
+   {
+ unsigned value_id = nary->value_id;
+ if (value_id_constant_p (value_id))
+   continue;
+ result = get_or_alloc_expr_for_nary
+ (nary, value_id, gimple_location (stmt));
+ break;
+   }
+ continue;
+   }
+   }
+
  vn_reference_lookup_pieces (gimple_vuse (stmt), set,
  base_set, TREE_TYPE (rhs1),
  operands, &ref, VN_WALK);
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 481ab8b243d..f7f50c3de99 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -3998,6 +3998,41 @@ vn_reference_lookup_pieces (tree vuse, alias_set_type 
set,
   return NULL_TREE;
 }
 
+/* When OPERANDS is an ADDR_EXPR that can be possibly expressed as a
+   POINTER_PLUS_EXPR return true and fill in its operands in OPS.  */
+
+bool
+vn_pp_nary_for_addr (const vec& operands, tree ops[2])
+{
+  gcc_assert (operands[0].opcode == ADDR_EXPR
+ && operands.last ().opcode == SSA_NAME);
+  poly_int64 off = 0;
+  vn_reference_op_t vro;
+  unsigned i;
+  for (i = 1; operands.iterate (i, &vro); ++i)
+{
+  if (vro->opcode == SSA_NAME)
+   break;
+  else if (known_eq (vro->off, -1))
+   break;
+  off += vro->off;
+}
+  if (i == operands.length () - 1
+  && maybe_ne (off, 0)
+  /* Make sure we the offset we accumulated in a 64bit int
+fits the address computation carried out in target
+offset precision.  */
+  && (off.coeffs[0]
+ == sext_hwi (off.coeffs[0], TYPE_PRECISION (sizetype
+{
+  gcc_assert (operands[i-1].opc

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 12:32 PM H.J. Lu  wrote:
>
> On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
>  wrote:
> >
> > On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  wrote:
> > > > >
> > > > > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to 
> > > > > return
> > > > > true, all integer arguments smaller than int are passed as int:
> > > > >
> > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > extern int baz (char c1);
> > > > >
> > > > > int
> > > > > foo (char c1)
> > > > > {
> > > > >   return baz (c1);
> > > > > }
> > > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > .file "x.c"
> > > > > .text
> > > > > .p2align 4
> > > > > .globl foo
> > > > > .type foo, @function
> > > > > foo:
> > > > > .LFB0:
> > > > > .cfi_startproc
> > > > > movsbl 4(%esp), %eax
> > > > > movl %eax, 4(%esp)
> > > > > jmp baz
> > > > > .cfi_endproc
> > > > > .LFE0:
> > > > > .size foo, .-foo
> > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > > .section .note.GNU-stack,"",@progbits
> > > > > [hjl@gnu-tgl-3 pr14907]$
> > > > >
> > > > > But integer promotion:
> > > > >
> > > > > movsbl 4(%esp), %eax
> > > > > movl %eax, 4(%esp)
> > > > >
> > > > > isn't necessary if incoming arguments are copied to outgoing arguments
> > > > > directly.
> > > > >
> > > > > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, 
> > > > > defaulting
> > > > > to return nullptr.  If the new target hook returns non-nullptr, use 
> > > > > it to
> > > > > get the outgoing small integer argument.  The x86 target hook returns 
> > > > > the
> > > > > value of the corresponding incoming argument as int if it can be used 
> > > > > as
> > > > > the outgoing argument.  If callee is a global function, we always 
> > > > > properly
> > > > > extend the incoming small integer arguments in callee.  If callee is a
> > > > > local function, since DECL_ARG_TYPE has the original small integer 
> > > > > type,
> > > > > we will extend the incoming small integer arguments in callee if 
> > > > > needed.
> > > > > It is safe only if
> > > > >
> > > > > 1. Caller and callee are not nested functions.
> > > > > 2. Caller and callee use the same ABI.
> > > >
> > > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > > should apply to all of them, no?
> > >
> > > When the arguments are passed in different registers in different ABIs,
> > > we have to copy them anyway.
> >
> > But optimization can elide copies easily, but not easily elide
> > sign-/zero-extensions.
>
> What I meant was that caller and callee have different ABIs.
> Optimizer can't elide copies since incoming arguments and outgoing
> arguments are in different registers.  They have to be moved.
>
> > > >
> > > > > 3. The incoming argument and the outgoing argument are in the same
> > > > > location.
> > > >
> > > > Why's that?  Can't we move them but still elide the 
> > > > sign-/zero-extension?
> > >
> > > If they aren't in the same locations, we have to move them anyway.
> > > This patch tries to avoid unnecessary moves of incoming arguments to
> > > outgoing arguments.
> >
> > That's not exactly how you presented it, but you conveniently used
> > x86 stack argument passing.  That might be difficult to elide, but is
> > also uncommon for "small integer types" - does the same issue not
> > apply to other arguments passed on the stack as well?
>
> It applies to both passing in registers and on stack.   It is an issue only
> for small integer types due to sign-/zero-extensions at call sites.  My
> patch elides sign-/zero-extensions when incoming arguments and outgoing
> arguments are unchanged in exactly the same location, in register or on stack.

Is it possible to dissect this from TARGET_PROMOTE_PROTOTYPES then?
That is, this should also work for the case prototypes are not promoted and
for modes larger than SImode, even BLKmode.

Richard.

> >
> > > > > 4. The incoming argument is unchanged before call expansion.
> > > >
> > > > Obviously, but then IMO this reveals an issue with the design of a 
> > > > target hook
> > > > returning the argument register - it returns a place rather than a
> > > > value.  What's
> > >
> > > We need the place so that we can avoid meaningless copy.
> > >
> > > > the limitation of implementing this without help of the target?
> > >
> > > Middle-end may not know what is safe and not safe, for example, we
> > > can skip the hidden argument SUBREG for x32.
> > >
> > > > Richard.
> > > >
> > > > > Otherwise, using the incoming argument as the outgoing argument may 
> > > > > change
> > > > > values of other incoming arguments or the wrong outgoing argument 
> > > > > value
> > > > > may be used.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR middle-end/14907
> > > > > * calls.cc (arg_data): Add small_integer_argument_value.
> > > > > (p

Re: [pushed] i386: Allow string instructions from non-default address space [PR111657]

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 5:52 PM Uros Bizjak  wrote:
>
> MOVS instructions allow segment override of their source operand, e.g.:
>
> rep movsq %gs:(%rsi), (%rdi)
>
> where %rsi is the address of the source location (with %gs segment override)
> and %rdi is the address of the destination location.

Please be aware that 0x67 prefix (used by x32) is applied before segment
register.  That is in

 rep movsq %gs:(%esi), (%edi)

the address is %gs + %esi.

> The testcase improves from (-O2 -mno-sse -mtune=generic):
>
> xorl%eax, %eax
> .L2:
> movl%eax, %edx
> addl$8, %eax
> movq%gs:m(%rdx), %rcx
> movq%rcx, (%rdi,%rdx)
> cmpl$240, %eax
> jb.L2
> ret
>
> to:
> movl$m, %esi
> movl$30, %ecx
> rep movsq %gs:(%rsi), (%rdi)
> ret
>
> PR 111657
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (alg_usable_p): Remove have_as bool
> argument and add dst_as and src_as address space arguments.  Reject
> libcall algorithm with dst_as and src_as in the non-default address
> spaces.  Reject rep_prefix_{1,4,8}_byte algorithms with dst_as in
> the non-default address space.
> (decide_alg): Remove have_as bool argument and add dst_as and src_as
> address space arguments.  Update calls to alg_usable_p.
> (ix86_expand_set_or_cpymem): Update call to decide_alg.
> * config/i386/i386.md (strmov): Do not fail if operand[3] (source)
> is in the non-default address space.  Expand with gen_strmov_singleop
> only when operand[1] (destination) is in the default address space.
> (*strmovdi_rex_1): Determine memory operands from insn pattern.
> Allow only when destination is in the default address space.
> Rewrite asm template to use explicit operands.
> (*strmovsi_1): Ditto.
> (*strmovhi_1): Ditto.
> (*strmovqi_1): Ditto.
> (*rep_movdi_rex64): Ditto.
> (*rep_movsi): Ditto.
> (*rep_movqi): Ditto.
> (*strsetdi_rex_1): Determine memory operands from insn pattern.
> Allow only when destination is in the default address space.
> (*strsetsi_1): Ditto.
> (*strsethi_1): Ditto.
> (*strsetqi_1): Ditto.
> (*rep_stosdi_rex64): Ditto.
> (*rep_stossi): Ditto.
> (*rep_stosqi): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr111657-1.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Uros.



-- 
H.J.

RE: [EXTERNAL]Re: [PATCH]RISCV :Added MIPS P8700 Subtarget

2025-04-29 Thread Umesh Kalappa

Hi all,

Here is the updated patch that addresses some of Jeff Law's comments.

The P8700 doesn't have a vector engine, and we support the insn types up to 
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.md#L358 
and the scheduling model enables the same.

---
 gcc/config/riscv/mips-p8700.md   | 139 +++
 gcc/config/riscv/riscv-cores.def |   5 ++
 gcc/config/riscv/riscv-opts.h|   3 +-
 gcc/config/riscv/riscv.cc|  22 +
 gcc/config/riscv/riscv.md|   3 +-
 5 files changed, 170 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/mips-p8700.md

diff --git a/gcc/config/riscv/mips-p8700.md b/gcc/config/riscv/mips-p8700.md
new file mode 100644
index 000..11d0b1ca793
--- /dev/null
+++ b/gcc/config/riscv/mips-p8700.md
@@ -0,0 +1,139 @@
+;; DFA-based pipeline description for MIPS P8700.
+;;
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "mips_p8700_agen_alq_pipe, mips_p8700_mdu_pipe, mips_p8700_fpu_pipe")
+
+;; The address generation queue (AGQ) has AL2, CTISTD and LDSTA pipes
+(define_cpu_unit "mips_p8700_agq, mips_p8700_al2, mips_p8700_ctistd, mips_p8700_lsu"
+"mips_p8700_agen_alq_pipe")
+
+(define_cpu_unit "mips_p8700_gpmul, mips_p8700_gpdiv" "mips_p8700_mdu_pipe")
+
+;; The arithmetic-logic-unit queue (ALQ) has ALU pipe
+(define_cpu_unit "mips_p8700_alq, mips_p8700_alu" "mips_p8700_agen_alq_pipe")
+
+;; The floating-point-unit queue (FPQ) has short and long pipes
+(define_cpu_unit "mips_p8700_fpu_short, mips_p8700_fpu_long" "mips_p8700_fpu_pipe")
+
+;; Long FPU pipeline.
+(define_cpu_unit "mips_p8700_fpu_apu" "mips_p8700_fpu_pipe")
+
+(define_reservation "mips_p8700_agq_al2" "mips_p8700_agq, mips_p8700_al2")
+(define_reservation "mips_p8700_agq_ctistd" "mips_p8700_agq, mips_p8700_ctistd")
+(define_reservation "mips_p8700_agq_lsu" "mips_p8700_agq, mips_p8700_lsu")
+(define_reservation "mips_p8700_alq_alu" "mips_p8700_alq, mips_p8700_alu")
+
+;;
+;; FPU pipe
+;;
+
+(define_insn_reservation "mips_p8700_fpu_fadd" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fabs" 2
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcmp,fmove"))
+  "mips_p8700_fpu_short, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fload" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpload"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fstore" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpstore"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fmadd" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmul" 5
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmul"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_div" 17
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fdiv,fsqrt"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu*17")
+
+(define_insn_reservation "mips_p8700_fpu_fcvt" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcvt,fcvt_i2f,fcvt_f2i"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmtc" 7
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "mtc"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fmfc" 7
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "mfc"))
+  "mips_p8700_agq_lsu")
+
+;;
+;; Integer pipe
+;;
+
+(define_insn_reservation "mips_p8700_int_load" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "load"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_int_store" 3
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "store"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_int_arith_1" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,logical,move,bitmanip,min,max,minu,maxu,clz,ctz,rotate,atomic,condmove,crypto,mvpair,zicond"))
+  "mips_p8700_alq_alu | mips_p8700_agq_al2"

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 6:46 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 12:32 PM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  wrote:
> > > > > >
> > > > > > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to 
> > > > > > return
> > > > > > true, all integer arguments smaller than int are passed as int:
> > > > > >
> > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > extern int baz (char c1);
> > > > > >
> > > > > > int
> > > > > > foo (char c1)
> > > > > > {
> > > > > >   return baz (c1);
> > > > > > }
> > > > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > .file "x.c"
> > > > > > .text
> > > > > > .p2align 4
> > > > > > .globl foo
> > > > > > .type foo, @function
> > > > > > foo:
> > > > > > .LFB0:
> > > > > > .cfi_startproc
> > > > > > movsbl 4(%esp), %eax
> > > > > > movl %eax, 4(%esp)
> > > > > > jmp baz
> > > > > > .cfi_endproc
> > > > > > .LFE0:
> > > > > > .size foo, .-foo
> > > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > > > .section .note.GNU-stack,"",@progbits
> > > > > > [hjl@gnu-tgl-3 pr14907]$
> > > > > >
> > > > > > But integer promotion:
> > > > > >
> > > > > > movsbl 4(%esp), %eax
> > > > > > movl %eax, 4(%esp)
> > > > > >
> > > > > > isn't necessary if incoming arguments are copied to outgoing 
> > > > > > arguments
> > > > > > directly.
> > > > > >
> > > > > > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, 
> > > > > > defaulting
> > > > > > to return nullptr.  If the new target hook returns non-nullptr, use 
> > > > > > it to
> > > > > > get the outgoing small integer argument.  The x86 target hook 
> > > > > > returns the
> > > > > > value of the corresponding incoming argument as int if it can be 
> > > > > > used as
> > > > > > the outgoing argument.  If callee is a global function, we always 
> > > > > > properly
> > > > > > extend the incoming small integer arguments in callee.  If callee 
> > > > > > is a
> > > > > > local function, since DECL_ARG_TYPE has the original small integer 
> > > > > > type,
> > > > > > we will extend the incoming small integer arguments in callee if 
> > > > > > needed.
> > > > > > It is safe only if
> > > > > >
> > > > > > 1. Caller and callee are not nested functions.
> > > > > > 2. Caller and callee use the same ABI.
> > > > >
> > > > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > > > should apply to all of them, no?
> > > >
> > > > When the arguments are passed in different registers in different ABIs,
> > > > we have to copy them anyway.
> > >
> > > But optimization can elide copies easily, but not easily elide
> > > sign-/zero-extensions.
> >
> > What I meant was that caller and callee have different ABIs.
> > Optimizer can't elide copies since incoming arguments and outgoing
> > arguments are in different registers.  They have to be moved.
> >
> > > > >
> > > > > > 3. The incoming argument and the outgoing argument are in the same
> > > > > > location.
> > > > >
> > > > > Why's that?  Can't we move them but still elide the 
> > > > > sign-/zero-extension?
> > > >
> > > > If they aren't in the same locations, we have to move them anyway.
> > > > This patch tries to avoid unnecessary moves of incoming arguments to
> > > > outgoing arguments.
> > >
> > > That's not exactly how you presented it, but you conveniently used
> > > x86 stack argument passing.  That might be difficult to elide, but is
> > > also uncommon for "small integer types" - does the same issue not
> > > apply to other arguments passed on the stack as well?
> >
> > It applies to both passing in registers and on stack.   It is an issue only
> > for small integer types due to sign-/zero-extensions at call sites.  My
> > patch elides sign-/zero-extensions when incoming arguments and outgoing
> > arguments are unchanged in exactly the same location, in register or on 
> > stack.
>
> Is it possible to dissect this from TARGET_PROMOTE_PROTOTYPES then?
> That is, this should also work for the case prototypes are not promoted and
> for modes larger than SImode, even BLKmode.
>
> Richard.

Arguments which don't need promotion, including large arguments, are already
working today.  The only issue is sign-/zero-extension of small outgoing integer
arguments on x86.  My patch removes unnecessary sign-/zero-extensions.   See:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14907

for more discussions.
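
As a rough C model of why the second extension is redundant, assume the incoming argument slot already holds a properly promoted `int`, as the description above requires. The function names are hypothetical and only mirror the `movsbl` in the example; this is a sketch, not what the patch implements.

```c
#include <assert.h>

/* What foo's caller does when prototypes are promoted: widen the char
   argument to a full int before the call.  */
static int
promote_to_int (signed char c)
{
  return (int) c;
}

/* What foo does again before calling baz: narrow the already promoted
   slot to char and widen it once more (the movsbl in the example).  */
static int
re_extend (int promoted_slot)
{
  return (int) (signed char) promoted_slot;
}
```

For every `char` value, `re_extend (promote_to_int (c))` equals `promote_to_int (c)`, so when the slot is unchanged and stays in the same location the second extension does no work.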

-- 
H.J.

Re: [pushed] i386: Allow string instructions from non-default address space [PR111657]

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 6:49 PM Uros Bizjak  wrote:
>
> On Tue, Apr 29, 2025 at 12:41 PM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 5:52 PM Uros Bizjak  wrote:
> > >
> > > MOVS instructions allow segment override of their source operand, e.g.:
> > >
> > > rep movsq %gs:(%rsi), (%rdi)
> > >
> > > where %rsi is the address of the source location (with %gs segment 
> > > override)
> > > and %rdi is the address of the destination location.
> >
> > Please be aware that 0x67 prefix (used by x32) is applied before segment
> > register.  That is in
> >
> >  rep movsq %gs:(%esi), (%edi)
> >
> > the address is %gs + %esi.
>
> Uh, yes, now I remember this x32 peculiarity.
>
> So, we want the segment prefix disabled with "-mx32 -maddress-mode=short" ?
>
> Thanks,
> Uros.

Yes, it should be disabled when address size prefix is used, like TLS:

[hjl@gnu-tgl-3 tmp]$ cat t.c
extern __thread int x;

int
foo (void)
{
  return x;
}
[hjl@gnu-tgl-3 tmp]$ gcc -S -O2  t.c
[hjl@gnu-tgl-3 tmp]$ cat t.s
.file "t.c"
.text
.p2align 4
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
movq x@gottpoff(%rip), %rax
movl %fs:(%rax), %eax
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (GNU) 15.1.1 20250425 (Red Hat 15.1.1-1)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-tgl-3 tmp]$ gcc -S -O2  t.c -mx32
[hjl@gnu-tgl-3 tmp]$ cat t.s
.file "t.c"
.text
.p2align 4
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
movl %fs:0, %eax
addl x@gottpoff(%rip), %eax
movl (%eax), %eax
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (GNU) 15.1.1 20250425 (Red Hat 15.1.1-1)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-tgl-3 tmp]$
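
A small C model of the two address computations may help here. It assumes, as stated earlier in this thread, that with the 0x67 prefix the 32-bit effective address is formed first and the segment base is added afterwards. The function names, the base value and the negative offset are illustrative assumptions, not taken from the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of a %seg:(%e_reg) operand under the 0x67 prefix: the
   effective address is truncated to 32 bits and zero-extended first,
   and only then is the 64-bit segment base added.  */
static uint64_t
linear_with_addr32_prefix (uint64_t seg_base, int32_t offset)
{
  return seg_base + (uint64_t) (uint32_t) offset;
}

/* What the -mx32 TLS sequence above computes: base and offset are added
   in 32-bit arithmetic (addl), and the result is used without a
   segment override.  */
static uint64_t
linear_x32_tls_sequence (uint32_t seg_base, int32_t offset)
{
  return (uint64_t) (uint32_t) (seg_base + (uint32_t) offset);
}
```

With a base of 0x10000 and an offset of -8, the 32-bit sequence yields 0xFFF8, while the segment-override form yields 0x10000FFF8, which is why the override has to be avoided when the address size prefix is in use.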


-- 
H.J.

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 2:33 PM H.J. Lu  wrote:
>
> On Tue, Apr 29, 2025 at 6:46 PM Richard Biener
>  wrote:
> >
> > On Tue, Apr 29, 2025 at 12:32 PM H.J. Lu  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  wrote:
> > > > > > >
> > > > > > > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to 
> > > > > > > return
> > > > > > > true, all integer arguments smaller than int are passed as int:
> > > > > > >
> > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > > extern int baz (char c1);
> > > > > > >
> > > > > > > int
> > > > > > > foo (char c1)
> > > > > > > {
> > > > > > >   return baz (c1);
> > > > > > > }
> > > > > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > > .file "x.c"
> > > > > > > .text
> > > > > > > .p2align 4
> > > > > > > .globl foo
> > > > > > > .type foo, @function
> > > > > > > foo:
> > > > > > > .LFB0:
> > > > > > > .cfi_startproc
> > > > > > > movsbl 4(%esp), %eax
> > > > > > > movl %eax, 4(%esp)
> > > > > > > jmp baz
> > > > > > > .cfi_endproc
> > > > > > > .LFE0:
> > > > > > > .size foo, .-foo
> > > > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > > > > .section .note.GNU-stack,"",@progbits
> > > > > > > [hjl@gnu-tgl-3 pr14907]$
> > > > > > >
> > > > > > > But integer promotion:
> > > > > > >
> > > > > > > movsbl 4(%esp), %eax
> > > > > > > movl %eax, 4(%esp)
> > > > > > >
> > > > > > > isn't necessary if incoming arguments are copied to outgoing 
> > > > > > > arguments
> > > > > > > directly.
> > > > > > >
> > > > > > > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, 
> > > > > > > defaulting
> > > > > > > to return nullptr.  If the new target hook returns non-nullptr, 
> > > > > > > use it to
> > > > > > > get the outgoing small integer argument.  The x86 target hook 
> > > > > > > returns the
> > > > > > > value of the corresponding incoming argument as int if it can be 
> > > > > > > used as
> > > > > > > the outgoing argument.  If callee is a global function, we always 
> > > > > > > properly
> > > > > > > extend the incoming small integer arguments in callee.  If callee 
> > > > > > > is a
> > > > > > > local function, since DECL_ARG_TYPE has the original small 
> > > > > > > integer type,
> > > > > > > we will extend the incoming small integer arguments in callee if 
> > > > > > > needed.
> > > > > > > It is safe only if
> > > > > > >
> > > > > > > 1. Caller and callee are not nested functions.
> > > > > > > 2. Caller and callee use the same ABI.
> > > > > >
> > > > > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > > > > should apply to all of them, no?
> > > > >
> > > > > When the arguments are passed in different registers in different 
> > > > > ABIs,
> > > > > we have to copy them anyway.
> > > >
> > > > But optimization can elide copies easily, but not easily elide
> > > > sign-/zero-extensions.
> > >
> > > What I meant was that caller and callee have different ABIs.
> > > Optimizer can't elide copies since incoming arguments and outgoing
> > > arguments are in different registers.  They have to be moved.
> > >
> > > > > >
> > > > > > > 3. The incoming argument and the outgoing argument are in the same
> > > > > > > location.
> > > > > >
> > > > > > Why's that?  Can't we move them but still elide the 
> > > > > > sign-/zero-extension?
> > > > >
> > > > > If they aren't in the same locations, we have to move them anyway.
> > > > > This patch tries to avoid unnecessary moves of incoming arguments to
> > > > > outgoing arguments.
> > > >
> > > > That's not exactly how you presented it, but you conveniently used
> > > > x86 stack argument passing.  That might be difficult to elide, but is
> > > > also uncommon for "small integer types" - does the same issue not
> > > > apply to other arguments passed on the stack as well?
> > >
> > > It applies to both passing in registers and on stack.   It is an issue 
> > > only
> > > for small integer types due to sign-/zero-extensions at call sites.  My
> > > patch elides sign-/zero-extensions when incoming arguments and outgoing
> > > arguments are unchanged in exactly the same location, in register or on 
> > > stack.
> >
> > Is it possible to dissect this from TARGET_PROMOTE_PROTOTYPES then?
> > That is, this should also work for the case prototypes are not promoted and
> > for modes larger than SImode, even BLKmode.
> >
> > Richard.
>
> Arguments which don't need promotion, including large arguments, are already
> working today.  The only issue is sign-/zero-extension of small outgoing 
> integer
> arguments on x86.  My patch removes unnecessary sign-/zero-extensions.   See:

So we're back to square one ... why restrict this sign-/zero-extension
elimination
t

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-29 Thread Robin Dapp

I see: let the vec_dup enter rtx_cost again so that its cost is added to the total for the vmv.
I gave it a try.  For example, with the change below:


+   switch (rcode)
+   {
+ case VEC_DUPLICATE:
+   *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+   break;
+ case PLUS:
+   {
+   rtx op_0 = XEXP (x, 0);
+   rtx op_1 = XEXP (x, 1);
+   if (GET_CODE (op_0) == VEC_DUPLICATE
+   || GET_CODE (op_1) == VEC_DUPLICATE)
+ *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+   else
+ *total = COSTS_N_INSNS (1);
+   break;
+   }
+ default:
+   *total = COSTS_N_INSNS (1);
+   break;
+   }
+
+   return true;

For case_0, where GR2VR is 0, late-combine gives the following:

trying to combine definition of r135 in:
   11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
into:
   18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
      REG_DEAD r146:RVVM1SI
successfully matched this instruction to *add_vx_rvvm1si:
(set (reg:RVVM1SI 147 [ vect__6.8_16 ])
    (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0))
        (reg:RVVM1SI 146)))
original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 (weighted: 64.727273); rejecting replacement



The vadd v, vec_dup(x) seems to have the same cost as vec_dup here. I am also 
confused about how we calculate the cost of vadd v, vec_dup(x): can we just set 
its cost to that of vadd.vx, given that we have a define_insn_and_split to match 
the pattern and emit the vadd.vx directly? That matches the relation we 
mentioned, vadd.vv + vec == vadd.vx.

Please correct me if I have misunderstood.


Yes, that doesn't look quite correct yet.
I think the issue is with using

 *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);

as I suggested: total is already initialized with an estimate of the mode size
(total = 8 for me) before we get to riscv_rtx_cost, and += adds to that.  This
makes the rest of the costs (which we assume to be relative to 4) inaccurate.

So try
 *total = get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
for the vec_dup case and
 *total = COSTS_N_INSNS (1) + get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
for the vx case.

Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.
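
To make the arithmetic behind that expectation concrete, here is a minimal sketch of the proposed costs.  It assumes that the combination is accepted when the frequency-weighted replacement cost does not exceed the weighted original cost, which is how the log above reads; the function names and the frequencies are hypothetical and this is not the actual late-combine code.

```cpp
#include <cassert>

// GCC's rtl.h defines COSTS_N_INSNS (N) as N * 4; mirrored here as a function.
constexpr int costs_n_insns(int n) { return n * 4; }

// Proposed cost of the standalone vec_duplicate: only the GR2VR move.
constexpr int vec_dup_cost(int gr2vr) { return gr2vr * costs_n_insns(1); }

// Cost of a plain vadd.vv: one instruction.
constexpr int vadd_vv_cost() { return costs_n_insns(1); }

// Proposed cost of the fused vadd.vx: one instruction plus the GR2VR move.
constexpr int vadd_vx_cost(int gr2vr)
{ return costs_n_insns(1) + gr2vr * costs_n_insns(1); }

// Assumed acceptance rule: compare costs weighted by execution frequency.
// dup_freq and add_freq are hypothetical block frequencies (the vec_dup
// typically sits outside the loop, the add inside it).
inline bool combine_accepted(int gr2vr, double dup_freq, double add_freq)
{
    double original = vec_dup_cost(gr2vr) * dup_freq
                      + vadd_vv_cost() * add_freq;
    double replacement = vadd_vx_cost(gr2vr) * add_freq;
    return replacement <= original;
}
```

With the vec_dup executed once and the add ten times, combine_accepted(0, 1.0, 10.0) holds because both sides weigh the same, while combine_accepted(1, 1.0, 10.0) does not, because the GR2VR cost is now paid on every iteration.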

[RFC 0/3] Use automatic make dependencies in aarch64

2025-04-29 Thread Alice Carlotti

This RFC series shows the steps that I believe are relevant to using automatic
make dependencies, and optionally automatic make rules, in the aarch64 backend.  I
believe the same steps and caveats would apply to other backends as well.

This builds upon the work by Tom Tromey in 2013 (see e.g. [1]), which in turn
was a resurrection of Tom's earlier attempt in 2008 ([2]).  Tom's work only
addressed dependencies in the middle end and front ends, and left the backends
unchanged.

I think it would be beneficial to convert the backend makefile rules as well,
to avoid the burden of updating the dependencies manually or the confusion that
arises when that is forgotten.  In the past couple of years I have:

- Discovered a missing dependency that led to confusing out-of-sync enum
  values in driver-aarch64.cc (fixed in 943fd92254);

- Forgotten to update dependencies in my own patch in February, and only
  remembered this once my patch was approved and I was about to push it (pushed
  as 7135570043);

- Discovered a typo in another existing dependency variable (REG_H instead of
  REGS_H, fixed in the above commit).


This demonstrates a clear benefit to making the makefile rules automatic.  I
thought this might be quite tricky, but it turns out to be fairly
straightforward.  However, there are a few caveats to consider.

This series is split into three patches to show the logical steps involved.  It
may be appropriate to merge these as a squashed patch, or to merge them
separately with a significant time gap, in order to avoid issues that can arise
with some of the intermediate steps.

- Patch 1 uses $(COMPILE) and $(POSTCOMPILE) to ensure that all rules *output*
  and *use* automatic dependencies.
- Patch 2 removes explicit dependencies.
- Patch 3 modifies the object file locations to enable use of the default .cc.o
  rule.

I think patches 1+2 are definitely desirable.  Patch 3 is a little more
invasive, and less important (because a missing rule is a much more obvious
error than a missing dependency).  I am therefore posting it as a hacky RFC,
and will implement proper plumbing as required if using the default rule is
desirable.

Merging both patches 1 and 2 requires measures to ensure that we do not lose
necessary dependency information when doing incremental builds.  The problem
arises because patch 2 lacks most of the required dependency information until
the object file has been rebuilt with the compile command from patch 1.  There
are two ways I can see to avoid this issue:

Option 1: Merge patch 1, wait for .cc files to be changed, and then merge patch
2.  This would ensure that any incremental build that lacked both automatic and
explicit dependencies would rebuild the affected object files anyway, due to the
explicit dependency on the changed .cc file.  Almost every file has its
copyright header changed in the new year, so this would be satisfied by merging
patch 1 this year, and merging patch 2 in January.

Option 2: Merge patches 2+3 at the same time - this would change the location
of the affected object files, ensuring they are rebuilt without depending upon
the automatic dependency files.
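
The hazard and the two options can be sketched with a toy model of make's rebuild decision.  This is only an illustration under simplifying assumptions: ObjectState, rebuilt and the file names foo.cc and foo.h are hypothetical, and real make compares timestamps rather than a set of changed files.

```cpp
#include <cassert>
#include <set>
#include <string>

// What make knows about one object file in an existing build tree.
struct ObjectState {
    std::set<std::string> explicit_deps;  // prerequisites written in the makefile
    std::set<std::string> auto_deps;      // prerequisites read from a generated
                                          // dependency file (empty if the object
                                          // was built before patch 1)
};

// The object is rebuilt only if some *known* prerequisite has changed.
bool rebuilt(const ObjectState& obj, const std::set<std::string>& changed)
{
    for (const auto& file : changed)
        if (obj.explicit_deps.count(file) || obj.auto_deps.count(file))
            return true;
    return false;
}
```

An object built before patch 1, in a tree where patch 2 has been applied, has only its .cc file as a known prerequisite, so a header change goes unnoticed until the .cc file itself changes (option 1) or the object moves to a new location and is rebuilt from scratch (option 2).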


These considerations apply to all backends, so it might be worth taking
the same approach for all backends.  What do you think?




[1] https://gcc.gnu.org/pipermail/gcc-patches/2013-September/370648.html
[2] https://gcc.gnu.org/legacy-ml/gcc-patches/2008-03/msg00503.html

Re: [PATCH v3] libstdc++: Cleanup and stabilize format _Spec<_CharT> and _Pres_type.

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 12:58 PM Tomasz Kaminski 
wrote:

>
>
> On Tue, Apr 29, 2025 at 9:28 AM Tomasz Kamiński 
> wrote:
>
>> This patch makes the following changes to _Pres_type values:
>>  * _Pres_esc is replaced with a separate _M_debug flag.
>>  * _Pres_s and _Pres_p do not overlap with _Pres_none.
>>  * hexadecimal presentations use the same values for pointer, integer
>>    and floating point types.
>>
>> Instead of `_M_reserved` and `_M_reserved2` bitfields, the members of
>> _Spec<_CharT> are rearranged so the class contains 16 bits of tail
>> padding. Derived classes (like _ChronoSpec<_CharT>) can reuse the storage
>> for initial members. We also add _SpecBase as the base class for
>> _Spec<_CharT> to make it non-C++98 POD, which allows tail padding to be
>> reused on Itanium ABI.
>>
>> Finally, the format enumerators are defined as enum class with unsigned
>> char as underlying type, followed by using enum to bring names in scope.
>> _Term_char was adjusted for consistency.
>>
>> The '?' is changed to a separate _M_debug flag, to allow the debug format
>> to be independent from the presentation type and applied to multiple
>> presentation types. For example, it could be used to trigger memberwise
>> or reflection-based formatting.
>>
>> The _M_format_character and _M_format_character_escaped functions are
>> merged into a single function that handles normal and debug presentation.
>> In particular, this would allow future support for '?c' for printing
>> integer types as escaped characters. _S_character_width is also folded
>> into the merged function.
>>
>> Decoupling the _Pres_s value from _Pres_none allows it to be used for
>> string presentation for range formatting, and removes the need for
>> separate _Pres_seq and _Pres_str. This does not affect formatting of bool,
>> as __formatter_int::_M_parse overrides the default value of _M_type. And
>> with the separation of the _M_debug flag, __formatter_str::format behavior
>> is now agnostic to the _M_type value.
>>
>> The values for integer presentation types are arranged so textual
>> presentations (_Pres_s, _Pres_c) are grouped together. For consistency,
>> floating point hexadecimal presentation uses the same values as the
>> integer ones.
>>
>> New _Pres_p and setting for _M_alt enables using some spec to configure
>> formatting of uintptr_t with __formatter_int, and const void* with
>> __formatter_ptr. Differentiating it from _Pres_none would allow future of
>> formatter> _CharT> that would require explicit presentation type to be
>> specified. This would allow std::vector to be formatted directly with
>> '{::p}' format spec.
>>
>> The constructors for __formatter_int and __formatter_ptr from
>> _Spec<_CharT> now also set default presentation modes, as the format
>> functions expect them.
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/bits/chrono_io.h
>> (_ChronoSpec<_CharT>::_M_locale_specific):
>> Declare as bit-field in tail-padding.
>> * include/bits/formatfwd.h (__format::_Align): Defined as enum
>> class
>> and add using enum.
>> * include/std/format (__format::_Pres_type, __format::_Sign)
>> (__format::_WidthPrec,  __format::_Arg_t): Defined as enum class
>> and
>> add using enum.
>> (_Pres_type::_Pres_esc): Replace with _Pres_max.
>> (_Pres_type::_Pres_seq, _Pres_type::_Pres_str): Remove.
>> (__format::_Pres_type): Updated values of enumerators as
>> described above.
>> (_Spec<_CharT>): Rearranged members to have 16bits of
>> tail-padding.
>> (_Spec<_CharT>::_M_debug): Defined.
>> (_Spec<_CharT>::_M_reserved, _Spec<_CharT>::_M_reserved2):
>> Removed.
>> (_Spec<_CharT>::_M_parse_fill_and_align,
>> _Spec<_CharT>::_M_parse_sign)
>> (__format::__write_padded_as_spec): Adjusted default value checks.
>> (__format::_Term_char): Add using enum and rename enumerators.
>> (__format::__should_escape_ascii): Adjusted _Term_char uses.
>> (__formatter_str<_CharT>::parse): Set _Pres_s if specified and
>> _M_debug
>> instead of _Pres_esc.
>> (__formatter_str<_CharT>::set_debug_format): Set _M_debug instead
>> of
>> _Pres_esc.
>> (__formatter_str<_CharT>::format,
>> __formatter_str<_CharT>::_M_format_range):
>> Check _M_debug instead of _Pres_esc.
>> (__formatter_str<_CharT>::_M_format_escaped): Adjusted _Term_char
>> uses.
>> (__formatter_int<_CharT>::__formatter_int(_Spec<_CharT>)): Set
>> _Pres_d if
>> default presentation type is not set.
>> (__formatter_int<_CharT>::_M_parse): Adjusted default value
>> checks.
>> (__formatter_int<_CharT>::_M_do_parse): Set _M_debug instead of
>> _Pres_esc.
>> (__formatter_int<_CharT>::_M_format_character): Handle escaped
>> presentation.
>> (__formatter_int<_CharT>::_M_format_character_escaped)
>> (__formatter_int<_CharT>::_S_character_width): Merged into
>> _M_format_character.
>> (__format

Re: [PATCH v3] libstdc++: Cleanup and stabilize format _Spec<_CharT> and _Pres_type.

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 9:28 AM Tomasz Kamiński  wrote:

> This patch makes the following changes to _Pres_type values:
>  * _Pres_esc is replaced with a separate _M_debug flag.
>  * _Pres_s and _Pres_p do not overlap with _Pres_none.
>  * hexadecimal presentations use the same values for pointer, integer
>    and floating point types.
>
> Instead of `_M_reserved` and `_M_reserved2` bitfields, the members of
> _Spec<_CharT> are rearranged so the class contains 16 bits of tail padding.
> Derived classes (like _ChronoSpec<_CharT>) can reuse the storage for
> initial members. We also add _SpecBase as the base class for _Spec<_CharT>
> to make it non-C++98 POD, which allows tail padding to be reused on
> Itanium ABI.
>
> Finally, the format enumerators are defined as enum class with unsigned
> char as underlying type, followed by using enum to bring names in scope.
> _Term_char was adjusted for consistency.
>
> The '?' is changed to a separate _M_debug flag, to allow the debug format
> to be independent from the presentation type and applied to multiple
> presentation types. For example, it could be used to trigger memberwise
> or reflection-based formatting.
>
> The _M_format_character and _M_format_character_escaped functions are
> merged into a single function that handles normal and debug presentation.
> In particular, this would allow future support for '?c' for printing
> integer types as escaped characters. _S_character_width is also folded
> into the merged function.
>
> Decoupling the _Pres_s value from _Pres_none allows it to be used for
> string presentation for range formatting, and removes the need for
> separate _Pres_seq and _Pres_str. This does not affect formatting of bool,
> as __formatter_int::_M_parse overrides the default value of _M_type. And
> with the separation of the _M_debug flag, __formatter_str::format behavior
> is now agnostic to the _M_type value.
>
> The values for integer presentation types are arranged so textual
> presentations (_Pres_s, _Pres_c) are grouped together. For consistency,
> floating point hexadecimal presentation uses the same values as the
> integer ones.
>
> New _Pres_p and setting for _M_alt enables using some spec to configure
> formatting of uintptr_t with __formatter_int, and const void* with
> __formatter_ptr. Differentiating it from _Pres_none would allow future of
> formatter _CharT> that would require explicit presentation type to be
> specified. This would allow std::vector to be formatted directly with
> '{::p}' format spec.
>
> The constructors for __formatter_int and __formatter_ptr from
> _Spec<_CharT> now also set default presentation modes, as the format
> functions expect them.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h
> (_ChronoSpec<_CharT>::_M_locale_specific):
> Declare as bit-field in tail-padding.
> * include/bits/formatfwd.h (__format::_Align): Defined as enum
> class
> and add using enum.
> * include/std/format (__format::_Pres_type, __format::_Sign)
> (__format::_WidthPrec,  __format::_Arg_t): Defined as enum class
> and
> add using enum.
> (_Pres_type::_Pres_esc): Replace with _Pres_max.
> (_Pres_type::_Pres_seq, _Pres_type::_Pres_str): Remove.
> (__format::_Pres_type): Updated values of enumerators as described
> above.
> (_Spec<_CharT>): Rearranged members to have 16bits of tail-padding.
> (_Spec<_CharT>::_M_debug): Defined.
> (_Spec<_CharT>::_M_reserved, _Spec<_CharT>::_M_reserved2): Removed.
> (_Spec<_CharT>::_M_parse_fill_and_align,
> _Spec<_CharT>::_M_parse_sign)
> (__format::__write_padded_as_spec): Adjusted default value checks.
> (__format::_Term_char): Add using enum and rename enumerators.
> (__format::__should_escape_ascii): Adjusted _Term_char uses.
> (__formatter_str<_CharT>::parse): Set _Pres_s if specified and
> _M_debug
> instead of _Pres_esc.
> (__formatter_str<_CharT>::set_debug_format): Set _M_debug instead
> of
> _Pres_esc.
> (__formatter_str<_CharT>::format,
> __formatter_str<_CharT>::_M_format_range):
> Check _M_debug instead of _Pres_esc.
> (__formatter_str<_CharT>::_M_format_escaped): Adjusted _Term_char
> uses.
> (__formatter_int<_CharT>::__formatter_int(_Spec<_CharT>)): Set
> _Pres_d if
> default presentation type is not set.
> (__formatter_int<_CharT>::_M_parse): Adjusted default value checks.
> (__formatter_int<_CharT>::_M_do_parse): Set _M_debug instead of
> _Pres_esc.
> (__formatter_int<_CharT>::_M_format_character): Handle escaped
> presentation.
> (__formatter_int<_CharT>::_M_format_character_escaped)
> (__formatter_int<_CharT>::_S_character_width): Merged into
> _M_format_character.
> (__formatter_ptr<_CharT>::__formatter_ptr(_Spec<_CharT>)): Set
> _Pres_p if default
> presentation type is not set.
> (__formatter_ptr<_CharT>::parse): Add default __type paramete
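
The tail-padding point in the quoted description can be illustrated with a small sketch.  The type names below are hypothetical stand-ins and not the libstdc++ classes; the sketch assumes an Itanium C++ ABI target (for example x86-64 GNU/Linux), where a base that is a C++98 POD keeps its tail padding to itself, while a base that itself has a base class is not a C++98 POD and lets a derived class place members in that padding.

```cpp
#include <cassert>
#include <cstdint>

// A C++98 POD: 5 bytes of data, padded to 8.
struct PodSpec { std::uint32_t width; std::uint8_t flags; };
// Derived member cannot go into PodSpec's tail padding on Itanium ABI.
struct PodDerived : PodSpec { std::uint8_t extra; };

// Same members, but an (empty) base class makes it non-POD in C++98 terms.
struct SpecBase {};
struct NonPodSpec : SpecBase { std::uint32_t width; std::uint8_t flags; };
// Derived member may now be placed in NonPodSpec's tail padding.
struct NonPodDerived : NonPodSpec { std::uint8_t extra; };
```

On such a target one would expect sizeof(PodDerived) to be larger than sizeof(NonPodDerived), while the two base types have the same size; other ABIs may lay these types out differently.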

Re: [pushed] i386: Allow string instructions from non-default address space [PR111657]

2025-04-29 Thread Uros Bizjak

On Tue, Apr 29, 2025 at 12:41 PM H.J. Lu  wrote:
>
> On Tue, Apr 29, 2025 at 5:52 PM Uros Bizjak  wrote:
> >
> > MOVS instructions allow segment override of their source operand, e.g.:
> >
> > rep movsq %gs:(%rsi), (%rdi)
> >
> > where %rsi is the address of the source location (with %gs segment override)
> > and %rdi is the address of the destination location.
>
> Please be aware that 0x67 prefix (used by x32) is applied before segment
> register.  That is in
>
>  rep movsq %gs:(%esi), (%edi)
>
> the address is %gs + %esi.

Uh, yes, now I remember this x32 peculiarity.

So, we want the segment prefix disabled with "-mx32 -maddress-mode=short" ?

Thanks,
Uros.
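
A simplified model of the address computation H.J. describes, as a sketch only: string_source_address and its parameters are hypothetical names, and the model ignores everything except the order of truncation and segment-base addition.

```cpp
#include <cassert>
#include <cstdint>

// Models the source address of a MOVS with a segment override in 64-bit mode.
// With the 0x67 address-size prefix the offset register is truncated to
// 32 bits first, and the segment base is added afterwards.
std::uint64_t string_source_address(std::uint64_t seg_base,
                                    std::uint64_t rsi,
                                    bool addr32_prefix)
{
    std::uint64_t offset =
        addr32_prefix ? static_cast<std::uint32_t>(rsi) : rsi;
    return seg_base + offset;
}
```

The point of the ordering is that the sum is not truncated again, so with a non-zero segment base the final address can lie outside the low 4GB even though the offset was taken from %esi.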

Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-29 Thread Richard Sandiford

Jennifer Schmitz  writes:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index f7bccf532f8..1c06b8528e9 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -6416,13 +6416,30 @@ aarch64_stack_protect_canary_mem (machine_mode mode, 
> rtx decl_rtl,
>  void
>  aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
>  {
> -  expand_operand ops[3];
>machine_mode mode = GET_MODE (dest);
> -  create_output_operand (&ops[0], dest, mode);
> -  create_input_operand (&ops[1], pred, GET_MODE(pred));
> -  create_input_operand (&ops[2], src, mode);
> -  temporary_volatile_ok v (true);
> -  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
> +  if ((MEM_P (dest) || MEM_P (src))
> +  && known_eq (BYTES_PER_SVE_VECTOR, 16)

I suppose it's personal preference, but I think this would be more
obvious as the suggested:

  known_eq (GET_MODE_SIZE (mode), 16)

so that everything is defined/tested in terms of the mode that we
want to move.
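
For readers less familiar with poly_int comparisons, the following is a minimal sketch of what a known_eq test against a constant means.  PolySize is a hypothetical two-coefficient stand-in, not GCC's poly_int class: the value is c0 + c1 * x, where x is a non-negative quantity unknown at compile time.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-coefficient polynomial size.
struct PolySize { long c0; long c1; };

// The size is known to equal b only if it does not vary with x at all
// and its constant part is b.
bool known_eq(PolySize a, long b)
{
    return a.c1 == 0 && a.c0 == b;
}
```

Under this model a fixed 16-byte size satisfies known_eq(size, 16), whereas a length-agnostic size of 16 + 16x does not, even though it may happen to be 16 at run time.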

> +  && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
> +  && !BYTES_BIG_ENDIAN)
> +{
> +  if (MEM_P (src))
> + {
> +   rtx tmp = force_reg (V16QImode, adjust_address (src, V16QImode, 0));
> +   emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
> + }
> +  else
> + emit_move_insn (adjust_address (dest, V16QImode, 0),
> + force_lowpart_subreg (V16QImode, src, mode));
> +}
> +  else
> +{
> +  expand_operand ops[3];
> +  create_output_operand (&ops[0], dest, mode);
> +  create_input_operand (&ops[1], pred, GET_MODE(pred));
> +  create_input_operand (&ops[2], src, mode);
> +  temporary_volatile_ok v (true);
> +  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
> +}
>  }
>  
>  /* Expand a pre-RA SVE data move from SRC to DEST in which at least one
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
> index d99ce1202a9..370bd9e3bfe 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
> @@ -473,17 +473,11 @@ SEL2 (struct, pst_uniform4)
>  **   sub sp, sp, #144
>  **   add (x[0-9]+), sp, #?31
>  **   and x7, \1, #?(?:-32|4294967264)
> -**   ptrue   (p[0-7])\.b, vl16
> -**   st1w z0\.s, \2, \[x7\]
> -**   add (x[0-9]+), x7, #?32
> -** (
> -**   str z1, \[\3\]
> -**   str z2, \[\3, #1, mul vl\]
> -** |
> -**   stp q1, q2, \[\3\]
> -** )
> -**   str z3, \[\3, #2, mul vl\]
> -**   st1w z4\.s, \2, \[x7, #6, mul vl\]
> +**   mov x0, x7
> +**   str q0, \[x0\], 32
> +**   stp q1, q2, \[x0\]
> +**   str z3, \[x0, #2, mul vl\]
> +**   str q4, \[x7, 96\]
>  **   add sp, sp, #?144
>  **   ret
>  */

Sorry for not noticing last time, but:

There's no need for the temporary register to be x0, so it should be
handled using captures rather than hard-coded to x0, as with the
original ADD.

Also, the patch doesn't change the handling of unpredicated SVE stores,
and so doesn't change whether we use:

str z1, \[\3\]
str z2, \[\3, #1, mul vl\]

or:

stp q1, q2, \[\3\]

So I think we should keep the (...|...) part of the test as-is, with
just the captures updated.

There would be a benefit in "defending" the use of STP if we paired q0+q1
and q2+q3, but until then, I think we should continue to accept both.

Same for the other tests in the file.

Thanks,
Richard
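
The difference between a hard-coded register and a capture can be sketched with std::regex.  This is only an illustration: the testsuite's check-function-bodies uses its own matcher, and body_matches and the two pattern strings below are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the assembly text contains a match for the pattern.
bool body_matches(const std::string& asm_text, const std::string& pattern)
{
    return std::regex_search(asm_text, std::regex(pattern));
}

// Ties the test to one particular register allocation.
const std::string hard_coded =
    R"(mov\s+x0, x7\n\s*str\s+q0, \[x0\], 32)";

// Captures whichever temporary was chosen and requires its reuse via \1.
const std::string captured =
    R"(mov\s+(x[0-9]+), x7\n\s*str\s+q0, \[\1\], 32)";
```

If the register allocator picks x3 instead of x0, only the captured form still matches, and it continues to reject output where the store uses a different register from the one set by the mov.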

[PATCH v5 02/10] libstdc++: Add header mdspan to the build-system.

2025-04-29 Thread Luc Grosheintz

Creates a nearly empty header mdspan and adds it to the build-system and
Doxygen config file.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Add <mdspan>.
* include/Makefile.am: Ditto.
* include/Makefile.in: Ditto.
* include/precompiled/stdc++.h: Ditto.
* include/std/mdspan: New file.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
 libstdc++-v3/include/Makefile.am  |  1 +
 libstdc++-v3/include/Makefile.in  |  1 +
 libstdc++-v3/include/precompiled/stdc++.h |  1 +
 libstdc++-v3/include/std/mdspan   | 48 +++
 5 files changed, 52 insertions(+)
 create mode 100644 libstdc++-v3/include/std/mdspan

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 19ae67a67ba..e926c6707f6 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -880,6 +880,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc 
\
  include/list \
  include/locale \
  include/map \
+ include/mdspan \
  include/memory \
  include/memory_resource \
  include/mutex \
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 537774c2668..1140fa0dffd 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -38,6 +38,7 @@ std_freestanding = \
${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
${std_srcdir}/memory \
${std_srcdir}/numbers \
${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 7b96b2207f8..c96e981acd6 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -396,6 +396,7 @@ std_freestanding = \
${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
${std_srcdir}/memory \
${std_srcdir}/numbers \
${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
b/libstdc++-v3/include/precompiled/stdc++.h
index f4b312d9e47..e7d89c92704 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -228,6 +228,7 @@
 #include 
 #include 
 #include 
+#include <mdspan>
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
new file mode 100644
index 000..4094a416d1e
--- /dev/null
+++ b/libstdc++-v3/include/std/mdspan
@@ -0,0 +1,48 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2025 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file mdspan
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_MDSPAN
+#define _GLIBCXX_MDSPAN 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#define __glibcxx_want_mdspan
+#include <bits/version.h>
+
+#ifdef __glibcxx_mdspan
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+#endif
+#endif
-- 
2.49.0

Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-29 Thread Jonathan Wakely

On Tue, 29 Apr 2025 at 13:54, Luc Grosheintz  wrote:
>
> This implements std::extents from <mdspan> according to N4950 and
> contains partial progress towards PR107761.
>
> If an extent changes its type, there's a precondition in the standard,
> that the value is representable in the target integer type. This
> precondition is not checked at runtime.
>
> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
> For extents<T> this precondition is always violated and results in
> calling __builtin_trap. For all other specializations it's checked via
> __glibcxx_assert.
>
> PR libstdc++/107761
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (extents): New class.
> * src/c++23/std.cc.in: Add 'using std::extents'.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan  | 262 +++
>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>  2 files changed, 267 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
> index 4094a416d1e..39ced1d6301 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -33,6 +33,12 @@
>  #pragma GCC system_header
>  #endif
>
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
>  #define __glibcxx_want_mdspan
>  #include <bits/version.h>
>
> @@ -41,6 +47,262 @@
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +  namespace __mdspan
> +  {
> +template
> +  class _ExtentsStorage
> +  {
> +  public:
> +   static consteval bool
> +   _S_is_dyn(size_t __ext) noexcept
> +   { return __ext == dynamic_extent; }
> +
> +   template
> + static constexpr _IndexType
> + _S_int_cast(const _OIndexType& __other) noexcept
> + { return _IndexType(__other); }
> +
> +   static constexpr size_t _S_rank = _Extents.size();
> +
> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
> +   // of dynamic extents up to (and not including) __r.
> +   //
> +   // If __r is the index of a dynamic extent, then
> +   // _S_dynamic_index[__r] is the index of that extent in
> +   // _M_dynamic_extents.
> +   static constexpr auto _S_dynamic_index = [] consteval
> +   {
> + array __ret;
> + size_t __dyn = 0;
> + for(size_t __i = 0; __i < _S_rank; ++__i)
> +   {
> + __ret[__i] = __dyn;
> + __dyn += _S_is_dyn(_Extents[__i]);
> +   }
> + __ret[_S_rank] = __dyn;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
> +
> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
> +   // index of the __r-th dynamic extent in _Extents.
> +   static constexpr auto _S_dynamic_index_inv = [] consteval
> +   {
> + array __ret;
> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
> +   if (_S_is_dyn(_Extents[__i]))
> + __ret[__r++] = __i;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t
> +   _S_static_extent(size_t __r) noexcept
> +   { return _Extents[__r]; }
> +
> +   constexpr _IndexType
> +   _M_extent(size_t __r) const noexcept
> +   {
> + auto __se = _Extents[__r];
> + if (__se == dynamic_extent)
> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
> + else
> +   return __se;
> +   }
> +
> +   template
> + constexpr void
> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
> + {
> +   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
> + {
> +   size_t __di = __i;
> +   if constexpr (_OtherRank != _S_rank_dynamic)
> + __di = _S_dynamic_index_inv[__i];
> +   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
> + }
> + }
> +
> +   constexpr
> +   _ExtentsStorage() noexcept = default;
> +
> +   template
> + constexpr
> + _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
> + __other) noexcept
> + {
> +   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
> + { return __other._M_extent(__i); });
> + }
> +
> +   template
> + constexpr
> + _ExtentsStorage(span __exts) noexcept
> + {
> +   _M_init_dynamic_extents<_Nm>(
> + [&__exts](size_t __i) -> const _OIndexType&
> + { return __exts[__i]; });
> + }
> +
> +  private:
> +   using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
> +   [[no_unique_address]] _S_storage _M_dynamic_extents;
> +  };
> +
> +template
> +  concept __valid_index_type =
> +   is_convertible_v<_OIndexType, _SIndexType> &&
> +   is_nothrow_

[RFC 2/3] aarch64: Remove explicit make dependencies

2025-04-29 Thread Alice Carlotti

This might miss some dependencies when doing an incremental build where
the previous build did not include generated dependency files, and the
.cc file has not subsequently changed (but another dependency has).

gcc/ChangeLog:

* config/aarch64/t-aarch64: Remove explicit .o dependencies.


diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 
a70c323ad0ad6be7d887645b6866112105f4f805..ae8c406db8388e49425124e5b438feadf76ae61d
 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -49,83 +49,31 @@ endif
 
 s-mddeps: s-aarch64-tune-md
 
-aarch64-builtins.o: $(srcdir)/config/aarch64/aarch64-builtins.cc $(CONFIG_H) \
-  $(SYSTEM_H) coretypes.h $(TM_H) $(REGS_H) \
-  $(RTL_H) $(TREE_H) expr.h $(TM_P_H) $(RECOG_H) langhooks.h \
-  $(DIAGNOSTIC_CORE_H) $(OPTABS_H) \
-  $(srcdir)/config/aarch64/aarch64-simd-builtins.def \
-  $(srcdir)/config/aarch64/aarch64-simd-builtin-types.def \
-  $(srcdir)/config/aarch64/aarch64-simd-pragma-builtins.def \
-  aarch64-builtin-iterators.h
+aarch64-builtins.o: $(srcdir)/config/aarch64/aarch64-builtins.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
-aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins.def \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-base.def \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.def \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.def \
-  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
-  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) $(DIAGNOSTIC_H) \
-  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
-  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
-  stor-layout.h alias.h gimple-fold.h langhooks.h \
-  stringpool.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-base.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.h
+aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
 aarch64-sve-builtins-shapes.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.cc \
-  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
-  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
 aarch64-sve-builtins-base.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-base.cc \
-  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
-  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
-  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
-  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
-  rtx-vector-builder.h vec-perm-indices.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-base.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-base.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
 aarch64-sve-builtins-sve2.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.cc \
-  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
-  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
-  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
-  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
-  rtx-vector-builder.h vec-perm-indices.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
 aarch64-sve-builtins-sme.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc \
-  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
-  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
-  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
-  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
@@ -135,13 +83,11 @@ aarch64-builtin-iterators.h: 
$(srcdir)/config/aarch64/geniterators.sh \
$(srcdir)/config/aarch64/iterators.md >

[RFC 3/3] aarch64: Use default .cc.o rule to build aarch64 .cc files

2025-04-29 Thread Alice Carlotti

The change to gcc/configure is a hack to illustrate where we need extra
arguments available.  If the rest of the change is desirable, then we
could define a new variable to include these extra directories.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
6dbe880c9d45369a0128d79f5fa30ca07faf9532..3c321794025ff9314817c65ae07a39a708966685
 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -348,10 +348,10 @@ m32c*-*-*)
 aarch64*-*-*)
cpu_type=aarch64
extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h 
arm_sme.h arm_neon_sve_bridge.h arm_private_fp8.h arm_private_neon_types.h"
-   c_target_objs="aarch64-c.o"
-   cxx_target_objs="aarch64-c.o"
-   d_target_objs="aarch64-d.o"
-   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
cortex-a57-fma-steering.o aarch64-speculation.o aarch-bti-insert.o 
aarch64-cc-fusion.o aarch64-early-ra.o aarch64-ldp-fusion.o"
+   c_target_objs="config/aarch64/aarch64-c.o"
+   cxx_target_objs="config/aarch64/aarch64-c.o"
+   d_target_objs="config/aarch64/aarch64-d.o"
+   extra_objs="config/aarch64/aarch64-builtins.o config/arm/aarch-common.o 
config/aarch64/aarch64-sve-builtins.o 
config/aarch64/aarch64-sve-builtins-shapes.o 
config/aarch64/aarch64-sve-builtins-base.o 
config/aarch64/aarch64-sve-builtins-sve2.o 
config/aarch64/aarch64-sve-builtins-sme.o 
config/aarch64/cortex-a57-fma-steering.o config/aarch64/aarch64-speculation.o 
config/arm/aarch-bti-insert.o config/aarch64/aarch64-cc-fusion.o 
config/aarch64/aarch64-early-ra.o config/aarch64/aarch64-ldp-fusion.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h 
\$(srcdir)/config/aarch64/aarch64-builtins.h 
\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 
ae8c406db8388e49425124e5b438feadf76ae61d..a118ac8bf7b0f5c83aed6a3bc5284ee8b006fed1
 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -49,78 +49,14 @@ endif
 
 s-mddeps: s-aarch64-tune-md
 
-aarch64-builtins.o: $(srcdir)/config/aarch64/aarch64-builtins.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-sve-builtins-shapes.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-sve-builtins-base.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-base.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-sve-builtins-sve2.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-sve-builtins-sme.o: \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
 aarch64-builtin-iterators.h: $(srcdir)/config/aarch64/geniterators.sh \
$(srcdir)/config/aarch64/iterators.md
$(SHELL) $(srcdir)/config/aarch64/geniterators.sh \
$(srcdir)/config/aarch64/iterators.md > \
aarch64-builtin-iterators.h
 
-aarch-common.o: $(srcdir)/config/arm/aarch-common.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-c.o: $(srcdir)/config/aarch64/aarch64-c.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-d.o: $(srcdir)/config/aarch64/aarch64-d.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
 PASSES_EXTRA += $(srcdir)/config/aarch64/aarch64-passes.def
 
-cortex-a57-fma-steering.o: $(srcdir)/config/aarch64/cortex-a57-fma-steering.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-speculation.o: $(srcdir)/config/aarch64/aarch64-speculation.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch-bti-insert.o: $(srcdir)/config/arm/aarch-bti-insert.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-cc-fusion.o: $(srcdir)/config/aarch64/aarch64-cc-fusion.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-early-ra.o: $(srcdir)/config/aarch64/aarch64-early-ra.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
-aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
 comma=,
 MULTILIB_OPTIONS= $(subst $(comma),/, $(patsubst %, mabi=%, $(subst 
$(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG
 MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
diff --git a/gcc/configure b/gcc/configure
index 
16965953f05160ea4572957144f305cc0cce4e18..9eedf9d45b4e30207803f4bc11c2af51f5fd2ae4
 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -36390,7 +36390,7 @@ $as_echo "$as_me: executing $ac_file commands" >&6;}
 "depdir":C) $SHELL $ac_aux_dir/mkinstalldirs $DEPDIR ;;
 "gccdepdir":C)
   ${CON

[PATCH 1/2] tree-optimization/119960 - fix and guard get_later_stmt

2025-04-29 Thread Richard Biener

The following makes get_later_stmt handle stmts from different
basic blocks when they are ordered by dominance, and assert otherwise.

Bootstrap/regtest running on x86_64-unknown-linux-gnu.

* tree-vectorizer.h (get_later_stmt): Robustify against
stmts in different BBs, assert when they are unordered.
---
 gcc/tree-vectorizer.h | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 0e4cc8e7cdc..d0557d39430 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1873,11 +1873,25 @@ vect_orig_stmt (stmt_vec_info stmt_info)
 inline stmt_vec_info
 get_later_stmt (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info)
 {
-  if (gimple_uid (vect_orig_stmt (stmt1_info)->stmt)
-  > gimple_uid (vect_orig_stmt (stmt2_info)->stmt))
+  gimple *stmt1 = vect_orig_stmt (stmt1_info)->stmt;
+  gimple *stmt2 = vect_orig_stmt (stmt2_info)->stmt;
+  if (gimple_bb (stmt1) == gimple_bb (stmt2))
+{
+  if (gimple_uid (stmt1) > gimple_uid (stmt2))
+   return stmt1_info;
+  else
+   return stmt2_info;
+}
+  /* ???  We should be really calling this function only with stmts
+ in the same BB but we can recover if there's a domination
+ relationship between them.  */
+  else if (dominated_by_p (CDI_DOMINATORS,
+  gimple_bb (stmt1), gimple_bb (stmt2)))
 return stmt1_info;
-  else
+  else if (dominated_by_p (CDI_DOMINATORS,
+  gimple_bb (stmt2), gimple_bb (stmt1)))
 return stmt2_info;
+  gcc_unreachable ();
 }
 
 /* If STMT_INFO has been replaced by a pattern statement, return the
-- 
2.43.0
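
As a rough illustration of the ordering rule in this patch, here is a
minimal, self-contained sketch.  It does not use GCC's types: Block,
Stmt, dominated_by and later_stmt are hypothetical stand-ins, the
dominator tree is reduced to an immediate-dominator pointer per block,
and std::abort stands in for gcc_unreachable.

```cpp
#include <cstdlib>

// Toy stand-ins for illustration only; these are not GCC types.
struct Block { const Block* idom; };   // immediate dominator, nullptr for the entry block
struct Stmt  { const Block* bb; unsigned uid; };

// True if block `a` is dominated by block `b` (a block dominates itself).
inline bool dominated_by(const Block* a, const Block* b) {
  for (const Block* p = a; p != nullptr; p = p->idom)
    if (p == b)
      return true;
  return false;
}

// Pick the later of two statements.  Within one block the uid decides;
// across blocks the statement in the dominated block is the later one.
// Statements in unordered blocks have no answer, so the sketch aborts.
inline const Stmt* later_stmt(const Stmt* s1, const Stmt* s2) {
  if (s1->bb == s2->bb)
    return s1->uid > s2->uid ? s1 : s2;
  if (dominated_by(s1->bb, s2->bb))
    return s1;
  if (dominated_by(s2->bb, s1->bb))
    return s2;
  std::abort();
}
```

With an entry block that dominates two sibling blocks, a statement in
either sibling is later than any statement in the entry block, while
two statements taken from the two siblings hit the abort.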

[RFC 1/3] aarch64: Generate automatic dependency rules

2025-04-29 Thread Alice Carlotti

This also improves consistency of the compile commands, and eliminates
an ALL_SPPFLAGS typo.

gcc/ChangeLog:

* config/aarch64/t-aarch64: Use $(COMPILE) and $(POSTCOMPILE)


diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 
59571948479c0857df2cca70b18df6c5d9a7254c..a70c323ad0ad6be7d887645b6866112105f4f805
 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -57,8 +57,8 @@ aarch64-builtins.o: 
$(srcdir)/config/aarch64/aarch64-builtins.cc $(CONFIG_H) \
   $(srcdir)/config/aarch64/aarch64-simd-builtin-types.def \
   $(srcdir)/config/aarch64/aarch64-simd-pragma-builtins.def \
   aarch64-builtin-iterators.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-builtins.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc \
   $(srcdir)/config/aarch64/aarch64-sve-builtins.def \
@@ -76,8 +76,8 @@ aarch64-sve-builtins.o: 
$(srcdir)/config/aarch64/aarch64-sve-builtins.cc \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-sve-builtins.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-sve-builtins-shapes.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.cc \
@@ -85,8 +85,8 @@ aarch64-sve-builtins-shapes.o: \
   $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) \
   $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-sve-builtins-base.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.cc \
@@ -99,8 +99,8 @@ aarch64-sve-builtins-base.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-sve-builtins-sve2.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.cc \
@@ -113,8 +113,8 @@ aarch64-sve-builtins-sve2.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-sve-builtins-sme.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc \
@@ -126,8 +126,8 @@ aarch64-sve-builtins-sme.o: \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-builtin-iterators.h: $(srcdir)/config/aarch64/geniterators.sh \
$(srcdir)/config/aarch64/iterators.md
@@ -137,13 +137,13 @@ aarch64-builtin-iterators.h: 
$(srcdir)/config/aarch64/geniterators.sh \
 
 aarch-common.o: $(srcdir)/config/arm/aarch-common.cc $(CONFIG_H) $(SYSTEM_H) \
 coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H)
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/arm/aarch-common.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-c.o: $(srcdir)/config/aarch64/aarch64-c.cc $(CONFIG_H) $(SYSTEM_H) \
 coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H) $(TARGET_H)
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/aarch64-c.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-d.o: $(srcdir)/config/aarch64/aarch64-d.cc
$(COMPILE) $<
@@ -157,8 +157,8 @@ cortex-a57-fma-steering.o: 
$(srcdir)/config/aarch64/cortex-a57-fma-steering.cc \
 output.h hash-map.h $(DF_H) $(OBSTACK_H) $(TARGET_H) $(RTL_H) \
 $(CONTEXT_H) $(TREE_PASS_H) regrename.h \
 $(srcdir)/config/aarch64/aarch64-protos.h
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/aarch64/cortex-a57-fma-steering.cc
+   $(COMPILE) $<
+   $(POSTCOMPILE)
 
 aarch64-speculation.o: $(srcdir)/config/aarch64/aarch64-speculation.cc \
 $(CONFIG_H) \
@@ -167,8 +167,8 @@ aarch64-speculatio

[PATCH 2/2] tree-optimization/119960 - add validity checking to SLP scheduling

2025-04-29 Thread Richard Biener

The following adds checks that when we search for a vector stmt
insert location we arrive at one where all required operand defs
are dominating the insert location.  At the moment any such
failure only blows up during SSA verification.

There's the long-standing issue that we do not verify there
exists a valid schedule of the SLP graph from BB vectorization
into the existing CFG.  We do not have the ability to insert
vector stmts on the dominance frontier "end", nor to insert
LC PHIs that would be eventually required.

This should all be done differently, by computing the schedule
during analysis and failing if we can't schedule.

Bootstrap/regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/119960
* tree-vect-slp.cc (vect_schedule_slp_node): Sanity
check dominance check on operand defs.
---
 gcc/tree-vect-slp.cc | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 5eca08be2ef..439b99cab0f 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -11214,9 +11214,14 @@ vect_schedule_slp_node (vec_info *vinfo,
== cycle_phi_info_type);
gphi *phi = as_a <gphi *>
  (vect_find_last_scalar_stmt_in_slp (child)->stmt);
-   if (!last_stmt
-   || vect_stmt_dominates_stmt_p (last_stmt, phi))
+   if (!last_stmt)
  last_stmt = phi;
+   else if (vect_stmt_dominates_stmt_p (last_stmt, phi))
+ last_stmt = phi;
+   else if (vect_stmt_dominates_stmt_p (phi, last_stmt))
+ ;
+   else
+ gcc_unreachable ();
  }
/* We are emitting all vectorized stmts in the same place and
   the last one is the last.
@@ -11227,9 +11232,14 @@ vect_schedule_slp_node (vec_info *vinfo,
FOR_EACH_VEC_ELT (SLP_TREE_VEC_DEFS (child), j, vdef)
  {
gimple *vstmt = SSA_NAME_DEF_STMT (vdef);
-   if (!last_stmt
-   || vect_stmt_dominates_stmt_p (last_stmt, vstmt))
+   if (!last_stmt)
+ last_stmt = vstmt;
+   else if (vect_stmt_dominates_stmt_p (last_stmt, vstmt))
  last_stmt = vstmt;
+   else if (vect_stmt_dominates_stmt_p (vstmt, last_stmt))
+ ;
+   else
+ gcc_unreachable ();
  }
  }
else if (!SLP_TREE_VECTYPE (child))
@@ -11242,9 +11252,14 @@ vect_schedule_slp_node (vec_info *vinfo,
  && !SSA_NAME_IS_DEFAULT_DEF (def))
{
  gimple *stmt = SSA_NAME_DEF_STMT (def);
- if (!last_stmt
- || vect_stmt_dominates_stmt_p (last_stmt, stmt))
+ if (!last_stmt)
+   last_stmt = stmt;
+ else if (vect_stmt_dominates_stmt_p (last_stmt, stmt))
last_stmt = stmt;
+ else if (vect_stmt_dominates_stmt_p (stmt, last_stmt))
+   ;
+ else
+   gcc_unreachable ();
}
  }
else
@@ -11265,9 +11280,14 @@ vect_schedule_slp_node (vec_info *vinfo,
  && !SSA_NAME_IS_DEFAULT_DEF (vdef))
{
  gimple *vstmt = SSA_NAME_DEF_STMT (vdef);
- if (!last_stmt
- || vect_stmt_dominates_stmt_p (last_stmt, vstmt))
+ if (!last_stmt)
+   last_stmt = vstmt;
+ else if (vect_stmt_dominates_stmt_p (last_stmt, vstmt))
last_stmt = vstmt;
+ else if (vect_stmt_dominates_stmt_p (vstmt, last_stmt))
+   ;
+ else
+   gcc_unreachable ();
}
  }
  }
-- 
2.43.0
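
The repeated if/else-if chain added by this patch follows one pattern:
walk the operand definitions, keep the one that every earlier candidate
dominates, and fail when two definitions are unordered.  A minimal
sketch of that pattern follows.  Block, Stmt, stmt_dominates and
last_def are hypothetical stand-ins rather than GCC interfaces, and a
thrown std::logic_error stands in for gcc_unreachable.

```cpp
#include <stdexcept>
#include <vector>

// Toy stand-ins for illustration only; these are not GCC types.
struct Block { const Block* idom; };   // immediate dominator, nullptr for the entry block
struct Stmt  { const Block* bb; unsigned uid; };

// True if block `a` is dominated by block `b` (a block dominates itself).
inline bool block_dominated_by(const Block* a, const Block* b) {
  for (const Block* p = a; p != nullptr; p = p->idom)
    if (p == b)
      return true;
  return false;
}

// Toy "statement a dominates statement b" query: inside one block the
// position (uid) decides, across blocks the block dominance decides.
inline bool stmt_dominates(const Stmt* a, const Stmt* b) {
  if (a->bb == b->bb)
    return a->uid <= b->uid;
  return block_dominated_by(b->bb, a->bb);
}

// Return the latest definition, or nullptr for an empty list.  Throws
// when a definition is unordered relative to the current candidate.
inline const Stmt* last_def(const std::vector<const Stmt*>& defs) {
  const Stmt* last = nullptr;
  for (const Stmt* def : defs) {
    if (!last)
      last = def;
    else if (stmt_dominates(last, def))
      last = def;
    else if (stmt_dominates(def, last)) {
      // `last` is already the later of the two; keep it.
    } else
      throw std::logic_error("operand defs are not ordered by dominance");
  }
  return last;
}
```

Like the patch, the sketch only compares each definition against the
running candidate, so it detects an unordered pair when it meets one
rather than proving that a valid schedule exists.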

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 9:34 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 2:33 PM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 6:46 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 12:32 PM H.J. Lu  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES 
> > > > > > > > to return
> > > > > > > > true, all integer arguments smaller than int are passed as int:
> > > > > > > >
> > > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > > > extern int baz (char c1);
> > > > > > > >
> > > > > > > > int
> > > > > > > > foo (char c1)
> > > > > > > > {
> > > > > > > >   return baz (c1);
> > > > > > > > }
> > > > > > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > > > .file "x.c"
> > > > > > > > .text
> > > > > > > > .p2align 4
> > > > > > > > .globl foo
> > > > > > > > .type foo, @function
> > > > > > > > foo:
> > > > > > > > .LFB0:
> > > > > > > > .cfi_startproc
> > > > > > > > movsbl 4(%esp), %eax
> > > > > > > > movl %eax, 4(%esp)
> > > > > > > > jmp baz
> > > > > > > > .cfi_endproc
> > > > > > > > .LFE0:
> > > > > > > > .size foo, .-foo
> > > > > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > > > > > .section .note.GNU-stack,"",@progbits
> > > > > > > > [hjl@gnu-tgl-3 pr14907]$
> > > > > > > >
> > > > > > > > But integer promotion:
> > > > > > > >
> > > > > > > > movsbl 4(%esp), %eax
> > > > > > > > movl %eax, 4(%esp)
> > > > > > > >
> > > > > > > > isn't necessary if incoming arguments are copied to outgoing 
> > > > > > > > arguments
> > > > > > > > directly.
> > > > > > > >
> > > > > > > > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, 
> > > > > > > > defaulting
> > > > > > > > to return nullptr.  If the new target hook returns non-nullptr, 
> > > > > > > > use it to
> > > > > > > > get the outgoing small integer argument.  The x86 target hook 
> > > > > > > > returns the
> > > > > > > > value of the corresponding incoming argument as int if it can 
> > > > > > > > be used as
> > > > > > > > the outgoing argument.  If callee is a global function, we 
> > > > > > > > always properly
> > > > > > > > extend the incoming small integer arguments in callee.  If 
> > > > > > > > callee is a
> > > > > > > > local function, since DECL_ARG_TYPE has the original small 
> > > > > > > > integer type,
> > > > > > > > we will extend the incoming small integer arguments in callee 
> > > > > > > > if needed.
> > > > > > > > It is safe only if
> > > > > > > >
> > > > > > > > 1. Caller and callee are not nested functions.
> > > > > > > > 2. Caller and callee use the same ABI.
> > > > > > >
> > > > > > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > > > > > should apply to all of them, no?
> > > > > >
> > > > > > When the arguments are passed in different registers in different 
> > > > > > ABIs,
> > > > > > we have to copy them anyway.
> > > > >
> > > > > But optimization can elide copies easily, but not easily elide
> > > > > sign-/zero-extensions.
> > > >
> > > > What I meant was that caller and callee have different ABIs.
> > > > Optimizer can't elide copies since incoming arguments and outgoing
> > > > arguments are in different registers.  They have to be moved.
> > > >
> > > > > > >
> > > > > > > > 3. The incoming argument and the outgoing argument are in the 
> > > > > > > > same
> > > > > > > > location.
> > > > > > >
> > > > > > > Why's that?  Can't we move them but still elide the 
> > > > > > > sign-/zero-extension?
> > > > > >
> > > > > > If they aren't in the same locations, we have to move them anyway.
> > > > > > This patch tries to avoid unnecessary moves of incoming arguments to
> > > > > > outgoing arguments.
> > > > >
> > > > > That's not exactly how you presented it, but you conveniently used
> > > > > x86 stack argument passing.  That might be difficult to elide, but is
> > > > > also uncommon for "small integer types" - does the same issue not
> > > > > apply to other arguments passed on the stack as well?
> > > >
> > > > It applies to both passing in registers and on stack.   It is an issue 
> > > > only
> > > > for small integer types due to sign-/zero-extensions at call sites.  My
> > > > patch elides sign-/zero-extensions when incoming arguments and outgoing
> > > > arguments are unchanged in exactly the same location, in register or on 
> > > > stack.
> > >
> > > Is it possible to dissect this from TARGET_PROMOTE_PROTOTYPES then?
> > > That is, this should also work for the case prototypes are not promoted 
> > > and
> > > for modes larger than SImode, even BLKmode.
> > >
> > > Ric
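
The claim discussed in this thread is that a caller may forward an
incoming small integer argument untouched when the callee extends it
again anyway.  The sketch below models that with plain integers.  It is
an illustration under stated assumptions, not the proposed hook: the
32-bit "slot" model and the names sign_extend_low_byte, callee_view,
caller_reextend and caller_forward are all hypothetical.

```cpp
#include <cstdint>

// Model of what `movsbl` computes: take the low 8 bits of a 32-bit
// argument slot and sign-extend them to a 32-bit integer.
constexpr std::int32_t sign_extend_low_byte(std::uint32_t slot) {
  const std::int32_t low = static_cast<std::int32_t>(slot & 0xffu);
  return low >= 128 ? low - 256 : low;
}

// The callee only trusts the low byte and extends it itself.
constexpr std::int32_t callee_view(std::uint32_t slot) {
  return sign_extend_low_byte(slot);
}

// Caller variant 1: re-extend the incoming argument before the call.
constexpr std::uint32_t caller_reextend(std::uint32_t incoming) {
  return static_cast<std::uint32_t>(sign_extend_low_byte(incoming));
}

// Caller variant 2: pass the incoming slot through unchanged.
constexpr std::uint32_t caller_forward(std::uint32_t incoming) {
  return incoming;
}

// Even with arbitrary upper bits in the slot, the callee sees the same value.
static_assert(callee_view(caller_reextend(0xdeadbe80u)) == -128, "re-extended");
static_assert(callee_view(caller_forward(0xdeadbe80u)) == -128, "forwarded");
```

The equality only holds because the callee performs its own extension.
The sketch says nothing about the ABI and nested-function conditions
listed in the quoted patch description.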

Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 2:55 PM Luc Grosheintz 
wrote:

> This implements std::extents from <mdspan> according to N4950 and
> contains partial progress towards PR107761.
>
> If an extent changes its type, there's a precondition in the standard,
> that the value is representable in the target integer type. This
> precondition is not checked at runtime.
>
> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
> For rank-zero extents this precondition is always violated and results in
> calling __builtin_trap. For all other specializations it's checked via
> __glibcxx_assert.
>
> PR libstdc++/107761
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (extents): New class.
> * src/c++23/std.cc.in: Add 'using std::extents'.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan  | 262 +++
>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>  2 files changed, 267 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 4094a416d1e..39ced1d6301 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -33,6 +33,12 @@
>  #pragma GCC system_header
>  #endif
>
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
>  #define __glibcxx_want_mdspan
>  #include 
>
> @@ -41,6 +47,262 @@
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +  namespace __mdspan
> +  {
> +template
> +  class _ExtentsStorage
> +  {
> +  public:
> +   static consteval bool
> +   _S_is_dyn(size_t __ext) noexcept
> +   { return __ext == dynamic_extent; }
> +
> +   template
> + static constexpr _IndexType
> + _S_int_cast(const _OIndexType& __other) noexcept
> + { return _IndexType(__other); }
> +
> +   static constexpr size_t _S_rank = _Extents.size();
> +
> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
> +   // of dynamic extents up to (and not including) __r.
> +   //
> +   // If __r is the index of a dynamic extent, then
> +   // _S_dynamic_index[__r] is the index of that extent in
> +   // _M_dynamic_extents.
> +   static constexpr auto _S_dynamic_index = [] consteval
> +   {
> + array __ret;
> + size_t __dyn = 0;
> + for(size_t __i = 0; __i < _S_rank; ++__i)
> +   {
> + __ret[__i] = __dyn;
> + __dyn += _S_is_dyn(_Extents[__i]);
> +   }
> + __ret[_S_rank] = __dyn;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t _S_rank_dynamic =
> _S_dynamic_index[_S_rank];
> +
> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is
> the
> +   // index of the __r-th dynamic extent in _Extents.
> +   static constexpr auto _S_dynamic_index_inv = [] consteval
> +   {
> + array __ret;
> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
> +   if (_S_is_dyn(_Extents[__i]))
> + __ret[__r++] = __i;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t
> +   _S_static_extent(size_t __r) noexcept
> +   { return _Extents[__r]; }
> +
> +   constexpr _IndexType
> +   _M_extent(size_t __r) const noexcept
> +   {
> + auto __se = _Extents[__r];
> + if (__se == dynamic_extent)
> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
> + else
> +   return __se;
> +   }
> +
> +   template
> + constexpr void
> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
> + {
> +   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
> + {
> +   size_t __di = __i;
> +   if constexpr (_OtherRank != _S_rank_dynamic)
> + __di = _S_dynamic_index_inv[__i];
> +   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
> + }
> + }
> +
> +   constexpr
> +   _ExtentsStorage() noexcept = default;
> +
> +   template
> + constexpr
> + _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
> + __other) noexcept
> + {
> +   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
> + { return __other._M_extent(__i); });
> + }
> +
> +   template
> + constexpr
> + _ExtentsStorage(span __exts) noexcept
> + {
> +   _M_init_dynamic_extents<_Nm>(
> + [&__exts](size_t __i) -> const _OIndexType&
> + { return __exts[__i]; });
> + }
> +
> +  private:
> +   using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> +   [[no_unique_address]] _S_storage _M_dynamic_extents;
> +  };
> +
> +template
> +  concept __valid_index_type =
> +   is_convertible_v<_OIndexType, _SIndexType> &&
> +   is
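
The two lookup tables described in the comments of the quoted patch can
be tried out in isolation.  The following is a minimal sketch, not the
libstdc++ code: dyn, dynamic_index and dynamic_index_inv are
hypothetical names, the extents are passed as a std::array argument
instead of a template parameter, and the inverse table is padded to the
full rank (only its first rank_dynamic entries are meaningful).

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Marker for a dynamic extent; same value as std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// For r in [0, N], result[r] is the number of dynamic extents before
// position r.  result[N] is therefore the total number of dynamic extents.
template <std::size_t N>
constexpr std::array<std::size_t, N + 1>
dynamic_index(const std::array<std::size_t, N>& exts) {
  std::array<std::size_t, N + 1> ret{};
  std::size_t count = 0;
  for (std::size_t i = 0; i < N; ++i) {
    ret[i] = count;
    count += (exts[i] == dyn) ? 1 : 0;
  }
  ret[N] = count;
  return ret;
}

// result[k] is the position of the k-th dynamic extent in `exts`.
// Entries past the number of dynamic extents stay zero (padding).
template <std::size_t N>
constexpr std::array<std::size_t, N>
dynamic_index_inv(const std::array<std::size_t, N>& exts) {
  std::array<std::size_t, N> ret{};
  std::size_t k = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (exts[i] == dyn)
      ret[k++] = i;
  return ret;
}

// Example: extents 3 x dynamic x 5 x dynamic.
inline constexpr std::array<std::size_t, 4> example{3, dyn, 5, dyn};
static_assert(dynamic_index(example)[4] == 2, "two dynamic extents");
static_assert(dynamic_index(example)[3] == 1, "one dynamic extent before position 3");
static_assert(dynamic_index_inv(example)[1] == 3, "second dynamic extent is at position 3");
```

The forward table maps an extent position to its slot in the compact
storage of dynamic extents; the inverse table maps a storage slot back
to the extent position it came from.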

Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-04-29 Thread Jonathan Wakely

On Tue, 29 Apr 2025 at 13:56, Luc Grosheintz  wrote:
>
> Implements the parts of layout_left that don't depend on any of the
> other layouts.
>
> libstdc++/ChangeLog:

N.B. this needs to be libstdc++-v3/ChangeLog with "-v3", or the git
hooks will reject it. Similarly in patches 6/10 to 10/10.

There's a gcc-verify alias you can use to check the commit messages
against the hooks, see
https://gcc.gnu.org/gitwrite.html#customization

I can fix these (and the other minor review comments) locally before pushing.


>
> * include/std/mdspan (layout_left): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 179 
>  1 file changed, 179 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
> index 39ced1d6301..e05048a5b93 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
> +  constexpr typename _Extents::index_type
> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
> +  {
> +   typename _Extents::index_type __fwd = 1;
> +   for(size_t __i = 0; __i < __r; ++__i)
> + __fwd *= __exts.extent(__i);
> +   return __fwd;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
> +  {
> +   typename _Extents::index_type __rev = 1;
> +   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
> + __rev *= __exts.extent(__i);
> +   return __rev;
> +  }
> +
>  template
>auto __build_dextents_type(integer_sequence)
> -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
> @@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  explicit extents(_Integrals...) ->
>extents()...>;
>
> +  struct layout_left
> +  {
> +template
> +  class mapping;
> +  };
> +
> +  namespace __mdspan
> +  {
> +template
> +  constexpr bool __is_extents = false;
> +
> +template
> +  constexpr bool __is_extents> = true;
> +
> +template
> +struct _LinearIndexLeft
> +{
> +  template
> +   static constexpr typename _Extents::index_type
> +   _S_value(const _Extents& __exts, typename _Extents::index_type __idx,
> +_Indices... __indices) noexcept
> +   {
> + return __idx + __exts.extent(_Count)
> +   * _LinearIndexLeft<_Count + 1>::_S_value(__exts, __indices...);
> +   }
> +
> +  template
> +   static constexpr typename _Extents::index_type
> +   _S_value(const _Extents&) noexcept
> +   { return 0; }
> +};
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __linear_index_left(const _Extents& __exts, _Indices... __indices)
> +  {
> +   return _LinearIndexLeft<0>::_S_value(__exts, __indices...);
> +  }
> +
> +template
> +  consteval bool
> +  __is_representable_product(array<_Tp, _Nm> __factors)
> +  {
> +   size_t __rest = numeric_limits<_IndexType>::max();
> +   for(size_t __i = 0; __i < _Nm; ++__i)
> +   {
> + if (__factors[__i] == 0)
> +   return true;
> + __rest /= _IndexType(__factors[__i]);
> +   }
> +   return __rest > 0;
> +  }
> +
> +template
> +  consteval array
> +  __static_extents_array()
> +  {
> +   array __exts;
> +   for(size_t __i = 0; __i < _Extents::rank(); ++__i)
> + __exts[__i] = _Extents::static_extent(__i);
> +   return __exts;
> +  }
> +
> +template
> +  concept __representable_size = _Extents::rank_dynamic() != 0
> +  || __is_representable_product<_IndexType>(
> + __static_extents_array<_Extents>());
> +
> +template
> +  concept __layout_extent = __representable_size<
> +   _Extents, typename _Extents::index_type>;
> +  }
> +
> +  template
> +class layout_left::mapping
> +{
> +  static_assert(__mdspan::__layout_extent<_Extents>,
> +   "The size of extents_type is not representable as index_type.");
> +public:
> +  using extents_type = _Extents;
> +  using index_type = typename extents_type::index_type;
> +  using size_type = typename extents_type::size_type;
> +  using rank_type = typename extents_type::rank_type;
> +  using layout_type = layout_left;
> +
> +  constexpr
> +  mapping() noexcept = default;
> +
> +  constexpr
> +  mapping(const mapping&) noexcept = default;
> +
> +  constexpr
> +  mapping(const extents_type& __extents) noexcept
> +  : _M_extents(__extents)
> +  { }
> +
> +  template
> +   requires (is_constructible_v)
> +   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
> +   mapping(const mapping<_OExtents>& __other) noexcept
> +   : _M_extents(__other.extents())
> +   { }
> +
> +
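
The helpers in the quoted patch compute a column-major (layout_left)
offset and check that the product of the static extents fits in the
index type.  A minimal sketch of the same arithmetic on plain arrays
follows.  It is not the libstdc++ implementation: fwd_prod,
linear_index_left and is_representable_product here are simplified,
hypothetical versions that take std::array arguments instead of an
extents object and a parameter pack.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Product of the extents in [0, r): the stride of dimension r in a
// column-major layout.
template <std::size_t N>
constexpr std::size_t fwd_prod(const std::array<std::size_t, N>& exts,
                               std::size_t r) {
  std::size_t prod = 1;
  for (std::size_t i = 0; i < r; ++i)
    prod *= exts[i];
  return prod;
}

// Column-major offset i0 + e0 * (i1 + e1 * (i2 + ...)), evaluated from
// the innermost parenthesis outwards.
template <std::size_t N>
constexpr std::size_t
linear_index_left(const std::array<std::size_t, N>& exts,
                  const std::array<std::size_t, N>& idx) {
  std::size_t result = 0;
  for (std::size_t r = N; r-- > 0;)
    result = idx[r] + exts[r] * result;
  return result;
}

// True if the product of all factors fits in IndexType.  Repeated
// integer division avoids computing a product that might overflow.
template <typename IndexType, std::size_t N>
constexpr bool
is_representable_product(const std::array<std::size_t, N>& factors) {
  std::size_t rest = std::numeric_limits<IndexType>::max();
  for (std::size_t i = 0; i < N; ++i) {
    if (factors[i] == 0)
      return true;   // an empty index space is always representable
    rest /= factors[i];
  }
  return rest > 0;
}

// Example: a 2 x 3 x 4 array, element (1, 2, 3).
inline constexpr std::array<std::size_t, 3> shape{2, 3, 4};
static_assert(linear_index_left(shape, {1, 2, 3}) == 23, "1 + 2*(2 + 3*3)");
static_assert(fwd_prod(shape, 2) == 6, "stride of the last dimension");
```

The offset can also be read as a sum of index times stride: with
strides 1, 2 and 6 the example gives 1*1 + 2*2 + 3*6 = 23, which
matches the nested form used in the sketch.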

[PATCH v3] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 11:27 AM H.J. Lu  wrote:
>
> On Tue, Apr 29, 2025 at 10:08 AM Hongtao Liu  wrote:
> >
> > On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu  wrote:
> > >
> > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu  wrote:
> > > >
> > >
> > > > > > This is what my patch does:
> > > > > But it iterates through vector_insns, using a def-ref chain to find
> > > > > those insns. I think we can just record those single_set with src as
> > > > > const_m1/zero, and replace src for them.
> > > >
> > > > Will fix it.
> > >
> > > Fixed in the v2 patch.
> > >
> > > > > >
> > > > > >  /* Check the single definition of CONST0_RTX and 
> > > > > > integer
> > > > > >  CONSTM1_RTX.  */
> > > > > >   rtx src = SET_SRC (set);
> > > > > >   rtx replace;
> > > > > >   if (vector_const0 && src == CONST0_RTX (mode))
> > > > > > {
> > > > > >   /* Replace REG with VECTOR_CONST0.  */
> > > > > >   if (SUBREG_P (reg) || mode == zero_mode)
> > > > > > replace = vector_const0;
> > > > > >   else
> > > > > > replace = gen_rtx_SUBREG (mode, vector_const0, 
> > > > > > 0);
> > > > > >   *DF_REF_REAL_LOC (ref) = replace;
> > > > > >   replaced = true;
> > > > > >   zero_replaced = true;
> > > > > > }
> > > > > >
> > > > > > It changed the source to a subreg directly.
> > > > > >
> > > > > > > Also we also need to change ix86_modes_tieable_p to make sure 
> > > > > > > those
> > > > > > > inserted subreg can be handled by LRA and other passes?
> > > > > >
> > > > > > ix86_modes_tieable_p is OK:
> > > > > >
> > > > > >  /* If MODE2 is only appropriate for an SSE register, then tie with
> > > > > >  any other mode acceptable to SSE registers.  */
> > > > > >   if (GET_MODE_SIZE (mode2) == 64
> > > > > >   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > return (GET_MODE_SIZE (mode1) == 64
> > > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > >   if (GET_MODE_SIZE (mode2) == 32
> > > > > >   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > return (GET_MODE_SIZE (mode1) == 32
> > > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > >   if (GET_MODE_SIZE (mode2) == 16
> > > > > >   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > return (GET_MODE_SIZE (mode1) == 16
> > > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > >
> > > > > > It's OK only if the size of mode1 is equal to the size of mode2.
> > > > > But in the testcase, there are different size vectors(32-bytes, 
> > > > > 16-bytes).
> > > > >
> > > > > So it would be better as, for mode2 >= 16 bytes, it can only be put
> > > > > into SSE_REGS(except for TImode, but TImode still can be tied to
> > > > > <=16bytes mode1 which can be put into SSE_REGS) , if mode1 can also be
> > > > > put into SSE_REGS, then mode2 tie with mode1.
> > > > >
> > > > >/* If MODE2 is only appropriate for an SSE register, then tie with
> > > > >   any other mode acceptable to SSE registers.  */
> > > > > -  if (GET_MODE_SIZE (mode2) == 64
> > > > > -  && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > -return (GET_MODE_SIZE (mode1) == 64
> > > > > -   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > -  if (GET_MODE_SIZE (mode2) == 32
> > > > > -  && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > -return (GET_MODE_SIZE (mode1) == 32
> > > > > -   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > -  if (GET_MODE_SIZE (mode2) == 16
> > > > > +  if (GET_MODE_SIZE (mode2) >= 16
> > > > > +  && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2)
> > > > >&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > -return (GET_MODE_SIZE (mode1) == 16
> > > > > -   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > +return ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1);
> > > > >
> > > >
> > > > This caused:
> > > >
> > > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movd
> > > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movq
> > > > FAIL: gcc.target/i386/pr82580.c scan-assembler-not \\mmovzb
> > > >
> > > > since GCC thinks it is cheap to get  QI/HI/SI/DI from TI in XMM.
> > > > I am testing:
> > > >
> > > >  /* If MODE2 is only appropriate for an SSE register, then tie with
> > > >  any other mode acceptable to SSE registers, excluding
> > > > (subreg:QI (reg:TI 99) 0))
> > > > (subreg:HI (reg:TI 99) 0))
> > > > (subreg:SI (reg:TI 99) 0))
> > > > (subreg:DI (reg:TI 99) 0))
> > > >  to avoid unnecessary move from SSE register to integer register.
> > > >*/
> > > >   if (GET_MODE_SIZE (mode2) >= 16
> > > >   && (GET_MODE_SIZE (mode1) == GET_MODE_SIZE (mode2)
>

Re: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 2:51 PM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: H.J. Lu 
> > Sent: Tuesday, April 29, 2025 1:58 PM
> > To: Hongtao Liu 
> > Cc: GCC Patches ; Uros Bizjak
> > ; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Add
> > ix86_expand_unsigned_small_int_cst_argument
> >
> > On Tue, Apr 29, 2025 at 1:54 PM H.J. Lu  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 12:56 PM Hongtao Liu 
> > wrote:
> > > >
> > > > On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu  wrote:
> > > > >
> > > > > When passing 0xff as an unsigned char function argument with the C
> > > > > frontend promotion, expand_normal used to get
> > > > >
> > > > > 
> > > > > constant 255>
> > > > >
> > > > > and returned the rtx value using the sign-extended representation:
> > > > >
> > > > > (const_int 255 [0xff])
> > > > >
> > > > > But after
> > > > >
> > > > > commit a670ebde3995481225ec62b29686ec07a21e5c10
> > > > > Author: H.J. Lu 
> > > > > Date:   Thu Nov 21 07:54:35 2024 +0800
> > > > >
> > > > > Drop targetm.promote_prototypes from C, C++ and Ada frontends
> > > > >
> > > > > expand_normal now gets
> > > > >
> > > > > >  unsigned char> constant 255>
> > > > >
> > > > > and returns
> > > > >
> > > > >  (const_int -1 [0x])
> > > > It sounds like a general issue which should be fixed in
> > > > expand_normal instead of fixing it in the backend?
> > >
> > > It is related to trunc_int_for_mode.  I think this is a backend issue.
> I see,  I guess trunc_int_for_mode itself should be ok since middle end will 
> add extra sign_extend/zero_extend when it needs to be used in another mode.
> But trunc_int_for_mode + intrinsic immediate check caused the problem.
>
> I can't think of a better solution, so LGTM.

I am checking it in.

Thanks.

> >
> > Please discuss the issue at
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117547
> >
> > > > >
> > > > > which doesn't work with the predicates nor the instruction
> > > > > templates which expect the unsigned expanded value.  Extract the
> > > > > unsigned char and short integer constants to return
> > > > >
> > > > > (const_int 255 [0xff])
> > > > >
> > > > > so that the expanded value is always unsigned, without the C
> > > > > frontend promotion.
> > > > >
> > > > > PR target/117547
> > > > > * config/i386/i386-expand.cc
> > > > > > (ix86_expand_unsigned_small_int_cst_argument):
> > > > > > New function.
> > > > > (ix86_expand_args_builtin): Call
> > > > > ix86_expand_unsigned_small_int_cst_argument to expand the argument
> > > > > before calling fixup_modeless_constant.
> > > > > (ix86_expand_round_builtin): Likewise.
> > > > > (ix86_expand_special_args_builtin): Likewise.
> > > > > (ix86_expand_builtin): Likewise.
> > > > >
> > > > > --
> > > > > H.J.
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > H.J.
> >
> >
> >
> > --
> > H.J.



-- 
H.J.
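
As a rough illustration of the arithmetic discussed in this thread (a
standalone sketch, not GCC code): a CONST_INT is kept sign-extended from
its mode, so the 8-bit value 0xff shows up as -1, while the builtin
predicates expect the unsigned value 255.  The helper names
sign_extend_to_width and zero_extend_to_width below are hypothetical and
only mimic that behaviour for a given bit width.

```cpp
#include <cstdint>

// Hypothetical helper (not a GCC function): reinterpret the low WIDTH bits
// of C as a signed value, similar in spirit to how a constant is
// canonicalized for a narrow integer mode.  Assumes 1 <= width <= 64.
std::int64_t sign_extend_to_width(std::int64_t c, unsigned width)
{
  const std::uint64_t mask
    = width >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
  const std::uint64_t sign = std::uint64_t{1} << (width - 1);
  const std::uint64_t low = static_cast<std::uint64_t>(c) & mask;
  // (low ^ sign) - sign copies the sign bit into all upper bits.
  return static_cast<std::int64_t>((low ^ sign) - sign);
}

// Hypothetical helper (not a GCC function): keep only the low WIDTH bits,
// i.e. the value an unsigned char/short argument is expected to have.
// Assumes 1 <= width <= 64.
std::int64_t zero_extend_to_width(std::int64_t c, unsigned width)
{
  const std::uint64_t mask
    = width >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(c) & mask);
}
```

With these helpers, sign-extending 255 from 8 bits gives -1, and
zero-extending -1 back to 8 bits gives 255, which is the form the
immediate checks in the builtin expanders expect.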

RE: [PATCH v3] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-29 Thread Liu, Hongtao



> -Original Message-
> From: H.J. Lu 
> Sent: Tuesday, April 29, 2025 2:59 PM
> To: Hongtao Liu 
> Cc: GCC Patches ; Liu, Hongtao
> ; Uros Bizjak 
> Subject: [PATCH v3] x86: Add a pass to remove redundant all 0s/1s vector
> load
> 
> On Tue, Apr 29, 2025 at 11:27 AM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 10:08 AM Hongtao Liu 
> wrote:
> > >
> > > On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu  wrote:
> > > >
> > > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu  wrote:
> > > > >
> > > >
> > > > > > > This is what my patch does:
> > > > > > But it iterates through vector_insns, using a def-ref chain to
> > > > > > find those insns. I think we can just record those single_set
> > > > > > with src as const_m1/zero, and replace src for them.
> > > > >
> > > > > Will fix it.
> > > >
> > > > Fixed in the v2 patch.
> > > >
> > > > > > >
> > > > > > >  /* Check the single definition of CONST0_RTX and 
> > > > > > > integer
> > > > > > >  CONSTM1_RTX.  */
> > > > > > >   rtx src = SET_SRC (set);
> > > > > > >   rtx replace;
> > > > > > >   if (vector_const0 && src == CONST0_RTX (mode))
> > > > > > > {
> > > > > > >   /* Replace REG with VECTOR_CONST0.  */
> > > > > > >   if (SUBREG_P (reg) || mode == zero_mode)
> > > > > > > replace = vector_const0;
> > > > > > >   else
> > > > > > > replace = gen_rtx_SUBREG (mode, 
> > > > > > > vector_const0, 0);
> > > > > > >   *DF_REF_REAL_LOC (ref) = replace;
> > > > > > >   replaced = true;
> > > > > > >   zero_replaced = true;
> > > > > > > }
> > > > > > >
> > > > > > > It changed the source to a subreg directly.
> > > > > > >
> > > > > > > > Also we also need to change ix86_modes_tieable_p to make
> > > > > > > > sure those inserted subreg can be handled by LRA and other
> passes?
> > > > > > >
> > > > > > > ix86_modes_tieable_p is OK:
> > > > > > >
> > > > > > >  /* If MODE2 is only appropriate for an SSE register, then tie 
> > > > > > > with
> > > > > > >  any other mode acceptable to SSE registers.  */
> > > > > > >   if (GET_MODE_SIZE (mode2) == 64
> > > > > > >   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > > return (GET_MODE_SIZE (mode1) == 64
> > > > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > > >   if (GET_MODE_SIZE (mode2) == 32
> > > > > > >   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > > return (GET_MODE_SIZE (mode1) == 32
> > > > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > > >   if (GET_MODE_SIZE (mode2) == 16
> > > > > > >   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > > return (GET_MODE_SIZE (mode1) == 16
> > > > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG,
> > > > > > > mode1));
> > > > > > >
> > > > > > > It's OK only if the size of mode1 equals the size of mode2.
> > > > > > > But in the testcase, there are vectors of different sizes (32 bytes,
> > > > > > > 16 bytes).
> > > > > > >
> > > > > > > So it would be better as follows: a mode2 of >= 16 bytes can only
> > > > > > > be put into SSE_REGS (except for TImode, but TImode can still be
> > > > > > > tied to a <= 16-byte mode1 that can be put into SSE_REGS), so if
> > > > > > > mode1 can also be put into SSE_REGS, then mode2 ties with mode1.
> > > > > >
> > > > > >/* If MODE2 is only appropriate for an SSE register, then tie 
> > > > > > with
> > > > > >   any other mode acceptable to SSE registers.  */
> > > > > > -  if (GET_MODE_SIZE (mode2) == 64
> > > > > > -  && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > -return (GET_MODE_SIZE (mode1) == 64
> > > > > > -   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > > -  if (GET_MODE_SIZE (mode2) == 32
> > > > > > -  && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > -return (GET_MODE_SIZE (mode1) == 32
> > > > > > -   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > > -  if (GET_MODE_SIZE (mode2) == 16
> > > > > > +  if (GET_MODE_SIZE (mode2) >= 16
> > > > > > +  && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2)
> > > > > >&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > > -return (GET_MODE_SIZE (mode1) == 16
> > > > > > -   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > > +return ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1);
> > > > > >
> > > > >
> > > > > This caused:
> > > > >
> > > > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movd
> > > > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movq
> > > > > FAIL: gcc.target/i386/pr82580.c scan-assembler-not \\mmovzb
> > > > >
> > > > > since GCC thinks it is cheap to get  QI/HI/SI/DI from TI in XMM.
> > > > > I am testing:
> > > > >
> > > > >  /* If MODE2 is only appropriate for an SSE regist

[PATCH] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-29 Thread Jerry Zhang Jian

The Zve32x extension depends on the Zicsr extension.
Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add Zve32x depends on Zicsr

gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-19.c: set the march to rv64i_zve32x
  instead of rv64gc_zve32x to avoid Zicsr implied by g

Signed-off-by: Jerry Zhang Jian 
---
 gcc/common/config/riscv/riscv-common.cc|  1 +
 gcc/testsuite/gcc.target/riscv/predef-19.c | 22 +++---
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 15df22d5377..145a0f2bd95 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zve64f", "f"},
   {"zve64d", "d"},
 
+  {"zve32x", "zicsr"},
   {"zve32x", "zvl32b"},
   {"zve32f", "zve32x"},
   {"zve32f", "zvl32b"},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c b/gcc/testsuite/gcc.target/riscv/predef-19.c
index 2b90702192b..b29e60f9b99 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-19.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
+/* { dg-options "-O2 -march=rv64i_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
 
 int main () {
 
@@ -15,28 +15,12 @@ int main () {
 #error "__riscv_i"
 #endif
 
-#if !defined(__riscv_c)
-#error "__riscv_c"
-#endif
-
 #if defined(__riscv_e)
 #error "__riscv_e"
 #endif
 
-#if !defined(__riscv_a)
-#error "__riscv_a"
-#endif
-
-#if !defined(__riscv_m)
-#error "__riscv_m"
-#endif
-
-#if !defined(__riscv_f)
-#error "__riscv_f"
-#endif
-
-#if !defined(__riscv_d)
-#error "__riscv_d"
+#if !defined(__riscv_zicsr)
+#error "__riscv_zicsr"
 #endif
 
 #if defined(__riscv_v)
-- 
2.49.0
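
To make the effect of the new table entry concrete, here is a minimal
standalone sketch of how an "X implies Y" table can be expanded to a
fixed point.  It is not the riscv-common.cc implementation: the table
below is trimmed to the zve32* entries visible in the hunk above, and
the names implied_info and expand_implied are hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down implication table: {extension, implied}.
// Only the zve32* rows visible in the patch hunk are listed.
static const std::vector<std::pair<std::string, std::string>> implied_info = {
  {"zve32x", "zicsr"},   // the entry added by the patch
  {"zve32x", "zvl32b"},
  {"zve32f", "zve32x"},
  {"zve32f", "zvl32b"},
};

// Repeatedly add implied extensions until nothing changes, so that
// chains such as zve32f -> zve32x -> zicsr are followed.
std::set<std::string> expand_implied(std::set<std::string> exts)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (const auto &rule : implied_info)
	if (exts.count(rule.first) && exts.insert(rule.second).second)
	  changed = true;
    }
  return exts;
}
```

Under these assumptions, starting from {"i", "zve32x"} the result
contains "zicsr", which is what the updated predef-19.c checks through
__riscv_zicsr when using -march=rv64i_zve32x.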

Re: [PATCH] libstdc++: Use constexpr-if in std::function for C++11 and C++14

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 11:01 AM Jonathan Wakely  wrote:

> This allows removing the _Target_handler class template, because it's no
> longer needed to prevent instantiating invalid specializations of
> _Function_handler.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/std_function.h (_Target_handler): Remove.
> (function::target): Use constexpr-if for C++11 and
> C++14, with diagnostic pragmas to suppress warnings.
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
>  libstdc++-v3/include/bits/std_function.h | 31 +++-
>  1 file changed, 9 insertions(+), 22 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/std_function.h
> b/libstdc++-v3/include/bits/std_function.h
> index 1bf8b9ad6e8..3bfbe824026 100644
> --- a/libstdc++-v3/include/bits/std_function.h
> +++ b/libstdc++-v3/include/bits/std_function.h
> @@ -135,13 +135,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> static _Functor*
> _M_get_pointer(const _Any_data& __source) noexcept
> {
> - if _GLIBCXX17_CONSTEXPR (__stored_locally)
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
> + if constexpr (__stored_locally)
> {
>   const _Functor& __f = __source._M_access<_Functor>();
>   return const_cast<_Functor*>(std::__addressof(__f));
> }
>   else // have stored a pointer
> return __source._M_access<_Functor*>();
> +#pragma GCC diagnostic pop
> }
>
>private:
> @@ -312,21 +315,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return false; }
>  };
>
> -  // Avoids instantiating ill-formed specializations of _Function_handler
> -  // in std::function<_Signature>::target<_Functor>().
> -  // e.g. _Function_handler and _Function_handler
> -  // would be ill-formed.
> -  template -  bool __valid = is_object<_Functor>::value>
> -struct _Target_handler
> -: _Function_handler<_Signature, typename remove_cv<_Functor>::type>
> -{ };
> -
> -  template
> -struct _Target_handler<_Signature, _Functor, false>
> -: _Function_handler
> -{ };
> -
>/**
> *  @brief Polymorphic function wrapper.
> *  @ingroup functors
> @@ -644,13 +632,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> const _Functor*
> target() const noexcept
> {
> - if _GLIBCXX17_CONSTEXPR (is_object<_Functor>::value)
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
> + if constexpr (is_object<_Functor>::value)
> {
> - // For C++11 and C++14 if-constexpr is not used above, so
> - // _Target_handler avoids ill-formed _Function_handler types.
> - using _Handler = _Target_handler<_Res(_ArgTypes...), _Functor>;
> -
> - if (_M_manager == &_Handler::_M_manager
> + if (_M_manager == &_Handler<_Functor>::_M_manager
>  #if __cpp_rtti
>   || (_M_manager && typeid(_Functor) == target_type())
>  #endif
> @@ -661,6 +647,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   return __ptr._M_access();
> }
> }
> +#pragma GCC diagnostic pop
>   return nullptr;
> }
>/// @}
> --
> 2.49.0
>
>
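
For readers less familiar with the technique, the following standalone
sketch shows the pattern the patch relies on: using if constexpr
(accepted by GCC as an extension before C++17, with the warning silenced
by the pragmas) so that the discarded branch is never instantiated.  The
function object_size_or_zero is a hypothetical example and is not part
of libstdc++.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical example: return sizeof(T) for object types and 0 for
// anything else (references, void, function types).
template<typename T>
std::size_t object_size_or_zero()
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
  if constexpr (std::is_object<T>::value)
    {
      // For T = int& the type T* would be ill-formed, but this branch
      // is discarded and never instantiated for non-object types.
      T* p = nullptr;
      (void) p;
      return sizeof(T);
    }
  else
    return 0;
#pragma GCC diagnostic pop
}
```

With a plain if, the first branch would have to be valid for every T,
which is the kind of problem a helper template such as _Target_handler
had to work around.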

Re: [PATCH] libstdc++: Fix allocator propagation for rvalue+rvalue string concatenation

2025-04-29 Thread Tomasz Kaminski

On Tue, Apr 29, 2025 at 10:49 AM Jonathan Wakely  wrote:

> I made a last-minute change to Nina's r10-200-gf4e678ef74b272
> implementation of P1165R1 (consistent allocator propagation for
> operator+ on strings), so that the rvalue+rvalue case assumes that COW
> strings do not support stateful allocators. I don't think that was true
> when the change went in, and certainly isn't true now. COW strings don't
> support allocator propagation on assignment and swap, but they do
> support non-equal stateful allocators, which are correctly propagated on
> move construction.
>
> This removes the preprocessor conditional in the rvalue+rvalue overload
> so that COW strings are handled equivalently. Also use constexpr-if
> unconditionally, disabling diagnostics with pragmas.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/basic_string.h (operator+(string&&, string&&)):
> Do not assume that COW strings have equal allocators. Use
> constexpr-if unconditionally.
> *
> testsuite/21_strings/basic_string/allocator/char/operator_plus.cc:
> Remove { dg-require-effective-target cxx11_abi }.
> *
> testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc:
> Likewise.
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
>  libstdc++-v3/include/bits/basic_string.h   | 10 ++
>  .../basic_string/allocator/char/operator_plus.cc   |  2 --
>  .../basic_string/allocator/wchar_t/operator_plus.cc|  2 --
>  3 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/basic_string.h
> b/libstdc++-v3/include/bits/basic_string.h
> index c90bd099b63..a087e637805 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -3938,21 +3938,23 @@ _GLIBCXX_END_NAMESPACE_CXX11
>  operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
>   basic_string<_CharT, _Traits, _Alloc>&& __rhs)
>  {
> -#if _GLIBCXX_USE_CXX11_ABI
> -  using _Alloc_traits = allocator_traits<_Alloc>;
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
> +  // Return value must use __lhs.get_allocator(), but if __rhs has equal
> +  // allocator then we can choose which parameter to modify in-place.
>bool __use_rhs = false;
> -  if _GLIBCXX17_CONSTEXPR (typename _Alloc_traits::is_always_equal{})
> +  if constexpr (allocator_traits<_Alloc>::is_always_equal::value)
> __use_rhs = true;
>else if (__lhs.get_allocator() == __rhs.get_allocator())
> __use_rhs = true;
>if (__use_rhs)
> -#endif
> {
>   const auto __size = __lhs.size() + __rhs.size();
>   if (__size > __lhs.capacity() && __size <= __rhs.capacity())
> return std::move(__rhs.insert(0, __lhs));
> }
>return std::move(__lhs.append(__rhs));
> +#pragma GCC diagnostic pop
>  }
>
>template
> diff --git
> a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
> b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
> index 571f8535176..92e05690b19 100644
> ---
> a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
> +++
> b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
> @@ -17,8 +17,6 @@
>  // .
>
>  // { dg-do run { target c++11 } }
> -// COW strings don't support C++11 allocator propagation:
> -// { dg-require-effective-target cxx11_abi }
>
>  #include 
>  #include 
> diff --git
> a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
> b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
> index 0da684360ab..b75b26ae85c 100644
> ---
> a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
> +++
> b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
> @@ -17,8 +17,6 @@
>  // .
>
>  // { dg-do run { target c++11 } }
> -// COW strings don't support C++11 allocator propagation:
> -// { dg-require-effective-target cxx11_abi }
>
>  #include 
>  #include 
> --
> 2.49.0
>
>
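
The buffer-selection logic in the hunk above can be tried outside the
library with a small sketch written against the public std::basic_string
interface.  The name concat_rvalues is hypothetical, and the code assumes
C++17 for allocator_traits<A>::is_always_equal; it mirrors the patched
logic but is not the libstdc++ implementation.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for operator+(string&&, string&&).  The result
// must use lhs's allocator, so rhs's buffer may only be reused when the
// two allocators are known to compare equal.
template<typename CharT, typename Traits, typename Alloc>
std::basic_string<CharT, Traits, Alloc>
concat_rvalues(std::basic_string<CharT, Traits, Alloc>&& lhs,
	       std::basic_string<CharT, Traits, Alloc>&& rhs)
{
  bool use_rhs = false;
  if (std::allocator_traits<Alloc>::is_always_equal::value)
    use_rhs = true;
  else if (lhs.get_allocator() == rhs.get_allocator())
    use_rhs = true;

  if (use_rhs)
    {
      const auto size = lhs.size() + rhs.size();
      // Prefer rhs's storage only if lhs would need to reallocate and
      // rhs already has enough room for the whole result.
      if (size > lhs.capacity() && size <= rhs.capacity())
	return std::move(rhs.insert(0, lhs));
    }
  return std::move(lhs.append(rhs));
}
```

Either path yields the same characters; the only difference is which
operand's buffer ends up owning the result, which is why the allocator
comparison matters for stateful allocators.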

Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

2025-04-29 Thread Richard Biener

On Tue, 29 Apr 2025, Richard Sandiford wrote:

> Pengfei Li  writes:
> > This patch implements the folding of a vector addition followed by a
> > logical shift right by 1 (add + lsr #1) on AArch64 into an unsigned
> > halving add, allowing GCC to emit NEON or SVE2 UHADD instructions.
> >
> > For example, this patch helps improve the codegen from:
> > add v0.4s, v0.4s, v31.4s
> > ushr    v0.4s, v0.4s, 1
> > to:
> > uhadd   v0.4s, v0.4s, v31.4s
> >
> > For NEON, vector operations are represented using generic mid-end
> > operations, so new folding rules are added to match.pd. For SVE2, the
> > operations are represented using built-in GIMPLE calls, so this
> > optimization is implemented via gimple_folder.
> >
> > To ensure correctness, additional checks are introduced to guarantee
> > that the operands to UHADD are vectors in which each element has its top
> > bit cleared.
> >
> > This patch has been bootstrapped and regression tested on
> > x86_64-linux-gnu and aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve-builtins-base.cc (find_sve_builtin_call):
> > New helper function for finding and checking a GIMPLE call.
> > (is_undef): Rewrite with find_sve_builtin_call.
> > (class svlsr_impl): Implement the folding for SVE2.
> > (FUNCTION): Check and fold the pattern.
> > * match.pd: Add new rules to implement the folding for NEON.
> > * tree.cc (top_bit_zero_vector_p): Add a new utility function for
> > vector top bit zero check.
> > * tree.h (top_bit_zero_vector_p): Add a function declaration.
> 
> The target-independent changes are out of my comfort area.
> Cc:ing Richi for those.
> 
> But rather than top_bit_zero_vector_p, how about a more general
> nonzero_element_bits?  I've wanted something similar in the past.

IMO a general nonzero_element_bits, either as a zero bits mask
of vector width or ANDed/IORed across elements should be
provided by {set/get}_nonzero_bits/ranger being extended to
cover [integer] vectors.

> I don't think we can use an unbounded recursive walk, since that
> would become quadratic if we ever used it when optimising one
> AND in a chain of ANDs.  (And using this function for ANDs
> seems plausible.)  Maybe we should be handling the information
> in a similar way to Ranger.

Indeed, the recursion isn't good.  I'd be fine adding a non-recursive

(match top_bit_zero_vector_p ...)

> Rather than handle the built-in case entirely in target code, how about
> having a target hook into nonzero_element_bits (or whatever replaces it)
> for machine-dependent builtins?

I guess that's reasonable once we can make use of it.  We should
have a generic function handling gimple *, like we have
gimple_stmt_nonnegative_warnv_p / gimple_stmt_integer_valued_real_p.
Those also provide a recipe to limit recursion in case you really
need that for the case in question.

But for nonzero bits wiring this into ranger looks better.  The
semantic of a common range/mask for all elements looks "easy"
to implement, since you can re-use irange/frange then and not
need a new "vector range".

Richard.
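
For reference, the reason the fold needs a "top bit is zero" guarantee
can be shown with a scalar model of one 32-bit lane.  This is only a
sketch: the function names are hypothetical, and UHADD is modelled by
doing the addition in a wider type.

```cpp
#include <cstdint>

// Model of one UHADD lane: the sum is formed without overflow and then
// halved.
std::uint32_t uhadd_lane(std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) + b) >> 1);
}

// Model of the original sequence: element-width add (which may wrap)
// followed by a logical shift right by 1.
std::uint32_t add_then_lsr_lane(std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t>(a + b) >> 1;
}

// The condition the patch checks per element: the most significant bit
// is known to be clear.
bool top_bit_clear(std::uint32_t x)
{
  return (x >> 31) == 0;
}
```

When both inputs have the top bit clear, the element-width add cannot
wrap and the two functions agree; with a = b = 0x80000000 the add wraps
to 0 while the halving add yields 0x80000000, so the fold would be
wrong without the check.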

> 
> Thanks,
> Richard
> 
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/acle/uhadd_1.c: New test.
> > * gcc.target/aarch64/sve2/acle/general/uhadd_1.c: New test.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc  | 101 --
> >  gcc/match.pd  |   7 ++
> >  .../gcc.target/aarch64/acle/uhadd_1.c |  34 ++
> >  .../aarch64/sve2/acle/general/uhadd_1.c   |  30 ++
> >  gcc/tree.cc   |  30 ++
> >  gcc/tree.h|   4 +
> >  6 files changed, 199 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/sve2/acle/general/uhadd_1.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index b4396837c24..ce6da82bf81 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -43,6 +43,7 @@
> >  #include "aarch64-sve-builtins.h"
> >  #include "aarch64-sve-builtins-shapes.h"
> >  #include "aarch64-sve-builtins-base.h"
> > +#include "aarch64-sve-builtins-sve2.h"
> >  #include "aarch64-sve-builtins-functions.h"
> >  #include "aarch64-builtins.h"
> >  #include "ssa.h"
> > @@ -53,6 +54,23 @@ using namespace aarch64_sve;
> >  
> >  namespace {
> >  
> > +/* Return gcall* if VAL is an SSA_NAME defined by the given SVE intrinsics 
> > call.
> > +   Otherwise return NULL.  */
> > +static gcall*
> > +find_sve_builtin_call (tree val, const function_base *func)
> > +{
> > +  if (TREE_CODE (val) == SSA_NAME)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (val);
> > +  if (gcall *call = dyn_cast (def))
> > +   if (tree fndecl = gimple_call_fndecl

[PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-04-29 Thread H.J. Lu

Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy
propagation pass can eliminate multiple __tls_get_addr calls.

gcc/

PR target/81501
* config/i386/i386-protos.h (ix86_split_tls_local_dynamic_base_64):
New.
* config/i386/i386.cc (ix86_split_tls_local_dynamic_base_64): New.
(legitimize_tls_address): Don't emit the 64-bit UNSPEC_TLS_LD_BASE
as a call.
* config/i386/i386.md (*tls_local_dynamic_base_64_): Renamed
to ...
(@tls_local_dynamic_base_call_64_): This.  Replace
(match_operand 2) with (const_int 0).
(@tls_local_dynamic_base_64_): Change call to unspec.
(*tls_local_dynamic_base_64_): New.

gcc/testsuite/

PR target/81501
* gcc.target/i386/pr81501-1.c: New test.

OK for master?

Thanks.

-- 
H.J.
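
A minimal sketch of the kind of source this affects (the contents of
pr81501-1.c are not shown in this message, so the code and options here
are assumptions): two TU-local thread-local variables accessed in the
same function.  When built as position-independent code with the
local-dynamic TLS model, for example with -O2 -fpic
-ftls-model=local-dynamic, both accesses need the same module TLS base,
so a single __tls_get_addr call per function would be sufficient.

```cpp
// Hypothetical example: two thread-local variables that share one TLS
// module base under the local-dynamic model.
static thread_local int counter_a;
static thread_local int counter_b;

// Both accesses can be addressed relative to the same LD base.
void bump(int da, int db)
{
  counter_a += da;
  counter_b += db;
}

int total()
{
  return counter_a + counter_b;
}
```

Inspecting the generated assembly for bump and total is one way to see
whether the base lookup is shared or repeated.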
From d154b3bf2fb86c82a6291f1fae45fbbe0d74f4e4 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 19 Aug 2022 11:50:41 -0700
Subject: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy
propagation pass can eliminate multiple __tls_get_addr calls.

gcc/

	PR target/81501
	* config/i386/i386-protos.h (ix86_split_tls_local_dynamic_base_64):
	New.
	* config/i386/i386.cc (ix86_split_tls_local_dynamic_base_64): New.
	(legitimize_tls_address): Don't emit the 64-bit UNSPEC_TLS_LD_BASE
	as a call.
	* config/i386/i386.md (*tls_local_dynamic_base_64_): Renamed
	to ...
	(@tls_local_dynamic_base_call_64_): This.  Replace
	(match_operand 2) with (const_int 0).
	(@tls_local_dynamic_base_64_): Change call to unspec.
	(*tls_local_dynamic_base_64_): New.

gcc/testsuite/

	PR target/81501
	* gcc.target/i386/pr81501-1.c: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.cc   | 42 +--
 gcc/config/i386/i386.md   | 30 +++-
 gcc/testsuite/gcc.target/i386/pr81501-1.c | 17 +
 4 files changed, 63 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-1.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c59b5a67e3a..a8850cd7311 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -287,6 +287,7 @@ extern rtx ix86_tls_module_base (void);
 extern bool ix86_gpr_tls_address_pattern_p (rtx);
 extern bool ix86_tls_address_pattern_p (rtx);
 extern rtx ix86_rewrite_tls_address (rtx);
+extern void ix86_split_tls_local_dynamic_base_64 (rtx[]);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ddefc0f88d9..9ad11b122de 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -12330,6 +12330,27 @@ ix86_tls_module_base (void)
   return ix86_tls_module_base_symbol;
 }
 
+/* Split to CALL INSN to properly handle scratch registers.  */
+
+void
+ix86_split_tls_local_dynamic_base_64 (rtx operands[])
+{
+  rtx rax = gen_rtx_REG (Pmode, AX_REG);
+  rtx_insn *call_insn
+  = emit_call_insn (gen_tls_local_dynamic_base_call_64 (Pmode, rax,
+			operands[1]));
+  /* Indicate that this function can't jump to non-local gotos.  */
+  make_reg_eh_region_note_nothrow_nononlocal (call_insn);
+  RTL_CONST_CALL_P (call_insn) = 1;
+
+  /* Attach a unique REG_EQUAL, to allow the RTL optimizers to share
+ the LD_BASE result with other LD model accesses.  */
+  rtx eqv = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx),
+			UNSPEC_TLS_LD_BASE);
+  rtx_insn *set_insn = emit_move_insn (operands[0], rax);
+  set_unique_reg_note (set_insn, REG_EQUAL, eqv);
+}
+
 /* A subroutine of ix86_legitimize_address and ix86_expand_move.  FOR_MOV is
false if we expect this to be used for a memory address and true if
we expect to load the address into a register.  */
@@ -12442,25 +12463,8 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 
 	  base = gen_reg_rtx (Pmode);
 	  if (TARGET_64BIT)
-	{
-	  rtx rax = gen_rtx_REG (Pmode, AX_REG);
-	  rtx_insn *insns;
-	  rtx eqv;
-
-	  start_sequence ();
-	  emit_call_insn
-		(gen_tls_local_dynamic_base_64 (Pmode, rax, caddr));
-	  insns = get_insns ();
-	  end_sequence ();
-
-	  /* Attach a unique REG_EQUAL, to allow the RTL optimizers to
-		 share the LD_BASE result with other LD model accesses.  */
-	  eqv = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx),
-UNSPEC_TLS_LD_BASE);
-
-	  RTL_CONST_CALL_P (insns) = 1;
-	  emit_libcall_block (insns, base, rax, eqv);
-	}
+	emit_insn (gen_tls_local_dynamic_base_64 (Pmode, base,
+		  caddr));
 	  else
 	emit_insn (gen_tls_local_dynamic_base_32 (base, pic, caddr));
 	}
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e170da3b0e6..cf54a58ef96 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23102,11 +23102,11 @@ (define_expand "tls_local_dynamic_base_32"
   ""
   "ix86_tls_descriptor_calls

[PATCH] c++/modules: Catch exposures of TU-local values through inline references [PR119996]

2025-04-29 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu (so far just
modules.exp), OK for trunk and 15 if full regtest succeeds?

-- >8 --

In r15-9136-g0210bedf481a9f we started erroring for inline variables
that exposed TU-local entities in their definition, as such variables
would need to have their definitions emitted in importers but would not
know about the TU-local entities they referenced.

A case we missed was potentially-constant references, which disable
streaming of their definitions in make_dependency so as to comply with
[expr.const] p9.2.  This meant that we didn't see the definition
referencing a TU-local entity, leading to nonsensical results.

PR c++/119551
PR c++/119996

gcc/cp/ChangeLog:

* module.cc (depset::hash::make_dependency): Also mark inline
variables referencing TU-local values as exposures here.
(depset::hash::finalize_dependencies): Add error message for
inline variables.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-13.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   | 27 +-
 gcc/testsuite/g++.dg/modules/internal-13.C | 33 ++
 2 files changed, 53 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-13.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index a2e0d6d2571..7e3b24e2e42 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14062,9 +14062,10 @@ depset::hash::make_dependency (tree decl, entity_kind ek)
 streaming the definition in such cases.  */
  dep->clear_flag_bit ();
 
- if (DECL_DECLARED_CONSTEXPR_P (decl))
-   /* Also, a constexpr variable initialized to a TU-local
-  value is an exposure.  */
+ if (DECL_DECLARED_CONSTEXPR_P (decl)
+ || DECL_INLINE_VAR_P (decl))
+   /* A constexpr variable initialized to a TU-local value,
+  or an inline value (PR c++/119996), is an exposure.  */
dep->set_flag_bit ();
}
}
@@ -15025,12 +15026,24 @@ depset::hash::finalize_dependencies ()
break;
  }
 
- if (!explained && VAR_P (decl) && DECL_DECLARED_CONSTEXPR_P (decl))
+ if (!explained
+ && VAR_P (decl)
+ && (DECL_DECLARED_CONSTEXPR_P (decl)
+ || DECL_INLINE_VAR_P (decl)))
{
  auto_diagnostic_group d;
- error_at (DECL_SOURCE_LOCATION (decl),
-   "%qD is declared % and is initialized to "
-   "a TU-local value", decl);
+ if (DECL_DECLARED_CONSTEXPR_P (decl))
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD is declared % and is initialized to "
+ "a TU-local value", decl);
+ else
+   {
+ /* This can only occur with references.  */
+ gcc_checking_assert (TYPE_REF_P (TREE_TYPE (decl)));
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qD is a reference declared % and is "
+   "constant-initialized to a TU-local value", decl);
+   }
  bool informed = is_tu_local_value (decl, DECL_INITIAL (decl),
 /*explain=*/true);
  gcc_checking_assert (informed);
diff --git a/gcc/testsuite/g++.dg/modules/internal-13.C b/gcc/testsuite/g++.dg/modules/internal-13.C
new file mode 100644
index 000..ce1454e17bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-13.C
@@ -0,0 +1,33 @@
+// PR c++/119996
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi !M }
+// Similar to internal-11.C, but for potentially-constant variables.
+
+export module M;
+
+static int tu_local = 5;
+static int& foo() { return tu_local; }
+
+// For implementation reasons, we adjust [basic.link] p14.2 to restrict ignored
+// exposures to non-inline variables, since for inline variables without
+// dynamic initialisation we need to emit their initialiser for importer use.
+
+int& a = tu_local;  // OK
+inline int& b = tu_local;  // { dg-error "initialized to a TU-local value" }
+inline auto& bf = foo;  // { dg-error "initialized to a TU-local value" }
+
+// But dynamic initialisers are fine, importers will just treat them as external.
+inline int& c = foo();  // OK
+
+// For consistency, we follow the same rules with templates, noting that
+// we still need to emit definitions with dynamic initializers so we error.
+template  int& d = tu_local;  // OK
+template  inline int& e = tu_local;  // { dg-error "exposes TU-local entity" }
+template  inline int& f = foo();  // { dg-error "exposes TU-local entity" }
+template  inline auto& ff = foo;  // { dg-error "exposes TU-local entity" }
+
+// Note that no

Re: [PATCH] target.def: Remove TARGET_PROMOTE_FUNCTION_RETURN reference

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 3:49 AM H.J. Lu  wrote:
>
> Since TARGET_PROMOTE_FUNCTION_RETURN is no longer used, remove its
> reference from target.def.
>
> PR target/119985
> * target.def: Remove TARGET_PROMOTE_FUNCTION_RETURN reference.
> * doc/tm.texi: Regenerated.
>
> OK for master?

OK.

Richard.

> Thanks.
>
> --
> H.J.

[PATCH V3] [autofdo] Annotate empty bb with all debug_stmt with location of phi in the single_succ.

2025-04-29 Thread liuhongt

From: "hongtao.liu" 

> another thing, you can save the walk over PHI args by using
>
> gimple_phi_arg_location (phi, tmp_e->dest_idx);
>
Changed, use gimple_phi_arg_location_from_edge (phi, tmp_e);


A BB that is empty apart from debug stmts is currently ignored by
afdo_set_bb_count, but its count can instead be taken from the PHI
arguments in its successors that arrive over the edge from this BB.
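
The idea can be sketched outside of GCC with a toy model. Everything below (ToyPhi, ToyEdge, the Location and Count aliases, count_from_successor_phis) is a hypothetical simplification for illustration and not the GCC data structures or the patch itself; it only mirrors the "take the maximum nonzero count among PHI arguments on the outgoing edges" logic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model only: these types are hypothetical simplifications, not GCC's
// basic_block, edge, gphi or location_t.
using Location = unsigned;
using Count = std::uint64_t;

struct ToyPhi
{
  // arg_locations[i] is the source location of the argument that arrives
  // over incoming edge i of the block owning this PHI.
  std::vector<Location> arg_locations;
};

struct ToyEdge
{
  const std::vector<ToyPhi> *dest_phis; // PHIs at the start of the destination
  unsigned dest_idx;                    // position of this edge in the destination
};

// Return true and raise MAX_COUNT if any PHI argument fed by one of SUCCS
// has a nonzero count recorded in PROFILE.
inline bool
count_from_successor_phis (const std::vector<ToyEdge> &succs,
                           const std::map<Location, Count> &profile,
                           Count &max_count)
{
  bool annotated = false;
  for (const ToyEdge &e : succs)
    for (const ToyPhi &phi : *e.dest_phis)
      {
        auto it = profile.find (phi.arg_locations[e.dest_idx]);
        if (it != profile.end () && it->second != 0)
          {
            if (it->second > max_count)
              max_count = it->second;
            annotated = true;
          }
      }
  return annotated;
}

// Small driver: the empty block reaches its successor over edge index 1,
// so only locations 20 and 21 are consulted.
inline Count
toy_example ()
{
  std::vector<ToyPhi> phis = { ToyPhi{ { 10, 20 } }, ToyPhi{ { 11, 21 } } };
  std::vector<ToyEdge> succs = { ToyEdge{ &phis, 1 } };
  std::map<Location, Count> profile = { { 10, 100 }, { 20, 5 }, { 21, 9 } };
  Count max_count = 0;
  bool ok = count_from_successor_phis (succs, profile, max_count);
  return ok ? max_count : 0;
}
```

In this model the count recorded for location 10 is not used, because it belongs to a different incoming edge of the successor.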

gcc/ChangeLog:

PR gcov-profile/118581
* auto-profile.cc (autofdo_source_profile::get_count_info):
Overload the function with parameter gimple location instead
of stmt.
(afdo_set_bb_count): For a !has_annotated BB, check the PHIs of its
successors corresponding to the block and use those counts.
---
 gcc/auto-profile.cc | 46 ++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 2d2d4a428f2..3ce2acbe86d 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -303,6 +303,10 @@ public:
  in INFO and return true; otherwise return false.  */
   bool get_count_info (gimple *stmt, count_info *info) const;
 
+  /* Find count_info for a given gimple location GIMPLE_LOC. If found,
+ store the count_info in INFO and return true; otherwise return false.  */
+  bool get_count_info (location_t gimple_loc, count_info *info) const;
+
   /* Find total count of the callee of EDGE.  */
   gcov_type get_callsite_total_count (struct cgraph_edge *edge) const;
 
@@ -724,11 +728,18 @@ autofdo_source_profile::get_function_instance_by_decl 
(tree decl) const
 bool
 autofdo_source_profile::get_count_info (gimple *stmt, count_info *info) const
 {
-  if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus)
+  return get_count_info (gimple_location (stmt), info);
+}
+
+bool
+autofdo_source_profile::get_count_info (location_t gimple_loc,
+   count_info *info) const
+{
+  if (LOCATION_LOCUS (gimple_loc) == cfun->function_end_locus)
 return false;
 
   inline_stack stack;
-  get_inline_stack (gimple_location (stmt), &stack);
+  get_inline_stack (gimple_loc, &stack);
   if (stack.length () == 0)
 return false;
   function_instance *s = get_function_instance_by_inline_stack (stack);
@@ -1132,7 +1143,36 @@ afdo_set_bb_count (basic_block bb, const stmt_set 
&promoted)
 }
 
   if (!has_annotated)
-return false;
+{
+  /* For an empty BB with all debug stmt which assign a value with
+constant, check successors PHIs corresponding to the block and
+use those counts.  */
+  edge tmp_e;
+  edge_iterator tmp_ei;
+  FOR_EACH_EDGE (tmp_e, tmp_ei, bb->succs)
+   {
+ basic_block bb_succ = tmp_e->dest;
+ for (gphi_iterator gpi = gsi_start_phis (bb_succ);
+  !gsi_end_p (gpi);
+  gsi_next (&gpi))
+   {
+ gphi *phi = gpi.phi ();
+ location_t phi_loc
+   = gimple_phi_arg_location_from_edge (phi, tmp_e);
+ count_info info;
+ if (afdo_source_profile->get_count_info (phi_loc, &info)
+ && info.count != 0)
+   {
+ if (info.count > max_count)
+   max_count = info.count;
+ has_annotated = true;
+   }
+   }
+   }
+
+  if (!has_annotated)
+   return false;
+}
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 afdo_source_profile->mark_annotated (gimple_location (gsi_stmt (gsi)));
-- 
2.34.1

[PATCH][15.2] nr2.0: late: Correctly initialize funny_error member

2025-04-29 Thread arthur . cohen

From: Arthur Cohen 

Hi everyone,

We noticed inconsistent errors when running name-resolution 2.0 on
certain files, where an invalid error was triggered and the message was
from the `funny_ice` error finalizer function we had added as an easter
egg. We realized yesterday that the undefined value was actually our
`funny_error` boolean, which is supposed to be set only when resolving
specific easter eggs `AST::IdentifierExpr`s.

Since `funny_error` is a plain `bool`, default-initialization in the
constructor of `Late` leaves it with an indeterminate value, which this
patch corrects.

I will be pushing it to trunk directly, but this email specifically
concerns its port into 15.2.

Thanks a lot to Marc Poulhiès and Owen Avery for their help in
discovering and fixing this.

Best,

Arthur

gcc/rust/ChangeLog:

* resolve/rust-late-name-resolver-2.0.cc (Late::Late): False initialize the
funny_error field.
---
 gcc/rust/resolve/rust-late-name-resolver-2.0.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/rust/resolve/rust-late-name-resolver-2.0.cc 
b/gcc/rust/resolve/rust-late-name-resolver-2.0.cc
index 5f215db0a72..f46f9e7dd3e 100644
--- a/gcc/rust/resolve/rust-late-name-resolver-2.0.cc
+++ b/gcc/rust/resolve/rust-late-name-resolver-2.0.cc
@@ -33,7 +33,9 @@
 namespace Rust {
 namespace Resolver2_0 {
 
-Late::Late (NameResolutionContext &ctx) : DefaultResolver (ctx) {}
+Late::Late (NameResolutionContext &ctx)
+  : DefaultResolver (ctx), funny_error (false)
+{}
 
 static NodeId
 next_node_id ()
-- 
2.49.0

Re: [PATCH V3] [autofdo] Annotate empty bb with all debug_stmt with location of phi in the single_succ.

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 10:21 AM liuhongt  wrote:
>
> From: "hongtao.liu" 
>
> > another thing, you can save the walk over PHI args by using
> >
> > gimple_phi_arg_location (phi, tmp_e->dest_idx);
> >
> Changed, use gimple_phi_arg_location_from_edge (phi, tmp_e);
>
> 
> For an empty BB with all debug_stmt, it will be ignored by
> afdo_set_bb_count, but it can be set with count of single successors
> PHIs which edge from the BB.

LGTM.

Richard.

> gcc/ChangeLog:
>
> PR gcov-profile/118581
> * auto-profile.cc (autofdo_source_profile::get_count_info):
> Overload the function with parameter gimple location instead
> of stmt.
> (afdo_set_bb_count): For !has_annotated BB, Check single
> successors PHIs corresponding to the block and use those
> count.
> ---
>  gcc/auto-profile.cc | 46 ++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index 2d2d4a428f2..3ce2acbe86d 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -303,6 +303,10 @@ public:
>   in INFO and return true; otherwise return false.  */
>bool get_count_info (gimple *stmt, count_info *info) const;
>
> +  /* Find count_info for a given gimple location GIMPLE_LOC. If found,
> + store the count_info in INFO and return true; otherwise return false.  
> */
> +  bool get_count_info (location_t gimple_loc, count_info *info) const;
> +
>/* Find total count of the callee of EDGE.  */
>gcov_type get_callsite_total_count (struct cgraph_edge *edge) const;
>
> @@ -724,11 +728,18 @@ autofdo_source_profile::get_function_instance_by_decl 
> (tree decl) const
>  bool
>  autofdo_source_profile::get_count_info (gimple *stmt, count_info *info) const
>  {
> -  if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus)
> +  return get_count_info (gimple_location (stmt), info);
> +}
> +
> +bool
> +autofdo_source_profile::get_count_info (location_t gimple_loc,
> +   count_info *info) const
> +{
> +  if (LOCATION_LOCUS (gimple_loc) == cfun->function_end_locus)
>  return false;
>
>inline_stack stack;
> -  get_inline_stack (gimple_location (stmt), &stack);
> +  get_inline_stack (gimple_loc, &stack);
>if (stack.length () == 0)
>  return false;
>function_instance *s = get_function_instance_by_inline_stack (stack);
> @@ -1132,7 +1143,36 @@ afdo_set_bb_count (basic_block bb, const stmt_set 
> &promoted)
>  }
>
>if (!has_annotated)
> -return false;
> +{
> +  /* For an empty BB with all debug stmt which assign a value with
> +constant, check successors PHIs corresponding to the block and
> +use those counts.  */
> +  edge tmp_e;
> +  edge_iterator tmp_ei;
> +  FOR_EACH_EDGE (tmp_e, tmp_ei, bb->succs)
> +   {
> + basic_block bb_succ = tmp_e->dest;
> + for (gphi_iterator gpi = gsi_start_phis (bb_succ);
> +  !gsi_end_p (gpi);
> +  gsi_next (&gpi))
> +   {
> + gphi *phi = gpi.phi ();
> + location_t phi_loc
> +   = gimple_phi_arg_location_from_edge (phi, tmp_e);
> + count_info info;
> + if (afdo_source_profile->get_count_info (phi_loc, &info)
> + && info.count != 0)
> +   {
> + if (info.count > max_count)
> +   max_count = info.count;
> + has_annotated = true;
> +   }
> +   }
> +   }
> +
> +  if (!has_annotated)
> +   return false;
> +}
>
>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>  afdo_source_profile->mark_annotated (gimple_location (gsi_stmt (gsi)));
> --
> 2.34.1
>

Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

2025-04-29 Thread Richard Sandiford

Pengfei Li  writes:
> This patch implements the folding of a vector addition followed by a
> logical shift right by 1 (add + lsr #1) on AArch64 into an unsigned
> halving add, allowing GCC to emit NEON or SVE2 UHADD instructions.
>
> For example, this patch helps improve the codegen from:
>   add     v0.4s, v0.4s, v31.4s
>   ushr    v0.4s, v0.4s, 1
> to:
>   uhadd   v0.4s, v0.4s, v31.4s
>
> For NEON, vector operations are represented using generic mid-end
> operations, so new folding rules are added to match.pd. For SVE2, the
> operations are represented using built-in GIMPLE calls, so this
> optimization is implemented via gimple_folder.
>
> To ensure correctness, additional checks are introduced to guarantee
> that the operands to UHADD are vectors in which each element has its top
> bit cleared.
>
> This patch has been bootstrapped and regression tested on
> x86_64-linux-gnu and aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve-builtins-base.cc (find_sve_builtin_call):
>   New helper function for finding and checking a GIMPLE call.
>   (is_undef): Rewrite with find_sve_builtin_call.
>   (class svlsr_impl): Implement the folding for SVE2.
>   (FUNCTION): Check and fold the pattern.
>   * match.pd: Add new rules to implement the folding for NEON.
>   * tree.cc (top_bit_zero_vector_p): Add a new utility function for
>   vector top bit zero check.
>   * tree.h (top_bit_zero_vector_p): Add a function declaration.

The target-independent changes are out of my comfort area.
Cc:ing Richi for those.

But rather than top_bit_zero_vector_p, how about a more general
nonzero_element_bits?  I've wanted something similar in the past.

I don't think we can use an unbounded recursive walk, since that
would become quadratic if we ever used it when optimising one
AND in a chain of ANDs.  (And using this function for ANDs
seems plausible.)  Maybe we should be handling the information
in a similar way to Ranger.

Rather than handle the built-in case entirely in target code, how about
having a target hook into nonzero_element_bits (or whatever replaces it)
for machine-dependent builtins?
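
For context on why the quoted patch needs a top-bit check at all, here is a scalar sketch, assuming 32-bit lanes. The names add_then_lsr1, uhadd_model and top_bit_zero are made up for illustration; uhadd_model is only a model of a halving add that keeps the carry, not the intrinsic or the patch's code.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 32-bit add followed by a logical shift right by 1 (add + lsr #1).
constexpr std::uint32_t
add_then_lsr1 (std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t> (a + b) >> 1;
}

// Scalar model of an unsigned halving add: the sum is formed in a wider
// type so the carry is not lost, then halved.
constexpr std::uint32_t
uhadd_model (std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t> ((static_cast<std::uint64_t> (a) + b) >> 1);
}

// Hypothetical helper: true if the top bit of X is clear.
constexpr bool
top_bit_zero (std::uint32_t x)
{
  return (x >> 31) == 0;
}

// With both top bits clear the sum cannot wrap, so the two forms agree.
static_assert (top_bit_zero (0x7fffffffu), "top bit is clear");
static_assert (add_then_lsr1 (0x7fffffffu, 0x7fffffffu)
               == uhadd_model (0x7fffffffu, 0x7fffffffu), "forms agree");

// With a top bit set the wrapping add drops the carry and the forms differ.
static_assert (add_then_lsr1 (0x80000000u, 0x80000000u) == 0u, "carry lost");
static_assert (uhadd_model (0x80000000u, 0x80000000u) == 0x80000000u,
               "carry kept");
```

A mask of possibly-nonzero bits per element would answer the same question by testing whether bit 31 is absent from the mask of each operand.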

Thanks,
Richard

>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/uhadd_1.c: New test.
>   * gcc.target/aarch64/sve2/acle/general/uhadd_1.c: New test.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 101 --
>  gcc/match.pd  |   7 ++
>  .../gcc.target/aarch64/acle/uhadd_1.c |  34 ++
>  .../aarch64/sve2/acle/general/uhadd_1.c   |  30 ++
>  gcc/tree.cc   |  30 ++
>  gcc/tree.h|   4 +
>  6 files changed, 199 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve2/acle/general/uhadd_1.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index b4396837c24..ce6da82bf81 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -43,6 +43,7 @@
>  #include "aarch64-sve-builtins.h"
>  #include "aarch64-sve-builtins-shapes.h"
>  #include "aarch64-sve-builtins-base.h"
> +#include "aarch64-sve-builtins-sve2.h"
>  #include "aarch64-sve-builtins-functions.h"
>  #include "aarch64-builtins.h"
>  #include "ssa.h"
> @@ -53,6 +54,23 @@ using namespace aarch64_sve;
>  
>  namespace {
>  
> +/* Return gcall* if VAL is an SSA_NAME defined by the given SVE intrinsics 
> call.
> +   Otherwise return NULL.  */
> +static gcall*
> +find_sve_builtin_call (tree val, const function_base *func)
> +{
> +  if (TREE_CODE (val) == SSA_NAME)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (val);
> +  if (gcall *call = dyn_cast <gcall *> (def))
> + if (tree fndecl = gimple_call_fndecl (call))
> +   if (const function_instance *instance = lookup_fndecl (fndecl))
> + if (instance->base == func)
> +   return call;
> +}
> +  return NULL;
> +}
> +
>  /* Return true if VAL is an undefined value.  */
>  static bool
>  is_undef (tree val)
> @@ -62,12 +80,7 @@ is_undef (tree val)
>if (ssa_undefined_value_p (val, false))
>   return true;
>  
> -  gimple *def = SSA_NAME_DEF_STMT (val);
> -  if (gcall *call = dyn_cast <gcall *> (def))
> - if (tree fndecl = gimple_call_fndecl (call))
> -   if (const function_instance *instance = lookup_fndecl (fndecl))
> - if (instance->base == functions::svundef)
> -   return true;
> +  return (find_sve_builtin_call (val, functions::svundef) != NULL);
>  }
>return false;
>  }
> @@ -2088,6 +2101,80 @@ public:
>}
>  };
>  
> +class svlsr_impl : public rtx_code_function
> +{
> +private:
> +  /* Return true if we know active lanes for use in T have top bit zero, 
> where
> + pg_use tells which lanes are active for

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread Richard Biener

On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  wrote:
>
> For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to return
> true, all integer arguments smaller than int are passed as int:
>
> [hjl@gnu-tgl-3 pr14907]$ cat x.c
> extern int baz (char c1);
>
> int
> foo (char c1)
> {
>   return baz (c1);
> }
> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> [hjl@gnu-tgl-3 pr14907]$ cat x.s
> .file "x.c"
> .text
> .p2align 4
> .globl foo
> .type foo, @function
> foo:
> .LFB0:
> .cfi_startproc
> movsbl 4(%esp), %eax
> movl %eax, 4(%esp)
> jmp baz
> .cfi_endproc
> .LFE0:
> .size foo, .-foo
> .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> .section .note.GNU-stack,"",@progbits
> [hjl@gnu-tgl-3 pr14907]$
>
> But integer promotion:
>
> movsbl 4(%esp), %eax
> movl %eax, 4(%esp)
>
> isn't necessary if incoming arguments are copied to outgoing arguments
> directly.
>
> Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, defaulting
> to return nullptr.  If the new target hook returns non-nullptr, use it to
> get the outgoing small integer argument.  The x86 target hook returns the
> value of the corresponding incoming argument as int if it can be used as
> the outgoing argument.  If callee is a global function, we always properly
> extend the incoming small integer arguments in callee.  If callee is a
> local function, since DECL_ARG_TYPE has the original small integer type,
> we will extend the incoming small integer arguments in callee if needed.
> It is safe only if
>
> 1. Caller and callee are not nested functions.
> 2. Caller and callee use the same ABI.

How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
should apply to all of them, no?

> 3. The incoming argument and the outgoing argument are in the same
> location.

Why's that?  Can't we move them but still elide the sign-/zero-extension?

> 4. The incoming argument is unchanged before call expansion.

Obviously, but then IMO this reveals an issue with the design of a target hook
returning the argument register - it returns a place rather than a
value.  What's
the limitation of implementing this without help of the target?

Richard.

> Otherwise, using the incoming argument as the outgoing argument may change
> values of other incoming arguments or the wrong outgoing argument value
> may be used.
>
> gcc/
>
> PR middle-end/14907
> * calls.cc (arg_data): Add small_integer_argument_value.
> (precompute_register_parameters): Set args[i].value to
> args[i].small_integer_argument_value if not nullptr.
> (initialize_argument_information): Set
> args[i].small_integer_argument_value to
> TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE.
> (store_one_arg): Set arg->value to arg->small_integer_argument_value
> if not nullptr.
> * target.def (get_small_integer_argument_value): New for calls.
> * targhooks.cc (default_get_small_integer_argument_value): New.
> * targhooks.h (default_get_small_integer_argument_value): Likewise.
> * config/i386/i386.cc (ix86_get_small_integer_argument_value): New.
> (TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE): Likewise.
> * config/i386/i386.h (machine_function): Add
> no_small_integer_argument_value and before_first_expand_call.
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in (TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE): New
> hook.
>
> gcc/testsuite/
>
> PR middle-end/14907
> * gcc.target/i386/pr14907-1.c: New test.
> * gcc.target/i386/pr14907-2.c: Likewise.
> * gcc.target/i386/pr14907-3.c: Likewise.
> * gcc.target/i386/pr14907-4.c: Likewise.
> * gcc.target/i386/pr14907-5.c: Likewise.
> * gcc.target/i386/pr14907-6.c: Likewise.
> * gcc.target/i386/pr14907-7a.c: Likewise.
> * gcc.target/i386/pr14907-7b.c: Likewise.
> * gcc.target/i386/pr14907-8a.c: Likewise.
> * gcc.target/i386/pr14907-8b.c: Likewise.
> * gcc.target/i386/pr14907-9a.c: Likewise.
> * gcc.target/i386/pr14907-9b.c: Likewise.
> * gcc.target/i386/pr14907-10a.c: Likewise.
> * gcc.target/i386/pr14907-10b.c: Likewise.
> * gcc.target/i386/pr14907-10c.c: Likewise.
> * gcc.target/i386/pr14907-11.c: Likewise.
> * gcc.target/i386/pr14907-12.c: Likewise.
> * gcc.target/i386/pr14907-13.c: Likewise.
> * gcc.target/i386/pr14907-14.c: Likewise.
> * gcc.target/i386/pr14907-15.c: Likewise.
> * gcc.target/i386/pr14907-16.c: Likewise.
> * gcc.target/i386/pr14907-17.c: Likewise.
> * gcc.target/i386/pr14907-18a.c: Likewise.
> * gcc.target/i386/pr14907-18b.c: Likewise.
> * gcc.target/i386/pr14907-19.c: Likewise.
> * gcc.target/i386/pr14907-20a.c: Likewise.
> * gcc.target/i386/pr14907-20b.c: Likewise.
> * gcc.target/i386/pr14907-21.c: Likewise.
> * gcc.target/i386/pr14907-22.c: Likewise.
>
>
> --
> H.J.

Re: [PATCH][15.2] nr2.0: late: Correctly initialize funny_error member

2025-04-29 Thread Andrew Pinski

On Tue, Apr 29, 2025 at 1:26 AM  wrote:
>
> From: Arthur Cohen 
>
> Hi everyone,
>
> We noticed inconsistent errors when running name-resolution 2.0 on
> certain files, where an invalid error was triggered and the message was
> from the `funny_ice` error finalizer function we had added as an easter
> egg. We realized yesterday that the undefined value was actually our
> `funny_error` boolean, which is supposed to be set only when resolving
> specific easter eggs `AST::IdentifierExpr`s.
>
> Since `funny_error` is a boolean, it does not get default-initialized in
> the constructor of `Late` - which this patch corrects.
>
> I will be pushing it to trunk directly, but this email specifically
> concerns its port into 15.2.

I am not sure if using NSDMI might be a better style here than doing
it in the constructor.

That is:
```
diff --git a/gcc/rust/resolve/rust-late-name-resolver-2.0.h
b/gcc/rust/resolve/rust-late-name-resolver-2.0.h
index 171d9bfe0f6..2f93981b0d7 100644
--- a/gcc/rust/resolve/rust-late-name-resolver-2.0.h
+++ b/gcc/rust/resolve/rust-late-name-resolver-2.0.h
@@ -76,7 +76,7 @@ private:
   /* Setup Rust's builtin types (u8, i32, !...) in the resolver */
   void setup_builtin_types ();

-  bool funny_error;
+  bool funny_error = false;
 };

 // TODO: Add missing mappings and data structures
```
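
The two styles under discussion can be compared in a small standalone sketch. LateCtorInit and LateNsdmi are hypothetical stand-ins, not the real `Late` class; the point is only that either form gives the member a defined value, whereas a bare `bool funny_error;` with an empty constructor does not.

```cpp
#include <cassert>

// Hypothetical stand-ins for the resolver class; not the real gccrs types.
struct LateCtorInit
{
  // Style used by the posted patch: mem-initializer list in the constructor.
  LateCtorInit () : funny_error (false) {}
  bool funny_error;
};

struct LateNsdmi
{
  // Alternative style: default member initializer (NSDMI) in the class body.
  LateNsdmi () {}
  bool funny_error = false;
};

// Reading the member is well defined in both variants.  With neither form,
// the read would be undefined behaviour because the bool is indeterminate.
inline bool
ctor_init_value ()
{
  LateCtorInit l;
  return l.funny_error;
}

inline bool
nsdmi_value ()
{
  LateNsdmi l;
  return l.funny_error;
}
```

One practical difference is that the NSDMI form also covers any constructor added later, while the mem-initializer form has to be repeated in each constructor.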

Thanks,
Andrew

>
> Thanks a lot to Marc Poulhiès and Owen Avery for their help in
> discovering and fixing this.
>
> Best,
>
> Arthur
>
> gcc/rust/ChangeLog:
>
> * resolve/rust-late-name-resolver-2.0.cc (Late::Late): False 
> initialize the
> funny_error field.
> ---
>  gcc/rust/resolve/rust-late-name-resolver-2.0.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/rust/resolve/rust-late-name-resolver-2.0.cc 
> b/gcc/rust/resolve/rust-late-name-resolver-2.0.cc
> index 5f215db0a72..f46f9e7dd3e 100644
> --- a/gcc/rust/resolve/rust-late-name-resolver-2.0.cc
> +++ b/gcc/rust/resolve/rust-late-name-resolver-2.0.cc
> @@ -33,7 +33,9 @@
>  namespace Rust {
>  namespace Resolver2_0 {
>
> -Late::Late (NameResolutionContext &ctx) : DefaultResolver (ctx) {}
> +Late::Late (NameResolutionContext &ctx)
> +  : DefaultResolver (ctx), funny_error (false)
> +{}
>
>  static NodeId
>  next_node_id ()
> --
> 2.49.0
>

[PATCH] libstdc++: Fix parallel algos for move-only values [PR117905]

2025-04-29 Thread Jonathan Wakely

All of reduce, transform_reduce, exclusive_scan, inclusive_scan,
transform_exclusive_scan, and transform_inclusive_scan have a
precondition that the type of init meets the Cpp17MoveConstructible
requirements. It isn't required to be copy constructible, so when
passing it to the next internal function it needs to be moved, not
copied. We also need to move when creating local variables on the stack,
and when returning as part of a pair.
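
As a rough illustration of the pattern (not the PSTL code itself), the sketch below uses a made-up move-only type and two made-up function templates, reduce_wrapper and reduce_impl. Passing `init` from the wrapper to the inner function without std::move would need a copy constructor and would fail to compile for this type.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical move-only accumulator type (not taken from the testsuite).
struct MoveOnly
{
  explicit MoveOnly (int v) : value (v) {}
  MoveOnly (MoveOnly&&) = default;
  MoveOnly& operator= (MoveOnly&&) = default;
  MoveOnly (const MoveOnly&) = delete;
  MoveOnly& operator= (const MoveOnly&) = delete;
  int value;
};

// Inner helper taking the initial value by value, as the internal
// functions do.
template<typename It, typename T, typename Op>
T
reduce_impl (It first, It last, T init, Op op)
{
  for (; first != last; ++first)
    init = op (std::move (init), *first);
  return init; // a by-value parameter is implicitly moved on return
}

// Outer wrapper: `init` must be moved, not copied, into the inner call.
template<typename It, typename T, typename Op>
T
reduce_wrapper (It first, It last, T init, Op op)
{
  return reduce_impl (first, last, std::move (init), op);
}

// Sum a vector of ints through the move-only accumulator.
inline int
sum_with_move_only (const std::vector<int>& v)
{
  auto op = [] (MoveOnly acc, int x) { return MoveOnly (acc.value + x); };
  return reduce_wrapper (v.begin (), v.end (), MoveOnly (0), op).value;
}
```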

libstdc++-v3/ChangeLog:

PR libstdc++/117905
* include/pstl/glue_numeric_impl.h (reduce, transform_reduce)
(transform_reduce, inclusive_scan, transform_exclusive_scan)
(transform_inclusive_scan): Use std::move for __init parameter.
* include/pstl/numeric_impl.h (__brick_transform_reduce)
(__pattern_transform_reduce, __brick_transform_scan)
(__pattern_transform_scan): Likewise.
* include/std/numeric (inclusive_scan, transform_exclusive_scan):
Use std::move to create local copy of the first element.
* testsuite/26_numerics/pstl/numeric_ops/108236.cc: Move test
using move-only type to ...
* testsuite/26_numerics/pstl/numeric_ops/move_only.cc: New test.
---

Tested x86_64-linux.

 libstdc++-v3/include/pstl/glue_numeric_impl.h | 16 ++--
 libstdc++-v3/include/pstl/numeric_impl.h  | 36 
 libstdc++-v3/include/std/numeric  |  6 +-
 .../26_numerics/pstl/numeric_ops/108236.cc| 25 --
 .../26_numerics/pstl/numeric_ops/move_only.cc | 90 +++
 5 files changed, 119 insertions(+), 54 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/26_numerics/pstl/numeric_ops/move_only.cc

diff --git a/libstdc++-v3/include/pstl/glue_numeric_impl.h 
b/libstdc++-v3/include/pstl/glue_numeric_impl.h
index 10d4912deed..fe2d0fd47e2 100644
--- a/libstdc++-v3/include/pstl/glue_numeric_impl.h
+++ b/libstdc++-v3/include/pstl/glue_numeric_impl.h
@@ -25,7 +25,7 @@ 
__pstl::__internal::__enable_if_execution_policy<_ExecutionPolicy, _Tp>
 reduce(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator 
__last, _Tp __init,
_BinaryOperation __binary_op)
 {
-return transform_reduce(std::forward<_ExecutionPolicy>(__exec), __first, 
__last, __init, __binary_op,
+return transform_reduce(std::forward<_ExecutionPolicy>(__exec), __first, 
__last, std::move(__init), __binary_op,
 __pstl::__internal::__no_op());
 }
 
@@ -33,7 +33,7 @@ template 
 __pstl::__internal::__enable_if_execution_policy<_ExecutionPolicy, _Tp>
 reduce(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator 
__last, _Tp __init)
 {
-return transform_reduce(std::forward<_ExecutionPolicy>(__exec), __first, 
__last, __init, std::plus<_Tp>(),
+return transform_reduce(std::forward<_ExecutionPolicy>(__exec), __first, 
__last, std::move(__init), std::plus<_Tp>(),
 __pstl::__internal::__no_op());
 }
 
@@ -58,7 +58,7 @@ transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator1 
__first1, _Forward
 
 typedef typename iterator_traits<_ForwardIterator1>::value_type _InputType;
 return __pstl::__internal::__pattern_transform_reduce(__dispatch_tag, 
std::forward<_ExecutionPolicy>(__exec),
-  __first1, __last1, 
__first2, __init, std::plus<_InputType>(),
+  __first1, __last1, 
__first2, std::move(__init), std::plus<_InputType>(),
   
std::multiplies<_InputType>());
 }
 
@@ -70,7 +70,7 @@ transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator1 
__first1, _Forward
 {
 auto __dispatch_tag = __pstl::__internal::__select_backend(__exec, 
__first1, __first2);
 return __pstl::__internal::__pattern_transform_reduce(__dispatch_tag, 
std::forward<_ExecutionPolicy>(__exec),
-  __first1, __last1, 
__first2, __init, __binary_op1,
+  __first1, __last1, 
__first2, std::move(__init), __binary_op1,
   __binary_op2);
 }
 
@@ -81,7 +81,7 @@ transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator 
__first, _ForwardIt
 {
 auto __dispatch_tag = __pstl::__internal::__select_backend(__exec, 
__first);
 return __pstl::__internal::__pattern_transform_reduce(__dispatch_tag, 
std::forward<_ExecutionPolicy>(__exec),
-  __first, __last, 
__init, __binary_op, __unary_op);
+  __first, __last, 
std::move(__init), __binary_op, __unary_op);
 }
 
 // [exclusive.scan]
@@ -139,7 +139,7 @@ inclusive_scan(_ExecutionPolicy&& __exec, _ForwardIterator1 
__first, _ForwardIte
_ForwardIterator2 __result, _BinaryOperation __binary_op, _Tp 
__init)
 {
 return transform

Re: [PATCH] Use incoming small integer argument value if possible

2025-04-29 Thread H.J. Lu

On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  wrote:
> >
> > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to return
> > true, all integer arguments smaller than int are passed as int:
> >
> > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > extern int baz (char c1);
> >
> > int
> > foo (char c1)
> > {
> >   return baz (c1);
> > }
> > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > .file "x.c"
> > .text
> > .p2align 4
> > .globl foo
> > .type foo, @function
> > foo:
> > .LFB0:
> > .cfi_startproc
> > movsbl 4(%esp), %eax
> > movl %eax, 4(%esp)
> > jmp baz
> > .cfi_endproc
> > .LFE0:
> > .size foo, .-foo
> > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > .section .note.GNU-stack,"",@progbits
> > [hjl@gnu-tgl-3 pr14907]$
> >
> > But integer promotion:
> >
> > movsbl 4(%esp), %eax
> > movl %eax, 4(%esp)
> >
> > isn't necessary if incoming arguments are copied to outgoing arguments
> > directly.
> >
> > Add a new target hook, TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, defaulting
> > to return nullptr.  If the new target hook returns non-nullptr, use it to
> > get the outgoing small integer argument.  The x86 target hook returns the
> > value of the corresponding incoming argument as int if it can be used as
> > the outgoing argument.  If callee is a global function, we always properly
> > extend the incoming small integer arguments in callee.  If callee is a
> > local function, since DECL_ARG_TYPE has the original small integer type,
> > we will extend the incoming small integer arguments in callee if needed.
> > It is safe only if
> >
> > 1. Caller and callee are not nested functions.
> > 2. Caller and callee use the same ABI.
>
> How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> should apply to all of them, no?

When the arguments are passed in different registers in different ABIs,
we have to copy them anyway.

>
> > 3. The incoming argument and the outgoing argument are in the same
> > location.
>
> Why's that?  Can't we move them but still elide the sign-/zero-extension?

If they aren't in the same locations, we have to move them anyway.
This patch tries to avoid unnecessary moves of incoming arguments to
outgoing arguments.

> > 4. The incoming argument is unchanged before call expansion.
>
> Obviously, but then IMO this reveals an issue with the design of a target hook
> returning the argument register - it returns a place rather than a
> value.  What's

We need the place so that we can avoid a meaningless copy.

> the limitation of implementing this without help of the target?

Middle-end may not know what is safe and not safe, for example, we
can skip the hidden argument SUBREG for x32.

> Richard.
>
> > Otherwise, using the incoming argument as the outgoing argument may change
> > values of other incoming arguments or the wrong outgoing argument value
> > may be used.
> >
> > gcc/
> >
> > PR middle-end/14907
> > * calls.cc (arg_data): Add small_integer_argument_value.
> > (precompute_register_parameters): Set args[i].value to
> > args[i].small_integer_argument_value if not nullptr.
> > (initialize_argument_information): Set
> > args[i].small_integer_argument_value to
> > TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE.
> > (store_one_arg): Set arg->value to arg->small_integer_argument_value
> > if not nullptr.
> > * target.def (get_small_integer_argument_value): New for calls.
> > * targhooks.cc (default_get_small_integer_argument_value): New.
> > * targhooks.h (default_get_small_integer_argument_value): Likewise.
> > * config/i386/i386.cc (ix86_get_small_integer_argument_value): New.
> > (TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE): Likewise.
> > * config/i386/i386.h (machine_function): Add
> > no_small_integer_argument_value and before_first_expand_call.
> > * doc/tm.texi: Regenerated.
> > * doc/tm.texi.in (TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE): New
> > hook.
> >
> > gcc/testsuite/
> >
> > PR middle-end/14907
> > * gcc.target/i386/pr14907-1.c: New test.
> > * gcc.target/i386/pr14907-2.c: Likewise.
> > * gcc.target/i386/pr14907-3.c: Likewise.
> > * gcc.target/i386/pr14907-4.c: Likewise.
> > * gcc.target/i386/pr14907-5.c: Likewise.
> > * gcc.target/i386/pr14907-6.c: Likewise.
> > * gcc.target/i386/pr14907-7a.c: Likewise.
> > * gcc.target/i386/pr14907-7b.c: Likewise.
> > * gcc.target/i386/pr14907-8a.c: Likewise.
> > * gcc.target/i386/pr14907-8b.c: Likewise.
> > * gcc.target/i386/pr14907-9a.c: Likewise.
> > * gcc.target/i386/pr14907-9b.c: Likewise.
> > * gcc.target/i386/pr14907-10a.c: Likewise.
> > * gcc.target/i386/pr14907-10b.c: Likewise.
> > * gcc.target/i386/pr14907-10c.c: Likewise.
> > * gcc.target/i386/pr14907-11.c: Likewise.
> > * gcc.target/i386/pr14907-12.c: Likewise.
> > * gcc.target/i386/pr14907-13.c: Likewise.
> > * gcc.target/i386/pr14907-14.c: Likewise.
> > * gcc.target/i386/pr14907-15.c: Likewise.
> > * gcc.target/i386/pr14907-16.c: Likewise.
> > * gcc.target/i386/pr14907-17.c: Likew

Re: [PATCH][15.2] nr2.0: late: Correctly initialize funny_error member

2025-04-29 Thread Marc Poulhiès

April 29, 2025 at 10:39 AM, "Andrew Pinski" wrote:


> 
> On Tue, Apr 29, 2025 at 1:26 AM  wrote:
> 
> > 
> > From: Arthur Cohen 
> > 
> >  Hi everyone,
> > 
> >  We noticed inconsistent errors when running name-resolution 2.0 on
> >  certain files, where an invalid error was triggered and the message was
> >  from the `funny_ice` error finalizer function we had added as an easter
> >  egg. We realized yesterday that the undefined value was actually our
> >  `funny_error` boolean, which is supposed to be set only when resolving
> >  specific easter eggs `AST::IdentifierExpr`s.
> > 
> >  Since `funny_error` is a boolean, it does not get default-initialized in
> >  the constructor of `Late` - which this patch corrects.
> > 
> >  I will be pushing it to trunk directly, but this email specifically
> >  concerns its port into 15.2.
> > 
> I am not sure if using NSDMI might be a better style here than doing
> it in the constructor.
> 

We discussed this before sending the patch but we didn't have any strong
arguments for or against either solution. Do you have any?

Thanks,
Marc

[PATCH] libstdc++: Fix allocator propagation for rvalue+rvalue string concatenation

2025-04-29 Thread Jonathan Wakely

I made a last-minute change to Nina's r10-200-gf4e678ef74b272
implementation of P1165R1 (consistent allocator propagation for
operator+ on strings), so that the rvalue+rvalue case assumes that COW
strings do not support stateful allocators. I don't think that was true
when the change went in, and certainly isn't true now. COW strings don't
support allocator propagation on assignment and swap, but they do
support non-equal stateful allocators, which are correctly propagated on
move construction.

This removes the preprocessor conditional in the rvalue+rvalue overload
so that COW strings are handled equivalently. Also use constexpr-if
unconditionally, disabling diagnostics with pragmas.
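
The selection logic can be sketched as a free function for experimentation. `concat_rvalues` and `demo_concat` are hypothetical names, and this is an approximation of the strategy described above rather than the library's operator+.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Sketch of the rvalue+rvalue strategy; a hypothetical free function, not
// the library's operator+.
template<typename CharT, typename Traits, typename Alloc>
std::basic_string<CharT, Traits, Alloc>
concat_rvalues (std::basic_string<CharT, Traits, Alloc>&& lhs,
                std::basic_string<CharT, Traits, Alloc>&& rhs)
{
  // The result must use lhs's allocator, so rhs's buffer may only be
  // reused when the two allocators are known to compare equal.
  bool use_rhs = false;
  if constexpr (std::allocator_traits<Alloc>::is_always_equal::value)
    use_rhs = true;
  else if (lhs.get_allocator () == rhs.get_allocator ())
    use_rhs = true;

  if (use_rhs)
    {
      const auto size = lhs.size () + rhs.size ();
      // Prefer rhs's storage only if lhs would have to reallocate and
      // rhs would not.
      if (size > lhs.capacity () && size <= rhs.capacity ())
        return std::move (rhs.insert (0, lhs));
    }
  return std::move (lhs.append (rhs));
}

// Small driver using std::string, whose allocator is always equal.
inline std::string
demo_concat ()
{
  std::string a = "Hello, ";
  std::string b = "world";
  b.reserve (64); // spare capacity so that rhs may be the one reused
  return concat_rvalues (std::move (a), std::move (b));
}
```

Whichever operand ends up being modified in place, the resulting characters are the same; only the choice of buffer differs.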

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (operator+(string&&, string&&)):
Do not assume that COW strings have equal allocators. Use
constexpr-if unconditionally.
* testsuite/21_strings/basic_string/allocator/char/operator_plus.cc:
Remove { dg-require-effective-target cxx11_abi }.
* testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc:
Likewise.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/basic_string.h   | 10 ++
 .../basic_string/allocator/char/operator_plus.cc   |  2 --
 .../basic_string/allocator/wchar_t/operator_plus.cc|  2 --
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index c90bd099b63..a087e637805 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3938,21 +3938,23 @@ _GLIBCXX_END_NAMESPACE_CXX11
 operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
  basic_string<_CharT, _Traits, _Alloc>&& __rhs)
 {
-#if _GLIBCXX_USE_CXX11_ABI
-  using _Alloc_traits = allocator_traits<_Alloc>;
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
+  // Return value must use __lhs.get_allocator(), but if __rhs has equal
+  // allocator then we can choose which parameter to modify in-place.
   bool __use_rhs = false;
-  if _GLIBCXX17_CONSTEXPR (typename _Alloc_traits::is_always_equal{})
+  if constexpr (allocator_traits<_Alloc>::is_always_equal::value)
__use_rhs = true;
   else if (__lhs.get_allocator() == __rhs.get_allocator())
__use_rhs = true;
   if (__use_rhs)
-#endif
{
  const auto __size = __lhs.size() + __rhs.size();
  if (__size > __lhs.capacity() && __size <= __rhs.capacity())
return std::move(__rhs.insert(0, __lhs));
}
   return std::move(__lhs.append(__rhs));
+#pragma GCC diagnostic pop
 }
 
   template
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
index 571f8535176..92e05690b19 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/operator_plus.cc
@@ -17,8 +17,6 @@
 // .
 
 // { dg-do run { target c++11 } }
-// COW strings don't support C++11 allocator propagation:
-// { dg-require-effective-target cxx11_abi }
 
 #include 
 #include 
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
index 0da684360ab..b75b26ae85c 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/operator_plus.cc
@@ -17,8 +17,6 @@
 // .
 
 // { dg-do run { target c++11 } }
-// COW strings don't support C++11 allocator propagation:
-// { dg-require-effective-target cxx11_abi }
 
 #include 
 #include 
-- 
2.49.0

[PATCH] libstdc++: Use 'if constexpr' to simplify std::vector relocation

2025-04-29 Thread Jonathan Wakely

Simplify std::vector's use of std::__relocate_a by using 'if constexpr'
even in C++11 and C++14, with diagnostic pragmas to disable warnings.
This allows us to call std::__relocate_a directly, instead of via
_S_relocate and tag dispatching.

Preserve _S_relocate so that explicit instantiations still get it, but
make it a no-op when _S_use_relocate() is false, so that we don't
instantiate __relocate_a if it isn't needed.
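
To illustrate the difference between the two approaches, here is a small standalone sketch. It is not the std::vector code: `use_relocate`, `relocate_tagged`, `relocate_constexpr_if` and `NonTrivial` are hypothetical names, the predicate is a simplified stand-in for `_S_use_relocate()`, and the functions return 1 or 0 only to show which path was selected. The sketch assumes C++17 or later.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical predicate standing in for vector's _S_use_relocate().
template<typename T>
constexpr bool use_relocate()
{ return std::is_trivially_copyable<T>::value; }

// Tag-dispatching style: one overload per outcome, chosen by tag type.
template<typename T>
int relocate_impl(T*, std::true_type) { return 1; }   // relocation path

template<typename T>
int relocate_impl(T*, std::false_type) { return 0; }  // no-op fallback

template<typename T>
int relocate_tagged(T* p)
{
  using do_it = std::integral_constant<bool, use_relocate<T>()>;
  return relocate_impl(p, do_it{});
}

// 'if constexpr' style: a single function; the discarded branch is not
// instantiated, so no helper overloads are needed.
template<typename T>
int relocate_constexpr_if(T* p)
{
  (void) p;
  if constexpr (use_relocate<T>())
    return 1;
  else
    return 0;
}

// A type with a user-provided copy constructor is not trivially copyable,
// so it takes the fallback path in both styles.
struct NonTrivial
{
  NonTrivial() {}
  NonTrivial(const NonTrivial&) {}
};
```

Both styles select the same path for a given type. The `if constexpr` form simply keeps the decision and both outcomes in one function body.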

libstdc++-v3/ChangeLog:

* include/bits/stl_vector.h (_S_do_relocate): Remove.
(_S_relocate): Remove tag dispatching path.
* include/bits/vector.tcc (reserve, _M_realloc_insert)
(_M_realloc_append, _M_default_append): Add diagnostic pragmas
and use 'if constexpr' in C++11 and C++14. Call
std::__relocate_a directly instead of _S_relocate.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/stl_vector.h | 26 +---
 libstdc++-v3/include/bits/vector.tcc   | 41 +-
 2 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index aff9d5d9ca5..57680b7bbcf 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -518,29 +518,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
return _S_nothrow_relocate(__is_move_insertable<_Tp_alloc_type>{});
   }
 
-  static pointer
-  _S_do_relocate(pointer __first, pointer __last, pointer __result,
-_Tp_alloc_type& __alloc, true_type) noexcept
-  {
-   return std::__relocate_a(__first, __last, __result, __alloc);
-  }
-
-  static pointer
-  _S_do_relocate(pointer, pointer, pointer __result,
-_Tp_alloc_type&, false_type) noexcept
-  { return __result; }
-
   static _GLIBCXX20_CONSTEXPR pointer
   _S_relocate(pointer __first, pointer __last, pointer __result,
  _Tp_alloc_type& __alloc) noexcept
   {
-#if __cpp_if_constexpr
-   // All callers have already checked _S_use_relocate() so just do it.
-   return std::__relocate_a(__first, __last, __result, __alloc);
-#else
-   using __do_it = __bool_constant<_S_use_relocate()>;
-   return _S_do_relocate(__first, __last, __result, __alloc, __do_it{});
-#endif
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
+   if constexpr (_S_use_relocate())
+ return std::__relocate_a(__first, __last, __result, __alloc);
+   else
+ return __result;
+#pragma GCC diagnostic pop
   }
 #endif // C++11
 
diff --git a/libstdc++-v3/include/bits/vector.tcc 
b/libstdc++-v3/include/bits/vector.tcc
index b21e1d3b7a2..e18f01ab0ae 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -61,6 +61,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
+
   template
 _GLIBCXX20_CONSTEXPR
 void
@@ -74,11 +77,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  const size_type __old_size = size();
  pointer __tmp;
 #if __cplusplus >= 201103L
- if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
+ if constexpr (_S_use_relocate())
{
  __tmp = this->_M_allocate(__n);
- _S_relocate(this->_M_impl._M_start, this->_M_impl._M_finish,
- __tmp, _M_get_Tp_allocator());
+ std::__relocate_a(this->_M_impl._M_start, this->_M_impl._M_finish,
+   __tmp, _M_get_Tp_allocator());
}
  else
 #endif
@@ -98,6 +101,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}
 }
+#pragma GCC diagnostic pop
 
 #if __cplusplus >= 201103L
   template
@@ -444,6 +448,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
 #if __cplusplus >= 201103L
   template
 template
@@ -488,14 +494,16 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
 
 #if __cplusplus >= 201103L
-   if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
+   if constexpr (_S_use_relocate())
  {
// Relocation cannot throw.
-   __new_finish = _S_relocate(__old_start, __position.base(),
-  __new_start, _M_get_Tp_allocator());
+   __new_finish = std::__relocate_a(__old_start, __position.base(),
+__new_start,
+_M_get_Tp_allocator());
++__new_finish;
-   __new_finish = _S_relocate(__position.base(), __old_finish,
-  __new_finish, _M_get_Tp_allocator());
+   __new_finish = std::__relocate_a(__positi
