Re: [PATCH] middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
On Tue, 24 Jun 2025, Richard Biener wrote: > On Tue, 24 Jun 2025, Richard Sandiford wrote: > > > Tamar Christina writes: > > > store_bit_field_1 has an optimization where if a target is not a memory > > > operand > > > and the entire value is being set from something larger we can just wrap a > > > subreg around the source and emit a move. > > > > > > For vector constructors this is however problematic because the subreg > > > means > > > that the expansion of the constructor won't happen through vec_init > > > anymore. > > > > > > Complicated constructors which aren't natively supported by targets then > > > ICE as > > > they wouldn't have been expanded so recog fails. > > > > > > This patch blocks the optimization on non-constant vector constructors. > > > Or non-uniform > > > non-constant vectors. I allowed constant vectors because if I read the > > > code right > > > simplify-rtx should be able to perform the simplification of pulling out > > > the element > > > or merging the constant values. There are several testcases in > > > aarch64-sve-pcs.exp > > > that test this as well. I allowed uniform non-constant vectors because > > > they > > > would be folded into a vec_select later on. > > > > > > Note that codegen is quite horrible, for what should only be an lsr. But > > > I'll > > > address that separately so that this patch is backportable. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > -m32, -m64 and no issues. > > > > > > Ok for master? and GCC 15, 14, 13? > > > > I was discussing this Alex off-list last week, and the fix we talked > > about there was: > > > > diff --git a/gcc/explow.cc b/gcc/explow.cc > > index 7799a98053b..8b138f54f75 100644 > > --- a/gcc/explow.cc > > +++ b/gcc/explow.cc > > @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, > > machine_mode innermode, poly_uint64 byte) > > { > >rtx x = simplify_gen_subreg (outermode, op, innermode, byte); > > - if (x) > > + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x > > return x; > > > >auto *start = get_last_insn (); > > > > The justification is that force_subreg is somewhat like a "subreg > > version of force_operand", and so should try to avoid returning > > subregs that force_operand would have replaced. The force_operand > > code I mean is: > > Yeah, in particular CONSTANT_P isn't sth documented as valid as > subreg operands, only registers (and memory) are. But isn't this > then a bug in simplify_gen_subreg itself, that it creates a SUBREG > of a non-REG/MEM? Aka diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index cbe61b49bf6..4fa947f84cd 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -8425,7 +8425,8 @@ simplify_context::simplify_gen_subreg (machine_mode outermode, rtx op, || GET_CODE (op) == CONST_VECTOR)) return NULL_RTX; - if (validate_subreg (outermode, innermode, op, byte)) + if ((REG_P (op) || MEM_P (op)) + && validate_subreg (outermode, innermode, op, byte)) return gen_rtx_SUBREG (outermode, op, byte); return NULL_RTX; ? > Richard. > > > /* Check for subreg applied to an expression produced by loop optimizer. 
> > */ > > if (code == SUBREG > > && !REG_P (SUBREG_REG (value)) > > && !MEM_P (SUBREG_REG (value))) > > { > > value > > = simplify_gen_subreg (GET_MODE (value), > >force_reg (GET_MODE (SUBREG_REG (value)), > > force_operand (SUBREG_REG (value), > > NULL_RTX)), > >GET_MODE (SUBREG_REG (value)), > >SUBREG_BYTE (value)); > > code = GET_CODE (value); > > } > > > > Thanks, > > Richard > > > > > Thanks, > > > Tamar > > > > > > > > > gcc/ChangeLog: > > > > > > PR target/120718 > > > * expmed.cc (store_bit_field_1): Only push subreg over uniform vector > > > constructors. > > > (foldable_value_with_subreg): New. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR target/120718 > > > * gcc.target/aarch64/sve/pr120718.c: New test. > > > > > > --- > > > > > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > > > index > > > be427dca5d9afeed2013954472dde3a5430169e0..a468aa5c0c3f20bd62a7afc1d245d64e87be5396 > > > 100644 > > > --- a/gcc/expmed.cc > > > +++ b/gcc/expmed.cc > > > @@ -740,6 +740,28 @@ store_bit_field_using_insv (const extraction_insn > > > *insv, rtx op0, > > >return false; > > > } > > > > > > +/* For non-constant vectors wrapping a subreg around the RTX will not > > > make > > > + the expression expand properly through vec_init. For constant vectors > > > + we can because simplification can just extract the element out by > > > + by merging the values. This can be done by simplify-rtx and so the > > > + subreg will be eliminated. However poly constants require vec_init as >
[Patch, Fortran, Coarray, PR88076, v1] 6/6 Add a shared memory multi process coarray library.
Hi all, this is the last patch of the mini-series. It just updates the testcases common to coarrays in the gfortran testsuite. All tests in the gcc/testsuite/gfortran.dg/caf directory are now also run with caf_shmem. The test driver ensures that no more than 8 images are used per testcase (unless the tester specifies otherwise by setting GFORTRAN_NUM_IMAGES beforehand). This is to prevent large machines from testing on all hardware threads without any benefit. The minimum number of images required is 8 and therefore that number was chosen. Bootstrapped and regtested fine on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 2eafd3c6b52507d1690c7ab565e32db33a39455e Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 18 Jun 2025 09:26:22 +0200 Subject: [PATCH 6/6] Fortran: Enable coarray tests for multi image use [PR88076] Change some of the regression tests to run on single and multiple images. Add some new tests. PR fortran/88076 gcc/testsuite/ChangeLog: * gfortran.dg/coarray/alloc_comp_4.f90: Make multi image compatible. * gfortran.dg/coarray/atomic_2.f90: Same. * gfortran.dg/coarray/caf.exp: Also test caf_shmem and choose eight images as a default. * gfortran.dg/coarray/coarray_allocated.f90: Add multi image support. * gfortran.dg/coarray/coindexed_1.f90: Same. * gfortran.dg/coarray/coindexed_3.f08: Same. * gfortran.dg/coarray/coindexed_5.f90: Same. * gfortran.dg/coarray/dummy_3.f90: Same. * gfortran.dg/coarray/event_1.f90: Same. * gfortran.dg/coarray/event_3.f08: Same. * gfortran.dg/coarray/failed_images_2.f08: Same. * gfortran.dg/coarray/image_status_1.f08: Same. * gfortran.dg/coarray/image_status_2.f08: Same. * gfortran.dg/coarray/lock_2.f90: Same. * gfortran.dg/coarray/poly_run_3.f90: Same. * gfortran.dg/coarray/scalar_alloc_1.f90: Same. * gfortran.dg/coarray/stopped_images_2.f08: Same. * gfortran.dg/coarray/sync_1.f90: Same. * gfortran.dg/coarray/sync_3.f90: Same. * gfortran.dg/coarray/co_reduce_string.f90: New test. * gfortran.dg/coarray/sync_team.f90: New test.
--- .../gfortran.dg/coarray/alloc_comp_4.f90 | 16 ++- .../gfortran.dg/coarray/atomic_2.f90 | 25 ++-- gcc/testsuite/gfortran.dg/coarray/caf.exp | 13 +++ .../gfortran.dg/coarray/co_reduce_string.f90 | 94 +++ .../gfortran.dg/coarray/coarray_allocated.f90 | 9 +- .../gfortran.dg/coarray/coindexed_1.f90 | 74 +++- .../gfortran.dg/coarray/coindexed_3.f08 | 4 +- .../gfortran.dg/coarray/coindexed_5.f90 | 108 +- gcc/testsuite/gfortran.dg/coarray/dummy_3.f90 | 1 + gcc/testsuite/gfortran.dg/coarray/event_1.f90 | 89 --- gcc/testsuite/gfortran.dg/coarray/event_3.f08 | 4 +- .../gfortran.dg/coarray/failed_images_2.f08 | 39 ++- .../gfortran.dg/coarray/image_status_1.f08| 2 +- .../gfortran.dg/coarray/image_status_2.f08| 32 +- gcc/testsuite/gfortran.dg/coarray/lock_2.f90 | 2 + .../gfortran.dg/coarray/poly_run_3.f90| 8 +- .../gfortran.dg/coarray/scalar_alloc_1.f90| 13 ++- .../gfortran.dg/coarray/stopped_images_2.f08 | 39 ++- gcc/testsuite/gfortran.dg/coarray/sync_1.f90 | 7 +- gcc/testsuite/gfortran.dg/coarray/sync_3.f90 | 26 - .../gfortran.dg/coarray/sync_team.f90 | 33 ++ 21 files changed, 488 insertions(+), 150 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/coarray/co_reduce_string.f90 create mode 100644 gcc/testsuite/gfortran.dg/coarray/sync_team.f90 diff --git a/gcc/testsuite/gfortran.dg/coarray/alloc_comp_4.f90 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_4.f90 index 2ee8ff0253d..50b4bab1603 100644 --- a/gcc/testsuite/gfortran.dg/coarray/alloc_comp_4.f90 +++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_4.f90 @@ -11,11 +11,19 @@ program main end type type(mytype), save :: object[*] - integer :: me + integer :: me, other me=this_image() - allocate(object%indices(me)) - object%indices = 42 + other = me + 1 + if (other .GT. num_images()) other = 1 + if (me == num_images()) then + allocate(object%indices(me/2)) + else +allocate(object%indices(me)) + end if + object%indices = 42 * me - if ( any( object[me]%indices(:) /= 42 ) ) STOP 1 + sync all + if ( any( object[other]%indices(:) /= 42 * other ) ) STOP 1 + sync all end program diff --git a/gcc/testsuite/gfortran.dg/coarray/atomic_2.f90 b/gcc/testsuite/gfortran.dg/coarray/atomic_2.f90 index 5e1c4967248..7eccd7b578c 100644 --- a/gcc/testsuite/gfortran.dg/coarray/atomic_2.f90 +++ b/gcc/testsuite/gfortran.dg/coarray/atomic_2.f90 @@ -61,7 +61,7 @@ end do sync all call atomic_ref(var, caf[num_images()], stat=stat) -if (stat /= 0 .or. var /= num_images() + this_image()) STOP 12 +if (stat /= 0 .or. var /= num_images() * 2) STOP 12 do i = 1, num_images() call atomic_ref(var, caf[i], stat=stat) if (stat /= 0 .or. var /= num_images() + i) STOP 13 @@ -328,7 +328,7 @
Re: [PATCH] middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
On Tue, 24 Jun 2025, Richard Sandiford wrote: > Richard Biener writes: > > On Tue, 24 Jun 2025, Richard Sandiford wrote: > > > >> Tamar Christina writes: > >> > store_bit_field_1 has an optimization where if a target is not a memory > >> > operand > >> > and the entire value is being set from something larger we can just wrap > >> > a > >> > subreg around the source and emit a move. > >> > > >> > For vector constructors this is however problematic because the subreg > >> > means > >> > that the expansion of the constructor won't happen through vec_init > >> > anymore. > >> > > >> > Complicated constructors which aren't natively supported by targets then > >> > ICE as > >> > they wouldn't have been expanded so recog fails. > >> > > >> > This patch blocks the optimization on non-constant vector constructors. > >> > Or non-uniform > >> > non-constant vectors. I allowed constant vectors because if I read the > >> > code right > >> > simplify-rtx should be able to perform the simplification of pulling out > >> > the element > >> > or merging the constant values. There are several testcases in > >> > aarch64-sve-pcs.exp > >> > that test this as well. I allowed uniform non-constant vectors because > >> > they > >> > would be folded into a vec_select later on. > >> > > >> > Note that codegen is quite horrible, for what should only be an lsr. > >> > But I'll > >> > address that separately so that this patch is backportable. > >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu, > >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > >> > -m32, -m64 and no issues. > >> > > >> > Ok for master? and GCC 15, 14, 13? > >> > >> I was discussing this Alex off-list last week, and the fix we talked > >> about there was: > >> > >> diff --git a/gcc/explow.cc b/gcc/explow.cc > >> index 7799a98053b..8b138f54f75 100644 > >> --- a/gcc/explow.cc > >> +++ b/gcc/explow.cc > >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, > >> machine_mode innermode, poly_uint64 byte) > >> { > >>rtx x = simplify_gen_subreg (outermode, op, innermode, byte); > >> - if (x) > >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x > >> return x; > >> > >>auto *start = get_last_insn (); > >> > >> The justification is that force_subreg is somewhat like a "subreg > >> version of force_operand", and so should try to avoid returning > >> subregs that force_operand would have replaced. The force_operand > >> code I mean is: > > > > Yeah, in particular CONSTANT_P isn't sth documented as valid as > > subreg operands, only registers (and memory) are. But isn't this > > then a bug in simplify_gen_subreg itself, that it creates a SUBREG > > of a non-REG/MEM? > > I don't think the documentation is correct/up-to-date. subreg is > de facto used as a general operation, and for example there are > patterns like: > > (define_insn "" > [(set (match_operand:QI 0 "general_operand_dst" > "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg") > (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" > "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7") > (const_int 16)) 3)) >(clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r")) >(clobber (reg:CC CC_REG))] > "" > "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0" > [(set_attr "length" "10")]) I see. Is the subreg for such define_insn generated by the middle-end though? > (from h8300). 
This is also why simplify_gen_subreg has: > > if (GET_CODE (op) == SUBREG > || GET_CODE (op) == CONCAT > || GET_MODE (op) == VOIDmode) > return NULL_RTX; > > if (MODE_COMPOSITE_P (outermode) > && (CONST_SCALAR_INT_P (op) > || CONST_DOUBLE_AS_FLOAT_P (op) > || CONST_FIXED_P (op) > || GET_CODE (op) == CONST_VECTOR)) > return NULL_RTX; > > rather than the !REG_P (op) && !MEM_P (op) that the documentation > would imply. So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC we don't seem to ever legitmize constants wrapped in a SUBREG, so we shouldn't generate a SUBREG of a constant (in the middle-end)? Richard. > > To take another example that I happened to be working on today, > LRA eliminates registers recursively, so if we have: > > (subreg:SI (reg:DI R) 0) > > for an eliminable register R, we end up with: > > (subreg:SI (plus:DI (reg:DI R') (const_int N)) 0) > > I wouldn't object to trying to change this, but > > (a) if we do, we should also restrict it to just REGs, not REGs and MEMs > > (b) it would be far too invasive to backport > > (c) it might be an ongoing whack-a-mole project, a bit like the recent > tightening of validate_subreg > > But I agree it would make things cleaner in some ways. > > Thanks, > Richard > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
[PATCH 6/6] Remove non-SLP path from vectorizable_load
Re-indent elided loop bodies Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Will squash, repost and push if all OK. Richard. * tree-vect-stmts.cc (vectorizable_load): --- gcc/tree-vect-stmts.cc | 1686 1 file changed, 831 insertions(+), 855 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 717d4694b88..db1b539b6c7 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -11007,345 +11007,327 @@ vectorizable_load (vec_info *vinfo, gcc_assert (!grouped_load && !slp_perm); unsigned int inside_cost = 0, prologue_cost = 0; + + /* 1. Create the vector or array pointer update chain. */ + if (!costing_p) { - /* 1. Create the vector or array pointer update chain. */ + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, +slp_node, &gs_info, &dataref_ptr, +&vec_offsets); + else + dataref_ptr + = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, + at_loop, offset, &dummy, gsi, + &ptr_incr, false, bump); + } + + gimple *new_stmt = NULL; + for (i = 0; i < vec_num; i++) + { + tree final_mask = NULL_TREE; + tree final_len = NULL_TREE; + tree bias = NULL_TREE; if (!costing_p) { - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, -slp_node, &gs_info, &dataref_ptr, -&vec_offsets); - else - dataref_ptr - = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, - at_loop, offset, &dummy, gsi, - &ptr_incr, false, bump); + if (mask) + vec_mask = vec_masks[i]; + if (loop_masks) + final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, +vec_num, vectype, i); + if (vec_mask) + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, + final_mask, vec_mask, gsi); + + if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, + gsi, stmt_info, bump); } - gimple *new_stmt = NULL; - for (i = 0; i < vec_num; i++) + /* 2. Create the vector-load in the loop. */ + unsigned HOST_WIDE_INT align; + if (gs_info.ifn != IFN_LAST) { - tree final_mask = NULL_TREE; - tree final_len = NULL_TREE; - tree bias = NULL_TREE; - if (!costing_p) + if (costing_p) { - if (mask) - vec_mask = vec_masks[i]; - if (loop_masks) - final_mask - = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, - vec_num, vectype, i); - if (vec_mask) - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, - final_mask, vec_mask, gsi); - - if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, - gsi, stmt_info, bump); + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + = record_stmt_cost (cost_vec, cnunits, scalar_load, + slp_node, 0, vect_body); + continue; } + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + vec_offset = vec_offsets[i]; + tree zero = build_zero_cst (vectype); + tree scale = size_int (gs_info.scale); - /* 2. Create the vector-load in the loop. */ - unsigned HOST_WIDE_INT align; - if (gs_info.ifn != IFN_LAST) + if (gs_info.ifn == IFN_MASK_LEN_GATHER_LOAD) { - if (costing_p) - { - unsigned int cnunits = vect_nunits_for_cost (vectype); - inside_cost - = record_stmt_cost (cost_vec, cnunits, scalar_load, - slp_node, 0, vect_body); - continue; - } - if
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
> > Thanks! Any thoughts on the other patch in the thread, about flipping > > -ffp-contract from =fast to =on? > > I can't find this mail, not in spam either, but I'm OK with such change if it > comes with test coverage. Ouch, let me reproduce it below. About test coverage, I'm not exactly sure what you mean, so I'll try to explain my perspective. We definitely have existing test coverage for production of FMAs, especially on targets where FMA is in baseline ISA (aarch64, ia64, rs6000, maybe others), but even x86 has a number of tests where fma is enabled with -m flags. Some of those tests will regress and the question is what to do about them. On x86, I've provided a fix for one test (the patch you just approved), but I can't fix everything and in the patch I'm changing the remaining tests to pass -ffp-contract=fast. But most of the tests are actually highlighting missed optimizations, like SLP not able to form addsub FMAs from a pair of fmadd+fmsub. And the question of what to do on all the other targets remains. --- 8< --- >From 478d51bc46c925028f6b1782fcadb3a92a58 Mon Sep 17 00:00:00 2001 From: Alexander Monakov Date: Mon, 12 May 2025 23:23:42 +0300 Subject: [PATCH] c-family: switch away from -ffp-contract=fast Unrestricted FMA contraction, combined with optimizations that create copies of expressions, is causing hard-to-debug issues such as PR106902. Since we implement conformant contraction now, switch C and C++ from -ffp-contract=fast to either =off (under -std=c[++]NN like before, also for C++ now), or =on (under -std=gnu[++]NN). Keep -ffp-contract=fast when -funsafe-math-optimizations (or -ffast-math, -Ofast) is active. In other words, - -ffast-math: no change, unrestricted contraction like before; - standards compliant mode for C: no change, no contraction; - ditto, for C++: align with C (no contraction); - otherwise, switch C and C++ from -ffp-contract=fast to =on. gcc/c-family/ChangeLog: * c-opts.cc (c_common_post_options): Adjust handling of flag_fp_contract_mode. gcc/ChangeLog: * doc/invoke.texi (-ffp-contract): Describe new defaults. (-funsafe-math-optimizations): Add -ffp-contract=fast. gcc/testsuite/ChangeLog: * gcc.target/i386/pr81904.c: Add -ffp-contract=fast to flags. * gcc.target/i386/pr116979.c: Ditto. * gcc.target/i386/intrinsics_opt-1.c: Ditto. * gcc.target/i386/part-vect-fmaddsubhf-1.c: Ditto. * gcc.target/i386/avx512f-vect-fmaddsubXXXps.c: Ditto. * gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c: Ditto. * gcc.target/i386/avx512f-vect-fmsubaddXXXps.c: Ditto. * gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c: Ditto. --- gcc/c-family/c-opts.cc| 11 ++- gcc/doc/invoke.texi | 10 ++ .../gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c | 2 +- .../gcc.target/i386/avx512f-vect-fmaddsubXXXps.c | 2 +- .../gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c | 2 +- .../gcc.target/i386/avx512f-vect-fmsubaddXXXps.c | 2 +- gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c | 2 +- .../gcc.target/i386/part-vect-fmaddsubhf-1.c | 2 +- gcc/testsuite/gcc.target/i386/pr116979.c | 2 +- gcc/testsuite/gcc.target/i386/pr81904.c | 2 +- 10 files changed, 16 insertions(+), 21 deletions(-) diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index 697518637d..1ae45e2b9a 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -877,15 +877,8 @@ c_common_post_options (const char **pfilename) flag_excess_precision = (flag_iso ? 
EXCESS_PRECISION_STANDARD : EXCESS_PRECISION_FAST); - /* ISO C restricts floating-point expression contraction to within - source-language expressions (-ffp-contract=on, currently an alias - for -ffp-contract=off). */ - if (flag_iso - && !c_dialect_cxx () - && (OPTION_SET_P (flag_fp_contract_mode) - == (enum fp_contract_mode) 0) - && flag_unsafe_math_optimizations == 0) -flag_fp_contract_mode = FP_CONTRACT_OFF; + if (!flag_unsafe_math_optimizations && !OPTION_SET_P (flag_fp_contract_mode)) +flag_fp_contract_mode = flag_iso ? FP_CONTRACT_OFF : FP_CONTRACT_ON; /* C language modes before C99 enable -fpermissive by default, but only if -pedantic-errors is not specified. Also treat diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 91b0a201e1..b3a77320ee 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13070,16 +13070,17 @@ Disabled by default. @opindex ffp-contract @item -ffp-contract=@var{style} @option{-ffp-contract=off} disables floating-point expression contraction. +This is the default for C and C++ in a standards compliant mode +(@option{-std=c11}, @option{-std=c++11} or similar). @option{-ffp-contract=fast} enables floating-point expression contraction such as forming of fused multiply-add ope
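For illustration, here is a small self-contained C example of the semantic difference the new default exposes (this is my own sketch, not taken from the patch or its testsuite; the function name is made up). Without contraction both products are rounded before the subtraction and the result is exactly zero; with contraction one product feeds an fma unrounded, so the rounding error of x*x becomes visible. Whether contraction actually happens still depends on the target having FMA instructions.

#include <stdio.h>

/* a*b - c*d is a classic contraction candidate: it may become
   fma (a, b, -(c*d)), in which case a*b is not rounded separately.  */
static double
residual (double a, double b, double c, double d)
{
  return a * b - c * d;
}

int
main (void)
{
  double x = 1.0 + 0x1p-52;     /* 1 plus one ulp */
  /* Prints 0 with -ffp-contract=off; may print ~4.93e-32 (the rounding
     error of x*x) when the compiler contracts to an fma.  */
  printf ("%g\n", residual (x, x, x, x));
  return 0;
}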
[PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted
insn_info::has_been_deleted () is documented to return true if an instruction is deleted. Such instructions have their `volatile` bit set, which can be tested via rtx_insn::deleted (). The current condition for insn_info::has_been_deleted () is: * m_rtl is not NULL: this can't happen as no member of insn_info changes this pointer. * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and does not test the `volatile` bit. This patch drops these conditions and calls m_rtl->deleted () instead. The impact of this change is minimal as insn_info::has_been_deleted is only called in insn_info::print_full. Bootstrapped and regtested x86_64-linux. gcc/ChangeLog: * rtl-ssa/insns.h: Fix implementation of has_been_deleted (). Signed-off-by: Christoph Müllner --- gcc/rtl-ssa/insns.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/rtl-ssa/insns.h b/gcc/rtl-ssa/insns.h index d89dfc5c3f66..bb3f52efa83a 100644 --- a/gcc/rtl-ssa/insns.h +++ b/gcc/rtl-ssa/insns.h @@ -186,7 +186,7 @@ public: // Return true if the instruction was a real instruction but has now // been deleted. In this case the instruction is no longer part of // the SSA information. - bool has_been_deleted () const { return m_rtl && !INSN_P (m_rtl); } + bool has_been_deleted () const { return m_rtl->deleted (); } // Return true if the instruction is a debug instruction (and thus // also a real instruction). -- 2.49.0
[PATCH v6 0/9] AArch64: CMPBR support
This patch series adds support for the CMPBR extension. It includes the new `+cmpbr` option and rules to generate the new instructions when lowering conditional branches. Changelog: * v6: - Correct the constraint string for immediate operands. - Drop the commit for adding `%j` format specifiers. The suffix for the `cb` instruction is now calculated by the `cmp_op` code attribute. * v5: - Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for CMPBR...). Every commit in the series should now produce a correct compiler. - Reduce excessive diff context by not passing `--function-context` to `git format-patch`. * v4: - Added a commit to use HS/LO instead of CS/CC mnemonics. - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE, CBHS, CBLE and CBLS have different ranges of allowed immediates than the other comparisons. Testing done: `make bootstrap; make check` Karl Meakin (9): AArch64: place branch instruction rules together AArch64: reformat branch instruction rules AArch64: rename branch instruction rules AArch64: add constants for branch displacements AArch64: make `far_branch` attribute a boolean AArch64: recognize `+cmpbr` option AArch64: precommit test for CMPBR instructions AArch64: rules for CMPBR instructions AArch64: make rules for CBZ/TBZ higher priority .../aarch64/aarch64-option-extensions.def |2 + gcc/config/aarch64/aarch64-protos.h |2 + gcc/config/aarch64/aarch64-simd.md|2 +- gcc/config/aarch64/aarch64-sme.md |2 +- gcc/config/aarch64/aarch64.cc | 37 +- gcc/config/aarch64/aarch64.h |3 + gcc/config/aarch64/aarch64.md | 563 +++--- gcc/config/aarch64/constraints.md | 18 + gcc/config/aarch64/iterators.md | 19 + gcc/config/aarch64/predicates.md | 15 + gcc/doc/invoke.texi |3 + gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1667 + gcc/testsuite/lib/target-supports.exp | 14 +- 13 files changed, 2123 insertions(+), 224 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c -- 2.45.2
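To make the goal of the series concrete, a tiny hand-written C sketch (not taken from the new cmpbr.c test) of the kind of code the new rules target: a compare of a register against a small immediate followed by a branch, which with -march=...+cmpbr can be emitted as a single compare-and-branch instruction, subject to the immediate-range checks mentioned in the v4 notes.

int
f (int x)
{
  if (x >= 42)      /* candidate for a single cb-style compare-and-branch */
    return 1;
  return 0;
}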
[PATCH v7 1/3] Extend "counted_by" attribute to pointer fields of structures.
And convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE. For example: struct PP { size_t count2; char other1; char *array2 __attribute__ ((counted_by (count2))); int other2; } *pp; specifies that the "array2" is an array that is pointed by the pointer field, and its number of elements is given by the field "count2" in the same structure. gcc/c-family/ChangeLog: * c-attribs.cc (handle_counted_by_attribute): Accept counted_by attribute for pointer fields. gcc/c/ChangeLog: * c-decl.cc (verify_counted_by_attribute): Change the 2nd argument to a vector of fields with counted_by attribute. Verify all fields in this vector. (finish_struct): Collect all the fields with counted_by attribute to a vector and pass this vector to verify_counted_by_attribute. * c-typeck.cc (build_counted_by_ref): Handle pointers with counted_by. Add one more argument, issue error when the pointee type is a structure or union including a flexible array member. (build_access_with_size_for_counted_by): Handle pointers with counted_by. (handle_counted_by_for_component_ref): Call build_counted_by_ref with the new prototype. gcc/ChangeLog: * doc/extend.texi: Extend counted_by attribute to pointer fields in structures. Add one more requirement to pointers with counted_by attribute. gcc/testsuite/ChangeLog: * gcc.dg/flex-array-counted-by.c: Update test. * gcc.dg/pointer-counted-by-1.c: New test. * gcc.dg/pointer-counted-by-2.c: New test. * gcc.dg/pointer-counted-by-3.c: New test. * gcc.dg/pointer-counted-by.c: New test. --- gcc/c-family/c-attribs.cc| 44 ++- gcc/c/c-decl.cc | 91 +++-- gcc/c/c-typeck.cc| 60 ++--- gcc/doc/extend.texi | 41 +- gcc/testsuite/gcc.dg/flex-array-counted-by.c | 2 +- gcc/testsuite/gcc.dg/pointer-counted-by-1.c | 34 + gcc/testsuite/gcc.dg/pointer-counted-by-2.c | 10 ++ gcc/testsuite/gcc.dg/pointer-counted-by-3.c | 127 +++ gcc/testsuite/gcc.dg/pointer-counted-by.c| 111 9 files changed, 450 insertions(+), 70 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-1.c create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-2.c create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-3.c create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by.c diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index 5a0e3d328ba..a276e5c1f1f 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -2906,22 +2906,53 @@ handle_counted_by_attribute (tree *node, tree name, " declaration %q+D", name, decl); *no_add_attrs = true; } - /* This attribute only applies to field with array type. */ - else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE) + /* This attribute only applies to a field with array type or pointer type. */ + else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE + && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE) { error_at (DECL_SOURCE_LOCATION (decl), - "%qE attribute is not allowed for a non-array field", - name); + "%qE attribute is not allowed for a non-array" + " or non-pointer field", name); *no_add_attrs = true; } /* This attribute only applies to a C99 flexible array member type. */ - else if (! c_flexible_array_member_type_p (TREE_TYPE (decl))) + else if (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE + && !c_flexible_array_member_type_p (TREE_TYPE (decl))) { error_at (DECL_SOURCE_LOCATION (decl), "%qE attribute is not allowed for a non-flexible" " array member field", name); *no_add_attrs = true; } + /* This attribute cannot be applied to a pointer to void type. 
*/ + else if (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE + && TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) == VOID_TYPE) +{ + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute is not allowed for a pointer to void", + name); + *no_add_attrs = true; +} + /* This attribute cannot be applied to a pointer to function type. */ + else if (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE + && TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) == FUNCTION_TYPE) +{ + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute is not allowed for a pointer to" + " function", name); + *no_add_attrs = true; +} + /* This attribute cannot be applied to a pointer to structure or union + with flexible array member. */ + else if (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE + && RECORD_OR_UNION_TYPE_P (TREE_TYPE (TREE_TYPE (decl)))
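As a usage sketch of the extended attribute (my own example, not one of the new tests), the pattern is the same as for flexible array members: the count field carries the number of elements the pointer points to and must be kept in sync when the pointer is (re)allocated. The __builtin_dynamic_object_size call below assumes the object-size support added elsewhere in this series; treat it as illustrative only.

#include <stdlib.h>

struct PP {
  size_t count2;
  char other1;
  char *array2 __attribute__ ((counted_by (count2)));
  int other2;
};

struct PP *
make_pp (size_t n)
{
  struct PP *pp = malloc (sizeof *pp);
  pp->count2 = n;            /* keep the count in sync with the pointer */
  pp->array2 = malloc (n);
  return pp;
}

size_t
avail (struct PP *pp)
{
  /* Assumed consumer: with counted_by on pointers wired into
     object-size, this should evaluate to pp->count2 bytes.  */
  return __builtin_dynamic_object_size (pp->array2, 0);
}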
[PATCH v7 3/3] Use the counted_by attribute of pointers in array bound checker.
Current array bound checker only instruments ARRAY_REF, and the INDEX information is the 2nd operand of the ARRAY_REF. When extending the array bound checker to pointer references with counted_by attributes, the hardest part is to get the INDEX of the corresponding array ref from the offset computation expression of the pointer ref. I.e. Given an OFFSET expression, and the ELEMENT_SIZE, get the index expression from the OFFSET. For example: OFFSET: ((long unsigned int) m * (long unsigned int) SAVE_EXPR ) * 4 ELEMENT_SIZE: (sizetype) SAVE_EXPR * 4 get the index as (long unsigned int) m. gcc/c-family/ChangeLog: * c-gimplify.cc (is_address_with_access_with_size): New function. (ubsan_walk_array_refs_r): Instrument an INDIRECT_REF whose base address is .ACCESS_WITH_SIZE or an address computation whose base address is .ACCESS_WITH_SIZE. * c-ubsan.cc (ubsan_instrument_bounds_pointer_address): New function. (struct factor_t): New structure. (get_factors_from_mul_expr): New function. (get_index_from_offset): New function. (get_index_from_pointer_addr_expr): New function. (is_instrumentable_pointer_array_address): New function. (ubsan_array_ref_instrumented_p): Change prototype. Handle MEM_REF in addtional to ARRAY_REF. (ubsan_maybe_instrument_array_ref): Handle MEM_REF in addtional to ARRAY_REF. gcc/testsuite/ChangeLog: * gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test. * gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test. * gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test. * gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test. * gcc.dg/ubsan/pointer-counted-by-bounds.c: New test. --- gcc/c-family/c-gimplify.cc| 28 ++ gcc/c-family/c-ubsan.cc | 316 +- .../ubsan/pointer-counted-by-bounds-2.c | 51 +++ .../ubsan/pointer-counted-by-bounds-3.c | 42 +++ .../ubsan/pointer-counted-by-bounds-4.c | 42 +++ .../ubsan/pointer-counted-by-bounds-5.c | 40 +++ .../gcc.dg/ubsan/pointer-counted-by-bounds.c | 46 +++ 7 files changed, 549 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-2.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-3.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-4.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-5.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds.c diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc index c6fb7646567..e905059708f 100644 --- a/gcc/c-family/c-gimplify.cc +++ b/gcc/c-family/c-gimplify.cc @@ -66,6 +66,20 @@ along with GCC; see the file COPYING3. If not see walk back up, we check that they fit our constraints, and copy them into temporaries if not. */ + +/* Check whether TP is an address computation whose base is a call to + .ACCESS_WITH_SIZE. */ + +static bool +is_address_with_access_with_size (tree tp) +{ + if (TREE_CODE (tp) == POINTER_PLUS_EXPR + && (TREE_CODE (TREE_OPERAND (tp, 0)) == INDIRECT_REF) + && (is_access_with_size_p (TREE_OPERAND (TREE_OPERAND (tp, 0), 0 + return true; + return false; +} + /* Callback for c_genericize. 
*/ static tree @@ -121,6 +135,20 @@ ubsan_walk_array_refs_r (tree *tp, int *walk_subtrees, void *data) walk_tree (&TREE_OPERAND (*tp, 1), ubsan_walk_array_refs_r, pset, pset); walk_tree (&TREE_OPERAND (*tp, 0), ubsan_walk_array_refs_r, pset, pset); } + else if (TREE_CODE (*tp) == INDIRECT_REF + && is_address_with_access_with_size (TREE_OPERAND (*tp, 0))) +{ + ubsan_maybe_instrument_array_ref (&TREE_OPERAND (*tp, 0), false); + /* Make sure ubsan_maybe_instrument_array_ref is not called again on +the POINTER_PLUS_EXPR, so ensure it is not walked again and walk +its subtrees manually. */ + tree aref = TREE_OPERAND (*tp, 0); + pset->add (aref); + *walk_subtrees = 0; + walk_tree (&TREE_OPERAND (aref, 0), ubsan_walk_array_refs_r, pset, pset); +} + else if (is_address_with_access_with_size (*tp)) +ubsan_maybe_instrument_array_ref (tp, true); return NULL_TREE; } diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc index 78b78685469..38514a4046c 100644 --- a/gcc/c-family/c-ubsan.cc +++ b/gcc/c-family/c-ubsan.cc @@ -554,38 +554,322 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index, *index, bound); } -/* Return true iff T is an array that was instrumented by SANITIZE_BOUNDS. */ + +/* Instrument array bounds for the pointer array address which is + an INDIRECT_REF to the call to .ACCESS_WITH_SIZE. We create special + builtin, that gets expanded in the sanopt pass, and make an array + dimention of it. POI
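For orientation, a short C example (my own; the real coverage is in the new pointer-counted-by-bounds*.c tests, which are not reproduced here) of the kind of access the new instrumentation is meant to catch when compiling with -fsanitize=bounds: the pointer dereference is based on a .ACCESS_WITH_SIZE call, so the index recovered from the offset computation can be checked against the count field at run time.

struct annotated {
  int count;
  int *array __attribute__ ((counted_by (count)));
};

int
get (struct annotated *p, int i)
{
  /* With the pointer converted to .ACCESS_WITH_SIZE (patch 1/3), an
     out-of-bounds i here should trigger a bounds-check diagnostic.  */
  return p->array[i];
}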
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On Tue, Jun 24, 2025 at 08:25:33PM +0200, Jakub Jelinek wrote: > > > know enough about dynamic_cast and cxx_eval_dynamic_cast_fn > > > to figure out what needs to change there. It is hint -2 that > > > fails, not hint -1. > > > > Yes, this is a -2 case because C does not derive from B. > > > > How does cxx_eval_dynamic_cast_fn fail in this case? From looking at the > > function it seems like it ought to work. > > I'll study it in detail tomorrow. Actually, I see the reason now. get_component_path is called with a.D.2692, A, NULL where a.D.2692 has B type. And the reason why it fails is 2538 /* We need to check that the component we're accessing is in fact 2539 accessible. */ 2540 if (TREE_PRIVATE (TREE_OPERAND (path, 1)) 2541 || TREE_PROTECTED (TREE_OPERAND (path, 1))) 2542return error_mark_node; The D.2692 FIELD_DECL has been created by build_base_field_1 called from build_base_field from layout_virtual_bases and that one calls it with 6753 if (!BINFO_PRIMARY_P (vbase)) 6754{ 6755 /* This virtual base is not a primary base of any class in the 6756 hierarchy, so we have to add space for it. */ 6757 next_field = build_base_field (rli, vbase, 6758 access_private_node, 6759 offsets, next_field); 6760} access_private_node forces TREE_PRIVATE on the FIELD_DECL and so it doesn't reflect whether the base in question was private/protected or public. struct A has also D.2689 FIELD_DECL with C type and that one is the primary base, neither TREE_PRIVATE nor TREE_PROTECTED. Jakub
Re: [PATCH] s390: Fix float vector extract for pre-z13
On Fri, Jun 20, 2025 at 08:23:01PM +0200, Juergen Christ wrote: > Also provide the vec_extract patterns for floats on pre-z13 machines > to prevent ICEing in those cases. > > Bootstrapped and regtested on s390. Ok. Thanks, Stefan > > gcc/ChangeLog: > > * config/s390/vector.md (VF): Don't restrict modes. > * config/s390/vector.md (VEC_SET_SINGLEFLOAT): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/s390/vector/vec-extract-1.c: Fix test on arch11. > * gcc.target/s390/vector/vec-set-1.c: Run test on arch11. > * gcc.target/s390/vector/vec-extract-2.c: New test. > > Signed-off-by: Juergen Christ > --- > gcc/config/s390/vector.md | 4 +- > .../gcc.target/s390/vector/vec-extract-1.c| 16 +- > .../gcc.target/s390/vector/vec-extract-2.c| 168 ++ > .../gcc.target/s390/vector/vec-set-1.c| 23 ++- > 4 files changed, 187 insertions(+), 24 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c > > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > index 6f4e1929eb80..7251a76c3aea 100644 > --- a/gcc/config/s390/vector.md > +++ b/gcc/config/s390/vector.md > @@ -75,7 +75,7 @@ > V1DF V2DF > (V1TF "TARGET_VXE") (TF "TARGET_VXE")]) > > -(define_mode_iterator VF [(V2SF "TARGET_VXE") (V4SF "TARGET_VXE") V2DF]) > +(define_mode_iterator VF [V2SF V4SF V2DF]) > > ; All modes present in V_HW1 and VFT. > (define_mode_iterator V_HW1_FT [V16QI V8HI V4SI V2DI V1TI V1DF > @@ -512,7 +512,7 @@ > (define_mode_iterator VEC_SET_NONFLOAT >[V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI V1DI V2DI > V2SF V4SF]) > ; Iterator for single element float vectors > -(define_mode_iterator VEC_SET_SINGLEFLOAT [(V1SF "TARGET_VXE") V1DF (V1TF > "TARGET_VXE")]) > +(define_mode_iterator VEC_SET_SINGLEFLOAT [V1SF V1DF (V1TF "TARGET_VXE")]) > > ; FIXME: Support also vector mode operands for 1 > ; FIXME: A target memory operand seems to be useful otherwise we end > diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c > b/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c > index 9df7909a3ea8..83af839963be 100644 > --- a/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c > +++ b/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -march=z14 -mzarch" } */ > +/* { dg-options "-O2 -march=arch11 -mzarch" } */ > /* { dg-final { check-function-bodies "**" "" } } */ > > typedef double V2DF __attribute__((vector_size(16))); > @@ -110,17 +110,6 @@ extractnthfloat (V4SF x, int n) >return x[n]; > } > > -/* > -** sumfirstfloat: > -** vfasb %v0,%v24,%v26 > -** br %r14 > -*/ > -float > -sumfirstfloat (V4SF x, V4SF y) > -{ > - return (x + y)[0]; > -} > - > /* > ** extractfirst2: > ** vlr %v0,%v24 > @@ -179,8 +168,7 @@ extractsingled (V1DF x) > > /* > ** extractsingleld: > -** vlr (%v.),%v24 > -** vst \1,0\(%r2\),3 > +** vst %v24,0\(%r2\),3 > ** br %r14 > */ > long double > diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c > b/gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c > new file mode 100644 > index ..640ac0c8c766 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c > @@ -0,0 +1,168 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=arch11 -mzarch" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +typedef double V2DF __attribute__((vector_size(16))); > +typedef float V4SF __attribute__((vector_size(16))); > +typedef float V2SF __attribute__((vector_size(8))); > +typedef double V1DF __attribute__((vector_size(8))); > +typedef float V1SF 
__attribute__((vector_size(4))); > +typedef long double V1TF __attribute__((vector_size(16))); > + > +/* > +** extractfirstdouble: > +** vsteg %v24,0\(%r2\),0 > +** br %r14 > +*/ > +void > +extractfirstdouble (double *res, V2DF x) > +{ > + *res = x[0]; > +} > + > +/* > +** extractseconddouble: > +** vsteg %v24,0\(%r2\),1 > +** br %r14 > +*/ > +void > +extractseconddouble (double *res, V2DF x) > +{ > + *res = x[1]; > +} > + > +/* > +** extractnthdouble: > +** vlgvg (%r.),%v24,0\(%r3\) > +** stg \1,0\(%r2\) > +** br %r14 > +*/ > +void > +extractnthdouble (double *res, V2DF x, int n) > +{ > + *res = x[n]; > +} > + > +/* > +** extractfirstfloat: > +** vstef %v24,0\(%r2\),0 > +** br %r14 > +*/ > +void > +extractfirstfloat (float *res, V4SF x) > +{ > + *res = x[0]; > +} > + > +/* > +** extractsecondfloat: > +** vstef %v24,0\(%r2\),1 > +** br %r14 > +*/ > +void > +extractsecondfloat (float *res, V4SF x) > +{ > + *res = x[1]; > +} > + > +/* > +** extractthirdfloat: > +** vstef %v24,0\(%r2\),2 > +** br %r14 > +*/ > +void > +extractthirdfloat (float *res, V4SF x) > +{ > + *res = x[2]; > +} > + > +/* > +** ext
Re: [PATCH] middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
Richard Biener writes: > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> Richard Biener writes: >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> >> (from h8300). This is also why simplify_gen_subreg has: >> >> >> >> if (GET_CODE (op) == SUBREG >> >> || GET_CODE (op) == CONCAT >> >> || GET_MODE (op) == VOIDmode) >> >> return NULL_RTX; >> >> >> >> if (MODE_COMPOSITE_P (outermode) >> >> && (CONST_SCALAR_INT_P (op) >> >> || CONST_DOUBLE_AS_FLOAT_P (op) >> >> || CONST_FIXED_P (op) >> >> || GET_CODE (op) == CONST_VECTOR)) >> >> return NULL_RTX; >> >> >> >> rather than the !REG_P (op) && !MEM_P (op) that the documentation >> >> would imply. >> > >> > So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC >> > we don't seem to ever legitmize constants wrapped in a SUBREG, so >> > we shouldn't generate a SUBREG of a constant (in the middle-end)? >> >> Hmm, yeah, maybe. I'd originally rejected that because I assumed >> the MODE_COMPOSITE_P was there for a reason. But looking at the >> history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f, >> where the reason given was: >> >> As for the simplify_gen_subreg change, I think it would be desirable >> to just avoid creating SUBREGs of constants on all targets and for all >> constants, if simplify_immed_subreg simplified, fine, otherwise punt, >> but as we are late in GCC11 development, the patch instead guards this >> behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to >> powerpc{,64,64le} double double long double - and only for the cases >> where >> simplify_immed_subreg was called. > > So then I'd say we want to give > > if (CONSTANT_P (op)) > return NULL_RTX; > > a try? I suppose we could try it, although then the CONSTANT_P test should replace the VOIDmode test in: if (GET_CODE (op) == SUBREG || GET_CODE (op) == CONCAT || GET_MODE (op) == VOIDmode) return NULL_RTX; The danger is that CONSTANT_P also includes CONST, SYMBOL_REF, and LABEL_REF, which AFAIK are not expected to be simplified away. Richard
Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.
On 6/24/25 6:09 AM, Andre Vehreschild wrote: Hi all, this series of patches (six in total) adds a new coarray backend library to libgfortran. The library uses shared memory and processes to implement running multiple images on the same node. The work is based on work started by Thomas and Nicolas Koenig. No changes to the gfortran compile part are required for this. --- snip --- Hi Andre, Thank you for this work. I have been wanting this functionality for several years! I will begin reviewing as best I can. I did see Paul's initial comment so your feedback on that would be appreciated. Best regards, Jerry
[Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.
Hi all, this series of patches (six in total) adds a new coarray backend library to libgfortran. The library uses shared memory and processes to implement running multiple images on the same node. The work is based on work started by Thomas and Nicolas Koenig. No changes to the gfortran compile part are required for this. Unfortunately I found some defects in the gfortran compiler that needed to be fixed. These are the first four tiny patches. The fifth patch then adds the library, and the sixth patches the testcases in gcc/testsuite/gfortran.dg/coarray to also run (and pass) when linked against caf_shmem. The development has been done on x86_64-pc-linux-gnu / Fedora 41. I am curious to learn which fixes will be needed for other platforms. This will be the last big patch that was funded by the STF/STA. My funding has run out and I will only be available for a few days before a new project will consume my attention. Therefore please bring any deficiencies to my attention as soon as possible. I have done some performance measurements against OpenCoarrays, measuring coarray_icar from https://github.com/gutmann/coarray_icar . The figures are: OpenCoarrays (mpich4-backend): 165.578s (real 2m59,947s) -- caf_shmem (16-trunk): 61.489s (real 1m3,681s) The first number is the "Model run time:" as reported by the program. The numbers in parentheses give the real run time as reported by the bash command `time`. Both are done using a debug build of coarray_icar on an Intel Core i7-5775C CPU @ 3.30GHz with 24GB, and running Fedora Linux 41 with all recent patches. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH] Fortran: fix ICE in verify_gimple_in_seq with substrings [PR120743]
Dear all, here's an obvious fix for a recent regression: substring offset calculations used a wrong type that crashed in gimplification. Andre basically OK'ed it in the PR, but here it is nevertheless. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald From 5bc92717b804483a17dd5095f8b6d4fd75a472b1 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Tue, 24 Jun 2025 20:46:38 +0200 Subject: [PATCH] Fortran: fix ICE in verify_gimple_in_seq with substrings [PR120743] PR fortran/120743 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_substring): Substring indices are of type gfc_charlen_type_node. Convert to size_type_node for pointer arithmetic only after offset adjustments have been made. gcc/testsuite/ChangeLog: * gfortran.dg/pr120743.f90: New test. Co-authored-by: Jerry DeLisle Co-authored-by: Mikael Morin --- gcc/fortran/trans-expr.cc | 5 ++-- gcc/testsuite/gfortran.dg/pr120743.f90 | 38 ++ 2 files changed, 41 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/pr120743.f90 diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index c8a207609e4..3e0d763d2fb 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -2800,8 +2800,9 @@ gfc_conv_substring (gfc_se * se, gfc_ref * ref, int kind, else if (POINTER_TYPE_P (TREE_TYPE (tmp))) { tree diff; - diff = fold_build2 (MINUS_EXPR, size_type_node, start.expr, - build_one_cst (size_type_node)); + diff = fold_build2 (MINUS_EXPR, gfc_charlen_type_node, start.expr, + build_one_cst (gfc_charlen_type_node)); + diff = fold_convert (size_type_node, diff); se->expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (tmp), tmp, diff); } diff --git a/gcc/testsuite/gfortran.dg/pr120743.f90 b/gcc/testsuite/gfortran.dg/pr120743.f90 new file mode 100644 index 000..8682d0c8859 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr120743.f90 @@ -0,0 +1,38 @@ +! { dg-do compile } +! PR fortran/120743 - ICE in verify_gimple_in_seq with substrings +! +! Testcase as reduced by Jerry DeLisle + +module what + implicit none + CHARACTER(LEN=:), ALLOCATABLE :: attrlist +contains + SUBROUTINE get_c_attr ( attrname, attrval_c ) +! +! returns attrval_c='' if not found +! +IMPLICIT NONE +CHARACTER(LEN=*), INTENT(IN) :: attrname +CHARACTER(LEN=*), INTENT(OUT) :: attrval_c +! +CHARACTER(LEN=1) :: quote +INTEGER :: j0, j1 +LOGICAL :: found +! +! search for attribute name in attrlist: attr1="val1" attr2="val2" ... +! +attrval_c = '' +if ( .not. allocated(attrlist) ) return +if ( len_trim(attrlist) < 1 ) return +! +j0 = 1 +do while ( j0 < len_trim(attrlist) ) + ! locate = and first quote + j1 = index ( attrlist(j0:), '=' ) + quote = attrlist(j0+j1:j0+j1) + ! next line: something is not right + if ( quote /= '"' .and. quote /= "'" ) return +end do +! + END SUBROUTINE get_c_attr +end module what -- 2.43.0
[PATCH][www] Complete the list of supported languages in gcc-16/criteria.html
Pushed. * gcc-16/criteria.html: Mention Modula-2, Cobol and Rust. Order alphabetically. --- htdocs/gcc-16/criteria.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/htdocs/gcc-16/criteria.html b/htdocs/gcc-16/criteria.html index 6bca431a..85fa39cd 100644 --- a/htdocs/gcc-16/criteria.html +++ b/htdocs/gcc-16/criteria.html @@ -36,7 +36,7 @@ then that criterion may be abandoned. Languages GCC supports several programming languages, including Ada, C, C++, -Fortran, Objective-C, Objective-C++, Go, and D. +Cobol, D, Fortran, Go, Modula-2, Objective-C, Objective-C++, and Rust. For the purposes of making releases, however, we will consider primarily C and C++, as those are the languages used by the vast majority of users. Therefore, if below -- 2.43.0
Re: [Patch] gcn: Fix glc vs. sc0 handling for scalar memory access
Andrew Stubbs: You still seem to have the unrelated preload bits in this patch, but other than that, this looks fine. Now committed with that one removed: r16-1661-g750bc2899844d6 In principle, we could use %Gn everywhere and use the address space from the MEM to determine which cache to use, but that's probably overkill until we need it. I think it would good to properly use the right sc0/sc1/nt for memory access, but I concur that's something for a bit later. Tobias commit 750bc2899844d662aee93476f2da63fce68535d9 Author: Tobias Burnus Date: Tue Jun 24 23:55:27 2025 +0200 gcn: Fix glc vs. sc0 handling for scalar memory access gfx942 still uses glc for scalar access ('s_...') and only uses sc0/nt/sc1 for vector access. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLC_NAME): Fix and extend the description in the comment. * config/gcn/gcn.cc (print_operand): Extend the comment about 'G' and 'g'. * config/gcn/gcn.md: Use 'glc' instead of %G where appropriate. --- gcc/config/gcn/gcn-opts.h | 7 +-- gcc/config/gcn/gcn.cc | 2 ++ gcc/config/gcn/gcn.md | 30 +++--- 3 files changed, 22 insertions(+), 17 deletions(-) diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index bcea14f3fe7..0bfc7869eef 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -84,8 +84,11 @@ enum hsaco_attr_type #define TARGET_DPP8 TARGET_RDNA2_PLUS /* Device requires CDNA1-style manually inserted wait states for AVGPRs. */ #define TARGET_AVGPR_CDNA1_NOPS TARGET_CDNA1 -/* Whether to use the 'globally coherent' (glc) or the 'scope' (sc0, sc1) flag - for scalar memory operations. The string starts on purpose with a space. */ +/* Whether to use the 'globally coherent' (glc) or the 'scope' (sc0) flag + for non-scalar memory operations. The string starts on purpose with a space. + Note: for scalar memory operations (i.e. 's_...'), 'glc' is still used. + CDNA3 also uses 'nt' instead of 'slc' and 'sc1' instead of 'scc'; however, + there is no non-scalar user so far. */ #define TARGET_GLC_NAME (TARGET_CDNA3 ? " sc0" : " glc") /* The metadata on different devices need different granularity. */ #define TARGET_VGPR_GRANULARITY \ diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 2d8dfa3232e..0ce5a29fbb5 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -7103,6 +7103,8 @@ print_operand_address (FILE *file, rtx mem) O - print offset:n for data share operations. G - print "glc" (or for gfx94x: sc0) unconditionally [+ indep. of regnum] g - print "glc" (or for gfx94x: sc0), if appropriate for given MEM + NOTE: Do not use 'G' or 'g with scalar memory access ('s_...') as those + require "glc" also with gfx94x. L - print low-part of a multi-reg value H - print second part of a multi-reg value (high-part of 2-reg value) J - print third part of a multi-reg value diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index 1998931e052..2ce2e054fbf 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -206,7 +206,7 @@ (define_c_enum "unspec" [ ; vdata: vgpr0-255 ; srsrc: sgpr0-102 ; soffset: sgpr0-102 -; flags: offen, idxen, %G, lds, slc, tfe +; flags: offen, idxen, glc, lds, slc, tfe ; ; mtbuf - Typed memory buffer operation. 
Two words ; offset: 12-bit constant @@ -216,10 +216,10 @@ (define_c_enum "unspec" [ ; vdata: vgpr0-255 ; srsrc: sgpr0-102 ; soffset: sgpr0-102 -; flags: offen, idxen, %G, lds, slc, tfe +; flags: offen, idxen, glc, lds, slc, tfe ; ; flat - flat or global memory operations -; flags: %G, slc +; flags: {CDNA3: sc0, nt, sc1 | otherwise: glc, slc, scc} ; addr: vgpr0-255 ; data: vgpr0-255 ; vdst: vgpr0-255 @@ -1987,7 +1987,7 @@ (define_insn "atomic_fetch_" (use (match_operand 3 "const_int_operand"))] "0 /* Disabled. */" "@ - s_atomic_\t%0, %1, %2 %G2\;s_waitcnt\tlgkmcnt(0) + s_atomic_\t%0, %1, %2 glc\;s_waitcnt\tlgkmcnt(0) flat_atomic_\t%0, %1, %2 %G2\;s_waitcnt\t0 global_atomic_\t%0, %A1, %2%O1 %G2\;s_waitcnt\tvmcnt(0)" [(set_attr "type" "smem,flat,flat") @@ -2054,7 +2054,7 @@ (define_insn "sync_compare_and_swap_insn" UNSPECV_ATOMIC))] "" "@ - s_atomic_cmpswap\t%0, %1, %2 %G2\;s_waitcnt\tlgkmcnt(0) + s_atomic_cmpswap\t%0, %1, %2 glc\;s_waitcnt\tlgkmcnt(0) flat_atomic_cmpswap\t%0, %1, %2 %G2\;s_waitcnt\t0 global_atomic_cmpswap\t%0, %A1, %2%O1 %G2\;s_waitcnt\tvmcnt(0)" [(set_attr "type" "smem,flat,flat") @@ -2096,7 +2096,7 @@ (define_insn "atomic_load" switch (which_alternative) { case 0: - return "s_load%o0\t%0, %A1 %G1\;s_waitcnt\tlgkmcnt(0)"; + return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)"; case 1: return (TARGET_RDNA2 /* Not GFX11. */ ? "flat_load%o0\t%0, %A1%O1 %G1 dlc\;s_waitcnt\t0" @@ -2113,7 +2113,7 @@ (define_insn "atomic_load" swit
[PATCH] lra: Check for null lowpart_subregs [PR120733]
lra-eliminations.cc:move_plus_up tries to: Transform (subreg (plus reg const)) to (plus (subreg reg) const) when it is possible. Most of it is heavily conditional: if (!paradoxical_subreg_p (x) && GET_CODE (subreg_reg) == PLUS && CONSTANT_P (XEXP (subreg_reg, 1)) && GET_MODE_CLASS (x_mode) == MODE_INT && GET_MODE_CLASS (subreg_reg_mode) == MODE_INT) { rtx cst = simplify_subreg (x_mode, XEXP (subreg_reg, 1), subreg_reg_mode, subreg_lowpart_offset (x_mode, subreg_reg_mode)); if (cst && CONSTANT_P (cst)) but the final: return gen_rtx_PLUS (x_mode, lowpart_subreg (x_mode, XEXP (subreg_reg, 0), subreg_reg_mode), cst); assumed without checking that lowpart_subreg succeeded. In the PR, this led to creating a PLUS with a null operand. In more detail, the testcase had: (var_location a (plus:SI (subreg:SI (reg/f:DI 64 sfp) 0) (const_int -4 [0xfffc]))) with sfp being eliminated to (plus:DI (reg:DI sp) (const_int 16)). Initially, during the !subst_p phase, lra_eliminate_regs_1 sees the PLUS and recurses into each operand. The recursive call sees the SUBREG and recurses into the SUBREG_REG. Since !subst_p, this final recursive call replaces (reg:DI sfp) with: (plus:DI (reg:DI sfp) (const_int 16)) (i.e. keeping the base register the same). So the SUBREG is eliminated to: (subreg:SI (plus:DI (reg:DI sfp) (const_int 16)) 0) The PLUS handling in lra_eliminate_regs_1 then passes this to move_plus_up, which tries to push the SUBREG into the PLUS. This means trying to create: (plus:SI (simplify_gen_subreg:SI (reg:DI sfp) 0) (const_int 16)) The simplify_gen_subreg then returns null, because simplify_subreg_regno fails both with allow_stack_regs==false (when trying to simplify the SUBREG to a REG) and with allow_stack_regs=true (when validating whether the SUBREG can be generated). And that in turn happens because aarch64 refuses to allow SImode to be stored in sfp: if (regno == SP_REGNUM) /* The purpose of comparing with ptr_mode is to support the global register variable associated with the stack pointer register via the syntax of asm ("wsp") in ILP32. */ return mode == Pmode || mode == ptr_mode; if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM) return mode == Pmode; This seems dubious. If the frame pointer can hold a DImode value then it can also hold an SImode value. There might be limited cases when the low 32 bits of the frame pointer are useful, but aarch64_hard_regno_mode_ok doesn't have the context to second-guess things like that. It seemed from a quick scan of other targets that they behave more as I'd expect. So there might be a target bug here too. But it seemed worth fixing the unchecked use of lowpart_subreg independently of that. The patch fixes an existing ICE in gcc.c-torture/compile/pass.c. gcc/ PR rtl-optimization/120733 * lra-eliminations.cc (move_plus_up): Check whether lowpart_subreg returns null. --- gcc/lra-eliminations.cc | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index bb708b007a4..c651e70d274 100644 --- a/gcc/lra-eliminations.cc +++ b/gcc/lra-eliminations.cc @@ -302,9 +302,12 @@ move_plus_up (rtx x) subreg_lowpart_offset (x_mode, subreg_reg_mode)); if (cst && CONSTANT_P (cst)) - return gen_rtx_PLUS (x_mode, lowpart_subreg (x_mode, -XEXP (subreg_reg, 0), -subreg_reg_mode), cst); + { + rtx lowpart = lowpart_subreg (x_mode, XEXP (subreg_reg, 0), + subreg_reg_mode); + if (lowpart) + return gen_rtx_PLUS (x_mode, lowpart, cst); + } } return x; } -- 2.43.0
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
On Tue, Jun 24, 2025 at 1:18 PM Alexander Monakov wrote: > > > > On Fri, May 23, 2025 at 2:31 PM Alexander Monakov > > > wrote: > > > > > > > > In PR 105965 we accepted a request to form FMA instructions when the > > > > source code is using a narrow generic vector that contains just one > > > > element, corresponding to V1SF or V1DF mode, while the backend does not > > > > expand fma patterns for such modes. > > > > > > > > For this to work under -ffp-contract=on, we either need to modify > > > > backends, or emulate such degenerate-vector FMA via scalar FMA in > > > > tree-vect-generic. Do the latter. > > > > > > Can you instead apply the lowering during gimplification? That is because > > > having an unsupported internal-function in the IL the user could not have > > > emitted directly is somewhat bad. I thought the vector lowering could > > > be generalized for more single-argument internal functions but then no > > > such unsupported calls should exist in the first place. > > > > Sure, like below? Not fully tested yet. > > Ping — now bootstrapped and regtested. LGTM. Thanks, Richard. > > -- 8< -- > > > > From 4caee92434d9425912979b285725166b22f40a87 Mon Sep 17 00:00:00 2001 > > From: Alexander Monakov > > Date: Wed, 21 May 2025 18:35:45 +0300 > > Subject: [PATCH v2] allow contraction to synthetic single-element vector FMA > > > > In PR 105965 we accepted a request to form FMA instructions when the > > source code is using a narrow generic vector that contains just one > > element, corresponding to V1SF or V1DF mode, while the backend does not > > expand fma patterns for such modes. > > > > For this to work under -ffp-contract=on, we either need to modify > > backends, or emulate such degenerate-vector FMA via scalar FMA. > > Do the latter, in gimplification hook together with contraction. > > > > gcc/c-family/ChangeLog: > > > > * c-gimplify.cc (fma_supported_p): Allow forming single-element > > vector FMA when scalar FMA is available. > > (c_gimplify_expr): Allow vector types. > > --- > > gcc/c-family/c-gimplify.cc | 50 ++ > > 1 file changed, 40 insertions(+), 10 deletions(-) > > > > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc > > index c6fb764656..6c313287e6 100644 > > --- a/gcc/c-family/c-gimplify.cc > > +++ b/gcc/c-family/c-gimplify.cc > > @@ -870,12 +870,28 @@ c_build_bind_expr (location_t loc, tree block, tree > > body) > >return bind; > > } > > > > +enum fma_expansion > > +{ > > + FMA_NONE, > > + FMA_DIRECT, > > + FMA_VEC1_SYNTHETIC > > +}; > > + > > /* Helper for c_gimplify_expr: test if target supports fma-like FN. */ > > > > -static bool > > +static fma_expansion > > fma_supported_p (enum internal_fn fn, tree type) > > { > > - return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH); > > + if (direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH)) > > +return FMA_DIRECT; > > + /* Accept single-element vector FMA (see PR 105965) when the > > + backend handles the scalar but not the vector mode. */ > > + if (VECTOR_TYPE_P (type) > > + && known_eq (TYPE_VECTOR_SUBPARTS (type), 1U) > > + && direct_internal_fn_supported_p (fn, TREE_TYPE (type), > > + OPTIMIZE_FOR_BOTH)) > > +return FMA_VEC1_SYNTHETIC; > > + return FMA_NONE; > > } > > > > /* Gimplification of expression trees. 
*/ > > @@ -936,13 +952,14 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p > > ATTRIBUTE_UNUSED, > > case MINUS_EXPR: > >{ > > tree type = TREE_TYPE (*expr_p); > > + enum fma_expansion how; > > /* For -ffp-contract=on we need to attempt FMA contraction only > > during initial gimplification. Late contraction across statement > > boundaries would violate language semantics. */ > > - if (SCALAR_FLOAT_TYPE_P (type) > > + if ((SCALAR_FLOAT_TYPE_P (type) || VECTOR_FLOAT_TYPE_P (type)) > > && flag_fp_contract_mode == FP_CONTRACT_ON > > && cfun && !(cfun->curr_properties & PROP_gimple_any) > > - && fma_supported_p (IFN_FMA, type)) > > + && (how = fma_supported_p (IFN_FMA, type)) != FMA_NONE) > > { > > bool neg_mul = false, neg_add = code == MINUS_EXPR; > > > > @@ -973,7 +990,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p > > ATTRIBUTE_UNUSED, > > enum internal_fn ifn = IFN_FMA; > > if (neg_mul) > > { > > - if (fma_supported_p (IFN_FNMA, type)) > > + if ((how = fma_supported_p (IFN_FNMA, type)) != FMA_NONE) > > ifn = IFN_FNMA; > > else > > ops[0] = build1 (NEGATE_EXPR, type, ops[0]); > > @@ -981,21 +998,34 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p > > ATTRIBUTE_UNUSED, > > if (neg_add) > > { > > enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS; > > - if (f
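As a rough sketch of the kind of source this path handles (my own example, not taken from the testsuite, assuming GCC's generic vector extension and -O2 -ffp-contract=on):

typedef double v1df __attribute__ ((vector_size (sizeof (double))));

/* A single-element vector multiply-add.  With the change above this should be
   contracted at gimplification time into a scalar IFN_FMA even though the
   backend provides no V1DF fma pattern.  */
v1df
mul_add (v1df a, v1df b, v1df c)
{
  return a * b + c;
}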
Re: [PATCH v4] x86: Extend the remove_redundant_vector pass
On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu wrote: > > > > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > > > > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > > > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > > > > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > > > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts > > > > > > from > > > > > > constant and variable scalars. When broadcasting from constants and > > > > > > function arguments, we can place a single widest vector broadcast at > > > > > > entry of the nearest common dominator for basic blocks with all uses > > > > > > since constants and function arguments aren't changed. For > > > > > > broadcast > > > > > > from variables with a single definition, the single definition is > > > > > > replaced with the widest broadcast. > > > > > > > > > > > > gcc/ > > > > > > > > > > > > PR target/92080 > > > > > > * config/i386/i386-expand.cc (ix86_expand_call): Set > > > > > > recursive_function to true for recursive call. > > > > > > * config/i386/i386-features.cc > > > > > > (ix86_place_single_vector_set): > > > > > > Add an argument for inner scalar, default to nullptr. Set > > > > > > the > > > > > > source from inner scalar if not nullptr. > > > > > > (ix86_get_vector_load_mode): Renamed to ... > > > > > > (ix86_get_vector_cse_mode): This. Add an argument for > > > > > > scalar mode > > > > > > and handle integer and float scalar modes. > > > > > > (replace_vector_const): Add an argument for scalar mode and > > > > > > pass > > > > > > it to ix86_get_vector_load_mode. > > > > > > (x86_cse_kind): New. > > > > > > (redundant_load): Likewise. > > > > > > (ix86_broadcast_inner): Likewise. > > > > > > (remove_redundant_vector_load): Also support const0_rtx and > > > > > > constm1_rtx broadcasts. Handle vector broadcasts from > > > > > > constant > > > > > > and variable scalars. > > > > > > * config/i386/i386.h (machine_function): Add > > > > > > recursive_function. > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > > > * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to > > > > > > expect > > > > > > movdqa instead pxor. > > > > > > * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise. > > > > > > * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise. > > > > > > * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise. > > > > > > * gcc.target/i386/pr92080-4.c: New test. > > > > > > * gcc.target/i386/pr92080-5.c: Likewise. > > > > > > * gcc.target/i386/pr92080-6.c: Likewise. > > > > > > * gcc.target/i386/pr92080-7.c: Likewise. > > > > > > * gcc.target/i386/pr92080-8.c: Likewise. > > > > > > * gcc.target/i386/pr92080-9.c: Likewise. > > > > > > * gcc.target/i386/pr92080-10.c: Likewise. > > > > > > * gcc.target/i386/pr92080-11.c: Likewise. > > > > > > * gcc.target/i386/pr92080-12.c: Likewise. > > > > > > * gcc.target/i386/pr92080-13.c: Likewise. > > > > > > * gcc.target/i386/pr92080-14.c: Likewise. > > > > > > * gcc.target/i386/pr92080-15.c: Likewise. > > > > > > * gcc.target/i386/pr92080-16.c: Likewise. > > > > > > > > > > > > Signed-off-by: H.J. 
Lu > > > > > > --- > > > > > > gcc/config/i386/i386-expand.cc| 3 + > > > > > > gcc/config/i386/i386-features.cc | 410 > > > > > > ++ > > > > > > gcc/config/i386/i386.h| 3 + > > > > > > .../i386/keylocker-aesdecwide128kl.c | 14 +- > > > > > > .../i386/keylocker-aesdecwide256kl.c | 14 +- > > > > > > .../i386/keylocker-aesencwide128kl.c | 14 +- > > > > > > .../i386/keylocker-aesencwide256kl.c | 14 +- > > > > > > gcc/testsuite/gcc.target/i386/pr92080-10.c| 13 + > > > > > > gcc/testsuite/gcc.target/i386/pr92080-11.c| 33 ++ > > > > > > gcc/testsuite/gcc.target/i386/pr92080-12.c| 16 + > > > > > > gcc/testsuite/gcc.target/i386/pr92080-13.c| 32 ++ > > > > > > gcc/testsuite/gcc.target/i386/pr92080-14.c| 31 ++ > > > > > > gcc/testsuite/gcc.target/i386/pr92080-15.c| 25 ++ > > > > > > gcc/testsuite/gcc.target/i386/pr92080-16.c| 26 ++ > > > > > > gcc/testsuite/gcc.target/i386/pr92080-4.c | 50 +++ > > > > > > gcc/testsuite/gcc.target/i386/pr92080-5.c | 109 + > > > > > > gcc/testsuite/gcc.target/i386/pr92080-6.c | 19 + > > > > > > gcc/testsuite/gcc.target/i386/pr92080-7.c | 20 + > > > > > > gcc/testsuite/gcc.target/i386/pr92080-8.c | 16 + > > > > > > gcc/testsuite/gcc.target/i386/pr92080-9.c | 81 > > > > > > 20 files changed, 823 insertions(+), 120 deletions(-) > > > > > > create
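As an illustration of the redundancy the extended pass targets (a hypothetical sketch, not one of the pr92080-* testcases, assuming -O2 -mavx2): the same constant broadcast appears on several paths and can be replaced by a single widest vector set at the nearest common dominator.

#include <immintrin.h>

void
store_ones (int *a, int *b, int select)
{
  /* Both branches materialize the same broadcast of the constant 1; a single
     dominating vector set can feed both uses.  */
  if (select)
    _mm256_storeu_si256 ((__m256i *) a, _mm256_set1_epi32 (1));
  else
    _mm256_storeu_si256 ((__m256i *) b, _mm256_set1_epi32 (1));
}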
Re: [PATCH] diagnostic: fix for older version of GCC
On Tue, 2025-06-24 at 15:16 +0200, Marc Poulhiès wrote: > Having both an enum and a variable with the same name triggers an > error with > gcc 5. > > ChangeLog: > > * c/gcc/diagnostic-state-to-dot.cc > (get_color_for_dynalloc_state): > Rename argument dynalloc_state to dynalloc_st. > (add_title_tr): Rename argument style to styl. > (on_xml_node): Rename local variable dynalloc_state to > dynalloc_st. > --- > Bootstrapped on x86_64-linux using GCC 5.5.0. > Ok for master? Sorry about the breakage. Looks good to me. Thanks Dave
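For context, a hypothetical reduction of the clash being worked around; only the function and parameter names come from the ChangeLog, the enumerators and body are invented:

enum dynalloc_state { DS_UNKNOWN, DS_FREED };

static const char *
get_color_for_dynalloc_state (dynalloc_state dynalloc_st)  /* was: dynalloc_state */
{
  /* With the parameter named after the enum type, GCC 5 reportedly rejects
     the declaration, hence the rename to dynalloc_st.  */
  return dynalloc_st == DS_FREED ? "red" : "gray";
}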
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
On Tue, Jun 24, 2025 at 4:28 PM Alexander Monakov wrote: > > > > Thanks! Any thoughts on the other patch in the thread, about flipping > > > -ffp-contract from =fast to =on? > > > > I can't find this mail, not in spam either, but I'm OK with such change if > > it > > comes with test coverage. > > Ouch, let me reproduce it below. About test coverage, I'm not exactly sure > what > you mean, so I'll try to explain my perspective. > > We definitely have existing test coverage for production of FMAs, especially > on targets where FMA is in baseline ISA (aarch64, ia64, rs6000, maybe others), > but even x86 has a number of tests where fma is enabled with -m flags. Some > of those tests will regress and the question is what to do about them. > > On x86, I've provided a fix for one test (the patch you just approved), but > I can't fix everything and in the patch I'm changing the remaining tests to > pass -ffp-contract=fast. But most of the tests are actually highlighting > missed optimizations, like SLP not able to form addsub FMAs from a pair of > fmadd+fmsub. I'd say we want to fix these kind of things before switching the default. Can you file bugreports for the distinct issues you noticed when adjusting the testcases? I suppose they are reproducible as well when using the C fma() function directly? Thanks, Richard. > > And the question of what to do on all the other targets remains. > > --- 8< --- > > From 478d51bc46c925028f6b1782fcadb3a92a58 Mon Sep 17 00:00:00 2001 > From: Alexander Monakov > Date: Mon, 12 May 2025 23:23:42 +0300 > Subject: [PATCH] c-family: switch away from -ffp-contract=fast > > Unrestricted FMA contraction, combined with optimizations that create > copies of expressions, is causing hard-to-debug issues such as PR106902. > Since we implement conformant contraction now, switch C and C++ from > -ffp-contract=fast to either =off (under -std=c[++]NN like before, also > for C++ now), or =on (under -std=gnu[++]NN). Keep -ffp-contract=fast > when -funsafe-math-optimizations (or -ffast-math, -Ofast) is active. > > In other words, > > - -ffast-math: no change, unrestricted contraction like before; > - standards compliant mode for C: no change, no contraction; > - ditto, for C++: align with C (no contraction); > - otherwise, switch C and C++ from -ffp-contract=fast to =on. > > gcc/c-family/ChangeLog: > > * c-opts.cc (c_common_post_options): Adjust handling of > flag_fp_contract_mode. > > gcc/ChangeLog: > > * doc/invoke.texi (-ffp-contract): Describe new defaults. > (-funsafe-math-optimizations): Add -ffp-contract=fast. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr81904.c: Add -ffp-contract=fast to flags. > * gcc.target/i386/pr116979.c: Ditto. > * gcc.target/i386/intrinsics_opt-1.c: Ditto. > * gcc.target/i386/part-vect-fmaddsubhf-1.c: Ditto. > * gcc.target/i386/avx512f-vect-fmaddsubXXXps.c: Ditto. > * gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c: Ditto. > * gcc.target/i386/avx512f-vect-fmsubaddXXXps.c: Ditto. > * gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c: Ditto. 
> --- > gcc/c-family/c-opts.cc| 11 ++- > gcc/doc/invoke.texi | 10 ++ > .../gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c | 2 +- > .../gcc.target/i386/avx512f-vect-fmaddsubXXXps.c | 2 +- > .../gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c | 2 +- > .../gcc.target/i386/avx512f-vect-fmsubaddXXXps.c | 2 +- > gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c | 2 +- > .../gcc.target/i386/part-vect-fmaddsubhf-1.c | 2 +- > gcc/testsuite/gcc.target/i386/pr116979.c | 2 +- > gcc/testsuite/gcc.target/i386/pr81904.c | 2 +- > 10 files changed, 16 insertions(+), 21 deletions(-) > > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc > index 697518637d..1ae45e2b9a 100644 > --- a/gcc/c-family/c-opts.cc > +++ b/gcc/c-family/c-opts.cc > @@ -877,15 +877,8 @@ c_common_post_options (const char **pfilename) > flag_excess_precision = (flag_iso ? EXCESS_PRECISION_STANDARD > : EXCESS_PRECISION_FAST); > > - /* ISO C restricts floating-point expression contraction to within > - source-language expressions (-ffp-contract=on, currently an alias > - for -ffp-contract=off). */ > - if (flag_iso > - && !c_dialect_cxx () > - && (OPTION_SET_P (flag_fp_contract_mode) > - == (enum fp_contract_mode) 0) > - && flag_unsafe_math_optimizations == 0) > -flag_fp_contract_mode = FP_CONTRACT_OFF; > + if (!flag_unsafe_math_optimizations && !OPTION_SET_P > (flag_fp_contract_mode)) > +flag_fp_contract_mode = flag_iso ? FP_CONTRACT_OFF : FP_CONTRACT_ON; > >/* C language modes before C99 enable -fpermissive by default, but > only if -pedantic-errors is not specified. Also treat > diff --git a/gcc/doc/invoke.texi b/gcc/doc/
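To make the semantic difference concrete, a small sketch of my own (assuming a target with hardware FMA): under -ffp-contract=on contraction stays within a single expression, while =fast would also contract across the temporary in the second function.

double
within_expression (double a, double b, double c)
{
  return a * b + c;   /* may become fma under -ffp-contract=on */
}

double
across_statements (double a, double b, double c)
{
  double t = a * b;   /* separate statement: not contracted under =on */
  return t + c;
}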
[PATCH] libstdc++: Test for %S precision for durations with integral representation.
libstdc++-v3/ChangeLog: * testsuite/std/time/format/precision.cc: New tests. --- Merging additional tests I have added, when working on erasing chrono types. Testing on x86_64-linux. OK for trunk when test passes? .../testsuite/std/time/format/precision.cc| 64 +-- 1 file changed, 59 insertions(+), 5 deletions(-) diff --git a/libstdc++-v3/testsuite/std/time/format/precision.cc b/libstdc++-v3/testsuite/std/time/format/precision.cc index ccb2c77ce05..b604fbfe9e9 100644 --- a/libstdc++-v3/testsuite/std/time/format/precision.cc +++ b/libstdc++-v3/testsuite/std/time/format/precision.cc @@ -16,6 +16,8 @@ test_empty() std::basic_string res; const duration d(33.111222); + res = std::format(WIDEN("{:}"), d); + VERIFY( res == WIDEN("33.1112s") ); res = std::format(WIDEN("{:.3}"), d); VERIFY( res == WIDEN("33.1112s") ); res = std::format(WIDEN("{:.6}"), d); @@ -25,6 +27,8 @@ test_empty() // Uses ostream operator<< const duration nd = d; + res = std::format(WIDEN("{:}"), nd); + VERIFY( res == WIDEN("3.31112e+10ns") ); res = std::format(WIDEN("{:.3}"), nd); VERIFY( res == WIDEN("3.31112e+10ns") ); res = std::format(WIDEN("{:.6}"), nd); @@ -40,6 +44,8 @@ test_Q() std::basic_string res; const duration d(7.111222); + res = std::format(WIDEN("{:%Q}"), d); + VERIFY( res == WIDEN("7.111222") ); res = std::format(WIDEN("{:.3%Q}"), d); VERIFY( res == WIDEN("7.111222") ); res = std::format(WIDEN("{:.6%Q}"), d); @@ -48,6 +54,8 @@ test_Q() VERIFY( res == WIDEN("7.111222") ); const duration nd = d; + res = std::format(WIDEN("{:%Q}"), nd); + VERIFY( res == WIDEN("7111222000") ); res = std::format(WIDEN("{:.3%Q}"), nd); VERIFY( res == WIDEN("7111222000") ); res = std::format(WIDEN("{:.6%Q}"), nd); @@ -58,12 +66,14 @@ test_Q() template void -test_S() +test_S_fp() { std::basic_string res; // Precision is ignored, but period affects output - const duration d(5.111222); + duration d(5.111222); + res = std::format(WIDEN("{:%S}"), d); + VERIFY( res == WIDEN("05") ); res = std::format(WIDEN("{:.3%S}"), d); VERIFY( res == WIDEN("05") ); res = std::format(WIDEN("{:.6%S}"), d); @@ -71,7 +81,9 @@ test_S() res = std::format(WIDEN("{:.9%S}"), d); VERIFY( res == WIDEN("05") ); - const duration md = d; + duration md = d; + res = std::format(WIDEN("{:%S}"), md); + VERIFY( res == WIDEN("05.111") ); res = std::format(WIDEN("{:.3%S}"), md); VERIFY( res == WIDEN("05.111") ); res = std::format(WIDEN("{:.6%S}"), md); @@ -79,7 +91,19 @@ test_S() res = std::format(WIDEN("{:.9%S}"), md); VERIFY( res == WIDEN("05.111") ); - const duration nd = d; + duration ud = d; + res = std::format(WIDEN("{:%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.3%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.6%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.9%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + + duration nd = d; + res = std::format(WIDEN("{:%S}"), nd); + VERIFY( res == WIDEN("05.111222000") ); res = std::format(WIDEN("{:.3%S}"), nd); VERIFY( res == WIDEN("05.111222000") ); res = std::format(WIDEN("{:.6%S}"), nd); @@ -88,13 +112,43 @@ test_S() VERIFY( res == WIDEN("05.111222000") ); } +template +void +test_S_int() +{ + std::basic_string res; + const nanoseconds src(7'000'012'345); + + auto d = floor(src); + res = std::format(WIDEN("{:%S}"), d); + VERIFY( res == WIDEN("07") ); + + auto md = floor(src); + res = std::format(WIDEN("{:%S}"), md); + VERIFY( res == WIDEN("07.000") ); + + auto ud = floor(src); + res = std::format(WIDEN("{:%S}"), ud); + 
VERIFY( res == WIDEN("07.12") ); + + auto nd = floor(src); + res = std::format(WIDEN("{:%S}"), nd); + VERIFY( res == WIDEN("07.12345") ); + + using picoseconds = duration; + auto pd = floor(src); + res = std::format(WIDEN("{:%S}"), pd); + VERIFY( res == WIDEN("07.12345000") ); +} + template void test_all() { test_empty(); test_Q(); - test_S(); + test_S_int(); + test_S_fp(); } int main() -- 2.49.0
Re: [PATCH] Fortran: fix ICE in verify_gimple_in_seq with substrings [PR120743]
On 24.06.25 at 21:11, Steve Kargl wrote: On Tue, Jun 24, 2025 at 09:00:46PM +0200, Harald Anlauf wrote: here's an obvious fix for a recent regression: substring offset calculations used a wrong type that crashed in gimplification. Andre basically OK'ed it in the PR, but here it is nevertheless. Regtested on x86_64-pc-linux-gnu. OK for mainline? Yes. Thanks for the patch. Thanks for the review! Pushed as r16-1658-g5bc92717b80448.
Re: [PATCH, 1 of 4] Add -mcpu=future support for PowerPC
On Mon, Jun 23, 2025 at 07:30:51PM +0530, Surya Kumari Jangala wrote: > Hi Mike, > > On 14/06/25 2:07 pm, Michael Meissner wrote: > > This is patch #1 of 4 that adds the support that can be used in developing > > GCC > > support for future PowerPC processors. > > Please reword the commit message, perhaps something like: > This is patch #1 of 4 that adds support for the option -mcpu=future. This > enables > future enhancements to GCC for supporting upcoming PowerPC processors. Thanks. > > @@ -5905,6 +5909,8 @@ rs6000_machine_from_flags (void) > >flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | > > OPTION_MASK_ISEL > > | OPTION_MASK_ALTIVEC); > > > > + if ((flags & (FUTURE_MASKS_SERVER & ~ISA_3_1_MASKS_SERVER)) != 0) > > The test should be against POWER11_MASKS_SERVER, not ISA_3_1_MASKS_SERVER. Thanks, good catch. > > @@ -24450,6 +24463,7 @@ static struct rs6000_opt_mask const > > rs6000_opt_masks[] = > >{ "float128",OPTION_MASK_FLOAT128_KEYWORD, false, > > true }, > >{ "float128-hardware", OPTION_MASK_FLOAT128_HW,false, true }, > >{ "fprnd", OPTION_MASK_FPRND, false, > > true }, > > + { "future", OPTION_MASK_FUTURE, false, > > false }, > > Please add this line after the "power11" line. Again, like in POWERPC_MASKS, all of the entries are sorted in alphabetical order. > Also, in the routine expand_compare_loop(), we should handle PROCESSOR_FUTURE > when computing max_bytes. Thanks, I missed that. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
On Tue, 24 Jun 2025, Richard Biener wrote: > On Tue, Jun 24, 2025 at 1:18 PM Alexander Monakov wrote: > > > > > > On Fri, May 23, 2025 at 2:31 PM Alexander Monakov > > > > wrote: > > > > > > > > > > In PR 105965 we accepted a request to form FMA instructions when the > > > > > source code is using a narrow generic vector that contains just one > > > > > element, corresponding to V1SF or V1DF mode, while the backend does > > > > > not > > > > > expand fma patterns for such modes. > > > > > > > > > > For this to work under -ffp-contract=on, we either need to modify > > > > > backends, or emulate such degenerate-vector FMA via scalar FMA in > > > > > tree-vect-generic. Do the latter. > > > > > > > > Can you instead apply the lowering during gimplification? That is > > > > because > > > > having an unsupported internal-function in the IL the user could not > > > > have > > > > emitted directly is somewhat bad. I thought the vector lowering could > > > > be generalized for more single-argument internal functions but then no > > > > such unsupported calls should exist in the first place. > > > > > > Sure, like below? Not fully tested yet. > > > > Ping — now bootstrapped and regtested. > > LGTM. Thanks! Any thoughts on the other patch in the thread, about flipping -ffp-contract from =fast to =on? Alexander
Re: [COMMITTED 02/12] - Move to an always available relation oracle.
yeah, sure. Its wasted memory. If we ever need it fro anything, it could be added back in. I'll add it to my next commit. Thanks Andrew On 6/23/25 18:21, Martin Jambor wrote: Hello, On Thu, May 23 2024, Andrew MacLeod wrote: This patch provides a basic oracle which doesn't do anything, but will still respond when queried. This allows passes to avoid the NULL check for an oracle pointer before they do anything, and results in a slight speedup in VRP, and a slightly more significant 0.3% speedup in jump threading.. It also unifies the register and query names to nor specify what is already apparent in the parameters. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. From 2f80eb1feb3f92c7e9e57d4726ec52ca7d27ce92 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 30 Apr 2024 09:35:23 -0400 Subject: [PATCH 02/12] Move to an always available relation oracle. This eliminates the need to check if the relation oracle pointer is NULL before every call by providing a default oracle which does nothing. REmove unused routines, and Unify register_relation method names. * gimple-range-cache.cc (ranger_cache::dump_bb): Remove check for NULL oracle pointer. (ranger_cache::fill_block_cache): Likewise. * gimple-range-fold.cc (fur_stmt::get_phi_operand): Likewise. (fur_depend::fur_depend): Likewise. (fur_depend::register_relation): Likewise, use qury_relation. (fold_using_range::range_of_phi): Likewise. (fold_using_range::relation_fold_and_or): Likewise. * gimple-range-fold.h (fur_source::m_oracle): Delete. Oracle can be accessed dirctly via m_query now. * gimple-range-path.cc (path_range_query::path_range_query): Adjust for oracle reference pointer. (path_range_query::compute_ranges): Likewise. (jt_fur_source::jt_fur_source): Adjust for no m_oracle member. (jt_fur_source::register_relation): Do not check for NULL pointer. (jt_fur_source::query_relation): Likewise. * gimple-range.cc (gimple_ranger::gimple_ranger): Adjust for reference pointer. * value_query.cc (default_relation_oracle): New. (range_query::create_relation_oracle): Relocate from header. Ensure not being added to global query. (range_query::destroy_relation_oracle): Relocate from header. (range_query::range_query): Initailize to default oracle. (ange_query::~range_query): Call destroy_relation_oracle. * value-query.h (class range_query): Adjust prototypes. (range_query::create_relation_oracle): Move to source file. (range_query::destroy_relation_oracle): Move to source file. * value-relation.cc (relation_oracle::validate_relation): Delete. (relation_oracle::register_stmt): Rename to register_relation. (relation_oracle::register_edge): Likewise. * value-relation.h (register_stmt): Rename to register_relation and provide default function in base class. (register_edge): Likewise. (relation_oracle::query_relation): Provide default in base class. (relation_oracle::dump): Likewise. (relation_oracle::equiv_set): Likewise. (default_relation_oracle): New extenal reference. (partial_equiv_set, add_partial_equiv): Move to protected. * value-relation.h (relation_oracle::validate_relation): Delete. [...] @@ -208,66 +208,6 @@ static const tree_code relation_to_code [VREL_LAST] = { ERROR_MARK, ERROR_MARK, LT_EXPR, LE_EXPR, GT_EXPR, GE_EXPR, EQ_EXPR, NE_EXPR }; -// This routine validates that a relation can be applied to a specific set of -// ranges. In particular, floating point x == x may not be true if the NaN bit -// is set in the range. Symbolically the oracle will determine x == x, -// but specific range instances may override this. 
-// To verify, attempt to fold the relation using the supplied ranges. -// One would expect [1,1] to be returned, anything else means there is something -// in the range preventing the relation from applying. -// If there is no mechanism to verify, assume the relation is acceptable. - -relation_kind -relation_oracle::validate_relation (relation_kind rel, vrange &op1, vrange &op2) -{ - // If there is no mapping to a tree code, leave the relation as is. - tree_code code = relation_to_code [rel]; This seems to have been the only use of the array relation_to_code which we however still have around. Should it be removed too? Thanks, Martin
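The design is essentially the null-object pattern; a simplified sketch of the idea (loosely modeled on the names in the ChangeLog, with made-up signatures) is:

class relation_oracle_sketch
{
public:
  virtual ~relation_oracle_sketch () {}
  /* The default implementations do nothing / know no relations.  */
  virtual void register_relation (int, int) {}
  virtual int query_relation (int, int) { return 0; }
};

/* One shared do-nothing instance: queries point at it by default, so callers
   can always call through the oracle instead of guarding with a NULL check.  */
static relation_oracle_sketch default_oracle;

class range_query_sketch
{
public:
  range_query_sketch () : m_oracle (&default_oracle) {}
  relation_oracle_sketch &oracle () { return *m_oracle; }  /* never NULL */
private:
  relation_oracle_sketch *m_oracle;
};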
Re: [PATCH v2] expand: Improve expand_constructor for BLKmode mode and zeros constructors [PR110459]
On Mon, Jun 23, 2025, 6:09 PM Jeff Law wrote: > > > On 6/7/25 1:04 AM, Andrew Pinski wrote: > > Currently expand_constructor when it comes to BLKmode types, we store out > > the constructor to memory. But for the zero case, we could just use > `const_int 0` > > instead for types who are less than or equal to BITS_PER_WORD. > > This is only valid if we expanding without a target. > > > > This fixes the storing zeros of `char[3]` and `char[7]` into registers. > > > > Boostrapped and tested on x86_64-linux-gnu. > > Build and tested for aarch64-linux-gnu. > > > > PR middle-end/110459 > > > > gcc/ChangeLog: > > > > * expr.cc (expand_constructor): For expansion of `{}` without a > target > > and the size is less than BITS_PER_WORD, just return the constant > 0. > > > > gcc/testsuite/ChangeLog: > > > > * g++.target/aarch64/array-return-1.C: New test. > > * g++.target/i386/array-return-1.C: New test. > OK. I guess. My worry is that even for a non-addressable, we can't do > directly BLKmode assignments and this increases the risk that one of > those creeps in. Though emit_move_insn and friend check for this IIRC. > So I don't have a concrete objection, just a vague uneasiness :-) > > So I'll ACK. But we probably need to keep an eye out for fallout. > I forgot to withdraw this patch via a reply here as it did cause some failures on arm-linux-gnueabi . I did make a mention of it in https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686065.html . And richi asked something slightly different but I Have not got around to trying it out as I was on vacation until yesterday. Thanks, Andrew > jeff >
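The kind of code the proposed change targeted looks roughly like this (my sketch, close in spirit to the new array-return tests but not copied from them): a small value-initialized aggregate returned by value, whose `{}` could then expand to const_int 0 in a register instead of a stack temporary.

struct S { char c[3]; };

/* At -O2 on aarch64/x86_64 the 3-byte aggregate is returned in a register,
   so zeroing it need not go through memory.  */
S
make_zero ()
{
  return S{};
}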
Re: [PATCH] c++/modules: Only compare types of DECL_TEMPLATE_RESULTs [PR120644]
On 6/23/25 5:41 PM, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15? -- >8 -- We were erroring because the TEMPLATE_DECL of the existing partial specialisation has an undeduced return type, but the imported declaration did not. The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, where modules streaming code assumes that a TEMPLATE_DECL and its DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit fixed the issue by ensuring that when the type of a variable is deduced the TEMPLATE_DECL is updated as well, but this missed handling partial specialisations. However, I don't think we actually care about that, since it seems that only the type of the inner decl actually matters in practice. Instead, this patch handles the issue on the modules side when deduping a streamed decl, by only comparing the inner type. PR c++/120644 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Remove workaround. Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL correct, maybe we should always set it to NULL_TREE to make sure we only look at the inner type. The rest of the patch is OK. * module.cc (trees_in::is_matching_decl): Only compare types of inner decls. Clarify function return type deduction should only occur for non-TEMPLATE_DECL. gcc/testsuite/ChangeLog: * g++.dg/modules/auto-7.h: New test. * g++.dg/modules/auto-7_a.H: New test. * g++.dg/modules/auto-7_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/decl.cc | 6 -- gcc/cp/module.cc| 5 +++-- gcc/testsuite/g++.dg/modules/auto-7.h | 12 gcc/testsuite/g++.dg/modules/auto-7_a.H | 5 + gcc/testsuite/g++.dg/modules/auto-7_b.C | 5 + 5 files changed, 25 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index febdc89f89d..150d26079a8 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -8921,12 +8921,6 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, /* Now that we have a type, try these again. */ layout_decl (decl, 0); cp_apply_type_quals_to_decl (cp_type_quals (type), decl); - - /* Update the type of the corresponding TEMPLATE_DECL to match. */ - if (DECL_LANG_SPECIFIC (decl) - && DECL_TEMPLATE_INFO (decl) - && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl) - TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type; } if (ensure_literal_type_for_constexpr_object (decl) == error_mark_node) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index c99988da05b..606eac77db9 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -12193,7 +12193,8 @@ trees_in::is_matching_decl (tree existing, tree decl, bool is_typedef) { dump (dumper::MERGE) && dump ("Propagating deduced return type to %N", existing); - FNDECL_USED_AUTO (e_inner) = true; + gcc_checking_assert (existing == e_inner); + FNDECL_USED_AUTO (existing) = true; DECL_SAVED_AUTO_RETURN_TYPE (existing) = TREE_TYPE (e_type); TREE_TYPE (existing) = change_return_type (TREE_TYPE (d_type), e_type); } @@ -12248,7 +12249,7 @@ trees_in::is_matching_decl (tree existing, tree decl, bool is_typedef) /* Using cp_tree_equal because we can meet TYPE_ARGUMENT_PACKs here. I suspect the entities that directly do that are things that shouldn't go to duplicate_decls (FIELD_DECLs etc). 
*/ - else if (!cp_tree_equal (TREE_TYPE (decl), TREE_TYPE (existing))) + else if (!cp_tree_equal (TREE_TYPE (d_inner), TREE_TYPE (e_inner))) { mismatch_msg = G_("conflicting type for imported declaration %#qD"); mismatch: diff --git a/gcc/testsuite/g++.dg/modules/auto-7.h b/gcc/testsuite/g++.dg/modules/auto-7.h new file mode 100644 index 000..324b60cfa0a --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/auto-7.h @@ -0,0 +1,12 @@ +// PR c++/120644 + +enum class E { E0, E1 }; + +template +constexpr auto fmt_kind = E::E0; + +template +class opt{}; + +template +constexpr auto fmt_kind> = E::E1; diff --git a/gcc/testsuite/g++.dg/modules/auto-7_a.H b/gcc/testsuite/g++.dg/modules/auto-7_a.H new file mode 100644 index 000..40cb0f886c0 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/auto-7_a.H @@ -0,0 +1,5 @@ +// PR c++/120644 +// { dg-additional-options "-fmodule-header" } +// { dg-module-cmi {} } + +#include "auto-7.h" diff --git a/gcc/testsuite/g++.dg/modules/auto-7_b.C b/gcc/testsuite/g++.dg/modules/auto-7_b.C new file mode 100644 index 000..c6ad37fd828 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/auto-7_b.C @@ -0,0 +1,5 @@ +// PR c++/120644 +// { dg-addition
Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.
Hi Andre, All six patches require git apply --whitespace=fix --ignore-space-change < ~/prs/Shared_Memory/pr88076_v1_x.patch to apply. The build fails with: Makefile:3848: caf/.deps/caf_error.Plo: No such file or directory make[2]: *** No rule to make target 'caf/.deps/caf_error.Plo'. Stop. make[2]: Leaving directory '/home/pault/gitsources/build/x86_64-pc-linux-gnu/libgfortran' make[1]: *** [Makefile:16529: install-target-libgfortran] Error 2 make[1]: Leaving directory '/home/pault/gitsources/build' make: *** [Makefile:2668: install] Error 2 I am afraid that I have timed out for the next two weeks - sorry. Regards Paul On Tue, 24 Jun 2025 at 14:10, Andre Vehreschild wrote: > Hi all, > > this series of patches (six in total) adds a new coarray backend library to > libgfortran. The library uses shared memory and processes to implement > running multiple images on the same node. The work is based on work > started by > Thomas and Nicolas Koenig. No changes to the gfortran compile part are > required > for this. > > Unfortunately I found some defects in the gfortran compiler, that needed > to be > fixed. These are the first four tiny patches. The fifth patch then adds the > library and sixth patches the testcases in > gcc/testsuite/gfortran.dg/coarray to > also run (and pass) when linked against caf_shmem. > > The development has been done on x86_64-pc-linux-gnu / Fedora 41. I am > curious > to learn which fixes will be needed for other platforms. > > This will be the last big patch that was funded by the STF/STA. My funding > has > run out and I will only be available for a few days before a new project > will > consume my attention. Therefore please bring any deficiencies to my > attention > as soon as possible. > > I have done some performance measurement against OpenCoarrays measuring > coarray_icarr from https://github.com/gutmann/coarray_icar . The figures > are: > > OpenCoarrays (mpich4-backend): 165.578s (real 2m59,947s) > -- > caf_shmem (16-trunk): 61.489s (real 1m3,681s) > > The first number is the "Model run time:" as reported by the program. In > the > parentheses the real run time as reported by the bash command `time` is > given. > > Both are done using a debug build of coarray_icar on an Intel Core > i7-5775C CPU > @ 3.30GHz having 24GB, and running Fedora Linux 41 with all recent patches. > > Regards, > Andre > -- > Andre Vehreschild * Email: vehre ad gmx dot de >
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On 6/24/25 2:25 PM, Jakub Jelinek wrote: On Tue, Jun 24, 2025 at 11:57:01AM -0400, Jason Merrill wrote: The two other errors on the testcase are expectedly gone with C++26, but the last one remains. The problem is that when parsing nm3.a inside of mutable_subobjects()::A::f() build_class_member_access_expr calls build_base_path which calls cp_build_addr_expr and that makes nm3 odr-used. I must say I have no idea whether nm3 ought to be odr-used or not just because of nm3.a use and if not, how that should be changed. build_simple_base_path is how we avoid this odr-use; seems we also need to use it early in the case of (v_binfo && !virtual_access). --- gcc/cp/class.cc.jj 2025-06-18 17:24:03.973867379 +0200 +++ gcc/cp/class.cc 2025-06-24 20:11:21.728169508 +0200 @@ -349,7 +349,11 @@ build_base_path (enum tree_code code, /* For a non-pointer simple base reference, express it as a COMPONENT_REF without taking its address (and so causing lambda capture, 91933). */ - if (code == PLUS_EXPR && !v_binfo && !want_pointer && !has_empty && !uneval) + if (code == PLUS_EXPR + && !want_pointer + && !has_empty + && !uneval + && (!v_binfo || resolves_to_fixed_type_p (expr) > 0)) return build_simple_base_path (expr, binfo); if (!want_pointer) seems to fix that and doesn't regress anything else in make check-g++. I guess it can be handled separately from the rest. Or do you prefer some other way to avoid calling resolves_to_fixed_type_p twice in some cases? I think we could move the initialization of the fixed_type_p and virtual_access variables up, they don't need to be after cp_build_addr_expr. works at runtime. In the patch I've adjusted the function comment of cxx_eval_dynamic_cast_fn because with virtual bases I believe hint -1 might be possible, though I'm afraid I don't Yes, we would get -1 for dynamic_cast from B to A. The routine then has some /* Given dynamic_cast(v), [expr.dynamic.cast] If C is the class type to which T points or refers, the runtime check logically executes as follows: If, in the most derived object pointed (referred) to by v, v points (refers) to a public base class subobject of a C object, and if only one object of type C is derived from the subobject pointed (referred) to by v the result points (refers) to that C object. In this case, HINT >= 0 or -3. */ if (hint >= 0 || hint == -3) Should that include the hint == -1 case too (so effectively if (hint != -2) or is -1 not relevant to that block. I think -1 doesn't distinguish between single or multiple virtual derivation, so handling -1 in that block might mean succeeding for a multiple derivation case where it ought to fail. know enough about dynamic_cast and cxx_eval_dynamic_cast_fn to figure out what needs to change there. It is hint -2 that fails, not hint -1. Yes, this is a -2 case because C does not derive from B. How does cxx_eval_dynamic_cast_fn fail in this case? From looking at the function it seems like it ought to work. >>... Actually, I see the reason now. get_component_path is called with a.D.2692, A, NULL where a.D.2692 has B type. And the reason why it fails is 2538 /* We need to check that the component we're accessing is in fact 2539 accessible. 
*/ 2540 if (TREE_PRIVATE (TREE_OPERAND (path, 1)) 2541 || TREE_PROTECTED (TREE_OPERAND (path, 1))) 2542return error_mark_node; The D.2692 FIELD_DECL has been created by build_base_field_1 called from build_base_field from layout_virtual_bases and that one calls it with 6753 if (!BINFO_PRIMARY_P (vbase)) 6754{ 6755 /* This virtual base is not a primary base of any class in the 6756 hierarchy, so we have to add space for it. */ 6757 next_field = build_base_field (rli, vbase, 6758 access_private_node, 6759 offsets, next_field); 6760} access_private_node forces TREE_PRIVATE on the FIELD_DECL and so it doesn't reflect whether the base in question was private/protected or public. struct A has also D.2689 FIELD_DECL with C type and that one is the primary base, neither TREE_PRIVATE nor TREE_PROTECTED. So, shall I e.g. for the if (TREE_PRIVATE case if the outer type has CLASSTYPE_VBASECLASSES walk the for (vbase = TYPE_BINFO (t); vbase; vbase = TREE_CHAIN (vbase)) if (BINFO_VIRTUAL_P (vbase) && !BINFO_PRIMARY_P (vbase)) and in that case try to compare byte_position (TREE_OPERAND (path, 1)) against BINFO_OFFSET (vbase) and if it matches (plus perhaps some type check?) then decide based on BINFO_BASE_ACCESS or something like that whether it was a private/protected vs. public virtual base? It seems simpler to pass an accurate access to the build_base_field above. At least whether the whole BINFO_INHERITANCE_CHAIN is
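A hedged guess at the shape of the hierarchy under discussion (the class names follow the thread, everything else is illustrative): B is a non-primary virtual base, C the primary base, and the B-to-C conversion is a cross-cast, which is why the hint is -2.

struct B { virtual ~B () {} };
struct C { virtual ~C () {} };
struct A : C, virtual B {};   /* C is the primary base, B a non-primary virtual base */

C *
cross_cast (B *b)
{
  /* C does not derive from B, so the src-to-dst hint is -2; the cast still
     succeeds whenever the most derived object is an A.  */
  return dynamic_cast<C *> (b);
}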
Re: [PATCH] RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate [PR119100]
On 6/18/25 7:55 AM, Paul-Antoine Arras wrote: On 17/06/2025 18:19, Jeff Law wrote: On 6/17/25 7:15 AM, Paul-Antoine Arras wrote: This is part of my vector-scalar FMA series. See: https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679513.html https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685624.html The attached patch handles vfmacc and vfmsac. However I ran into an issue while writing testcases. Since the add and acc variants share the same RTL patterns, the only difference being operand constraints, RA is free to pick either of the two, which makes it difficult to test in a straightforward and reliable way. I managed to get appropriate testcases in vf-3-* and vf-4-* by making the loop body longer and register pressure higher. Now I haven't found a simple equivalent either for vf-1-* and vf-2-* or for the run tests. The attempt shown in the current patch (e.g. vf-1-f64) exercises the right pattern (*pred_mul_addrvvm4df_scalar_undef) but the wrong alternative (got 2 while wanting to test 3). Any suggestion would be appreciated! Peter B had a good one. Use -ffixed- to take registers out of the available set to force register pressure to increase. It won't give perfect control, but is probably easier to manage that trying to do so by adding code into the testcase. Thanks Jeff. After sleeping on it, I ended up returning the multiplicand in the testcase so that the addend got overwritten instead, which was enough to reliably exercise the acc variant. The patch, updated this way, is in the attachment, ready to review. Is it OK for trunk? Robin ACK'd earlier today and I've pushed it to the trunk. Thanks! jeff
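For readers without the series at hand, the testcase shape being described is roughly the following (a hypothetical sketch, not the committed vf-* tests): the addend is the value that gets overwritten, which is the vfmacc/vfmsac form, and the committed tests additionally keep the multiplicand live, e.g. by returning it, so the register allocator does not fall back to the madd variant.

void
scalar_fma_acc (double *b, const double *a, double x, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] += a[i] * x;   /* the addend b[i] is the operand that is overwritten */
}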
[PATCH] [PR modula2/120761] GM2_FOR_BUILD is not substituted in the toplevel Makefile
Ok for master? Bootstrapped on x86_64 gnu/linux regards, Gaius --- [PR modula2/120761] GM2_FOR_BUILD is not substituted in the toplevel Makefile This patch removes the unused GM2_FOR_BUILD in the toplevel Makefile.tpl and removes any reference to GM2_FOR_BUILD in libgm2. The only reference found in libgm2 was a debugging echo. ChangeLog: PR modula2/120761 * Makefile.in: Rebuilt. * Makefile.tpl (GM2_FOR_BUILD): Remove. libgm2/ChangeLog: PR modula2/120761 * libm2min/Makefile.am (SYSTEM.def): Remove GM2_FOR_BUILD from the debugging echo. * libm2min/Makefile.in: Rebuilt. diff --git a/Makefile.in b/Makefile.in index 12d4395d8e2..90ea215ec69 100644 --- a/Makefile.in +++ b/Makefile.in @@ -165,8 +165,6 @@ BUILD_EXPORTS = \ GOCFLAGS="$(GOCFLAGS_FOR_BUILD)"; export GOCFLAGS; \ GDC="$(GDC_FOR_BUILD)"; export GDC; \ GDCFLAGS="$(GDCFLAGS_FOR_BUILD)"; export GDCFLAGS; \ - GM2="$(GM2_FOR_BUILD)"; export GM2; \ - GM2FLAGS="$(GM2FLAGS_FOR_BUILD)"; export GM2FLAGS; \ DLLTOOL="$(DLLTOOL_FOR_BUILD)"; export DLLTOOL; \ DSYMUTIL="$(DSYMUTIL_FOR_BUILD)"; export DSYMUTIL; \ LD="$(LD_FOR_BUILD)"; export LD; \ @@ -382,7 +380,6 @@ DSYMUTIL_FOR_BUILD = @DSYMUTIL_FOR_BUILD@ GFORTRAN_FOR_BUILD = @GFORTRAN_FOR_BUILD@ GOC_FOR_BUILD = @GOC_FOR_BUILD@ GDC_FOR_BUILD = @GDC_FOR_BUILD@ -GM2_FOR_BUILD = @GM2_FOR_BUILD@ GNATMAKE_FOR_BUILD = @GNATMAKE_FOR_BUILD@ LDFLAGS_FOR_BUILD = @LDFLAGS_FOR_BUILD@ LD_FOR_BUILD = @LD_FOR_BUILD@ @@ -1009,7 +1006,6 @@ POSTSTAGE1_FLAGS_TO_PASS = \ CC="$${CC}" CC_FOR_BUILD="$${CC_FOR_BUILD}" \ CXX="$${CXX}" CXX_FOR_BUILD="$${CXX_FOR_BUILD}" \ GDC="$${GDC}" GDC_FOR_BUILD="$${GDC_FOR_BUILD}" \ - GM2="$${GM2}" GM2_FOR_BUILD="$${GM2_FOR_BUILD}" \ GNATBIND="$${GNATBIND}" \ LDFLAGS="$${LDFLAGS}" \ HOST_LIBS="$${HOST_LIBS}" \ diff --git a/Makefile.tpl b/Makefile.tpl index ddcca558913..99b97ff8225 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -168,8 +168,6 @@ BUILD_EXPORTS = \ GOCFLAGS="$(GOCFLAGS_FOR_BUILD)"; export GOCFLAGS; \ GDC="$(GDC_FOR_BUILD)"; export GDC; \ GDCFLAGS="$(GDCFLAGS_FOR_BUILD)"; export GDCFLAGS; \ - GM2="$(GM2_FOR_BUILD)"; export GM2; \ - GM2FLAGS="$(GM2FLAGS_FOR_BUILD)"; export GM2FLAGS; \ DLLTOOL="$(DLLTOOL_FOR_BUILD)"; export DLLTOOL; \ DSYMUTIL="$(DSYMUTIL_FOR_BUILD)"; export DSYMUTIL; \ LD="$(LD_FOR_BUILD)"; export LD; \ @@ -385,7 +383,6 @@ DSYMUTIL_FOR_BUILD = @DSYMUTIL_FOR_BUILD@ GFORTRAN_FOR_BUILD = @GFORTRAN_FOR_BUILD@ GOC_FOR_BUILD = @GOC_FOR_BUILD@ GDC_FOR_BUILD = @GDC_FOR_BUILD@ -GM2_FOR_BUILD = @GM2_FOR_BUILD@ GNATMAKE_FOR_BUILD = @GNATMAKE_FOR_BUILD@ LDFLAGS_FOR_BUILD = @LDFLAGS_FOR_BUILD@ LD_FOR_BUILD = @LD_FOR_BUILD@ @@ -765,7 +762,6 @@ POSTSTAGE1_FLAGS_TO_PASS = \ CC="$${CC}" CC_FOR_BUILD="$${CC_FOR_BUILD}" \ CXX="$${CXX}" CXX_FOR_BUILD="$${CXX_FOR_BUILD}" \ GDC="$${GDC}" GDC_FOR_BUILD="$${GDC_FOR_BUILD}" \ - GM2="$${GM2}" GM2_FOR_BUILD="$${GM2_FOR_BUILD}" \ GNATBIND="$${GNATBIND}" \ LDFLAGS="$${LDFLAGS}" \ HOST_LIBS="$${HOST_LIBS}" \ diff --git a/libgm2/libm2min/Makefile.am b/libgm2/libm2min/Makefile.am index b95b5dd3ea5..0947cd09174 100644 --- a/libgm2/libm2min/Makefile.am +++ b/libgm2/libm2min/Makefile.am @@ -132,7 +132,7 @@ libc.o: $(GM2_SRC)/gm2-libs-min/libc.c SYSTEM.def: Makefile - echo "CC = $(CC_FOR_BUILD) CC_FOR_TARGET = $(CC_FOR_TARGET) GM2 = $(GM2) GM2_FOR_TARGET = $(GM2_FOR_TARGET) GM2_FOR_BUILD = $(GM2_FOR_BUILD)" + echo "CC = $(CC_FOR_BUILD) CC_FOR_TARGET = $(CC_FOR_TARGET) GM2 = $(GM2) GM2_FOR_TARGET = $(GM2_FOR_TARGET) " bash $(GM2_SRC)/tools-src/makeSystem -fpim \ $(GM2_SRC)/gm2-libs-min/SYSTEM.def \ $(GM2_SRC)/gm2-libs-min/SYSTEM.mod \ diff --git 
a/libgm2/libm2min/Makefile.in b/libgm2/libm2min/Makefile.in index ce0efff26ba..288f76d064a 100644 --- a/libgm2/libm2min/Makefile.in +++ b/libgm2/libm2min/Makefile.in @@ -765,7 +765,7 @@ uninstall-am: uninstall-toolexeclibLTLIBRARIES libc.o: $(GM2_SRC)/gm2-libs-min/libc.c SYSTEM.def: Makefile - echo "CC = $(CC_FOR_BUILD) CC_FOR_TARGET = $(CC_FOR_TARGET) GM2 = $(GM2) GM2_FOR_TARGET = $(GM2_FOR_TARGET) GM2_FOR_BUILD = $(GM2_FOR_BUILD)" + echo "CC = $(CC_FOR_BUILD) CC_FOR_TARGET = $(CC_FOR_TARGET) GM2 = $(GM2) GM2_FOR_TARGET = $(GM2_FOR_TARGET) " bash $(GM2_SRC)/tools-src/makeSystem -fpim \ $(GM2_SRC)/gm2-libs-min/SYSTEM.def \ $(GM2_SRC)/gm2-libs-min/SYSTEM.mod \
[PATCH] gcc: remove atan from edom_only_function
Hi Richard, > OK. > > Thanks, > Richard. Thanks for the quick review! Could you please help me merge this patch? I'll post the rest of the original patch soon. Thanks, Yuao
Re: [PATCH] c++/modules: Only compare types of DECL_TEMPLATE_RESULTs [PR120644]
On Tue, 24 Jun 2025, Jason Merrill wrote: > On 6/23/25 5:41 PM, Nathaniel Shead wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15? > > > > -- >8 -- > > > > We were erroring because the TEMPLATE_DECL of the existing partial > > specialisation has an undeduced return type, but the imported > > declaration did not. > > > > The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, > > where modules streaming code assumes that a TEMPLATE_DECL and its > > DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit > > fixed the issue by ensuring that when the type of a variable is deduced > > the TEMPLATE_DECL is updated as well, but this missed handling partial > > specialisations. > > > > However, I don't think we actually care about that, since it seems that > > only the type of the inner decl actually matters in practice. Instead, > > this patch handles the issue on the modules side when deduping a > > streamed decl, by only comparing the inner type. > > > > PR c++/120644 > > > > gcc/cp/ChangeLog: > > > > * decl.cc (cp_finish_decl): Remove workaround. > > Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL correct, > maybe we should always set it to NULL_TREE to make sure we only look at the > inner type. FWIW cp_finish_decl can get at the TEMPLATE_DECL of a VAR_DECL corresponding to a partial specialization via TI_TEMPLATE (TI_PARTIAL_INFO (DECL_TEMPLATE_INFO (decl))) if we do want to end up keeping the two TREE_TYPEs in sync. > > The rest of the patch is OK. > > > * module.cc (trees_in::is_matching_decl): Only compare types of > > inner decls. Clarify function return type deduction should only > > occur for non-TEMPLATE_DECL. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/modules/auto-7.h: New test. > > * g++.dg/modules/auto-7_a.H: New test. > > * g++.dg/modules/auto-7_b.C: New test. > > > > Signed-off-by: Nathaniel Shead > > --- > > gcc/cp/decl.cc | 6 -- > > gcc/cp/module.cc| 5 +++-- > > gcc/testsuite/g++.dg/modules/auto-7.h | 12 > > gcc/testsuite/g++.dg/modules/auto-7_a.H | 5 + > > gcc/testsuite/g++.dg/modules/auto-7_b.C | 5 + > > 5 files changed, 25 insertions(+), 8 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h > > create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H > > create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C > > > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc > > index febdc89f89d..150d26079a8 100644 > > --- a/gcc/cp/decl.cc > > +++ b/gcc/cp/decl.cc > > @@ -8921,12 +8921,6 @@ cp_finish_decl (tree decl, tree init, bool > > init_const_expr_p, > > /* Now that we have a type, try these again. */ > > layout_decl (decl, 0); > > cp_apply_type_quals_to_decl (cp_type_quals (type), decl); > > - > > - /* Update the type of the corresponding TEMPLATE_DECL to match. 
*/ > > - if (DECL_LANG_SPECIFIC (decl) > > - && DECL_TEMPLATE_INFO (decl) > > - && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl) > > - TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type; > > } > > if (ensure_literal_type_for_constexpr_object (decl) == > > error_mark_node) > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc > > index c99988da05b..606eac77db9 100644 > > --- a/gcc/cp/module.cc > > +++ b/gcc/cp/module.cc > > @@ -12193,7 +12193,8 @@ trees_in::is_matching_decl (tree existing, tree > > decl, bool is_typedef) > > { > > dump (dumper::MERGE) > > && dump ("Propagating deduced return type to %N", existing); > > - FNDECL_USED_AUTO (e_inner) = true; > > + gcc_checking_assert (existing == e_inner); > > + FNDECL_USED_AUTO (existing) = true; > > DECL_SAVED_AUTO_RETURN_TYPE (existing) = TREE_TYPE (e_type); > > TREE_TYPE (existing) = change_return_type (TREE_TYPE (d_type), > > e_type); > > } > > @@ -12248,7 +12249,7 @@ trees_in::is_matching_decl (tree existing, tree > > decl, bool is_typedef) > > /* Using cp_tree_equal because we can meet TYPE_ARGUMENT_PACKs > >here. I suspect the entities that directly do that are things > >that shouldn't go to duplicate_decls (FIELD_DECLs etc). */ > > - else if (!cp_tree_equal (TREE_TYPE (decl), TREE_TYPE (existing))) > > + else if (!cp_tree_equal (TREE_TYPE (d_inner), TREE_TYPE (e_inner))) > > { > > mismatch_msg = G_("conflicting type for imported declaration %#qD"); > > mismatch: > > diff --git a/gcc/testsuite/g++.dg/modules/auto-7.h > > b/gcc/testsuite/g++.dg/modules/auto-7.h > > new file mode 100644 > > index 000..324b60cfa0a > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/modules/auto-7.h > > @@ -0,0 +1,12 @@ > > +// PR c++/120644 > > + > > +enum class E { E0, E1 }; > > + > > +template > > +constexpr auto fmt_kind = E::E0; > > + > > +template > > +class opt{}; > > + > > +template
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On Tue, Jun 24, 2025 at 10:33:01PM +0200, Jakub Jelinek wrote: > On Tue, Jun 24, 2025 at 08:25:33PM +0200, Jakub Jelinek wrote: > > > > know enough about dynamic_cast and cxx_eval_dynamic_cast_fn > > > > to figure out what needs to change there. It is hint -2 that > > > > fails, not hint -1. > > > > > > Yes, this is a -2 case because C does not derive from B. > > > > > > How does cxx_eval_dynamic_cast_fn fail in this case? From looking at the > > > function it seems like it ought to work. > > > > I'll study it in detail tomorrow. > > Actually, I see the reason now. > get_component_path is called with > a.D.2692, A, NULL > where a.D.2692 has B type. > And the reason why it fails is > 2538/* We need to check that the component we're accessing is in > fact > 2539 accessible. */ > 2540if (TREE_PRIVATE (TREE_OPERAND (path, 1)) > 2541|| TREE_PROTECTED (TREE_OPERAND (path, 1))) > 2542 return error_mark_node; > The D.2692 FIELD_DECL has been created by build_base_field_1 > called from build_base_field from layout_virtual_bases and that one calls it > with > 6753if (!BINFO_PRIMARY_P (vbase)) > 6754 { > 6755/* This virtual base is not a primary base of any class in the > 6756 hierarchy, so we have to add space for it. */ > 6757next_field = build_base_field (rli, vbase, > 6758 access_private_node, > 6759 offsets, next_field); > 6760 } > access_private_node forces TREE_PRIVATE on the FIELD_DECL and so it doesn't > reflect whether the base in question was private/protected or public. > struct A has also D.2689 FIELD_DECL with C type and that one is the primary > base, neither TREE_PRIVATE nor TREE_PROTECTED. So, shall I e.g. for the if (TREE_PRIVATE case if the outer type has CLASSTYPE_VBASECLASSES walk the for (vbase = TYPE_BINFO (t); vbase; vbase = TREE_CHAIN (vbase)) if (BINFO_VIRTUAL_P (vbase) && !BINFO_PRIMARY_P (vbase)) and in that case try to compare byte_position (TREE_OPERAND (path, 1)) against BINFO_OFFSET (vbase) and if it matches (plus perhaps some type check?) then decide based on BINFO_BASE_ACCESS or something like that whether it was a private/protected vs. public virtual base? Jakub
Re: [PATCH] s390: Optimize fmin/fmax.
On Mon, Jun 23, 2025 at 09:51:13AM +0200, Juergen Christ wrote: > On VXE targets, we can directly use the fp min/max instruction instead of > calling into libm for fmin/fmax etc. > > Provide fmin/fmax versions also for vectors even though it cannot be > called directly. This will be exploited with a follow-up patch when > reductions are introduced. This looks very similar to vfmin / vfmax. Couldn't we merge those by using appropriate mode iterators? The expander for fmin / fmax could set the mask operand. > > Bootstrapped and regtested on s390. > > gcc/ChangeLog: > > * config/s390/s390.md (VFT_BFP): New iterator. > * config/s390/vector.md (fmax3): Implement. > (fmin3): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/s390/fminmax-1.c: New test. > * gcc.target/s390/fminmax-2.c: New test. > > Signed-off-by: Juergen Christ > --- > gcc/config/s390/s390.md | 3 + > gcc/config/s390/vector.md | 24 +++ > gcc/testsuite/gcc.target/s390/fminmax-1.c | 77 +++ > gcc/testsuite/gcc.target/s390/fminmax-2.c | 29 + > 4 files changed, 133 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/s390/fminmax-1.c > create mode 100644 gcc/testsuite/gcc.target/s390/fminmax-2.c > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index 97a4bdf96b2d..c4e836dd4af4 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -256,6 +256,9 @@ > > UNSPEC_NNPA_VCFN_V8HI > UNSPEC_NNPA_VCNF_V8HI > + > + UNSPEC_FMAX > + UNSPEC_FMIN See above UNSPEC_VEC_VFMIN / UNSPEC_VEC_VFMAX. > ]) > > ;; > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > index 6f4e1929eb80..668e7475ef21 100644 > --- a/gcc/config/s390/vector.md > +++ b/gcc/config/s390/vector.md > @@ -3576,3 +3576,27 @@ > ; vec_unpacks_float_lo > ; vec_unpacku_float_hi > ; vec_unpacku_float_lo > +(define_mode_iterator VFT_BFP [SF DF > + (V1SF "TARGET_VXE") (V2SF "TARGET_VXE") (V4SF "TARGET_VXE") > +V1DF V2DF > +(V1TF "TARGET_VXE") (TF "TARGET_VXE")]) > + Whitespace. > +; fmax > +(define_insn "fmax3" > + [(set (match_operand:VFT_BFP 0 "register_operand" "=f") > +(unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "f") > + (match_operand:VFT_BFP 2 "register_operand" > "f")] Whitespace. We should use v constraint here. > + UNSPEC_FMAX))] > + "TARGET_VXE" > + "fmaxb\t%v0,%v1,%v2,4" > + [(set_attr "op_type" "VRR")]) > + > +; fmin > +(define_insn "fmin3" > + [(set (match_operand:VFT_BFP 0 "register_operand" "=f") > +(unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "f") > + (match_operand:VFT_BFP 2 "register_operand" > "f")] > + UNSPEC_FMIN))] Ditto. > + "TARGET_VXE" > + "fminb\t%v0,%v1,%v2,4" > + [(set_attr "op_type" "VRR")]) > diff --git a/gcc/testsuite/gcc.target/s390/fminmax-1.c > b/gcc/testsuite/gcc.target/s390/fminmax-1.c > new file mode 100644 > index ..df10905f037a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/s390/fminmax-1.c > @@ -0,0 +1,77 @@ > +/* Check fmin/fmax expanders for scalars on VXE targets. 
*/ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=z14 -mzarch" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +/* > +** dofmaxl: > +** vl (%v.),0\(%r3\),3 > +** vl (%v.),0\(%r4\),3 > +** wfmaxxb (%v.),\1,\2,4 > +** vst \3,0\(%r2\),3 > +** br %r14 > +*/ > +long double > +dofmaxl (long double d1, long double d2) > +{ > + return __builtin_fmaxl (d1, d2); > +} > + > +/* > +** dofminl: > +** vl (%v.),0\(%r3\),3 > +** vl (%v.),0\(%r4\),3 > +** wfminxb (%v.),\1,\2,4 > +** vst \3,0\(%r2\),3 > +** br %r14 > +*/ > +long double > +dofminl (long double d1, long double d2) > +{ > + return __builtin_fminl (d1, d2); > +} > + > +/* > +** dofmax: > +** wfmaxdb %v0,%v0,%v2,4 > +** br %r14 > +*/ > +double > +dofmax (double d1, double d2) > +{ > + return __builtin_fmax (d1, d2); > +} > + > +/* > +** dofmin: > +** wfmindb %v0,%v0,%v2,4 > +** br %r14 > +*/ > +double > +dofmin (double d1, double d2) > +{ > + return __builtin_fmin (d1, d2); > +} > + > +/* > +** dofmaxf: > +** wfmaxsb %v0,%v0,%v2,4 > +** br %r14 > +*/ > +float > +dofmaxf (float f1, float f2) > +{ > + return __builtin_fmaxf (f1, f2); > +} > + > +/* > +** dofminf: > +** wfminsb %v0,%v0,%v2,4 > +** br %r14 > +*/ > +float > +dofminf (float f1, float f2) > +{ > + return __builtin_fminf (f1, f2); > +} > diff --git a/gcc/testsuite/gcc.target/s390/fminmax-2.c > b/gcc/testsuite/gcc.target/s390/fminmax-2.c > new file mode 100644 > index ..ea37a0a821de > --- /dev/null > +++ b/gcc/testsuite/gcc.target/s390/fminmax-2.c > @@ -0,0 +1,29 @@ > +/* Check fmin/fmax expa
Re: [PATCH v6 2/3] Use the counted_by attribute of pointers in builtin-object-size.
On 2025-06-24 03:26, Richard Biener wrote: + /* Handle the following stmt #2 to propagate the size from the + stmt #1 to #3: + 1 _1 = .ACCESS_WITH_SIZE (_3, _4, 1, 0, -1, 0B); + 2 _5 = *_1; + 3 _6 = __builtin_dynamic_object_size (_5, 1); +*/ + else if (TREE_CODE (rhs) == MEM_REF +&& POINTER_TYPE_P (TREE_TYPE (rhs)) +&& TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME) && integer_zerop (TREE_OPERAND (rhs, 1)) ? Ahh yes, thanks for spotting that. Sid
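For context, a minimal sketch of the kind of source the propagation above targets. The pointer spelling of counted_by is an assumption here, modelled on the existing flexible-array-member form, since adding that pointer support is exactly what this series does:

```c
/* Illustrative sketch only; the counted_by-on-pointer spelling is assumed
   from the flexible-array-member form of the attribute.  */
#include <stddef.h>

struct vec
{
  int n;
  int *data __attribute__ ((counted_by (n)));
};

size_t
query (struct vec *v)
{
  /* The load of v->data is the MEM_REF (stmt #2 above) through which the
     size recorded by .ACCESS_WITH_SIZE has to be propagated so that
     __builtin_dynamic_object_size can report n * sizeof (int).  */
  return __builtin_dynamic_object_size (v->data, 1);
}
```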
Re: [PATCH] gcc: remove atan from edom_only_function
On Tue, Jun 24, 2025 at 11:31 AM Yuao Ma wrote: > > Hi Richard, > > > OK. > > > > Thanks, > > Richard. > > Thanks for the quick review! Could you please help me merge this patch? I'll > post the rest of the original patch soon. I pushed it. Richard. > Thanks, > Yuao >
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
On Tue, Jun 24, 2025 at 1:53 PM Alexander Monakov wrote: > > > > On Tue, 24 Jun 2025, Richard Biener wrote: > > > On Tue, Jun 24, 2025 at 1:18 PM Alexander Monakov > > wrote: > > > > > > > > On Fri, May 23, 2025 at 2:31 PM Alexander Monakov > > > > > wrote: > > > > > > > > > > > > In PR 105965 we accepted a request to form FMA instructions when the > > > > > > source code is using a narrow generic vector that contains just one > > > > > > element, corresponding to V1SF or V1DF mode, while the backend does > > > > > > not > > > > > > expand fma patterns for such modes. > > > > > > > > > > > > For this to work under -ffp-contract=on, we either need to modify > > > > > > backends, or emulate such degenerate-vector FMA via scalar FMA in > > > > > > tree-vect-generic. Do the latter. > > > > > > > > > > Can you instead apply the lowering during gimplification? That is > > > > > because > > > > > having an unsupported internal-function in the IL the user could not > > > > > have > > > > > emitted directly is somewhat bad. I thought the vector lowering could > > > > > be generalized for more single-argument internal functions but then no > > > > > such unsupported calls should exist in the first place. > > > > > > > > Sure, like below? Not fully tested yet. > > > > > > Ping — now bootstrapped and regtested. > > > > LGTM. > > Thanks! Any thoughts on the other patch in the thread, about flipping > -ffp-contract from =fast to =on? I can't find this mail, not in spam either, but I'm OK with such change if it comes with test coverage. Richard. > > Alexander
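For reference, the degenerate-vector shape under discussion looks roughly like the sketch below (not the PR 105965 testcase itself): a one-element generic vector, i.e. V1DFmode, whose multiply-add can now be contracted to a scalar fused multiply-add under -ffp-contract=on even though targets typically provide no fma pattern for that mode.

```c
/* Single-element generic vector (V1DFmode); illustrative only.  */
typedef double v1df __attribute__ ((vector_size (sizeof (double))));

v1df
muladd (v1df a, v1df b, v1df c)
{
  return a * b + c;   /* candidate for contraction to a scalar fma */
}
```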
[Patch, Fortran, Coarray, PR88076, v1] 1/6 Add a shared memory multi process coarray library.
Hi all, this small patch unifies handling of the optional team argument to failed_/stopped_images(). I did not find a ticket for this, but stumbled over it while implementing caf_shmem. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen Tel.: +49 241 9291018 * Email: ve...@gmx.de From 4fb21b466973b66e705de3aaca0dd9990960adc3 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Fri, 25 Apr 2025 14:37:47 +0200 Subject: [PATCH 1/6] Fortran: Unify check of teams parameter in failed/stopped_images(). gcc/fortran/ChangeLog: * check.cc (gfc_check_failed_or_stopped_images): Support teams argument and check for incorrect type. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/failed_images_1.f08: Adapt check of error message. * gfortran.dg/coarray/stopped_images_1.f08: Same. --- gcc/fortran/check.cc | 9 ++--- gcc/testsuite/gfortran.dg/coarray/failed_images_1.f08 | 2 +- gcc/testsuite/gfortran.dg/coarray/stopped_images_1.f08 | 2 +- 3 files changed, 4 insertions(+), 9 deletions(-) diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc index 838d523f7c4..a4040cae53a 100644 --- a/gcc/fortran/check.cc +++ b/gcc/fortran/check.cc @@ -1878,13 +1878,8 @@ gfc_check_f_c_string (gfc_expr *string, gfc_expr *asis) bool gfc_check_failed_or_stopped_images (gfc_expr *team, gfc_expr *kind) { - if (team) -{ - gfc_error ("%qs argument of %qs intrinsic at %L not yet supported", - gfc_current_intrinsic_arg[0]->name, gfc_current_intrinsic, - &team->where); - return false; -} + if (team && (!scalar_check (team, 0) || !team_type_check (team, 0))) +return false; if (kind) { diff --git a/gcc/testsuite/gfortran.dg/coarray/failed_images_1.f08 b/gcc/testsuite/gfortran.dg/coarray/failed_images_1.f08 index 4898dd8a7a2..34ae131d15f 100644 --- a/gcc/testsuite/gfortran.dg/coarray/failed_images_1.f08 +++ b/gcc/testsuite/gfortran.dg/coarray/failed_images_1.f08 @@ -8,7 +8,7 @@ program test_failed_images_1 integer :: i fi = failed_images() ! OK - fi = failed_images(TEAM=1) ! { dg-error "'team' argument of 'failed_images' intrinsic at \\(1\\) not yet supported" } + fi = failed_images(TEAM=1) ! { dg-error "'team' argument of 'failed_images' intrinsic at \\(1\\) shall be of type 'team_type' from the intrinsic module 'ISO_FORTRAN_ENV'" } fi = failed_images(KIND=1) ! OK fi = failed_images(KIND=4) ! OK fi = failed_images(KIND=0) ! { dg-error "'kind' argument of 'failed_images' intrinsic at \\\(1\\\) must be positive" } diff --git a/gcc/testsuite/gfortran.dg/coarray/stopped_images_1.f08 b/gcc/testsuite/gfortran.dg/coarray/stopped_images_1.f08 index 403de585b9a..7658e6bb6bb 100644 --- a/gcc/testsuite/gfortran.dg/coarray/stopped_images_1.f08 +++ b/gcc/testsuite/gfortran.dg/coarray/stopped_images_1.f08 @@ -8,7 +8,7 @@ program test_stopped_images_1 integer :: i gi = stopped_images() ! OK - gi = stopped_images(TEAM=1) ! { dg-error "'team' argument of 'stopped_images' intrinsic at \\(1\\) not yet supported" } + gi = stopped_images(TEAM=1) ! { dg-error "'team' argument of 'stopped_images' intrinsic at \\(1\\) shall be of type 'team_type' from the intrinsic module 'ISO_FORTRAN_ENV'" } gi = stopped_images(KIND=1) ! OK gi = stopped_images(KIND=4) ! OK gi = stopped_images(KIND=0) ! { dg-error "'kind' argument of 'stopped_images' intrinsic at \\\(1\\\) must be positive" } -- 2.49.0
[PATCH] diagnostic: fix for older version of GCC
Having both an enum and a variable with the same name triggers an error with gcc 5. ChangeLog: * c/gcc/diagnostic-state-to-dot.cc (get_color_for_dynalloc_state): Rename argument dynalloc_state to dynalloc_st. (add_title_tr): Rename argument style to styl. (on_xml_node): Rename local variable dynalloc_state to dynalloc_st. --- Bootstrapped on x86_64-linux using GCC 5.5.0. Ok for master? gcc/diagnostic-state-to-dot.cc | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/diagnostic-state-to-dot.cc b/gcc/diagnostic-state-to-dot.cc index b6d7ec5a082b..ddae83b85cd2 100644 --- a/gcc/diagnostic-state-to-dot.cc +++ b/gcc/diagnostic-state-to-dot.cc @@ -51,9 +51,9 @@ enum class dynalloc_state }; static const char * -get_color_for_dynalloc_state (enum dynalloc_state dynalloc_state) +get_color_for_dynalloc_state (enum dynalloc_state dynalloc_st) { - switch (dynalloc_state) + switch (dynalloc_st) { default: gcc_unreachable (); @@ -242,7 +242,7 @@ private: int num_columns, const xml::element &input_element, std::string heading, - enum style style, + enum style styl, enum dynalloc_state dynalloc_state) { xp.push_tag ("tr", true); @@ -258,7 +258,7 @@ private: color = "white"; } else - switch (style) + switch (styl) { default: gcc_unreachable (); @@ -323,12 +323,12 @@ private: else if (input_element->m_kind == "heap-buffer") { const char *extents = input_element->get_attr ("dynamic-extents"); - enum dynalloc_state dynalloc_state = get_dynalloc_state (*input_element); + enum dynalloc_state dynalloc_st = get_dynalloc_state (*input_element); if (auto region_id = input_element->get_attr ("region_id")) - m_region_id_to_dynalloc_state[region_id] = dynalloc_state; + m_region_id_to_dynalloc_state[region_id] = dynalloc_st; const char *type = input_element->get_attr ("type"); pretty_printer pp; - switch (dynalloc_state) + switch (dynalloc_st) { default: gcc_unreachable (); @@ -375,7 +375,7 @@ private: add_title_tr (id_of_node, xp, num_columns, *input_element, pp_formatted_text (&pp), style::h2, - dynalloc_state); + dynalloc_st); } else { -- 2.43.0
Re: [PATCH] Fortran: fix ICE in verify_gimple_in_seq with substrings [PR120743]
On Tue, Jun 24, 2025 at 09:00:46PM +0200, Harald Anlauf wrote: > > here's an obvious fix for a recent regression: substring offset > calculations used a wrong type that crashed in gimplification. > Andre basically OK'ed it in the PR, but here it is nevertheless. > > Regtested on x86_64-pc-linux-gnu. OK for mainline? > Yes. Thanks for the patch. -- Steve
[PATCH v1 0/2] middle-end: Enable masked load with non-constant offset
The function `vect_check_gather_scatter` requires the `base` of the load to be loop-invariant and the `off`set to be not loop-invariant. When faced with a scenario where `base` is not loop-invariant, instead of giving up immediately we can try swapping the `base` and `off`, if `off` is actually loop-invariant. Previously, it would only swap if `off` was the constant zero (and so trivially loop-invariant). This is too conservative: we can still perform the swap if `off` is a more complex but still loop-invariant expression, such as a variable defined outside of the loop. This patch allows loops like the function below to be vectorised, if the target has masked loads and sufficiently large vector registers (eg `-march=armv8-a+sve -msve-vector-bits=128`): ```c typedef struct Array { int elems[3]; } Array; int loop(Array **pp, int len, int idx) { int nRet = 0; for (int i = 0; i < len; i++) { Array *p = pp[i]; if (p) { nRet += p->elems[idx]; } } return nRet; } ``` Changelog: - v1: Initial patch Karl Meakin (2): AArch64: precommit test for masked load vectorisation. middle-end: Enable masked load with non-constant offset .../gcc.target/aarch64/sve/mask_load_2.c | 23 gcc/tree-vect-data-refs.cc| 26 --- 2 files changed, 34 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c -- 2.45.2
Re: [PATCH] c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]
On Tue, Jun 24, 2025 at 01:03:53PM +0200, Jakub Jelinek wrote: > Hi! > > The following patch implements the P3618R0 paper by tweaking pedwarn > condition, adjusting pedwarn wording, adjusting one testcase and adding 4 > new ones. The paper was voted in as DR, so it isn't guarded on C++ version. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2025-06-24 Jakub Jelinek > > PR c++/120773 > * decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching > main to the global module. Only pedwarn for current_lang_name > other than lang_name_cplusplus and adjust pedwarn wording. > > * g++.dg/parse/linkage5.C: Don't expect error on > extern "C++" int main ();. > * g++.dg/parse/linkage7.C: New test. > * g++.dg/parse/linkage8.C: New test. > * g++.dg/modules/main-2.C: New test. > * g++.dg/modules/main-3.C: New test. > > --- gcc/cp/decl.cc.jj 2025-06-19 08:55:04.408676724 +0200 > +++ gcc/cp/decl.cc2025-06-23 17:47:13.942011687 +0200 > @@ -11326,9 +11326,9 @@ grokfndecl (tree ctype, > "cannot declare %<::main%> to be %qs", "consteval"); >if (!publicp) > error_at (location, "cannot declare %<::main%> to be static"); > - if (current_lang_depth () != 0) > + if (current_lang_name != lang_name_cplusplus) > pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a" > - " linkage specification"); > + " linkage specification other than %<\"C++\"%>"); >if (module_attach_p ()) > error_at (location, "cannot attach %<::main%> to a named module"); Maybe it would be nice to add a note/fixit that users can now work around this error by marking main as 'extern "C++"'? But overall LGTM. >inlinep = 0; > --- gcc/testsuite/g++.dg/parse/linkage5.C.jj 2024-05-22 09:11:46.979234663 > +0200 > +++ gcc/testsuite/g++.dg/parse/linkage5.C 2025-06-23 18:00:38.067742494 > +0200 > @@ -1,5 +1,6 @@ > // { dg-do compile } > -// The main function shall not be declared with a linkage-specification. > +// The main function shall not be declared with a linkage-specification > +// other than "C++". > > extern "C" { >int main(); // { dg-error "linkage" } > @@ -9,6 +10,6 @@ namespace foo { >extern "C" int main(); // { dg-error "linkage" } > } > > -extern "C++" int main(); // { dg-error "linkage" } > +extern "C++" int main(); > > extern "C" struct S { int main(); }; // OK > --- gcc/testsuite/g++.dg/parse/linkage7.C.jj 2025-06-23 18:01:17.622237056 > +0200 > +++ gcc/testsuite/g++.dg/parse/linkage7.C 2025-06-23 18:01:32.385048426 > +0200 > @@ -0,0 +1,7 @@ > +// { dg-do compile } > +// The main function shall not be declared with a linkage-specification > +// other than "C++". > + > +extern "C++" { > + int main(); > +} > --- gcc/testsuite/g++.dg/parse/linkage8.C.jj 2025-06-23 18:01:39.830953283 > +0200 > +++ gcc/testsuite/g++.dg/parse/linkage8.C 2025-06-23 18:01:57.657725492 > +0200 > @@ -0,0 +1,5 @@ > +// { dg-do compile } > +// The main function shall not be declared with a linkage-specification > +// other than "C++". 
> + > +extern "C" int main(); // { dg-error "linkage" } > --- gcc/testsuite/g++.dg/modules/main-2.C.jj 2025-06-23 18:25:17.058941644 > +0200 > +++ gcc/testsuite/g++.dg/modules/main-2.C 2025-06-23 18:26:11.416253264 > +0200 > @@ -0,0 +1,4 @@ > +// { dg-additional-options "-fmodules" } > + > +export module M; > +extern "C++" int main() {} > --- gcc/testsuite/g++.dg/modules/main-3.C.jj 2025-06-23 18:26:20.393139580 > +0200 > +++ gcc/testsuite/g++.dg/modules/main-3.C 2025-06-23 18:26:33.190977509 > +0200 > @@ -0,0 +1,7 @@ > +// { dg-additional-options "-fmodules" } > + > +export module M; > +extern "C++" { > + int main() {} > +} > + > > Jakub >
[PATCH] RISC-V: Refactor the function bitmap_union_of_preds_with_entry
The current implementation of this function is somewhat difficult to understand, as it uses a direct break statement within the for loop, rendering the loop meaningless. Additionally, during the Coverity check on the for loop, a warning appeared: "unreachable: Since the loop increment ix++; is unreachable, the loop body will never execute more than once." Therefore, I have made some simple refactoring to address these issues. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): Refactor. Signed-off-by: Jin Ma --- gcc/config/riscv/riscv-vsetvl.cc | 40 ++-- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 4891b6c95e8..ffbe8986cec 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -100,31 +100,27 @@ using namespace riscv_vector; static void bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b) { - unsigned int set_size = dst->size; - edge e; - unsigned ix; - - for (ix = 0; ix < EDGE_COUNT (b->preds); ix++) + /* Handle case with no predecessors (including ENTRY block). */ + if (EDGE_COUNT (b->preds) == 0) { - e = EDGE_PRED (b, ix); - bitmap_copy (dst, src[e->src->index]); - break; + bitmap_clear (dst); + return; } - if (ix == EDGE_COUNT (b->preds)) -bitmap_clear (dst); - else -for (ix++; ix < EDGE_COUNT (b->preds); ix++) - { - unsigned int i; - SBITMAP_ELT_TYPE *p, *r; - - e = EDGE_PRED (b, ix); - p = src[e->src->index]->elms; - r = dst->elms; - for (i = 0; i < set_size; i++) - *r++ |= *p++; - } + /* Initialize with first predecessor's bitmap. */ + edge first_pred = EDGE_PRED (b, 0); + bitmap_copy (dst, src[first_pred->src->index]); + + /* Union remaining predecessors' bitmaps. */ + for (unsigned ix = 1; ix < EDGE_COUNT (b->preds); ix++) +{ + edge e = EDGE_PRED (b, ix); + const sbitmap pred_src = src[e->src->index]; + + /* Perform bitmap OR operation element-wise. */ + for (unsigned i = 0; i < dst->size; i++) + dst->elms[i] |= pred_src->elms[i]; +} } /* Compute the reaching definition in and out based on the gen and KILL -- 2.25.1
[PATCH 3/6] Remove non-SLP path from vectorizable_load
* tree-vect-stmts.cc (vectorizable_load): One more tricky !SLP path removal. --- gcc/tree-vect-stmts.cc | 40 +--- 1 file changed, 1 insertion(+), 39 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index c5fe7879d5a..eca7e70adf4 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -12007,9 +12007,6 @@ vectorizable_load (vec_info *vinfo, stmt_info, bump); } - if (!slp_perm) - continue; - if (slp_perm) { unsigned n_perms; @@ -12031,43 +12028,8 @@ vectorizable_load (vec_info *vinfo, nullptr, true); gcc_assert (ok); } + dr_chain.release (); } - else - { - if (grouped_load) - { - gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE); - /* We assume that the cost of a single load-lanes instruction -is equivalent to the cost of DR_GROUP_SIZE separate loads. -If a grouped access is instead being provided by a -load-and-permute operation, include the cost of the -permutes. */ - if (costing_p && first_stmt_info == stmt_info) - { - /* Uses an even and odd extract operations or shuffle -operations for each needed permute. */ - int group_size = DR_GROUP_SIZE (first_stmt_info); - int nstmts = ceil_log2 (group_size) * group_size; - inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm, - slp_node, 0, vect_body); - - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, -"vect_model_load_cost: " -"strided group_size = %d .\n", -group_size); - } - else if (!costing_p) - { - vect_transform_grouped_load (vinfo, stmt_info, dr_chain, - group_size, gsi); - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; - } - } - else if (!costing_p) - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); - } - dr_chain.release (); } if (costing_p) -- 2.43.0
[PATCH 1/6] Remove non-SLP path from vectorizable_load
This cleans the rest of vectorizable_load from non-SLP * tree-vect-stmts.cc (vectorizable_load): Step 1. --- gcc/tree-vect-stmts.cc | 62 +- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f699d808e68..92739903754 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9850,7 +9850,7 @@ vectorizable_load (vec_info *vinfo, bool compute_in_loop = false; class loop *at_loop; int vec_num; - bool slp = (slp_node != NULL); + bool slp = true; bool slp_perm = false; bb_vec_info bb_vinfo = dyn_cast (vinfo); poly_uint64 vf; @@ -9909,7 +9909,7 @@ vectorizable_load (vec_info *vinfo, return false; mask_index = internal_fn_mask_index (ifn); - if (mask_index >= 0 && slp_node) + if (mask_index >= 0 && 1) mask_index = vect_slp_child_index_for_operand (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (mask_index >= 0 @@ -9918,7 +9918,7 @@ vectorizable_load (vec_info *vinfo, return false; els_index = internal_fn_else_index (ifn); - if (els_index >= 0 && slp_node) + if (els_index >= 0 && 1) els_index = vect_slp_child_index_for_operand (call, els_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (els_index >= 0 @@ -9942,7 +9942,7 @@ vectorizable_load (vec_info *vinfo, /* Multiple types in SLP are handled by creating the appropriate number of vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in case of SLP. */ - if (slp) + if (1) ncopies = 1; else ncopies = vect_get_num_copies (loop_vinfo, vectype); @@ -9951,7 +9951,7 @@ vectorizable_load (vec_info *vinfo, /* FORNOW. This restriction should be relaxed. */ if (nested_in_vect_loop - && (ncopies > 1 || (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1))) + && (ncopies > 1 || (1 && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -9998,7 +9998,7 @@ vectorizable_load (vec_info *vinfo, group_size = DR_GROUP_SIZE (first_stmt_info); /* Refuse non-SLP vectorization of SLP-only groups. */ - if (!slp && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info)) + if (0 && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -10046,7 +10046,7 @@ vectorizable_load (vec_info *vinfo, /* ??? The following checks should really be part of get_group_load_store_type. */ - if (slp + if (1 && SLP_TREE_LOAD_PERMUTATION (slp_node).exists () && !((memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_GATHER_SCATTER) @@ -10090,7 +10090,7 @@ vectorizable_load (vec_info *vinfo, } } - if (slp_node + if (1 && slp_node->ldst_lanes && memory_access_type != VMAT_LOAD_STORE_LANES) { @@ -10142,7 +10142,7 @@ vectorizable_load (vec_info *vinfo, if (costing_p) /* transformation not required. 
*/ { - if (slp_node + if (1 && mask && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype)) @@ -10153,7 +10153,7 @@ vectorizable_load (vec_info *vinfo, return false; } - if (!slp) + if (0) STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type; else SLP_TREE_MEMORY_ACCESS_TYPE (slp_node) = memory_access_type; @@ -10210,7 +10210,7 @@ vectorizable_load (vec_info *vinfo, if (elsvals.length ()) maskload_elsval = *elsvals.begin (); - if (!slp) + if (0) gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); else @@ -10289,7 +10289,7 @@ vectorizable_load (vec_info *vinfo, vectype, &gsi2); } gimple *new_stmt = SSA_NAME_DEF_STMT (new_temp); - if (slp) + if (1) for (j = 0; j < (int) SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); ++j) slp_node->push_vec_def (new_stmt); else @@ -10616,11 +10616,11 @@ vectorizable_load (vec_info *vinfo, } if (memory_access_type == VMAT_GATHER_SCATTER - || (!slp && memory_access_type == VMAT_CONTIGUOUS)) + || (0 && memory_access_type == VMAT_CONTIGUOUS)) grouped_load = false; if (grouped_load - || (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())) + || (1 && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())) { if (grouped_load) { @@ -10634,7 +10634,7 @@ vectorizable_load (vec_info *vinfo, } /* For SLP vectorization we directly vectorize a subchain without permutation. */ - if (slp && ! SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()) + if (1 && ! SLP_TREE_LOAD_PERMUTAT
[PATCH 5/6] Remove non-SLP path from vectorizable_load
Elide loops over ncopies. * tree-vect-stmts.cc (vectorizable_load): --- gcc/tree-vect-stmts.cc | 54 -- 1 file changed, 20 insertions(+), 34 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 2efa34c..717d4694b88 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -11007,10 +11007,9 @@ vectorizable_load (vec_info *vinfo, gcc_assert (!grouped_load && !slp_perm); unsigned int inside_cost = 0, prologue_cost = 0; - for (j = 0; j < 1; j++) { /* 1. Create the vector or array pointer update chain. */ - if (j == 0 && !costing_p) + if (!costing_p) { if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, @@ -11022,13 +11021,6 @@ vectorizable_load (vec_info *vinfo, at_loop, offset, &dummy, gsi, &ptr_incr, false, bump); } - else if (!costing_p) - { - gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); - if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, - gsi, stmt_info, bump); - } gimple *new_stmt = NULL; for (i = 0; i < vec_num; i++) @@ -11039,12 +11031,11 @@ vectorizable_load (vec_info *vinfo, if (!costing_p) { if (mask) - vec_mask = vec_masks[vec_num * j + i]; + vec_mask = vec_masks[i]; if (loop_masks) final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, - vec_num, vectype, - vec_num * j + i); + vec_num, vectype, i); if (vec_mask) final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, final_mask, vec_mask, gsi); @@ -11067,7 +11058,7 @@ vectorizable_load (vec_info *vinfo, continue; } if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - vec_offset = vec_offsets[vec_num * j + i]; + vec_offset = vec_offsets[i]; tree zero = build_zero_cst (vectype); tree scale = size_int (gs_info.scale); @@ -11076,8 +11067,7 @@ vectorizable_load (vec_info *vinfo, if (loop_lens) final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, - vec_num, vectype, - vec_num * j + i, 1); + vec_num, vectype, i, 1); else final_len = build_int_cst (sizetype, @@ -11148,7 +11138,7 @@ vectorizable_load (vec_info *vinfo, { new_stmt = vect_build_one_gather_load_call (vinfo, stmt_info, gsi, &gs_info, - dataref_ptr, vec_offsets[vec_num * j + i], + dataref_ptr, vec_offsets[i], final_mask); data_ref = NULL_TREE; } @@ -11159,7 +11149,7 @@ vectorizable_load (vec_info *vinfo, data with just the lower lanes filled. */ new_stmt = vect_build_one_gather_load_call (vinfo, stmt_info, gsi, &gs_info, - dataref_ptr, vec_offsets[2 * vec_num * j + 2 * i], + dataref_ptr, vec_offsets[2 * i], final_mask); tree low = make_ssa_name (vectype); gimple_set_lhs (new_stmt, low); @@ -11204,7 +11194,7 @@ vectorizable_load (vec_info *vinfo, new_stmt = vect_build_one_gather_load_call (vinfo, stmt_info, gsi, &gs_info, dataref_ptr, - vec_offsets[2 * vec_num * j + 2 * i + 1], + vec_offsets[2 * i + 1], final_mask); tree high = make_ssa_name (vectype); gimple_set_lhs (new_stmt, high); @@ -11229,8 +11219,8 @@ vectorizable_load (vec_info *vinfo, { /* We have a offset vector with double the number of lanes. Select the low/high part accordingly. */ - vec_offset =
[PATCH 4/6] Remove non-SLP path from vectorizable_load
Propagate out ncopies == 1. * tree-vect-stmts.cc (vectorizable_load): Step 3. --- gcc/tree-vect-stmts.cc | 46 +++--- 1 file changed, 12 insertions(+), 34 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index eca7e70adf4..2efa34c 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9836,7 +9836,6 @@ vectorizable_load (vec_info *vinfo, tree dataref_ptr = NULL_TREE; tree dataref_offset = NULL_TREE; gimple *ptr_incr = NULL; - int ncopies; int i, j; unsigned int group_size; poly_uint64 group_gap_adj; @@ -9938,16 +9937,9 @@ vectorizable_load (vec_info *vinfo, else vf = 1; - /* Multiple types in SLP are handled by creating the appropriate number of - vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in - case of SLP. */ - ncopies = 1; - - gcc_assert (ncopies >= 1); - /* FORNOW. This restriction should be relaxed. */ if (nested_in_vect_loop - && (ncopies > 1 || SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)) + && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -9955,20 +9947,6 @@ vectorizable_load (vec_info *vinfo, return false; } - /* Invalidate assumptions made by dependence analysis when vectorization - on the unrolled body effectively re-orders stmts. */ - if (ncopies > 1 - && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0 - && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo), - STMT_VINFO_MIN_NEG_DIST (stmt_info))) -{ - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"cannot perform implicit CSE when unrolling " -"with negative dependence distance\n"); - return false; -} - elem_type = TREE_TYPE (vectype); mode = TYPE_MODE (vectype); @@ -10018,7 +9996,7 @@ vectorizable_load (vec_info *vinfo, int maskload_elsval = 0; bool need_zeroing = false; if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, VLS_LOAD, - ncopies, &memory_access_type, &poffset, + 1, &memory_access_type, &poffset, &alignment_support_scheme, &misalignment, &gs_info, &lanes_ifn, &elsvals)) return false; @@ -10194,8 +10172,7 @@ vectorizable_load (vec_info *vinfo, gcc_assert (memory_access_type == SLP_TREE_MEMORY_ACCESS_TYPE (slp_node)); if (dump_enabled_p () && !costing_p) -dump_printf_loc (MSG_NOTE, vect_location, - "transform load. ncopies = %d\n", ncopies); +dump_printf_loc (MSG_NOTE, vect_location, "transform load.\n"); /* Transform. */ @@ -10443,6 +10420,7 @@ vectorizable_load (vec_info *vinfo, /* For SLP permutation support we need to load the whole group, not only the number of vector stmts the permutation result fits in. */ + int ncopies; if (slp_perm) { /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for @@ -10869,7 +10847,7 @@ vectorizable_load (vec_info *vinfo, /* For costing some adjacent vector loads, we'd like to cost with the total number of them once instead of cost each one by one. */ unsigned int n_adjacent_loads = 0; - ncopies = slp_node->vec_stmts_size / group_size; + int ncopies = slp_node->vec_stmts_size / group_size; for (j = 0; j < ncopies; j++) { if (costing_p) @@ -11029,7 +11007,7 @@ vectorizable_load (vec_info *vinfo, gcc_assert (!grouped_load && !slp_perm); unsigned int inside_cost = 0, prologue_cost = 0; - for (j = 0; j < ncopies; j++) + for (j = 0; j < 1; j++) { /* 1. Create the vector or array pointer update chain. 
*/ if (j == 0 && !costing_p) @@ -11065,7 +11043,7 @@ vectorizable_load (vec_info *vinfo, if (loop_masks) final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, - vec_num * ncopies, vectype, + vec_num, vectype, vec_num * j + i); if (vec_mask) final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, @@ -11098,7 +11076,7 @@ vectorizable_load (vec_info *vinfo, if (loop_lens) final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, - vec_num * ncopies, vectype, + vec_num, vectype, vec_num * j + i, 1); else final_len @@ -11394,7 +11372,7 @@ vectorizable_load (v
[PATCH] Remove non-SLP path from vectorizable_load
This cleans the rest of vectorizable_load from non-SLP, propagates out ncopies == 1, and elides loops from 0 to ncopies. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-vect-stmts.cc (vectorizable_load): Remove non-SLP paths and propagate out ncopies == 1. --- gcc/tree-vect-stmts.cc | 1935 ++-- 1 file changed, 876 insertions(+), 1059 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f699d808e68..db1b539b6c7 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9836,7 +9836,6 @@ vectorizable_load (vec_info *vinfo, tree dataref_ptr = NULL_TREE; tree dataref_offset = NULL_TREE; gimple *ptr_incr = NULL; - int ncopies; int i, j; unsigned int group_size; poly_uint64 group_gap_adj; @@ -9850,7 +9849,6 @@ vectorizable_load (vec_info *vinfo, bool compute_in_loop = false; class loop *at_loop; int vec_num; - bool slp = (slp_node != NULL); bool slp_perm = false; bb_vec_info bb_vinfo = dyn_cast (vinfo); poly_uint64 vf; @@ -9909,7 +9907,7 @@ vectorizable_load (vec_info *vinfo, return false; mask_index = internal_fn_mask_index (ifn); - if (mask_index >= 0 && slp_node) + if (mask_index >= 0) mask_index = vect_slp_child_index_for_operand (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (mask_index >= 0 @@ -9918,7 +9916,7 @@ vectorizable_load (vec_info *vinfo, return false; els_index = internal_fn_else_index (ifn); - if (els_index >= 0 && slp_node) + if (els_index >= 0) els_index = vect_slp_child_index_for_operand (call, els_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (els_index >= 0 @@ -9939,19 +9937,9 @@ vectorizable_load (vec_info *vinfo, else vf = 1; - /* Multiple types in SLP are handled by creating the appropriate number of - vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in - case of SLP. */ - if (slp) -ncopies = 1; - else -ncopies = vect_get_num_copies (loop_vinfo, vectype); - - gcc_assert (ncopies >= 1); - /* FORNOW. This restriction should be relaxed. */ if (nested_in_vect_loop - && (ncopies > 1 || (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1))) + && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -9959,20 +9947,6 @@ vectorizable_load (vec_info *vinfo, return false; } - /* Invalidate assumptions made by dependence analysis when vectorization - on the unrolled body effectively re-orders stmts. */ - if (ncopies > 1 - && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0 - && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo), - STMT_VINFO_MIN_NEG_DIST (stmt_info))) -{ - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"cannot perform implicit CSE when unrolling " -"with negative dependence distance\n"); - return false; -} - elem_type = TREE_TYPE (vectype); mode = TYPE_MODE (vectype); @@ -9997,15 +9971,6 @@ vectorizable_load (vec_info *vinfo, first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info); group_size = DR_GROUP_SIZE (first_stmt_info); - /* Refuse non-SLP vectorization of SLP-only groups. */ - if (!slp && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"cannot vectorize load in non-SLP mode.\n"); - return false; - } - /* Invalidate assumptions made by dependence analysis when vectorization on the unrolled body effectively re-orders stmts. 
*/ if (STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0 @@ -10031,7 +9996,7 @@ vectorizable_load (vec_info *vinfo, int maskload_elsval = 0; bool need_zeroing = false; if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, VLS_LOAD, - ncopies, &memory_access_type, &poffset, + 1, &memory_access_type, &poffset, &alignment_support_scheme, &misalignment, &gs_info, &lanes_ifn, &elsvals)) return false; @@ -10046,8 +10011,7 @@ vectorizable_load (vec_info *vinfo, /* ??? The following checks should really be part of get_group_load_store_type. */ - if (slp - && SLP_TREE_LOAD_PERMUTATION (slp_node).exists () + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists () && !((memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_GATHER_SCATTER) && SLP_TREE_LANES (slp_node) == 1)) @@ -10090,8 +10054,7 @@ vectorizable_load (vec_info *vinfo, } } - if (slp_node - && slp_node->ldst_lanes
Re: [Patch] gcn: Fix glc vs. sc0 handling for scalar memory access
On 23/06/2025 22:39, Tobias Burnus wrote: This is more based on documentation reading than on testing as still only limited MI300 testing has been done and seemingly this code does not usually get touched. MI300's "9.1.10 Memory Scope and Temporal Control" distinguishes between scalar memory (9.1.10.1) for which a single control bit exists: GLC (Globally Coherent) [+ dlc, slc, scc, but not used by MI300]. And, for vector memory (9.1.10.2; flat, global, scratch, buffer), there is the system cache level SC[1:0] (wave, group, device system) and also NT (non temporal). This patch moves back to 'glc' for scalar memory access. OK for mainline? Tobias PS: Some more smaller fixes are in the pipeline and there are some known MI300 issues, not all fully understood. Likewise in the (to-do) pipeline is more in-depth testing. You still seem to have the unrelated preload bits in this patch, but other than that, this looks fine. In principle, we could use %Gn everywhere and use the address space from the MEM to determine which cache to use, but that's probably overkill until we need it. Andrew
[Patch] Fortran/OpenACC: Add Fortran support for acc_attach/acc_detach
This patch adds the OpenACC routines acc_attach and acc_detach. However, in order to avoid the generation of a temporary, which breaks this feature, a special case had to be added to gfc_trans_call. Otherwise, I think it completes the Fortran additions of existing C/C++ functions, by adding this OpenACC 3.3 feature, which is used by ICON. Any comments, suggestions, remarks before I commit this patch? Tobias Fortran/OpenACC: Add Fortran support for acc_attach/acc_detach While C/++ support the routines acc_attach{,_async} and acc_detach{,_finalize}{,_async} routines since a long time, the Fortran API routines where only added in OpenACC 3.3. Unfortunately, they cannot directly be implemented in the library as GCC will introduce a temporary array descriptor in some cases, which causes the attempted attachment to the this temporary variable instead of to the original one. Therefore, those API routines are handled in a special way in the compiler. gcc/fortran/ChangeLog: * trans-stmt.cc (gfc_trans_call_acc_attach_detach): New. (gfc_trans_call): Call it. libgomp/ChangeLog: * libgomp.texi (acc_attach, acc_detach): Update for Fortran version. * openacc.f90 acc_attach{,_async}, acc_detach{,_finalize}{,_async}: Add. * openacc_lib.h: Likewise. * testsuite/libgomp.oacc-fortran/acc-attach-detach-1.f90: New test. * testsuite/libgomp.oacc-fortran/acc-attach-detach-2.f90: New test. gcc/fortran/trans-stmt.cc | 74 +- libgomp/libgomp.texi | 40 ++-- libgomp/openacc.f90| 44 + libgomp/openacc_lib.h | 42 .../libgomp.oacc-fortran/acc-attach-detach-1.f90 | 25 .../libgomp.oacc-fortran/acc-attach-detach-2.f90 | 62 ++ 6 files changed, 265 insertions(+), 22 deletions(-) diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc index 487b7687ef1..f1054015862 100644 --- a/gcc/fortran/trans-stmt.cc +++ b/gcc/fortran/trans-stmt.cc @@ -377,6 +377,57 @@ get_intrinsic_for_code (gfc_code *code) } +/* Handle the OpenACC routines acc_attach{,_async} and + acc_detach{,_finalize}{,_async} explicitly. This is required as the + the corresponding device pointee is attached to the corresponding device + pointer, but if a temporary array descriptor is created for the call, + that one is used as pointer instead of the original pointer. */ + +tree +gfc_trans_call_acc_attach_detach (gfc_code *code) +{ + stmtblock_t block; + gfc_se ptr_addr_se, async_se; + tree fn; + + fn = code->resolved_sym->backend_decl; + if (fn == NULL) +{ + fn = gfc_get_symbol_decl (code->resolved_sym); + code->resolved_sym->backend_decl = fn; +} + + gfc_start_block (&block); + + gfc_init_se (&ptr_addr_se, NULL); + ptr_addr_se.descriptor_only = 1; + ptr_addr_se.want_pointer = 1; + gfc_conv_expr (&ptr_addr_se, code->ext.actual->expr); + gfc_add_block_to_block (&block, &ptr_addr_se.pre); + if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (ptr_addr_se.expr))) +ptr_addr_se.expr = gfc_conv_descriptor_data_get (ptr_addr_se.expr); + ptr_addr_se.expr = build_fold_addr_expr (ptr_addr_se.expr); + + bool async = code->ext.actual->next != NULL; + if (async) +{ + gfc_init_se (&async_se, NULL); + gfc_conv_expr (&async_se, code->ext.actual->next->expr); + fn = build_call_expr_loc (gfc_get_location (&code->loc), fn, 2, +ptr_addr_se.expr, async_se.expr); +} + else +fn = build_call_expr_loc (gfc_get_location (&code->loc), + fn, 1, ptr_addr_se.expr); + gfc_add_expr_to_block (&block, fn); + gfc_add_block_to_block (&block, &ptr_addr_se.post); + if (async) +gfc_add_block_to_block (&block, &async_se.post); + + return gfc_finish_block (&block); +} + + /* Translate the CALL statement. 
Builds a call to an F95 subroutine. */ tree @@ -392,13 +443,32 @@ gfc_trans_call (gfc_code * code, bool dependency_check, tree tmp; bool is_intrinsic_mvbits; + gcc_assert (code->resolved_sym); + + /* Unfortunately, acc_attach* and acc_detach* need some special treatment for + attaching the the pointee to a pointer as GCC might introduce a temporary + array descriptor, whose data component is then used as to be attached to + pointer. */ + if (flag_openacc + && code->resolved_sym->attr.subroutine + && code->resolved_sym->formal + && code->resolved_sym->formal->sym->ts.type == BT_ASSUMED + && code->resolved_sym->formal->sym->attr.dimension + && code->resolved_sym->formal->sym->as->type == AS_ASSUMED_RANK + && startswith (code->resolved_sym->name, "acc_") + && (!strcmp (code->resolved_sym->name + 4, "attach") + || !strcmp (code->resolved_sym->name + 4, "attach_async") + || !strcmp (code->resolved_sym->name + 4, "detach") + || !strcmp (code->resolved_sym->name + 4, "detach_async") + || !strcmp (code->resolved_sym->name + 4, "detach_finalize")
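As background, the long-standing C API that these new Fortran interfaces mirror looks roughly like the sketch below (using <openacc.h>; this is not part of the patch, and the temporary-descriptor problem described above is a Fortran-only concern):

```c
#include <openacc.h>

struct box { double *data; int n; };

void
attach_box (struct box *b)
{
  /* Assumes b and b->data[0:b->n] have already been mapped to the device;
     acc_attach points the device copy of b->data at the device copy of the
     array, and acc_detach undoes that association.  */
  acc_attach ((void **) &b->data);
  /* ... device regions dereferencing b->data ... */
  acc_detach ((void **) &b->data);
}
```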
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
> I'd say we want to fix these kind of things before switching the default. Can > you file bugreports for the distinct issues you noticed when adjusting the > testcases? Sure, filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120808 for the most frequently hit issue on x86 for now. > I suppose they are reproducible as well when using the C fma() function > directly? No, unfortunately there are multiple issues with fma builtin: 1) __builtin_fma does not accept generic vector types 2) we have FMS FNMA FNMS FMADDSUB FMSUBADD internal functions, but no corresponding builtins 3) __builtin_fma and .FMA internal function are not the same in the middle-end, I reported one instance arising from that in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892 Alexander
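A small example of item 1), using a two-element vector purely for illustration: the scalar prototype of __builtin_fma cannot take generic vector operands, so the vector case has to rely on contraction of the open-coded form instead.

```c
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

v2df
f (v2df a, v2df b, v2df c)
{
  /* return __builtin_fma (a, b, c);  -- rejected: scalar-only prototype */
  return a * b + c;   /* may be contracted to a fused multiply-add */
}
```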
Re: [Patch, Fortran, Coarray, PR88076, v1] 6/6 Add a shared memory multi process coarray library.
Damian, I submitted a patch a long time ago to make -fcoarray=single the default behavior. The patch made -fcoarray=none a NOP. With inclusion of a shmem implementation of the runtime parts, this might be the way to go. I'll leave that decision to Andre, Thomas, and Nicolas. I believe that the gfortran contributors have not considered coarray as an optional add-on. The problem for gfortran is that it runs on dozens of CPUs and dozens upon dozens of operating systems. The few gfortran contributors simply cannot ensure that opencoarray+mpich or opencoarray+openmpi runs on all of the possible combinations of hardware and OS's. Andre has hinted that he expects some rough edges on non-linux system. I'll find out this weekend when I give his patch a spin on FreeBSD. Hopefully, a windows10/11 user can test the patch. -- steve On Tue, Jun 24, 2025 at 06:34:53AM -0700, Damian Rouson wrote: > If gfortran will have a shared-memory coarray implemented, it would be > great to also drop the requirement to pass -fcoarray. Other compilers > have trended in the direction of dropping the flag too, including Cray and > NAG. > > Even all these years after Fortran 2008 introduced multi-image execution, I > still here vendors talk about the multi-image features as if they are an > optional add-on rather than a large and significant feature set that was > fully integrated into the language 3 standards ago. Making the flag > optional would help to communicate that coarrays are a first-class feature > rather than an after-thought. Users who want to link in an external > library such as OpenCoarrays (or someday Caffeine?) can still use the flag > for that purpose. > > D > > On Tue, Jun 24, 2025 at 06:14 Andre Vehreschild wrote: > > > Hi all, > > > > this is the last patch of the mini-series. It just updates the testcases > > common > > to coarrays in the gfortran testsuite. All tests in the > > gcc/testsuite/gfortran.dg/caf directory are now also run with caf_shmem. > > The > > test driver ensures, that no more than 8 images are used per testcase (if > > not > > specified differently by the tester, setting GFORTRAN_NUM_IMAGES > > beforehand). > > This is to prevent large machines testing on all hardware threads without > > any > > benefit. The minimum number of images required is 8 and therefore that > > number > > was chosen. > > > > Bootstrapped and regtests fine on x86_64-pc-linux-gnu / F41. Ok for > > mainline? > > > > Regards, > > Andre > > -- > > Andre Vehreschild * Email: vehre ad gmx dot de > > -- Steve
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On Tue, Jun 24, 2025 at 11:57:01AM -0400, Jason Merrill wrote: > > The two other errors on the testcase are expectedly gone with C++26, > > but the last one remains. The problem is that when parsing nm3.a > > inside of mutable_subobjects()::A::f() > > build_class_member_access_expr calls build_base_path which calls > > cp_build_addr_expr and that makes nm3 odr-used. I must say I have > > no idea whether nm3 ought to be odr-used or not just because of nm3.a > > use and if not, how that should be changed. > > build_simple_base_path is how we avoid this odr-use; seems we also need to > use it early in the case of (v_binfo && !virtual_access). --- gcc/cp/class.cc.jj 2025-06-18 17:24:03.973867379 +0200 +++ gcc/cp/class.cc 2025-06-24 20:11:21.728169508 +0200 @@ -349,7 +349,11 @@ build_base_path (enum tree_code code, /* For a non-pointer simple base reference, express it as a COMPONENT_REF without taking its address (and so causing lambda capture, 91933). */ - if (code == PLUS_EXPR && !v_binfo && !want_pointer && !has_empty && !uneval) + if (code == PLUS_EXPR + && !want_pointer + && !has_empty + && !uneval + && (!v_binfo || resolves_to_fixed_type_p (expr) > 0)) return build_simple_base_path (expr, binfo); if (!want_pointer) seems to fix that and doesn't regress anything else in make check-g++. I guess it can be handled separately from the rest. Or do you prefer some other way to avoid calling resolves_to_fixed_type_p twice in some cases? > > works at runtime. In the patch I've adjusted the function > > comment of cxx_eval_dynamic_cast_fn because with virtual bases > > I believe hint -1 might be possible, though I'm afraid I don't > > Yes, we would get -1 for dynamic_cast from B to A. The routine then has some /* Given dynamic_cast(v), [expr.dynamic.cast] If C is the class type to which T points or refers, the runtime check logically executes as follows: If, in the most derived object pointed (referred) to by v, v points (refers) to a public base class subobject of a C object, and if only one object of type C is derived from the subobject pointed (referred) to by v the result points (refers) to that C object. In this case, HINT >= 0 or -3. */ if (hint >= 0 || hint == -3) Should that include the hint == -1 case too (so effectively if (hint != -2) or is -1 not relevant to that block. > > know enough about dynamic_cast and cxx_eval_dynamic_cast_fn > > to figure out what needs to change there. It is hint -2 that > > fails, not hint -1. > > Yes, this is a -2 case because C does not derive from B. > > How does cxx_eval_dynamic_cast_fn fail in this case? From looking at the > function it seems like it ought to work. I'll study it in detail tomorrow. Jakub
Re: [PATCH] libstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717]
On Tue, 24 Jun 2025 at 03:20, Patrick Palka wrote: > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > > -- >8 -- > > When checking __is_complete_or_unbounded on a reference to incomplete > type, we overeagerly try to instantiate/complete the referenced type > which besides being unnecessary may also produce a -Wsfinae-incomplete > warning (added in r16-1527) if the referenced type is later defined. > > This patch fixes this by effectively restricting the sizeof check to > object (except unknown-bound array) types. In passing simplify the > implementation by using is_object instead of is_function/reference/void. > > PR libstdc++/120717 > > libstdc++-v3/ChangeLog: > > * include/std/type_traits (__is_complete_or_unbounded): Don't > check sizeof on a reference or unbounded array type. Simplify > using is_object. Correct formatting. > * testsuite/20_util/is_complete_or_unbounded/120717.cc: New test. > --- > libstdc++-v3/include/std/type_traits | 34 +-- > .../is_complete_or_unbounded/120717.cc| 20 +++ > 2 files changed, 37 insertions(+), 17 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/120717.cc > > diff --git a/libstdc++-v3/include/std/type_traits > b/libstdc++-v3/include/std/type_traits > index abff9f880001..28960befd2c7 100644 > --- a/libstdc++-v3/include/std/type_traits > +++ b/libstdc++-v3/include/std/type_traits > @@ -280,11 +280,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >// Forward declarations >template > -struct is_reference; > - template > -struct is_function; > - template > -struct is_void; > +struct is_object; >template > struct remove_cv; >template > @@ -297,18 +293,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >// Helper functions that return false_type for incomplete classes, >// incomplete unions and arrays of known bound from those. > > - template > -constexpr true_type __is_complete_or_unbounded(__type_identity<_Tp>) > -{ return {}; } > - > - template - typename _NestedType = typename _TypeIdentity::type> > -constexpr typename __or_< > - is_reference<_NestedType>, > - is_function<_NestedType>, > - is_void<_NestedType>, > - __is_array_unknown_bounds<_NestedType> > ->::type __is_complete_or_unbounded(_TypeIdentity) > + // More specialized overload for complete object types. > + template + typename = __enable_if_t>, > + > __is_array_unknown_bounds<_Tp>>::value>, Maybe it's because I'm congested and my head feels like it's full of potatoes, but the double negative is confusing for me. Would __and_, __not_<__is_array_unknown_bounds> work? We could even name that: // An object type which is not an unbounded array. // It might still be an incomplete type, but if this is false_type // then we can be certain it's not a complete object type. template using __maybe_complete_object_type = ...; > + size_t = sizeof(_Tp)> > +constexpr true_type > +__is_complete_or_unbounded(__type_identity<_Tp>) > +{ return {}; }; > + > + // Less specialized overload for reference and unknown-bound array types, > and > + // incomplete types. > + template + typename _NestedType = typename _TypeIdentity::type> > +constexpr typename __or_<__not_>, > +__is_array_unknown_bounds<_NestedType>>::type Then this would be __not_<__maybe_complete_object_type<_NestedType>> but maybe that's as bad as the double negative above. > +__is_complete_or_unbounded(_TypeIdentity) > { return {}; } > >// __remove_cv_t (std::remove_cv_t for C++11). 
> diff --git > a/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/120717.cc > b/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/120717.cc > new file mode 100644 > index ..31fdf8fe9227 > --- /dev/null > +++ b/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/120717.cc > @@ -0,0 +1,20 @@ > +// PR libstdc++/120717 > +// { dg-do compile { target c++11 } } > +// { dg-additional-options "-Wsfinae-incomplete" } > + > +#include > + > +// Verify __is_complete_or_unbounded doesn't try to instantiate the > underlying > +// type of a reference or array of unknown bound. > +template struct A { static_assert(false, "do not instantiate"); }; > +static_assert(std::__is_complete_or_unbounded(std::__type_identity&>{}), > ""); > +static_assert(std::__is_complete_or_unbounded(std::__type_identity&&>{}), > ""); > +static_assert(std::__is_complete_or_unbounded(std::__type_identity[]>{}), > ""); > + > +// Verify __is_complete_or_unbounded doesn't produce -Wsfinae-incomplete > +// warnings. > +struct B; > +static_assert(std::__is_complete_or_unbounded(std::__type_identity{}), > ""); > +static_assert(std::__is_complete_or_unbounded(std::__type_identity{}), > ""); > +static_assert(std::__is_complete_or_unbounded(std::__t
Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
On Tue, 24 Jun 2025, Richard Sandiford wrote: > Richard Biener writes: > > On Tue, 24 Jun 2025, Richard Sandiford wrote: > > > >> Richard Biener writes: > >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: > >> > > >> >> Tamar Christina writes: > >> >> > store_bit_field_1 has an optimization where if a target is not a > >> >> > memory operand > >> >> > and the entire value is being set from something larger we can just > >> >> > wrap a > >> >> > subreg around the source and emit a move. > >> >> > > >> >> > For vector constructors this is however problematic because the > >> >> > subreg means > >> >> > that the expansion of the constructor won't happen through vec_init > >> >> > anymore. > >> >> > > >> >> > Complicated constructors which aren't natively supported by targets > >> >> > then ICE as > >> >> > they wouldn't have been expanded so recog fails. > >> >> > > >> >> > This patch blocks the optimization on non-constant vector > >> >> > constructors. Or non-uniform > >> >> > non-constant vectors. I allowed constant vectors because if I read > >> >> > the code right > >> >> > simplify-rtx should be able to perform the simplification of pulling > >> >> > out the element > >> >> > or merging the constant values. There are several testcases in > >> >> > aarch64-sve-pcs.exp > >> >> > that test this as well. I allowed uniform non-constant vectors > >> >> > because they > >> >> > would be folded into a vec_select later on. > >> >> > > >> >> > Note that codegen is quite horrible, for what should only be an lsr. > >> >> > But I'll > >> >> > address that separately so that this patch is backportable. > >> >> > > >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu, > >> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > >> >> > -m32, -m64 and no issues. > >> >> > > >> >> > Ok for master? and GCC 15, 14, 13? > >> >> > >> >> I was discussing this Alex off-list last week, and the fix we talked > >> >> about there was: > >> >> > >> >> diff --git a/gcc/explow.cc b/gcc/explow.cc > >> >> index 7799a98053b..8b138f54f75 100644 > >> >> --- a/gcc/explow.cc > >> >> +++ b/gcc/explow.cc > >> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, > >> >> machine_mode innermode, poly_uint64 byte) > >> >> { > >> >>rtx x = simplify_gen_subreg (outermode, op, innermode, byte); > >> >> - if (x) > >> >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x > >> >> return x; > >> >> > >> >>auto *start = get_last_insn (); > >> >> > >> >> The justification is that force_subreg is somewhat like a "subreg > >> >> version of force_operand", and so should try to avoid returning > >> >> subregs that force_operand would have replaced. The force_operand > >> >> code I mean is: > >> > > >> > Yeah, in particular CONSTANT_P isn't sth documented as valid as > >> > subreg operands, only registers (and memory) are. But isn't this > >> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG > >> > of a non-REG/MEM? > >> > >> I don't think the documentation is correct/up-to-date. subreg is > >> de facto used as a general operation, and for example there are > >> patterns like: > >> > >> (define_insn "" > >> [(set (match_operand:QI 0 "general_operand_dst" > >> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg") > >> (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" > >> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7") > >> (const_int 16)) 3)) > >>(clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r")) > >>(clobber (reg:CC CC_REG))] > >> "" > >> "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0" > >> [(set_attr "length" "10")]) > > > > I see. 
Is the subreg for such define_insn generated by the middle-end > > though? > > I assume it was written to match something that combine could generate. > Whether it still does in another question. > > >> (from h8300). This is also why simplify_gen_subreg has: > >> > >> if (GET_CODE (op) == SUBREG > >> || GET_CODE (op) == CONCAT > >> || GET_MODE (op) == VOIDmode) > >> return NULL_RTX; > >> > >> if (MODE_COMPOSITE_P (outermode) > >> && (CONST_SCALAR_INT_P (op) > >> || CONST_DOUBLE_AS_FLOAT_P (op) > >> || CONST_FIXED_P (op) > >> || GET_CODE (op) == CONST_VECTOR)) > >> return NULL_RTX; > >> > >> rather than the !REG_P (op) && !MEM_P (op) that the documentation > >> would imply. > > > > So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC > > we don't seem to ever legitmize constants wrapped in a SUBREG, so > > we shouldn't generate a SUBREG of a constant (in the middle-end)? > > Hmm, yeah, maybe. I'd originally rejected that because I assumed > the MODE_COMPOSITE_P was there for a reason. But looking at the > history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f, > where the reason given was: > > As for the simplify_gen_subreg change, I think it would be desirable > to just avoid creating SUBREGs of constants on all tar
Re: [PATCH] RISC-V: Refactor the function bitmap_union_of_preds_with_entry
Hi Ma Jin, thanks for looking into this, it has been on my todo list with very low priority since the vsetvl rewrite. + /* Handle case with no predecessors (including ENTRY block). */ + if (EDGE_COUNT (b->preds) == 0) { - e = EDGE_PRED (b, ix); - bitmap_copy (dst, src[e->src->index]); - break; + bitmap_clear (dst); + return; This is ok. + /* Initialize with first predecessor's bitmap. */ + edge first_pred = EDGE_PRED (b, 0); + bitmap_copy (dst, src[first_pred->src->index]); + + /* Union remaining predecessors' bitmaps. */ + for (unsigned ix = 1; ix < EDGE_COUNT (b->preds); ix++) +{ + edge e = EDGE_PRED (b, ix); + const sbitmap pred_src = src[e->src->index]; + + /* Perform bitmap OR operation element-wise. */ + for (unsigned i = 0; i < dst->size; i++) + dst->elms[i] |= pred_src->elms[i]; +} To my taste this could be simplified further like FOR_EACH_EDGE (e, ei, b->preds) { if (ei.index == 0) { bitmap_copy (dst, src[e->src->index]); continue; } bitmap_ior (dst, dst, src[e->src->index]); } Does that work as well? -- Regards Robin
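For completeness, the whole function with that suggestion folded in might look like the untested sketch below (assuming the sbitmap overloads of bitmap_copy/bitmap_ior, as used in the snippet above):

```c
static void
bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b)
{
  edge e;
  edge_iterator ei;

  /* No predecessors (including the ENTRY block): result is empty.  */
  if (EDGE_COUNT (b->preds) == 0)
    {
      bitmap_clear (dst);
      return;
    }

  FOR_EACH_EDGE (e, ei, b->preds)
    if (ei.index == 0)
      bitmap_copy (dst, src[e->src->index]);
    else
      bitmap_ior (dst, dst, src[e->src->index]);
}
```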
Re: [PATCH] RISC-V: Add Profiles RVA/B23S64 support.
On Tue, Jun 24, 2025 at 5:39 PM Jiawei wrote: > > This patch adds support for the RISC-V Profiles RVA23S64 and RVB23S64. > > gcc/ChangeLog: > > * common/config/riscv/riscv-common.cc: New Profiles. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/arch-rva23s.c: New test. > * gcc.target/riscv/arch-rvb23s.c: New test. > > --- > gcc/common/config/riscv/riscv-common.cc | 18 +- > gcc/testsuite/gcc.target/riscv/arch-rva23s.c | 14 ++ > gcc/testsuite/gcc.target/riscv/arch-rvb23s.c | 12 > 3 files changed, 43 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/arch-rva23s.c > create mode 100644 gcc/testsuite/gcc.target/riscv/arch-rvb23s.c > > diff --git a/gcc/common/config/riscv/riscv-common.cc > b/gcc/common/config/riscv/riscv-common.cc > index 3c25848ccd3..43a5ae5f449 100644 > --- a/gcc/common/config/riscv/riscv-common.cc > +++ b/gcc/common/config/riscv/riscv-common.cc > @@ -295,6 +295,15 @@ static const riscv_profiles riscv_profiles_table[] = > "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb" > "_zfa_zawrs_supm"}, > > + /* RVA23S contains all mandatory base ISA for RVA23U64 and the privileged > + extensions as mandatory extensions. */ > + {"rva23s64", "rv64imafdcbv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa" > + "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop" > + "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb" > + "_zfa_zawrs_svbare_svade_ssccptr_sstvecd_sstvala_sscounterenw_svpbmt" > + "_svinval_svnapot_sstc_sscofpmf_ssnpm_ssu64xl_sha" > + }, > + >/* RVB23 contains all mandatory base ISA for RVA22U64 and the new extension > 'zihintntl,zicond,zimop,zcmop,zfa,zawrs' as mandatory > extensions. */ > @@ -303,7 +312,14 @@ static const riscv_profiles riscv_profiles_table[] = > "_zicboz_zfhmin_zkt_zihintntl_zicond_zimop_zcmop_zcb" > "_zfa_zawrs"}, > > - /* Currently we do not define S/M mode Profiles in gcc part. */ > + /* RVB23S contains all mandatory base ISA for RVB23U64 and the privileged > + extensions as mandatory extensions. */ > + {"rvb23s64", "rv64imafdcb_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa" > + "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop" > + "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb" > + "_zfa_zawrs_svbare_svade_ssccptr_sstvecd_sstvala_sscounterenw_svpbmt" > + "_svinval_svnapot_sstc_sscofpmf_ssu64xl" > + }, > >/* Terminate the list. 
*/ >{NULL, NULL} > diff --git a/gcc/testsuite/gcc.target/riscv/arch-rva23s.c > b/gcc/testsuite/gcc.target/riscv/arch-rva23s.c > new file mode 100644 > index 000..49b406caa1d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/arch-rva23s.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rva23s64 -mabi=lp64d" } */ > + > +void foo(){} > + > +/* { dg-final { scan-assembler-times ".attribute arch, > \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0" > +"_b1p0_v1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0" > +"_ziccrse1p0_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0" > +"_za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0" > +"_zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0" > +"_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0_sha1p0_shcounterenw1p0" > +"_shgatpa1p0_shtvala1p0_shvsatpa1p0_shvstvala1p0_shvstvecd1p0_ssccptr1p0_sscofpmf1p0" > +"_sscounterenw1p0_ssnpm1p0_ssstateen1p0_sstc1p0_sstvala1p0_sstvecd1p0_ssu64xl1p0" > +"_svade1p0_svbare1p0_svinval1p0_svnapot1p0_svpbmt1p0\" 1} } */ > diff --git a/gcc/testsuite/gcc.target/riscv/arch-rvb23s.c > b/gcc/testsuite/gcc.target/riscv/arch-rvb23s.c > new file mode 100644 > index 000..fdf2625b56d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/arch-rvb23s.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rva23s64 -mabi=lp64d" } */ rva23s64 -> rvb23s64 > + > +void foo(){} > + > +/* { dg-final { scan-assembler-times ".attribute arch, > \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0" > +"_b1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0_ziccrse1p0" > +"_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0_zmmul1p0" > +"_za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0" > +"_zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zvfhmin1p0_zvkb1p0_zvkt1p0" > +"_zvl32b1p0_ssccptr1p0_sscofpmf1p0_sscounterenw1p0_sstc1p0_sstvala1p0_sstvecd1p0" > +"_ssu64xl1p0_svade1p0_svbare1p0_svinval1p0_svnapot1p0_svpbmt1p0\" 1} } */ > -- > 2.43.0 >
Re: [PATCH] libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]
On Tue, 24 Jun 2025 at 12:19, Tomasz Kaminski wrote: > > > > On Tue, Jun 24, 2025 at 9:38 AM Tomasz Kamiński wrote: >> >> For month_day we incorrectly reported day information to be available, which >> lead >> to format_error being thrown from the call to formatter::format at runtime, >> instead >> of making call to format ill-formed. >> >> The included test cover most of the combinations of _ChronoParts and format >> specifiers. >> >> libstdc++-v3/ChangeLog: >> >> * include/bits/chrono_io.h >> (formatter::parse): Call _M_parse with >> only Month being available. >> * testsuite/std/time/format/data_not_present_neg.cc: New test. >> --- >> I want to merged the data_not_present_neg.cc, as my type erasing >> implementation >> relies on detection durin parsing. >> Testing on x86_64-linux. std/time/format* tests passed. >> OK for trunk when all test passes? And chrono_io only change for v15? >> >> libstdc++-v3/include/bits/chrono_io.h | 3 +- >> .../std/time/format/data_not_present_neg.cc | 163 ++ >> 2 files changed, 164 insertions(+), 2 deletions(-) >> create mode 100644 >> libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc >> >> diff --git a/libstdc++-v3/include/bits/chrono_io.h >> b/libstdc++-v3/include/bits/chrono_io.h >> index abbf4efcc3b..4eb00f4932d 100644 >> --- a/libstdc++-v3/include/bits/chrono_io.h >> +++ b/libstdc++-v3/include/bits/chrono_io.h >> @@ -2199,8 +2199,7 @@ namespace __format >>constexpr typename basic_format_parse_context<_CharT>::iterator >>parse(basic_format_parse_context<_CharT>& __pc) >>{ >> - return _M_f._M_parse(__pc, __format::_Month|__format::_Day, >> -__defSpec); >> + return _M_f._M_parse(__pc, __format::_Month, __defSpec); >>} >> >>template >> diff --git a/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc >> b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc >> new file mode 100644 >> index 000..bcc943b86ad >> --- /dev/null >> +++ b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc >> @@ -0,0 +1,163 @@ >> +// { dg-do compile { target c++20 } } >> + >> +#include >> +#include >> + >> +using namespace std::chrono; >> + >> +auto d1 = std::format("{:%w}", 10d); // { dg-error "call to consteval >> function" } >> +auto d2 = std::format("{:%m}", 10d); // { dg-error "call to consteval >> function" } >> +auto d3 = std::format("{:%y}", 10d); // { dg-error "call to consteval >> function" } >> +auto d4 = std::format("{:%F}", 10d); // { dg-error "call to consteval >> function" } >> +auto d5 = std::format("{:%T}", 10d); // { dg-error "call to consteval >> function" } >> +auto d6 = std::format("{:%Q}", 10d); // { dg-error "call to consteval >> function" } >> +auto d7 = std::format("{:%Z}", 10d); // { dg-error "call to consteval >> function" } >> + >> +auto w1 = std::format("{:%d}", Thursday); // { dg-error "call to consteval >> function" } >> +auto w2 = std::format("{:%m}", Thursday); // { dg-error "call to consteval >> function" } >> +auto w3 = std::format("{:%y}", Thursday); // { dg-error "call to consteval >> function" } >> +auto w4 = std::format("{:%F}", Thursday); // { dg-error "call to consteval >> function" } >> +auto w5 = std::format("{:%T}", Thursday); // { dg-error "call to consteval >> function" } >> +auto w6 = std::format("{:%Q}", Thursday); // { dg-error "call to consteval >> function" } >> +auto w7 = std::format("{:%Z}", Thursday); // { dg-error "call to consteval >> function" } >> + >> +auto wi1 = std::format("{:%d}", Thursday[2]); // { dg-error "call to >> consteval function" } >> +auto wi2 = 
std::format("{:%m}", Thursday[2]); // { dg-error "call to >> consteval function" } >> +auto wi3 = std::format("{:%y}", Thursday[2]); // { dg-error "call to >> consteval function" } >> +auto wi4 = std::format("{:%F}", Thursday[2]); // { dg-error "call to >> consteval function" } >> +auto wi5 = std::format("{:%T}", Thursday[2]); // { dg-error "call to >> consteval function" } >> +auto wi6 = std::format("{:%Q}", Thursday[2]); // { dg-error "call to >> consteval function" } >> +auto wi7 = std::format("{:%Z}", Thursday[2]); // { dg-error "call to >> consteval function" } >> + >> +auto wl1 = std::format("{:%d}", Thursday[last]); // { dg-error "call to >> consteval function" } >> +auto wl2 = std::format("{:%m}", Thursday[last]); // { dg-error "call to >> consteval function" } >> +auto wl3 = std::format("{:%y}", Thursday[last]); // { dg-error "call to >> consteval function" } >> +auto wl4 = std::format("{:%F}", Thursday[last]); // { dg-error "call to >> consteval function" } >> +auto wl5 = std::format("{:%T}", Thursday[last]); // { dg-error "call to >> consteval function" } >> +auto wl6 = std::format("{:%Q}", Thursday[last]); // { dg-error "call to >> consteval function" } >> +auto wl7 = std::format("{:%Z}", Thursday[last]); // { dg-error "call to >> consteval f
[PATCH v6 1/9] AArch64: place branch instruction rules together
The rules for conditional branches were spread throughout `aarch64.md`. Group them together so it is easier to understand how `cbranch4` is lowered to RTL. gcc/ChangeLog: * config/aarch64/aarch64.md (condjump): Move. (*compare_condjump): Likewise. (aarch64_cb1): Likewise. (*cb1): Likewise. (tbranch_3): Likewise. (@aarch64_tb): Likewise. --- gcc/config/aarch64/aarch64.md | 387 ++ 1 file changed, 201 insertions(+), 186 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e11e13033d2..fcc24e300e6 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -682,6 +682,10 @@ (define_insn "aarch64_write_sysregti" "msrr\t%x0, %x1, %H1" ) +;; --- +;; Unconditional jumps +;; --- + (define_insn "indirect_jump" [(set (pc) (match_operand:DI 0 "register_operand" "r"))] "" @@ -700,6 +704,12 @@ (define_insn "jump" [(set_attr "type" "branch")] ) + + +;; --- +;; Conditional jumps +;; --- + (define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand:GPI 1 "register_operand") @@ -739,6 +749,197 @@ (define_expand "cbranchcc4" "" "") +(define_insn "condjump" + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" + [(match_operand 1 "cc_register" "") (const_int 0)]) + (label_ref (match_operand 2 "" "")) + (pc)))] + "" + { +/* GCC's traditional style has been to use "beq" instead of "b.eq", etc., + but the "." is required for SVE conditions. */ +bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode; +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 2, "Lbcond", +use_dot_p ? "b.%M0\\t" : "b%M0\\t"); +else + return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) + (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) + (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (const_int 0) + (const_int 1)))] +) + +;; For a 24-bit immediate CST we can optimize the compare for equality +;; and branch sequence from: +;; mov x0, #imm1 +;; movkx0, #imm2, lsl 16 /* x0 contains CST. 
*/ +;; cmp x1, x0 +;; b .Label +;; into the shorter: +;; sub x0, x1, #(CST & 0xfff000) +;; subsx0, x0, #(CST & 0x000fff) +;; b .Label +(define_insn_and_split "*compare_condjump" + [(set (pc) (if_then_else (EQL + (match_operand:GPI 0 "register_operand" "r") + (match_operand:GPI 1 "aarch64_imm24" "n")) + (label_ref:P (match_operand 2 "" "")) + (pc)))] + "!aarch64_move_imm (INTVAL (operands[1]), mode) + && !aarch64_plus_operand (operands[1], mode) + && !reload_completed" + "#" + "&& true" + [(const_int 0)] + { +HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; +HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; +rtx tmp = gen_reg_rtx (mode); +emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm))); +emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); +rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); +rtx cmp_rtx = gen_rtx_fmt_ee (, mode, + cc_reg, const0_rtx); +emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); +DONE; + } +) + +(define_insn "aarch64_cb1" + [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") + (const_int 0)) + (label_ref (match_operand 1 "" "")) + (pc)))] + "!aarch64_track_speculation" + { +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, "); +else + return "\\t%0, %l1"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576)) + (lt (minus (match_dup 1) (pc)) (const_int 1048572))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minu
[PATCH v6 2/9] AArch64: reformat branch instruction rules
Make the formatting of the RTL templates in the rules for branch instructions more consistent with each other. gcc/ChangeLog: * config/aarch64/aarch64.md (cbranch4): Reformat. (cbranchcc4): Likewise. (condjump): Likewise. (*compare_condjump): Likewise. (aarch64_cb1): Likewise. (*cb1): Likewise. (tbranch_3): Likewise. (@aarch64_tb): Likewise. --- gcc/config/aarch64/aarch64.md | 77 +-- 1 file changed, 38 insertions(+), 39 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index fcc24e300e6..d059a6362d5 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -714,7 +714,7 @@ (define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand:GPI 1 "register_operand") (match_operand:GPI 2 "aarch64_plus_operand")]) - (label_ref (match_operand 3 "" "")) + (label_ref (match_operand 3)) (pc)))] "" " @@ -725,34 +725,34 @@ (define_expand "cbranch4" ) (define_expand "cbranch4" - [(set (pc) (if_then_else - (match_operator 0 "aarch64_comparison_operator" -[(match_operand:GPF_F16 1 "register_operand") - (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")]) - (label_ref (match_operand 3 "" "")) - (pc)))] + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" + [(match_operand:GPF_F16 1 "register_operand") +(match_operand:GPF_F16 2 "aarch64_fp_compare_operand")]) + (label_ref (match_operand 3)) + (pc)))] "" - " + { operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], operands[2]); operands[2] = const0_rtx; - " + } ) (define_expand "cbranchcc4" - [(set (pc) (if_then_else - (match_operator 0 "aarch64_comparison_operator" - [(match_operand 1 "cc_register") - (match_operand 2 "const0_operand")]) - (label_ref (match_operand 3 "" "")) - (pc)))] + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" + [(match_operand 1 "cc_register") +(match_operand 2 "const0_operand")]) + (label_ref (match_operand 3)) + (pc)))] "" - "") + "" +) (define_insn "condjump" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" - [(match_operand 1 "cc_register" "") (const_int 0)]) - (label_ref (match_operand 2 "" "")) + [(match_operand 1 "cc_register") +(const_int 0)]) + (label_ref (match_operand 2)) (pc)))] "" { @@ -789,10 +789,9 @@ (define_insn "condjump" ;; subsx0, x0, #(CST & 0x000fff) ;; b .Label (define_insn_and_split "*compare_condjump" - [(set (pc) (if_then_else (EQL - (match_operand:GPI 0 "register_operand" "r") - (match_operand:GPI 1 "aarch64_imm24" "n")) - (label_ref:P (match_operand 2 "" "")) + [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") + (match_operand:GPI 1 "aarch64_imm24" "n")) + (label_ref:P (match_operand 2)) (pc)))] "!aarch64_move_imm (INTVAL (operands[1]), mode) && !aarch64_plus_operand (operands[1], mode) @@ -816,8 +815,8 @@ (define_insn_and_split "*compare_condjump" (define_insn "aarch64_cb1" [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") - (const_int 0)) - (label_ref (match_operand 1 "" "")) + (const_int 0)) + (label_ref (match_operand 1)) (pc)))] "!aarch64_track_speculation" { @@ -841,8 +840,8 @@ (define_insn "aarch64_cb1" (define_insn "*cb1" [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r") -(const_int 0)) - (label_ref (match_operand 1 "" "")) +(const_int 0)) + (label_ref (match_operand 1)) (pc))) (clobber (reg:CC CC_REGNUM))] "!aarch64_track_speculation" @@ -883,11 +882,11 @@ (define_insn "*cb1" ;; --- (define_expand "tbranch_3" - 
[(set (pc) (if_then_else - (EQL (match_operand
[PATCH v6 8/9] AArch64: rules for CMPBR instructions
Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR extension is enabled. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function. * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise. * config/aarch64/aarch64.md (cbranch4): Rename to ... (cbranch4): ...here, and emit CMPBR if possible. (cbranch4): New expand rule. (aarch64_cb): New insn rule. (aarch64_cb): Likewise. * config/aarch64/constraints.md (Uc0): New constraint. (Uc1): Likewise. (Uc2): Likewise. * config/aarch64/iterators.md (cmpbr_suffix): New mode attr. (INT_CMP): New code iterator. (cmpbr_imm_constraint): New code attr. * config/aarch64/predicates.md (const_0_to_63_operand): New predicate. (aarch64_cb_immediate): Likewise. (aarch64_cb_operand): Likewise. (aarch64_cb_short_operand): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cmpbr.c: --- gcc/config/aarch64/aarch64-protos.h | 2 + gcc/config/aarch64/aarch64.cc| 33 ++ gcc/config/aarch64/aarch64.md| 89 +++- gcc/config/aarch64/constraints.md| 18 + gcc/config/aarch64/iterators.md | 19 + gcc/config/aarch64/predicates.md | 15 + gcc/testsuite/gcc.target/aarch64/cmpbr.c | 586 --- 7 files changed, 376 insertions(+), 386 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 31f2f5b8bd2..0f104d0641b 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, vec, unsigned int, tree, unsigned int, tree *); +bool aarch64_cb_rhs (rtx op, rtx rhs); + namespace aarch64 { void report_non_ice (location_t, tree, unsigned int); void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT, diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 667e42ba401..3dc139e9a72 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern) gcc_unreachable (); } +/* Return true if rhs is an operand suitable for a CB (immediate) + instruction. */ +bool +aarch64_cb_rhs (rtx op, rtx rhs) +{ + if (!CONST_INT_P (rhs)) +return REG_P (rhs); + + HOST_WIDE_INT rhs_val = INTVAL (rhs); + + switch (GET_CODE (op)) +{ +case EQ: +case NE: +case GT: +case GTU: +case LT: +case LTU: + return IN_RANGE (rhs_val, 0, 63); + +case GE: /* CBGE: signed greater than or equal */ +case GEU: /* CBHS: unsigned greater than or equal */ + return IN_RANGE (rhs_val, 1, 64); + +case LE: /* CBLE: signed less than or equal */ +case LEU: /* CBLS: unsigned less than or equal */ + return IN_RANGE (rhs_val, -1, 62); + +default: + return false; +} +} + /* Return the location of a piece that is known to be passed or returned in registers. FIRST_ZR is the first unused vector argument register and FIRST_PR is the first unused predicate argument register. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 0a378ab377d..23bce55f620 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -717,6 +717,10 @@ (define_constants ;; +/- 32KiB. Used by TBZ, TBNZ. (BRANCH_LEN_P_32KiB 32764) (BRANCH_LEN_N_32KiB -32768) + +;; +/- 1KiB. Used by CBB, CBH, CB. 
+(BRANCH_LEN_P_1Kib 1020) +(BRANCH_LEN_N_1Kib -1024) ] ) @@ -724,18 +728,35 @@ (define_constants ;; Conditional jumps ;; --- -(define_expand "cbranch4" +(define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand:GPI 1 "register_operand") (match_operand:GPI 2 "aarch64_plus_operand")]) (label_ref (match_operand 3)) (pc)))] "" - " - operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], -operands[2]); - operands[2] = const0_rtx; - " + { + if (TARGET_CMPBR && aarch64_cb_rhs(operands[0], operands[2])) +{ +// Fall-through to `aarch64_cbranch` +} + else +{ + operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), +operands[1], operands[2]); + operands[2] = const0_rtx; +} + } +) + +(define_expand "cbranch4" + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" + [(match_operand:SHORT 1 "register_operand") +(match_operand:SHORT 2 "aarch64_
[PATCH v6 3/9] AArch64: rename branch instruction rules
Give the `define_insn` rules used in lowering `cbranch4` to RTL more descriptive and consistent names: from now on, each rule is named after the AArch64 instruction that it generates. Also add comments to document each rule. gcc/ChangeLog: * config/aarch64/aarch64.md (condjump): Rename to ... (aarch64_bcond): ...here. (*compare_condjump): Rename to ... (*aarch64_bcond_wide_imm): ...here. (aarch64_cb): Rename to ... (aarch64_cbz1): ...here. (*cb1): Rename to ... (*aarch64_tbz1): ...here. (@aarch64_tb): Rename to ... (@aarch64_tbz): ...here. (restore_stack_nonlocal): Handle rename. (stack_protect_combined_test): Likewise. * config/aarch64/aarch64-simd.md (cbranch4): Likewise. * config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise. * config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise. --- gcc/config/aarch64/aarch64-simd.md | 2 +- gcc/config/aarch64/aarch64-sme.md | 2 +- gcc/config/aarch64/aarch64.cc | 4 ++-- gcc/config/aarch64/aarch64.md | 21 - 4 files changed, 16 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index e771defc73f..33839f2fec7 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3966,7 +3966,7 @@ (define_expand "cbranch4" rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); - emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); + emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[3])); DONE; }) diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md index f7958c90eae..b8bb4cc14b6 100644 --- a/gcc/config/aarch64/aarch64-sme.md +++ b/gcc/config/aarch64/aarch64-sme.md @@ -391,7 +391,7 @@ (define_insn_and_split "aarch64_restore_za" auto label = gen_label_rtx (); auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM); emit_insn (gen_aarch64_read_tpidr2 (tpidr2)); -auto jump = emit_likely_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label)); +auto jump = emit_likely_jump_insn (gen_aarch64_cbznedi1 (tpidr2, label)); JUMP_LABEL (jump) = label; aarch64_restore_za (operands[0]); diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index abbb97768f5..667e42ba401 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -2884,9 +2884,9 @@ aarch64_gen_test_and_branch (rtx_code code, rtx x, int bitnum, emit_insn (gen_aarch64_and3nr_compare0 (mode, x, mask)); rtx cc_reg = gen_rtx_REG (CC_NZVmode, CC_REGNUM); rtx x = gen_rtx_fmt_ee (code, CC_NZVmode, cc_reg, const0_rtx); - return gen_condjump (x, cc_reg, label); + return gen_aarch64_bcond (x, cc_reg, label); } - return gen_aarch64_tb (code, mode, mode, + return gen_aarch64_tbz (code, mode, mode, x, gen_int_mode (bitnum, mode), label); } diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index d059a6362d5..d362a4c6f24 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -748,7 +748,8 @@ (define_expand "cbranchcc4" "" ) -(define_insn "condjump" +;; Emit `B`, assuming that the condition is already in the CC register. 
+(define_insn "aarch64_bcond" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand 1 "cc_register") (const_int 0)]) @@ -788,7 +789,7 @@ (define_insn "condjump" ;; sub x0, x1, #(CST & 0xfff000) ;; subsx0, x0, #(CST & 0x000fff) ;; b .Label -(define_insn_and_split "*compare_condjump" +(define_insn_and_split "*aarch64_bcond_wide_imm" [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") (match_operand:GPI 1 "aarch64_imm24" "n")) (label_ref:P (match_operand 2)) @@ -808,12 +809,13 @@ (define_insn_and_split "*compare_condjump" rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); rtx cmp_rtx = gen_rtx_fmt_ee (, mode, cc_reg, const0_rtx); -emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); +emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[2])); DONE; } ) -(define_insn "aarch64_cb1" +;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ` +(define_insn "aarch64_cbz1" [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") (const_int 0)) (label_ref (match_operand 1)) @@ -838,7 +840,8 @@ (define_insn "aarch64_cb1" (const_int 1)))] ) -(define_insn "*cb1" +;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ` +(define_insn "*aarch64_tbz1" [(set (pc) (if_then
[PATCH v6 9/9] AArch64: make rules for CBZ/TBZ higher priority
Move the rules for CBZ/TBZ to be above the rules for CBB/CBH/CB. We want them to have higher priority because they can express larger displacements. gcc/ChangeLog: * config/aarch64/aarch64.md (aarch64_cbz1): Move above rules for CBB/CBH/CB. (*aarch64_tbz1): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cmpbr.c: Update tests. --- gcc/config/aarch64/aarch64.md| 159 --- gcc/testsuite/gcc.target/aarch64/cmpbr.c | 32 ++--- 2 files changed, 101 insertions(+), 90 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 23bce55f620..dd58e88fa2f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -726,6 +726,17 @@ (define_constants ;; --- ;; Conditional jumps +;; The order of the rules below is important. +;; Higher priority rules are preferred because they can express larger +;; displacements. +;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ. +;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ. +;; 3) When the CMPBR extension is enabled: +;; a) Comparisons between two registers are handled by +;; CBB/CBH/CB. +;; b) Comparisons between a GP register and an immediate in the range 0-63 are +;; handled by CB (immediate). +;; 4) Otherwise, emit a CMP+B sequence. ;; --- (define_expand "cbranch4" @@ -783,6 +794,80 @@ (define_expand "cbranchcc4" "" ) +;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ` +(define_insn "aarch64_cbz1" + [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") + (const_int 0)) + (label_ref (match_operand 1)) + (pc)))] + "!aarch64_track_speculation" + { +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, "); +else + return "\\t%0, %l1"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_1MiB))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_P_1MiB))) + (const_string "no") + (const_string "yes")))] +) + +;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ` +(define_insn "*aarch64_tbz1" + [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r") +(const_int 0)) + (label_ref (match_operand 1)) + (pc))) + (clobber (reg:CC CC_REGNUM))] + "!aarch64_track_speculation" + { +if (get_attr_length (insn) == 8) + { + if (get_attr_far_branch (insn) == FAR_BRANCH_YES) + return aarch64_gen_far_branch (operands, 1, "Ltb", +"\\t%0, , "); + else + { + char buf[64]; + uint64_t val = ((uint64_t) 1) + << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1); + sprintf (buf, "tst\t%%0, %" PRId64, val); + output_asm_insn (buf, operands); + return "\t%l1"; + } + } +else + return "\t%0, , %l1"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_32KiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_32KiB))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_1MiB))) + (const_string "no") + (const_string "yes")))] +) + ;; Emit a `CB (register)` or `CB (immediate)` instruction. ;; Only immediates in the range 0-63 are supported. 
;; Comparisons against immediates outside this range fall back to @@ -909,80 +994,6 @@ (define_insn_and_split "*aarch64_bcond_wide_imm" } ) -;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ` -(define_insn "aarch64_cbz1" - [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") - (const_int 0)) -
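To make the priority list above concrete, these are the kinds of branches each category corresponds to (illustrative sketch only, not taken from the patch; the actual instruction selection still depends on target flags and branch distance):

extern void hit ();

void eq_zero (long x)       { if (x == 0) hit (); }     // rule 1: CBZ/CBNZ
void lt_zero (long x)       { if (x < 0) hit (); }      // rule 2: TBZ/TBNZ on the sign bit
void reg_reg (int x, int y) { if (x < y) hit (); }      // rule 3a: CB (register), with +cmpbr
void small_imm (long x)     { if (x > 17) hit (); }     // rule 3b: CB (immediate), imm in 0-63
void large_imm (long x)     { if (x > 4096) hit (); }   // rule 4: CMP + B.cond fallback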
[committed] i386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead
ADD/SUB is faster than LEA for most processors. Also, there are several peephole2 patterns available that convert prologue esp subtractions to pushes (at the end of i386.md). These process only patterns with flags reg clobber, so they are ineffective with clobber-less stack ptr adjustments, introduced by r16-1551 ("x86: Enable separate shrink wrapping"). Introduce a peephole2 pattern that adds a clobber to a clobber-less stack ptr adjustments when FLAGS_REG is dead. gcc/ChangeLog: * config/i386/i386.md (@pro_epilogue_adjust_stack_add_nocc): Add type attribute. (pro_epilogue_adjust_stack_add_nocc peephole2 pattern): Convert pro_epilogue_adjust_stack_add_nocc variant to pro_epilogue_adjust_stack_add when FLAGS_REG is dead. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 423ef48e518..41a86544bbf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -27449,7 +27449,7 @@ (define_insn "@pro_epilogue_adjust_stack_add_" (cond [(and (eq_attr "alternative" "0") (not (match_test "TARGET_OPT_AGU"))) (const_string "alu") - (match_operand: 2 "const0_operand") + (match_operand 2 "const0_operand") (const_string "imov") ] (const_string "lea"))) @@ -27470,7 +27470,7 @@ (define_insn "@pro_epilogue_adjust_stack_add_nocc" (clobber (mem:BLK (scratch)))] "" { - if (operands[2] == CONST0_RTX (mode)) + if (get_attr_type (insn) == TYPE_IMOV) return "mov{}\t{%1, %0|%0, %1}"; else { @@ -27478,13 +27478,31 @@ (define_insn "@pro_epilogue_adjust_stack_add_nocc" return "lea{}\t{%E2, %0|%0, %E2}"; } } - [(set (attr "length_immediate") + [(set (attr "type") + (cond [(match_operand 2 "const0_operand") +(const_string "imov") + ] + (const_string "lea"))) + (set (attr "length_immediate") (cond [(eq_attr "type" "imov") (const_string "0") ] (const_string "*"))) (set_attr "mode" "")]) +(define_peephole2 + [(parallel + [(set (match_operand:P 0 "register_operand") + (plus:P (match_dup 0) + (match_operand:P 1 ""))) + (clobber (mem:BLK (scratch)))])] + "peep2_regno_dead_p (0, FLAGS_REG)" + [(parallel + [(set (match_dup 0) + (plus:P (match_dup 0) (match_dup 1))) + (clobber (reg:CC FLAGS_REG)) + (clobber (mem:BLK (scratch)))])]) + (define_insn "@pro_epilogue_adjust_stack_sub_" [(set (match_operand:P 0 "register_operand" "=r") (minus:P (match_operand:P 1 "register_operand" "0")
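As a rough illustration of where the peephole applies (hypothetical example, not from the patch): a function with a large frame gets an explicit stack-pointer adjustment in its prologue; with separate shrink wrapping that adjustment can be emitted in the clobber-less LEA form, and the peephole now rewrites it to the cheaper SUB/ADD form once FLAGS_REG is known to be dead at that point.

// Hypothetical test case: the large local forces an explicit stack
// adjustment in the prologue and epilogue.
extern void consume (char *);

void
f ()
{
  char buf[4096];
  consume (buf);
}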
[RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
Hi! The following patch attempts to implement the C++26 P3533R2 - constexpr virtual inheritance paper. The changes include not rejecting it for C++26, tweaking the error wording to show that it is valid in C++26, adjusting synthesized_method_walk not to make synthetized cdtors non-constexpr just because of virtual base classes in C++26 and various tweaks in constexpr.cc so that it can deal with the expressions used for virtual base member accesses or cdtor calls which need __in_chrg and/or __vtt_parm arguments to be passed in some cases implicitly when they aren't passed explicitly. There are two places where I'm not sure what to do: 1) one can be seen on the constexpr-ice21.C testcase: struct NoMut1 { int a, b; }; struct NoMut3 : virtual NoMut1 { constexpr NoMut3(int a, int b) : NoMut1{a, b} {} }; void mutable_subobjects() { constexpr NoMut3 nm3 = {1, 2}; struct A { void f() { static_assert(nm3.a == 1, ""); // ERROR here: "local variable" } }; } The two other errors on the testcase are expectedly gone with C++26, but the last one remains. The problem is that when parsing nm3.a inside of mutable_subobjects()::A::f() build_class_member_access_expr calls build_base_path which calls cp_build_addr_expr and that makes nm3 odr-used. I must say I have no idea whether nm3 ought to be odr-used or not just because of nm3.a use and if not, how that should be changed. Plus whether the fact if nm3.a is odr-use or not is somehow affected by the (so far unimplemented) part of P2686R4 paper. 2) another one can be seen on the constexpr-dynamic10.C testcase struct C { virtual void a(); }; struct B { virtual void b(); }; struct A : virtual B, C { virtual void c(); }; constexpr A a; constexpr bool b1 = (dynamic_cast((B&)a), false); // ERROR here "reference 'dynamic_cast' failed" I think the error is incorrect here, because struct C { virtual void a(); }; struct B { virtual void b(); }; struct A : virtual B, C { virtual void c(); }; A a; bool b1 = (dynamic_cast((B&)a), false); int main () { C &c = dynamic_cast((B&)a); C &c2 = dynamic_cast(a); } works at runtime. In the patch I've adjusted the function comment of cxx_eval_dynamic_cast_fn because with virtual bases I believe hint -1 might be possible, though I'm afraid I don't know enough about dynamic_cast and cxx_eval_dynamic_cast_fn to figure out what needs to change there. It is hint -2 that fails, not hint -1. In any case, this has been successfully bootstrapped/regtested on x86_64-linux and i686-linux. 2025-06-24 Jakub Jelinek PR c++/120777 gcc/c-family/ * c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_constexpr_virtual_inheritance=202506L for C++26. gcc/cp/ * constexpr.cc: Implement C++26 P3533R2 - constexpr virtual inheritance. (is_valid_constexpr_fn): Don't reject constexpr cdtors in classes with virtual bases for C++26, adjust error wording. (cxx_bind_parameters_in_call): Add ORIG_FUN argument, add values for __in_chrg and __vtt_parm arguments when needed. (cxx_eval_dynamic_cast_fn): Adjust function comment, HINT -1 should be possible. (cxx_eval_call_expression): Add orig_fun variable, set it to fun before looking through clones, pass it to cxx_bind_parameters_in_call. (reduced_constant_expression_p): Add SZ argument, pass DECL_SIZE of FIELD_DECL e.index to recursive calls and don't return false if SZ is non-NULL and there are unfilled fields with bit position at or above SZ. (cxx_fold_indirect_ref_1): Handle reading of vtables using ptrdiff_t dynamic type instead of some pointer type. 
Set el_sz to DECL_SIZE_UNIT value rather than TYPE_SIZE_UNIT of DECL_FIELD_IS_BASE fields in classes with virtual bases. (cxx_fold_indirect_ref): In canonicalize_obj_off lambda look through COMPONENT_REFs with DECL_FIELD_IS_BASE in classes with virtual bases and adjust off correspondingly. * cp-tree.h (reduced_constant_expression_p): Add another tree argument defaulted to NULL_TREE. * method.cc (synthesized_method_walk): Don't clear *constexpr_p if there are virtual bases for C++26. gcc/testsuite/ * g++.dg/cpp26/constexpr-virt-inherit1.C: New test. * g++.dg/cpp26/constexpr-virt-inherit2.C: New test. * g++.dg/cpp26/feat-cxx26.C: Add __cpp_constexpr_virtual_inheritance tersts. * g++.dg/cpp2a/constexpr-dtor16.C: Don't expect errors for C++26. * g++.dg/cpp2a/constexpr-dynamic10.C: Don't expect 2 errors for C++26, temporarily accept one invalid one. * g++.dg/cpp0x/constexpr-ice21.C: Don't expect 2 errors for C++26. * g++.dg/cpp0x/constexpr-ice4.C: Don't expect any errors for C++26. --- gcc/c-family/c-cppbuiltin.cc.jj 2025-06-23 15:57:14.746215440 +0200 +++ gcc/c-family/c-cppbuiltin.cc2025-06-23 18:35:02.776544246 +0200 @@ -1094,6 +1094
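For readers unfamiliar with P3533R2, a minimal sketch of what becomes valid (assuming -std=c++26 with this patch applied; this is not one of the new testcases):

struct B { int x; constexpr B (int v) : x (v) {} };

// A constexpr constructor in a class with a virtual base is no longer
// rejected, and the object can be used in constant expressions.
struct D : virtual B { constexpr D (int v) : B (v) {} };

constexpr D d (42);
static_assert (d.x == 42);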
Re: [PATCH] libstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717]
On Tue, 24 Jun 2025 at 13:53, Patrick Palka wrote: > > On Tue, 24 Jun 2025, Jonathan Wakely wrote: > > > On Tue, 24 Jun 2025 at 03:20, Patrick Palka wrote: > > > > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > > > > > > -- >8 -- > > > > > > When checking __is_complete_or_unbounded on a reference to incomplete > > > type, we overeagerly try to instantiate/complete the referenced type > > > which besides being unnecessary may also produce a -Wsfinae-incomplete > > > warning (added in r16-1527) if the referenced type is later defined. > > > > > > This patch fixes this by effectively restricting the sizeof check to > > > object (except unknown-bound array) types. In passing simplify the > > > implementation by using is_object instead of is_function/reference/void. > > > > > > PR libstdc++/120717 > > > > > > libstdc++-v3/ChangeLog: > > > > > > * include/std/type_traits (__is_complete_or_unbounded): Don't > > > check sizeof on a reference or unbounded array type. Simplify > > > using is_object. Correct formatting. > > > * testsuite/20_util/is_complete_or_unbounded/120717.cc: New test. > > > --- > > > libstdc++-v3/include/std/type_traits | 34 +-- > > > .../is_complete_or_unbounded/120717.cc| 20 +++ > > > 2 files changed, 37 insertions(+), 17 deletions(-) > > > create mode 100644 > > > libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/120717.cc > > > > > > diff --git a/libstdc++-v3/include/std/type_traits > > > b/libstdc++-v3/include/std/type_traits > > > index abff9f880001..28960befd2c7 100644 > > > --- a/libstdc++-v3/include/std/type_traits > > > +++ b/libstdc++-v3/include/std/type_traits > > > @@ -280,11 +280,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > > > > >// Forward declarations > > >template > > > -struct is_reference; > > > - template > > > -struct is_function; > > > - template > > > -struct is_void; > > > +struct is_object; > > >template > > > struct remove_cv; > > >template > > > @@ -297,18 +293,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > >// Helper functions that return false_type for incomplete classes, > > >// incomplete unions and arrays of known bound from those. > > > > > > - template > > > -constexpr true_type __is_complete_or_unbounded(__type_identity<_Tp>) > > > -{ return {}; } > > > - > > > - template > > - typename _NestedType = typename _TypeIdentity::type> > > > -constexpr typename __or_< > > > - is_reference<_NestedType>, > > > - is_function<_NestedType>, > > > - is_void<_NestedType>, > > > - __is_array_unknown_bounds<_NestedType> > > > ->::type __is_complete_or_unbounded(_TypeIdentity) > > > + // More specialized overload for complete object types. > > > + template > > + typename = __enable_if_t>, > > > + > > > __is_array_unknown_bounds<_Tp>>::value>, > > > > Maybe it's because I'm congested and my head feels like it's full of > > potatoes, but the double negative is confusing for me. > > > > Would __and_, __not_<__is_array_unknown_bounds> work? > > > > We could even name that: > > > > // An object type which is not an unbounded array. > > // It might still be an incomplete type, but if this is false_type > > // then we can be certain it's not a complete object type. > > template > > using __maybe_complete_object_type = ...; > > I like it. > > > > > > > > + size_t = sizeof(_Tp)> > > > +constexpr true_type > > > +__is_complete_or_unbounded(__type_identity<_Tp>) > > > +{ return {}; }; > > > + > > > + // Less specialized overload for reference and unknown-bound array > > > types, and > > > + // incomplete types. 
> > > + template > > + typename _NestedType = typename _TypeIdentity::type> > > > +constexpr typename __or_<__not_>, > > > +__is_array_unknown_bounds<_NestedType>>::type > > > > Then this would be __not_<__maybe_complete_object_type<_NestedType>> > > but maybe that's as bad as the double negative above. > > The new helper definitely makes things clearer to me. Like so? Yeah, I find that easier to follow even with a head full of potato ;-) OK for trunk, thanks. > > -- >8 -- > > Subject: [PATCH] libstdc++: Unnecessary type completion in > __is_complete_or_unbounded [PR120717] > > When checking __is_complete_or_unbounded on a reference to incomplete > type, we overeagerly try to instantiate/complete the referenced type > which besides being unnecessary may also produce a -Wsfinae-incomplete > warning (added in r16-1527) if the referenced type is later defined. > > This patch fixes this by effectively restricting the sizeof check to > object (except unknown-bound array) types. In passing simplify the > implementation by using is_object instead of is_function/reference/void > and introducing a __maybe_complete_object_type helper. > > PR libstdc++/120717
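For reference, the predicate given a name in this exchange can be illustrated in standalone form roughly as follows (an illustration using standard traits, not the actual libstdc++ internal helper):

#include <type_traits>

// An object type that is not an array of unknown bound: only for such types
// do we need (and are we allowed) to evaluate sizeof to check completeness.
template<typename T>
  inline constexpr bool maybe_complete_object_type_v
    = std::is_object_v<T> && !(std::is_array_v<T> && std::extent_v<T> == 0);

static_assert( maybe_complete_object_type_v<int>);
static_assert(!maybe_complete_object_type_v<int&>);   // reference
static_assert(!maybe_complete_object_type_v<void>);   // not an object type
static_assert(!maybe_complete_object_type_v<int[]>);  // unknown bound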
Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
On Tue, 24 Jun 2025, Richard Sandiford wrote: > Tamar Christina writes: > > store_bit_field_1 has an optimization where if a target is not a memory > > operand > > and the entire value is being set from something larger we can just wrap a > > subreg around the source and emit a move. > > > > For vector constructors this is however problematic because the subreg means > > that the expansion of the constructor won't happen through vec_init anymore. > > > > Complicated constructors which aren't natively supported by targets then > > ICE as > > they wouldn't have been expanded so recog fails. > > > > This patch blocks the optimization on non-constant vector constructors. Or > > non-uniform > > non-constant vectors. I allowed constant vectors because if I read the code > > right > > simplify-rtx should be able to perform the simplification of pulling out > > the element > > or merging the constant values. There are several testcases in > > aarch64-sve-pcs.exp > > that test this as well. I allowed uniform non-constant vectors because they > > would be folded into a vec_select later on. > > > > Note that codegen is quite horrible, for what should only be an lsr. But > > I'll > > address that separately so that this patch is backportable. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > -m32, -m64 and no issues. > > > > Ok for master? and GCC 15, 14, 13? > > I was discussing this Alex off-list last week, and the fix we talked > about there was: > > diff --git a/gcc/explow.cc b/gcc/explow.cc > index 7799a98053b..8b138f54f75 100644 > --- a/gcc/explow.cc > +++ b/gcc/explow.cc > @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, > machine_mode innermode, poly_uint64 byte) > { >rtx x = simplify_gen_subreg (outermode, op, innermode, byte); > - if (x) > + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x > return x; > >auto *start = get_last_insn (); > > The justification is that force_subreg is somewhat like a "subreg > version of force_operand", and so should try to avoid returning > subregs that force_operand would have replaced. The force_operand > code I mean is: Yeah, in particular CONSTANT_P isn't sth documented as valid as subreg operands, only registers (and memory) are. But isn't this then a bug in simplify_gen_subreg itself, that it creates a SUBREG of a non-REG/MEM? Richard. > /* Check for subreg applied to an expression produced by loop optimizer. */ > if (code == SUBREG > && !REG_P (SUBREG_REG (value)) > && !MEM_P (SUBREG_REG (value))) > { > value > = simplify_gen_subreg (GET_MODE (value), > force_reg (GET_MODE (SUBREG_REG (value)), > force_operand (SUBREG_REG (value), >NULL_RTX)), > GET_MODE (SUBREG_REG (value)), > SUBREG_BYTE (value)); > code = GET_CODE (value); > } > > Thanks, > Richard > > > Thanks, > > Tamar > > > > > > gcc/ChangeLog: > > > > PR target/120718 > > * expmed.cc (store_bit_field_1): Only push subreg over uniform vector > > constructors. > > (foldable_value_with_subreg): New. > > > > gcc/testsuite/ChangeLog: > > > > PR target/120718 > > * gcc.target/aarch64/sve/pr120718.c: New test. 
> > > > --- > > > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > > index > > be427dca5d9afeed2013954472dde3a5430169e0..a468aa5c0c3f20bd62a7afc1d245d64e87be5396 > > 100644 > > --- a/gcc/expmed.cc > > +++ b/gcc/expmed.cc > > @@ -740,6 +740,28 @@ store_bit_field_using_insv (const extraction_insn > > *insv, rtx op0, > >return false; > > } > > > > +/* For non-constant vectors wrapping a subreg around the RTX will not make > > + the expression expand properly through vec_init. For constant vectors > > + we can because simplification can just extract the element out by > > + by merging the values. This can be done by simplify-rtx and so the > > + subreg will be eliminated. However poly constants require vec_init as > > + they are a runtime value. So only allow the subreg for simple integer > > + or floating point constants. */ > > + > > +static bool > > +foldable_value_with_subreg (rtx value) > > +{ > > + if (GET_CODE (value) != CONST_VECTOR || const_vec_duplicate_p (value)) > > +return true; > > + > > + for (unsigned i = 0; i < const_vector_encoded_nelts (value); i++) > > +if (!CONST_INT_P (const_vector_elt (value, i)) > > + && !CONST_DOUBLE_P (const_vector_elt (value, i))) > > + return false; > > + > > + return true; > > +} > > + > > /* A subroutine of store_bit_field, with the same arguments. Return true > > if the operation could be implemented. > > > > @@ -795,7 +817,8 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, > > poly_uint64 bitnum, > >/* If the target is a register, o
[PATCH] rtl-ssa: Rewrite process_uses_of_deleted_def [PR120745]
process_uses_of_deleted_def seems to have been written on the assumption that non-degenerate phis would be explicitly deleted by an insn_change, and that the function therefore only needed to delete degenerate phis. But that was inconsistent with the rest of the code, and wouldn't be very convenient in any case. This patch therefore rewrites process_uses_of_deleted_def to handle general phis. I'm not aware that this fixes an issues in current code, but it is needed to enable the rtl-ssa dce work that Ondřej and Honza are working on. Tested on aarch64-linux-gnu. OK to install? Richard gcc/ PR rtl-optimization/120745 * rtl-ssa/changes.cc (process_uses_of_deleted_def): Rewrite to handle deletions of non-degenerate phis. --- gcc/rtl-ssa/changes.cc | 36 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc index f7aa6a66cdf..01f9c383b23 100644 --- a/gcc/rtl-ssa/changes.cc +++ b/gcc/rtl-ssa/changes.cc @@ -258,28 +258,40 @@ rtl_ssa::changes_are_worthwhile (array_slice changes, void function_info::process_uses_of_deleted_def (set_info *set) { - if (!set->has_any_uses ()) -return; - - auto *use = *set->all_uses ().begin (); - do + // Each member of the worklist is either SET or a dead phi. + auto_vec worklist; + worklist.quick_push (set); + while (!worklist.is_empty ()) { - auto *next_use = use->next_use (); + auto *this_set = worklist.pop (); + auto *use = this_set->first_use (); + if (!use) + { + if (this_set != set) + delete_phi (as_a (this_set)); + continue; + } if (use->is_in_phi ()) { - // This call will not recurse. - process_uses_of_deleted_def (use->phi ()); - delete_phi (use->phi ()); + // Removing all uses from the phi ensures that we'll only add + // it to the worklist once. + auto *phi = use->phi (); + for (auto *input : phi->inputs ()) + { + remove_use (input); + input->set_def (nullptr); + } + worklist.safe_push (phi); } else { gcc_assert (use->is_live_out_use ()); remove_use (use); } - use = next_use; + // The phi handling above might have removed multiple uses of this_set. + if (this_set->has_any_uses ()) + worklist.safe_push (this_set); } - while (use); - gcc_assert (!set->has_any_uses ()); } // Update the REG_NOTES of INSN, whose pattern has just been changed. -- 2.43.0
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On 6/24/25 7:22 AM, Jakub Jelinek wrote: Hi! The following patch attempts to implement the C++26 P3533R2 - constexpr virtual inheritance paper. The changes include not rejecting it for C++26, tweaking the error wording to show that it is valid in C++26, adjusting synthesized_method_walk not to make synthetized cdtors non-constexpr just because of virtual base classes in C++26 and various tweaks in constexpr.cc so that it can deal with the expressions used for virtual base member accesses or cdtor calls which need __in_chrg and/or __vtt_parm arguments to be passed in some cases implicitly when they aren't passed explicitly. There are two places where I'm not sure what to do: 1) one can be seen on the constexpr-ice21.C testcase: struct NoMut1 { int a, b; }; struct NoMut3 : virtual NoMut1 { constexpr NoMut3(int a, int b) : NoMut1{a, b} {} }; void mutable_subobjects() { constexpr NoMut3 nm3 = {1, 2}; struct A { void f() { static_assert(nm3.a == 1, ""); // ERROR here: "local variable" } }; } The two other errors on the testcase are expectedly gone with C++26, but the last one remains. The problem is that when parsing nm3.a inside of mutable_subobjects()::A::f() build_class_member_access_expr calls build_base_path which calls cp_build_addr_expr and that makes nm3 odr-used. I must say I have no idea whether nm3 ought to be odr-used or not just because of nm3.a use and if not, how that should be changed. build_simple_base_path is how we avoid this odr-use; seems we also need to use it early in the case of (v_binfo && !virtual_access). Plus whether the fact if nm3.a is odr-use or not is somehow affected by the (so far unimplemented) part of P2686R4 paper. 2) another one can be seen on the constexpr-dynamic10.C testcase struct C { virtual void a(); }; struct B { virtual void b(); }; struct A : virtual B, C { virtual void c(); }; constexpr A a; constexpr bool b1 = (dynamic_cast((B&)a), false); // ERROR here "reference 'dynamic_cast' failed" I think the error is incorrect here, because struct C { virtual void a(); }; struct B { virtual void b(); }; struct A : virtual B, C { virtual void c(); }; A a; bool b1 = (dynamic_cast((B&)a), false); int main () { C &c = dynamic_cast((B&)a); C &c2 = dynamic_cast(a); } works at runtime. In the patch I've adjusted the function comment of cxx_eval_dynamic_cast_fn because with virtual bases I believe hint -1 might be possible, though I'm afraid I don't Yes, we would get -1 for dynamic_cast from B to A. know enough about dynamic_cast and cxx_eval_dynamic_cast_fn to figure out what needs to change there. It is hint -2 that fails, not hint -1. Yes, this is a -2 case because C does not derive from B. How does cxx_eval_dynamic_cast_fn fail in this case? From looking at the function it seems like it ought to work. Jason
Re: Do not drop discriminator when inlining
On Tue, 24 Jun 2025, Jan Hubicka wrote: > > On Sun, 22 Jun 2025, Jan Hubicka wrote: > > > > > Hi, > > > auto-fdo is currently confused by a fact that all inlined functions get > > > locators with 0 discriminator, so it is not bale to distinguish multiple > > > inlined calls from single line. > > > > > > Discriminator is lost by calling LOCATION_LOCUS before copying it from > > > former call statement. I believe this is only intended to drop block > > > annotations. > > > > > > Bootstrapped/regtested x86_64-linux, OK? > > > > > > gcc/ChangeLog: > > > > > > * tree-inline.cc (expand_call_inline): Preserve discriminator. > > > > > > diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc > > > index dee2dfc2620..fa2641780a5 100644 > > > --- a/gcc/tree-inline.cc > > > +++ b/gcc/tree-inline.cc > > > @@ -5014,6 +5014,9 @@ expand_call_inline (basic_block bb, gimple *stmt, > > > copy_body_data *id, > > >/* We do want to assign a not UNKNOWN_LOCATION > > > BLOCK_SOURCE_LOCATION > > > to make inlined_function_outer_scope_p return true on this > > > BLOCK. */ > > >location_t loc = LOCATION_LOCUS (gimple_location (stmt)); > > > + if (loc != UNKNOWN_LOCATION) > > > + loc = location_with_discriminator > > > + (loc, get_discriminator_from_loc (gimple_location (stmt))); > > > > So this doesn't preserve the discriminator on UNKNOWN_LOCATION? Don't > > you maybe want > > > > if (has_discriminator (gimple_location (stmt))) > > loc = location_with_discriminator (loc, get_disc > > > > ? Also ... > > > > >if (loc == UNKNOWN_LOCATION) > > > loc = LOCATION_LOCUS (DECL_SOURCE_LOCATION (fn)); > > >if (loc == UNKNOWN_LOCATION) > > > > ... these no longer trigger when there's a discriminator, but we do > > want a != UNKNOWN_LOCATION LOCATION_LOCUS as the comment says. So > > better apply the discriminator last? > That is why I checked for loc != UNKNOWN_LOCATION. I did not expect > UNKNOWN_LOCATION to have discriminators. What they are good for? I have no idea, this was simply a defensive review where it's no longer obvious that inlined_function_outer_scope_p would still work in all cases. > I am re-testing the change as suggested. I will commit it if tesitng > succeds (I am in China with somewhat limited SSH access and I would like > the lnt testers to pick that change soon, so I know what benchmarks I > need to analyze for afdo regressions.) LGTM. > Thanks, > Honza > > gcc/ChangeLog: > > * tree-inline.cc (expand_call_inline): Copy discriminators. > > diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc > index dee2dfc2620..7e0ac698e5e 100644 > --- a/gcc/tree-inline.cc > +++ b/gcc/tree-inline.cc > @@ -5018,6 +5018,9 @@ expand_call_inline (basic_block bb, gimple *stmt, > copy_body_data *id, > loc = LOCATION_LOCUS (DECL_SOURCE_LOCATION (fn)); >if (loc == UNKNOWN_LOCATION) > loc = BUILTINS_LOCATION; > + if (has_discriminator (gimple_location (stmt))) > + loc = location_with_discriminator > + (loc, get_discriminator_from_loc (gimple_location (stmt))); >id->block = make_node (BLOCK); >BLOCK_ABSTRACT_ORIGIN (id->block) = DECL_ORIGIN (fn); >BLOCK_SOURCE_LOCATION (id->block) = loc; > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
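For context, a small example of why the discriminator matters for auto-FDO (illustrative only, not a testcase from the patch):

// Both calls to f are inlined from the same source line of g; without a
// discriminator on the inlined statements' locations, auto-FDO cannot
// attribute samples to the first versus the second inline site.
static inline int f (int x) { return x + 1; }

int g (int a) { return f (a) + f (a * 2); }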
[PATCH v6 5/9] AArch64: make `far_branch` attribute a boolean
The `far_branch` attribute only ever takes the values 0 or 1, so make it a `no/yes` valued string attribute instead. gcc/ChangeLog: * config/aarch64/aarch64.md (far_branch): Replace 0/1 with no/yes. (aarch64_bcond): Handle rename. (aarch64_cbz1): Likewise. (*aarch64_tbz1): Likewise. (@aarch64_tbz): Likewise. --- gcc/config/aarch64/aarch64.md | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index fffa97a1ef1..0a378ab377d 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -569,9 +569,7 @@ (define_attr "enabled" "no,yes" ;; Attribute that specifies whether we are dealing with a branch to a ;; label that is far away, i.e. further away than the maximum/minimum ;; representable in a signed 21-bits number. -;; 0 :=: no -;; 1 :=: yes -(define_attr "far_branch" "" (const_int 0)) +(define_attr "far_branch" "no,yes" (const_string "no")) ;; Attribute that specifies whether the alternative uses MOVPRFX. (define_attr "movprfx" "no,yes" (const_string "no")) @@ -795,8 +793,8 @@ (define_insn "aarch64_bcond" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 2) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) ;; For a 24-bit immediate CST we can optimize the compare for equality @@ -860,8 +858,8 @@ (define_insn "aarch64_cbz1" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 2) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ` @@ -875,7 +873,7 @@ (define_insn "*aarch64_tbz1" { if (get_attr_length (insn) == 8) { - if (get_attr_far_branch (insn) == 1) + if (get_attr_far_branch (insn) == FAR_BRANCH_YES) return aarch64_gen_far_branch (operands, 1, "Ltb", "\\t%0, , "); else @@ -904,8 +902,8 @@ (define_insn "*aarch64_tbz1" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 1) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) ;; --- @@ -969,8 +967,8 @@ (define_insn "@aarch64_tbz" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 2) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) -- 2.45.2
Re: Do not drop discriminator when inlining
On Tue, 24 Jun 2025, Jan Hubicka wrote: > > > That is why I checked for loc != UNKNOWN_LOCATION. I did not expect > > > UNKNOWN_LOCATION to have discriminators. What are they good for? > > > > I have no idea, this was simply a defensive review where it's no > > longer obvious that inlined_function_outer_scope_p would still work > > in all cases. > > Understood. I am not too familiar with the discriminator implementation, > but it seems that afdo is actually quite a useful testsuite for profile > info, so I suppose I will learn. > > Before inline stacks were applied correctly, it was kind of useless to > debug other issues. Now the profiles seem much more sane, at least when > inlining at instrumentation time matches inlining at profile use time. > > What seems to be common now is profile breakage around loops that have > been fully unrolled or vectorized, which is a bit understandable, though I > wonder if we can improve here. I think we can fix the problem where the profile > of loop header stmts is partly or fully lost (which seems to be the main > issue now that prevents loop optimization, since then loop headers look > cold). I suppose this can be fixed by making sure the debug statement > is duplicated into the loop variants. > > However I wonder if we can preserve the info that after vectorization one > vectorized stmt actually performs multiple original stmts... I think the best we can possibly do is to preserve the original scalar control IV via debug stmts. Of course IVOPTs will likely mess that up. Currently the vectorizer invents new control IVs for each loop it generates; I'd like to see it use a single downward-counting IV tracking remaining scalar iterations throughout the main vector loop and the prologue/epilogues, which should also simplify the code a bit. OTOH it will likely make IVOPTs' job a bit harder (since that only considers one loop at a time). OTOH, there'll be N active scalar IV values at each point, so maybe this doesn't help at all without some extra dwarf/consumer support. Richard.
Re: [PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest
> Am 25.06.2025 um 07:32 schrieb H.J. Lu : > > On Wed, Jun 25, 2025 at 1:11 PM Hongtao Liu wrote: >> >>> On Wed, Jun 25, 2025 at 1:06 PM H.J. Lu wrote: >>> >>> -mtune=intel is used to generate a single binary to run well on both big >>> core and small core, similar to hybrid CPUs. Update -mtune=intel to tune >>> for Diamond Rapids and Clearwater Forest, instead of Silvermont. >>> >>> PR target/120815 >>> * common/config/i386/i386-common.cc (processor_alias_table): >>> Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for >>> PROCESSOR_INTEL. >>> * config/i386/i386-options.cc (processor_cost_table): Replace >>> intel_cost with alderlake_cost. >>> * config/i386/x86-tune-costs.h (intel_cost): Removed. >>> * config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat >>> PROCESSOR_INTEL like PROCESSOR_ALDERLAKE. >>> (ix86_adjust_cost): Likewise. >>> * doc/invoke.texi: Update -mtune=intel for Diamond Rapids and >>> Clearwater Forest. >>> >>> OK for master? >> Ok. > > Should it be backported to release branches? I think that’s disruption we want to avoid there. The 15 branch could eventually be considered. Richard > > > -- > H.J.
Re: [Patch, Fortran, Coarray, PR88076, v1] 6/6 Add a shared memory multi process coarray library.
Hi Damian, hi Steve, enabling coarray support by default has implications we need to consider. The memory footprint of a coarray-enabled program is larger than that of a non-coarray one. This is simply because the coarray token needs to be stored somewhere. Furthermore, I just yesterday figured out that with -fcoarray=single the space for the token was allocated. I.e. every data structure that could possibly be stored in a coarray and had allocatable components in it wasted another 8 bytes for an unused pointer. So when we default to having coarray support enabled, some work needs to be done to remove such inefficiencies. Given there are only a few developers that work on coarrays, this may take some time. What we can of course do is switch on the coarray mode when we detect the first coarray construct, and no longer require the user to do it. I hope this does not have too many implications and causes only a handful of bugs. For the time being, I propose to first give the new coarray implementation some time to mature and be tested. There will be bugs, because nobody is perfect. @Steve caf_shmem does not use MPI. It is a shared memory, single node, multi process approach. Just to prevent any misunderstanding. Thanks for all the testing. Regards, Andre On Tue, 24 Jun 2025 11:13:52 -0700 Steve Kargl wrote: > Damian, > > I submitted a patch a long time ago to make -fcoarray=single the > default behavior. The patch made -fcoarray=none a NOP. With > inclusion of a shmem implementation of the runtime parts, this > might be the way to go. I'll leave that decision to Andre, Thomas, > and Nicolas. > > I believe that the gfortran contributors have not considered > coarray as an optional add-on. The problem for gfortran is > that it runs on dozens of CPUs and dozens upon dozens of > operating systems. The few gfortran contributors simply cannot > ensure that opencoarray+mpich or opencoarray+openmpi runs on > all of the possible combinations of hardware and OS's. Andre > has hinted that he expects some rough edges on non-linux system. > I'll find out this weekend when I give his patch a spin on > FreeBSD. Hopefully, a windows10/11 user can test the patch. > -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]
On 6/24/25 10:16 AM, Nathaniel Shead wrote: On Tue, Jun 24, 2025 at 01:03:53PM +0200, Jakub Jelinek wrote: Hi! The following patch implements the P3618R0 paper by tweaking pedwarn condition, adjusting pedwarn wording, adjusting one testcase and adding 4 new ones. The paper was voted in as DR, so it isn't guarded on C++ version. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-06-24 Jakub Jelinek PR c++/120773 * decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching main to the global module. Only pedwarn for current_lang_name other than lang_name_cplusplus and adjust pedwarn wording. * g++.dg/parse/linkage5.C: Don't expect error on extern "C++" int main ();. * g++.dg/parse/linkage7.C: New test. * g++.dg/parse/linkage8.C: New test. * g++.dg/modules/main-2.C: New test. * g++.dg/modules/main-3.C: New test. --- gcc/cp/decl.cc.jj 2025-06-19 08:55:04.408676724 +0200 +++ gcc/cp/decl.cc 2025-06-23 17:47:13.942011687 +0200 @@ -11326,9 +11326,9 @@ grokfndecl (tree ctype, "cannot declare %<::main%> to be %qs", "consteval"); if (!publicp) error_at (location, "cannot declare %<::main%> to be static"); - if (current_lang_depth () != 0) + if (current_lang_name != lang_name_cplusplus) pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a" -" linkage specification"); +" linkage specification other than %<\"C++\"%>"); if (module_attach_p ()) error_at (location, "cannot attach %<::main%> to a named module"); Maybe it would be nice to add a note/fixit that users can now work around this error by marking main as 'extern "C++"'? But overall LGTM. I suppose we could say "other than %" to make that a little clearer. OK with that tweak. I wouldn't object to a fixup but it sounds more complicated than it's worth to have different fixups for the extern "C" { int main(); } and extern "C" int main(); cases. Jason
Re: [PATCH v4] x86: Extend the remove_redundant_vector pass
On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu wrote: > > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts > > > > > from > > > > > constant and variable scalars. When broadcasting from constants and > > > > > function arguments, we can place a single widest vector broadcast at > > > > > entry of the nearest common dominator for basic blocks with all uses > > > > > since constants and function arguments aren't changed. For broadcast > > > > > from variables with a single definition, the single definition is > > > > > replaced with the widest broadcast. > > > > > > > > > > gcc/ > > > > > > > > > > PR target/92080 > > > > > * config/i386/i386-expand.cc (ix86_expand_call): Set > > > > > recursive_function to true for recursive call. > > > > > * config/i386/i386-features.cc (ix86_place_single_vector_set): > > > > > Add an argument for inner scalar, default to nullptr. Set the > > > > > source from inner scalar if not nullptr. > > > > > (ix86_get_vector_load_mode): Renamed to ... > > > > > (ix86_get_vector_cse_mode): This. Add an argument for scalar > > > > > mode > > > > > and handle integer and float scalar modes. > > > > > (replace_vector_const): Add an argument for scalar mode and > > > > > pass > > > > > it to ix86_get_vector_load_mode. > > > > > (x86_cse_kind): New. > > > > > (redundant_load): Likewise. > > > > > (ix86_broadcast_inner): Likewise. > > > > > (remove_redundant_vector_load): Also support const0_rtx and > > > > > constm1_rtx broadcasts. Handle vector broadcasts from > > > > > constant > > > > > and variable scalars. > > > > > * config/i386/i386.h (machine_function): Add > > > > > recursive_function. > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to > > > > > expect > > > > > movdqa instead pxor. > > > > > * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise. > > > > > * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise. > > > > > * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise. > > > > > * gcc.target/i386/pr92080-4.c: New test. > > > > > * gcc.target/i386/pr92080-5.c: Likewise. > > > > > * gcc.target/i386/pr92080-6.c: Likewise. > > > > > * gcc.target/i386/pr92080-7.c: Likewise. > > > > > * gcc.target/i386/pr92080-8.c: Likewise. > > > > > * gcc.target/i386/pr92080-9.c: Likewise. > > > > > * gcc.target/i386/pr92080-10.c: Likewise. > > > > > * gcc.target/i386/pr92080-11.c: Likewise. > > > > > * gcc.target/i386/pr92080-12.c: Likewise. > > > > > * gcc.target/i386/pr92080-13.c: Likewise. > > > > > * gcc.target/i386/pr92080-14.c: Likewise. > > > > > * gcc.target/i386/pr92080-15.c: Likewise. > > > > > * gcc.target/i386/pr92080-16.c: Likewise. > > > > > > > > > > Signed-off-by: H.J. 
Lu > > > > > --- > > > > > gcc/config/i386/i386-expand.cc| 3 + > > > > > gcc/config/i386/i386-features.cc | 410 > > > > > ++ > > > > > gcc/config/i386/i386.h| 3 + > > > > > .../i386/keylocker-aesdecwide128kl.c | 14 +- > > > > > .../i386/keylocker-aesdecwide256kl.c | 14 +- > > > > > .../i386/keylocker-aesencwide128kl.c | 14 +- > > > > > .../i386/keylocker-aesencwide256kl.c | 14 +- > > > > > gcc/testsuite/gcc.target/i386/pr92080-10.c| 13 + > > > > > gcc/testsuite/gcc.target/i386/pr92080-11.c| 33 ++ > > > > > gcc/testsuite/gcc.target/i386/pr92080-12.c| 16 + > > > > > gcc/testsuite/gcc.target/i386/pr92080-13.c| 32 ++ > > > > > gcc/testsuite/gcc.target/i386/pr92080-14.c| 31 ++ > > > > > gcc/testsuite/gcc.target/i386/pr92080-15.c| 25 ++ > > > > > gcc/testsuite/gcc.target/i386/pr92080-16.c| 26 ++ > > > > > gcc/testsuite/gcc.target/i386/pr92080-4.c | 50 +++ > > > > > gcc/testsuite/gcc.target/i386/pr92080-5.c | 109 + > > > > > gcc/testsuite/gcc.target/i386/pr92080-6.c | 19 + > > > > > gcc/testsuite/gcc.target/i386/pr92080-7.c | 20 + > > > > > gcc/testsuite/gcc.target/i386/pr92080-8.c | 16 + > > > > > gcc/testsuite/gcc.target/i386/pr92080-9.c | 81 > > > > > 20 files changed, 823 insertions(+), 120 deletions(-) > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-10.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-11.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-12.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-13
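The new pr92080 tests referenced above are not shown here, but the shape of code the extended pass targets can be sketched in plain C with x86 intrinsics. This is only an illustrative example under the description above (broadcasts of the same function argument at two widths), not one of the actual testcases, and it assumes a target where both AVX2 and AVX512F widths are enabled:

#include <immintrin.h>

__m256i d1;
__m512i d2;

/* Two broadcasts of the same scalar argument at different widths.
   As described in the patch summary, the pass is meant to emit a
   single widest (512-bit) broadcast at a dominating point and reuse
   it for the narrower use as well, since function arguments do not
   change.  */
void
foo (int x)
{
  d1 = _mm256_set1_epi32 (x);
  d2 = _mm512_set1_epi32 (x);
}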
Re: [PATCH v6 2/3] Use the counted_by attribute of pointers in builtinin-object-size.
> On Jun 24, 2025, at 03:26, Richard Biener wrote: > > On Mon, Jun 23, 2025 at 4:44 PM Qing Zhao wrote: >> >> gcc/ChangeLog: >> >>* tree-object-size.cc (access_with_size_object_size): Update comments >>for pointers with .ACCESS_WITH_SIZE. >>(collect_object_sizes_for): Propagate size info through GIMPLE_ASSIGN >>for pointers with .ACCESS_WITH_SIZE. >> >> gcc/testsuite/ChangeLog: >> >>* gcc.dg/pointer-counted-by-4-char.c: New test. >>* gcc.dg/pointer-counted-by-4-float.c: New test. >>* gcc.dg/pointer-counted-by-4-struct.c: New test. >>* gcc.dg/pointer-counted-by-4-union.c: New test. >>* gcc.dg/pointer-counted-by-4.c: New test. >>* gcc.dg/pointer-counted-by-5.c: New test. >>* gcc.dg/pointer-counted-by-6.c: New test. >>* gcc.dg/pointer-counted-by-7.c: New test. >> --- >> .../gcc.dg/pointer-counted-by-4-char.c| 6 ++ >> .../gcc.dg/pointer-counted-by-4-float.c | 6 ++ >> .../gcc.dg/pointer-counted-by-4-struct.c | 10 +++ >> .../gcc.dg/pointer-counted-by-4-union.c | 10 +++ >> gcc/testsuite/gcc.dg/pointer-counted-by-4.c | 77 +++ >> gcc/testsuite/gcc.dg/pointer-counted-by-5.c | 56 ++ >> gcc/testsuite/gcc.dg/pointer-counted-by-6.c | 54 + >> gcc/testsuite/gcc.dg/pointer-counted-by-7.c | 30 >> gcc/tree-object-size.cc | 18 - >> 9 files changed, 264 insertions(+), 3 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4-char.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4-float.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4-struct.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4-union.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-5.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-6.c >> create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-7.c >> >> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-4-char.c >> b/gcc/testsuite/gcc.dg/pointer-counted-by-4-char.c >> new file mode 100644 >> index 000..c404e5b8cce >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-4-char.c >> @@ -0,0 +1,6 @@ >> +/* Test the attribute counted_by for pointer field and its usage in >> + * __builtin_dynamic_object_size. */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> +#define PTR_TYPE char >> +#include "pointer-counted-by-4.c" >> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-4-float.c >> b/gcc/testsuite/gcc.dg/pointer-counted-by-4-float.c >> new file mode 100644 >> index 000..383d8fb656d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-4-float.c >> @@ -0,0 +1,6 @@ >> +/* Test the attribute counted_by for pointer field and its usage in >> + * __builtin_dynamic_object_size. */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> +#define PTR_TYPE float >> +#include "pointer-counted-by-4.c" >> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-4-struct.c >> b/gcc/testsuite/gcc.dg/pointer-counted-by-4-struct.c >> new file mode 100644 >> index 000..50246d29477 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-4-struct.c >> @@ -0,0 +1,10 @@ >> +/* Test the attribute counted_by for pointer field and its usage in >> + * __builtin_dynamic_object_size. 
*/ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> +struct A { >> + int a; >> + char *b; >> +}; >> +#define PTR_TYPE struct A >> +#include "pointer-counted-by-4.c" >> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-4-union.c >> b/gcc/testsuite/gcc.dg/pointer-counted-by-4-union.c >> new file mode 100644 >> index 000..e786d996147 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-4-union.c >> @@ -0,0 +1,10 @@ >> +/* Test the attribute counted_by for pointer field and its usage in >> + * __builtin_dynamic_object_size. */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> +union A { >> + int a; >> + float b; >> +}; >> +#define PTR_TYPE union A >> +#include "pointer-counted-by-4.c" >> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-4.c >> b/gcc/testsuite/gcc.dg/pointer-counted-by-4.c >> new file mode 100644 >> index 000..11ae6288030 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-4.c >> @@ -0,0 +1,77 @@ >> +/* Test the attribute counted_by for pointer field and its usage in >> + * __builtin_dynamic_object_size. */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "builtin-object-size-common.h" >> +#ifndef PTR_TYPE >> +#define PTR_TYPE int >> +#endif >> +struct pointer_array { >> + int b; >> + PTR_TYPE *c; >> +} *p_array; >> + >> +struct annotated { >> + PTR_TYPE *c __attribute__ ((counted_by (b))); >> + int b; >> +} *p_array_annotated; >> + >> +struct nested_annotated { >> + PTR_TYPE *c __attribute__ ((counted_
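Outside of the testsuite macros quoted above, the feature being exercised can be illustrated with a small standalone sketch (a hypothetical example, not one of the new tests), using the same attribute spelling as the struct annotated declaration in the test:

#include <stdlib.h>

struct annotated {
  int *c __attribute__ ((counted_by (b)));
  int b;
};

/* With counted_by on the pointer field, the compiler can derive the
   pointed-to object size from the count member, so the dynamic object
   size query below can resolve to b * sizeof (int) instead of being
   unknown.  */
size_t
bytes_of_c (struct annotated *p)
{
  return __builtin_dynamic_object_size (p->c, 1);
}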
Re: [PATCH] libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]
On Tue, Jun 24, 2025 at 9:38 AM Tomasz Kamiński wrote: > For month_day we incorrectly reported day information to be available, > which lead > to format_error being thrown from the call to formatter::format at > runtime, instead > of making call to format ill-formed. > > The included test cover most of the combinations of _ChronoParts and format > specifiers. > > libstdc++-v3/ChangeLog: > > * include/bits/chrono_io.h > (formatter::parse): Call _M_parse > with > only Month being available. > * testsuite/std/time/format/data_not_present_neg.cc: New test. > --- > I want to merged the data_not_present_neg.cc, as my type erasing > implementation > relies on detection durin parsing. > Testing on x86_64-linux. std/time/format* tests passed. > OK for trunk when all test passes? And chrono_io only change for v15? > > libstdc++-v3/include/bits/chrono_io.h | 3 +- > .../std/time/format/data_not_present_neg.cc | 163 ++ > 2 files changed, 164 insertions(+), 2 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > > diff --git a/libstdc++-v3/include/bits/chrono_io.h > b/libstdc++-v3/include/bits/chrono_io.h > index abbf4efcc3b..4eb00f4932d 100644 > --- a/libstdc++-v3/include/bits/chrono_io.h > +++ b/libstdc++-v3/include/bits/chrono_io.h > @@ -2199,8 +2199,7 @@ namespace __format >constexpr typename basic_format_parse_context<_CharT>::iterator >parse(basic_format_parse_context<_CharT>& __pc) >{ > - return _M_f._M_parse(__pc, __format::_Month|__format::_Day, > -__defSpec); > + return _M_f._M_parse(__pc, __format::_Month, __defSpec); >} > >template > diff --git > a/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > new file mode 100644 > index 000..bcc943b86ad > --- /dev/null > +++ b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > @@ -0,0 +1,163 @@ > +// { dg-do compile { target c++20 } } > + > +#include > +#include > + > +using namespace std::chrono; > + > +auto d1 = std::format("{:%w}", 10d); // { dg-error "call to consteval > function" } > +auto d2 = std::format("{:%m}", 10d); // { dg-error "call to consteval > function" } > +auto d3 = std::format("{:%y}", 10d); // { dg-error "call to consteval > function" } > +auto d4 = std::format("{:%F}", 10d); // { dg-error "call to consteval > function" } > +auto d5 = std::format("{:%T}", 10d); // { dg-error "call to consteval > function" } > +auto d6 = std::format("{:%Q}", 10d); // { dg-error "call to consteval > function" } > +auto d7 = std::format("{:%Z}", 10d); // { dg-error "call to consteval > function" } > + > +auto w1 = std::format("{:%d}", Thursday); // { dg-error "call to > consteval function" } > +auto w2 = std::format("{:%m}", Thursday); // { dg-error "call to > consteval function" } > +auto w3 = std::format("{:%y}", Thursday); // { dg-error "call to > consteval function" } > +auto w4 = std::format("{:%F}", Thursday); // { dg-error "call to > consteval function" } > +auto w5 = std::format("{:%T}", Thursday); // { dg-error "call to > consteval function" } > +auto w6 = std::format("{:%Q}", Thursday); // { dg-error "call to > consteval function" } > +auto w7 = std::format("{:%Z}", Thursday); // { dg-error "call to > consteval function" } > + > +auto wi1 = std::format("{:%d}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi2 = std::format("{:%m}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi3 = std::format("{:%y}", Thursday[2]); // { dg-error "call to > consteval 
function" } > +auto wi4 = std::format("{:%F}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi5 = std::format("{:%T}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi6 = std::format("{:%Q}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi7 = std::format("{:%Z}", Thursday[2]); // { dg-error "call to > consteval function" } > + > +auto wl1 = std::format("{:%d}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl2 = std::format("{:%m}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl3 = std::format("{:%y}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl4 = std::format("{:%F}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl5 = std::format("{:%T}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl6 = std::format("{:%Q}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl7 = std::format("{:%Z}", Thursday[last]); // { dg-error "call to > consteval function" } > + > +auto m1 = std::format("{:%d}", January); // { dg-error "call to consteval > function" } > +auto m2 = std::format("{:%w}", January); // { dg-error "call to consteval > function" } > +auto m3 =
Re: [PATCH] s390: Add some missing vector patterns.
On Tue, Jun 24, 2025 at 09:49:01AM +0200, Juergen Christ wrote: > Some patterns that are detected by the autovectorizer can be supported by > s390. Add expanders such that autovectorization of these patterns works. > > Bootstrapped and regtested on s390. Ok for trunk? > > gcc/ChangeLog: > > * config/s390/vector.md (avg3_ceil): New pattern. > (uavg3_ceil): New pattern. > (smul3_highpart): New pattern. > (umul3_highpart): New pattern. > > gcc/testsuite/ChangeLog: > > * gcc.target/s390/vector/pattern-avg-1.c: New test. > * gcc.target/s390/vector/pattern-mulh-1.c: New test. > > Signed-off-by: Juergen Christ > --- > gcc/config/s390/vector.md | 28 ++ > .../gcc.target/s390/vector/pattern-avg-1.c| 26 + > .../gcc.target/s390/vector/pattern-mulh-1.c | 29 +++ > 3 files changed, 83 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c > create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c > > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > index 6f4e1929eb80..16f4b8116432 100644 > --- a/gcc/config/s390/vector.md > +++ b/gcc/config/s390/vector.md > @@ -3576,3 +3576,31 @@ > ; vec_unpacks_float_lo > ; vec_unpacku_float_hi > ; vec_unpacku_float_lo > + > +(define_expand "avg3_ceil" > + [(set (match_operand:VIT_HW_VXE3_T0 > "register_operand" "=v") > + (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 > "register_operand" "v") > +(match_operand:VIT_HW_VXE3_T 2 > "register_operand" "v")] > + UNSPEC_VEC_AVG))] > + "TARGET_VX") > + > +(define_expand "uavg3_ceil" > + [(set (match_operand:VIT_HW_VXE3_T0 > "register_operand" "=v") > + (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 > "register_operand" "v") > +(match_operand:VIT_HW_VXE3_T 2 > "register_operand" "v")] > + UNSPEC_VEC_AVGU))] > + "TARGET_VX") > + > +(define_expand "smul3_highpart" > + [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" > "=v") > + (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 > "register_operand" "v") > + (match_operand:VIT_HW_VXE3_DT 2 > "register_operand" "v")] > +UNSPEC_VEC_SMULT_HI))] > + "TARGET_VX") > + > +(define_expand "umul3_highpart" > + [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" > "=v") > + (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 > "register_operand" "v") > + (match_operand:VIT_HW_VXE3_DT 2 > "register_operand" "v")] > +UNSPEC_VEC_UMULT_HI))] > + "TARGET_VX") In commit r12-4231-g555fa3545efe23 RTX smul_highpart and umul_highpart were introduced which we could use instead of the unspec, now. So one solution would be to move vec_smulh/vec_umulh from vx-builtins.md to vector.md and rename those to smul3_highpart/umul3_highpart and then making sure that those are used in s390-builtins.def. Of course, replacing the unspec by the corresponding RTXs', too. Sorry for bothering with this. But I think it is worthwhile to replace those unspecs. 
Thanks, Stefan > diff --git a/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c > b/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c > new file mode 100644 > index ..a15301aabe54 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -mzarch -march=z16 -ftree-vectorize > -fdump-tree-optimized" } */ > + > +#define TEST(T1,T2,N) \ > + void \ > + avg##T1 (signed T1 *__restrict res, signed T1 *__restrict a, \ > + signed T1 *__restrict b) \ > + { \ > +for (int i = 0; i < N; ++i) \ > + res[i] = ((signed T2)a[i] + b[i] + 1) >> 1; \ > + } \ > +\ > + void \ > + uavg##T1 (unsigned T1 *__restrict res, unsigned T1 *__restrict a, \ > +unsigned T1 *__restrict b) \ > + { \ > +for (int i = 0; i < N; ++i) \ > + res[i] = ((unsigned T2)a[i] + b[i] + 1) >> 1; \ > + } > + > +TEST(char,short,16) > +TEST(short,int,8) > +
Re: [PATCH, 4 of 4] Use vector pair for memory operations with -mcpu=future
Hi Mike, On 24/06/25 10:03 am, Michael Meissner wrote: > On Fri, Jun 20, 2025 at 01:19:45PM -0500, Segher Boessenkool wrote: >> Hi! >> >> On Fri, Jun 20, 2025 at 10:38:30PM +0530, Surya Kumari Jangala wrote: >>> On 14/06/25 2:13 pm, Michael Meissner wrote: This is patch #4 of 4 to add -mcpu=future support to the PowerPC. >>> >>> I think this should be a separate patch in itself. As such, this >>> patch is not required to enable the -mcpu=future option. >> >> It can in theory be helpful to have it in the same series, but yeah, it >> certainly does not belong here. It should be a separate patch, and it >> should come with some evidence or at the very least some indication that >> it would be a good idea to have it at all, and proof that is not a *bad* >> idea! > > Sure, I can separate it, as there are other patches waiting for -mcpu=future > support to go in. The important thing is just having a switch to generate > these new instructions. > In the development for the power10 processor, GCC did not enable using the load vector pair and store vector pair instructions when optimizing things like >>> >>> s/things/functions >> >> "Things" is nicely non-specific, hehe. >> * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using >>> >>> Just FUTURE_MASKS_SERVER Just to clarify, what I meant here is that the code does not have any macro named ISA_FUTURE_MASKS_SERVER. There is only FUTURE_MASKS_SERVER, and the commit message should specify this macro. Regards, Surya >> >> The existing masks are ISA_3_1_MASKS_SERVER (and many older ISAs before >> it), and POWER11_MASKS_SERVER . We do not have to call ISA 3.2 >> "Future", certainly not by IBM's lawyers, it isn't IBM who will publish >> Power Architecture revisions anyway! > > But it may be ISA_4_0 and not ISA_3_2 or maybe ISA_50. But we in GCC land > (and > those in LLVM land) have to have some name to use before the 'official' name > is > chosen. But as hardware gells, we need to get some support into GCC so that > it > ultimately goes in the Linux distributions. > > I have another set of patches that separates flag bits from ISA bits that I > have posted several times. But I'm trying to break the logjam to get patches > in, and I was proritizing getting -mcpu=future in first. > > Patches are like an agile system, in that generally people don't want a > sweeping set of changes. But instead they do a step-wise improvement. We > will > need at some point to add in changes that may or may not be in future PowerPC > architectures. > > Future is a convenient way to have these possible changes. We did as a > staging > method for power7, power8, power9, and power10 systems. I don't recall why we > deleted the -mcpu=future support, but these patches provide a way for future > patches to add possible experimental or future support. > > For example, when I worked at AMD, we had a massive set of changes that I did > and were put in. I was working with the hardware team providing compilers and > such. However, the company ultimately decided to go in a different direction, > and all of these 'future' patches were later removed. > >> Yeah, ISA_FUTURE makes no sense in the first place, "Future" here is a >> stand-in for the marketing name for the next IBM Power Server chip. The >> (lawyers') fear is that if we publish the expected name for the next >> generation server CPU, and also GCC support for that CPU, that then some >> potential customers can argue in the future (har har) that that was a >> promise. 
So we call it "Future", no specific version or timespan, and >> of course we cannot really predict the future, and future plans can >> always change, too. >> >> You can expect that in the future (when things have settled) we will >> just do a tree-wide search and replace. > > Sure, that happens. It is a simple mechanical process. But if/when that > machine comes out, we should leave in -mcpu=future to allow for development > beyond that machines. > >> * gcc/config/rs6000/rs6000.cc (rs6000_machine_from_flags): Disable -mblock-ops-vector-pair from influcing .machine selection. >>> >>> nit: "influencing" >> >> Speling fixes are never a nit! Attention to details is important. >> >>> Also, in rs6000.opt, mblock-ops-vector-pair is marked as Undocumented. >>> Should we >>> change this? >> >> Probably yes. If the option is worth being user-selectable at all, we >> should document it. > > -mblock-ops-vector-pair was a 'quick' hack to disable the memory/string > functions from generating vector pair load/store instructions. It was done > in > the late stages of power10 development, where there was one specific use case > that ran much slower, but it would be addressed in future releases. It > didn't > happen for power11, but it would be nice to have it ready if the -mcpu=future > changes make it into a future processor. > > I ran spec 2017, and on power10 hardware, one benchmark (
Re: [PATCH 2/2] RISC-V: Add testcases for signed scalar SAT_ADD IMM form 2
On 6/23/25 9:12 PM, Ciyan Pan wrote: From: panciyan This patch adds testcases for form 2, as shown below: T __attribute__((noinline)) \ sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)\ {\ T sum = (T)((UT)x + (UT)IMM); \ return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \ (-(T)(x < 0) ^ MAX) : sum; \ } Passed the rv64gcv regression test. Signed-off-by: Ciyan Pan gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: * gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-run-2-i16.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-run-2-i32.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-run-2-i64.c: New test. * gcc.target/riscv/sat/sat_s_add_imm-run-2-i8.c: New test. * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i16.c: New test. * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i32.c: New test. * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i8.c: New test. Pan -- can you cover reviewing the testsuite bits since this is an area where you've done a ton of work over the last year or so. Thanks! jeff
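For readers not used to the testsuite macros, the form-2 pattern above expands to roughly the following for T = int8_t, with an arbitrarily chosen immediate of 9 (the actual tests use their own INDEX/IMM values):

#include <stdint.h>

/* Saturating signed add of a constant: if x and the immediate have the
   same sign but the wrapped sum changed sign, overflow happened and the
   result saturates to INT8_MAX (x >= 0) or INT8_MIN (x < 0).  */
int8_t __attribute__ ((noinline))
sat_s_add_imm_int8_t_fmt_2_9 (int8_t x)
{
  int8_t sum = (int8_t) ((uint8_t) x + (uint8_t) 9);
  return ((x ^ sum) < 0 && (x ^ 9) >= 0)
	 ? (-(int8_t) (x < 0) ^ INT8_MAX) : sum;
}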
Re: [PATCH] libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]
On Tue, 24 Jun 2025 at 08:39, Tomasz Kamiński wrote: > > For month_day we incorrectly reported day information to be available, which > lead > to format_error being thrown from the call to formatter::format at runtime, > instead > of making call to format ill-formed. > > The included test cover most of the combinations of _ChronoParts and format > specifiers. > > libstdc++-v3/ChangeLog: > > * include/bits/chrono_io.h > (formatter::parse): Call _M_parse with > only Month being available. > * testsuite/std/time/format/data_not_present_neg.cc: New test. > --- > I want to merged the data_not_present_neg.cc, as my type erasing > implementation > relies on detection durin parsing. > Testing on x86_64-linux. std/time/format* tests passed. > OK for trunk when all test passes? And chrono_io only change for v15? Yes for trunk, and the header change for 15, thanks. > > libstdc++-v3/include/bits/chrono_io.h | 3 +- > .../std/time/format/data_not_present_neg.cc | 163 ++ > 2 files changed, 164 insertions(+), 2 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > > diff --git a/libstdc++-v3/include/bits/chrono_io.h > b/libstdc++-v3/include/bits/chrono_io.h > index abbf4efcc3b..4eb00f4932d 100644 > --- a/libstdc++-v3/include/bits/chrono_io.h > +++ b/libstdc++-v3/include/bits/chrono_io.h > @@ -2199,8 +2199,7 @@ namespace __format >constexpr typename basic_format_parse_context<_CharT>::iterator >parse(basic_format_parse_context<_CharT>& __pc) >{ > - return _M_f._M_parse(__pc, __format::_Month|__format::_Day, > -__defSpec); > + return _M_f._M_parse(__pc, __format::_Month, __defSpec); >} > >template > diff --git a/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > new file mode 100644 > index 000..bcc943b86ad > --- /dev/null > +++ b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc > @@ -0,0 +1,163 @@ > +// { dg-do compile { target c++20 } } > + > +#include > +#include > + > +using namespace std::chrono; > + > +auto d1 = std::format("{:%w}", 10d); // { dg-error "call to consteval > function" } > +auto d2 = std::format("{:%m}", 10d); // { dg-error "call to consteval > function" } > +auto d3 = std::format("{:%y}", 10d); // { dg-error "call to consteval > function" } > +auto d4 = std::format("{:%F}", 10d); // { dg-error "call to consteval > function" } > +auto d5 = std::format("{:%T}", 10d); // { dg-error "call to consteval > function" } > +auto d6 = std::format("{:%Q}", 10d); // { dg-error "call to consteval > function" } > +auto d7 = std::format("{:%Z}", 10d); // { dg-error "call to consteval > function" } > + > +auto w1 = std::format("{:%d}", Thursday); // { dg-error "call to consteval > function" } > +auto w2 = std::format("{:%m}", Thursday); // { dg-error "call to consteval > function" } > +auto w3 = std::format("{:%y}", Thursday); // { dg-error "call to consteval > function" } > +auto w4 = std::format("{:%F}", Thursday); // { dg-error "call to consteval > function" } > +auto w5 = std::format("{:%T}", Thursday); // { dg-error "call to consteval > function" } > +auto w6 = std::format("{:%Q}", Thursday); // { dg-error "call to consteval > function" } > +auto w7 = std::format("{:%Z}", Thursday); // { dg-error "call to consteval > function" } > + > +auto wi1 = std::format("{:%d}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi2 = std::format("{:%m}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi3 = std::format("{:%y}", 
Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi4 = std::format("{:%F}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi5 = std::format("{:%T}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi6 = std::format("{:%Q}", Thursday[2]); // { dg-error "call to > consteval function" } > +auto wi7 = std::format("{:%Z}", Thursday[2]); // { dg-error "call to > consteval function" } > + > +auto wl1 = std::format("{:%d}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl2 = std::format("{:%m}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl3 = std::format("{:%y}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl4 = std::format("{:%F}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl5 = std::format("{:%T}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl6 = std::format("{:%Q}", Thursday[last]); // { dg-error "call to > consteval function" } > +auto wl7 = std::format("{:%Z}", Thursday[last]); // { dg-error "call to > consteval function" } > + > +auto m1 = std::format("{:%d}", January); // { dg-error "call to consteval > function" } > +auto m2 = std::
[PATCH] dwarf2out reproducibility: do not hash pointer
This was erroneously changed when converting dwarf2out to the new inchash interface (f768061c4c0). The other hash_loc_operands calls were left in place. gcc/ChangeLog: * dwarf2out.cc (hash_loc_operands): Do not hash the pointer; hash the nested location operands instead. --- gcc/dwarf2out.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc index d1a55dbcbcb..63db76fca27 100644 --- a/gcc/dwarf2out.cc +++ b/gcc/dwarf2out.cc @@ -31913,7 +31913,7 @@ hash_loc_operands (dw_loc_descr_ref loc, inchash::hash &hstate) break; case DW_OP_entry_value: case DW_OP_GNU_entry_value: - hstate.add_object (val1->v.val_loc); + hash_loc_operands (val1->v.val_loc, hstate); break; case DW_OP_regval_type: case DW_OP_deref_type: -- 2.47.2
Re: [PATCH] libstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717]
On Tue, 24 Jun 2025, Jonathan Wakely wrote: > On Tue, 24 Jun 2025 at 03:20, Patrick Palka wrote: > > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > > > > -- >8 -- > > > > When checking __is_complete_or_unbounded on a reference to incomplete > > type, we overeagerly try to instantiate/complete the referenced type > > which besides being unnecessary may also produce a -Wsfinae-incomplete > > warning (added in r16-1527) if the referenced type is later defined. > > > > This patch fixes this by effectively restricting the sizeof check to > > object (except unknown-bound array) types. In passing simplify the > > implementation by using is_object instead of is_function/reference/void. > > > > PR libstdc++/120717 > > > > libstdc++-v3/ChangeLog: > > > > * include/std/type_traits (__is_complete_or_unbounded): Don't > > check sizeof on a reference or unbounded array type. Simplify > > using is_object. Correct formatting. > > * testsuite/20_util/is_complete_or_unbounded/120717.cc: New test. > > --- > > libstdc++-v3/include/std/type_traits | 34 +-- > > .../is_complete_or_unbounded/120717.cc| 20 +++ > > 2 files changed, 37 insertions(+), 17 deletions(-) > > create mode 100644 > > libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/120717.cc > > > > diff --git a/libstdc++-v3/include/std/type_traits > > b/libstdc++-v3/include/std/type_traits > > index abff9f880001..28960befd2c7 100644 > > --- a/libstdc++-v3/include/std/type_traits > > +++ b/libstdc++-v3/include/std/type_traits > > @@ -280,11 +280,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > > >// Forward declarations > >template > > -struct is_reference; > > - template > > -struct is_function; > > - template > > -struct is_void; > > +struct is_object; > >template > > struct remove_cv; > >template > > @@ -297,18 +293,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >// Helper functions that return false_type for incomplete classes, > >// incomplete unions and arrays of known bound from those. > > > > - template > > -constexpr true_type __is_complete_or_unbounded(__type_identity<_Tp>) > > -{ return {}; } > > - > > - template > - typename _NestedType = typename _TypeIdentity::type> > > -constexpr typename __or_< > > - is_reference<_NestedType>, > > - is_function<_NestedType>, > > - is_void<_NestedType>, > > - __is_array_unknown_bounds<_NestedType> > > ->::type __is_complete_or_unbounded(_TypeIdentity) > > + // More specialized overload for complete object types. > > + template > + typename = __enable_if_t>, > > + > > __is_array_unknown_bounds<_Tp>>::value>, > > Maybe it's because I'm congested and my head feels like it's full of > potatoes, but the double negative is confusing for me. > > Would __and_, __not_<__is_array_unknown_bounds> work? > > We could even name that: > > // An object type which is not an unbounded array. > // It might still be an incomplete type, but if this is false_type > // then we can be certain it's not a complete object type. > template > using __maybe_complete_object_type = ...; I like it. > > > > + size_t = sizeof(_Tp)> > > +constexpr true_type > > +__is_complete_or_unbounded(__type_identity<_Tp>) > > +{ return {}; }; > > + > > + // Less specialized overload for reference and unknown-bound array > > types, and > > + // incomplete types. 
> > + template > + typename _NestedType = typename _TypeIdentity::type> > > +constexpr typename __or_<__not_>, > > +__is_array_unknown_bounds<_NestedType>>::type > > Then this would be __not_<__maybe_complete_object_type<_NestedType>> > but maybe that's as bad as the double negative above. The new helper definitely makes things clearer to me. Like so? -- >8 -- Subject: [PATCH] libstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717] When checking __is_complete_or_unbounded on a reference to incomplete type, we overeagerly try to instantiate/complete the referenced type which besides being unnecessary may also produce a -Wsfinae-incomplete warning (added in r16-1527) if the referenced type is later defined. This patch fixes this by effectively restricting the sizeof check to object (except unknown-bound array) types. In passing simplify the implementation by using is_object instead of is_function/reference/void and introducing a __maybe_complete_object_type helper. PR libstdc++/120717 libstdc++-v3/ChangeLog: * include/std/type_traits (__maybe_complete_object_type): New trait, factored out from ... (__is_complete_or_unbounded): ... here. Only check sizeof on a __maybe_complete_object_type type. Fix formatting. * testsuite/20_util/is_complete_or_unbounded/120717.cc: New test. --- libstdc++-v3/include/std/type_trait
Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
On Tue, 24 Jun 2025, Tamar Christina wrote: > store_bit_field_1 has an optimization where if a target is not a memory > operand > and the entire value is being set from something larger we can just wrap a > subreg around the source and emit a move. > > For vector constructors this is however problematic because the subreg means > that the expansion of the constructor won't happen through vec_init anymore. But the expansion of the constructor happened already? > Complicated constructors which aren't natively supported by targets then ICE > as > they wouldn't have been expanded so recog fails. > > This patch blocks the optimization on non-constant vector constructors. Or > non-uniform > non-constant vectors. +static bool +foldable_value_with_subreg (rtx value) +{ + if (GET_CODE (value) != CONST_VECTOR || const_vec_duplicate_p (value)) +return true; that seems to allow any non-CONST_VECTOR, thus doesn't block non-constant vector constructors? It sounds like the problem happens upthread of store_bit_field_1? I allowed constant vectors because if I read the code right > simplify-rtx should be able to perform the simplification of pulling out the > element > or merging the constant values. There are several testcases in > aarch64-sve-pcs.exp > that test this as well. I allowed uniform non-constant vectors because they > would be folded into a vec_select later on. > > Note that codegen is quite horrible, for what should only be an lsr. But I'll > address that separately so that this patch is backportable. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? and GCC 15, 14, 13? > > Thanks, > Tamar > > > gcc/ChangeLog: > > PR target/120718 > * expmed.cc (store_bit_field_1): Only push subreg over uniform vector > constructors. > (foldable_value_with_subreg): New. > > gcc/testsuite/ChangeLog: > > PR target/120718 > * gcc.target/aarch64/sve/pr120718.c: New test. > > --- > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > index > be427dca5d9afeed2013954472dde3a5430169e0..a468aa5c0c3f20bd62a7afc1d245d64e87be5396 > 100644 > --- a/gcc/expmed.cc > +++ b/gcc/expmed.cc > @@ -740,6 +740,28 @@ store_bit_field_using_insv (const extraction_insn *insv, > rtx op0, >return false; > } > > +/* For non-constant vectors wrapping a subreg around the RTX will not make > + the expression expand properly through vec_init. For constant vectors > + we can because simplification can just extract the element out by > + by merging the values. This can be done by simplify-rtx and so the > + subreg will be eliminated. However poly constants require vec_init as > + they are a runtime value. So only allow the subreg for simple integer > + or floating point constants. */ > + > +static bool > +foldable_value_with_subreg (rtx value) > +{ > + if (GET_CODE (value) != CONST_VECTOR || const_vec_duplicate_p (value)) > +return true; > + > + for (unsigned i = 0; i < const_vector_encoded_nelts (value); i++) > +if (!CONST_INT_P (const_vector_elt (value, i)) > + && !CONST_DOUBLE_P (const_vector_elt (value, i))) > + return false; > + > + return true; > +} > + > /* A subroutine of store_bit_field, with the same arguments. Return true > if the operation could be implemented. > > @@ -795,7 +817,8 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, > poly_uint64 bitnum, >/* If the target is a register, overwriting the entire object, or storing > a full-word or multi-word field can be done with just a SUBREG. 
*/ >if (!MEM_P (op0) > - && known_eq (bitsize, GET_MODE_BITSIZE (fieldmode))) > + && known_eq (bitsize, GET_MODE_BITSIZE (fieldmode)) > + && foldable_value_with_subreg (value)) > { >/* Use the subreg machinery either to narrow OP0 to the required >words or to cope with mode punning between equal-sized modes. > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr120718.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr120718.c > new file mode 100644 > index > ..cb21d94792f0679a48cc20c3dcdf78c89c05a5c6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr120718.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O3" } */ > + > +#include > +typedef int __attribute__((vector_size(8))) v2si; > +typedef struct { int x; int y; } A; > +void bar(A a); > +void foo() > +{ > +A a; > +*(v2si *)&a = (v2si){0, (int)svcntd_pat(SV_ALL)}; > +bar(a); > +} > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
RE: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
> -Original Message- > From: Richard Biener > Sent: Tuesday, June 24, 2025 9:58 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > Subject: Re: [PATCH]middle-end: Fix store_bit_field expansions of vector > constructors [PR120718] > > On Tue, 24 Jun 2025, Tamar Christina wrote: > > > store_bit_field_1 has an optimization where if a target is not a memory > > operand > > and the entire value is being set from something larger we can just wrap a > > subreg around the source and emit a move. > > > > For vector constructors this is however problematic because the subreg means > > that the expansion of the constructor won't happen through vec_init anymore. > > But the expansion of the constructor happened already? > No, the RTL here is a CONST_VECTOR with one const_int and one const_poly_int. If expansion had happened they would have been converted into a vec_merge because It's not an expression the target can handle because it's not actually a constant. So expansion definitely did not happen. > > Complicated constructors which aren't natively supported by targets then > > ICE as > > they wouldn't have been expanded so recog fails. > > > > This patch blocks the optimization on non-constant vector constructors. Or > > non- > uniform > > non-constant vectors. > > +static bool > +foldable_value_with_subreg (rtx value) > +{ > + if (GET_CODE (value) != CONST_VECTOR || const_vec_duplicate_p (value)) > +return true; > > that seems to allow any non-CONST_VECTOR, thus doesn't block > non-constant vector constructors? > Non-constant vectors don't reach here normally. The problem is that a const_poly_int is not a compile time constant. But it's allowed inside a CONST_VECTOR. > It sounds like the problem happens upthread of store_bit_field_1? > To answer that the question is whether a const_poly_int should be allowed inside of a CONST_VECTOR. Tamar > I allowed constant vectors because if I read the code right > > simplify-rtx should be able to perform the simplification of pulling out the > element > > or merging the constant values. There are several testcases in aarch64-sve- > pcs.exp > > that test this as well. I allowed uniform non-constant vectors because they > > would be folded into a vec_select later on. > > > > Note that codegen is quite horrible, for what should only be an lsr. But > > I'll > > address that separately so that this patch is backportable. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > -m32, -m64 and no issues. > > > > Ok for master? and GCC 15, 14, 13? > > > > Thanks, > > Tamar > > > > > > gcc/ChangeLog: > > > > PR target/120718 > > * expmed.cc (store_bit_field_1): Only push subreg over uniform vector > > constructors. > > (foldable_value_with_subreg): New. > > > > gcc/testsuite/ChangeLog: > > > > PR target/120718 > > * gcc.target/aarch64/sve/pr120718.c: New test. > > > > --- > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > > index > be427dca5d9afeed2013954472dde3a5430169e0..a468aa5c0c3f20bd62a7afc1d > 245d64e87be5396 100644 > > --- a/gcc/expmed.cc > > +++ b/gcc/expmed.cc > > @@ -740,6 +740,28 @@ store_bit_field_using_insv (const extraction_insn > *insv, rtx op0, > >return false; > > } > > > > +/* For non-constant vectors wrapping a subreg around the RTX will not make > > + the expression expand properly through vec_init. For constant vectors > > + we can because simplification can just extract the element out by > > + by merging the values. 
This can be done by simplify-rtx and so the > > + subreg will be eliminated. However poly constants require vec_init as > > + they are a runtime value. So only allow the subreg for simple integer > > + or floating point constants. */ > > + > > +static bool > > +foldable_value_with_subreg (rtx value) > > +{ > > + if (GET_CODE (value) != CONST_VECTOR || const_vec_duplicate_p (value)) > > +return true; > > + > > + for (unsigned i = 0; i < const_vector_encoded_nelts (value); i++) > > +if (!CONST_INT_P (const_vector_elt (value, i)) > > + && !CONST_DOUBLE_P (const_vector_elt (value, i))) > > + return false; > > + > > + return true; > > +} > > + > > /* A subroutine of store_bit_field, with the same arguments. Return true > > if the operation could be implemented. > > > > @@ -795,7 +817,8 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, > poly_uint64 bitnum, > >/* If the target is a register, overwriting the entire object, or storing > > a full-word or multi-word field can be done with just a SUBREG. */ > >if (!MEM_P (op0) > > - && known_eq (bitsize, GET_MODE_BITSIZE (fieldmode))) > > + && known_eq (bitsize, GET_MODE_BITSIZE (fieldmode)) > > + && foldable_value_with_subreg (value)) > > { > >/* Use the subreg machinery either to narrow OP0 to the required > > words or to cope with mode punning between equal-siz
Re: [PATCH] RISC-V: Add Profiles RVA/B23S64 support.
On 6/24/25 3:38 AM, Jiawei wrote: This patch adds support for the RISC-V Profiles RVA23S64 and RVB23S64. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: New Profiles. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-rva23s.c: New test. * gcc.target/riscv/arch-rvb23s.c: New test. This is fine once the testsuite issue noticed by Dongsheng is fixed. jeff
Re: [PATCH] RISC-V: Add Profiles RVA/B23S64 support.
--- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/arch-rvb23s.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rva23s64 -mabi=lp64d" } */ rva23s64 -> rvb23s64 Thank you for pointing this out. On 6/24/25 3:38 AM, Jiawei wrote: This patch adds support for the RISC-V Profiles RVA23S64 and RVB23S64. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: New Profiles. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-rva23s.c: New test. * gcc.target/riscv/arch-rvb23s.c: New test. This is fine once the testsuite issue noticed by Dongsheng is fixed. jeff Committed to trunk with the fix. BR, Jiawei
Re: [PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes
On Fri, May 23, 2025 at 1:56 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/restoring the registers in functions which processing > byte codes. It improved the pystones benchmark by 6-7%: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15 > > Remove -mgeneral-regs-only restriction on no_caller_saved_registers > attribute. Only SSE is allowed since SSE XMM register load preserves > the upper bits in YMM/ZMM register while YMM register load zeros the > upper 256 bits of ZMM register, and preserving 32 ZMM registers can > be quite expensive. > > gcc/ > > PR target/119628 > * config/i386/i386-expand.cc (ix86_expand_call): Call > ix86_type_no_callee_saved_registers_p instead of looking up > no_callee_saved_registers attribute. > * config/i386/i386-options.cc (ix86_set_func_type): Look up > preserve_none attribute. Check preserve_none attribute for > interrupt attribute. Don't check no_caller_saved_registers nor > no_callee_saved_registers conflicts here. > (ix86_set_func_type): Check no_callee_saved_registers before > checking no_caller_saved_registers attribute. > (ix86_set_current_function): Allow SSE with > no_caller_saved_registers attribute. > (ix86_handle_call_saved_registers_attribute): Check preserve_none, > no_callee_saved_registers and no_caller_saved_registers conflicts. > (ix86_gnu_attributes): Add preserve_none attribute. > * config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p): > New. > * config/i386/i386.cc > (x86_64_preserve_none_int_parameter_registers): New. > (ix86_using_red_zone): Don't use red-zone when there are no > caller-saved registers with SSE. > (ix86_type_no_callee_saved_registers_p): New. > (ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE > and call ix86_type_no_callee_saved_registers_p instead of looking > up no_callee_saved_registers attribute. > (ix86_comp_type_attributes): Call > ix86_type_no_callee_saved_registers_p instead of looking up > no_callee_saved_registers attribute. Return 0 if preserve_none > attribute doesn't match in 64-bit mode. > (ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE, > use x86_64_preserve_none_int_parameter_registers. > (init_cumulative_args): Set preserve_none_abi. > (function_arg_64): Use x86_64_preserve_none_int_parameter_registers > with preserve_none attribute. > (setup_incoming_varargs_64): Use > x86_64_preserve_none_int_parameter_registers with preserve_none > attribute. > (ix86_save_reg): Treat TYPE_PRESERVE_NONE like > TYPE_NO_CALLEE_SAVED_REGISTERS. > (ix86_nsaved_sseregs): Allow saving XMM registers for > no_caller_saved_registers attribute. > (ix86_compute_frame_layout): Likewise. > (x86_this_parameter): Use > x86_64_preserve_none_int_parameter_registers with preserve_none > attribute. > * config/i386/i386.h (ix86_args): Add preserve_none_abi. > (call_saved_registers_type): Add TYPE_PRESERVE_NONE. > (machine_function): Change call_saved_registers to 3 bits. > * doc/extend.texi: Add preserve_none attribute. Update > no_caller_saved_registers attribute to remove -mgeneral-regs-only > restriction. > > gcc/testsuite/ > > PR target/119628 > * gcc.target/i386/no-callee-saved-3.c: Adjust error location. > * gcc.target/i386/no-callee-saved-19a.c: New test. > * gcc.target/i386/no-callee-saved-19b.c: Likewise. > * gcc.target/i386/no-callee-saved-19c.c: Likewise. 
> * gcc.target/i386/no-callee-saved-19d.c: Likewise. > * gcc.target/i386/no-callee-saved-19e.c: Likewise. > * gcc.target/i386/preserve-none-1.c: Likewise. > * gcc.target/i386/preserve-none-2.c: Likewise. > * gcc.target/i386/preserve-none-3.c: Likewise. > * gcc.target/i386/preserve-none-4.c: Likewise. > * gcc.target/i386/preserve-none-5.c: Likewise. > * gcc.target/i386/preserve-none-6.c: Likewise. > * gcc.target/i386/preserve-none-7.c: Likewise. > * gcc.target/i386/preserve-none-8.c: Likewise. > * gcc.target/i386/preserve-none-9.c: Likewise. > * gcc.target/i386/preserve-none-10.c: Likewise. > * gcc.target/i386/preserve-none-11.c: Likewise. > * gcc.target/i386/preserve-none-12.c: Likewise. > * gcc.target/i386/preserve-none-13.c: Likewise. > * gcc.target/i386/preserve-none-14.c: Likewise. > * gcc.target/i386/preserve-none-15.c: Likewise. > * gcc
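As a rough illustration of the interpreter use case described in the commit message (a hypothetical byte-code dispatch sketch, not one of the new preserve-none-*.c tests), the attribute would be spelled like this:

struct vm;

/* Both the dispatcher and the handlers use the preserve_none
   convention, so calls between them need not save and restore the
   usual callee-saved registers; per the description above, on x86-64
   integer arguments then travel in r12, r13, r14, r15, rdi and rsi.  */
__attribute__ ((preserve_none)) void
dispatch (struct vm *vm, const unsigned char *pc);

__attribute__ ((preserve_none)) void
op_add (struct vm *vm, const unsigned char *pc)
{
  /* ... interpret one byte code, then hand off to the next handler.  */
  dispatch (vm, pc + 1);
}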
[PATCH 2/6] Remove non-SLP path from vectorizable_load
This cleans the rest of vectorizable_load from non-SLP * tree-vect-stmts.cc (vectorizable_load): Step 2. --- gcc/tree-vect-stmts.cc | 185 +++-- 1 file changed, 50 insertions(+), 135 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 92739903754..c5fe7879d5a 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9850,7 +9850,6 @@ vectorizable_load (vec_info *vinfo, bool compute_in_loop = false; class loop *at_loop; int vec_num; - bool slp = true; bool slp_perm = false; bb_vec_info bb_vinfo = dyn_cast (vinfo); poly_uint64 vf; @@ -9909,7 +9908,7 @@ vectorizable_load (vec_info *vinfo, return false; mask_index = internal_fn_mask_index (ifn); - if (mask_index >= 0 && 1) + if (mask_index >= 0) mask_index = vect_slp_child_index_for_operand (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (mask_index >= 0 @@ -9918,7 +9917,7 @@ vectorizable_load (vec_info *vinfo, return false; els_index = internal_fn_else_index (ifn); - if (els_index >= 0 && 1) + if (els_index >= 0) els_index = vect_slp_child_index_for_operand (call, els_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (els_index >= 0 @@ -9942,16 +9941,13 @@ vectorizable_load (vec_info *vinfo, /* Multiple types in SLP are handled by creating the appropriate number of vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in case of SLP. */ - if (1) -ncopies = 1; - else -ncopies = vect_get_num_copies (loop_vinfo, vectype); + ncopies = 1; gcc_assert (ncopies >= 1); /* FORNOW. This restriction should be relaxed. */ if (nested_in_vect_loop - && (ncopies > 1 || (1 && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1))) + && (ncopies > 1 || SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -9997,15 +9993,6 @@ vectorizable_load (vec_info *vinfo, first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info); group_size = DR_GROUP_SIZE (first_stmt_info); - /* Refuse non-SLP vectorization of SLP-only groups. */ - if (0 && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"cannot vectorize load in non-SLP mode.\n"); - return false; - } - /* Invalidate assumptions made by dependence analysis when vectorization on the unrolled body effectively re-orders stmts. */ if (STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0 @@ -10046,8 +10033,7 @@ vectorizable_load (vec_info *vinfo, /* ??? The following checks should really be part of get_group_load_store_type. */ - if (1 - && SLP_TREE_LOAD_PERMUTATION (slp_node).exists () + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists () && !((memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_GATHER_SCATTER) && SLP_TREE_LANES (slp_node) == 1)) @@ -10090,8 +10076,7 @@ vectorizable_load (vec_info *vinfo, } } - if (1 - && slp_node->ldst_lanes + if (slp_node->ldst_lanes && memory_access_type != VMAT_LOAD_STORE_LANES) { if (dump_enabled_p ()) @@ -10142,8 +10127,7 @@ vectorizable_load (vec_info *vinfo, if (costing_p) /* transformation not required. 
*/ { - if (1 - && mask + if (mask && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype)) { @@ -10153,10 +10137,7 @@ vectorizable_load (vec_info *vinfo, return false; } - if (0) - STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type; - else - SLP_TREE_MEMORY_ACCESS_TYPE (slp_node) = memory_access_type; + SLP_TREE_MEMORY_ACCESS_TYPE (slp_node) = memory_access_type; if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) @@ -10210,12 +10191,7 @@ vectorizable_load (vec_info *vinfo, if (elsvals.length ()) maskload_elsval = *elsvals.begin (); - if (0) -gcc_assert (memory_access_type - == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); - else -gcc_assert (memory_access_type - == SLP_TREE_MEMORY_ACCESS_TYPE (slp_node)); + gcc_assert (memory_access_type == SLP_TREE_MEMORY_ACCESS_TYPE (slp_node)); if (dump_enabled_p () && !costing_p) dump_printf_loc (MSG_NOTE, vect_location, @@ -10289,15 +10265,8 @@ vectorizable_load (vec_info *vinfo, vectype, &gsi2); } gimple *new_stmt = SSA_NAME_DEF_STMT (new_temp); - if (1) - for (j = 0; j < (int) SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); ++j) - slp_node->push_vec_def (new_stmt); - else
[PATCH v1 1/2] AArch64: precommit test for masked load vectorisation.
Commit the test file `mask_load_2.c` before the vectorisation analysis is
changed, so that the changes in codegen are more obvious in the next commit.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/mask_load_2.c: New test.
---
 .../gcc.target/aarch64/sve/mask_load_2.c | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
new file mode 100644
index 000..38fcf4f7206
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
@@ -0,0 +1,23 @@
+// { dg-do compile }
+// { dg-options "-march=armv8-a+sve -msve-vector-bits=128 -O3" }
+
+typedef struct Array {
+    int elems[3];
+} Array;
+
+int loop(Array **pp, int len, int idx) {
+    int nRet = 0;
+
+#pragma GCC unroll 0
+    for (int i = 0; i < len; i++) {
+        Array *p = pp[i];
+        if (p) {
+            nRet += p->elems[idx];
+        }
+    }
+
+    return nRet;
+}
+
+// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 0 } }
+// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m} 0 } }
--
2.45.2
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
> > On Fri, May 23, 2025 at 2:31 PM Alexander Monakov > > wrote: > > > > > > In PR 105965 we accepted a request to form FMA instructions when the > > > source code is using a narrow generic vector that contains just one > > > element, corresponding to V1SF or V1DF mode, while the backend does not > > > expand fma patterns for such modes. > > > > > > For this to work under -ffp-contract=on, we either need to modify > > > backends, or emulate such degenerate-vector FMA via scalar FMA in > > > tree-vect-generic. Do the latter. > > > > Can you instead apply the lowering during gimplification? That is because > > having an unsupported internal-function in the IL the user could not have > > emitted directly is somewhat bad. I thought the vector lowering could > > be generalized for more single-argument internal functions but then no > > such unsupported calls should exist in the first place. > > Sure, like below? Not fully tested yet. Ping — now bootstrapped and regtested. > -- 8< -- > > From 4caee92434d9425912979b285725166b22f40a87 Mon Sep 17 00:00:00 2001 > From: Alexander Monakov > Date: Wed, 21 May 2025 18:35:45 +0300 > Subject: [PATCH v2] allow contraction to synthetic single-element vector FMA > > In PR 105965 we accepted a request to form FMA instructions when the > source code is using a narrow generic vector that contains just one > element, corresponding to V1SF or V1DF mode, while the backend does not > expand fma patterns for such modes. > > For this to work under -ffp-contract=on, we either need to modify > backends, or emulate such degenerate-vector FMA via scalar FMA. > Do the latter, in gimplification hook together with contraction. > > gcc/c-family/ChangeLog: > > * c-gimplify.cc (fma_supported_p): Allow forming single-element > vector FMA when scalar FMA is available. > (c_gimplify_expr): Allow vector types. > --- > gcc/c-family/c-gimplify.cc | 50 ++ > 1 file changed, 40 insertions(+), 10 deletions(-) > > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc > index c6fb764656..6c313287e6 100644 > --- a/gcc/c-family/c-gimplify.cc > +++ b/gcc/c-family/c-gimplify.cc > @@ -870,12 +870,28 @@ c_build_bind_expr (location_t loc, tree block, tree > body) >return bind; > } > > +enum fma_expansion > +{ > + FMA_NONE, > + FMA_DIRECT, > + FMA_VEC1_SYNTHETIC > +}; > + > /* Helper for c_gimplify_expr: test if target supports fma-like FN. */ > > -static bool > +static fma_expansion > fma_supported_p (enum internal_fn fn, tree type) > { > - return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH); > + if (direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH)) > +return FMA_DIRECT; > + /* Accept single-element vector FMA (see PR 105965) when the > + backend handles the scalar but not the vector mode. */ > + if (VECTOR_TYPE_P (type) > + && known_eq (TYPE_VECTOR_SUBPARTS (type), 1U) > + && direct_internal_fn_supported_p (fn, TREE_TYPE (type), > + OPTIMIZE_FOR_BOTH)) > +return FMA_VEC1_SYNTHETIC; > + return FMA_NONE; > } > > /* Gimplification of expression trees. */ > @@ -936,13 +952,14 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p > ATTRIBUTE_UNUSED, > case MINUS_EXPR: >{ > tree type = TREE_TYPE (*expr_p); > + enum fma_expansion how; > /* For -ffp-contract=on we need to attempt FMA contraction only > during initial gimplification. Late contraction across statement > boundaries would violate language semantics. 
*/ > - if (SCALAR_FLOAT_TYPE_P (type) > + if ((SCALAR_FLOAT_TYPE_P (type) || VECTOR_FLOAT_TYPE_P (type)) > && flag_fp_contract_mode == FP_CONTRACT_ON > && cfun && !(cfun->curr_properties & PROP_gimple_any) > - && fma_supported_p (IFN_FMA, type)) > + && (how = fma_supported_p (IFN_FMA, type)) != FMA_NONE) > { > bool neg_mul = false, neg_add = code == MINUS_EXPR; > > @@ -973,7 +990,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p > ATTRIBUTE_UNUSED, > enum internal_fn ifn = IFN_FMA; > if (neg_mul) > { > - if (fma_supported_p (IFN_FNMA, type)) > + if ((how = fma_supported_p (IFN_FNMA, type)) != FMA_NONE) > ifn = IFN_FNMA; > else > ops[0] = build1 (NEGATE_EXPR, type, ops[0]); > @@ -981,21 +998,34 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p > ATTRIBUTE_UNUSED, > if (neg_add) > { > enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS; > - if (fma_supported_p (ifn2, type)) > + if ((how = fma_supported_p (ifn2, type)) != FMA_NONE) > ifn = ifn2; > else > ops[2] = build1 (NEGATE_EXPR, type, ops[2]); > } > /* Avoid gimplify_arg: it emits all side effects into *PRE_
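For readers skimming the thread, here is a minimal, hypothetical example of the single-element generic vector shape that PR 105965 concerns; it is not taken from the patch or the PR, and it assumes -ffp-contract=on plus a target with a scalar fma pattern but no V1DF one:

/* Illustration only, not part of the patch.  */
typedef double v1df __attribute__ ((vector_size (sizeof (double))));

v1df
mul_add (v1df a, v1df b, v1df c)
{
  /* Under -ffp-contract=on this multiply-add is a contraction candidate.
     The type has mode V1DF; when the backend provides no fma pattern for
     that mode, the FMA_VEC1_SYNTHETIC path in the patch falls back to the
     scalar FMA on the element type.  */
  return a * b + c;
}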
Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]
Richard Biener writes: > On Tue, 24 Jun 2025, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> > >> >> Tamar Christina writes: >> >> > store_bit_field_1 has an optimization where if a target is not a memory >> >> > operand >> >> > and the entire value is being set from something larger we can just >> >> > wrap a >> >> > subreg around the source and emit a move. >> >> > >> >> > For vector constructors this is however problematic because the subreg >> >> > means >> >> > that the expansion of the constructor won't happen through vec_init >> >> > anymore. >> >> > >> >> > Complicated constructors which aren't natively supported by targets >> >> > then ICE as >> >> > they wouldn't have been expanded so recog fails. >> >> > >> >> > This patch blocks the optimization on non-constant vector constructors. >> >> > Or non-uniform >> >> > non-constant vectors. I allowed constant vectors because if I read the >> >> > code right >> >> > simplify-rtx should be able to perform the simplification of pulling >> >> > out the element >> >> > or merging the constant values. There are several testcases in >> >> > aarch64-sve-pcs.exp >> >> > that test this as well. I allowed uniform non-constant vectors because >> >> > they >> >> > would be folded into a vec_select later on. >> >> > >> >> > Note that codegen is quite horrible, for what should only be an lsr. >> >> > But I'll >> >> > address that separately so that this patch is backportable. >> >> > >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu, >> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu >> >> > -m32, -m64 and no issues. >> >> > >> >> > Ok for master? and GCC 15, 14, 13? >> >> >> >> I was discussing this Alex off-list last week, and the fix we talked >> >> about there was: >> >> >> >> diff --git a/gcc/explow.cc b/gcc/explow.cc >> >> index 7799a98053b..8b138f54f75 100644 >> >> --- a/gcc/explow.cc >> >> +++ b/gcc/explow.cc >> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, >> >> machine_mode innermode, poly_uint64 byte) >> >> { >> >>rtx x = simplify_gen_subreg (outermode, op, innermode, byte); >> >> - if (x) >> >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x >> >> return x; >> >> >> >>auto *start = get_last_insn (); >> >> >> >> The justification is that force_subreg is somewhat like a "subreg >> >> version of force_operand", and so should try to avoid returning >> >> subregs that force_operand would have replaced. The force_operand >> >> code I mean is: >> > >> > Yeah, in particular CONSTANT_P isn't sth documented as valid as >> > subreg operands, only registers (and memory) are. But isn't this >> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG >> > of a non-REG/MEM? >> >> I don't think the documentation is correct/up-to-date. subreg is >> de facto used as a general operation, and for example there are >> patterns like: >> >> (define_insn "" >> [(set (match_operand:QI 0 "general_operand_dst" >> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg") >> (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" >> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7") >> (const_int 16)) 3)) >>(clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r")) >>(clobber (reg:CC CC_REG))] >> "" >> "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0" >> [(set_attr "length" "10")]) > > I see. Is the subreg for such define_insn generated by the middle-end > though? I assume it was written to match something that combine could generate. Whether it still does in another question. >> (from h8300). 
This is also why simplify_gen_subreg has: >> >> if (GET_CODE (op) == SUBREG >> || GET_CODE (op) == CONCAT >> || GET_MODE (op) == VOIDmode) >> return NULL_RTX; >> >> if (MODE_COMPOSITE_P (outermode) >> && (CONST_SCALAR_INT_P (op) >>|| CONST_DOUBLE_AS_FLOAT_P (op) >>|| CONST_FIXED_P (op) >>|| GET_CODE (op) == CONST_VECTOR)) >> return NULL_RTX; >> >> rather than the !REG_P (op) && !MEM_P (op) that the documentation >> would imply. > > So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC > we don't seem to ever legitmize constants wrapped in a SUBREG, so > we shouldn't generate a SUBREG of a constant (in the middle-end)? Hmm, yeah, maybe. I'd originally rejected that because I assumed the MODE_COMPOSITE_P was there for a reason. But looking at the history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f, where the reason given was: As for the simplify_gen_subreg change, I think it would be desirable to just avoid creating SUBREGs of constants on all targets and for all constants, if simplify_immed_subreg simplified, fine, otherwise punt, but as we are late in GCC11 development, the patch instead guards this behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to powerpc{,64,64le} double double long double - a
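For concreteness, the alternative floated above (dropping the MODE_COMPOSITE_P (outermode) guard so that subregs of these constant forms are refused for every outer mode) would amount to roughly the following sketch; this is only an illustration of the idea under discussion, not a posted or committed change:

  /* Sketch only: refuse SUBREGs of constants regardless of the outer mode,
     so callers fall back to forcing the constant into a register first.  */
  if (CONST_SCALAR_INT_P (op)
      || CONST_DOUBLE_AS_FLOAT_P (op)
      || CONST_FIXED_P (op)
      || GET_CODE (op) == CONST_VECTOR)
    return NULL_RTX;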
Re: [PATCH 13/18] s390: arch15: Vector divide/remainder
On Mon, Jan 20, 2025 at 02:59:44PM +0100, Stefan Schulze Frielinghaus wrote: > On Mon, Jan 20, 2025 at 02:46:40PM +0100, Richard Biener wrote: > > On Mon, Jan 20, 2025 at 11:04 AM Stefan Schulze Frielinghaus > > wrote: > > > > > > gcc/ChangeLog: > > > > Interesting - I can't find anything about these in the PoP (though I can > > only > > find the one from May 2022, 14th edition covering up to z16(?)). Is there > > a newer document? > > This instruction becomes available in a newer machine generation which > wasn't released so far. Therefore, there is also no updated PoP or > other document which describes it. > > I will try to keep this in mind and send you a link once a new document > is released. As promised here is a link to the 15th edition covering up to z17 including vector divide: https://www.ibm.com/docs/en/module_1678991624569/pdf/SA22-7832-14.pdf Cheers, Stefan > > Cheers, > Stefan > > > > > Richard. > > > > > * config/s390/vector.md (div3): Add. > > > (udiv3): Add. > > > (mod3): Add. > > > (umod3): Add. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/s390/vxe3/vd-1.c: New test. > > > * gcc.target/s390/vxe3/vd-2.c: New test. > > > * gcc.target/s390/vxe3/vdl-1.c: New test. > > > * gcc.target/s390/vxe3/vdl-2.c: New test. > > > * gcc.target/s390/vxe3/vr-1.c: New test. > > > * gcc.target/s390/vxe3/vr-2.c: New test. > > > * gcc.target/s390/vxe3/vrl-1.c: New test. > > > * gcc.target/s390/vxe3/vrl-2.c: New test. > > > --- > > > gcc/config/s390/vector.md | 36 ++ > > > gcc/testsuite/gcc.target/s390/vxe3/vd-1.c | 27 > > > gcc/testsuite/gcc.target/s390/vxe3/vd-2.c | 21 + > > > gcc/testsuite/gcc.target/s390/vxe3/vdl-1.c | 27 > > > gcc/testsuite/gcc.target/s390/vxe3/vdl-2.c | 21 + > > > gcc/testsuite/gcc.target/s390/vxe3/vr-1.c | 27 > > > gcc/testsuite/gcc.target/s390/vxe3/vr-2.c | 21 + > > > gcc/testsuite/gcc.target/s390/vxe3/vrl-1.c | 27 > > > gcc/testsuite/gcc.target/s390/vxe3/vrl-2.c | 21 + > > > 9 files changed, 228 insertions(+) > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vd-1.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vd-2.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vdl-1.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vdl-2.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vr-1.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vr-2.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vrl-1.c > > > create mode 100644 gcc/testsuite/gcc.target/s390/vxe3/vrl-2.c > > > > > > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > > > index 2e7419c45c3..606c6826860 100644 > > > --- a/gcc/config/s390/vector.md > > > +++ b/gcc/config/s390/vector.md > > > @@ -1145,6 +1145,42 @@ > > >"vml\t%v0,%v1,%v2" > > >[(set_attr "op_type" "VRR")]) > > > > > > +; vdf, vdg, vdq > > > +(define_insn "div3" > > > + [(set (match_operand:VI_HW_SDT0 "register_operand" > > > "=v") > > > + (div:VI_HW_SDT (match_operand:VI_HW_SDT 1 "register_operand" "v") > > > + (match_operand:VI_HW_SDT 2 "register_operand" > > > "v")))] > > > + "TARGET_VXE3" > > > + "vd\t%v0,%v1,%v2,0" > > > + [(set_attr "op_type" "VRR")]) > > > + > > > +; vdlf, vdlg, vdlq > > > +(define_insn "udiv3" > > > + [(set (match_operand:VI_HW_SDT 0 "register_operand" > > > "=v") > > > + (udiv:VI_HW_SDT (match_operand:VI_HW_SDT 1 "register_operand" > > > "v") > > > + (match_operand:VI_HW_SDT 2 "register_operand" > > > "v")))] > > > + "TARGET_VXE3" > > > + "vdl\t%v0,%v1,%v2,0" > > > + [(set_attr "op_type" "VRR")]) > > > + > > > +; 
vrf, vrg, vrq > > > +(define_insn "mod3" > > > + [(set (match_operand:VI_HW_SDT0 "register_operand" > > > "=v") > > > + (mod:VI_HW_SDT (match_operand:VI_HW_SDT 1 "register_operand" "v") > > > + (match_operand:VI_HW_SDT 2 "register_operand" > > > "v")))] > > > + "TARGET_VXE3" > > > + "vr\t%v0,%v1,%v2,0" > > > + [(set_attr "op_type" "VRR")]) > > > + > > > +; vrlf, vrlg, vrlq > > > +(define_insn "umod3" > > > + [(set (match_operand:VI_HW_SDT 0 "register_operand" > > > "=v") > > > + (umod:VI_HW_SDT (match_operand:VI_HW_SDT 1 "register_operand" > > > "v") > > > + (match_operand:VI_HW_SDT 2 "register_operand" > > > "v")))] > > > + "TARGET_VXE3" > > > + "vrl\t%v0,%v1,%v2,0" > > > + [(set_attr "op_type" "VRR")]) > > > + > > > ; vlcb, vlch, vlcf, vlcg > > > (define_insn "neg2" > > >[(set (match_operand:VI 0 "register_operand" "=v") > > > diff --git a/gcc/testsuite/gcc.target/s390/vxe3/vd-1.c > > > b/gcc/testsuite/gcc.target/
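At the source level the new define_insns correspond to plain element-wise division and remainder on vector types. A small, hypothetical C example follows; it assumes a -march level that enables TARGET_VXE3 and uses doubleword (g-suffix) element sizes, so the expected instructions are vdg and vrlg per the pattern comments above:

/* Illustration only; assumes TARGET_VXE3 is enabled.  */
typedef long long v2di __attribute__ ((vector_size (16)));
typedef unsigned long long uv2di __attribute__ ((vector_size (16)));

v2di
sdiv_v2di (v2di a, v2di b)
{
  return a / b;		/* signed element-wise divide, expected to expand to vdg */
}

uv2di
urem_v2di (uv2di a, uv2di b)
{
  return a % b;		/* unsigned element-wise remainder, expected to expand to vrlg */
}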
RE: [committed] i386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead
> -----Original Message-----
> From: Uros Bizjak
> Sent: Wednesday, June 25, 2025 12:53 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Cui, Lili
> Subject: [committed] i386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead
>
> ADD/SUB is faster than LEA for most processors.  Also, there are several
> peephole2 patterns available that convert prologue esp subtractions to
> pushes (at the end of i386.md).  These process only patterns with flags
> reg clobber, so they are ineffective with clobber-less stack ptr
> adjustments, introduced by r16-1551 ("x86: Enable separate shrink wrapping").
>
> Introduce a peephole2 pattern that adds a clobber to clobber-less stack
> ptr adjustments when FLAGS_REG is dead.
>
> gcc/ChangeLog:
>
> 	* config/i386/i386.md
> 	(@pro_epilogue_adjust_stack_add_nocc): Add type attribute.
> 	(pro_epilogue_adjust_stack_add_nocc peephole2 pattern):
> 	Convert pro_epilogue_adjust_stack_add_nocc variant to
> 	pro_epilogue_adjust_stack_add when FLAGS_REG is dead.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
Great, thanks!
Lili.

> Uros.
Re: [PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest
On Wed, Jun 25, 2025 at 1:11 PM Hongtao Liu wrote:
>
> On Wed, Jun 25, 2025 at 1:06 PM H.J. Lu wrote:
> >
> > -mtune=intel is used to generate a single binary to run well on both big
> > core and small core, similar to hybrid CPUs.  Update -mtune=intel to tune
> > for Diamond Rapids and Clearwater Forest, instead of Silvermont.
> >
> > 	PR target/120815
> > 	* common/config/i386/i386-common.cc (processor_alias_table):
> > 	Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for
> > 	PROCESSOR_INTEL.
> > 	* config/i386/i386-options.cc (processor_cost_table): Replace
> > 	intel_cost with alderlake_cost.
> > 	* config/i386/x86-tune-costs.h (intel_cost): Removed.
> > 	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat
> > 	PROCESSOR_INTEL like PROCESSOR_ALDERLAKE.
> > 	(ix86_adjust_cost): Likewise.
> > 	* doc/invoke.texi: Update -mtune=intel for Diamond Rapids and
> > 	Clearwater Forest.
> >
> > OK for master?
> Ok.

Should it be backported to release branches?

--
H.J.
[PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest
-mtune=intel is used to generate a single binary to run well on both big core and small core, similar to hybrid CPUs. Update -mtune=intel to tune for Diamond Rapids and Clearwater Forest, instead of Silvermont. PR target/120815 * common/config/i386/i386-common.cc (processor_alias_table): Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for PROCESSOR_INTEL. * config/i386/i386-options.cc (processor_cost_table): Replace intel_cost with alderlake_cost. * config/i386/x86-tune-costs.h (intel_cost): Removed. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat PROCESSOR_INTEL like PROCESSOR_ALDERLAKE. (ix86_adjust_cost): Likewise. * doc/invoke.texi: Update -mtune=intel for Diamond Rapids and Clearwater Forest. OK for master? Thanks. -- H.J. From 385db3cf10ecbbec9d128a389c9c22b7a853d914 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Wed, 25 Jun 2025 07:40:31 +0800 Subject: [PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest -mtune=intel is used to generate a single binary to run well on both big core and small core, similar to hybrid CPUs. Update -mtune=intel to tune for Diamond Rapids and Clearwater Forest, instead of Silvermont. PR target/120815 * common/config/i386/i386-common.cc (processor_alias_table): Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for PROCESSOR_INTEL. * config/i386/i386-options.cc (processor_cost_table): Replace intel_cost with alderlake_cost. * config/i386/x86-tune-costs.h (intel_cost): Removed. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat PROCESSOR_INTEL like PROCESSOR_ALDERLAKE. (ix86_adjust_cost): Likewise. * doc/invoke.texi: Update -mtune=intel for Diamond Rapids and Clearwater Forest. Signed-off-by: H.J. Lu --- gcc/common/config/i386/i386-common.cc | 2 +- gcc/config/i386/i386-options.cc | 2 +- gcc/config/i386/x86-tune-costs.h | 121 -- gcc/config/i386/x86-tune-sched.cc | 4 +- gcc/doc/invoke.texi | 4 +- 5 files changed, 6 insertions(+), 127 deletions(-) diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 64908ce740a..dfcd4e9a727 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -2310,7 +2310,7 @@ const pta processor_alias_table[] = M_CPU_TYPE (INTEL_GRANDRIDGE), P_PROC_AVX2}, {"clearwaterforest", PROCESSOR_CLEARWATERFOREST, CPU_HASWELL, PTA_CLEARWATERFOREST, M_CPU_TYPE (INTEL_CLEARWATERFOREST), P_PROC_AVX2}, - {"intel", PROCESSOR_INTEL, CPU_SLM, PTA_NEHALEM, + {"intel", PROCESSOR_INTEL, CPU_HASWELL, PTA_HASWELL, M_VENDOR (VENDOR_INTEL), P_NONE}, {"geode", PROCESSOR_GEODE, CPU_GEODE, PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_PREFETCH_SSE, 0, P_NONE}, diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index d1e321ad74b..27feeddaf81 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -797,7 +797,7 @@ static const struct processor_costs *processor_cost_table[] = &alderlake_cost, /* PROCESSOR_ARROWLAKE_S. */ &alderlake_cost, /* PROCESSOR_PANTHERLAKE. */ &icelake_cost, /* PROCESSOR_DIAMONDRAPIDS. */ - &intel_cost, /* PROCESSOR_INTEL. */ + &alderlake_cost, /* PROCESSOR_INTEL. */ &lujiazui_cost, /* PROCESSOR_LUJIAZUI. */ &yongfeng_cost, /* PROCESSOR_YONGFENG. */ &shijidadao_cost, /* PROCESSOR_SHIJIDADAO. 
*/ diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h index a5b99d1f962..c8603b982af 100644 --- a/gcc/config/i386/x86-tune-costs.h +++ b/gcc/config/i386/x86-tune-costs.h @@ -3568,127 +3568,6 @@ struct processor_costs tremont_cost = { COSTS_N_INSNS (2), /* Branch mispredict scale. */ }; -static stringop_algs intel_memcpy[2] = { - {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}}, - {libcall, {{32, loop, false}, {64, rep_prefix_4_byte, false}, - {8192, rep_prefix_8_byte, false}, {-1, libcall, false; -static stringop_algs intel_memset[2] = { - {libcall, {{8, loop, false}, {15, unrolled_loop, false}, - {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}}, - {libcall, {{24, loop, false}, {32, unrolled_loop, false}, - {8192, rep_prefix_8_byte, false}, {-1, libcall, false; -static const -struct processor_costs intel_cost = { - { - /* Start of register allocator costs. integer->integer move cost is 2. */ - 6, /* cost for loading QImode using movzbl */ - {4, 4, 4},/* cost of loading integer registers - in QImode, HImode and SImode. - Relative to reg-reg move (2). */ - {6, 6, 6},/* cost of storing integer registers */ - 2, /* cost of reg,reg fld/fst */ - {6, 6, 8},/* cost of loading fp registers - in SFmode, DFmode and XFmode */ - {6, 6, 10},/* cost of storing fp registers - in SFmode, DFmode and XFmode */ - 2, /* cost of moving MMX register */ - {6, 6},/* cost of loading MMX regis