date:20250625

Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]

2025-06-25 Thread Jakub Jelinek

On Tue, Jun 24, 2025 at 05:19:59PM -0400, Jason Merrill wrote:
> I think we could move the initialization of the fixed_type_p and
> virtual_access variables up, they don't need to be after cp_build_addr_expr.

I don't understand why it doesn't depend on cp_build_addr_expr.
I've tried the following patch and while it didn't regress anything on
make GXX_TESTSUITE_STDS=98,11,14,17,^C,23,26 check-g++
it regressed
FAIL: 23_containers/vector/bool/cmp_c++20.cc  -std=gnu++20 (test for excess 
errors)
FAIL: 23_containers/vector/bool/cmp_c++20.cc  -std=gnu++26 (test for excess 
errors)
In there code is PLUS_EXPR, !want_pointer, !has_empty, but uneval is true
and expr is
std::vector::begin (&c)
before cp_build_addr_expr and
&TARGET_EXPR ::begin (&c)>
after it.  resolves_to_fixed_type_p (expr) is 0 before cp_build_addr_expr
and 1 after it.  v_binfo is false though, so in that
particular case I think we don't actually care about fixed_type_p value,
but it doesn't raise confidence that testing resolves_to_fixed_type_p
early is ok.

--- gcc/cp/class.cc.jj  2025-06-18 17:24:03.973867379 +0200
+++ gcc/cp/class.cc 2025-06-25 08:01:06.824278658 +0200
@@ -347,9 +347,19 @@ build_base_path (enum tree_code code,
 || processing_template_decl
 || in_template_context);
 
+  int nonnull_copy = nonnull;
+  fixed_type_p = resolves_to_fixed_type_p (expr, &nonnull);
+
+  /* Do we need to look in the vtable for the real offset?  */
+  virtual_access = (v_binfo && fixed_type_p <= 0);
+
   /* For a non-pointer simple base reference, express it as a COMPONENT_REF
  without taking its address (and so causing lambda capture, 91933).  */
-  if (code == PLUS_EXPR && !v_binfo && !want_pointer && !has_empty && !uneval)
+  if (code == PLUS_EXPR
+  && !want_pointer
+  && !has_empty
+  && !uneval
+  && (!v_binfo || !virtual_access))
 return build_simple_base_path (expr, binfo);
 
   if (!want_pointer)
@@ -361,8 +371,10 @@ build_base_path (enum tree_code code,
   else
 expr = mark_rvalue_use (expr);
 
+  gcc_assert (resolves_to_fixed_type_p (expr, &nonnull_copy)
+ == fixed_type_p && nonnull_copy == nonnull);
+
   offset = BINFO_OFFSET (binfo);
-  fixed_type_p = resolves_to_fixed_type_p (expr, &nonnull);
   target_type = code == PLUS_EXPR ? BINFO_TYPE (binfo) : BINFO_TYPE (d_binfo);
   /* TARGET_TYPE has been extracted from BINFO, and, is therefore always
  cv-unqualified.  Extract the cv-qualifiers from EXPR so that the
@@ -371,9 +383,6 @@ build_base_path (enum tree_code code,
 (target_type, cp_type_quals (TREE_TYPE (TREE_TYPE (expr;
   ptr_target_type = build_pointer_type (target_type);
 
-  /* Do we need to look in the vtable for the real offset?  */
-  virtual_access = (v_binfo && fixed_type_p <= 0);
-
   /* Don't bother with the calculations inside sizeof; they'll ICE if the
  source type is incomplete and the pointer value doesn't matter.  In a
  template (even in instantiate_non_dependent_expr), we don't have vtables


> I think -1 doesn't distinguish between single or multiple virtual
> derivation, so handling -1 in that block might mean succeeding for a
> multiple derivation case where it ought to fail.

Ok, will keep it as is then.

> > So, shall I e.g. for the if (TREE_PRIVATE case if the outer type has
> > CLASSTYPE_VBASECLASSES walk the
> > for (vbase = TYPE_BINFO (t); vbase; vbase = TREE_CHAIN (vbase))
> > if (BINFO_VIRTUAL_P (vbase) && !BINFO_PRIMARY_P (vbase))
> > and in that case try to compare byte_position (TREE_OPERAND (path, 1))
> > against BINFO_OFFSET (vbase) and if it matches (plus perhaps some type
> > check?) then decide based on BINFO_BASE_ACCESS or something like that
> > whether it was a private/protected vs. public virtual base?
> 
> It seems simpler to pass an accurate access to the build_base_field above.
> At least whether the whole BINFO_INHERITANCE_CHAIN is public or not, I
> suppose the distinction between private and protected doesn't matter.

I'm afraid I'm quite lost on what actually is public base class
that [expr.dynamic.cast] talks about in the case of virtual bases because
a virtual base can appear many times among the bases and if it is virtual
in all cases, there is just one copy of it and it can be public in some
paths and private/protected in others.  And where to find that information.

I've tried the following testcase and it seems that it succeeds unless
-DP1 -DP2
-DP1 -DP3
-DP1 -DP6
-DP2 -DP3 -DP6
-DP4 -DP5 -DP6
-DP2 -DP3 -DP4 -DP5
is a subset of the -DPN options or in case of clang++ also
-DP2 -DP4 -DP5 (for that g++ passes, clang++ fails).
E.g. what is the difference between -DP1 which works and
S is private in one case and public in 2 others, while -DP1 -DP2
doesn't work and is private in two cases and public in one.

#ifdef P1
#undef P1
#define P1 private
#else
#define P1
#endif
#ifdef P2
#undef P2
#define P2 private
#else
#define P2
#endif
#ifdef P3
#undef P3
#define P3 private
#else
#define P3
#endif

Re: [PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-25 Thread H.J. Lu

On Wed, Jun 25, 2025 at 2:14 PM Hongtao Liu  wrote:
>
> On Fri, May 23, 2025 at 1:56 PM H.J. Lu  wrote:
> >
> > Add preserve_none attribute which is similar to no_callee_saved_registers
> > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
> > used for integer parameter passing.  This can be used in an interpreter
> > to avoid saving/restoring the registers in functions which processing
> > byte codes.  It improved the pystones benchmark by 6-7%:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15
> >
> > Remove -mgeneral-regs-only restriction on no_caller_saved_registers
> > attribute.  Only SSE is allowed since SSE XMM register load preserves
> > the upper bits in YMM/ZMM register while YMM register load zeros the
> > upper 256 bits of ZMM register, and preserving 32 ZMM registers can
> > be quite expensive.
> >
> > gcc/
> >
> > PR target/119628
> > * config/i386/i386-expand.cc (ix86_expand_call): Call
> > ix86_type_no_callee_saved_registers_p instead of looking up
> > no_callee_saved_registers attribute.
> > * config/i386/i386-options.cc (ix86_set_func_type): Look up
> > preserve_none attribute.  Check preserve_none attribute for
> > interrupt attribute.  Don't check no_caller_saved_registers nor
> > no_callee_saved_registers conflicts here.
> > (ix86_set_func_type): Check no_callee_saved_registers before
> > checking no_caller_saved_registers attribute.
> > (ix86_set_current_function): Allow SSE with
> > no_caller_saved_registers attribute.
> > (ix86_handle_call_saved_registers_attribute): Check preserve_none,
> > no_callee_saved_registers and no_caller_saved_registers conflicts.
> > (ix86_gnu_attributes): Add preserve_none attribute.
> > * config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p):
> > New.
> > * config/i386/i386.cc
> > (x86_64_preserve_none_int_parameter_registers): New.
> > (ix86_using_red_zone): Don't use red-zone when there are no
> > caller-saved registers with SSE.
> > (ix86_type_no_callee_saved_registers_p): New.
> > (ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE
> > and call ix86_type_no_callee_saved_registers_p instead of looking
> > up no_callee_saved_registers attribute.
> > (ix86_comp_type_attributes): Call
> > ix86_type_no_callee_saved_registers_p instead of looking up
> > no_callee_saved_registers attribute.  Return 0 if preserve_none
> > attribute doesn't match in 64-bit mode.
> > (ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE,
> > use x86_64_preserve_none_int_parameter_registers.
> > (init_cumulative_args): Set preserve_none_abi.
> > (function_arg_64): Use x86_64_preserve_none_int_parameter_registers
> > with preserve_none attribute.
> > (setup_incoming_varargs_64): Use
> > x86_64_preserve_none_int_parameter_registers with preserve_none
> > attribute.
> > (ix86_save_reg): Treat TYPE_PRESERVE_NONE like
> > TYPE_NO_CALLEE_SAVED_REGISTERS.
> > (ix86_nsaved_sseregs): Allow saving XMM registers for
> > no_caller_saved_registers attribute.
> > (ix86_compute_frame_layout): Likewise.
> > (x86_this_parameter): Use
> > x86_64_preserve_none_int_parameter_registers with preserve_none
> > attribute.
> > * config/i386/i386.h (ix86_args): Add preserve_none_abi.
> > (call_saved_registers_type): Add TYPE_PRESERVE_NONE.
> > (machine_function): Change call_saved_registers to 3 bits.
> > * doc/extend.texi: Add preserve_none attribute.  Update
> > no_caller_saved_registers attribute to remove -mgeneral-regs-only
> > restriction.
> >
> > gcc/testsuite/
> >
> > PR target/119628
> > * gcc.target/i386/no-callee-saved-3.c: Adjust error location.
> > * gcc.target/i386/no-callee-saved-19a.c: New test.
> > * gcc.target/i386/no-callee-saved-19b.c: Likewise.
> > * gcc.target/i386/no-callee-saved-19c.c: Likewise.
> > * gcc.target/i386/no-callee-saved-19d.c: Likewise.
> > * gcc.target/i386/no-callee-saved-19e.c: Likewise.
> > * gcc.target/i386/preserve-none-1.c: Likewise.
> > * gcc.target/i386/preserve-none-2.c: Likewise.
> > * gcc.target/i386/preserve-none-3.c: Likewise.
> > * gcc.target/i386/preserve-none-4.c: Likewise.
> > * gcc.target/i386/preserve-none-5.c: Likewise.
> > * gcc.target/i386/preserve-none-6.c: Likewise.
> > * gcc.target/i386/preserve-none-7.c: Likewise.
> > * gcc.target/i386/preserve-none-8.c: Likewise.
> > * gcc.target/i386/preserve-none-9.c: Likewise.
> > * gcc.target/i386/preserve-none-10.c: Likewise.
> > * gcc.target/i386/preserve-none-11.c: Likewise.
> > *

Re: [Patch, Fortran, Coarray, PR88076, v1] 6/6 Add a shared memory multi process coarray library.

2025-06-25 Thread Damian Rouson

Giving something new time to mature before making it the default is always
a great policy. My suggestion is aspirational. I’m describing a dream that
I hope can be the ultimate goal.  There’s no need to rush into implementing
my proposed vision.

D

On Tue, Jun 24, 2025 at 23:25 Andre Vehreschild  wrote:

> Hi Damian, hi Steve,
>
> enabling coarray-support by default has implications we need to consider.
> The
> memory footprint of a coarray enabled program is larger than the one of a
> non-coarray one. This is simply because the coarray token needs to be
> stored
> somewhere.
>
> Furthermore, I just yesterday figured, that with -fcoarray=single the
> space for
> the token was allocated. I.e. every data structure, that could possibly be
> stored in a coarray and had allocatable components in it, wasted another 8
> byte
> for an unused pointer.
>
> So when we default to having coarray support enabled, some work needs to be
> done, to remove such inefficiencies. Given there are only a few developers,
> that work on coarrays, this may take some time.
>
> What we can of course do, is to switch on the coarray mode, when we detect
> the
> first coarray construct and no longer need the user to do it. I hope this
> does
> not have to many implications and causes only a hand full of bugs.
>
> For the time being, I propose to first give the new coarray implementation
> some
> time to mature and test. There will be bugs, because nobody is perfect.
>
> @Steve caf_shmem does not use MPI. It is a shared memory, single node,
> multi
> process approach. Just to prevent any misunderstanding.
>
> Thanks for all the testing.
>
> Regards,
> Andre
>
> On Tue, 24 Jun 2025 11:13:52 -0700
> Steve Kargl  wrote:
>
> > Damian,
> >
> > I submitted a patch a long time ago to make -fcoarray=single the
> > default behavior.  The patch made -fcoarray=none a NOP.  With
> > inclusion of a shmem implementation of the runtime parts, this
> > might be the way to go.  I'll leave that decision to Andre, Thomas,
> > and Nicolas.
> >
> > I believe that the gfortran contributors have not considered
> > coarray as an optional add-on.  The problem for gfortran is
> > that it runs on dozens of CPUs and dozens upon dozens of
> > operating systems.  The few gfortran contributors simply cannot
> > ensure that opencoarray+mpich or opencoarray+openmpi runs on
> > all of the possible combinations of hardware and OS's.  Andre
> > has hinted that he expects some rough edges on non-linux system.
> > I'll find out this weekend when I give his patch a spin on
> > FreeBSD.  Hopefully, a windows10/11 user can test the patch.
> >
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>

Re: [PATCH] s390: Add some missing vector patterns.

2025-06-25 Thread Juergen Christ

> On Tue, Jun 24, 2025 at 09:49:01AM +0200, Juergen Christ wrote:
> > Some patterns that are detected by the autovectorizer can be supported by
> > s390.  Add expanders such that autovectorization of these patterns works.
> > 
> > Bootstrapped and regtested on s390.  Ok for trunk?
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/vector.md (avg3_ceil): New pattern.
> > (uavg3_ceil): New pattern.
> > (smul3_highpart): New pattern.
> > (umul3_highpart): New pattern.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/vector/pattern-avg-1.c: New test.
> > * gcc.target/s390/vector/pattern-mulh-1.c: New test.
> > 
> > Signed-off-by: Juergen Christ 
> > ---
> >  gcc/config/s390/vector.md | 28 ++
> >  .../gcc.target/s390/vector/pattern-avg-1.c| 26 +
> >  .../gcc.target/s390/vector/pattern-mulh-1.c   | 29 +++
> >  3 files changed, 83 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c
> > 
> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> > index 6f4e1929eb80..16f4b8116432 100644
> > --- a/gcc/config/s390/vector.md
> > +++ b/gcc/config/s390/vector.md
> > @@ -3576,3 +3576,31 @@
> >  ; vec_unpacks_float_lo
> >  ; vec_unpacku_float_hi
> >  ; vec_unpacku_float_lo
> > +
> > +(define_expand "avg3_ceil"
> > +  [(set (match_operand:VIT_HW_VXE3_T0 
> > "register_operand" "=v")
> > +   (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 
> > "register_operand"  "v")
> > +  (match_operand:VIT_HW_VXE3_T 2 
> > "register_operand"  "v")]
> > + UNSPEC_VEC_AVG))]
> > +  "TARGET_VX")
> > +
> > +(define_expand "uavg3_ceil"
> > +  [(set (match_operand:VIT_HW_VXE3_T0 
> > "register_operand" "=v")
> > +   (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 
> > "register_operand"  "v")
> > +  (match_operand:VIT_HW_VXE3_T 2 
> > "register_operand"  "v")]
> > + UNSPEC_VEC_AVGU))]
> > +  "TARGET_VX")
> > +
> > +(define_expand "smul3_highpart"
> > +  [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" 
> >   "=v")
> > +   (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 
> > "register_operand" "v")
> > +   (match_operand:VIT_HW_VXE3_DT 2 
> > "register_operand" "v")]
> > +  UNSPEC_VEC_SMULT_HI))]
> > +  "TARGET_VX")
> > +
> > +(define_expand "umul3_highpart"
> > +  [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" 
> >   "=v")
> > +   (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 
> > "register_operand" "v")
> > +   (match_operand:VIT_HW_VXE3_DT 2 
> > "register_operand" "v")]
> > +  UNSPEC_VEC_UMULT_HI))]
> > +  "TARGET_VX")
> 
> In commit r12-4231-g555fa3545efe23 RTX smul_highpart and umul_highpart
> were introduced which we could use instead of the unspec, now.  So one
> solution would be to move vec_smulh/vec_umulh from
> vx-builtins.md to vector.md and rename those to
> smul3_highpart/umul3_highpart and then making sure that
> those are used in s390-builtins.def.  Of course, replacing the unspec by
> the corresponding RTXs', too.
> 
> Sorry for bothering with this.  But I think it is worthwhile to replace
> those unspecs.
> 
> Thanks,
> Stefan

Will send v2 with these fixes.

Re: [PATCH] s390: Optimize fmin/fmax.

2025-06-25 Thread Juergen Christ

> On Mon, Jun 23, 2025 at 09:51:13AM +0200, Juergen Christ wrote:
> > On VXE targets, we can directly use the fp min/max instruction instead of
> > calling into libm for fmin/fmax etc.
> > 
> > Provide fmin/fmax versions also for vectors even though it cannot be
> > called directly.  This will be exploited with a follow-up patch when
> > reductions are introduced.
> 
> This looks very similar to vfmin / vfmax.  Couldn't we merge
> those by using appropriate mode iterators?  The expander for fmin
> / fmax could set the mask operand.

Will send v2.

[PATCH v2] s390: Optimize fmin/fmax.

2025-06-25 Thread Juergen Christ

On VXE targets, we can directly use the fp min/max instruction instead of
calling into libm for fmin/fmax etc.

Provide fmin/fmax versions also for vectors even though it cannot be
called directly.  This will be exploited with a follow-up patch when
reductions are introduced.

Bootstrapped and regtested on s390.  Ok for trunk?

gcc/ChangeLog:

* config/s390/s390.md: Update UNSPECs
* config/s390/vector.md (fmax3): New expander.
(fmin3): New expander.
* config/s390/vx-builtins.md (*fmin): New insn.
(vfmin): Redefined to use new insn.
(*fmax): New insn.
(vfmax): Redefined to use new insn.

gcc/testsuite/ChangeLog:

* gcc.target/s390/fminmax-1.c: New test.
* gcc.target/s390/fminmax-2.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/s390.md   |  6 +-
 gcc/config/s390/vector.md | 25 
 gcc/config/s390/vx-builtins.md| 29 ++---
 gcc/testsuite/gcc.target/s390/fminmax-1.c | 77 +++
 gcc/testsuite/gcc.target/s390/fminmax-2.c | 29 +
 5 files changed, 156 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/fminmax-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/fminmax-2.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 97a4bdf96b2d..1c88c9624b60 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -241,9 +241,6 @@
 
UNSPEC_VEC_MSUM
 
-   UNSPEC_VEC_VFMIN
-   UNSPEC_VEC_VFMAX
-
UNSPEC_VEC_VBLEND
UNSPEC_VEC_VEVAL
UNSPEC_VEC_VGEM
@@ -256,6 +253,9 @@
 
UNSPEC_NNPA_VCFN_V8HI
UNSPEC_NNPA_VCNF_V8HI
+
+   UNSPEC_FMAX
+   UNSPEC_FMIN
 ])
 
 ;;
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 6f4e1929eb80..8bda30624c22 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -89,6 +89,13 @@
 (define_mode_iterator VF_HW [(V4SF "TARGET_VXE") V2DF (V1TF "TARGET_VXE")
 (TF "TARGET_VXE")])
 
+; FP scalar and vector modes
+(define_mode_iterator VFT_BFP [SF DF
+ (V1SF "TARGET_VXE") (V2SF "TARGET_VXE") (V4SF 
"TARGET_VXE")
+ V1DF V2DF
+ (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
+
+
 (define_mode_iterator V_8   [V1QI])
 (define_mode_iterator V_16  [V2QI  V1HI])
 (define_mode_iterator V_32  [V4QI  V2HI V1SI V1SF])
@@ -3576,3 +3583,21 @@
 ; vec_unpacks_float_lo
 ; vec_unpacku_float_hi
 ; vec_unpacku_float_lo
+
+; fmax
+(define_expand "fmax3"
+  [(set (match_operand:VFT_BFP 0 "register_operand" "=v")
+   (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "v")
+  (match_operand:VFT_BFP 2 "register_operand" "v")
+  (const_int 4)]
+ UNSPEC_FMAX))]
+  "TARGET_VXE")
+
+; fmin
+(define_expand "fmin3"
+  [(set (match_operand:VFT_BFP 0 "register_operand" "=v")
+   (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "v")
+  (match_operand:VFT_BFP 2 "register_operand" "v")
+  (const_int 4)]
+ UNSPEC_FMIN))]
+  "TARGET_VXE")
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index a7bb7ff92f5e..0508df43b866 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -2136,15 +2136,32 @@
   "fchebs\t%v2,%v0,%v1"
   [(set_attr "op_type" "VRR")])
 
+(define_insn "*fmin"
+  [(set (match_operand:VFT_BFP0 "register_operand"  "=v")
+   (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand"   "v")
+(match_operand:VFT_BFP 2 "register_operand"   "v")
+(match_operand:QI  3 "const_mask_operand" "C")]
+   UNSPEC_FMIN))]
+  "TARGET_VXE"
+  "fminb\t%v0,%v1,%v2,%b3"
+  [(set_attr "op_type" "VRR")])
 
-(define_insn "vfmin"
+(define_expand "vfmin"
   [(set (match_operand:VF_HW0 "register_operand"  "=v")
(unspec:VF_HW [(match_operand:VF_HW 1 "register_operand"   "v")
   (match_operand:VF_HW 2 "register_operand"   "v")
   (match_operand:QI3 "const_mask_operand" "C")]
- UNSPEC_VEC_VFMIN))]
+ UNSPEC_FMIN))]
+  "TARGET_VXE")
+
+(define_insn "*fmax"
+  [(set (match_operand:VFT_BFP0 "register_operand"  "=v")
+   (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand"   "v")
+(match_operand:VFT_BFP 2 "register_operand"   "v")
+(match_operand:QI  3 "const_mask_operand" "C")]
+   UNSPEC_FMAX))]
   "TARGET_VXE"
-  "fminb\t%v0,%v1,%v2,%b3"
+  "fmaxb\t%v0,%v1,%v2,%b3"
   [(set_attr "op_type" "VRR")])
 
 (define_insn "vfmax"
@@ -2152,10 +2169,8 @@
(unspec:VF_HW [(match_operand:VF_HW 1 "register_operand"   "v")
   (match_operand:VF_HW 2 "register_operand"   "v")
   (match_operand:

Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.

2025-06-25 Thread Andre Vehreschild

Hi Jerry,

thank you very much. Just try it. I can only imagine that Paul had a somehow
corrupted build directory or left overs from some previous build. I am still
wondering, that I got no automated mail from the build hosts, but I can
imagine, that they get issues with a series of patches, that build upon each
other.

Just try it. The more feedback, the better.

Regards,
Andre

On Tue, 24 Jun 2025 11:07:23 -0700
Jerry D  wrote:

> On 6/24/25 6:09 AM, Andre Vehreschild wrote:
> > Hi all,
> > 
> > this series of patches (six in total) adds a new coarray backend library to
> > libgfortran.  The library uses shared memory and processes to implement
> > running multiple images on the same node.  The work is based on work
> > started by Thomas and Nicolas Koenig. No changes to the gfortran compile
> > part are required for this.  
> 
> --- snip ---
> 
> Hi Andre,
> 
> Thank you for this work. I have been wanting this functionality for 
> several years!
> 
> I will begin reviewing as best I can.  I did see Paul's initial comment 
> so your feedback on that would be appreciated.
> 
> Best regards,
> 
> Jerry

-- 
Andre Vehreschild * Email: vehre ad gmx dot de

[PATCH v2] s390: Add some missing vector patterns.

2025-06-25 Thread Juergen Christ

Some patterns that are detected by the autovectorizer can be supported by
s390.  Add expanders such that autovectorization of these patterns works.

RTL for the builtins used unspec to represent highpart multiplication.
Replace this by the correct RTL to allow further simplification.

Bootstrapped and regtested on s390.  Ok for trunk?

gcc/ChangeLog:

* config/s390/s390.md: Removed unused unspecs.
* config/s390/vector.md (avg3_ceil): New expander.
(uavg3_ceil): New expander.
(smul3_highpart): New expander.
(umul3_highpart): New expander.
* config/s390/vx-builtins.md (vec_umulh): Remove unspec.
(vec_smulh): Remove unspec.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/pattern-avg-1.c: New test.
* gcc.target/s390/vector/pattern-mulh-1.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/s390.md   |  3 --
 gcc/config/s390/vector.md | 26 +
 gcc/config/s390/vx-builtins.md| 10 +++
 .../gcc.target/s390/vector/pattern-avg-1.c| 26 +
 .../gcc.target/s390/vector/pattern-mulh-1.c   | 29 +++
 5 files changed, 85 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 97a4bdf96b2d..440ce93574f4 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -139,9 +139,6 @@
UNSPEC_LCBB
 
; Vector
-   UNSPEC_VEC_SMULT_HI
-   UNSPEC_VEC_UMULT_HI
-   UNSPEC_VEC_SMULT_LO
UNSPEC_VEC_SMULT_EVEN
UNSPEC_VEC_UMULT_EVEN
UNSPEC_VEC_SMULT_ODD
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 6f4e1929eb80..8d7ca1a520f3 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -3576,3 +3576,29 @@
 ; vec_unpacks_float_lo
 ; vec_unpacku_float_hi
 ; vec_unpacku_float_lo
+
+(define_expand "avg3_ceil"
+  [(set (match_operand:VIT_HW_VXE3_T0 
"register_operand" "=v")
+   (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 
"register_operand"  "v")
+  (match_operand:VIT_HW_VXE3_T 2 
"register_operand"  "v")]
+ UNSPEC_VEC_AVG))]
+  "TARGET_VX")
+
+(define_expand "uavg3_ceil"
+  [(set (match_operand:VIT_HW_VXE3_T0 
"register_operand" "=v")
+   (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 
"register_operand"  "v")
+  (match_operand:VIT_HW_VXE3_T 2 
"register_operand"  "v")]
+ UNSPEC_VEC_AVGU))]
+  "TARGET_VX")
+
+(define_expand "smul3_highpart"
+  [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" 
  "=v")
+   (smul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 
"register_operand" "v")
+ (match_operand:VIT_HW_VXE3_DT 2 
"register_operand" "v")))]
+  "TARGET_VX")
+
+(define_expand "umul3_highpart"
+  [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" 
  "=v")
+   (umul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 
"register_operand" "v")
+ (match_operand:VIT_HW_VXE3_DT 2 
"register_operand" "v")))]
+  "TARGET_VX")
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index a7bb7ff92f5e..2478f74e161a 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -983,9 +983,8 @@
 ; vmhb, vmhh, vmhf, vmhg, vmhq
 (define_insn "vec_smulh"
   [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" 
  "=v")
-   (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 
"register_operand" "v")
-   (match_operand:VIT_HW_VXE3_DT 2 
"register_operand" "v")]
-  UNSPEC_VEC_SMULT_HI))]
+   (smul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 
"register_operand" "v")
+ (match_operand:VIT_HW_VXE3_DT 2 
"register_operand" "v")))]
   "TARGET_VX"
   "vmh\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
@@ -993,9 +992,8 @@
 ; vmlhb, vmlhh, vmlhf, vmlhg, vmlhq
 (define_insn "vec_umulh"
   [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" 
  "=v")
-   (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 
"register_operand" "v")
-   (match_operand:VIT_HW_VXE3_DT 2 
"register_operand" "v")]
-  UNSPEC_VEC_UMULT_HI))]
+   (umul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 
"register_operand" "v")
+ (match_operand:VIT_HW_VXE3_DT 2 
"register_operand" "v")))]
   "TARGET_VX"
   "vmlh\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
diff --git a/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.

Re: [PATCH v6 2/9] AArch64: reformat branch instruction rules

2025-06-25 Thread Richard Sandiford

Karl Meakin  writes:
> Make the formatting of the RTL templates in the rules for branch
> instructions more consistent with each other.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (cbranch4): Reformat.
>   (cbranchcc4): Likewise.
>   (condjump): Likewise.
>   (*compare_condjump): Likewise.
>   (aarch64_cb1): Likewise.
>   (*cb1): Likewise.
>   (tbranch_3): Likewise.
>   (@aarch64_tb): Likewise.
> ---
>  gcc/config/aarch64/aarch64.md | 77 +--
>  1 file changed, 38 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index fcc24e300e6..d059a6362d5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> [...]
> @@ -725,34 +725,34 @@ (define_expand "cbranch4"
>  )
>  
>  (define_expand "cbranch4"
> -  [(set (pc) (if_then_else
> - (match_operator 0 "aarch64_comparison_operator"
> -  [(match_operand:GPF_F16 1 "register_operand")
> -   (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
> - (label_ref (match_operand 3 "" ""))
> - (pc)))]
> +  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
> + [(match_operand:GPF_F16 1 "register_operand")
> +  (match_operand:GPF_F16 2 
> "aarch64_fp_compare_operand")])
> +(label_ref (match_operand 3))
> +(pc)))]

I think we should drop this part, since it makes the lines go over the
80-character limit.

OK with that change, thanks.

Richard

Re: [PATCH v6 3/9] AArch64: rename branch instruction rules

2025-06-25 Thread Richard Sandiford

Karl Meakin  writes:
> Give the `define_insn` rules used in lowering `cbranch4` to RTL
> more descriptive and consistent names: from now on, each rule is named
> after the AArch64 instruction that it generates. Also add comments to
> document each rule.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (condjump): Rename to ...
>   (aarch64_bcond): ...here.
>   (*compare_condjump): Rename to ...
>   (*aarch64_bcond_wide_imm): ...here.
>   (aarch64_cb): Rename to ...
>   (aarch64_cbz1): ...here.
>   (*cb1): Rename to ...
>   (*aarch64_tbz1): ...here.
>   (@aarch64_tb): Rename to ...
>   (@aarch64_tbz): ...here.
>   (restore_stack_nonlocal): Handle rename.
>   (stack_protect_combined_test): Likewise.
>   * config/aarch64/aarch64-simd.md (cbranch4): Likewise.
>   * config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise.
>   * config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
> ---
>  gcc/config/aarch64/aarch64-simd.md |  2 +-
>  gcc/config/aarch64/aarch64-sme.md  |  2 +-
>  gcc/config/aarch64/aarch64.cc  |  4 ++--
>  gcc/config/aarch64/aarch64.md  | 21 -
>  4 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index e771defc73f..33839f2fec7 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2884,9 +2884,9 @@ aarch64_gen_test_and_branch (rtx_code code, rtx x, int 
> bitnum,
>emit_insn (gen_aarch64_and3nr_compare0 (mode, x, mask));
>rtx cc_reg = gen_rtx_REG (CC_NZVmode, CC_REGNUM);
>rtx x = gen_rtx_fmt_ee (code, CC_NZVmode, cc_reg, const0_rtx);
> -  return gen_condjump (x, cc_reg, label);
> +  return gen_aarch64_bcond (x, cc_reg, label);
>  }
> -  return gen_aarch64_tb (code, mode, mode,
> +  return gen_aarch64_tbz (code, mode, mode,
>x, gen_int_mode (bitnum, mode), label);

Sorry for the formatting nit, but: please indent this line by an extra
column too, so that the arguments still line up.

> [...]
> @@ -8104,7 +8107,7 @@ (define_expand "stack_protect_combined_test"
>: gen_stack_protect_test_si) (operands[0], operands[1]));
>  
>rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> -  emit_jump_insn (gen_condjump (gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
> +  emit_jump_insn (gen_aarch64_bcond (gen_rtx_EQ (VOIDmode, cc_reg, 
> const0_rtx),
>   cc_reg, operands[2]));

Similarly, please reindent this to match the new name.

OK with those changes, thanks.

Richard

[PATCH v2] libstdc++: Test for %S precision for durations with integral representation.

2025-06-25 Thread Tomasz Kamiński

Existing test are extented to cover cases where not precision is specified,
or it is specified to zero. The precision value is ignored in all cases.

libstdc++-v3/ChangeLog:

* testsuite/std/time/format/precision.cc: New tests.
---
v2 extents test to cover .0 as precision.
Testing on x86_64-linux. std/format/time* test passed, also with 
-D_GLIBCXX_USE_CXX11_ABI=0
and -D_GLIBCXX_DEBUG.


 .../testsuite/std/time/format/precision.cc| 104 +-
 1 file changed, 99 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/std/time/format/precision.cc 
b/libstdc++-v3/testsuite/std/time/format/precision.cc
index ccb2c77ce05..aa266156c1f 100644
--- a/libstdc++-v3/testsuite/std/time/format/precision.cc
+++ b/libstdc++-v3/testsuite/std/time/format/precision.cc
@@ -16,6 +16,10 @@ test_empty()
   std::basic_string res;
 
   const duration d(33.111222);
+  res = std::format(WIDEN("{:}"), d);
+  VERIFY( res == WIDEN("33.1112s") );
+  res = std::format(WIDEN("{:.0}"), d);
+  VERIFY( res == WIDEN("33.1112s") );
   res = std::format(WIDEN("{:.3}"), d);
   VERIFY( res == WIDEN("33.1112s") );
   res = std::format(WIDEN("{:.6}"), d);
@@ -25,6 +29,10 @@ test_empty()
 
   // Uses ostream operator<<
   const duration nd = d;
+  res = std::format(WIDEN("{:}"), nd);
+  VERIFY( res == WIDEN("3.31112e+10ns") );
+  res = std::format(WIDEN("{:.0}"), nd);
+  VERIFY( res == WIDEN("3.31112e+10ns") );
   res = std::format(WIDEN("{:.3}"), nd);
   VERIFY( res == WIDEN("3.31112e+10ns") );
   res = std::format(WIDEN("{:.6}"), nd);
@@ -40,6 +48,10 @@ test_Q()
   std::basic_string res;
 
   const duration d(7.111222);
+  res = std::format(WIDEN("{:%Q}"), d);
+  VERIFY( res == WIDEN("7.111222") );
+  res = std::format(WIDEN("{:.0%Q}"), d);
+  VERIFY( res == WIDEN("7.111222") );
   res = std::format(WIDEN("{:.3%Q}"), d);
   VERIFY( res == WIDEN("7.111222") );
   res = std::format(WIDEN("{:.6%Q}"), d);
@@ -47,7 +59,23 @@ test_Q()
   res = std::format(WIDEN("{:.9%Q}"), d);
   VERIFY( res == WIDEN("7.111222") );
 
+  duration md = d;
+  res = std::format(WIDEN("{:%Q}"), md);
+  VERIFY( res == WIDEN("7111.222") );
+  res = std::format(WIDEN("{:.0%Q}"), md);
+  VERIFY( res == WIDEN("7111.222") );
+  res = std::format(WIDEN("{:.3%Q}"), md);
+  VERIFY( res == WIDEN("7111.222") );
+  res = std::format(WIDEN("{:.6%Q}"), md);
+  VERIFY( res == WIDEN("7111.222") );
+  res = std::format(WIDEN("{:.9%Q}"), md);
+  VERIFY( res == WIDEN("7111.222") );
+
   const duration nd = d;
+  res = std::format(WIDEN("{:%Q}"), nd);
+  VERIFY( res == WIDEN("7111222000") );
+  res = std::format(WIDEN("{:.0%Q}"), nd);
+  VERIFY( res == WIDEN("7111222000") );
   res = std::format(WIDEN("{:.3%Q}"), nd);
   VERIFY( res == WIDEN("7111222000") );
   res = std::format(WIDEN("{:.6%Q}"), nd);
@@ -58,12 +86,16 @@ test_Q()
 
 template
 void
-test_S()
+test_S_fp()
 {
   std::basic_string res;
 
   // Precision is ignored, but period affects output
-  const duration d(5.111222);
+  duration d(5.111222);
+  res = std::format(WIDEN("{:%S}"), d);
+  VERIFY( res == WIDEN("05") );
+  res = std::format(WIDEN("{:.0%S}"), d);
+  VERIFY( res == WIDEN("05") );
   res = std::format(WIDEN("{:.3%S}"), d);
   VERIFY( res == WIDEN("05") );
   res = std::format(WIDEN("{:.6%S}"), d);
@@ -71,7 +103,11 @@ test_S()
   res = std::format(WIDEN("{:.9%S}"), d);
   VERIFY( res == WIDEN("05") );
 
-  const duration md = d;
+  duration md = d;
+  res = std::format(WIDEN("{:%S}"), md);
+  VERIFY( res == WIDEN("05.111") );
+  res = std::format(WIDEN("{:.0%S}"), md);
+  VERIFY( res == WIDEN("05.111") );
   res = std::format(WIDEN("{:.3%S}"), md);
   VERIFY( res == WIDEN("05.111") );
   res = std::format(WIDEN("{:.6%S}"), md);
@@ -79,13 +115,70 @@ test_S()
   res = std::format(WIDEN("{:.9%S}"), md);
   VERIFY( res == WIDEN("05.111") );
 
-  const duration nd = d;
+  duration ud = d;
+  res = std::format(WIDEN("{:%S}"), ud);
+  VERIFY( res == WIDEN("05.111222") );
+  res = std::format(WIDEN("{:.0%S}"), ud);
+  VERIFY( res == WIDEN("05.111222") );
+  res = std::format(WIDEN("{:.3%S}"), ud);
+  VERIFY( res == WIDEN("05.111222") );
+  res = std::format(WIDEN("{:.6%S}"), ud);
+  VERIFY( res == WIDEN("05.111222") );
+  res = std::format(WIDEN("{:.9%S}"), ud);
+  VERIFY( res == WIDEN("05.111222") );
+
+  duration nd = d;
+  res = std::format(WIDEN("{:%S}"), nd);
+  VERIFY( res == WIDEN("05.111222000") );
+  res = std::format(WIDEN("{:.0%S}"), nd);
+  VERIFY( res == WIDEN("05.111222000") );
   res = std::format(WIDEN("{:.3%S}"), nd);
   VERIFY( res == WIDEN("05.111222000") );
   res = std::format(WIDEN("{:.6%S}"), nd);
   VERIFY( res == WIDEN("05.111222000") );
   res = std::format(WIDEN("{:.9%S}"), nd);
   VERIFY( res == WIDEN("05.111222000") );
+
+  duration pd = d;
+  res = std::format(WIDEN("{:%S}"), pd);
+  VERIFY( res == WIDEN("05.11122200") );
+  res = std::format(WIDEN("{:.0%S}"), pd);
+  VERIFY( res == WIDEN("05.11122200") );
+  res = std::format(WIDEN("{:.3%S}"), pd);
+  VER

Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

2025-06-25 Thread Richard Biener

On Tue, Jun 24, 2025 at 5:25 PM Alexander Monakov  wrote:
>
> > I'd say we want to fix these kind of things before switching the default.  
> > Can
> > you file bugreports for the distinct issues you noticed when adjusting the
> > testcases?
>
> Sure, filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120808 for the most
> frequently hit issue on x86 for now.

Thanks.  So almost all issues arise because the FMAs are then introduced early
(and possible folding with negates is done late).  At some point we've arranged
FMAs to be produced after vectorization only (there might be targets with scalar
FMA but no vector FMA for example).

It shouldn't be too hard to handle FMAs during vectorization but having a mix
will certainly complicate things.  Likewise undoing FMA creation when there's
no vector FMA would rely on detecting whether the FMA was introduced by
the compiler or the middle-end (I suppose builtin vs. IFN might do the
job here).

> > I suppose they are reproducible as well when using the C fma() function
> > directly?
>
> No, unfortunately there are multiple issues with fma builtin:
>
> 1) __builtin_fma does not accept generic vector types

indeed, you'd have to declare an OMP SIMD fma variant but that will not
be recognized as fma or .FMA then I think.

> 2) we have FMS FNMA FNMS FMADDSUB FMSUBADD internal functions, but
> no corresponding builtins

These are direct optab internal functions.  I'm not sure we want
builtins for all of
those, fma () with negated arguments should do fine.

> 3) __builtin_fma and .FMA internal function are not the same in the 
> middle-end,
> I reported one instance arising from that in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892

The builtin and the internal function should behave the same, in this case it's
again late vs. early exposal of FMA.

I am testing partial fixes for these issues.

Richard.

>
> Alexander

[PATCH] tree-optimization/109892 - SLP reduction of fma

2025-06-25 Thread Richard Biener

The following adds the ability to vectorize a fma reduction pair
as SLP reduction (we cannot yet handle ternary association in
reduction vectorization yet).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/109892
* tree-vect-loop.cc (reduction_fn_for_scalar_code): Handle fma.

* gcc.dg/vect/vect-reduc-fma-1.c: New testcase.
* gcc.dg/vect/vect-reduc-fma-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c | 15 +++
 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c | 20 
 gcc/tree-vect-loop.cc|  4 
 3 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c
new file mode 100644
index 000..e958b43e23b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */
+
+double f(double x[], long n)
+{
+double r0 = 0, r1 = 0;
+for (; n; x += 2, n--) {
+r0 = __builtin_fma(x[0], x[0], r0);
+r1 = __builtin_fma(x[1], x[1], r1);
+}
+return r0 + r1;
+}
+
+/* We should vectorize this as SLP reduction.  */
+/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and 
unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c
new file mode 100644
index 000..ea1ca9720e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffp-contract=on" } */
+/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */
+
+static double muladd(double x, double y, double z)
+{
+return x * y + z;
+}
+double g(double x[], long n)
+{
+double r0 = 0, r1 = 0;
+for (; n; x += 2, n--) {
+r0 = muladd(x[0], x[0], r0);
+r1 = muladd(x[1], x[1], r1);
+}
+return r0 + r1;
+}
+
+/* We should vectorize this as SLP reduction.  */
+/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and 
unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a3f95433a5b..1e6e9cede18 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3906,6 +3906,10 @@ reduction_fn_for_scalar_code (code_helper code, 
internal_fn *reduc_fn)
*reduc_fn = IFN_REDUC_FMIN;
return true;
 
+  CASE_CFN_FMA:
+   *reduc_fn = IFN_LAST;
+   return true;
+
   default:
return false;
   }
-- 
2.43.0

Re: [PATCH v2] libstdc++: Test for %S precision for durations with integral representation.

2025-06-25 Thread Jonathan Wakely

On Wed, 25 Jun 2025 at 10:42, Tomasz Kamiński  wrote:
>
> Existing test are extented to cover cases where not precision is specified,
> or it is specified to zero. The precision value is ignored in all cases.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/std/time/format/precision.cc: New tests.
> ---
> v2 extents test to cover .0 as precision.
> Testing on x86_64-linux. std/format/time* test passed, also with 
> -D_GLIBCXX_USE_CXX11_ABI=0
> and -D_GLIBCXX_DEBUG.

OK for trunk

>
>
>  .../testsuite/std/time/format/precision.cc| 104 +-
>  1 file changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/std/time/format/precision.cc 
> b/libstdc++-v3/testsuite/std/time/format/precision.cc
> index ccb2c77ce05..aa266156c1f 100644
> --- a/libstdc++-v3/testsuite/std/time/format/precision.cc
> +++ b/libstdc++-v3/testsuite/std/time/format/precision.cc
> @@ -16,6 +16,10 @@ test_empty()
>std::basic_string res;
>
>const duration d(33.111222);
> +  res = std::format(WIDEN("{:}"), d);
> +  VERIFY( res == WIDEN("33.1112s") );
> +  res = std::format(WIDEN("{:.0}"), d);
> +  VERIFY( res == WIDEN("33.1112s") );
>res = std::format(WIDEN("{:.3}"), d);
>VERIFY( res == WIDEN("33.1112s") );
>res = std::format(WIDEN("{:.6}"), d);
> @@ -25,6 +29,10 @@ test_empty()
>
>// Uses ostream operator<<
>const duration nd = d;
> +  res = std::format(WIDEN("{:}"), nd);
> +  VERIFY( res == WIDEN("3.31112e+10ns") );
> +  res = std::format(WIDEN("{:.0}"), nd);
> +  VERIFY( res == WIDEN("3.31112e+10ns") );
>res = std::format(WIDEN("{:.3}"), nd);
>VERIFY( res == WIDEN("3.31112e+10ns") );
>res = std::format(WIDEN("{:.6}"), nd);
> @@ -40,6 +48,10 @@ test_Q()
>std::basic_string res;
>
>const duration d(7.111222);
> +  res = std::format(WIDEN("{:%Q}"), d);
> +  VERIFY( res == WIDEN("7.111222") );
> +  res = std::format(WIDEN("{:.0%Q}"), d);
> +  VERIFY( res == WIDEN("7.111222") );
>res = std::format(WIDEN("{:.3%Q}"), d);
>VERIFY( res == WIDEN("7.111222") );
>res = std::format(WIDEN("{:.6%Q}"), d);
> @@ -47,7 +59,23 @@ test_Q()
>res = std::format(WIDEN("{:.9%Q}"), d);
>VERIFY( res == WIDEN("7.111222") );
>
> +  duration md = d;
> +  res = std::format(WIDEN("{:%Q}"), md);
> +  VERIFY( res == WIDEN("7111.222") );
> +  res = std::format(WIDEN("{:.0%Q}"), md);
> +  VERIFY( res == WIDEN("7111.222") );
> +  res = std::format(WIDEN("{:.3%Q}"), md);
> +  VERIFY( res == WIDEN("7111.222") );
> +  res = std::format(WIDEN("{:.6%Q}"), md);
> +  VERIFY( res == WIDEN("7111.222") );
> +  res = std::format(WIDEN("{:.9%Q}"), md);
> +  VERIFY( res == WIDEN("7111.222") );
> +
>const duration nd = d;
> +  res = std::format(WIDEN("{:%Q}"), nd);
> +  VERIFY( res == WIDEN("7111222000") );
> +  res = std::format(WIDEN("{:.0%Q}"), nd);
> +  VERIFY( res == WIDEN("7111222000") );
>res = std::format(WIDEN("{:.3%Q}"), nd);
>VERIFY( res == WIDEN("7111222000") );
>res = std::format(WIDEN("{:.6%Q}"), nd);
> @@ -58,12 +86,16 @@ test_Q()
>
>  template
>  void
> -test_S()
> +test_S_fp()
>  {
>std::basic_string res;
>
>// Precision is ignored, but period affects output
> -  const duration d(5.111222);
> +  duration d(5.111222);
> +  res = std::format(WIDEN("{:%S}"), d);
> +  VERIFY( res == WIDEN("05") );
> +  res = std::format(WIDEN("{:.0%S}"), d);
> +  VERIFY( res == WIDEN("05") );
>res = std::format(WIDEN("{:.3%S}"), d);
>VERIFY( res == WIDEN("05") );
>res = std::format(WIDEN("{:.6%S}"), d);
> @@ -71,7 +103,11 @@ test_S()
>res = std::format(WIDEN("{:.9%S}"), d);
>VERIFY( res == WIDEN("05") );
>
> -  const duration md = d;
> +  duration md = d;
> +  res = std::format(WIDEN("{:%S}"), md);
> +  VERIFY( res == WIDEN("05.111") );
> +  res = std::format(WIDEN("{:.0%S}"), md);
> +  VERIFY( res == WIDEN("05.111") );
>res = std::format(WIDEN("{:.3%S}"), md);
>VERIFY( res == WIDEN("05.111") );
>res = std::format(WIDEN("{:.6%S}"), md);
> @@ -79,13 +115,70 @@ test_S()
>res = std::format(WIDEN("{:.9%S}"), md);
>VERIFY( res == WIDEN("05.111") );
>
> -  const duration nd = d;
> +  duration ud = d;
> +  res = std::format(WIDEN("{:%S}"), ud);
> +  VERIFY( res == WIDEN("05.111222") );
> +  res = std::format(WIDEN("{:.0%S}"), ud);
> +  VERIFY( res == WIDEN("05.111222") );
> +  res = std::format(WIDEN("{:.3%S}"), ud);
> +  VERIFY( res == WIDEN("05.111222") );
> +  res = std::format(WIDEN("{:.6%S}"), ud);
> +  VERIFY( res == WIDEN("05.111222") );
> +  res = std::format(WIDEN("{:.9%S}"), ud);
> +  VERIFY( res == WIDEN("05.111222") );
> +
> +  duration nd = d;
> +  res = std::format(WIDEN("{:%S}"), nd);
> +  VERIFY( res == WIDEN("05.111222000") );
> +  res = std::format(WIDEN("{:.0%S}"), nd);
> +  VERIFY( res == WIDEN("05.111222000") );
>res = std::format(WIDEN("{:.3%S}"), nd);
>VERIFY( res == WIDEN("05.111222000") );
>res = std::format(WIDEN("{:.6%S}"), nd);
>VERIFY( res == WIDEN("05.111222000") );
>res = std:

Re: [PATCH v3] reassoc: Optimize CMP/XOR expressions [PR116860]

2025-06-25 Thread Konstantinos Eleftheriou

Hi Jakub, thanks for the feedback.

We have sent a new version
(https://gcc.gnu.org/pipermail/gcc-patches/2025-June/687530.html),
addressing those issues. Regarding the hash_sets, we have replaced
them with vectors in some cases and in the cases that we're still
using them we're copying them to sorted vectors before traversals.

Konstantinos

On Thu, Jun 12, 2025 at 10:36 AM Jakub Jelinek  wrote:
>
> On Mon, Mar 17, 2025 at 11:40:32AM +0100, Konstantinos Eleftheriou wrote:
> >   * gcc.dg/tree-ssa/fold-xor-and-or.c:
> >   Remove logical-op-non-short-circuit=1.
>
> The remove certainly fits on the same line as :
> and --param= is missing before the option name.
>
> > +/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." 
> > "optimized" } } */
> > \ No newline at end of file
>
> Please avoid files not ending with newline unless intentional.
>
> > +/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." 
> > "optimized" } } */
> > \ No newline at end of file
>
> Ditto.
>
> > --- a/gcc/tree-ssa-reassoc.cc
> > +++ b/gcc/tree-ssa-reassoc.cc
> > @@ -4077,6 +4077,359 @@ optimize_range_tests_var_bound (enum tree_code 
> > opcode, int first, int length,
> >return any_changes;
> >  }
> >
> > +/* Helper function for optimize_cmp_xor_exprs.  Visit EXPR operands
> > +   recursively and try to find comparison or XOR expressions that can be
> > +   solved using the expressions in CALC_STMTS.  Expressions that can be 
> > folded
> > +   to 0 are stored in STMTS_TO_FOLD.  IS_OR_EXPR is true for OR expressions
> > +   and false for AND expressions.  */
> > +
> > +tree
>
> Missing static before tree
>
> > +solve_expr (tree expr, hash_set *calc_stmts,
> > + hash_set *stmts_to_fold, hash_set *visited,
> > + bool is_or_expr)
> > +{
> > +  /* Return, if have already visited this expression or the expression is 
> > not
> > + an SSA name.  */
> > +  if (visited->contains (expr) || TREE_CODE (expr) != SSA_NAME)
>
> The TREE_CODE (expr) != SSA_NAME test is certainly much cheaper than
> visited->contains (expr), so please swap the || operands.
>
> +void
>
> Again missing static before return type (and in more spots)
>
> +find_terminal_nodes (tree expr, hash_set *terminal_nodes,
> +   hash_set *visited)
> +{
> +  if (visited->contains (expr))
> +return;
> +
> +  visited->add (expr);
>
> The above together is
>   if (visited->add (expr))
> return;
> (and more efficient in that).
>
> > +return NULL_TREE;
> > +
> > +  visited->add (expr);
> > +
> > +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> > +
> > +  if (!def_stmt || !is_gimple_assign (def_stmt))
> > +return expr;
> > +
> > +  unsigned int op_num = gimple_num_ops (def_stmt);
> > +  unsigned int terminal_node_num = 0;
> > +  /* Visit the expression operands recursively until finding a statement 
> > that
>
> until it finds ?
>
> > +
> > +  do {
>
> The formatting is wrong, it shouldn't be
>   do {
> statements;
>   } while (cond);
> but
>   do
> {
>   statements;
> }
>   while (cond);
>
> Last but not least, there are ton's of hash_sets involved, wonder if one could
> away without that for the common simple cases and use those only if it is
> larger, but more importantly, I believe the code generation depends on the
> hash_set traversals, which is a big no no for reproduceability, because the
> hash_set or hash_set I believe just use hashes derived from
> the pointer values and so with address space randomization, even subsequent
> runs of the same compiler on the same machine could result in different code
> generation, not even talking about cross compilers with different hosts etc.
> It is fine to use hash set traversals to find out what will need to be done,
> but in that case it should be e.g. pushed into some vector worklist and the
> worklist sorted by something stable (e.g. SSA_NAME_VERSIONs or positions in
> the original term sequence etc., i.e. something that reflects the IL and
> not the pointer values of particular trees or gimple *).
>
> Jakub
>

[PATCH v4] reassoc: Optimize CMP/XOR expressions [PR116860]

2025-06-25 Thread Konstantinos Eleftheriou

Testcases for match.pd patterns
`((a ^ b) & c) cmp d | a != b -> (0 cmp d | a != b)` and
`(a ^ b) cmp c | a != b -> (0 cmp c | a != b)` were failing on some targets,
like PowerPC.

This patch adds an implemenetation for the optimization in reassoc. Doing so,
we can now handle cases where the related conditions appear in an AND
expression too. Also, we can optimize cases where we have intermediate
expressions between the related ones in the AND/OR expression on some targets.
This is not handled on targets like PowerPC, where each condition of the
AND/OR expression is placed into a different basic block.

Bootstrapped/regtested on x86 and AArch64.

PR tree-optimization/116860

gcc/ChangeLog:

* tree-ssa-reassoc.cc (solve_expr): New function.
(find_terminal_nodes): New function.
(get_terminal_nodes): New function.
(sort_elements): New function.
(copy_hashset_to_vec_and_sort): New function.
(optimize_cmp_xor_exprs): New function.
(optimize_range_tests): Call optimize_cmp_xor_exprs.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fold-xor-and-or.c:
Remove logical-op-non-short-circuit=1.
* gcc.dg/tree-ssa/fold-xor-or.c: Likewise.
* gcc.dg/tree-ssa/fold-xor-and-or-2.c: New test.
* gcc.dg/tree-ssa/fold-xor-and.c: New test.
---
 .../gcc.dg/tree-ssa/fold-xor-and-or-2.c   |  59 +++
 .../gcc.dg/tree-ssa/fold-xor-and-or.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c  |  55 +++
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c   |   2 +-
 gcc/tree-ssa-reassoc.cc   | 415 ++
 5 files changed, 531 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
new file mode 100644
index ..a11fcb3732a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
@@ -0,0 +1,59 @@
+/* This test is not working across all targets (e.g. it fails on PowerPC, 
+   because each condition of the AND/OR expression is placed into
+   a different basic block). Therefore, it is gated for x86-64 and AArch64,
+   where we know that it has to pass.  */
+/* { dg-do compile { target { aarch64-*-* x86_64-*-* } } } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1_or_inter(int d1, int d2, int d3) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2_or_inter(int d1, int d2, int d3, int d4) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2 || d4 == 11)
+return 0;
+  return 1;
+}
+
+int cmp1_and_inter(int d1, int d2, int d3) {
+  if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2)
+return 0;
+  return 1;
+}
+
+int cmp2_and_inter(int d1, int d2, int d3, int d4) {
+  if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2 && d4 != 11)
+return 0;
+  return 1;
+}
+
+int cmp1_or_inter_64(uint64_t d1, uint64_t d2, uint64_t d3) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2_or_inter_64(uint64_t d1, uint64_t d2, uint64_t d3, uint64_t d4) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2 || d4 == 11)
+return 0;
+  return 1;
+}
+
+int cmp1_and_inter_64(uint64_t d1, uint64_t d2, uint64_t d3) {
+  if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2)
+return 0;
+  return 1;
+}
+
+int cmp2_and_inter_64(uint64_t d1, uint64_t d2, uint64_t d3, uint64_t d4) {
+  if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2 && d4 != 11)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." 
"optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c 
b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
index 99e83d8e5aae..e5dc98e7541d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-optimized --param 
logical-op-non-short-circuit=1" } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
 
 typedef unsigned long int uint64_t;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c 
b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c
new file mode 100644
index ..9957ef27dc70
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(int d1, int d2) {
+  if (!((d1 ^ d2) == 0xabcd) && d1 == d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 == d2 && !((d1 ^ d2) == 0xabcd))
+return 0;
+  return 1;
+}
+
+int cmp3(int d1, int d2)

[PATCH] RISC-V: Generate -mcpu and -mtune options from riscv-cores.def.

2025-06-25 Thread Dongyan Chen

Automatically generate -mcpu and -mtune options in invoke.texi from
the unified riscv-cores.def metadata, ensuring documentation stays in sync
with definitions and reducing manual maintenance.

gcc/ChangeLog:

* Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list
of files to be processed by the Texinfo generator.
* config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi
and riscv-mtune.texi.
* doc/invoke.texi: Replace hand‑written extension table with
`@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to
pull in auto‑generated entries.
* config/riscv/gen-riscv-mcpu-texi.cc: New file.
* config/riscv/gen-riscv-mtune-texi.cc: New file.
* doc/riscv-mcpu.texi: New file.
* doc/riscv-mtune.texi: New file.

---
 gcc/Makefile.in  |  2 +-
 gcc/config/riscv/gen-riscv-mcpu-texi.cc  | 43 +++
 gcc/config/riscv/gen-riscv-mtune-texi.cc | 41 ++
 gcc/config/riscv/t-riscv | 37 -
 gcc/doc/invoke.texi  | 23 ++--
 gcc/doc/riscv-mcpu.texi  | 69 
 gcc/doc/riscv-mtune.texi | 59 
 7 files changed, 251 insertions(+), 23 deletions(-)
 create mode 100644 gcc/config/riscv/gen-riscv-mcpu-texi.cc
 create mode 100644 gcc/config/riscv/gen-riscv-mtune-texi.cc
 create mode 100644 gcc/doc/riscv-mcpu.texi
 create mode 100644 gcc/doc/riscv-mtune.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9535804f7fb5..2d5e3427550d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3710,7 +3710,7 @@ TEXI_GCC_FILES = gcc.texi gcc-common.texi gcc-vers.texi 
frontends.texi\
 contribute.texi compat.texi funding.texi gnu.texi gpl_v3.texi  \
 fdl.texi contrib.texi cppenv.texi cppopts.texi avr-mmcu.texi   \
 implement-c.texi implement-cxx.texi gcov-tool.texi gcov-dump.texi \
-lto-dump.texi riscv-ext.texi
+lto-dump.texi riscv-ext.texi riscv-mcpu.texi riscv-mtune.texi

 # we explicitly use $(srcdir)/doc/tm.texi here to avoid confusion with
 # the generated tm.texi; the latter might have a more recent timestamp,
diff --git a/gcc/config/riscv/gen-riscv-mcpu-texi.cc 
b/gcc/config/riscv/gen-riscv-mcpu-texi.cc
new file mode 100644
index ..980a1103e0f9
--- /dev/null
+++ b/gcc/config/riscv/gen-riscv-mcpu-texi.cc
@@ -0,0 +1,43 @@
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  puts ("@c Copyright (C) 2025 Free Software Foundation, Inc.");
+  puts ("@c This is part of the GCC manual.");
+  puts ("@c For copying conditions, see the file gcc/doc/include/fdl.texi.");
+  puts ("");
+  puts ("@c This file is generated automatically using");
+  puts ("@c  gcc/config/riscv/gen-riscv-mcpu-texi.cc from:");
+  puts ("@c   gcc/config/riscv/riscv-cores.def");
+  puts ("");
+  puts ("@c Please *DO NOT* edit manually.");
+  puts ("");
+  puts ("@samp{Core Name}");
+  puts ("");
+  puts ("@opindex mcpu");
+  puts ("@item -mcpu=@var{processor-string}");
+  puts ("Use architecture of and optimize the output for the given processor, 
specified");
+  puts ("by particular CPU name. Permissible values for this option are:");
+  puts ("");
+  puts ("");
+
+  std::vector coreNames;
+
+#define RISCV_CORE(CORE_NAME, ARCH, MICRO_ARCH) \
+  coreNames.push_back (#CORE_NAME);
+#include "riscv-cores.def"
+#undef RISCV_CORE
+
+  for (size_t i = 0; i < coreNames.size(); ++i) {
+if (i == coreNames.size() - 1) {
+  printf("@samp{%s}.\n", coreNames[i].c_str());
+} else {
+  printf("@samp{%s},\n\n", coreNames[i].c_str());
+}
+  }
+
+  return 0;
+}
diff --git a/gcc/config/riscv/gen-riscv-mtune-texi.cc 
b/gcc/config/riscv/gen-riscv-mtune-texi.cc
new file mode 100644
index ..0c30b524895e
--- /dev/null
+++ b/gcc/config/riscv/gen-riscv-mtune-texi.cc
@@ -0,0 +1,41 @@
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  puts ("@c Copyright (C) 2025 Free Software Foundation, Inc.");
+  puts ("@c This is part of the GCC manual.");
+  puts ("@c For copying conditions, see the file gcc/doc/include/fdl.texi.");
+  puts ("");
+  puts ("@c This file is generated automatically using");
+  puts ("@c  gcc/config/riscv/gen-riscv-mtune-texi.cc from:");
+  puts ("@c   gcc/config/riscv/riscv-cores.def");
+  puts ("");
+  puts ("@c Please *DO NOT* edit manually.");
+  puts ("");
+  puts ("@samp{Tune Name}");
+  puts ("");
+  puts ("@opindex mtune");
+  puts ("@item -mtune=@var{processor-string}");
+  puts ("Optimize the output for the given processor, specified by 
microarchitecture or");
+  puts ("particular CPU name.  Permissible values for this option are:");
+  puts ("");
+  puts ("");
+
+  std::vector tuneNames;
+
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
+  tuneNames.push_back (#TUNE_NAME);
+#include "riscv-cores.def"
+#undef RISCV_TUNE
+
+  for (size_t i = 0; i < tuneNames.size(); ++i) {
+printf("@samp{%s},\n\n

[PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644]

2025-06-25 Thread Nathaniel Shead

On Tue, Jun 24, 2025 at 12:10:09PM -0400, Patrick Palka wrote:
> On Tue, 24 Jun 2025, Jason Merrill wrote:
> 
> > On 6/23/25 5:41 PM, Nathaniel Shead wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?
> > > 
> > > -- >8 --
> > > 
> > > We were erroring because the TEMPLATE_DECL of the existing partial
> > > specialisation has an undeduced return type, but the imported
> > > declaration did not.
> > > 
> > > The root cause is similar to what was fixed in r13-2744-g4fac53d6522189,
> > > where modules streaming code assumes that a TEMPLATE_DECL and its
> > > DECL_TEMPLATE_RESULT will always have the same TREE_TYPE.  That commit
> > > fixed the issue by ensuring that when the type of a variable is deduced
> > > the TEMPLATE_DECL is updated as well, but this missed handling partial
> > > specialisations.
> > > 
> > > However, I don't think we actually care about that, since it seems that
> > > only the type of the inner decl actually matters in practice.  Instead,
> > > this patch handles the issue on the modules side when deduping a
> > > streamed decl, by only comparing the inner type.
> > > 
> > >   PR c++/120644
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * decl.cc (cp_finish_decl): Remove workaround.
> > 
> > Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL 
> > correct,
> > maybe we should always set it to NULL_TREE to make sure we only look at the
> > inner type.
> 
> FWIW cp_finish_decl can get at the TEMPLATE_DECL of a VAR_DECL
> corresponding to a partial specialization via
> 
>  TI_TEMPLATE (TI_PARTIAL_INFO (DECL_TEMPLATE_INFO (decl)))
> 
> if we do want to end up keeping the two TREE_TYPEs in sync.
> 

Thanks.  On further reflection, maybe the safest approach is to just
ensure that the types are always consistent (including for partial
specs); this is what the following patch does.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Subject: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is
 consistent with its template [PR120644]

We were erroring because the TEMPLATE_DECL of the existing partial
specialisation has an undeduced return type, but the imported
declaration did not.

The root cause is similar to what was fixed in r13-2744-g4fac53d6522189,
where modules streaming code assumes that a TEMPLATE_DECL and its
DECL_TEMPLATE_RESULT will always have the same TREE_TYPE.  That commit
fixed the issue by ensuring that when the type of a variable is deduced
the TEMPLATE_DECL is updated as well, but missed handling partial
specialisations.  This patch ensures that the same adjustment is made
there as well.

PR c++/120644

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Also propagate type to partial
templates.
* module.cc (trees_out::decl_value): Add assertion that the
TREE_TYPE of a streamed template decl matches its inner.
(trees_in::is_matching_decl): Clarify function return type
deduction should only occur for non-TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/modules/auto-7.h: New test.
* g++.dg/modules/auto-7_a.H: New test.
* g++.dg/modules/auto-7_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
Reviewed-by: Patrick Palka 
---
 gcc/cp/decl.cc  | 13 +
 gcc/cp/module.cc|  7 ++-
 gcc/testsuite/g++.dg/modules/auto-7.h   | 12 
 gcc/testsuite/g++.dg/modules/auto-7_a.H |  5 +
 gcc/testsuite/g++.dg/modules/auto-7_b.C |  5 +
 5 files changed, 37 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h
 create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4fe97ffbf8f..59701197e16 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8923,10 +8923,15 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
   cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
 
   /* Update the type of the corresponding TEMPLATE_DECL to match.  */
-  if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl)
- && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl)
-   TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type;
+  if (DECL_LANG_SPECIFIC (decl) && DECL_TEMPLATE_INFO (decl))
+   {
+ tree info = DECL_TEMPLATE_INFO (decl);
+ tree tmpl = TI_TEMPLATE (info);
+ if (DECL_TEMPLATE_RESULT (tmpl) == decl)
+   TREE_TYPE (tmpl) = type;
+ else if (PRIMARY_TEMPLATE_P (tmpl) && TI_PARTIAL_INFO (info))
+   TREE_TYPE (TI_TEMPLATE (TI_PARTIAL_INFO (info))) = type;
+   }
 }
 
   if (ensure_literal_type_for_constexpr_object (decl) == error_mark_node)
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c99988da05b..53edb2ff203 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8212,6 +8212,10 @@ tr

[PATCH 02/17] Mark pass_sccopy gate and execute functions as final override

2025-06-25 Thread Martin Jambor

Hi,

It is customary to mark the gate and execute functions of the classes
representing passes as final override but this is missing in
pass_sccopy.  This patch adds it which also silences clang warnings
about it.

Bootstrapped and tested on x86_64-linux. Because of the precedent
elsewhere I consider this obvious and will commit it shortly.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* gimple-ssa-sccopy.cc (class pass_sccopy): Mark member functions
gate and execute as final override.
---
 gcc/gimple-ssa-sccopy.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index c93374572a9..341bae46080 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -699,8 +699,8 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return true; }
-  virtual unsigned int execute (function *);
+  virtual bool gate (function *) final override { return true; }
+  virtual unsigned int execute (function *) final override;
   opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
 }; // class pass_sccopy
 
-- 
2.49.0

[PATCH 11/17] tree-vect-stmts.cc: Remove an unused shadowed variable

2025-06-25 Thread Martin Jambor

Hi,

when compiling tree-vect-stmts.cc with clang, it emits a warning:

  gcc/tree-vect-stmts.cc:14930:19: warning: unused variable 'mode_iter' 
[-Wunused-variable]

And indeed, there are two mode_iter local variables in function
supportable_indirect_convert_operation and the first one is not used
at all.  This patch removes it.

Bootstrapped and tested on x86_64-linx.  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* tree-vect-stmts.cc (supportable_indirect_convert_operation):
Remove an unused shadowed variable.
---
 gcc/tree-vect-stmts.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f699d808e68..652c590e553 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14927,7 +14927,6 @@ supportable_indirect_convert_operation (code_helper 
code,
   bool found_mode = false;
   scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_out));
   scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_in));
-  opt_scalar_mode mode_iter;
   tree_code tc1, tc2, code1, code2;
 
   tree cvt_type = NULL_TREE;
-- 
2.49.0

[PATCH 09/17] jit: Silence clang warning in jit-builtins.cc

2025-06-25 Thread Martin Jambor

Hi,

When compiling GCC (with JIT enabled) by clang, it produces a series
of warning s like this for all uses of DEF_GOACC_BUILTIN_COMPILER and
DEF_GOMP_BUILTIN_COMPILER in omp-builtins.def:

--
  In file included from 
/home/worker/buildworker/tiber-gcc-clang/build/gcc/jit/jit-builtins.cc:61:
  In file included from 
/home/worker/buildworker/tiber-gcc-clang/build/gcc/builtins.def:1276:
  /home/worker/buildworker/tiber-gcc-clang/build/gcc/omp-builtins.def:55:1: 
warning: non-constant-expression cannot be narrowed from type 'int' to 'bool' 
in initializer list [-Wc++11-narrowing]
 55 | DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
| ^~~~
 56 | BT_FN_INT_INT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
| 

  /home/worker/buildworker/tiber-gcc-clang/build/gcc/builtins.def:225:9: note: 
expanded from macro 'DEF_GOACC_BUILTIN_COMPILER'
225 |flag_openacc, true, true, ATTRS, false, true)
|^~~~
  ./options.h:7049:22: note: expanded from macro 'flag_openacc'
   7049 | #define flag_openacc global_options.x_flag_openacc
|  ^
  /home/worker/buildworker/tiber-gcc-clang/build/gcc/jit/jit-builtins.cc:58:23: 
note: expanded from macro 'DEF_BUILTIN'
 58 |   {NAME, CLASS, TYPE, BOTH_P, FALLBACK_P, ATTRS, IMPLICIT},
|   ^~
  /home/worker/buildworker/tiber-gcc-clang/build/gcc/omp-builtins.def:55:1: 
note: insert an explicit cast to silence this issue
--

I'm not sure to what extent this is an actual problem or not, but
flag_openacc is an int and we do store it in a bool, so I this patch
does add the explicit cast clang asks for.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warnings instead.

Thanks,

Martin


gcc/jit/ChangeLog:

2025-06-23  Martin Jambor  

* jit-builtins.cc (DEF_BUILTIN): Add explicit cast to bool of BOTH_P.
---
 gcc/jit/jit-builtins.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/jit-builtins.cc b/gcc/jit/jit-builtins.cc
index 84e0bd5347f..ddbba55d3f3 100644
--- a/gcc/jit/jit-builtins.cc
+++ b/gcc/jit/jit-builtins.cc
@@ -55,7 +55,7 @@ struct builtin_data
 
 #define DEF_BUILTIN(X, NAME, CLASS, TYPE, LT, BOTH_P, FALLBACK_P, \
NONANSI_P, ATTRS, IMPLICIT, COND) \
-  {NAME, CLASS, TYPE, BOTH_P, FALLBACK_P, ATTRS, IMPLICIT},
+  {NAME, CLASS, TYPE, (bool) BOTH_P, FALLBACK_P, ATTRS, IMPLICIT},
 static const struct builtin_data builtin_data[] =
 {
 #include "builtins.def"
-- 
2.49.0

[PATCH 12/17] Silence a clang warning in tree-vect-slp.cc about an unused variable

2025-06-25 Thread Martin Jambor

Hi,

since r15-4695-gd17e672ce82e69 (Richard Biener: Assert finished
vectorizer pattern COND_EXPR transition), the static const array
cond_expr_maps is unused and when GCC is compiled with clang, it warns
about that.

This patch simply removes the variable.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* tree-vect-slp.cc (cond_expr_maps): Remove.
---
 gcc/tree-vect-slp.cc | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index dc89da3bf17..39692ea9465 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -507,11 +507,6 @@ vect_def_types_match (enum vect_def_type dta, enum 
vect_def_type dtb)
  && (dtb == vect_external_def || dtb == vect_constant_def)));
 }
 
-static const int cond_expr_maps[3][5] = {
-  { 4, -1, -2, 1, 2 },
-  { 4, -2, -1, 1, 2 },
-  { 4, -1, -2, 2, 1 }
-};
 static const int no_arg_map[] = { 0 };
 static const int arg0_map[] = { 1, 0 };
 static const int arg1_map[] = { 1, 1 };
-- 
2.49.0

[PATCH 06/17] value-relation.h: Mark dom_oracle::next_relation as override

2025-06-25 Thread Martin Jambor

Hi,

When GCC is compiled with clang, it emits a warning that
dom_oracle::next_relation is not marked as override even though it
does override a virtual function of its ancestor.  This patch marks it
as such to silence the warning and for the sake of consistency.

There are other member functions in the class which are marked as
final override but this particular function is in the protected
section so I decided to just mark it as override.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* value-relation.h (class dom_oracle): Mark member function
next_relation as override.
---
 gcc/value-relation.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index 1081877ccca..87f0d856fab 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -235,7 +235,7 @@ public:
   void dump (FILE *f) const final override;
 protected:
   virtual relation_chain *next_relation (basic_block, relation_chain *,
-tree) const;
+tree) const override;
   bool m_do_trans_p;
   bitmap m_tmp, m_tmp2;
   bitmap m_relation_set;  // Index by ssa-name. True if a relation exists
-- 
2.49.0

[PATCH 14/17] c-format: Removed unused private member

2025-06-25 Thread Martin Jambor

Hi,

when building GCC with clang, it warns that the private member suffix
in class element_expected_type_with_indirection (defined in
gcc/c-family/c-format.cc) is not used which indeed looks like it is
the case.  This patch therefore removes it.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin


gcc/c-family/ChangeLog:

2025-06-24  Martin Jambor  

* c-format.cc (class element_expected_type_with_indirection):
Remove member m_wanted_type.
---
 gcc/c-family/c-format.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index a44249a0222..1fdda3faaf5 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4817,7 +4817,6 @@ public:
 
 private:
   const char *m_wanted_type_name;
-  tree m_wanted_type;
   int m_pointer_count;
 };
 
-- 
2.49.0

[COMMITTED] - get_bitmask is sometimes less refined.

2025-06-25 Thread Andrew MacLeod

While looking at something else, I decided to write some self-tests for 
the bound-snapping  changes.


Along the way, I discovered a couple of things.

This patch has the self tests, and they tripped over an issue with 
get_bitmask ().   get_bitmask () takes the current mask, and intersect 
it with a mask derived from the lower and upper bounds, giving us useful 
results.  However, when the 2 masks are incompatible, it was  returning 
bitmask_unknown, which is akin to a VARYING result.   It has no way of 
communicating an UNDEFINED result, which would be more appropriate.


Instead, it should just return the original mask.  Any undefined results 
will show up eventually when set_range_from_bitmask () is called.


This patch provides the updated get_bitmask as well as all the self tests.

Bootstraps on x86_64-pc-linux-gnu  with no regressions.   Pushed.

Andrew
From 5ae33c8f44f0112644b561dfc549c1dc2c679b6f Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 24 Jun 2025 13:10:56 -0400
Subject: [PATCH 1/3] get_bitmask is sometimes less refined.

get_bitmask intersects the current mask with a mask generated from the
range.  If the 2 masks are incompatible, it currently returns UNKNOWN.
Instead, ti should return the original mask or information is lost.

	* value-range.cc (irange::get_bitmask): Return original mask if
	result is unknown.
	(assert_snap_result): New.
	(test_irange_snap_bounds): New.
	(range_tests_misc): Call test_irange_snap_bounds.
---
 gcc/value-range.cc | 117 -
 1 file changed, 116 insertions(+), 1 deletion(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 23a5c66ed5e..85c1e26287e 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -2513,7 +2513,13 @@ irange::get_bitmask () const
   // See also the note in irange_bitmask::intersect.
   irange_bitmask bm (type (), lower_bound (), upper_bound ());
   if (!m_bitmask.unknown_p ())
-bm.intersect (m_bitmask);
+{
+  bm.intersect (m_bitmask);
+  // If the new intersection is unknown, it means there are inconstent
+  // bits, so simply return the original bitmask.
+  if (bm.unknown_p ())
+	return m_bitmask;
+}
   return bm;
 }
 
@@ -2879,6 +2885,112 @@ range_tests_strict_enum ()
   ASSERT_FALSE (ir1.varying_p ());
 }
 
+// Test that range bounds are "snapped" to where they are expected to be.
+
+static void
+assert_snap_result (int lb_val, int ub_val,
+		int expected_lb, int expected_ub,
+		unsigned mask_val, unsigned value_val,
+		tree type)
+{
+  wide_int lb = wi::shwi (lb_val, TYPE_PRECISION (type));
+  wide_int ub = wi::shwi (ub_val, TYPE_PRECISION (type));
+  wide_int new_lb, new_ub;
+
+  irange_bitmask bm (wi::uhwi (value_val, TYPE_PRECISION (type)),
+		 wi::uhwi (mask_val, TYPE_PRECISION (type)));
+
+  int_range_max r (type);
+  r.set (type, lb, ub);
+  r.update_bitmask (bm);
+
+  if (TYPE_SIGN (type) == SIGNED && expected_ub < expected_lb)
+gcc_checking_assert (r.undefined_p ());
+  else if (TYPE_SIGN (type) == UNSIGNED
+	   && ((unsigned)expected_ub < (unsigned)expected_lb))
+gcc_checking_assert (r.undefined_p ());
+  else
+{
+  gcc_checking_assert (wi::eq_p (r.lower_bound (),
+ wi::shwi (expected_lb,
+	   TYPE_PRECISION (type;
+  gcc_checking_assert (wi::eq_p (r.upper_bound (),
+ wi::shwi (expected_ub,
+	   TYPE_PRECISION (type;
+}
+}
+
+
+// Run a selection of tests that confirm, bounds are snapped as expected.
+// We only test individual pairs, multiple pairs use the same snapping
+// routine as single pairs.
+
+static void
+test_irange_snap_bounds ()
+{
+  tree u32 = unsigned_type_node;
+  tree s32 = integer_type_node;
+  tree s8 = build_nonstandard_integer_type (8, /*unsigned=*/ 0);
+  tree s1 = build_nonstandard_integer_type (1, /*unsigned=*/ 0);
+  tree u1 = build_nonstandard_integer_type (1, /*unsigned=*/ 1);
+
+  // Basic aligned range: even-only
+  assert_snap_result (5, 15, 6, 14, 0xFFFE, 0x0, u32);
+  // Singleton that doesn't match mask: undefined.
+  assert_snap_result (7, 7, 1, 0, 0xFFFE, 0x0, u32);
+  // 8-bit signed char, mask 0xF0 (i.e. step of 16).
+  assert_snap_result (-100, 100, -96, 96, 0xF0, 0x00, s8);
+  // Already aligned range: no change.
+  assert_snap_result (0, 240, 0, 240, 0xF0, 0x00, u32);
+  // Negative range, step 16 alignment (s32).
+  assert_snap_result (-123, -17, -112, -32, 0xFFF0, 0x00, s32);
+  // Negative range, step 16 alignment (trailing-zero aligned mask).
+  assert_snap_result (-123, -17, -112, -32, 0xFFF0, 0x00, s32);
+  // s8, 16-alignment mask, value = 0 (valid).
+  assert_snap_result (-50, 10, -48, 0, 0xF0, 0x00, s8);
+  // No values in range [-3,2] match alignment except 0.
+  assert_snap_result (-3, 2, 0, 0, 0xF8, 0x00, s8);
+  // No values in range [-3,2] match alignment — undefined.
+  assert_snap_result (-3, 2, 1, 0, 0xF8, 0x04, s8);
+  // Already aligned range: no change.
+  assert_snap_result (0,

[COMMITTED] Promote verify_range to vrange.

2025-06-25 Thread Andrew MacLeod

Another thing I noticed is that verifying a range outside of private 
constraints was actually quite difficult.


Most range classes had a verify_range () routine, but they were private, 
not constant, and impossible to invoke if we were in a situation where 
all we had a was a vrange.


This patch promotes verify_range () to a public call in vrange, makes it 
virtual, provides a hook in value-range, and ensure it just works 
everywhere.   There is no current need for the call, but it sure would 
have been handy earlier in the week.  Lets just make it consistent 
across all range classes.


 Bootstraps on x86_64-pc-linux-gnu  with no regressions. Pushed.

Andrew
From 8213212eba1cad976823716c0c4ba835c842d0b2 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 19 Jun 2025 21:19:27 -0400
Subject: [PATCH 2/3] Promote verify_range to vrange.

most range classes had a verufy_range, but it was all private. Make it a
supported routine from vrange.

	* value-range.cc (frange::verify_range): Constify.
	(irange::verify_range): Constify.
	* value-range.h (vrange::verify_range): New.
	(irange::verify_range): Make public.
	(prange::verify_range): Make public.
	(prange::verify_range): Make public.
	(value_range::verify_range): New.
---
 gcc/value-range.cc | 4 ++--
 gcc/value-range.h  | 9 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 85c1e26287e..dc6909e77c5 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1205,7 +1205,7 @@ frange::supports_type_p (const_tree type) const
 }
 
 void
-frange::verify_range ()
+frange::verify_range () const
 {
   if (!undefined_p ())
 gcc_checking_assert (HONOR_NANS (m_type) || !maybe_isnan ());
@@ -1515,7 +1515,7 @@ irange::set (tree min, tree max, value_range_kind kind)
 // Check the validity of the range.
 
 void
-irange::verify_range ()
+irange::verify_range () const
 {
   gcc_checking_assert (m_discriminator == VR_IRANGE);
   if (m_kind == VR_UNDEFINED)
diff --git a/gcc/value-range.h b/gcc/value-range.h
index c32c5076b63..5c358f3c70c 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -111,6 +111,7 @@ public:
   bool operator== (const vrange &) const;
   bool operator!= (const vrange &r) const { return !(*this == r); }
   void dump (FILE *) const;
+  virtual void verify_range () const { }
 protected:
   vrange (enum value_range_discriminator d) : m_discriminator (d) { }
   ENUM_BITFIELD(value_range_kind) m_kind : 8;
@@ -323,6 +324,7 @@ public:
   virtual void update_bitmask (const class irange_bitmask &) override;
   virtual irange_bitmask get_bitmask () const override;
 
+  virtual void verify_range () const;
 protected:
   void maybe_resize (int needed);
   virtual void set (tree, tree, value_range_kind = VR_RANGE) override;
@@ -335,7 +337,6 @@ protected:
 
   void normalize_kind ();
 
-  void verify_range ();
 
   // Hard limit on max ranges allowed.
   static const int HARD_MAX_RANGES = 255;
@@ -421,7 +422,7 @@ public:
   bool contains_p (const wide_int &) const;
   wide_int lower_bound () const;
   wide_int upper_bound () const;
-  void verify_range () const;
+  virtual void verify_range () const;
   irange_bitmask get_bitmask () const final override;
   void update_bitmask (const irange_bitmask &) final override;
 protected:
@@ -593,14 +594,13 @@ public:
   bool nan_signbit_p (bool &signbit) const;
   bool known_isnormal () const;
   bool known_isdenormal_or_zero () const;
-
+  virtual void verify_range () const;
 protected:
   virtual bool contains_p (tree cst) const override;
   virtual void set (tree, tree, value_range_kind = VR_RANGE) override;
 
 private:
   bool internal_singleton_p (REAL_VALUE_TYPE * = NULL) const;
-  void verify_range ();
   bool normalize_kind ();
   bool union_nans (const frange &);
   bool intersect_nans (const frange &);
@@ -798,6 +798,7 @@ public:
   void update_bitmask (const class irange_bitmask &bm)
   { return m_vrange->update_bitmask (bm); }
   void accept (const vrange_visitor &v) const { m_vrange->accept (v); }
+  void verify_range () const { m_vrange->verify_range (); }
 private:
   void init (tree type);
   void init (const vrange &);
-- 
2.45.0

Re: [PATCH] vect: Misalign checks for gather/scatter.

2025-06-25 Thread Richard Biener

On Wed, 25 Jun 2025, Robin Dapp wrote:

> Hi,
> 
> this patch adds simple misalignment checks for gather/scatter
> operations.  Previously, we assumed that those perform element accesses
> internally so alignment does not matter.  The riscv vector spec however
> explicitly states that vector operations are allowed to fault on
> element-misaligned accesses.  Reasonable uarchs won't, but...
> 
> For gather/scatter we have two paths in the vectorizer:
> 
> (1) Regular analysis based on datarefs.  Here we can also create
> strided loads.
> (2) Non-affine access where each gather index is relative to the
> initial address.
> 
> The assumption this patch works off is that once the alignment for the
> first scalar is correct, all others will fall in line, as the index is
> always a multiple of the first element's size.
> 
> For (1) we have a dataref and can check it for alignment as in other
> cases.  For (2) this patch checks the object alignment of BASE and
> compares it against the natural alignment of the current vectype's unit.
> 
> The patch also adds a pointer argument to the gather/scatter IFNs that
> contains the necessary alignment.  Most of the patch is thus mechanical
> in that it merely adjusts indices.
> 
> I tested the riscv version with a custom qemu version that faults on
> element-misaligned vector accesses.  With this patch applied, there is
> just a single fault left, which is due to PR120782 and which will be
> addressed separately.
> 
> Is the general approach reasonable or do we need to do something else
> entirely?  Bootstrap and regtest on aarch64 went fine.
> 
> I couldn't bootstrap/regtest on x86 as my regular cfarm machines
> (420-422) are currently down.  Issues are expected, though, as the patch
> doesn't touch x86's old-style gathers/scatters at all yet.  I still
> wanted to get this initial version out there to get feedback.
> 
> The two riscv-specific changes I can still split off, obviously.
> Also, I couldn't help but do tiny refactoring in some spots :)  This
> could also go if requested.
> 
> I noticed one early-break failure with the changes where we would give
> up on a load_permutation of {0}.  It looks latent and probably
> unintended but I didn't investigate for now and just allowed this
> specific permutation.

This change reminds me that we lack documentation about arguments
of most of the "complicated" internal functions ...

We miss internal_fn_gatherscatter_{offset,scale}_index and possibly
a internal_fn_ldst_ptr_index (always zero?) and
internal_fn_ldst_alias_align_index (always one, if supported?).

  if (elsvals && icode != CODE_FOR_nothing)
 get_supported_else_vals
-  (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, 
*elsvals);
+  (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);

these "fixes" seem to be independent?

+  /* TODO: Is IS_PACKED necessary/useful here or does get_obj_alignment
+ suffice?  */
+  bool is_packed = not_size_aligned (DR_REF (dr));
+  info->align_ptr = build_int_cst
+(reference_alias_ptr_type (DR_REF (dr)),
+ is_packed ? 1 : get_object_alignment (DR_REF (dr)));

I think get_object_alignment should be sufficient.

+  gs_info->align_ptr = build_int_cst
+   (reference_alias_ptr_type (DR_REF (dr)), DR_BASE_ALIGNMENT (dr));

why's this?  If DR_BASE_ALIGNMENT is bigger than element alignment
it could be possibly not apply to all loads forming the gather?

@@ -2411,8 +2413,7 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
   || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 *poffset = neg_ldst_offset;

-  if (*memory_access_type == VMAT_GATHER_SCATTER
-  || *memory_access_type == VMAT_ELEMENTWISE
+  if (*memory_access_type == VMAT_ELEMENTWISE

this probably needs some refactoring with the adjustments you
do in get_load_store_type given a few lines above we can end up
classifying a load/store as VMAT_GATHER_SCATTER if
vect_use_strided_gather_scatters_p.  But then you'd use the
wrong alignment analysis going forward.

+  bool is_misaligned = scalar_align < inner_vectype_sz;
+  bool is_packed = scalar_align > 1 && is_misaligned;
+
+  *misalignment = !is_misaligned ? 0 : inner_vectype_sz - 
scalar_align;
+
+  if (targetm.vectorize.support_vector_misalignment
+ (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))

the misalignment argument is meaningless, I think you want to
pass DR_MISALIGNMENT_UNKNOWN for this and just pass is_packed
if the scalars acesses are not at least size aligned.

Note the hook really doesn't know whether you ask it for gather/scatter
or a contiguous vector load so I wonder whether the above fits
constraints on other platforms where scalar accesses might be
allowed to be packed but all unaligned vector accesses would need
to be element aligned?

+  /* The alignment_ptr of the base.  */

The TBAA alias pointer type where the value determines the alignment
of the scalar accesses.

+  tree

Re: [PATCH 1/2] Match: Support for signed scalar SAT_ADD IMM form 2

2025-06-25 Thread Richard Biener

On Tue, Jun 24, 2025 at 5:12 AM Ciyan Pan  wrote:
>
> From: panciyan 
>
> This patch would like to support signed scalar SAT_ADD IMM form 2
>
> Form2:
> T __attribute__((noinline))  \
> sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)\
> {\
>   T sum = (T)((UT)x + (UT)IMM);   \
>   return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \
> (-(T)(x < 0) ^ MAX) : sum; \
> }
>
> Take below form1 as example:
> DEF_SAT_S_ADD_IMM_FMT_2(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)
>
> Before this patch:
> __attribute__((noinline))
> int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
> {
>   int8_t sum;
>   unsigned char x.0_1;
>   unsigned char _2;
>   signed char _3;
>   signed char _4;
>   _Bool _5;
>   signed char _6;
>   int8_t _7;
>   int8_t _10;
>   signed char _11;
>   signed char _13;
>   signed char _14;
>
>[local count: 1073741822]:
>   x.0_1 = (unsigned char) x_8(D);
>   _2 = x.0_1 + 9;
>   sum_9 = (int8_t) _2;
>   _3 = x_8(D) ^ sum_9;
>   _4 = x_8(D) ^ 9;
>   _13 = ~_3;
>   _14 = _4 | _13;
>   if (_14 >= 0)
> goto ; [59.00%]
>   else
> goto ; [41.00%]
>
>[local count: 259738146]:
>   _5 = x_8(D) < 0;
>   _11 = (signed char) _5;
>   _6 = -_11;
>   _10 = _6 ^ 127;
>
>[local count: 1073741824]:
>   # _7 = PHI 
>   return _7;
>
> }
>
> After this patch:
> __attribute__((noinline))
> int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
> {
>   int8_t _7;
>
>[local count: 1073741824]:
>   _7 = .SAT_ADD (x_8(D), 9); [tail call]
>   return _7;
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
> Signed-off-by: Ciyan Pan 
> gcc/ChangeLog:
>
> * match.pd:

OK with sth filled in here.

Richard.

>
> ---
>  gcc/match.pd | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f4416d9172c..10c2b97f494 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3500,7 +3500,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   wide_int c2 = wi::to_wide (@2);
>   wide_int sum = wi::add (c1, c2);
>  }
> -(if (wi::eq_p (sum, wi::max_value (precision, SIGNED)))
> +(if (wi::eq_p (sum, wi::max_value (precision, SIGNED))
> +
> +(match (signed_integer_sat_add @0 @1)
> +  /* T SUM = (T)((UT)X + (UT)IMM)
> + SAT_S_ADD = (X ^ SUM) < 0 && (X ^ IMM) >= 0 ? (-(T)(X < 0) ^ MAX) : SUM 
>  */
> +   (cond^ (ge (bit_ior:c (bit_xor:c @0 INTEGER_CST@1)
> +   (bit_not (bit_xor:c @0 (nop_convert@2 (plus 
> (nop_convert @0)
> +   INTEGER_CST@3)
> +   integer_zerop)
> +   (signed_integer_sat_val @0)
> +   @2)
> +   (if (wi::eq_p (wi::to_wide (@1), wi::to_wide (@3))
>
>  /* Saturation sub for signed integer.  */
>  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))
> --
> 2.43.0
>

[PATCH 01/17] Mark rtl_avoid_store_forwarding functions final override

2025-06-25 Thread Martin Jambor

Hi,

It is customary to mark the gate and execute functions of the classes
representing passes as final override but this is missing in
pass_rtl_avoid_store_forwarding.  This patch adds it which also
silences a clang warning about it.

Bootstrapped and tested on x86_64-linux.  Because of the precedent
elsewhere I consider this obvious and will commit it shortly.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* avoid-store-forwarding.cc (class
pass_rtl_avoid_store_forwarding): Mark member function gate as
final override.
---
 gcc/avoid-store-forwarding.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
index 6825d0426ec..37e095316c9 100644
--- a/gcc/avoid-store-forwarding.cc
+++ b/gcc/avoid-store-forwarding.cc
@@ -80,12 +80,12 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *)
+  virtual bool gate (function *) final override
 {
   return flag_avoid_store_forwarding && optimize >= 1;
 }
 
-  virtual unsigned int execute (function *) override;
+  virtual unsigned int execute (function *) final override;
 }; // class pass_rtl_avoid_store_forwarding
 
 /* Handler for finding and avoiding store forwardings.  */
-- 
2.49.0

[PATCH 03/17] Diagnostics: Mark path_label::get_effects as final override

2025-06-25 Thread Martin Jambor

Hi,

When compiling diagnostic-path-output.cc with clang, it warns that
path_label::get_effects should be marked as override.  That looks like
a good idea and from a brief look I also believe it should be marked
as final (the other override in the class is marked as both), so this
patch does that.

Likewise for html_output_format::after_diagnostic in
diagnostic-format-html.cc which also already has quite a few member
functions marked as final override.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning(s) instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* diagnostic-path-output.cc (path_label::get_effects): Mark as
final override.
* diagnostic-format-html.cc
(html_output_format::after_diagnostic): Likewise.
---
 gcc/diagnostic-format-html.cc | 2 +-
 gcc/diagnostic-path-output.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/diagnostic-format-html.cc b/gcc/diagnostic-format-html.cc
index 45d088150dd..b2c7214d7f1 100644
--- a/gcc/diagnostic-format-html.cc
+++ b/gcc/diagnostic-format-html.cc
@@ -1201,7 +1201,7 @@ public:
   {
 m_builder.emit_diagram (diagram);
   }
-  void after_diagnostic (const diagnostic_info &)
+  void after_diagnostic (const diagnostic_info &) final override
   {
 /* No-op, but perhaps could show paths here.  */
   }
diff --git a/gcc/diagnostic-path-output.cc b/gcc/diagnostic-path-output.cc
index bae24bf01a7..4bec3a66267 100644
--- a/gcc/diagnostic-path-output.cc
+++ b/gcc/diagnostic-path-output.cc
@@ -135,7 +135,7 @@ class path_label : public range_label
 return result;
   }
 
-  const label_effects *get_effects (unsigned /*range_idx*/) const
+  const label_effects *get_effects (unsigned /*range_idx*/) const final 
override
   {
 return &m_effects;
   }
-- 
2.49.0

[PATCH 05/17] tree-ssa-propagate.h: Mark two functions as override

2025-06-25 Thread Martin Jambor




When tree-ssa-propagate.h is compiled with clang, it complains that
member functions functions value_of_expr and range_of_expr of class
substitute_and_fold_engine are not marked as override even though they
do override virtual functions of the ancestor class.  This patch
merely adds the keyword to silence the warning and for consistency's
sake.

I did not make this part of the previous patch because I wanted to
point out that the first case is quite unusual, a virtual function
with a functional body (range_query::value_of_expr) is being
overridden with a pure virtual function.  I assume it was a conscious
decision but adding the override keyword seems even more important
then.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning(s) instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* tree-ssa-propagate.h (class substitute_and_fold_engine): Mark
member functions value_of_expr and range_of_expr as override.
---
 gcc/tree-ssa-propagate.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h
index 8429e38f40e..200fc732079 100644
--- a/gcc/tree-ssa-propagate.h
+++ b/gcc/tree-ssa-propagate.h
@@ -102,10 +102,10 @@ class substitute_and_fold_engine : public range_query
   substitute_and_fold_engine (bool fold_all_stmts = false)
 : fold_all_stmts (fold_all_stmts) { }
 
-  virtual tree value_of_expr (tree expr, gimple * = NULL) = 0;
+  virtual tree value_of_expr (tree expr, gimple * = NULL) override = 0;
   virtual tree value_on_edge (edge, tree expr) override;
   virtual tree value_of_stmt (gimple *, tree name = NULL) override;
-  virtual bool range_of_expr (vrange &r, tree expr, gimple * = NULL);
+  virtual bool range_of_expr (vrange &r, tree expr, gimple * = NULL) override;
 
   virtual ~substitute_and_fold_engine (void) { }
   virtual bool fold_stmt (gimple_stmt_iterator *) { return false; }
-- 
2.49.0

[PATCH v3] Evaluate the object size by the size of the pointee type when the type is a structure with flexible array member which is annotated with counted_by.

2025-06-25 Thread Qing Zhao

Hi, 

This is the 3rd version of the patch for:

Evaluate the object size by the size of the pointee type when the type
is a structure with flexible array member which is annotated with
counted_by. 

Compared to the 2nd version of the patch at:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682923.html

The major changes include:
   A. Add a new --param objsz-allow-dereference-input=0|1 to control this 
feature;
   B. Some code reorg to the routine "insert_cond_and_size" to make it more 
readable.

The patch has been bootstrapped and regression tested on both x86 and aarch64.

Okay for trunk?

thanks.

Qing

===
In tree-object-size.cc, if the size is UNKNOWN after evaluating use-def
chain, We can evaluate the SIZE of the pointee TYPE ONLY when this TYPE
is a structure type with flexible array member which is attached a
counted_by attribute, since a structure with FAM can not be an element
of an array, so, the pointer must point to a single object with this
structure with FAM.

Control this behavior with a new --param objsz-allow-dereference-input=0|1
Default is 0.

This is only available for C now.

gcc/c/ChangeLog:

* c-lang.cc (LANG_HOOKS_BUILD_COUNTED_BY_REF):
Define to below function.
* c-tree.h (c_build_counted_by_ref): New extern function.
* c-typeck.cc (build_counted_by_ref): Rename to ...
(c_build_counted_by_ref): ...this.
(handle_counted_by_for_component_ref): Call the renamed function.

gcc/ChangeLog:

* doc/invoke.texi: Add documentation for the new option
--param objsz-allow-dereference-input.
* langhooks-def.h (LANG_HOOKS_BUILD_COUNTED_BY_REF):
New language hook.
* langhooks.h (struct lang_hooks_for_types): Add
build_counted_by_ref.
* params.opt: New param objsz-allow-dereference-input.
* tree-object-size.cc (struct object_size_info): Add a new field
insert_cf.
(insert_cond_and_size): New function.
(gimplify_size_expressions): Handle new field insert_cf.
(compute_builtin_object_size): Init the new field to false;
(is_pointee_fam_struct_with_counted_by): New function.
(record_with_fam_object_size): New function.
(collect_object_sizes_for): Call record_with_fam_object_size.
(dynamic_object_sizes_execute_one): Special handling for insert_cf.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-3.c: Update test for whole object size;
* gcc.dg/flex-array-counted-by-4.c: Likewise.
* gcc.dg/flex-array-counted-by-5.c: Likewise.
* gcc.dg/flex-array-counted-by-10.c: New test.
---
 gcc/c/c-lang.cc   |   3 +
 gcc/c/c-tree.h|   1 +
 gcc/c/c-typeck.cc |   6 +-
 gcc/doc/invoke.texi   |  13 +
 gcc/langhooks-def.h   |   4 +-
 gcc/langhooks.h   |   5 +
 gcc/params.opt|   4 +
 .../gcc.dg/flex-array-counted-by-10.c |  41 +++
 .../gcc.dg/flex-array-counted-by-3.c  |   7 +-
 .../gcc.dg/flex-array-counted-by-4.c  |  36 ++-
 .../gcc.dg/flex-array-counted-by-5.c  |   6 +-
 gcc/tree-object-size.cc   | 305 +-
 12 files changed, 406 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-10.c

diff --git a/gcc/c/c-lang.cc b/gcc/c/c-lang.cc
index c69077b2a93..e9ec9e6e64a 100644
--- a/gcc/c/c-lang.cc
+++ b/gcc/c/c-lang.cc
@@ -51,6 +51,9 @@ enum c_language_kind c_language = clk_c;
 #undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
 #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE c_get_sarif_source_language
 
+#undef LANG_HOOKS_BUILD_COUNTED_BY_REF
+#define LANG_HOOKS_BUILD_COUNTED_BY_REF c_build_counted_by_ref
+
 /* Each front end provides its own lang hook initializer.  */
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
 
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 364f51df58c..627791551b4 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -777,6 +777,7 @@ extern struct c_switch *c_switch_stack;
 
 extern bool null_pointer_constant_p (const_tree);
 
+extern tree c_build_counted_by_ref (tree, tree, tree *);
 
 inline bool
 c_type_variably_modified_p (tree t)
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e24629be918..44031ca1ae3 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2940,8 +2940,8 @@ should_suggest_deref_p (tree datum_type)
 &(p->k)
 
 */
-static tree
-build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
+tree
+c_build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
 {
   tree type = TREE_TYPE (datum);
   if (!c_flexible_array_member_type_p (TREE_TYPE (subdatum)))
@@ -3039,7 +3039,7 @@ handle_counted_by_for_component_ref (location_t loc, tree 
ref)
   tree datum = TR

[PATCH 10/17] rust: Silence a clang warning in borrow-checker-diagnostics

2025-06-25 Thread Martin Jambor

Hi,

when compiling
gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
with clang, it emits the following warning:

  gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc:145:46: 
warning: non-constant-expression cannot be narrowed from type 'Polonius::Loan' 
(aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') in initializer list 
[-Wc++11-narrowing]

I'd hope that for indexing that is never really a problem,
nevertheless if narrowing is taking place, I guess it can be argued it
should be made explicit.

I have so far only tested this with the clang compile, I will try to
do a bootstrap with rust-enabled too.

Philip, Pierre, would you be willing to incorporate this into your
tree and commit it to master at gcc.gnu.org from there?  Or should I
commit it to master at gcc.gnu.org and you'll merge it from there?

Thanks,

Martin


gcc/rust/ChangeLog:

2025-06-23  Martin Jambor  

* checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
(BorrowCheckerDiagnostics::get_loan): Type cast loan to uint32_t.
---
 .../checks/errors/borrowck/rust-borrow-checker-diagnostics.cc   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc 
b/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
index 6c67706780b..adf1448791e 100644
--- a/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
+++ b/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
@@ -142,7 +142,7 @@ BorrowCheckerDiagnostics::get_statement (Polonius::Point 
point)
 const BIR::Loan &
 BorrowCheckerDiagnostics::get_loan (Polonius::Loan loan)
 {
-  return bir_function.place_db.get_loans ()[{loan}];
+  return bir_function.place_db.get_loans ()[{(uint32_t) loan}];
 }
 
 const HIR::LifetimeParam *
-- 
2.49.0

[PATCH 08/17] ranger-op: Use CFN_ constant instead of plain BUILTIN_ one

2025-06-25 Thread Martin Jambor

Hi,

when compiling gimple-range-op.cc, clang issues warning:

  gimple-range-op.cc:1419:18: warning: comparison of different enumeration 
types in switch statement ('combined_fn' and 'built_in_function') 
[-Wenum-compare-switch]

which I hope is harmless, but all other switch cases use CFN_ prefixed
constants, so I guess the ISINF case should too.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin



gcc/ChangeLog:

2025-06-23  Martin Jambor  

* gimple-range-op.cc
(gimple_range_op_handler::maybe_builtin_call): Use
CFN_BUILT_IN_ISINF instead of BUILT_IN_ISINF.
---
 gcc/gimple-range-op.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 90a61971489..c9bc5c0c6b9 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1416,7 +1416,7 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;
 
-CASE_FLT_FN (BUILT_IN_ISINF):
+CASE_FLT_FN (CFN_BUILT_IN_ISINF):
   m_op1 = gimple_call_arg (call, 0);
   m_operator = &op_cfn_isinf;
   break;
-- 
2.49.0

[PATCH 07/17] gfortran: Avoid freeing uninitialized value

2025-06-25 Thread Martin Jambor

Hi,

When compiling fortran/match.cc, clang emits a warning

  fortran/match.cc:5301:7: warning: variable 'p' is used uninitialized whenever 
'if' condition is true [-Wsometimes-uninitialized]

which looks accurate, so this patch adds an initialization of p to
avoid the use.

Bootstrapped and tested on x86_64-linx.  OK for master?

Thanks,

Martin


gcc/fortran/ChangeLog:

2025-06-23  Martin Jambor  

* match.cc (gfc_match_nullify): Initialize p to NULL;
---
 gcc/fortran/match.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index a99a757bede..2e5ba29d9a4 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -5293,7 +5293,7 @@ match
 gfc_match_nullify (void)
 {
   gfc_code *tail;
-  gfc_expr *e, *p;
+  gfc_expr *e, *p = NULL;
   match m;
 
   tail = NULL;
-- 
2.49.0

[PATCH 13/17] lto-ltrans-cache: Remove unused private member

2025-06-25 Thread Martin Jambor

Hi,

when building GCC with clang, it warns that the private member suffix
in class ltrans_file_cache (defined in lto-ltrans-cache.h) is not used
which indeed looks like it is the case.  This patch therefore removes
it along with its initialization in the constructor.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* lto-ltrans-cache.h (class ltrans_file_cache): Remove member prefix.
* lto-ltrans-cache.cc (ltrans_file_cache::ltrans_file_cache): Do
not initialize member prefix.
---
 gcc/lto-ltrans-cache.cc | 3 +--
 gcc/lto-ltrans-cache.h  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
index c57775fae85..91af6ed6f82 100644
--- a/gcc/lto-ltrans-cache.cc
+++ b/gcc/lto-ltrans-cache.cc
@@ -210,8 +210,7 @@ write_cache_item (FILE* f, ltrans_file_cache::item *item, 
const char* dir)
 ltrans_file_cache::ltrans_file_cache (const char* dir, const char* prefix,
  const char* suffix,
  size_t soft_cache_size):
-  dir (dir), prefix (prefix), suffix (suffix),
-  soft_cache_size (soft_cache_size)
+  dir (dir), suffix (suffix),  soft_cache_size (soft_cache_size)
 {
   if (!dir) return;
 
diff --git a/gcc/lto-ltrans-cache.h b/gcc/lto-ltrans-cache.h
index 5fef44bae53..fdb7a389435 100644
--- a/gcc/lto-ltrans-cache.h
+++ b/gcc/lto-ltrans-cache.h
@@ -122,8 +122,7 @@ private:
   std::map map_checksum;
   std::map map_input;
 
-  /* Cached filenames are in format "prefix%d[.ltrans]suffix".  */
-  const char* prefix;
+  /* Cached filenames are in format "cache_prefix%d[.ltrans]suffix".  */
   const char* suffix;
 
   /* If cache items count is larger, prune deletes old items.  */
-- 
2.49.0

[PATCH 17/17] Ignore more clang warnings in contrib/filter-clang-warnings.py

2025-06-25 Thread Martin Jambor

Hi,

in contrib we have a script filter-clang-warnings.py which supposedly
filters out uninteresting warnings emitted by clang when it compiles
GCC.  I'm not sure if anyone else uses it but our internal SUSE
testing infrastructure does.

Since Martin Liška left, I have mostly ignored the warnings and so
they have multiplied.  In an effort to improve the situation, I have
tried to fix those warnings which I think are worth it and would like
to adjust the filtering script so that we get to zero "interesting"
warnings again.

The changes are the following:

1. Ignore -Woverloaded-shift-op-parentheses warnings.  IIUC, those
   make some sense when << and >> are used for I/O but since that is
   not the case in GCC they are not really interesting.

2. Ignore -Wunused-function and -Wunneeded-internal-declaration.  I
   think it is OK to occasionally prepare APIs before they are used
   (and with our LTO we should be able to get rid of them).

3. Ignore -Wvla-cxx-extension and -Wunused-command-line-argument which
   just don't seem to be useful.

4. Ignore -Wunused-private-field warning in diagnostic-path-output.cc
   which can only be correct if quite a few functions are removed and
   looks like it is just not an oversight:

 gcc/diagnostic-path-output.cc:271:35: warning: private field 
'm_logical_loc_mgr' is not used [-Wunused-private-field]

5. Ignore a case in -Wunused-but-set-variable about named_args which
   is used in a piece of code behind an ifdef.

6. Adjust the gimple-match and generic-match filters to the fact that
   we now have multiple such files.

7. Ignore warnings about using memcpy to copy around wide_ints, like
   the one below.  I seem to remember wide-int has undergone fairly
   rigorous review and TBH I just hope I know what we are doing.

 gcc/wide-int.h:1198:11: warning: first argument in call to 'memcpy' is a 
pointer to non-trivially copyable type 'wide_int_storage' [-Wnontrivial-memcall]

8. I have decided to ignore warnings in m2/gm2-compiler-boot about
   unused stuff (all reported unused stuff are variables).  These
   sources are in the build directory so I assume they are somehow
   generated and so warnings about unused things are a bit expected
   and probably not too bad.

9. On the Zulip chat, I have informed Rust folks they have a bunch of
   -Wunused-private-field cases in the FE.  Until they sort it out I'm
   ignoring these.  I might add the missing explicit type-cast case
   here too if it takes time for the patch I'm posting in this series
   to reach master.

10. I ignore warning about use of offsetof in libiberty/sha1.c which is
apparently only a "C23 extension:"

  libiberty/sha1.c:239:11: warning: defining a type within 'offsetof' is a 
C23 extension [-Wc23-extensions]
  libiberty/sha1.c:460:11: warning: defining a type within 'offsetof' is a 
C23 extension [-Wc23-extensions]

11. I have enlarged the list of .texi files where warnings somehow got
reported.  Not sure why that happens.

12. I'm ignoring the -Wunused-const-variable case in value-relation.cc
until Andrew commits the patch he has to remove it.

With these changes and my other patches, we reach zero interesting
warnings.

Since I don't think anyone else uses the script, I'm would like to
declare these changes "obvious" in the sense that they are obviously
useful for me and obviously nobody else will mind or even be affected.
I'm going to hold off for a week though, please let me know if I'm
stretching the obvious rule too much here.

Thanks,

Martin



contrib/ChangeLog:

2025-06-25  Martin Jambor  

* filter-clang-warnings.py (skip_warning): Also ignore
-Woverloaded-shift-op-parentheses, -Wunused-function,
-Wunneeded-internal-declaration, -Wvla-cxx-extension', and
-Wunused-command-line-argument everywhere and a warning about
m_logical_loc_mgr in diagnostic-path-output.cc.  Adjust gimple-match
and generic-match "filenames."  Ignore -Wunused-const-variable
warnings in value-relation.cc, -Wnontrivial-memcall warnings in
wide-int.h, all warnings about unused stuff in files under
m2/gm2-compiler-boot, all -Wunused-private-field in rust FE, all
Warnings in avr-mmcu.texi, install.texi and libgccjit.texi and all
-Wc23-extensions warnings in libiberty/sha1.c.
---
 contrib/filter-clang-warnings.py | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/contrib/filter-clang-warnings.py b/contrib/filter-clang-warnings.py
index 2ea7c710163..f0f7885d26d 100755
--- a/contrib/filter-clang-warnings.py
+++ b/contrib/filter-clang-warnings.py
@@ -41,12 +41,22 @@ def skip_warning(filename, message):
  '-Wignored-attributes', '-Wgnu-zero-variadic-macro-arguments',
  '-Wformat-security', '-Wundefined-internal',
  '-Wunknown-warning-option', '-Wc++20-extensions',
- '-Wbitwise-instead-of-logical', 'egrep is obsole

Re: [PATCH 3/4] c++/modules: Support streaming new size cookie for constexpr [PR120040]

2025-06-25 Thread Jason Merrill


On 5/21/25 10:15 PM, Nathaniel Shead wrote:

This type currently has a DECL_NAME of an IDENTIFIER_DECL.  Although the
documentation indicates this is legal, this confuses modules streaming
which expects all RECORD_TYPEs to have a TYPE_DECL, which is used to
determine the context and merge key, etc.

PR c++/120040

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Handle TYPE_NAME
now being a TYPE_DECL rather than just an IDENTIFIER_NODE.
* init.cc (build_new_constexpr_heap_type): Build a TYPE_DECL for
the returned type; mark the type as artificial.
* module.cc (trees_out::type_node): Add some assertions.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr120040_a.C: New test.
* g++.dg/modules/pr120040_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/constexpr.cc   |  2 +-
  gcc/cp/init.cc| 10 +-
  gcc/cp/module.cc  |  3 +++
  gcc/testsuite/g++.dg/modules/pr120040_a.C | 19 +++
  gcc/testsuite/g++.dg/modules/pr120040_b.C | 15 +++
  5 files changed, 47 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr120040_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr120040_b.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fa754b9a176..ceb8f04fab4 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8613,7 +8613,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
tree cookie_size = NULL_TREE;
tree arg_size = NULL_TREE;
if (TREE_CODE (elt_type) == RECORD_TYPE
-   && TYPE_NAME (elt_type) == heap_identifier)
+   && DECL_NAME (TYPE_NAME (elt_type)) == heap_identifier)


This could be TYPE_IDENTIFIER.  OK either way.


  {
tree fld1 = TYPE_FIELDS (elt_type);
tree fld2 = DECL_CHAIN (fld1);
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 80a37a14a80..0a389fb6ecd 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -3010,7 +3010,6 @@ build_new_constexpr_heap_type (tree elt_type, tree 
cookie_size, tree itype2)
tree atype1 = build_cplus_array_type (sizetype, itype1);
tree atype2 = build_cplus_array_type (elt_type, itype2);
tree rtype = cxx_make_type (RECORD_TYPE);
-  TYPE_NAME (rtype) = heap_identifier;
tree fld1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, atype1);
tree fld2 = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, atype2);
DECL_FIELD_CONTEXT (fld1) = rtype;
@@ -3019,7 +3018,16 @@ build_new_constexpr_heap_type (tree elt_type, tree 
cookie_size, tree itype2)
DECL_ARTIFICIAL (fld2) = true;
TYPE_FIELDS (rtype) = fld1;
DECL_CHAIN (fld1) = fld2;
+  TYPE_ARTIFICIAL (rtype) = true;
layout_type (rtype);
+
+  tree decl = build_decl (UNKNOWN_LOCATION, TYPE_DECL, heap_identifier, rtype);
+  TYPE_NAME (rtype) = decl;
+  TYPE_STUB_DECL (rtype) = decl;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_ARTIFICIAL (decl) = true;
+  layout_decl (decl, 0);
+
return rtype;
  }
  
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc

index ddb5299b244..765d17935c5 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -9362,6 +9362,7 @@ trees_out::type_node (tree type)
  
tree root = (TYPE_NAME (type)

   ? TREE_TYPE (TYPE_NAME (type)) : TYPE_MAIN_VARIANT (type));
+  gcc_checking_assert (root);
  
if (type != root)

  {
@@ -9440,6 +9441,8 @@ trees_out::type_node (tree type)
|| TREE_CODE (type) == UNION_TYPE
|| TREE_CODE (type) == ENUMERAL_TYPE)
{
+   gcc_checking_assert (DECL_P (name));
+
/* We can meet template parms that we didn't meet in the
   tpl_parms walk, because we're referring to a derived type
   that was previously constructed from equivalent template
diff --git a/gcc/testsuite/g++.dg/modules/pr120040_a.C 
b/gcc/testsuite/g++.dg/modules/pr120040_a.C
new file mode 100644
index 000..77e16892f4e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr120040_a.C
@@ -0,0 +1,19 @@
+// PR c++/120040
+// { dg-additional-options "-fmodules -std=c++20" }
+// { dg-module-cmi M }
+
+export module M;
+
+struct S {
+  constexpr ~S() {}
+};
+
+export constexpr bool foo() {
+  S* a = new S[3];
+  delete[] a;
+  return true;
+}
+
+export constexpr S* bar() {
+  return new S[3];
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr120040_b.C 
b/gcc/testsuite/g++.dg/modules/pr120040_b.C
new file mode 100644
index 000..e4610b07eaf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr120040_b.C
@@ -0,0 +1,15 @@
+// PR c++/120040
+// { dg-additional-options "-fmodules -std=c++20" }
+
+import M;
+
+constexpr bool qux() {
+  auto* s = bar();
+  delete[] s;
+  return true;
+}
+
+int main() {
+  static_assert(foo());
+  static_assert(qux());
+}

[commmited v2] libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]

2025-06-25 Thread Tomasz Kamiński

For month_day we incorrectly reported day information to be available, which 
lead
to format_error being thrown from the call to formatter::format at runtime, 
instead
of making call to format ill-formed.

The included test cover most of the combinations of _ChronoParts and format
specifiers.

PR libstdc++/120650

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h
(formatter::parse): Call _M_parse with
only Month being available.
* testsuite/std/time/format/data_not_present_neg.cc: New test.
---
v2 adds "{ target cxx11_abi }" to dg-errors for types supported only in 
cxx11_abi.
Test on x86_64-linux, and std/time/format* tested with 
-D_GLIBCXX_USE_CXX11_ABI=0.
Pushed to trunk.

 libstdc++-v3/include/bits/chrono_io.h |   3 +-
 .../std/time/format/data_not_present_neg.cc   | 164 ++
 2 files changed, 165 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index abbf4efcc3b..4eb00f4932d 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2199,8 +2199,7 @@ namespace __format
   constexpr typename basic_format_parse_context<_CharT>::iterator
   parse(basic_format_parse_context<_CharT>& __pc)
   {
-   return _M_f._M_parse(__pc, __format::_Month|__format::_Day,
-__defSpec);
+   return _M_f._M_parse(__pc, __format::_Month, __defSpec);
   }
 
   template
diff --git a/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc 
b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc
new file mode 100644
index 000..bb09451dc29
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc
@@ -0,0 +1,164 @@
+// { dg-do compile { target c++20 } }
+
+#include 
+#include 
+
+using namespace std::chrono;
+
+auto d1 = std::format("{:%w}", 10d); // { dg-error "call to consteval 
function" }
+auto d2 = std::format("{:%m}", 10d); // { dg-error "call to consteval 
function" }
+auto d3 = std::format("{:%y}", 10d); // { dg-error "call to consteval 
function" }
+auto d4 = std::format("{:%F}", 10d); // { dg-error "call to consteval 
function" }
+auto d5 = std::format("{:%T}", 10d); // { dg-error "call to consteval 
function" }
+auto d6 = std::format("{:%Q}", 10d); // { dg-error "call to consteval 
function" }
+auto d7 = std::format("{:%Z}", 10d); // { dg-error "call to consteval 
function" }
+
+auto w1 = std::format("{:%d}", Thursday); // { dg-error "call to consteval 
function" }
+auto w2 = std::format("{:%m}", Thursday); // { dg-error "call to consteval 
function" }
+auto w3 = std::format("{:%y}", Thursday); // { dg-error "call to consteval 
function" }
+auto w4 = std::format("{:%F}", Thursday); // { dg-error "call to consteval 
function" }
+auto w5 = std::format("{:%T}", Thursday); // { dg-error "call to consteval 
function" }
+auto w6 = std::format("{:%Q}", Thursday); // { dg-error "call to consteval 
function" }
+auto w7 = std::format("{:%Z}", Thursday); // { dg-error "call to consteval 
function" }
+
+auto wi1 = std::format("{:%d}", Thursday[2]); // { dg-error "call to consteval 
function" }
+auto wi2 = std::format("{:%m}", Thursday[2]); // { dg-error "call to consteval 
function" }
+auto wi3 = std::format("{:%y}", Thursday[2]); // { dg-error "call to consteval 
function" }
+auto wi4 = std::format("{:%F}", Thursday[2]); // { dg-error "call to consteval 
function" }
+auto wi5 = std::format("{:%T}", Thursday[2]); // { dg-error "call to consteval 
function" }
+auto wi6 = std::format("{:%Q}", Thursday[2]); // { dg-error "call to consteval 
function" }
+auto wi7 = std::format("{:%Z}", Thursday[2]); // { dg-error "call to consteval 
function" }
+
+auto wl1 = std::format("{:%d}", Thursday[last]); // { dg-error "call to 
consteval function" }
+auto wl2 = std::format("{:%m}", Thursday[last]); // { dg-error "call to 
consteval function" }
+auto wl3 = std::format("{:%y}", Thursday[last]); // { dg-error "call to 
consteval function" }
+auto wl4 = std::format("{:%F}", Thursday[last]); // { dg-error "call to 
consteval function" }
+auto wl5 = std::format("{:%T}", Thursday[last]); // { dg-error "call to 
consteval function" }
+auto wl6 = std::format("{:%Q}", Thursday[last]); // { dg-error "call to 
consteval function" }
+auto wl7 = std::format("{:%Z}", Thursday[last]); // { dg-error "call to 
consteval function" }
+
+auto m1 = std::format("{:%d}", January); // { dg-error "call to consteval 
function" }
+auto m2 = std::format("{:%w}", January); // { dg-error "call to consteval 
function" }
+auto m3 = std::format("{:%y}", January); // { dg-error "call to consteval 
function" }
+auto m4 = std::format("{:%F}", January); // { dg-error "call to consteval 
function" }
+auto m5 = std::format("{:%T}", January); // { dg-error "call to consteval 
function" }
+auto m6 = std::format("{:%Q}", Janua

Re: [PATCH v6 8/9] AArch64: rules for CMPBR instructions

2025-06-25 Thread Richard Sandiford

Richard Sandiford  writes:
> Karl Meakin  writes:
>> +   "r"))
>> +   (label_ref (match_operand 2))
>> +   (pc)))]
>> +  "TARGET_CMPBR"
>> +  "cb\\t%0, %1, %l2";

Sorry, for following up on myself, but: the pattern needs to handle far
branches, in the same way as existing patterns do.  That is:

if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
  return aarch64_gen_far_branch (...);
else
  return "cb\\t%0, %1, %l2";

It would be good to have a test for this, e.g. by having an if-then-else
in which the then and else blocks contain a series of 256(+) volatile
stores, with the then and else storing to different volatile locations.

Richard


>> +  [(set_attr "type" "branch")
>> +   (set (attr "length")
>> +(if_then_else (and (ge (minus (match_dup 2) (pc))
>> +   (const_int BRANCH_LEN_N_1Kib))
>> +   (lt (minus (match_dup 2) (pc))
>> +   (const_int BRANCH_LEN_P_1Kib)))
>> +  (const_int 4)
>> +  (const_int 8)))
>> +   (set (attr "far_branch")
>> +(if_then_else (and (ge (minus (match_dup 2) (pc))
>> +   (const_int BRANCH_LEN_N_1Kib))
>> +   (lt (minus (match_dup 2) (pc))
>> +   (const_int BRANCH_LEN_P_1Kib)))
>> +  (const_string "no")
>> +  (const_string "yes")))]
>> +)

[PATCH] libstdc++: Type-erase chrono-data for formatting [PR110739]

2025-06-25 Thread Tomasz Kamiński

This patch reworks the formatting for the chrono types, such that they are all
formatted in terms of _ChronoData class, that includes all required fields.
Populating each required field is performed in formatter for specific type,
based on the chrono-spec used.

To facilitate above, the _ChronoSpec now includes additional _M_needed field,
that represnts the chrono data that is referenced by format spec (this value
is also configured for __defSpec). This value differs from the value of
__parts passed to _M_parse, which does include all fields that can be computed
from input (e.g. weekday_indexed can be computed for year_month_day). Later
it is used to fill _ChronoData, in particular _M_fill_* family of functions,
to determine if given field needs to be set, and thus it's value needs to be
computed.

In consequence _ChronoParts enum was exteneded with additional values,
that allows more fine grained indentification:
 * _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
 * _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
 * _LocalDays, _WeekdayIndex are defiend in included in _Date,
 * _Duration is removed, and instead _EpochUnits and _UnitSuffix are
   introduced.
Furthermore, to avoid name conflicts _ChonoParts is now defined as enum class,
with additional operators that simplify uses.

In addition to fields that can be printed using chron-spec, _ChronoData stores:
 * Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by
   struct tm construction, and for ISO calendar computation.
 * Total seconds in wall time (_M_lseconds) - this value may be different from
   sum of days, hours, minutes, seconds (e.g. see utc_time below). Included
   to allow future extension, like printing total minutes.
 * Total seconds since epoch - due offset different from above. Again to be
   used with future extension (e.g. %s as proposed in P2945R1).
 * Subseconds - count of attoseconds (10^(-18)), in addition to priting can
   be used to  compute fractional hours, minutes.
The both total seconds fielkds we use single _TotalSeconds enumerator in
_ChronoParts, that when present in combination with _EpochUnits or _LocalDays
indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) are
provided/required.

To handle type formatting of time since epoch ('%Q'|_EpochUnits), we use the
format_args mechanism, where the result of +d.count() (see LWG4118) is erased
into make_format_args to local __arg_store, that is later referenced by
_M_ereps (_M_ereps.get(0)).

To handle precision values, and in prepartion to allow user to configure ones,
we store the precision as third element of _M_ereps (_M_ereps.get(2)), this
allows duration with precision to be printed using "{0:{2}}". For subseconds
the precision is handled differently depending on the representation:
 * for integral reps, _M_subseconds value is used to determine fractional value,
   precision is trimmed to 18 digits;
 * for floating-points, we _M_ereps stores duration initialized with only
   fractional seconds, that is later formatted with precision.
Always using _M_subseconds fields for integral duration, means that we do not
use formattter for user-defined durations that are considered to be integral
(see empty_spec.cc file change). To avoid potentially expensive computation
of _M_subseconds, we make sure that _ChronoParts::_Subseconds is set only if
_Subseconds are needed. In particular we remove this flag for localized ouput
in _M_parse.

Construction the _M_ereps as described above is handled by __formatter_duration,
that is then used to format duration, hh_mm_ss and time_points specialization.
This class also handles _UnitSuffix, the _M_units_suffix field is populated
either with predefined suffix (chrono::__detail::__units_suffix) or one produced
locally.

Finally, formatters for types listed below contains type specific logic:
 * hh_mm_ss - we do not compute total duration and seconds, unless explicitly
   requested, as such computation may overflow;
 * utc_time - for time during leap second insertion, the _M_seconds field is
   increased to 60;
 * __local_time_fmt - exception is thrown if zone offset (_ZoneOffset) or
   abbrevation (_ZoneAbbrev) is requsted, but corresponding pointer is null,
   futhermore conversion from `char` to `wchar_t` for abbreviation is performed
   if needed.

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__format::__no_timezone_available):
Removed, replaced with separate throws in formatter for
__local_time_fmt
(__format::_ChronoParts): Defined additional enumertors and
declared as enum class.
(__format::operator&(_ChronoParts, _ChronoParts))
(__format::operator&=(_ChronoParts&, _ChronoParts))
(__format::operator-(_ChronoParts, _ChronoParts))
(__format::operator-=(_ChronoParts&, _ChronoParts))
(__format::operator==(_ChronoParts, decltype(nullptr)))
(_ChronoSpec::

Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted

2025-06-25 Thread Richard Sandiford

Christoph Müllner  writes:
> On Tue, Jun 24, 2025 at 9:29 PM Richard Sandiford
>  wrote:
>>
>> Christoph Müllner  writes:
>> > insn_info::has_been_deleted () is documented to return true if an
>> > instruction is deleted.  Such instructions have their `volatile` bit set,
>> > which can be tested via rtx_insn::deleted ().
>> >
>> > The current condition for insn_info::has_been_deleted () is:
>> > * m_rtl is not NULL: this can't happen as no member of insn_info
>> >   changes this pointer.
>>
>> Yeah, it's invariant after creation, but it starts off null for some
>> artificial instructions:
>>
>>   // Return the underlying RTL insn.  This instruction is null if is_phi ()
>>   // or is_bb_end () are true.  The instruction is a basic block note if
>>   // is_bb_head () is true.
>>   rtx_insn *rtl () const { return m_rtl; }
>>
>> So I think we should keep the null check.  (But then is_call and is_jump
>> should check m_rtl is nonnull too -- that's preapproved if you want to
>> do it while you're here.)
>
> I have a tested patch for this, but I don't think that it would be sufficient,
> as there are also other places to check for a NULL dereference:
> * member-fns.inl: insn_info::uid -> what to return here?

That one's ok, because m_rtl is nonnull whenever m_cost_or_uid >= 0.
(m_cost_or_uid >= 0 is the test for whether something is a "real"
instruction, in which case it always has an associated RTL insn.)

> * internals.inl: insn_info::set_properties
> * insns.cc: insn_info::calculate_cost

Those two are ok because they're internal routines that are only
reached when we already know that we're dealing with real instructions.

> Ok, if I add NULL-checks there as well?

I think just is_call and is_jump for now, since they're publicly-facing
routines that don't assume any preconditions.  Others might crop up later
though...

>> > * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and
>> >   does not test the `volatile` bit.
>>
>> Because of the need to stage multiple simultaneous changes, rtl-ssa first
>> uses set_insn_deleted to convert an insn to a NOTE_INSN_DELETED note,
>> then uses remove_insn to remove the underlying instruction.  It doesn't
>> use delete_insn directly.  The call to remove_insn is fairly recent;
>> the original code just used set_insn_deleted, but not removing the notes
>> caused trouble for later passes.
>>
>> The test was therefore supposed to be checking whether set_insn_deleted
>> had been called.  It should also have checked the note kind though.
>
> Thanks for the explanation. I missed the fact that set_insn_delete () is used.
> Assuming that code using RTL-SSA will use the insn_change class, it makes
> sense now.

Ah, yeah, that's pretty much required, since otherwise things will get out
of sync.

> I'm converting the fold-mem-offsets pass to RTL-SSA (see PR117922).
> And I ran into this issue because I've already converted the analysis
> part to RTL-SSA,
> but the code changes are still performed directly on the rtx_insn objects
> (in do_commit_insn ()). I'll try to use RTL-SSA in do_commit_insn () as well,
> which should also allow RTL-SSA to see the changes.

Sounds good!  Thanks for doing this.

Richard

[Fortran, Patch, PR120711, v1] 1/(3) Fix out of bounds access in cleanup of array constructor

2025-06-25 Thread Andre Vehreschild

Hi all,

attached patch fixes an out of bounds access in the clean up code of a
concatenating array constructor. A fragment like

list = [ list, something() ]

lead to clean up using an offset (of the list array) that was manipulated in
the loop copying the existing array elements and at the end pointing to one
element past the list (after the concatenation). 

This fixes a 15-regression. Releases prior to 15 do not have the out
of bounds access in the (non existing) clean up code. The have a memory
leak instead.

Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

The subject says, that there will be 3 patches. Only this one fixes the bug.
The other fixes I found while hunting this issue and because they play in the
general same area, I don't want to loose them. I therefore publish them in this
context. 

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From 548bcaeff9b8c8d6bb670574883f7b02878e3221 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 25 Jun 2025 09:12:35 +0200
Subject: [PATCH 1/3] Fortran: Fix out of bounds access in structure
 constructor's clean up [PR120711]

A structure constructor's generated clean up code was using an offset
variable, which was manipulated before the clean up was run leading to
an out of bounds access.

	PR fortran/120711

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_trans_array_ctor_element): Store the value
	of the offset for reuse.

gcc/testsuite/ChangeLog:

	* gfortran.dg/asan/array_constructor_1.f90: New test.
---
 gcc/fortran/trans-array.cc| 10 
 .../gfortran.dg/asan/array_constructor_1.f90  | 23 +++
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 3d274439895..7be2d7b11a6 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -1991,14 +1991,17 @@ static void
 gfc_trans_array_ctor_element (stmtblock_t * pblock, tree desc,
 			  tree offset, gfc_se * se, gfc_expr * expr)
 {
-  tree tmp;
+  tree tmp, offset_eval;
 
   gfc_conv_expr (se, expr);
 
   /* Store the value.  */
   tmp = build_fold_indirect_ref_loc (input_location,
  gfc_conv_descriptor_data_get (desc));
-  tmp = gfc_build_array_ref (tmp, offset, NULL);
+  /* The offset may change, so get its value now and use that to free memory.
+   */
+  offset_eval = gfc_evaluate_now (offset, &se->pre);
+  tmp = gfc_build_array_ref (tmp, offset_eval, NULL);
 
   if (expr->expr_type == EXPR_FUNCTION && expr->ts.type == BT_DERIVED
   && expr->ts.u.derived->attr.alloc_comp)
@@ -3150,8 +3153,7 @@ finish:
  the reference.  */
   if ((expr->ts.type == BT_DERIVED || expr->ts.type == BT_CLASS)
&& finalblock.head != NULL_TREE)
-gfc_add_block_to_block (&loop->post, &finalblock);
-
+gfc_prepend_expr_to_block (&loop->post, finalblock.head);
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
new file mode 100644
index 000..45eafacd5a6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
@@ -0,0 +1,23 @@
+!{ dg-do run }
+
+! Contributed by Christopher Albert  
+
+program grow_type_array
+type :: container
+integer, allocatable :: arr(:)
+end type container
+
+type(container), allocatable :: list(:)
+
+list = [list, new_elem(5)]
+
+deallocate(list)
+
+contains
+
+type(container) function new_elem(s) result(out)
+integer :: s
+allocate(out%arr(s))
+end function new_elem
+  
+end program grow_type_array
-- 
2.49.0

[Fortran, Patch, v1] 3/(3) Prevent creating tree that is never used.

2025-06-25 Thread Andre Vehreschild

Hi,

while hunting for pr120711 I found a construct where a call-tree was created
and never used. The patch now just suppresses the tree creation and instead
uses directly the tree that is desired.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From 52a7898f0b460dfcd64117b399826592e8f0978b Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 25 Jun 2025 12:27:35 +0200
Subject: [PATCH 3/3] Fortran: Prevent creation of unused tree.

gcc/fortran/ChangeLog:

	* trans.cc (gfc_allocate_using_malloc): Prevent possible memory
	leak when allocation was already done.
---
 gcc/fortran/trans.cc | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index fdeb1e89a76..13fd5ad498d 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -822,6 +822,7 @@ gfc_allocate_using_malloc (stmtblock_t * block, tree pointer,
   tree tmp, error_cond;
   stmtblock_t on_error;
   tree status_type = status ? TREE_TYPE (status) : NULL_TREE;
+  bool cond_is_true = cond == boolean_true_node;
 
   /* If successful and stat= is given, set status to 0.  */
   if (status != NULL_TREE)
@@ -834,11 +835,13 @@ gfc_allocate_using_malloc (stmtblock_t * block, tree pointer,
   tmp = fold_build2_loc (input_location, MAX_EXPR, size_type_node,
 			 size, build_int_cst (size_type_node, 1));
 
-  tmp = build_call_expr_loc (input_location,
-			 builtin_decl_explicit (BUILT_IN_MALLOC), 1, tmp);
-  if (cond == boolean_true_node)
+  if (!cond_is_true)
+tmp = build_call_expr_loc (input_location,
+			   builtin_decl_explicit (BUILT_IN_MALLOC), 1, tmp);
+  else
 tmp = alt_alloc;
-  else if (cond)
+
+  if (!cond_is_true && cond)
 tmp = build3_loc (input_location, COND_EXPR, TREE_TYPE (tmp), cond,
 		  alt_alloc, tmp);
 
-- 
2.49.0

[Fortran, Patch, v1] 2/(3) Stop spending memory in coarray single mode executables.

2025-06-25 Thread Andre Vehreschild

Hi,

attached patch prevents generation of a token component in derived types, when
-fcoarray=single is used. Generating the token only wastes memory. It is never
even initialized nor accessed.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From a888d8952e8fa6f516fde22519fab33d60d3f0c4 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 25 Jun 2025 12:27:04 +0200
Subject: [PATCH 2/3] Fortran: Fix wasting memory in coarray single mode.

gcc/fortran/ChangeLog:

	* resolve.cc (resolve_fl_derived0): Do not create the token
	component when not in coarray lib mode.
	* trans-types.cc: Do not access the token when not in coarray
	lib mode.
---
 gcc/fortran/resolve.cc | 4 ++--
 gcc/fortran/trans-types.cc | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 7089e4f171d..58f7aee29c3 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -16841,8 +16841,8 @@ resolve_fl_derived0 (gfc_symbol *sym)
 return false;
 
   /* Now add the caf token field, where needed.  */
-  if (flag_coarray != GFC_FCOARRAY_NONE
-  && !sym->attr.is_class && !sym->attr.vtype)
+  if (flag_coarray == GFC_FCOARRAY_LIB && !sym->attr.is_class
+  && !sym->attr.vtype)
 {
   for (c = sym->components; c; c = c->next)
 	if (!c->attr.dimension && !c->attr.codimension
diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index e15b1bb89f0..1754d982153 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3187,7 +3187,7 @@ copy_derived_types:
 for (c = derived->components; c; c = c->next)
   {
 	/* Do not add a caf_token field for class container components.  */
-	if ((codimen || coarray_flag) && !c->attr.dimension
+	if (codimen && coarray_flag && !c->attr.dimension
 	&& !c->attr.codimension && (c->attr.allocatable || c->attr.pointer)
 	&& !derived->attr.is_class)
 	  {
-- 
2.49.0

Re: [PATCH v6 8/9] AArch64: rules for CMPBR instructions

2025-06-25 Thread Richard Sandiford

Karl Meakin  writes:
> Add rules for lowering `cbranch4` to CBB/CBH/CB when
> CMPBR extension is enabled.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
>   * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
>   * config/aarch64/aarch64.md (cbranch4): Rename to ...
>   (cbranch4): ...here, and emit CMPBR if possible.
>   (cbranch4): New expand rule.
>   (aarch64_cb): New insn rule.
>   (aarch64_cb): Likewise.
>   * config/aarch64/constraints.md (Uc0): New constraint.
>   (Uc1): Likewise.
>   (Uc2): Likewise.
>   * config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
>   (INT_CMP): New code iterator.
>   (cmpbr_imm_constraint): New code attr.
>   * config/aarch64/predicates.md (const_0_to_63_operand): New predicate.
>   (aarch64_cb_immediate): Likewise.
>   (aarch64_cb_operand): Likewise.
>   (aarch64_cb_short_operand): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/cmpbr.c:
> ---
>  gcc/config/aarch64/aarch64-protos.h  |   2 +
>  gcc/config/aarch64/aarch64.cc|  33 ++
>  gcc/config/aarch64/aarch64.md|  89 +++-
>  gcc/config/aarch64/constraints.md|  18 +
>  gcc/config/aarch64/iterators.md  |  19 +
>  gcc/config/aarch64/predicates.md |  15 +
>  gcc/testsuite/gcc.target/aarch64/cmpbr.c | 586 ---
>  7 files changed, 376 insertions(+), 386 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 31f2f5b8bd2..0f104d0641b 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, 
> vec,
>unsigned int, tree, unsigned int,
>tree *);
>  
> +bool aarch64_cb_rhs (rtx op, rtx rhs);
> +
>  namespace aarch64 {
>void report_non_ice (location_t, tree, unsigned int);
>void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT,
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 667e42ba401..3dc139e9a72 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern)
>gcc_unreachable ();
>  }
>  
> +/* Return true if rhs is an operand suitable for a CB (immediate)
> +   instruction. */

This should also mention what "op" is.  The convention is also to use caps
to refer to parameter names.

Maybe:

/* Return true if RHS is an immediate operand suitable for a CB (immediate)
   instruction.  OP determines the type of the comparison.  */

> +bool
> +aarch64_cb_rhs (rtx op, rtx rhs)
> +{
> +  if (!CONST_INT_P (rhs))
> +return REG_P (rhs);
> +
> +  HOST_WIDE_INT rhs_val = INTVAL (rhs);
> +
> +  switch (GET_CODE (op))
> +{
> +case EQ:
> +case NE:
> +case GT:
> +case GTU:
> +case LT:
> +case LTU:
> +  return IN_RANGE (rhs_val, 0, 63);
> +
> +case GE:  /* CBGE:   signed greater than or equal */
> +case GEU: /* CBHS: unsigned greater than or equal */
> +  return IN_RANGE (rhs_val, 1, 64);
> +
> +case LE:  /* CBLE:   signed less than or equal */
> +case LEU: /* CBLS: unsigned less than or equal */
> +  return IN_RANGE (rhs_val, -1, 62);
> +
> +default:
> +  return false;
> +}
> +}
> +
>  /* Return the location of a piece that is known to be passed or returned
> in registers.  FIRST_ZR is the first unused vector argument register
> and FIRST_PR is the first unused predicate argument register.  */
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 0a378ab377d..23bce55f620 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -717,6 +717,10 @@ (define_constants
>  ;; +/- 32KiB.  Used by TBZ, TBNZ.
>  (BRANCH_LEN_P_32KiB  32764)
>  (BRANCH_LEN_N_32KiB -32768)
> +
> +;; +/- 1KiB.  Used by CBB, CBH, CB.
> +(BRANCH_LEN_P_1Kib  1020)
> +(BRANCH_LEN_N_1Kib -1024)
>]
>  )
>  
> @@ -724,18 +728,35 @@ (define_constants
>  ;; Conditional jumps
>  ;; ---
>  
> -(define_expand "cbranch4"
> +(define_expand "cbranch4"
>[(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>   [(match_operand:GPI 1 "register_operand")
>(match_operand:GPI 2 "aarch64_plus_operand")])
>  (label_ref (match_operand 3))
>  (pc)))]
>""
> -  "
> -  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
> -  operands[2]);
> -  operands[2] = const0_rtx;
> -  "
> +  {
> +  if (TARGET_CMPBR && aarch64_cb_rhs(operands[0], operands[2]))
> +{
> +// Fal

[Fortran, Patch, PR120637, v1] Ensure expression in finalizer creation is freed only when unused.

2025-06-25 Thread Andre Vehreschild

Hi,

Antony Lewis reported this issue and also proposed a patch, that removes the
was_finalized tracking. While this may lead to the desired effect for the issue
at hand, I don't believe that the was_finalized tracking code has been there for
no reason. 

This patch fixes the issue that also Antony found, but by ensuring the
expression stays allocated when used instead of being freeed.

The test has been put into the asan directory of gfortran.dg and reliably
reports the issue without the fix. (With the fix, the asan is quite).

Regtests ok on x86_64-pc-linxu-gnu / F41. Ok for mainline?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From 2c7c6a6db78c448a158ee4f952cf2236665001ca Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 25 Jun 2025 14:46:16 +0200
Subject: [PATCH] Fortran: Ensure finalizers are created correctly [PR120637]

Finalize_component freeed an expression that it used to remember which
components in which context it had finalized already.  While it makes
sense to free the copy of the expression, if it is unused, it causes
issues, when comparing to a non existent expression. This is now
detected by returning true, when the expression has been used.

	PR fortran/120637

gcc/fortran/ChangeLog:

	* class.cc (finalize_component): Return true, when a finalizable
	component was detect and do not free it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/asan/finalizer_1.f90: New test.
---
 gcc/fortran/class.cc  | 24 ---
 .../gfortran.dg/asan/finalizer_1.f90  | 67 +++
 2 files changed, 81 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/asan/finalizer_1.f90

diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index df18601e45b..a1c6fafa75e 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -1034,7 +1034,7 @@ comp_is_finalizable (gfc_component *comp)
of calling the appropriate finalizers, coarray deregistering, and
deallocation of allocatable subcomponents.  */
 
-static void
+static bool
 finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
 		gfc_symbol *stat, gfc_symbol *fini_coarray, gfc_code **code,
 		gfc_namespace *sub_ns)
@@ -1044,14 +1044,14 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   gfc_was_finalized *f;
 
   if (!comp_is_finalizable (comp))
-return;
+return false;
 
   /* If this expression with this component has been finalized
  already in this namespace, there is nothing to do.  */
   for (f = sub_ns->was_finalized; f; f = f->next)
 {
   if (f->e == expr && f->c == comp)
-	return;
+	return false;
 }
 
   e = gfc_copy_expr (expr);
@@ -1208,8 +1208,6 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   final_wrap->ext.actual->next->next = gfc_get_actual_arglist ();
   final_wrap->ext.actual->next->next->expr = fini_coarray_expr;
 
-
-
   if (*code)
 	{
 	  (*code)->next = final_wrap;
@@ -1221,11 +1219,14 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   else
 {
   gfc_component *c;
+  bool ret = false;
 
   for (c = comp->ts.u.derived->components; c; c = c->next)
-	finalize_component (e, comp->ts.u.derived, c, stat, fini_coarray, code,
-			sub_ns);
-  gfc_free_expr (e);
+	ret |= finalize_component (e, comp->ts.u.derived, c, stat, fini_coarray,
+   code, sub_ns);
+  /* Only free the expression, if it has never been used.  */
+  if (!ret)
+	gfc_free_expr (e);
 }
 
   /* Record that this was finalized already in this namespace.  */
@@ -1234,6 +1235,7 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   sub_ns->was_finalized->e = expr;
   sub_ns->was_finalized->c = comp;
   sub_ns->was_finalized->next = f;
+  return true;
 }
 
 
@@ -2314,6 +2316,7 @@ finish_assumed_rank:
 {
   gfc_symbol *stat;
   gfc_code *block = NULL;
+  gfc_expr *ptr_expr;
 
   if (!ptr)
 	{
@@ -2359,14 +2362,15 @@ finish_assumed_rank:
 	 sub_ns);
   block = block->next;
 
+  ptr_expr = gfc_lval_expr_from_sym (ptr);
   for (comp = derived->components; comp; comp = comp->next)
 	{
 	  if (comp == derived->components && derived->attr.extension
 	  && ancestor_wrapper && ancestor_wrapper->expr_type != EXPR_NULL)
 	continue;
 
-	  finalize_component (gfc_lval_expr_from_sym (ptr), derived, comp,
-			  stat, fini_coarray, &block, sub_ns);
+	  finalize_component (ptr_expr, derived, comp, stat, fini_coarray,
+			  &block, sub_ns);
 	  if (!last_code->block->next)
 	last_code->block->next = block;
 	}
diff --git a/gcc/testsuite/gfortran.dg/asan/finalizer_1.f90 b/gcc/testsuite/gfortran.dg/asan/finalizer_1.f90
new file mode 100644
index 000..dfc20de7f3b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/asan/finalizer_1.f90
@@ -0,0 +1,67 @@
+!{ dg-do run }
+
+! PR fortran/120637
+
+! Contributed

Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted

2025-06-25 Thread Christoph Müllner

On Tue, Jun 24, 2025 at 9:29 PM Richard Sandiford
 wrote:
>
> Christoph Müllner  writes:
> > insn_info::has_been_deleted () is documented to return true if an
> > instruction is deleted.  Such instructions have their `volatile` bit set,
> > which can be tested via rtx_insn::deleted ().
> >
> > The current condition for insn_info::has_been_deleted () is:
> > * m_rtl is not NULL: this can't happen as no member of insn_info
> >   changes this pointer.
>
> Yeah, it's invariant after creation, but it starts off null for some
> artificial instructions:
>
>   // Return the underlying RTL insn.  This instruction is null if is_phi ()
>   // or is_bb_end () are true.  The instruction is a basic block note if
>   // is_bb_head () is true.
>   rtx_insn *rtl () const { return m_rtl; }
>
> So I think we should keep the null check.  (But then is_call and is_jump
> should check m_rtl is nonnull too -- that's preapproved if you want to
> do it while you're here.)

I have a tested patch for this, but I don't think that it would be sufficient,
as there are also other places to check for a NULL dereference:
* member-fns.inl: insn_info::uid -> what to return here?
* internals.inl: insn_info::set_properties
* insns.cc: insn_info::calculate_cost

Ok, if I add NULL-checks there as well?

> > * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and
> >   does not test the `volatile` bit.
>
> Because of the need to stage multiple simultaneous changes, rtl-ssa first
> uses set_insn_deleted to convert an insn to a NOTE_INSN_DELETED note,
> then uses remove_insn to remove the underlying instruction.  It doesn't
> use delete_insn directly.  The call to remove_insn is fairly recent;
> the original code just used set_insn_deleted, but not removing the notes
> caused trouble for later passes.
>
> The test was therefore supposed to be checking whether set_insn_deleted
> had been called.  It should also have checked the note kind though.

Thanks for the explanation. I missed the fact that set_insn_delete () is used.
Assuming that code using RTL-SSA will use the insn_change class, it makes
sense now.

I'm converting the fold-mem-offsets pass to RTL-SSA (see PR117922).
And I ran into this issue because I've already converted the analysis
part to RTL-SSA,
but the code changes are still performed directly on the rtx_insn objects
(in do_commit_insn ()). I'll try to use RTL-SSA in do_commit_insn () as well,
which should also allow RTL-SSA to see the changes.

Thanks,
Christoph

> However, I agree that testing the deleted flag would be better.
> For that to work, we'd need to set the deleted flag here:
>
>   if (rtx_insn *rtl = insn->rtl ())
> ::remove_insn (rtl); // Remove the underlying RTL insn.
>
> as well as calling remove_insn.  Alternatively (and better), we could
> try converting ::remove_insn to ::delete_insn.
>
> Thanks,
> Richard
>
>
> > This patch drops these conditions and calls m_rtl->deleted () instead.
> >
> > The impact of this change is minimal as insn_info::has_been_deleted
> > is only called in insn_info::print_full.
> >
> > Bootstrapped and regtested x86_64-linux.
> >
> > gcc/ChangeLog:
> >
> >   * rtl-ssa/insns.h: Fix implementation of has_been_deleted ().
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/rtl-ssa/insns.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/rtl-ssa/insns.h b/gcc/rtl-ssa/insns.h
> > index d89dfc5c3f66..bb3f52efa83a 100644
> > --- a/gcc/rtl-ssa/insns.h
> > +++ b/gcc/rtl-ssa/insns.h
> > @@ -186,7 +186,7 @@ public:
> >// Return true if the instruction was a real instruction but has now
> >// been deleted.  In this case the instruction is no longer part of
> >// the SSA information.
> > -  bool has_been_deleted () const { return m_rtl && !INSN_P (m_rtl); }
> > +  bool has_been_deleted () const { return m_rtl->deleted (); }
> >
> >// Return true if the instruction is a debug instruction (and thus
> >// also a real instruction).

[PATCH] vect: Misalign checks for gather/scatter.

2025-06-25 Thread Robin Dapp


Hi,

this patch adds simple misalignment checks for gather/scatter
operations.  Previously, we assumed that those perform element accesses
internally so alignment does not matter.  The riscv vector spec however
explicitly states that vector operations are allowed to fault on
element-misaligned accesses.  Reasonable uarchs won't, but...

For gather/scatter we have two paths in the vectorizer:

(1) Regular analysis based on datarefs.  Here we can also create
strided loads.
(2) Non-affine access where each gather index is relative to the
initial address.

The assumption this patch works off is that once the alignment for the
first scalar is correct, all others will fall in line, as the index is
always a multiple of the first element's size.

For (1) we have a dataref and can check it for alignment as in other
cases.  For (2) this patch checks the object alignment of BASE and
compares it against the natural alignment of the current vectype's unit.

The patch also adds a pointer argument to the gather/scatter IFNs that
contains the necessary alignment.  Most of the patch is thus mechanical
in that it merely adjusts indices.

I tested the riscv version with a custom qemu version that faults on
element-misaligned vector accesses.  With this patch applied, there is
just a single fault left, which is due to PR120782 and which will be
addressed separately.

Is the general approach reasonable or do we need to do something else
entirely?  Bootstrap and regtest on aarch64 went fine.

I couldn't bootstrap/regtest on x86 as my regular cfarm machines
(420-422) are currently down.  Issues are expected, though, as the patch
doesn't touch x86's old-style gathers/scatters at all yet.  I still
wanted to get this initial version out there to get feedback.

The two riscv-specific changes I can still split off, obviously.
Also, I couldn't help but do tiny refactoring in some spots :)  This
could also go if requested.

I noticed one early-break failure with the changes where we would give
up on a load_permutation of {0}.  It looks latent and probably
unintended but I didn't investigate for now and just allowed this
specific permutation.

Regards
Robin

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_support_vector_misalignment):
Always support known aligned types.
* internal-fn.cc (expand_scatter_store_optab_fn): Change
argument numbers.
(expand_gather_load_optab_fn): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_else_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_gather_scatter_fn_supported_p): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Ditto.
* tree-vect-data-refs.cc (vect_describe_gather_scatter_call):
Handle align_ptr.
(vect_check_gather_scatter): Compute and set align_ptr.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern):
Ditto.
* tree-vect-slp.cc (GATHER_SCATTER_OFFSET): Define.
(vect_get_and_check_slp_defs): Use define.
* tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
Set align_ptr.
(get_group_load_store_type): Do not special-case gather/scatter.
(get_load_store_type): Compute misalignment.
(vectorizable_store): Remove alignment assert for
scatter/gather.
(vectorizable_load): Ditto.
* tree-vectorizer.h (struct gather_scatter_info): Add align_ptr.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix riscv misalign supported check.
---
gcc/config/riscv/riscv.cc | 24 ++--
gcc/internal-fn.cc| 21 ---
gcc/optabs-query.cc   |  2 +-
gcc/testsuite/lib/target-supports.exp |  2 +-
gcc/tree-vect-data-refs.cc| 13 -
gcc/tree-vect-patterns.cc | 17 +++---
gcc/tree-vect-slp.cc  | 20 ---
gcc/tree-vect-stmts.cc| 83 ---
gcc/tree-vectorizer.h |  3 +
9 files changed, 130 insertions(+), 55 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8fdc5b21484..02637ee5a5b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12069,11 +12069,27 @@ riscv_estimated_poly_value (poly_int64 val,
   target.  */
bool
riscv_support_vector_misalignment (machine_mode mode,
-  const_tree type ATTRIBUTE_UNUSED,
+  const_tree type,
   int misalignment,
-  bool is_packed ATTRIBUTE_UNUSED)
-{
-  /* Depend on movmisalign pattern.  */
+  bool is_packed)
+{
+  /* IS_PACKED is true if the corresponding scalar element is not naturally
+ aligned.  In that case defer to the default hook which will check
+ if movmisalign is present.  Movmisalign, in turn, depends on
+ TARGET_VECTOR

Re: [PATCH] c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]

2025-06-25 Thread Nathaniel Shead

On Tue, Jun 24, 2025 at 11:14:51AM -0400, Jason Merrill wrote:
> On 6/24/25 10:16 AM, Nathaniel Shead wrote:
> > On Tue, Jun 24, 2025 at 01:03:53PM +0200, Jakub Jelinek wrote:
> > > Hi!
> > > 
> > > The following patch implements the P3618R0 paper by tweaking pedwarn
> > > condition, adjusting pedwarn wording, adjusting one testcase and adding 4
> > > new ones.  The paper was voted in as DR, so it isn't guarded on C++ 
> > > version.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > 
> > > 2025-06-24  Jakub Jelinek  
> > > 
> > >   PR c++/120773
> > >   * decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching
> > >   main to the global module.  Only pedwarn for current_lang_name
> > >   other than lang_name_cplusplus and adjust pedwarn wording.
> > > 
> > >   * g++.dg/parse/linkage5.C: Don't expect error on
> > >   extern "C++" int main ();.
> > >   * g++.dg/parse/linkage7.C: New test.
> > >   * g++.dg/parse/linkage8.C: New test.
> > >   * g++.dg/modules/main-2.C: New test.
> > >   * g++.dg/modules/main-3.C: New test.
> > > 
> > > --- gcc/cp/decl.cc.jj 2025-06-19 08:55:04.408676724 +0200
> > > +++ gcc/cp/decl.cc2025-06-23 17:47:13.942011687 +0200
> > > @@ -11326,9 +11326,9 @@ grokfndecl (tree ctype,
> > > "cannot declare %<::main%> to be %qs", "consteval");
> > > if (!publicp)
> > >   error_at (location, "cannot declare %<::main%> to be static");
> > > -  if (current_lang_depth () != 0)
> > > +  if (current_lang_name != lang_name_cplusplus)
> > >   pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> 
> > > with a"
> > > -  " linkage specification");
> > > +  " linkage specification other than %<\"C++\"%>");
> > > if (module_attach_p ())
> > >   error_at (location, "cannot attach %<::main%> to a named 
> > > module");
> > 
> > Maybe it would be nice to add a note/fixit that users can now work
> > around this error by marking main as 'extern "C++"'?  But overall LGTM.
> 
> I suppose we could say "other than %" to make that a little
> clearer.  OK with that tweak.
> 
> I wouldn't object to a fixup but it sounds more complicated than it's worth
> to have different fixups for the extern "C" { int main(); } and extern "C"
> int main(); cases.
> 
> Jason
> 

I think I wasn't totally clear sorry; here's a patch with what I meant.
Tested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Subject: [PATCH] c++: Add fix note for how to declare main in a module

This patch adds a note to help users unfamiliar with modules terminology
understand how to declare main in a named module since P3618.

There doesn't appear to be an easy robust location available for "the
start of this declaration" that I could find to attach a fixit to, but
the explanation should suffice.

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Add explanation of how to attach to
global module.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 95bccfbb585..4fe97ffbf8f 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -11330,7 +11330,12 @@ grokfndecl (tree ctype,
pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a"
 " linkage specification other than %");
   if (module_attach_p ())
-   error_at (location, "cannot attach %<::main%> to a named module");
+   {
+ auto_diagnostic_group adg;
+ error_at (location, "cannot attach %<::main%> to a named module");
+ inform (location, "use % to attach it to the "
+ "global module instead");
+   }
   inlinep = 0;
   publicp = 1;
 }
-- 
2.47.0

RE: [PATCH 2/2] RISC-V: Add testcases for signed scalar SAT_ADD IMM form 2

2025-06-25 Thread Li, Pan2

> Pan -- can you cover reviewing the testsuite bits since thisis an area 
> where you've done a ton of work over the last year or so.

Sure thing and thanks Jeff, I will take a look after return back from a 
vacation, ETA before the end of this week.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, June 25, 2025 5:30 AM
To: Ciyan Pan ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; 
juzhe.zh...@rivai.ai; Li, Pan2 ; rdapp@gmail.com
Subject: Re: [PATCH 2/2] RISC-V: Add testcases for signed scalar SAT_ADD IMM 
form 2



On 6/23/25 9:12 PM, Ciyan Pan wrote:
> From: panciyan 
> 
> This patch adds testcase for form2, as shown below:
> 
> T __attribute__((noinline))  \
> sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)\
> {\
>T sum = (T)((UT)x + (UT)IMM);   \
>return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \
>  (-(T)(x < 0) ^ MAX) : sum; \
> }
> 
> Passed the rv64gcv regression test.
> 
> Signed-off-by: Ciyan Pan 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat/sat_arith.h:
>   * gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-run-2-i16.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-run-2-i32.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-run-2-i64.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm-run-2-i8.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i16.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i32.c: New test.
>   * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i8.c: New test.
Pan -- can you cover reviewing the testsuite bits since thisis an area 
where you've done a ton of work over the last year or so.

THanks!

jeff

Re: [PATCH] ivopts: Change constant_multiple_of to expand aff nodes.

2025-06-25 Thread Richard Biener

On Tue, 24 Jun 2025, Alfie Richards wrote:

> Hi all,
> 
> This is a small change to ivopts to expand SSA variables enabling ivopts to
> correctly work out when an address IV step is set to be a multiple on index
> step in the loop header (ie, not constant, not calculated each loop.)
> 
> Seems like this might have compile speed costs that need to be considered, but
> I believe should be worth it.
> 
> This is also required for some upcoming work for vectorization of VLA loops 
> with
> iteration data dependencies.
> 
> Bootstrapped and reg tested on aarch64-linux-gnu and x86_64-unknown-linux-gnu.

OK.

Thanks,
Richard.

> Thanks,
> Alfie
> 
> -- >8 --
> 
> This changes the calls to tree_to_aff_combination in constant_multiple_of to
> tree_to_aff_combination_expand along with associated plumbing of ivopts_data
> and required cache.
> 
> This improves cases such as:
> 
> ```c
> void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) {
> for (unsigned long i = 0; i < end; i += step) {
> svst1(pg, p1, svld1_s32(pg, p2));
> p1 += step;
> p2 += step;
> }
> }
> ```
> 
> Where ivopts previously didn't expand the SSA variables for the step 
> increements
> and so lacked the ability to group all the IV's and ended up with:
> 
> ```
> f:
>   cbz x3, .L1
>   mov x4, 0
> .L3:
>   ld1wz31.s, p0/z, [x1]
>   add x4, x4, x2
>   st1wz31.s, p0, [x0]
>   add x1, x1, x2, lsl 2
>   add x0, x0, x2, lsl 2
>   cmp x3, x4
>   bhi .L3
> .L1:
>   ret
> ```
> 
> After this change we end up with:
> 
> ```
> f:
>   cbz x3, .L1
>   mov x4, 0
> .L3:
>   ld1wz31.s, p0/z, [x1, x4, lsl 2]
>   st1wz31.s, p0, [x0, x4, lsl 2]
>   add x4, x4, x2
>   cmp x3, x4
>   bhi .L3
> .L1:
>   ret
> ```
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-loop-ivopts.cc (constant_multiple_of): Change
>   tree_to_aff_combination to tree_to_aff_combination_expand and add
>   parameter to take ivopts_data.
>   (get_computation_aff_1): Change parameters and calls to include
>   ivopts_data.
>   (get_computation_aff): Ditto.
>   (get_computation_at) Ditto.:
>   (get_debug_computation_at) Ditto.:
>   (get_computation_cost) Ditto.:
>   (rewrite_use_nonlinear_expr) Ditto.:
>   (rewrite_use_address) Ditto.:
>   (rewrite_use_compare) Ditto.:
>   (remove_unused_ivs) Ditto.:
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/adr_7.c: New test.
> ---
>  gcc/testsuite/gcc.target/aarch64/sve/adr_7.c | 19 ++
>  gcc/tree-ssa-loop-ivopts.cc  | 65 +++-
>  2 files changed, 54 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/adr_7.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c
> new file mode 100644
> index 000..61e23bbf182
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ftree-vectorize" } */
> +
> +#include 
> +
> +void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) 
> {
> +for (unsigned long i = 0; i < end; i += step) {
> +svst1(pg, p1, svld1_s32(pg, p2));
> +p1 += step;
> +p2 += step;
> +}
> +}
> +
> +/* { dg-final { scan-assembler-not {\tld1w\tz[0-9]+\.d, 
> p[0-9]+/z\[x[0-9]+\.d\]} } } */
> +/* { dg-final { scan-assembler-not {\tst1w\tz[0-9]+\.d, 
> p[0-9]+/z\[x[0-9]+\.d\]} } } */
> +
> +/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+} 1 } 
> } */
> +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-9]+/z, 
> \[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */
> +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-9]+, 
> \[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 8a6726f1988..544a946ff89 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -2117,11 +2117,15 @@ idx_record_use (tree base, tree *idx,
> signedness of TOP and BOT.  */
>  
>  static bool
> -constant_multiple_of (tree top, tree bot, widest_int *mul)
> +constant_multiple_of (tree top, tree bot, widest_int *mul,
> +   struct ivopts_data *data)
>  {
>aff_tree aff_top, aff_bot;
> -  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> -  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> +  tree_to_aff_combination_expand (top, TREE_TYPE (top), &aff_top,
> +   &data->name_expansion_cache);
> +  tree_to_aff_combination_expand (bot, TREE_TYPE (bot), &aff_bot,
> +   &data->name_expansion_cache);
> +
>poly_widest_int poly_mul;
>if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
>&& poly_mul.is_constant (mul))
> @@ -3945,

[PATCH][v2] tree-optimization/109892 - SLP reduction of fma

2025-06-25 Thread Richard Biener

The following adds the ability to vectorize a fma reduction pair
as SLP reduction (we cannot yet handle ternary association in
reduction vectorization yet).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

I'll file a bug about the missed handling for fold-left reductions.

PR tree-optimization/109892
* tree-vect-loop.cc (check_reduction_path): Handle fma.
(vectorizable_reduction): Apply FOLD_LEFT_REDUCTION code
generation constraints.

* gcc.dg/vect/vect-reduc-fma-1.c: New testcase.
* gcc.dg/vect/vect-reduc-fma-2.c: Likewise.
* gcc.dg/vect/vect-reduc-fma-3.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c | 15 +++
 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c | 20 
 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c | 16 
 gcc/tree-vect-loop.cc| 17 +
 4 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c
new file mode 100644
index 000..e958b43e23b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */
+
+double f(double x[], long n)
+{
+double r0 = 0, r1 = 0;
+for (; n; x += 2, n--) {
+r0 = __builtin_fma(x[0], x[0], r0);
+r1 = __builtin_fma(x[1], x[1], r1);
+}
+return r0 + r1;
+}
+
+/* We should vectorize this as SLP reduction.  */
+/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and 
unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c
new file mode 100644
index 000..ea1ca9720e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffp-contract=on" } */
+/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */
+
+static double muladd(double x, double y, double z)
+{
+return x * y + z;
+}
+double g(double x[], long n)
+{
+double r0 = 0, r1 = 0;
+for (; n; x += 2, n--) {
+r0 = muladd(x[0], x[0], r0);
+r1 = muladd(x[1], x[1], r1);
+}
+return r0 + r1;
+}
+
+/* We should vectorize this as SLP reduction.  */
+/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and 
unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c
new file mode 100644
index 000..10cecedd8e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffast-math" } */
+/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */
+
+double f(double x[], long n)
+{
+double r0 = 0, r1 = 0;
+for (; n; x += 2, n--) {
+r0 = __builtin_fma(x[0], x[0], r0);
+r1 = __builtin_fma(x[1], x[1], r1);
+}
+return r0 + r1;
+}
+
+/* We should vectorize this as SLP reduction, higher VF possible.  */
+/* { dg-final { scan-tree-dump "optimized: loop vectorized" "vect" { target { 
x86_64-*-* i?86-*-* } } } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a3f95433a5b..9a4b89e9113 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4139,6 +4139,10 @@ pop:
  if (op.ops[2] == op.ops[opi])
neg = ! neg;
}
+  /* For an FMA the reduction code is the PLUS if the addition chain
+is the reduction.  */
+  else if (op.code == IFN_FMA && opi == 2)
+   op.code = PLUS_EXPR;
   if (CONVERT_EXPR_CODE_P (op.code)
  && tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
;
@@ -8084,6 +8088,19 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 "in-order reduction chain without SLP.\n");
  return false;
}
+ /* Code generation doesn't support function calls other
+than .COND_*.  */
+ if (!op.code.is_tree_code ()
+ && !(op.code.is_internal_fn ()
+  && conditional_internal_fn_code (internal_fn (op.code))
+   != ERROR_MARK))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"in-order reduction chain operation not "
+"supported.\n");
+ return false;
+   }
  STMT_VINFO_REDUC_TYPE (reduc_info)
= reduction_type = FOLD_LEFT_REDUCTION;

Re: [RFC] [lra] catch all to-sp eliminations [PR120424]

2025-06-25 Thread Vladimir Makarov




On 6/23/25 12:06 AM, Alexandre Oliva wrote:


Alex, thanks for investigation of corner cases of register elimination.


An x86_64-linux-gnu native with ix86_frame_pointer_required modified
to return true for nonzero frames, to exercize
lra_update_fp2sp_elimination, reveals in stage1 testing that wrong
code is generated for gcc.c-torture/execute/ieee/fp-cmp-8l.c:
argp-to-sp eliminations are used for one_test to pass its arguments on
to *pos, and the sp offsets survive the disabling of that elimination.

We didn't really have to disable that elimination, but the backend
disables eliminations to sp if frame_pointer_needed.

The workaround for this scenario is to compile with
-maccumulate-outgoing-args.

This change extends the catching of fp2sp eliminations to all (?)
eliminations to sp, since none of them can be properly reversed and
would silently lead to wrong code.  This is probably too strict.

I guess it is too strict.


Regstrapped on x86_64-linux-gnu, bootstrapped on arm-linux-gnueabihf
(arm and thumb modes), also tested with gcc-14 on arm-vx7r2 and
arm-linux-gnueabihf.

Unlike the combination of earlier patches, this one does NOT bootstrap
on x86_64-linux-gnu with ix86_frame_pointer_required modified to return
true for any positive frame sizes.

It also triggers one failure in acats-4 on arm-vx7r2, where I didn't
expect it to make any difference.  I'm yet to investigate it.

I wonder if it makes sense to put this in to (1.i) avoid silent wrong
code, and (1.ii) shake out some more lra_update_fp2sp_elimination
issues, or (2) keep it out and just file a PR about this one known
remaining issue, AFAICT only fixable by making sp offset adjustments
reversible.  WDYT?


I am not ready to answer the question about committing the patch right 
now.  It needs more time for investigation which I currently don't have 
but will have when the next release work starts.


In general I think we should have functionality generating the right 
code whenever any elimination goes prohibited or enabled back and forth 
during LRA work.  That is what I would like to aim at.  It might require 
considerable review of all existing elimination code.


So I think the best way right now is to fill a PR.



for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (elimination_2sp_occurred_p): Rename
from...
(elimination_fp2sp_occured_p): ... this.  Adjust all uses.
(lra_eliminate_regs_1): Don't require a from-frame-pointer
elimination to set it.
(update_reg_eliminate): Likewise to test it.
---
  gcc/lra-eliminations.cc |   24 
  1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 9cdd0c5ff53a2..f6ee33aa70a5d 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -309,8 +309,9 @@ move_plus_up (rtx x)
return x;
  }
  
-/* Flag that we already did frame pointer to stack pointer elimination.  */

-static bool elimination_fp2sp_occured_p = false;
+/* Flag that we already applied stack pointer elimination offset; sp
+   updates cannot be undone.  */
+static bool elimination_2sp_occurred_p = false;
  
  /* Scan X and replace any eliminable registers (such as fp) with a

 replacement (such as sp) if SUBST_P, plus an offset.  The offset is
@@ -369,8 +370,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
  
-	  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)

-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx)
+   elimination_2sp_occurred_p = true;
  
  	  if (maybe_ne (update_sp_offset, 0))

{
@@ -402,8 +403,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
  
-	  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)

-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx)
+   elimination_2sp_occurred_p = true;
  
  	  if (! update_p && ! full_p)

return simplify_gen_binary (PLUS, Pmode, to, XEXP (x, 1));
@@ -465,8 +466,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
  
-	  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)

-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx)
+   elimination_2sp_occurred_p = true;
  
  	  if (maybe_ne (update_sp_offset, 0))

{
@@ -1213,8 +1214,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
 pointer elimination the condition is a bit relaxed and we just 
require
 that actual elimination has not been done yet.   *

Re: [PATCH v1 2/2] middle-end: Enable masked load with non-constant offset

2025-06-25 Thread Richard Biener

On Tue, Jun 24, 2025 at 4:26 PM Karl Meakin  wrote:
>
> The function `vect_check_gather_scatter` requires the `base` of the load
> to be loop-invariant and the `off`set to be not loop-invariant. When faced
> with a scenario where `base` is not loop-invariant, instead of giving up
> immediately we can try swapping the `base` and `off`, if `off` is
> actually loop-invariant.
>
> Previously, it would only swap if `off` was the constant zero (and so
> trivially loop-invariant). This is too conservative: we can still
> perform the swap if `off` is a more complex but still loop-invariant
> expression, such as a variable defined outside of the loop.
>
> This allows loops like the function below to be vectorised, if the
> target has masked loads and sufficiently large vector registers (eg
> `-march=armv8-a+sve -msve-vector-bits=128`):
>
> ```c
> typedef struct Array {
> int elems[3];
> } Array;
>
> int loop(Array **pp, int len, int idx) {
> int nRet = 0;
>
> for (int i = 0; i < len; i++) {
> Array *p = pp[i];
> if (p) {
> nRet += p->elems[idx];
> }
> }
>
> return nRet;
> }
> ```
>
> gcc/ChangeLog:
> * tree-vect-data-refs.cc (vect_check_gather_scatter): Swap
> `base` and `off` in more scenarios. Also assert at the end of
> the function that `base` and `off` are loop-invariant and not
> loop-invariant respectively.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/sve/mask_load_2.c: Update tests.
> ---
>  .../gcc.target/aarch64/sve/mask_load_2.c  |  4 +--
>  gcc/tree-vect-data-refs.cc| 26 ---
>  2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
> index 38fcf4f7206..66d95101a14 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
> @@ -19,5 +19,5 @@ int loop(Array **pp, int len, int idx) {
>  return nRet;
>  }
>
> -// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 0 } }
> -// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m}  0 } }
> +// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 1 } }
> +// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m}  1 } }
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index ee040eb9888..d352ca8bcc3 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -4659,26 +4659,19 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
> loop_vec_info loop_vinfo,
>if (off == NULL_TREE)
>  off = size_zero_node;
>
> -  /* If base is not loop invariant, either off is 0, then we start with just
> - the constant offset in the loop invariant BASE and continue with base
> - as OFF, otherwise give up.
> - We could handle that case by gimplifying the addition of base + off
> - into some SSA_NAME and use that as off, but for now punt.  */
> +  /* BASE must be loop invariant.  If it is not invariant, but OFF is, then 
> we
> +   * can fix that by swapping BASE and OFF.  */
>if (!expr_invariant_in_loop_p (loop, base))
>  {
> -  if (!integer_zerop (off))
> +  if (!expr_invariant_in_loop_p (loop, off))
> return false;
> -  off = base;
> -  base = size_int (pbytepos);
> -}
> -  /* Otherwise put base + constant offset into the loop invariant BASE
> - and continue with OFF.  */
> -  else
> -{
> -  base = fold_convert (sizetype, base);
> -  base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
> +
> +  std::swap (base, off);
>  }
>
> +  base = fold_convert (sizetype, base);
> +  base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
> +
>/* OFF at this point may be either a SSA_NAME or some tree expression
>   from get_inner_reference.  Try to peel off loop invariants from it
>   into BASE as long as possible.  */
> @@ -4856,6 +4849,9 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
> loop_vec_info loop_vinfo,
>offset_vectype = NULL_TREE;
>  }
>
> +  gcc_assert (expr_invariant_in_loop_p (loop, base));
> +  gcc_assert (!expr_invariant_in_loop_p (loop, off));

Those are reasonably expensive that we want to avoid them, please
make them gcc_checking_assert at least.

OK with that change.
Richard.

> +
>info->ifn = ifn;
>info->decl = decl;
>info->base = base;
> --
> 2.45.2
>

[PATCH 16/17] Fortran: Silence a clang warning (suggesting a brace) in io.cc

2025-06-25 Thread Martin Jambor

Hi,

when GCC is built with clang, it suggests that we add a brace to the
initialization of format_asterisk:

  gcc/fortran/io.cc:32:16: warning: suggest braces around initialization of 
subobject [-Wmissing-braces]

So this patch does that to silence it.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin



gcc/fortran/ChangeLog:

2025-06-24  Martin Jambor  

* io.cc (format_asterisk): Add a brace around static initialization
location part of the field locus.
---
 gcc/fortran/io.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/io.cc b/gcc/fortran/io.cc
index 7466d8fe094..4d28c2c90ba 100644
--- a/gcc/fortran/io.cc
+++ b/gcc/fortran/io.cc
@@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 
 gfc_st_label
 format_asterisk = {0, NULL, NULL, -1, ST_LABEL_FORMAT, ST_LABEL_FORMAT, NULL,
-  0, {NULL, NULL}, NULL, 0};
+  0, {NULL, {NULL}}, NULL, 0};
 
 typedef struct
 {
-- 
2.49.0

Re: [PATCH 06/17] value-relation.h: Mark dom_oracle::next_relation as override

2025-06-25 Thread Andrew MacLeod


BTW, consider all such future changes in ranger code pre-approved!

Thanks

Andrew

On 6/25/25 10:27, Andrew MacLeod wrote:

OK for all the ranger related patches.

Thanks

Andrew

On 6/25/25 10:08, Martin Jambor wrote:

Hi,

When GCC is compiled with clang, it emits a warning that
dom_oracle::next_relation is not marked as override even though it
does override a virtual function of its ancestor.  This patch marks it
as such to silence the warning and for the sake of consistency.

There are other member functions in the class which are marked as
final override but this particular function is in the protected
section so I decided to just mark it as override.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* value-relation.h (class dom_oracle): Mark member function
next_relation as override.
---
  gcc/value-relation.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index 1081877ccca..87f0d856fab 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -235,7 +235,7 @@ public:
    void dump (FILE *f) const final override;
  protected:
    virtual relation_chain *next_relation (basic_block, 
relation_chain *,

- tree) const;
+ tree) const override;
    bool m_do_trans_p;
    bitmap m_tmp, m_tmp2;
    bitmap m_relation_set;  // Index by ssa-name. True if a relation 
exists

Re: [PATCH] vect: Misalign checks for gather/scatter.

2025-06-25 Thread Robin Dapp


This change reminds me that we lack documentation about arguments
of most of the "complicated" internal functions ...


I didn't mention it but I got implicitly reminded several times while writing
the patch... ;)  An overhaul has been on my todo list for a while but of course 
it never was top priority.  Ideally an adjusted API would also be useable by 
SLP's argument map.



We miss internal_fn_gatherscatter_{offset,scale}_index and possibly
a internal_fn_ldst_ptr_index (always zero?) and
internal_fn_ldst_alias_align_index (always one, if supported?).

  if (elsvals && icode != CODE_FOR_nothing)
 get_supported_else_vals
-  (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, 
*elsvals);

+  (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);

these "fixes" seem to be independent?


Just realized I forgot to remove the comments.  Due to the additional argument, 
both optab and IFN happen to have the same arguments now.  That's why the + 1 
is not necessary any more.


Thanks for the comments.  Will adjust, test on x86 and re-spin.

--
Regards
Robin

[COMMITTED] - Remove unused vector in value-relation.cc.

2025-06-25 Thread Andrew MacLeod


On 6/23/25 18:21, Martin Jambor wrote:

@@ -208,66 +208,6 @@ static const tree_code relation_to_code [VREL_LAST] = {
ERROR_MARK, ERROR_MARK, LT_EXPR, LE_EXPR, GT_EXPR, GE_EXPR, EQ_EXPR,
NE_EXPR };
  
-// This routine validates that a relation can be applied to a specific set of

-// ranges.  In particular, floating point x == x may not be true if the NaN bit
-// is set in the range.  Symbolically the oracle will determine x == x,
-// but specific range instances may override this.
-// To verify, attempt to fold the relation using the supplied ranges.
-// One would expect [1,1] to be returned, anything else means there is 
something
-// in the range preventing the relation from applying.
-// If there is no mechanism to verify, assume the relation is acceptable.
-
-relation_kind
-relation_oracle::validate_relation (relation_kind rel, vrange &op1, vrange 
&op2)
-{
-  // If there is no mapping to a tree code, leave the relation as is.
-  tree_code code = relation_to_code [rel];

This seems to have been the only use of the array relation_to_code which
we however still have around.  Should it be removed too?

Thanks,

Martin


Indeed.

Removed thusly.

Bootstraps on x86_64-pc-linux-gnu  with no regressions.   Pushed.

Andrew

From 25a15a4c0318d928d534a0db9592cb6f0e454707 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 24 Jun 2025 16:51:56 -0400
Subject: [PATCH 3/3] Remove unused vector in value-relation.cc.

The relation_to_code vector in value-relation is now unused, so we can
remove it.

	* value-relation.cc (relation_to_code): Remove.
---
 gcc/value-relation.cc | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index c7ced445ad7..2ac7650fe5b 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -202,12 +202,6 @@ adjust_equivalence_range (vrange &range)
 }
  }
 
-// This vector maps a relation to the equivalent tree code.
-
-static const tree_code relation_to_code [VREL_LAST] = {
-  ERROR_MARK, ERROR_MARK, LT_EXPR, LE_EXPR, GT_EXPR, GE_EXPR, EQ_EXPR,
-  NE_EXPR };
-
 // Given an equivalence set EQUIV, set all the bits in B that are still valid
 // members of EQUIV in basic block BB.
 
-- 
2.45.0

Re: [PATCH 15/17] coroutines: Removed unused private member in cp_coroutine_transform

2025-06-25 Thread Iain Sandoe




> On 25 Jun 2025, at 15:17, Martin Jambor  wrote:
> 
> Hi,
> 
> when building GCC with clang, it warns that the private member suffix
> in class cp_coroutine_transform (defined in gcc/cp/coroutines.h) is
> not used which indeed looks like it is the case.  This patch therefore
> removes it.
> 
> Bootstrapped and tested on x86_64-linx.  OK for master?

LGTM and presumably in the “trivial / obvious” category. If we need to
preserve the original fn body in upcoming changes, we can always add
it back.

thanks
Iain

> 
> Alternatively, as with all of these clang warning issues, I'm
> perfectly happy to add an entry to contrib/filter-clang-warnings.py to
> ignore the warning instead.
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/cp/ChangeLog:
> 
> 2025-06-24  Martin Jambor  
> 
>   * coroutines.h (class cp_coroutine_transform): Remove member
>   orig_fn_body.
> ---
> gcc/cp/coroutines.h | 1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/gcc/cp/coroutines.h b/gcc/cp/coroutines.h
> index 919dc9ab06b..fcc46457915 100644
> --- a/gcc/cp/coroutines.h
> +++ b/gcc/cp/coroutines.h
> @@ -100,7 +100,6 @@ public:
> 
> private:
>   tree orig_fn_decl;   /* The original function decl.  */
> -  tree orig_fn_body = NULL_TREE; /* The original function body.  */
>   location_t fn_start = UNKNOWN_LOCATION;
>   location_t fn_end = UNKNOWN_LOCATION;
>   tree resumer = error_mark_node;
> -- 
> 2.49.0
>

Re: [PATCH 06/17] value-relation.h: Mark dom_oracle::next_relation as override

2025-06-25 Thread Andrew MacLeod


OK for all the ranger related patches.

Thanks

Andrew

On 6/25/25 10:08, Martin Jambor wrote:

Hi,

When GCC is compiled with clang, it emits a warning that
dom_oracle::next_relation is not marked as override even though it
does override a virtual function of its ancestor.  This patch marks it
as such to silence the warning and for the sake of consistency.

There are other member functions in the class which are marked as
final override but this particular function is in the protected
section so I decided to just mark it as override.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* value-relation.h (class dom_oracle): Mark member function
next_relation as override.
---
  gcc/value-relation.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index 1081877ccca..87f0d856fab 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -235,7 +235,7 @@ public:
void dump (FILE *f) const final override;
  protected:
virtual relation_chain *next_relation (basic_block, relation_chain *,
-tree) const;
+tree) const override;
bool m_do_trans_p;
bitmap m_tmp, m_tmp2;
bitmap m_relation_set;  // Index by ssa-name. True if a relation exists

[PATCH v7 4/9] AArch64: add constants for branch displacements

2025-06-25 Thread Karl Meakin

Extract the hardcoded values for the minimum PC-relative displacements
into named constants and document them.

gcc/ChangeLog:

* config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
(BRANCH_LEN_N_128MiB): Likewise.
(BRANCH_LEN_P_1MiB): Likewise.
(BRANCH_LEN_N_1MiB): Likewise.
(BRANCH_LEN_P_32KiB): Likewise.
(BRANCH_LEN_N_32KiB): Likewise.
---
 gcc/config/aarch64/aarch64.md | 64 ++-
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d79b74924d4..c4c23dc3669 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -704,7 +704,23 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+;; Maximum PC-relative positive/negative displacements for various branching
+;; instructions.
+(define_constants
+  [
+;; +/- 128MiB.  Used by B, BL.
+(BRANCH_LEN_P_128MiB  134217724)
+(BRANCH_LEN_N_128MiB -134217728)
+
+;; +/- 1MiB.  Used by B., CBZ, CBNZ.
+(BRANCH_LEN_P_1MiB  1048572)
+(BRANCH_LEN_N_1MiB -1048576)
 
+;; +/- 32KiB.  Used by TBZ, TBNZ.
+(BRANCH_LEN_P_32KiB  32764)
+(BRANCH_LEN_N_32KiB -32768)
+  ]
+)
 
 ;; ---
 ;; Conditional jumps
@@ -769,13 +785,17 @@ (define_insn "aarch64_bcond"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -830,13 +850,17 @@ (define_insn "aarch64_cbz1"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -870,13 +894,17 @@ (define_insn "*aarch64_tbz1"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768))
-  (lt (minus (match_dup 1) (pc)) (const_int 32764)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -931,13 +959,17 @@ (define_insn "@aarch64_tbz"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
-  (lt (minus (match_dup 2) (pc)) (const_int 32764)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRAN

[PATCH v7 5/9] AArch64: make `far_branch` attribute a boolean

2025-06-25 Thread Karl Meakin

The `far_branch` attribute only ever takes the values 0 or 1, so make it
a `no/yes` valued string attribute instead.

gcc/ChangeLog:

* config/aarch64/aarch64.md (far_branch): Replace 0/1 with
no/yes.
(aarch64_bcond): Handle rename.
(aarch64_cbz1): Likewise.
(*aarch64_tbz1): Likewise.
(@aarch64_tbz): Likewise.
---
 gcc/config/aarch64/aarch64.md | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c4c23dc3669..1ff887a977e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -569,9 +569,7 @@ (define_attr "enabled" "no,yes"
 ;; Attribute that specifies whether we are dealing with a branch to a
 ;; label that is far away, i.e. further away than the maximum/minimum
 ;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
+(define_attr "far_branch" "no,yes" (const_string "no"))
 
 ;; Attribute that specifies whether the alternative uses MOVPRFX.
 (define_attr "movprfx" "no,yes" (const_string "no"))
@@ -796,8 +794,8 @@ (define_insn "aarch64_bcond"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
@@ -861,8 +859,8 @@ (define_insn "aarch64_cbz1"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
@@ -876,7 +874,7 @@ (define_insn "*aarch64_tbz1"
   {
 if (get_attr_length (insn) == 8)
   {
-   if (get_attr_far_branch (insn) == 1)
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
  return aarch64_gen_far_branch (operands, 1, "Ltb",
 "\\t%0, , ");
else
@@ -905,8 +903,8 @@ (define_insn "*aarch64_tbz1"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; ---
@@ -970,8 +968,8 @@ (define_insn "@aarch64_tbz"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 
 )
 
-- 
2.45.2

Re: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644]

2025-06-25 Thread Jason Merrill


On 6/25/25 9:02 AM, Nathaniel Shead wrote:

On Tue, Jun 24, 2025 at 12:10:09PM -0400, Patrick Palka wrote:

On Tue, 24 Jun 2025, Jason Merrill wrote:


On 6/23/25 5:41 PM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?

-- >8 --

We were erroring because the TEMPLATE_DECL of the existing partial
specialisation has an undeduced return type, but the imported
declaration did not.

The root cause is similar to what was fixed in r13-2744-g4fac53d6522189,
where modules streaming code assumes that a TEMPLATE_DECL and its
DECL_TEMPLATE_RESULT will always have the same TREE_TYPE.  That commit
fixed the issue by ensuring that when the type of a variable is deduced
the TEMPLATE_DECL is updated as well, but this missed handling partial
specialisations.

However, I don't think we actually care about that, since it seems that
only the type of the inner decl actually matters in practice.  Instead,
this patch handles the issue on the modules side when deduping a
streamed decl, by only comparing the inner type.

PR c++/120644

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Remove workaround.


Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL correct,
maybe we should always set it to NULL_TREE to make sure we only look at the
inner type.


FWIW cp_finish_decl can get at the TEMPLATE_DECL of a VAR_DECL
corresponding to a partial specialization via

  TI_TEMPLATE (TI_PARTIAL_INFO (DECL_TEMPLATE_INFO (decl)))

if we do want to end up keeping the two TREE_TYPEs in sync.



Thanks.  On further reflection, maybe the safest approach is to just
ensure that the types are always consistent (including for partial
specs); this is what the following patch does.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Subject: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is
  consistent with its template [PR120644]

We were erroring because the TEMPLATE_DECL of the existing partial
specialisation has an undeduced return type, but the imported
declaration did not.

The root cause is similar to what was fixed in r13-2744-g4fac53d6522189,
where modules streaming code assumes that a TEMPLATE_DECL and its
DECL_TEMPLATE_RESULT will always have the same TREE_TYPE.  That commit
fixed the issue by ensuring that when the type of a variable is deduced
the TEMPLATE_DECL is updated as well, but missed handling partial
specialisations.  This patch ensures that the same adjustment is made
there as well.

PR c++/120644

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Also propagate type to partial
templates.
* module.cc (trees_out::decl_value): Add assertion that the
TREE_TYPE of a streamed template decl matches its inner.
(trees_in::is_matching_decl): Clarify function return type
deduction should only occur for non-TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/modules/auto-7.h: New test.
* g++.dg/modules/auto-7_a.H: New test.
* g++.dg/modules/auto-7_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
Reviewed-by: Patrick Palka 
---
  gcc/cp/decl.cc  | 13 +
  gcc/cp/module.cc|  7 ++-
  gcc/testsuite/g++.dg/modules/auto-7.h   | 12 
  gcc/testsuite/g++.dg/modules/auto-7_a.H |  5 +
  gcc/testsuite/g++.dg/modules/auto-7_b.C |  5 +
  5 files changed, 37 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h
  create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4fe97ffbf8f..59701197e16 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8923,10 +8923,15 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
  
/* Update the type of the corresponding TEMPLATE_DECL to match.  */

-  if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl)
- && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl)
-   TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type;
+  if (DECL_LANG_SPECIFIC (decl) && DECL_TEMPLATE_INFO (decl))
+   {
+ tree info = DECL_TEMPLATE_INFO (decl);
+ tree tmpl = TI_TEMPLATE (info);
+ if (DECL_TEMPLATE_RESULT (tmpl) == decl)
+   TREE_TYPE (tmpl) = type;
+ else if (PRIMARY_TEMPLATE_P (tmpl) && TI_PARTIAL_INFO (info))
+   TREE_TYPE (TI_TEMPLATE (TI_PARTIAL_INFO (info))) = type;
+   }


Perhaps we should update template_for_substitution to handle partial 
specs and then use it here?



  }
  
if (ensure_literal_type_for_constexpr_object (decl) == error_mark_node)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c99988da05b..53edb2ff203 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8212,6 +8212,10 @@ trees_out::d

[PATCH v7 0/9] AArch64: CMPBR support

2025-06-25 Thread Karl Meakin

This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Changelog:
* v7:
  - Support far branches and add a test for them.
  - Replace `aarch64_cb_short_operand` with `aarch64_reg_or_zero_operand`.
  - Delete the new predicates that aren't needed anymore.
  - Minor formatting and comment fixes.
* v6:
  - Correct the constraint string for immediate operands.
  - Drop the commit for adding `%j` format specifiers. The suffix for
the `cb` instruction is now calculated by the `cmp_op` code
attribute.
* v5:
  - Moved Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for
CMPBR...). Every commit in the series should now produce a correct
compiler.
  - Reduce excessive diff context by not passing `--function-context` to
`git format-patch`.
* v4:
  - Added a commit to use HS/LO instead of CS/CC mnemonics.
  - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE,
CBHS, CBLE and CBLS have different ranges of allowed immediates than
the other comparisons.

Karl Meakin (9):
  AArch64: place branch instruction rules together
  AArch64: reformat branch instruction rules
  AArch64: rename branch instruction rules
  AArch64: add constants for branch displacements
  AArch64: make `far_branch` attribute a boolean
  AArch64: recognize `+cmpbr` option
  AArch64: precommit test for CMPBR instructions
  AArch64: rules for CMPBR instructions
  AArch64: make rules for CBZ/TBZ higher priority

 .../aarch64/aarch64-option-extensions.def |2 +
 gcc/config/aarch64/aarch64-protos.h   |2 +
 gcc/config/aarch64/aarch64-simd.md|2 +-
 gcc/config/aarch64/aarch64-sme.md |2 +-
 gcc/config/aarch64/aarch64.cc |   39 +-
 gcc/config/aarch64/aarch64.h  |3 +
 gcc/config/aarch64/aarch64.md |  564 --
 gcc/config/aarch64/constraints.md |   18 +
 gcc/config/aarch64/iterators.md   |   30 +
 gcc/doc/invoke.texi   |3 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1704 +
 gcc/testsuite/lib/target-supports.exp |   14 +-
 12 files changed, 2162 insertions(+), 221 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

--
2.45.2

[PATCH v7 8/9] AArch64: rules for CMPBR instructions

2025-06-25 Thread Karl Meakin

Add rules for lowering `cbranch4` to CBB/CBH/CB when
CMPBR extension is enabled.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
* config/aarch64/aarch64.md (cbranch4): Rename to ...
(cbranch4): ...here, and emit CMPBR if possible.
(cbranch4): New expand rule.
(aarch64_cb): New insn rule.
(aarch64_cb): Likewise.
* config/aarch64/constraints.md (Uc0): New constraint.
(Uc1): Likewise.
(Uc2): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
(INT_CMP): New code iterator.
(cmpbr_imm_constraint): New code attr.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c:
---
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64.cc|  33 ++
 gcc/config/aarch64/aarch64.md|  95 +++-
 gcc/config/aarch64/constraints.md|  18 +
 gcc/config/aarch64/iterators.md  |  30 ++
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 587 ---
 6 files changed, 379 insertions(+), 386 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 31f2f5b8bd2..e946e8da11d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, 
vec,
 unsigned int, tree, unsigned int,
 tree *);
 
+bool aarch64_cb_rhs (rtx_code op_code, rtx rhs);
+
 namespace aarch64 {
   void report_non_ice (location_t, tree, unsigned int);
   void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2cd03b941bd..f3ce3a15b09 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern)
   gcc_unreachable ();
 }
 
+/* Return true if RHS is an operand suitable for a CB (immediate)
+   instruction.  OP_CODE determines the type of the comparison.  */
+bool
+aarch64_cb_rhs (rtx_code op_code, rtx rhs)
+{
+  if (!CONST_INT_P (rhs))
+return REG_P (rhs);
+
+  HOST_WIDE_INT rhs_val = INTVAL (rhs);
+
+  switch (op_code)
+{
+case EQ:
+case NE:
+case GT:
+case GTU:
+case LT:
+case LTU:
+  return IN_RANGE (rhs_val, 0, 63);
+
+case GE:  /* CBGE:   signed greater than or equal */
+case GEU: /* CBHS: unsigned greater than or equal */
+  return IN_RANGE (rhs_val, 1, 64);
+
+case LE:  /* CBLE:   signed less than or equal */
+case LEU: /* CBLS: unsigned less than or equal */
+  return IN_RANGE (rhs_val, -1, 62);
+
+default:
+  return false;
+}
+}
+
 /* Return the location of a piece that is known to be passed or returned
in registers.  FIRST_ZR is the first unused vector argument register
and FIRST_PR is the first unused predicate argument register.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1ff887a977e..32e0f739ae5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -717,6 +717,10 @@ (define_constants
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
+
+;; +/- 1KiB.  Used by CBB, CBH, CB.
+(BRANCH_LEN_P_1Kib  1020)
+(BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
@@ -724,18 +728,35 @@ (define_constants
 ;; Conditional jumps
 ;; ---
 
-(define_expand "cbranch4"
+(define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+  if (TARGET_CMPBR && aarch64_cb_rhs (GET_CODE (operands[0]), operands[2]))
+{
+  /* Fall-through to `aarch64_cb`.  */
+}
+  else
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+operands[1], operands[2]);
+  operands[2] = const0_rtx;
+}
+  }
+)
+
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:SHORT 1 "register_operand")
+(match_operand:SHORT 2 "aarch64_reg_or_zero")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  ""
 )
 
 (define_expand "cbranch4"
@@ -763,6 +784,68 @@

[PATCH v7 7/9] AArch64: precommit test for CMPBR instructions

2025-06-25 Thread Karl Meakin

Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1877 ++
 gcc/testsuite/lib/target-supports.exp|   14 +-
 2 files changed, 1885 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c 
b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..9ca376a8f33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1877 @@
+// Test that the instructions added by FEAT_CMPBR are emitted
+// { dg-do compile }
+// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } }
+// { dg-options "-march=armv9.5-a+cmpbr -O2" }
+// { dg-final { check-function-bodies "**" "*/" "" { target *-*-* } 
{\.L[0-9]+} } }
+
+#include 
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs) 
\
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) {   
\
+return (x0 op rhs) ? taken() : not_taken();
\
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs)   
\
+  COMPARE(unsigned_ty, eq, ==, rhs);   
\
+  COMPARE(unsigned_ty, ne, !=, rhs);   
\
+   
\
+  COMPARE(unsigned_ty, ult, <, rhs);   
\
+  COMPARE(unsigned_ty, ule, <=, rhs);  
\
+  COMPARE(unsigned_ty, ugt, >, rhs);   
\
+  COMPARE(unsigned_ty, uge, >=, rhs);  
\
+   
\
+  COMPARE(signed_ty, slt, <, rhs); 
\
+  COMPARE(signed_ty, sle, <=, rhs);
\
+  COMPARE(signed_ty, sgt, >, rhs); 
\
+  COMPARE(signed_ty, sge, >=, rhs);
+
+//  CBB (register) 
+COMPARE_ALL(u8, i8, x1);
+
+//  CBH (register) 
+COMPARE_ALL(u16, i16, x1);
+
+//  CB (register) 
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+//  CB (immediate) 
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+//  Special cases 
+// Comparisons against the immediate 0 can be done for all types,
+// because we can use the wzr/xzr register as one of the operands.
+// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible,
+// because they have larger range.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+// CBB and CBH cannot have immediate operands.
+// Instead we have to do a MOV+CB.
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 64 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead,
+//   because B has a longer range than CB.
+COMPARE_ALL(u8, i8, 64);
+COMPARE_ALL(u16, i16, 64);
+COMPARE_ALL(u32, i32, 64);
+COMPARE_ALL(u64, i64, 64);
+
+// 4098 is out of the range for CMP (0 to 4095, optionally shifted by left by 
12
+// bits), but it can be materialized in a single MOV.
+COMPARE_ALL(u16, i16, 4098);
+COMPARE_ALL(u32, i32, 4098);
+COMPARE_ALL(u64, i64, 4098);
+
+// If the branch destination is out of range (1KiB), we have to generate an
+// extra B instruction (which can handle larger displacements) and branch 
around
+// it
+int far_branch(i32 x, i32 y) {
+  volatile int z = 0;
+  if (x == y) {
+// clang-format off
+#define STORE_2()   z = 0; z = 0;
+#define STORE_4()   STORE_2();   STORE_2();
+#define STORE_8()   STORE_4();   STORE_4();
+#define STORE_16()  STORE_8();   STORE_8();
+#define STORE_32()  STORE_16();  STORE_16();
+#define STORE_64()  STORE_32();  STORE_32();
+#define STORE_128() STORE_64();  STORE_64();
+#define STORE_256() STORE_128(); STORE_128();
+// clang-format on
+
+STORE_256();
+  }
+  return taken();
+}
+
+/*
+** u8_x0_eq_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L4
+** b   not_taken
+** .L4:
+** b   taken
+*/
+
+/*
+** u8_x0_ne_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L6
+** b   taken
+** .L6:
+** b   not_taken
+*/
+
+/*
+** u8_x0_ult_

[PATCH v7 1/9] AArch64: place branch instruction rules together

2025-06-25 Thread Karl Meakin

The rules for conditional branches were spread throughout `aarch64.md`.
Group them together so it is easier to understand how `cbranch4`
is lowered to RTL.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Move.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 387 ++
 1 file changed, 201 insertions(+), 186 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e11e13033d2..fcc24e300e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -682,6 +682,10 @@ (define_insn "aarch64_write_sysregti"
  "msrr\t%x0, %x1, %H1"
 )
 
+;; ---
+;; Unconditional jumps
+;; ---
+
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
@@ -700,6 +704,12 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+
+
+;; ---
+;; Conditional jumps
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -739,6 +749,197 @@ (define_expand "cbranchcc4"
   ""
   "")
 
+(define_insn "condjump"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register" "") (const_int 0)])
+  (label_ref (match_operand 2 "" ""))
+  (pc)))]
+  ""
+  {
+/* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
+   but the "." is required for SVE conditions.  */
+bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond",
+use_dot_p ? "b.%M0\\t" : "b%M0\\t");
+else
+  return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 0)
+ (const_int 1)))]
+)
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov x0, #imm1
+;; movkx0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp x1, x0
+;; b .Label
+;; into the shorter:
+;; sub x0, x1, #(CST & 0xfff000)
+;; subsx0, x0, #(CST & 0x000fff)
+;; b .Label
+(define_insn_and_split "*compare_condjump"
+  [(set (pc) (if_then_else (EQL
+ (match_operand:GPI 0 "register_operand" "r")
+ (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2 "" ""))
+  (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), mode)
+   && !aarch64_plus_operand (operands[1], mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+rtx tmp = gen_reg_rtx (mode);
+emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
+emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
+ cc_reg, const0_rtx);
+emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+DONE;
+  }
+)
+
+(define_insn "aarch64_cb1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minu

Re: [PATCH] libstdc++: Type-erase chrono-data for formatting [PR110739]

2025-06-25 Thread Michael Welsh Duggan

Tomasz Kamiński  writes:

> This patch reworks the formatting for the chrono types, such that they are all
> formatted in terms of _ChronoData class, that includes all required fields.
> Populating each required field is performed in formatter for specific type,
> based on the chrono-spec used.
>
> To facilitate above, the _ChronoSpec now includes additional _M_needed field,
> that represnts the chrono data that is referenced by format spec (this value
> is also configured for __defSpec). This value differs from the value of
> __parts passed to _M_parse, which does include all fields that can be computed
> from input (e.g. weekday_indexed can be computed for year_month_day). Later
> it is used to fill _ChronoData, in particular _M_fill_* family of functions,
> to determine if given field needs to be set, and thus it's value needs to be
> computed.
>
> In consequence _ChronoParts enum was exteneded with additional values,
> that allows more fine grained indentification:
>  * _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
>  * _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
>  * _LocalDays, _WeekdayIndex are defiend in included in _Date,
>  * _Duration is removed, and instead _EpochUnits and _UnitSuffix are
>introduced.
> Furthermore, to avoid name conflicts _ChonoParts is now defined as enum class,
> with additional operators that simplify uses.
>
> In addition to fields that can be printed using chron-spec, _ChronoData 
> stores:
>  * Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by
>struct tm construction, and for ISO calendar computation.
>  * Total seconds in wall time (_M_lseconds) - this value may be different from
>sum of days, hours, minutes, seconds (e.g. see utc_time below). Included
>to allow future extension, like printing total minutes.
>  * Total seconds since epoch - due offset different from above. Again to be
>used with future extension (e.g. %s as proposed in P2945R1).
>  * Subseconds - count of attoseconds (10^(-18)), in addition to priting can
>be used to  compute fractional hours, minutes.
> The both total seconds fielkds we use single _TotalSeconds enumerator in
> _ChronoParts, that when present in combination with _EpochUnits or _LocalDays
> indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) are
> provided/required.
>
> To handle type formatting of time since epoch ('%Q'|_EpochUnits), we use the
> format_args mechanism, where the result of +d.count() (see LWG4118) is erased
> into make_format_args to local __arg_store, that is later referenced by
> _M_ereps (_M_ereps.get(0)).
>
> To handle precision values, and in prepartion to allow user to configure ones,
> we store the precision as third element of _M_ereps (_M_ereps.get(2)), this
> allows duration with precision to be printed using "{0:{2}}". For subseconds
> the precision is handled differently depending on the representation:
>  * for integral reps, _M_subseconds value is used to determine fractional 
> value,
>precision is trimmed to 18 digits;
>  * for floating-points, we _M_ereps stores duration initialized with only
>fractional seconds, that is later formatted with precision.
> Always using _M_subseconds fields for integral duration, means that we do not
> use formattter for user-defined durations that are considered to be integral
> (see empty_spec.cc file change). To avoid potentially expensive computation
> of _M_subseconds, we make sure that _ChronoParts::_Subseconds is set only if
> _Subseconds are needed. In particular we remove this flag for localized ouput
> in _M_parse.
>
> Construction the _M_ereps as described above is handled by 
> __formatter_duration,
> that is then used to format duration, hh_mm_ss and time_points specialization.
> This class also handles _UnitSuffix, the _M_units_suffix field is populated
> either with predefined suffix (chrono::__detail::__units_suffix) or one 
> produced
> locally.
>
> Finally, formatters for types listed below contains type specific logic:
>  * hh_mm_ss - we do not compute total duration and seconds, unless explicitly
>requested, as such computation may overflow;
>  * utc_time - for time during leap second insertion, the _M_seconds field is
>increased to 60;
>  * __local_time_fmt - exception is thrown if zone offset (_ZoneOffset) or
>abbrevation (_ZoneAbbrev) is requsted, but corresponding pointer is null,
>futhermore conversion from `char` to `wchar_t` for abbreviation is 
> performed
>if needed.
>
>   PR libstdc++/110739
>
> libstdc++-v3/ChangeLog:
>
>   * include/bits/chrono_io.h (__format::__no_timezone_available):
>   Removed, replaced with separate throws in formatter for
>   __local_time_fmt
>   (__format::_ChronoParts): Defined additional enumertors and
>   declared as enum class.
>   (__format::operator&(_ChronoParts, _ChronoParts))
>   (__format::operator&=(_ChronoParts&, _ChronoParts))
>   (__format

Re: [PATCH 03/17] Diagnostics: Mark path_label::get_effects as final override

2025-06-25 Thread David Malcolm

On Wed, 2025-06-25 at 16:04 +0200, Martin Jambor wrote:
> Hi,
> 
> When compiling diagnostic-path-output.cc with clang, it warns that
> path_label::get_effects should be marked as override.  That looks
> like
> a good idea and from a brief look I also believe it should be marked
> as final (the other override in the class is marked as both), so this
> patch does that.
> 
> Likewise for html_output_format::after_diagnostic in
> diagnostic-format-html.cc which also already has quite a few member
> functions marked as final override.
> 
> Bootstrapped and tested on x86_64-linx.  OK for master?

Yes please

Thanks for doing this
Dave

Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]

2025-06-25 Thread Jason Merrill


On 6/25/25 3:08 AM, Jakub Jelinek wrote:

On Tue, Jun 24, 2025 at 05:19:59PM -0400, Jason Merrill wrote:

I think we could move the initialization of the fixed_type_p and
virtual_access variables up, they don't need to be after cp_build_addr_expr.


I don't understand why it doesn't depend on cp_build_addr_expr.
I've tried the following patch and while it didn't regress anything on
make GXX_TESTSUITE_STDS=98,11,14,17,^C,23,26 check-g++
it regressed
FAIL: 23_containers/vector/bool/cmp_c++20.cc  -std=gnu++20 (test for excess 
errors)
FAIL: 23_containers/vector/bool/cmp_c++20.cc  -std=gnu++26 (test for excess 
errors)
In there code is PLUS_EXPR, !want_pointer, !has_empty, but uneval is true
and expr is
std::vector::begin (&c)
before cp_build_addr_expr and
&TARGET_EXPR ::begin (&c)>
after it.  resolves_to_fixed_type_p (expr) is 0 before cp_build_addr_expr
and 1 after it.


Ah, looks like fixed_type_or_null needs to handle a CALL_EXPR of class 
type like a TARGET_EXPR.  I also wonder why the call isn't already 
wrapped in a TARGET_EXPR by build_cxx_call=>build_cplus_new at this point.



v_binfo is false though, so in that
particular case I think we don't actually care about fixed_type_p value,
but it doesn't raise confidence that testing resolves_to_fixed_type_p
early is ok.



So, shall I e.g. for the if (TREE_PRIVATE case if the outer type has
CLASSTYPE_VBASECLASSES walk the
for (vbase = TYPE_BINFO (t); vbase; vbase = TREE_CHAIN (vbase))
if (BINFO_VIRTUAL_P (vbase) && !BINFO_PRIMARY_P (vbase))
and in that case try to compare byte_position (TREE_OPERAND (path, 1))
against BINFO_OFFSET (vbase) and if it matches (plus perhaps some type
check?) then decide based on BINFO_BASE_ACCESS or something like that
whether it was a private/protected vs. public virtual base?


It seems simpler to pass an accurate access to the build_base_field above.
At least whether the whole BINFO_INHERITANCE_CHAIN is public or not, I
suppose the distinction between private and protected doesn't matter.


I'm afraid I'm quite lost on what actually is public base class
that [expr.dynamic.cast] talks about in the case of virtual bases because
a virtual base can appear many times among the bases and if it is virtual
in all cases, there is just one copy of it and it can be public in some
paths and private/protected in others.  And where to find that information.


I think it would make sense to add a publicly_virtually_derived_p 
function after publicly_uniquely_derived_p, which adds 
ba_require_virtual to the flags passed by the latter function.  And then 
you can use that here.



I've tried the following testcase and it seems that it succeeds unless
-DP1 -DP2
-DP1 -DP3
-DP1 -DP6
-DP2 -DP3 -DP6
-DP4 -DP5 -DP6
-DP2 -DP3 -DP4 -DP5
is a subset of the -DPN options or in case of clang++ also
-DP2 -DP4 -DP5 (for that g++ passes, clang++ fails).
E.g. what is the difference between -DP1 which works and
S is private in one case and public in 2 others, while -DP1 -DP2
doesn't work and is private in two cases and public in one.


Hmm, that does seem wrong.  For -DP1 -DP2 dynamic_cast, following 
the logic in https://eel.is/c++draft/expr#dynamic.cast-9 we get


9.1) in t, (S&)t does not refer to a public base of a T, because -DP1 
makes it a private base.  So move on.
9.2) (S&)t does refer to a public base of t because we didn't specify 
-DP4.  V does have an unambiguous (because virtual) public (because no 
-DP3 -DP5) T base, so this ought to succeed.


This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81078

Jason


#ifdef P1
#undef P1
#define P1 private
#else
#define P1
#endif
#ifdef P2
#undef P2
#define P2 private
#else
#define P2
#endif
#ifdef P3
#undef P3
#define P3 private
#else
#define P3
#endif
#ifdef P4
#undef P4
#define P4 private
#else
#define P4
#endif
#ifdef P5
#undef P5
#define P5 private
#else
#define P5
#endif
#ifdef P6
#undef P6
#define P6 private
#else
#define P6
#endif
struct S { int a, b; virtual int bar (int) { return 0; } };
struct T : virtual P1 S { int c, d; };
struct U : virtual P2 S, virtual P3 T { int e; };
struct V : virtual P4 S, virtual P5 T, virtual P6 U { int f; S &foo () { return (S 
&)*this; } };

int
main ()
{
   V t;
   t.f = 1;
//  t.c = 2;
   dynamic_cast (t.foo ());
   dynamic_cast (t.foo ());
   dynamic_cast (t.foo ());
}

Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]

2025-06-25 Thread Jakub Jelinek

On Wed, Jun 25, 2025 at 12:37:33PM -0400, Jason Merrill wrote:
> Ah, looks like fixed_type_or_null needs to handle a CALL_EXPR of class type
> like a TARGET_EXPR.  I also wonder why the call isn't already wrapped in a
> TARGET_EXPR by build_cxx_call=>build_cplus_new at this point.

Wonder if it has anything to do with being in unevaluated context (and
whether perhaps cp_build_addr_expr isn't undesirable for that case, because
that can make vars odr-used etc.; are are odr uses in unevaluated context
also supposed to make vars odr-used?).

Jakub

Re: [PATCH 16/17] Fortran: Silence a clang warning (suggesting a brace) in io.cc

2025-06-25 Thread Steve Kargl

Thanks for cleaning up gfortran code.  I was curious about
what the GNU Coding Standard said about this case, but it
does not consider initialization of subobjects.  I did find

   5.3 Clean Use of C Constructs
   ...
   Don't make the program ugly just to placate static
   analysis tools such as lint, clang, and GCC with extra
   warnings options such as -Wconversion and -Wundef.
   These tools can help find bugs and unclear code, but
   they can also generate so many false alarms that it hurts
   readability to silence them with unnecessary casts,
   wrappers, and other complications.

I do not see the extra '{...}' as hurting readability.  I
have no objection to the change.  Does anyone else have a
comment?

-- 
steve

On Wed, Jun 25, 2025 at 04:18:16PM +0200, Martin Jambor wrote:
> Hi,
> 
> when GCC is built with clang, it suggests that we add a brace to the
> initialization of format_asterisk:
> 
>   gcc/fortran/io.cc:32:16: warning: suggest braces around initialization of 
> subobject [-Wmissing-braces]
> 
> So this patch does that to silence it.
> 
> Bootstrapped and tested on x86_64-linx.  OK for master?
> 
> Alternatively, as with all of these clang warning issues, I'm
> perfectly happy to add an entry to contrib/filter-clang-warnings.py to
> ignore the warning instead.
> 
> Thanks,
> 
> Martin
> 
> 
> 
> gcc/fortran/ChangeLog:
> 
> 2025-06-24  Martin Jambor  
> 
>   * io.cc (format_asterisk): Add a brace around static initialization
>   location part of the field locus.
> ---
>  gcc/fortran/io.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/fortran/io.cc b/gcc/fortran/io.cc
> index 7466d8fe094..4d28c2c90ba 100644
> --- a/gcc/fortran/io.cc
> +++ b/gcc/fortran/io.cc
> @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  gfc_st_label
>  format_asterisk = {0, NULL, NULL, -1, ST_LABEL_FORMAT, ST_LABEL_FORMAT, NULL,
> -0, {NULL, NULL}, NULL, 0};
> +0, {NULL, {NULL}}, NULL, 0};
>  
>  typedef struct
>  {
> -- 
> 2.49.0

-- 
Steve

Re: [PATCH v6 9/9] AArch64: make rules for CBZ/TBZ higher priority

2025-06-25 Thread Richard Sandiford

Karl Meakin  writes:
> Move the rules for CBZ/TBZ to be above the rules for
> CBB/CBH/CB. We want them to have higher priority
> because they can express larger displacements.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (aarch64_cbz1): Move
>   above rules for CBB/CBH/CB.
>   (*aarch64_tbz1): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/cmpbr.c: Update tests.
> ---
>  gcc/config/aarch64/aarch64.md| 159 ---
>  gcc/testsuite/gcc.target/aarch64/cmpbr.c |  32 ++---
>  2 files changed, 101 insertions(+), 90 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 23bce55f620..dd58e88fa2f 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -726,6 +726,17 @@ (define_constants
>  
>  ;; ---
>  ;; Conditional jumps

Very, very minor, but: if we're following the aarch64-sve.md convention,
there'd be another:

;; ---

here, to separate the heading from the description.

> +;; The order of the rules below is important.
> +;; Higher priority rules are preferred because they can express larger
> +;; displacements.
> +;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ.
> +;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ.
> +;; 3) When the CMPBR extension is enabled:
> +;;   a) Comparisons between two registers are handled by
> +;;  CBB/CBH/CB.
> +;;   b) Comparisons between a GP register and an immediate in the range 0-63 
> are

Maybe just "in-range immediate", given the multiple ranges in play.

OK with those changes, thanks.

However, I suppose this patch means that:

/* Fall through to `aarch64_cb`.  */

from patch 8 is not really accurate, since sometimes we might snag
a higher-priority comparison.  So maybe just.

/* The branch is supported natively.  */

Thanks,
Richard

> +;;  handled by CB (immediate).
> +;; 4) Otherwise, emit a CMP+B sequence.
>  ;; ---
>  
>  (define_expand "cbranch4"
> @@ -783,6 +794,80 @@ (define_expand "cbranchcc4"
>""
>  )
>  
> +;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
> +(define_insn "aarch64_cbz1"
> +  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
> + (const_int 0))
> +(label_ref (match_operand 1))
> +(pc)))]
> +  "!aarch64_track_speculation"
> +  {
> +if (get_attr_length (insn) == 8)
> +  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, 
> ");
> +else
> +  return "\\t%0, %l1";
> +  }
> +  [(set_attr "type" "branch")
> +   (set (attr "length")
> + (if_then_else (and (ge (minus (match_dup 1) (pc))
> +(const_int BRANCH_LEN_N_1MiB))
> +(lt (minus (match_dup 1) (pc))
> +(const_int BRANCH_LEN_P_1MiB)))
> +   (const_int 4)
> +   (const_int 8)))
> +   (set (attr "far_branch")
> + (if_then_else (and (ge (minus (match_dup 2) (pc))
> +(const_int BRANCH_LEN_N_1MiB))
> +(lt (minus (match_dup 2) (pc))
> +(const_int BRANCH_LEN_P_1MiB)))
> +   (const_string "no")
> +   (const_string "yes")))]
> +)
> +
> +;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
> +(define_insn "*aarch64_tbz1"
> +  [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" 
> "r")
> +  (const_int 0))
> +(label_ref (match_operand 1))
> +(pc)))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "!aarch64_track_speculation"
> +  {
> +if (get_attr_length (insn) == 8)
> +  {
> + if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
> +   return aarch64_gen_far_branch (operands, 1, "Ltb",
> +  "\\t%0, , ");
> + else
> +   {
> + char buf[64];
> + uint64_t val = ((uint64_t) 1)
> + << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1);
> + sprintf (buf, "tst\t%%0, %" PRId64, val);
> + output_asm_insn (buf, operands);
> + return "\t%l1";
> +   }
> +  }
> +else
> +  return "\t%0, , %l1";
> +  }
> +  [(set_attr "type" "branch")
> +   (set (attr "length")
> + (if_then_else (and (ge (minus (match_dup 1) (pc))
> +(const_int BRANCH_LEN_N_32KiB))
> +(lt (minus (match_dup 1) (pc))
> +(const_int BRANCH_LEN_P_32KiB)))
> +   (const_int 4)
> +   (const_int 8)))
> +   (set (attr "far_branch")
> + (if_then_else (and (ge (minus (match_dup 1) (pc))
> +

[PATCH v3] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-25 Thread H.J. Lu

Add preserve_none attribute which is similar to no_callee_saved_registers
attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
used for integer parameter passing.  This can be used in an interpreter
to avoid saving/restoring the registers in functions which process byte
codes.  It improved the pystones benchmark by 6-7%:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15

Remove -mgeneral-regs-only restriction on no_caller_saved_registers
attribute.  Only SSE is allowed since SSE XMM register load preserves
the upper bits in YMM/ZMM register while YMM register load zeros the
upper 256 bits of ZMM register, and preserving 32 ZMM registers can
be quite expensive.

gcc/

PR target/119628
* config/i386/i386-expand.cc (ix86_expand_call): Call
ix86_type_no_callee_saved_registers_p instead of looking up
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Look up
preserve_none attribute.  Check preserve_none attribute for
interrupt attribute.  Don't check no_caller_saved_registers nor
no_callee_saved_registers conflicts here.
(ix86_set_func_type): Check no_callee_saved_registers before
checking no_caller_saved_registers attribute.
(ix86_set_current_function): Allow SSE with
no_caller_saved_registers attribute.
(ix86_handle_call_saved_registers_attribute): Check preserve_none,
no_callee_saved_registers and no_caller_saved_registers conflicts.
(ix86_gnu_attributes): Add preserve_none attribute.
* config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p):
New.
* config/i386/i386.cc
(x86_64_preserve_none_int_parameter_registers): New.
(ix86_using_red_zone): Don't use red-zone when there are no
caller-saved registers with SSE.
(ix86_type_no_callee_saved_registers_p): New.
(ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE
and call ix86_type_no_callee_saved_registers_p instead of looking
up no_callee_saved_registers attribute.
(ix86_comp_type_attributes): Call
ix86_type_no_callee_saved_registers_p instead of looking up
no_callee_saved_registers attribute.  Return 0 if preserve_none
attribute doesn't match in 64-bit mode.
(ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE,
use x86_64_preserve_none_int_parameter_registers.
(init_cumulative_args): Set preserve_none_abi.
(function_arg_64): Use x86_64_preserve_none_int_parameter_registers
with preserve_none attribute.
(setup_incoming_varargs_64): Use
x86_64_preserve_none_int_parameter_registers with preserve_none
attribute.
(ix86_save_reg): Treat TYPE_PRESERVE_NONE like
TYPE_NO_CALLEE_SAVED_REGISTERS.
(ix86_nsaved_sseregs): Allow saving XMM registers for
no_caller_saved_registers attribute.
(ix86_compute_frame_layout): Likewise.
(x86_this_parameter): Use
x86_64_preserve_none_int_parameter_registers with preserve_none
attribute.
* config/i386/i386.h (ix86_args): Add preserve_none_abi.
(call_saved_registers_type): Add TYPE_PRESERVE_NONE.
(machine_function): Change call_saved_registers to 3 bits.
* doc/extend.texi: Add preserve_none attribute.  Update
no_caller_saved_registers attribute to remove -mgeneral-regs-only
restriction.

gcc/testsuite/

PR target/119628
* gcc.target/i386/no-callee-saved-3.c: Adjust error location.
* gcc.target/i386/no-callee-saved-19a.c: New test.
* gcc.target/i386/no-callee-saved-19b.c: Likewise.
* gcc.target/i386/no-callee-saved-19c.c: Likewise.
* gcc.target/i386/no-callee-saved-19d.c: Likewise.
* gcc.target/i386/no-callee-saved-19e.c: Likewise.
* gcc.target/i386/preserve-none-1.c: Likewise.
* gcc.target/i386/preserve-none-2.c: Likewise.
* gcc.target/i386/preserve-none-3.c: Likewise.
* gcc.target/i386/preserve-none-4.c: Likewise.
* gcc.target/i386/preserve-none-5.c: Likewise.
* gcc.target/i386/preserve-none-6.c: Likewise.
* gcc.target/i386/preserve-none-7.c: Likewise.
* gcc.target/i386/preserve-none-8.c: Likewise.
* gcc.target/i386/preserve-none-9.c: Likewise.
* gcc.target/i386/preserve-none-10.c: Likewise.
* gcc.target/i386/preserve-none-11.c: Likewise.
* gcc.target/i386/preserve-none-12.c: Likewise.
* gcc.target/i386/preserve-none-13.c: Likewise.
* gcc.target/i386/preserve-none-14.c: Likewise.
* gcc.target/i386/preserve-none-15.c: Likewise.
* gcc.target/i386/preserve-none-16.c: Likewise.
* gcc.target/i386/preserve-none-17.c: Likewise.
* gcc.target/i386/preserve-none-19.c: Likewise.
* gcc.target/i386/preserve-none-19.c: Likewise.
* gcc.target/i386/preserve-none-20.c: Likewise.
* gcc.target/i386/preserve-none-21.c: Likewise.
* gcc.target/i386/preserve-none-22.c: Likewise.
* gcc.target/i386/preserve-none-23.c: Likewise.
* gcc.target/i386/preserve-none-24.c: Likewise.
* gcc.target/i386/preserve-none-25.c: Likewise.
* gcc.target/i386/preserve-none-26.c: Likewise.
* gcc.target/i386/preserve-none-27.c: Likewise.
* gcc.target/i386/preserve-none-28.c: Likewise.
* gcc.target/i386/preserve-none-29.c: Likewise.
* gcc.target/i386/preserve-none-30a.c: Likewise.
* gcc.target/i386/preserve-none-30b.c: Likewise.

-- 
H.J.
From e8929476ee4e1499a631d569914e4f0c54881fd9 Mon

Re: [PATCH 11/17] tree-vect-stmts.cc: Remove an unused shadowed variable

2025-06-25 Thread Richard Biener




> Am 25.06.2025 um 16:26 schrieb Martin Jambor :
> 
> Hi,
> 
> when compiling tree-vect-stmts.cc with clang, it emits a warning:
> 
>  gcc/tree-vect-stmts.cc:14930:19: warning: unused variable 'mode_iter' 
> [-Wunused-variable]
> 
> And indeed, there are two mode_iter local variables in function
> supportable_indirect_convert_operation and the first one is not used
> at all.  This patch removes it.
> 
> Bootstrapped and tested on x86_64-linx.  OK for master?

Ok

Richard 

> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2025-06-24  Martin Jambor  
> 
>* tree-vect-stmts.cc (supportable_indirect_convert_operation):
>Remove an unused shadowed variable.
> ---
> gcc/tree-vect-stmts.cc | 1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index f699d808e68..652c590e553 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -14927,7 +14927,6 @@ supportable_indirect_convert_operation (code_helper 
> code,
>   bool found_mode = false;
>   scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_out));
>   scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_in));
> -  opt_scalar_mode mode_iter;
>   tree_code tc1, tc2, code1, code2;
> 
>   tree cvt_type = NULL_TREE;
> --
> 2.49.0
>

[PATCH] tree-optimization/120808 - SLP build with mixed .FMA/.FMS

2025-06-25 Thread Richard Biener

The following allows SLP build to succeed when mixing .FMA/.FMS
in different lanes like we handle mixed plus/minus.  This does not
yet address SLP pattern matching to not being able to form
a FMADDSUB from this.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

While the testcases are x86 specific I've kept them in vect/ with the
hope that we'd get better general dejagnu target_fma handling...

PR tree-optimization/120808
* tree-vectorizer.h (compatible_calls_p): Add flag to
indicate a FMA/FMS pair is allowed.
* tree-vect-slp.cc (compatible_calls_p): Likewise.
(vect_build_slp_tree_1): Allow mixed .FMA/.FMS as two-operator.
(vect_build_slp_tree_2): Handle calls in two-operator SLP build.
* tree-vect-slp-patterns.cc (compatible_complex_nodes_p):
Adjust.

* gcc.dg/vect/bb-slp-pr120808.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c | 12 +
 gcc/tree-vect-slp-patterns.cc   |  2 +-
 gcc/tree-vect-slp.cc| 52 ++---
 gcc/tree-vectorizer.h   |  2 +-
 4 files changed, 50 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c
new file mode 100644
index 000..c334d6ad8d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffp-contract=on" } */
+/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */
+
+void f(double x[restrict], double *y, double *z)
+{
+x[0] = x[0] * y[0] + z[0];
+x[1] = x[1] * y[1] - z[1];
+}
+
+/* The following should check for SLP build covering the loads.  */
+/* { dg-final { scan-tree-dump "transform load" "slp2" { target { x86_64-*-* 
i?86-*-* } } } } */
diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc
index c0dff90d9ba..24ae203e6ff 100644
--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -786,7 +786,7 @@ compatible_complex_nodes_p (slp_compat_nodes_map_t 
*compat_cache,
   if (is_gimple_call (a_stmt))
 {
if (!compatible_calls_p (dyn_cast  (a_stmt),
-dyn_cast  (b_stmt)))
+dyn_cast  (b_stmt), false))
  return false;
 }
   else if (!is_gimple_assign (a_stmt))
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9f0cb978a5a..155da099d95 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -990,13 +990,18 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
to be combined into the same SLP group.  */
 
 bool
-compatible_calls_p (gcall *call1, gcall *call2)
+compatible_calls_p (gcall *call1, gcall *call2, bool allow_two_operators)
 {
   unsigned int nargs = gimple_call_num_args (call1);
   if (nargs != gimple_call_num_args (call2))
 return false;
 
-  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
+  auto cfn1 = gimple_call_combined_fn (call1);
+  auto cfn2 = gimple_call_combined_fn (call2);
+  if (cfn1 != cfn2
+  && (!allow_two_operators
+ || !((cfn1 == CFN_FMA || cfn1 == CFN_FMS)
+  && (cfn2 == CFN_FMA || cfn2 == CFN_FMS
 return false;
 
   if (gimple_call_internal_p (call1))
@@ -1358,10 +1363,14 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   || rhs_code != IMAGPART_EXPR)
   /* Handle mismatches in plus/minus by computing both
  and merging the results.  */
-  && !((first_stmt_code == PLUS_EXPR
-|| first_stmt_code == MINUS_EXPR)
-   && (alt_stmt_code == PLUS_EXPR
-   || alt_stmt_code == MINUS_EXPR)
+  && !first_stmt_code == PLUS_EXPR
+  || first_stmt_code == MINUS_EXPR)
+ && (alt_stmt_code == PLUS_EXPR
+ || alt_stmt_code == MINUS_EXPR))
+|| ((first_stmt_code == CFN_FMA
+ || first_stmt_code == CFN_FMS)
+&& (alt_stmt_code == CFN_FMA
+|| alt_stmt_code == CFN_FMS)))
&& rhs_code == alt_stmt_code)
   && !(first_stmt_code.is_tree_code ()
&& rhs_code.is_tree_code ()
@@ -1410,7 +1419,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
{
  if (!is_a  (stmts[0]->stmt)
  || !compatible_calls_p (as_a  (stmts[0]->stmt),
- call_stmt))
+ call_stmt, true))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -3059,24 +3068,35 @@ fail:
   SLP_TREE_CODE (node) = VEC_PERM_EXPR;

[Patch, Fortran, Coarray, PR88076, v1] 4/6 Add a shared memory multi process coarray library.

2025-06-25 Thread Andre Vehreschild

Hi all,

fix incorrect declarations in the libcaf.h header and use the correct printf
function when printing a va_list. (The latter is stripped into a separate file
by the next patch of this series.)

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From b4bdfd44ee3d1658eb67ef1a4cdf0de91b50386a Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 18 Jun 2025 09:23:32 +0200
Subject: [PATCH 4/6] Fortran: Fix signatures of coarray API and caf_single.

The teams argument to some functions was marked as unused in the header.
With upcoming caf_shmem this is incorrect, given the mark is repeated in
caf_single.

libgfortran/ChangeLog:

	* caf/libcaf.h (_gfortran_caf_failed_images): Team attribute is
	used now in some libs.
	(_gfortran_caf_image_status): Same.
	(_gfortran_caf_stopped_images): Same.
	* caf/single.c (caf_internal_error): Use correct printf function
	to handle va_list.
---
 libgfortran/caf/libcaf.h | 9 +++--
 libgfortran/caf/single.c | 2 +-
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/libgfortran/caf/libcaf.h b/libgfortran/caf/libcaf.h
index 7267bc76905..81549f9b980 100644
--- a/libgfortran/caf/libcaf.h
+++ b/libgfortran/caf/libcaf.h
@@ -175,12 +175,9 @@ void _gfortran_caf_event_post (caf_token_t, size_t, int, int *, char *, size_t);
 void _gfortran_caf_event_wait (caf_token_t, size_t, int, int *, char *, size_t);
 void _gfortran_caf_event_query (caf_token_t, size_t, int, int *, int *);
 
-void _gfortran_caf_failed_images (gfc_descriptor_t *,
-  caf_team_t * __attribute__ ((unused)), int *);
-int _gfortran_caf_image_status (int, caf_team_t * __attribute__ ((unused)));
-void _gfortran_caf_stopped_images (gfc_descriptor_t *,
-   caf_team_t * __attribute__ ((unused)),
-   int *);
+void _gfortran_caf_failed_images (gfc_descriptor_t *, caf_team_t *, int *);
+int _gfortran_caf_image_status (int, caf_team_t *);
+void _gfortran_caf_stopped_images (gfc_descriptor_t *, caf_team_t *, int *);
 
 void _gfortran_caf_random_init (bool, bool);
 
diff --git a/libgfortran/caf/single.c b/libgfortran/caf/single.c
index 97876fa9d8c..a6576f28260 100644
--- a/libgfortran/caf/single.c
+++ b/libgfortran/caf/single.c
@@ -129,7 +129,7 @@ caf_internal_error (const char *msg, int *stat, char *errmsg,
   *stat = 1;
   if (errmsg_len > 0)
 	{
-	  int len = snprintf (errmsg, errmsg_len, msg, args);
+	  int len = vsnprintf (errmsg, errmsg_len, msg, args);
 	  if (len >= 0 && errmsg_len > (size_t) len)
 	memset (&errmsg[len], ' ', errmsg_len - len);
 	}
-- 
2.49.0

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-06-25 Thread Bill Wendling

I posted this on the LLVM Discourse forum[1] and got some traction, so
I want to get the GCC community's input. (My initial proposal is
replicated here.)

I had already mentioned this in previous emails in this thread, so
it's nothing super new, and there have been some suggested
improvements already. Parts of this reference a meeting that took
place between the LLVM developers and some non-LLVM developers. The
meeting mostly explained the issues regarding the "compromise" from
this thread and how it interacts (poorly) with C++, and vice versa.

There was a lengthy discussion after this proposal.

Please take a look and let me know what you think.

-bw

[1] 
https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885/32?u=void

--

I’ve been putting off pushing this proposal, because it is a departure
from what Apple has done and added a lot of extra syntax for this
feature, but I think it’s appropriate right now.

The main issue at play is that C and C++ are two very different
languages. The scoping rules are completely different making name
resolution not work in one language without jumping through
non-obvious hoops. This was made clear in @rapidsna’s presentation
last week. Making matters worse is that GCC (and other) compilers
perform one pass parsing for C, making forward declarations necessary.
The forward declarations, while solving many issues, have their own
issues. Other solutions at play require changes to the base languages,
which require approval by the standards committee.

Even if the full struct was declared before the expression in the
attribute was defined, there would still be issues, due to one example
from @rapidsna’s presentation [as pointed out by Joseph Jelinek]:

typedef int T;
struct foo {
  int T;
  int U;
  int * __counted_by_expr(int T; (T)+U) buf; // Addition or cast?
};

Given this, I want to propose using functions / static methods for expressions.

The function takes one and only one argument: a "this" pointer to the
least enclosing non-anonymous struct.

The call to the function is generated by the compiler, so no argument
the attribute only needs to indicate the function’s name. This avoids
the need to add a new __builtin_* or __self element to C.

* The function needs to be declared before use in C. (It can be fully
defined if no fields within the struct are used.)
* The function should be static and marked as pure (and maybe always_inline).
* The function in C++ should be private or protected.

C example:

static size_t calc_counted_by(void *);
struct foo {
  /* ... */
  struct bar {
int * __counted_by_expr(calc_counted_by) buf;
int count;
int scale;
  };
};

enum { OFFSET = 42 };

// The function could be marked with the 'pure' attribute.
static size_t __pure calc_counted_by(void *p) {
  struct bar *ptr = (struct foo *)p;
  return ptr->count * ptr->scale - OFFSET;
}

C++ example:

struct foo {
  enum { OFFSET = 42 };
  struct bar {
int * __counted_by_expr(calc_counted_by) buf;
  private:
static size_t __pure calc_counted_by(struct bar *ptr) {
  return ptr->count * ptr->scale - OFFSET;
}
  public:
int count;
int scale;
  };
};

Pros

1. This uses the current language without any modifications to scoping
or requiring feature additions that need to be approved by the
standards committee. All compilers should be able to implement them
without major modifications.
2. Name lookup is no longer a problem, so there isn’t a need for
forward declarations or trying to determine which scope to use in
various circumstances.
3. In the general case where the full struct is pass into the
calculating function, both C and C++ parse the code in the same way.
In the C example above, it would need to be modified to this:

static size_t __pure calc_counted_by(void *p) {
#ifdef __cplusplus
  foo::bar *ptr = static_cast(p);
#else
  struct bar *ptr = (struct bar *)p;
#endif
  return ptr->count * ptr->scale - OFFSET;
}

This format can be extended to other languages if need be.

Cons

1 It’s wordy, which may make it unappealing to users.
2 The #ifdef __cplusplus ... #endif usage above is wordy and a bit awkward.
3 Importantly, it’s harder for Apple’s bounds safety work to analyze
the fields used within the expression.
4. Apple and their users already use the current syntax.

For (1), that’s an unfortunate outcome of this feature. There may be
ways to reduce the amount of code that needs to be written, but the
above is a good starting place.

[Note: Kees came up with a way to avoid the forward declaration of the
function---have the compiler generate the forward declaration with a
set declaration syntax: e.g. static __pure size_t
size_calculation(struct foo *);]

For (2), the rule about using the least enclosing non-anonymous struct
could be loosened and the whole struct passed in. The user has full
control over which fields to use.

For (3), it’s harder to get the expression because it

Re: [PATCH 1/4] c++: Add flag to detect underlying representative of bitfield decls

2025-06-25 Thread Jason Merrill


On 5/21/25 10:14 PM, Nathaniel Shead wrote:

This patch isn't currently necessary with how I've currently done the
follow-up patches, but is needed for avoiding any potential issues in
the future with DECL_CONTEXT'ful types getting created in the compiler
with no names on the fields.  (For instance, this change would make much
of r15-7342-gd3627c78be116e unnecessary.)

It does take up another flag though in the frontend though.  Another
possible approach would be to instead do a walk through all the fields
first to see if this is the target of a DECL_BIT_FIELD_REPRESENTATIVE;
thoughts?  Or would you prefer to skip this patch entirely?


It seems like the only way to reach such a FIELD_DECL is through 
DECL_BIT_FIELD_REPRESENTATIVE, so we ought to be able to use that 
without adding another walk?



-- >8 --

Modules streaming needs to handle these differently from other unnamed
FIELD_DECLs that are streamed for internal RECORD_DECLs, and there
doesn't seem to be a good way to detect this case otherwise.

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind): Use new flag.

gcc/ChangeLog:

* stor-layout.cc (start_bitfield_representative): Mark with
DECL_BIT_FIELD_UNDERLYING_REPR_P.
* tree-core.h (struct tree_decl_common): Add comment.
* tree.h (DECL_BIT_FIELD_UNDERLYING_REPR_P): New accessor.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   | 4 +---
  gcc/stor-layout.cc | 1 +
  gcc/tree-core.h| 1 +
  gcc/tree.h | 5 +
  4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 13f8770b7bd..99cbfdbf01d 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11131,9 +11131,7 @@ trees_out::get_merge_kind (tree decl, depset *dep)
  return MK_named;
}
  
-	  if (!DECL_NAME (decl)

- && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
- && !DECL_BIT_FIELD_REPRESENTATIVE (decl))
+ if (!DECL_NAME (decl) && DECL_BIT_FIELD_UNDERLYING_REPR_P (decl))
{
  /* The underlying storage unit for a bitfield.  We do not
 need to dedup it, because it's only reachable through
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 12071c96ca7..1f37a130e24 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -2067,6 +2067,7 @@ static tree
  start_bitfield_representative (tree field)
  {
tree repr = make_node (FIELD_DECL);
+  DECL_BIT_FIELD_UNDERLYING_REPR_P (repr) = 1;
DECL_FIELD_OFFSET (repr) = DECL_FIELD_OFFSET (field);
/* Force the representative to begin at a BITS_PER_UNIT aligned
   boundary - C++ may use tail-padding of a base object to
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index bd19c99d326..2e773d7bf83 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1911,6 +1911,7 @@ struct GTY(()) tree_decl_common {
unsigned decl_read_flag : 1;
/* In a VAR_DECL or RESULT_DECL, this is DECL_NONSHAREABLE.  */
/* In a PARM_DECL, this is DECL_HIDDEN_STRING_LENGTH.  */
+  /* In a FIELD_DECL, this is DECL_BIT_FIELD_UNDERLYING_REPR_P.  */
unsigned decl_nonshareable_flag : 1;
  
/* DECL_OFFSET_ALIGN, used only for FIELD_DECLs.  */

diff --git a/gcc/tree.h b/gcc/tree.h
index 99f26177628..0d876234824 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3085,6 +3085,11 @@ extern void decl_value_expr_insert (tree, tree);
  #define DECL_BIT_FIELD_REPRESENTATIVE(NODE) \
(FIELD_DECL_CHECK (NODE)->field_decl.qualifier)
  
+/* In a FIELD_DECL of a RECORD_TYPE, this indicates whether the field

+   is used as the underlying storage unit for a bitfield.  */
+#define DECL_BIT_FIELD_UNDERLYING_REPR_P(NODE) \
+  (FIELD_DECL_CHECK (NODE)->decl_common.decl_nonshareable_flag)
+
  /* For a FIELD_DECL in a QUAL_UNION_TYPE, records the expression, which
 if nonzero, indicates that the field occupies the type.  */
  #define DECL_QUALIFIER(NODE) (FIELD_DECL_CHECK (NODE)->field_decl.qualifier)

Re: [PATCH] c++: fix ICE with [[deprecated]] [PR120756]

2025-06-25 Thread Jason Merrill


On 6/25/25 1:28 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/branches?


OK.


-- >8 --
Here we end up with "error reporting routines re-entered" because
resolve_nondeduced_context isn't passing complain to mark_used.

PR c++/120756

gcc/cp/ChangeLog:

* pt.cc (resolve_nondeduced_context): Pass complain to mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/warn/deprecated-22.C: New test.
---
  gcc/cp/pt.cc  |  2 +-
  gcc/testsuite/g++.dg/warn/deprecated-22.C | 13 +
  2 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/deprecated-22.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index deb0106b158..18ad2d07c4f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24604,7 +24604,7 @@ resolve_nondeduced_context (tree orig_expr, 
tsubst_flags_t complain)
}
if (good == 1)
{
- mark_used (goodfn);
+ mark_used (goodfn, complain);
  expr = goodfn;
  if (baselink)
expr = build_baselink (BASELINK_BINFO (baselink),
diff --git a/gcc/testsuite/g++.dg/warn/deprecated-22.C 
b/gcc/testsuite/g++.dg/warn/deprecated-22.C
new file mode 100644
index 000..60ee607f717
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/deprecated-22.C
@@ -0,0 +1,13 @@
+// PR c++/120756
+// { dg-do compile { target c++11 } }
+
+struct A {
+template  [[deprecated]] void foo ();
+};
+
+template  [[deprecated]] auto bar () -> decltype (&A::foo);
+
+void foo ()
+{
+  bar<0> ();  // { dg-warning "deprecated" }
+}

base-commit: 5aca8510abea6c3fac3336a7445863db07fd4a06

Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted

2025-06-25 Thread Richard Sandiford

Christoph Müllner  writes:
> insn_info::has_been_deleted () is documented to return true if an
> instruction is deleted.  Such instructions have their `volatile` bit set,
> which can be tested via rtx_insn::deleted ().
>
> The current condition for insn_info::has_been_deleted () is:
> * m_rtl is not NULL: this can't happen as no member of insn_info
>   changes this pointer.

Yeah, it's invariant after creation, but it starts off null for some
artificial instructions:

  // Return the underlying RTL insn.  This instruction is null if is_phi ()
  // or is_bb_end () are true.  The instruction is a basic block note if
  // is_bb_head () is true.
  rtx_insn *rtl () const { return m_rtl; }

So I think we should keep the null check.  (But then is_call and is_jump
should check m_rtl is nonnull too -- that's preapproved if you want to
do it while you're here.)

> * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and
>   does not test the `volatile` bit.

Because of the need to stage multiple simultaneous changes, rtl-ssa first
uses set_insn_deleted to convert an insn to a NOTE_INSN_DELETED note,
then uses remove_insn to remove the underlying instruction.  It doesn't
use delete_insn directly.  The call to remove_insn is fairly recent;
the original code just used set_insn_deleted, but not removing the notes
caused trouble for later passes.

The test was therefore supposed to be checking whether set_insn_deleted
had been called.  It should also have checked the note kind though.

However, I agree that testing the deleted flag would be better.
For that to work, we'd need to set the deleted flag here:

  if (rtx_insn *rtl = insn->rtl ())
::remove_insn (rtl); // Remove the underlying RTL insn.

as well as calling remove_insn.  Alternatively (and better), we could
try converting ::remove_insn to ::delete_insn.

Thanks,
Richard

> This patch drops these conditions and calls m_rtl->deleted () instead.
>
> The impact of this change is minimal as insn_info::has_been_deleted
> is only called in insn_info::print_full.
>
> Bootstrapped and regtested x86_64-linux.
>
> gcc/ChangeLog:
>
>   * rtl-ssa/insns.h: Fix implementation of has_been_deleted ().
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/rtl-ssa/insns.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/rtl-ssa/insns.h b/gcc/rtl-ssa/insns.h
> index d89dfc5c3f66..bb3f52efa83a 100644
> --- a/gcc/rtl-ssa/insns.h
> +++ b/gcc/rtl-ssa/insns.h
> @@ -186,7 +186,7 @@ public:
>// Return true if the instruction was a real instruction but has now
>// been deleted.  In this case the instruction is no longer part of
>// the SSA information.
> -  bool has_been_deleted () const { return m_rtl && !INSN_P (m_rtl); }
> +  bool has_been_deleted () const { return m_rtl->deleted (); }
>  
>// Return true if the instruction is a debug instruction (and thus
>// also a real instruction).

[PATCH v7 0/3] extend "counted_by" attribute to pointer fields of structures

2025-06-25 Thread Qing Zhao

This is the 7th version of the patch set to extend "counted_by" attribute
 to pointer fields of structures.

The C FE parts (patch #1 and #3) of the 5th version have been approved
by Joseph already (with a minor typo fix, which is included in this new 
version);

The middle end part (patch #2) of the 6th version was reviewed by Sid and 
Richard,
Sid raised several format issues in testing cases, and Richard raised one issue
in tree-object-size.cc. 

In this 7th version, I fixed all the format issues in testing cases and also 
the one
issue in tree-object-size.cc raised by Richard. 

The whole patch set has been bootstrapped and regression tested on both aarch64 
and x86.

Okay for trunk?
 
Thanks a lot.

Qing

Re: [PATCH v2 1/4] RISC-V: Add support for xtheadvector unit-stride segment load/store intrinsics

2025-06-25 Thread yunzezhu


> From:Kito Cheng 
> Send Time:2025 Jun. 19 (Thu.) 15:08
> To:yunzezhu; Jeff Law
> CC:"gcc-patches"
> Subject:Re: [PATCH v2 1/4] RISC-V: Add support for xtheadvector unit-stride 
> segment load/store intrinsics
> 
> Hi YunZe:
> 
> Generally I am open minded to accept vendor extensions, however this
> patch set really introduces too much pattern...
> 
> - NUM_INSN_CODES (defined in insn-codes.h) become 83625 from 48573.  (+72%)
> - Total line of insn-emit-*.cc becomes 1749362 from 1055750. (+65%)
> - Total line of insn-recog-*.cc becomes 1018407 from 670185 (+51%)
> 
> Also I believe that may also increase a lot of build time on native
> RISC-V environment, (I didn't measure that yet, but most generated
> insn-*.cc files grow a lot).
> 
> So sorry, I have to say no this time.

Hi Kito:

Thanks for reviewing and apologies for disturbing your work.
I'm so sorry I made some mistakes that generates
large amount of unnecessary patterns.
I removed these unnecessary patterns, and modified
insn patterns to make them requiring less patterns in v3 patches.
This shall reduce patterns generated in insn-codes.h and insn-*.cc files.
I tested v3 patches locally and here is data comparing to
origin gcc:

- NUM_INSN_CODES become 51547 from 48573.  (+5.83%)
- Total line of insn-emit-*.cc becomes 1113703 from 1055750. (+4.98%)
- Total line of insn-recog-*.cc becomes 700017 from 670185 (+3.82%)

I hope these patches satisfies requirments now. Thanks!

Best regards,
Yunze Zhu

Re: [PATCH v3] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-25 Thread Jan Hubicka

> Here is the v3 patch.  It no longer uses "rep mov/stos".   Lili, can you 
> measure
> its performance impact on Intel and AMD cpus?
> 
> The updated generic has
> 
> Update memcpy and memset inline strategies for -mtune=generic:
> 
> 1. Don't align memory.
This looks OK to me (recent microarchs seems to be good on handling
misaligned accesses in most cases, though we always risk partial memory
stalls then).
> 2. For known sizes, unroll loop with 4 moves or stores per iteration
>without aligning the loop, up to 256 bytes.

Main reason why limit was bigger is situation where we know the expected
size of the block copied from profile feedback or we have small upper
bound.  Calling mempcy means spilling all SSE registers to stack and
increasing integer regsiter pressure, too, which may be counter
productive and I do not think it is caught by the benchmarking done

I hacked the following micro-benchmark
#include 
#include 
#include 
int width = 1024, height = 1024;
char *buffer1;
char *buffer2;

__attribute__ ((noipa))
void
copy_triangle (int width, int from, int to, int start, float slope1, float 
slope2)
{
for (int i = from; i < to; i++)
{
  memcpy (buffer1 + start + (int)((i-from) * slope1) + i * width,
  buffer2 + start + (int)((i-from) * slope1) + i * width,
  (int)((i-from)*(slope2-slope1)));
}
}
int
main()
{
buffer1 = malloc (width *height);
buffer2 = malloc (width *height);
for (int i = 0; i < 10; i++)
copy_triangle (width, 0, 255, 0, 0, 1);
}

which copies triangle of given size from buffer1 to buffer2.  With
profile feedback we know the expected size of block and use the table to
inline memcpy.  It has two read-only values in xmm registers which needs
to be reloaded from memory if libgcall is used. For two values it seems
that for triangles of size 255 it is already win to use memcpy, for
smaller ones it is better to use inline sequence (can be tested by
copiling -O2 wrt -O2 -minline-all-stringops).

Of course one can modify the benchmark to use more XMM registers and the
tradeoffs will change, but it is hard to guess regiser pessure at the
expansion time  Situation is also likely different for kernel due to
mitigations making memcpy call expensive.

I wonder what kind of benefits you see for going from 8k to 256 bytes
here?  I also wonder if inline sequences can be iproved though.  It
seems that the offline memcpy for blocks >128 already benefits from
doing vector moves...
> 3. For unknown sizes, use memcpy/memset.
> 4. Since each loop iteration has 4 stores and 8 stores for zeroing with
>unroll loop may be needed, change CLEAR_RATIO to 10 so that zeroing
>up to 72 bytes are fully unrolled with 9 stores without SSE.

I guess it is OK.  I sitll think we ought to sovle the code bloat due to
repreated 4-byte $0 immediate, but hope we can do that incrementally.
> 
> Use move_by_pieces and store_by_pieces for memcpy and memset epilogues
> with the fixed epilogue size to enable overlapping moves and stores.
> 
> gcc/
> 
> PR target/102294
> PR target/119596
> PR target/119703
> PR target/119704
> * builtins.cc (builtin_memset_gen_str): Make it global.
> * builtins.h (builtin_memset_gen_str): New.
> * config/i386/i386-expand.cc (expand_cpymem_epilogue): Use
> move_by_pieces.
> (expand_setmem_epilogue): Use store_by_pieces.
> (ix86_expand_set_or_cpymem): Pass val_exp, instead of
> vec_promoted_val, to expand_setmem_epilogue.
> * config/i386/x86-tune-costs.h (generic_memcpy): Updated.
> (generic_memset): Likewise.
> (generic_cost): Change CLEAR_RATIO to 10.
> 
> gcc/testsuite/
> 
> PR target/102294
> PR target/119596
> PR target/119703
> PR target/119704
> * gcc.target/i386/auto-init-padding-3.c: Expect XMM stores.
> * gcc.target/i386/auto-init-padding-9.c: Expect loop.
> * gcc.target/i386/memcpy-strategy-12.c: New test.
> * gcc.target/i386/memcpy-strategy-13.c: Likewise.
> * gcc.target/i386/memset-strategy-25.c: Likewise.
> * gcc.target/i386/memset-strategy-26.c: Likewise.
> * gcc.target/i386/memset-strategy-27.c: Likewise.
> * gcc.target/i386/memset-strategy-28.c: Likewise.
> * gcc.target/i386/memset-strategy-29.c: Likewise.
> * gcc.target/i386/memset-strategy-30.c: Likewise.
> * gcc.target/i386/memset-strategy-31.c: Likewise.
> * gcc.target/i386/mvc17.c: Fail with "rep mov"
> * gcc.target/i386/pr111657-1.c: Scan for unrolled loop.  Fail
> with "rep mov".
> * gcc.target/i386/shrink_wrap_1.c: Also pass
> -mmemset-strategy=rep_8byte:-1:align.
> * gcc.target/i386/sw-1.c: Also pass -mstringop-strategy=rep_byte.
> 
> 
> -- 
> H.J.

> From bcd7245314d3ba4eb55e9ea2bc0b7d165834f5b6 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Thu, 18 Mar 2021 18:43:10 -0700
> Subject: [PATCH v3] x86: Update memcpy/memset inline strategies for
>  -mtune=generic
> 
> Update memcpy and memset inline strategies for -mtune=generic:
> 
> 1. Don't align memory.
> 2. For known sizes, unroll loop with 4 moves

[Patch, Fortran, Coarray, PR88076, v1] 2/6 Add a shared memory multi process coarray library.

2025-06-25 Thread Andre Vehreschild

Hi all,

this patch fixes handling of optional arguments to coarray routines. Again I
stumbled over this while implementing caf_shmem. I did not find a ticket either.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From 0b2f1d072d2131e341628648df20ebedefb5c5d1 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 18 Jun 2025 09:21:16 +0200
Subject: [PATCH 2/6] Fortran: Small fixes of coarray routines handling and
 code gen.

gcc/fortran/ChangeLog:

	* check.cc (gfc_check_image_status): Fix argument index of team=
	argument for correct error message.
	* trans-intrinsic.cc (conv_intrinsic_image_status): Team=
	argument is optional and is a pointer to the team handle.
	* trans-stmt.cc (gfc_trans_sync): Make images argument also a
	dereferencable pointer.  But treat errmsg as a pointer to a
	char array like in all other functions.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray_sync_memory.f90: Adapt grep pattern for
	msg being only &msg.
---
 gcc/fortran/check.cc  | 2 +-
 gcc/fortran/trans-intrinsic.cc| 6 +-
 gcc/fortran/trans-stmt.cc | 7 +--
 gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 | 4 ++--
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index a4040cae53a..3446c88b501 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -1835,7 +1835,7 @@ gfc_check_image_status (gfc_expr *image, gfc_expr *team)
   || !positive_check (0, image))
 return false;
 
-  return !team || (scalar_check (team, 0) && team_type_check (team, 0));
+  return !team || (scalar_check (team, 1) && team_type_check (team, 1));
 }
 
 
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index fce5ee28de8..03007f1d244 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -2073,9 +2073,13 @@ conv_intrinsic_image_status (gfc_se *se, gfc_expr *expr)
 	GFC_STAT_STOPPED_IMAGE));
 }
   else if (flag_coarray == GFC_FCOARRAY_LIB)
+/* The team is optional and therefore needs to be a pointer to the opaque
+   pointer.  */
 tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_image_status, 2,
 			   args[0],
-			   num_args < 2 ? null_pointer_node : args[1]);
+			   num_args < 2
+ ? null_pointer_node
+ : gfc_build_addr_expr (NULL_TREE, args[1]));
   else
 gcc_unreachable ();
 
diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
index 487b7687ef1..be6f69c0d1f 100644
--- a/gcc/fortran/trans-stmt.cc
+++ b/gcc/fortran/trans-stmt.cc
@@ -1292,7 +1292,8 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 {
   gfc_init_se (&argse, NULL);
   gfc_conv_expr_val (&argse, code->expr1);
-  images = argse.expr;
+  images = gfc_trans_force_lval (&argse.pre, argse.expr);
+  gfc_add_block_to_block (&se.pre, &argse.pre);
 }
 
   if (code->expr2)
@@ -1302,6 +1303,7 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
   gfc_init_se (&argse, NULL);
   gfc_conv_expr_val (&argse, code->expr2);
   stat = argse.expr;
+  gfc_add_block_to_block (&se.pre, &argse.pre);
 }
   else
 stat = null_pointer_node;
@@ -1314,8 +1316,9 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
   argse.want_pointer = 1;
   gfc_conv_expr (&argse, code->expr3);
   gfc_conv_string_parameter (&argse);
-  errmsg = gfc_build_addr_expr (NULL, argse.expr);
+  errmsg = argse.expr;
   errmsglen = fold_convert (size_type_node, argse.string_length);
+  gfc_add_block_to_block (&se.pre, &argse.pre);
 }
   else if (flag_coarray == GFC_FCOARRAY_LIB)
 {
diff --git a/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 b/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90
index c4e660b8cf7..0030d91257d 100644
--- a/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90
@@ -14,5 +14,5 @@ end
 
 ! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(0B, 0B, 0\\);" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(&stat, 0B, 0\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(0B, &&msg, 42\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(&stat, &&msg, 42\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(0B, &msg, 42\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(&stat, &msg, 42\\);" 1 "original" } }
-- 
2.49.0

Re: Do not drop discriminator when inlining

2025-06-25 Thread Jan Hubicka

> > What seems to be common now is profile breakage around loops that has
> > been fully unrolled or vectorized which is bit undderstandbale thought I
> > wonder if we can improve here.  I think we can fix problem where profile
> > of loop header stmts is partly or fully lost (which seems to be main
> > issue now that prevents loop optimization since then loop headers looks
> > cold).  I suppose this can be fixed by making sure the debug statement
> > is duplicated into the loop variants.
> 
> There's Alex's series as well waiting on review which fixes profile
> information with early-exit (PR117790):
> https://inbox.sourceware.org/gcc-patches/adctfxjzqewre...@arm.com/

I know of it and I was replying to the question about the inconsistent
profile handling this week too. I do apologize for taking so long - I
tought this was already approved, but it got stuck on that special case.
Alex, is there something else I should look into?  I over-planned last
semester but should be more in regular scheduel again.
Profile updating patches are really welcome.

It is a bit of an independent issue.  Alex's profile updating solves "forward"
problem: you know profile before vectorization and you need to turn it
into a profile after vectorization.

Auto-profile is working in a reverse direction.  We have sampled
executoun counts counts of individual (real or debug) statements after
optimization done to the train run. Now we need to produce CFG profile
for the feedback build for CFG is not optimized yet.
This is kind of fun problem by itself and can be useful to detect
situaitons where we forget to update debug statements correctly.

Honza
> 
> sam

[PATCH] ivopts: Change constant_multiple_of to expand aff nodes.

2025-06-25 Thread Alfie Richards

Hi all,

This is a small change to ivopts to expand SSA variables enabling ivopts to
correctly work out when an address IV step is set to be a multiple on index
step in the loop header (ie, not constant, not calculated each loop.)

Seems like this might have compile speed costs that need to be considered, but
I believe should be worth it.

This is also required for some upcoming work for vectorization of VLA loops with
iteration data dependencies.

Bootstrapped and reg tested on aarch64-linux-gnu and x86_64-unknown-linux-gnu.

Thanks,
Alfie

-- >8 --

This changes the calls to tree_to_aff_combination in constant_multiple_of to
tree_to_aff_combination_expand along with associated plumbing of ivopts_data
and required cache.

This improves cases such as:

```c
void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) {
for (unsigned long i = 0; i < end; i += step) {
svst1(pg, p1, svld1_s32(pg, p2));
p1 += step;
p2 += step;
}
}
```

Where ivopts previously didn't expand the SSA variables for the step increements
and so lacked the ability to group all the IV's and ended up with:

```
f:
cbz x3, .L1
mov x4, 0
.L3:
ld1wz31.s, p0/z, [x1]
add x4, x4, x2
st1wz31.s, p0, [x0]
add x1, x1, x2, lsl 2
add x0, x0, x2, lsl 2
cmp x3, x4
bhi .L3
.L1:
ret
```

After this change we end up with:

```
f:
cbz x3, .L1
mov x4, 0
.L3:
ld1wz31.s, p0/z, [x1, x4, lsl 2]
st1wz31.s, p0, [x0, x4, lsl 2]
add x4, x4, x2
cmp x3, x4
bhi .L3
.L1:
ret
```

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (constant_multiple_of): Change
tree_to_aff_combination to tree_to_aff_combination_expand and add
parameter to take ivopts_data.
(get_computation_aff_1): Change parameters and calls to include
ivopts_data.
(get_computation_aff): Ditto.
(get_computation_at) Ditto.:
(get_debug_computation_at) Ditto.:
(get_computation_cost) Ditto.:
(rewrite_use_nonlinear_expr) Ditto.:
(rewrite_use_address) Ditto.:
(rewrite_use_compare) Ditto.:
(remove_unused_ivs) Ditto.:

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/adr_7.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/sve/adr_7.c | 19 ++
 gcc/tree-ssa-loop-ivopts.cc  | 65 +++-
 2 files changed, 54 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/adr_7.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c 
b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c
new file mode 100644
index 000..61e23bbf182
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ftree-vectorize" } */
+
+#include 
+
+void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) {
+for (unsigned long i = 0; i < end; i += step) {
+svst1(pg, p1, svld1_s32(pg, p2));
+p1 += step;
+p2 += step;
+}
+}
+
+/* { dg-final { scan-assembler-not {\tld1w\tz[0-9]+\.d, 
p[0-9]+/z\[x[0-9]+\.d\]} } } */
+/* { dg-final { scan-assembler-not {\tst1w\tz[0-9]+\.d, 
p[0-9]+/z\[x[0-9]+\.d\]} } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-9]+/z, 
\[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-9]+, \[x[0-9]+, 
x[0-9]+, lsl 2\]} 1 } } */
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 8a6726f1988..544a946ff89 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -2117,11 +2117,15 @@ idx_record_use (tree base, tree *idx,
signedness of TOP and BOT.  */
 
 static bool
-constant_multiple_of (tree top, tree bot, widest_int *mul)
+constant_multiple_of (tree top, tree bot, widest_int *mul,
+ struct ivopts_data *data)
 {
   aff_tree aff_top, aff_bot;
-  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
-  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
+  tree_to_aff_combination_expand (top, TREE_TYPE (top), &aff_top,
+ &data->name_expansion_cache);
+  tree_to_aff_combination_expand (bot, TREE_TYPE (bot), &aff_bot,
+ &data->name_expansion_cache);
+
   poly_widest_int poly_mul;
   if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
   && poly_mul.is_constant (mul))
@@ -3945,13 +3949,14 @@ determine_common_wider_type (tree *a, tree *b)
 }
 
 /* Determines the expression by that USE is expressed from induction variable
-   CAND at statement AT in LOOP.  The expression is stored in two parts in a
-   decomposed form.  The invariant part is stored in AFF_INV; while variant
-

[PATCH 04/17] ranger: Mark several member functions as final override

2025-06-25 Thread Martin Jambor

Hi,

When GCC is built with clang, it emits warnings that several member
functions of various ranger classes override a virtual function of an
ancestor but are not marked with the override keyword.  After
inspecting the cases, I found that all these classes had other member
functions marked as final override, so I added the final keyword
everywhere too.

In some cases other such overrides were not explicitly marked as
virtual, which made formatting easier.  For that reason and also for
consistency, in such cases I removed the virtual keyword from the
functions I marked as final override too.

Bootstrapped and tested on x86_64-linx.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warnings instead.

Thanks,

Martin


gcc/ChangeLog:

2025-06-24  Martin Jambor  

* range-op-mixed.h (class operator_plus): Mark member function
overflow_free_p as final override.
(class operator_minus): Likewise.
(class operator_mult): Likewise.
* range-op-ptr.cc (class pointer_plus_operator): Mark member
function lhs_op1_relation as final override.
* range-op.cc (class operator_div::): Mark member functions
op2_range and update_bitmask as final override.
(class operator_logical_and): Mark member functions fold_range,
op1_range and op2_range as final override.  Remove unnecessary
virtual.
(class operator_logical_or): Likewise.
(class operator_logical_not): Mark member functions fold_range and
op1_range as final override.  Remove unnecessary virtual.
formatting easier.
(class operator_absu): Mark member functions wi_fold as final
override.
---
 gcc/range-op-mixed.h | 12 
 gcc/range-op-ptr.cc  |  2 +-
 gcc/range-op.cc  | 72 +++-
 3 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index f8f18306904..567b0cdd31b 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -558,8 +558,8 @@ public:
   void update_bitmask (irange &r, const irange &lh,
   const irange &rh) const final override;
 
-  virtual bool overflow_free_p (const irange &lh, const irange &rh,
-   relation_trio = TRIO_VARYING) const;
+  bool overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio = TRIO_VARYING) const final override;
   // Check compatibility of all operands.
   bool operand_check_p (tree t1, tree t2, tree t3) const final override
 { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); }
@@ -634,8 +634,8 @@ public:
   void update_bitmask (irange &r, const irange &lh,
   const irange &rh) const final override;
 
-  virtual bool overflow_free_p (const irange &lh, const irange &rh,
-   relation_trio = TRIO_VARYING) const;
+  bool overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio = TRIO_VARYING) const final override;
   // Check compatibility of all operands.
   bool operand_check_p (tree t1, tree t2, tree t3) const final override
 { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); }
@@ -720,8 +720,8 @@ public:
const REAL_VALUE_TYPE &lh_lb, const REAL_VALUE_TYPE &lh_ub,
const REAL_VALUE_TYPE &rh_lb, const REAL_VALUE_TYPE &rh_ub,
relation_kind kind) const final override;
-  virtual bool overflow_free_p (const irange &lh, const irange &rh,
-   relation_trio = TRIO_VARYING) const;
+  bool overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio = TRIO_VARYING) const final override;
   // Check compatibility of all operands.
   bool operand_check_p (tree t1, tree t2, tree t3) const final override
 { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); }
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 6aadc9cf2c9..e0e21ad1b2a 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -315,7 +315,7 @@ public:
   virtual relation_kind lhs_op1_relation (const prange &lhs,
  const prange &op1,
  const irange &op2,
- relation_kind) const;
+ relation_kind) const final override;
   void update_bitmask (prange &r, const prange &lh, const irange &rh) const
 { update_known_bitmask (r, POINTER_PLUS_EXPR, lh, rh); }
 } op_pointer_plus;
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 0a3f0b6b56c..1f91066a44e 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2455,7 +2455,7 @@ class operator_div : public cross_product_operator
 public:
   operator_div (tree_code div_kind) { m_code = div_kind; }
   bo

Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.

2025-06-25 Thread Jerry D


On 6/24/25 11:49 PM, Andre Vehreschild wrote:

Hi Jerry,

thank you very much. Just try it. I can only imagine that Paul had a somehow
corrupted build directory or left overs from some previous build. I am still
wondering, that I got no automated mail from the build hosts, but I can
imagine, that they get issues with a series of patches, that build upon each
other.

Just try it. The more feedback, the better.

Regards,
Andre

On Tue, 24 Jun 2025 11:07:23 -0700
Jerry D  wrote:


On 6/24/25 6:09 AM, Andre Vehreschild wrote:

Hi all,

this series of patches (six in total) adds a new coarray backend library to
libgfortran.  The library uses shared memory and processes to implement
running multiple images on the same node.  The work is based on work
started by Thomas and Nicolas Koenig. No changes to the gfortran compile
part are required for this.


--- snip ---

Hi Andre,

Thank you for this work. I have been wanting this functionality for
several years!

I will begin reviewing as best I can.  I did see Paul's initial comment
so your feedback on that would be appreciated.

Best regards,

Jerry





I was able to apply the patches without any issues.  I did see some 
trailing white space in a few places.


In running the testsuite the test lock_1.f90 test fails, unable to link 
to the new library.


After some brief investigation, it appears the the 64-bit version of the 
new library is not created or installed.  I did find the 32-bit version.


So something not right in the make mechanisms.

Looking ahead a bit I was wondering if one could enable co-array if 
co-array syntax is seen at the parsing phase of the compiler, if no 
--fcoarray= has been seen, default it to 'single' and issue a NOTE to 
the user "-fcoarray=single enabled, use -fcoarray=[none, shmem, lib] to 
override"


Regards,

Jerry

[PATCH] c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]

2025-06-25 Thread Jakub Jelinek

Hi!

The following patch implements the P3618R0 paper by tweaking pedwarn
condition, adjusting pedwarn wording, adjusting one testcase and adding 4
new ones.  The paper was voted in as DR, so it isn't guarded on C++ version.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-06-24  Jakub Jelinek  

PR c++/120773
* decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching
main to the global module.  Only pedwarn for current_lang_name
other than lang_name_cplusplus and adjust pedwarn wording.

* g++.dg/parse/linkage5.C: Don't expect error on
extern "C++" int main ();.
* g++.dg/parse/linkage7.C: New test.
* g++.dg/parse/linkage8.C: New test.
* g++.dg/modules/main-2.C: New test.
* g++.dg/modules/main-3.C: New test.

--- gcc/cp/decl.cc.jj   2025-06-19 08:55:04.408676724 +0200
+++ gcc/cp/decl.cc  2025-06-23 17:47:13.942011687 +0200
@@ -11326,9 +11326,9 @@ grokfndecl (tree ctype,
  "cannot declare %<::main%> to be %qs", "consteval");
   if (!publicp)
error_at (location, "cannot declare %<::main%> to be static");
-  if (current_lang_depth () != 0)
+  if (current_lang_name != lang_name_cplusplus)
pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a"
-" linkage specification");
+" linkage specification other than %<\"C++\"%>");
   if (module_attach_p ())
error_at (location, "cannot attach %<::main%> to a named module");
   inlinep = 0;
--- gcc/testsuite/g++.dg/parse/linkage5.C.jj2024-05-22 09:11:46.979234663 
+0200
+++ gcc/testsuite/g++.dg/parse/linkage5.C   2025-06-23 18:00:38.067742494 
+0200
@@ -1,5 +1,6 @@
 // { dg-do compile }
-// The main function shall not be declared with a linkage-specification.
+// The main function shall not be declared with a linkage-specification
+// other than "C++".
 
 extern "C" {
   int main();  // { dg-error "linkage" }
@@ -9,6 +10,6 @@ namespace foo {
   extern "C" int main();  // { dg-error "linkage" }
 }
 
-extern "C++" int main(); // { dg-error "linkage" }
+extern "C++" int main();
 
 extern "C" struct S { int main(); };  // OK
--- gcc/testsuite/g++.dg/parse/linkage7.C.jj2025-06-23 18:01:17.622237056 
+0200
+++ gcc/testsuite/g++.dg/parse/linkage7.C   2025-06-23 18:01:32.385048426 
+0200
@@ -0,0 +1,7 @@
+// { dg-do compile }
+// The main function shall not be declared with a linkage-specification
+// other than "C++".
+
+extern "C++" {
+  int main();
+}
--- gcc/testsuite/g++.dg/parse/linkage8.C.jj2025-06-23 18:01:39.830953283 
+0200
+++ gcc/testsuite/g++.dg/parse/linkage8.C   2025-06-23 18:01:57.657725492 
+0200
@@ -0,0 +1,5 @@
+// { dg-do compile }
+// The main function shall not be declared with a linkage-specification
+// other than "C++".
+
+extern "C" int main(); // { dg-error "linkage" }
--- gcc/testsuite/g++.dg/modules/main-2.C.jj2025-06-23 18:25:17.058941644 
+0200
+++ gcc/testsuite/g++.dg/modules/main-2.C   2025-06-23 18:26:11.416253264 
+0200
@@ -0,0 +1,4 @@
+// { dg-additional-options "-fmodules" }
+
+export module M;
+extern "C++" int main() {}
--- gcc/testsuite/g++.dg/modules/main-3.C.jj2025-06-23 18:26:20.393139580 
+0200
+++ gcc/testsuite/g++.dg/modules/main-3.C   2025-06-23 18:26:33.190977509 
+0200
@@ -0,0 +1,7 @@
+// { dg-additional-options "-fmodules" }
+
+export module M;
+extern "C++" {
+  int main() {}
+}
+

Jakub

Re: [Fortran, Patch, v1] 2/(3) Stop spending memory in coarray single mode executables.

2025-06-25 Thread Harald Anlauf


Am 25.06.25 um 13:42 schrieb Andre Vehreschild:

Hi,

attached patch prevents generation of a token component in derived types, when
-fcoarray=single is used. Generating the token only wastes memory. It is never
even initialized nor accessed.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?


This is OK.

Thanks for the patch!

Harald


Regards,
Andre

Re: [Fortran, Patch, PR120711, v1] 1/(3) Fix out of bounds access in cleanup of array constructor

2025-06-25 Thread Harald Anlauf


Am 25.06.25 um 13:39 schrieb Andre Vehreschild:

Hi all,

attached patch fixes an out of bounds access in the clean up code of a
concatenating array constructor. A fragment like

list = [ list, something() ]

lead to clean up using an offset (of the list array) that was manipulated in
the loop copying the existing array elements and at the end pointing to one
element past the list (after the concatenation).

This fixes a 15-regression. Releases prior to 15 do not have the out
of bounds access in the (non existing) clean up code. The have a memory
leak instead.

Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline?


This looks good to me.

Given the severity of the bug, do you plan to backport to 15-branch?

Thanks for the patch!

Harald


The subject says, that there will be 3 patches. Only this one fixes the bug.
The other fixes I found while hunting this issue and because they play in the
general same area, I don't want to loose them. I therefore publish them in this
context.

Regards,
Andre

[PATCH] RISC-V: update prepare_ternary_operands to handle the vector-scalar case [PR120828]

2025-06-25 Thread Paul-Antoine Arras

This is a followup to 92e1893e0 "RISC-V: Add patterns for vector-scalar
multiply-(subtract-)accumulate" that caused an ICE in some cases where the mult
operands were wrongly swapped.
This patch ensures that operands are not swapped in the vector-scalar case.

PR target/120828

gcc/ChangeLog:

* config/riscv/riscv-v.cc (prepare_ternary_operands): Handle the
vector-scalar case.
---
 gcc/config/riscv/riscv-v.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git gcc/config/riscv/riscv-v.cc gcc/config/riscv/riscv-v.cc
index 45dd9256d..a3d704e81 100644
--- gcc/config/riscv/riscv-v.cc
+++ gcc/config/riscv/riscv-v.cc
@@ -4723,7 +4723,7 @@ prepare_ternary_operands (rtx *ops)
   ops[4], ops[1], ops[6], ops[7], ops[9]));
   ops[5] = ops[4] = ops[0];
 }
-  else
+  else if (VECTOR_MODE_P (GET_MODE (ops[2])))
 {
   /* Swap the multiplication ops if the fallback value is the
 second of the two.  */
@@ -4733,8 +4733,10 @@ prepare_ternary_operands (rtx *ops)
   /* TODO: ??? Maybe we could support splitting FMA (a, 4, b)
 into PLUS (ASHIFT (a, 2), b) according to uarchs.  */
 }
-  gcc_assert (rtx_equal_p (ops[5], RVV_VUNDEF (mode))
- || rtx_equal_p (ops[5], ops[2]) || rtx_equal_p (ops[5], ops[4]));
+  gcc_assert (
+rtx_equal_p (ops[5], RVV_VUNDEF (mode)) || rtx_equal_p (ops[5], ops[2])
+|| (!VECTOR_MODE_P (GET_MODE (ops[2])) && rtx_equal_p (ops[5], ops[3]))
+|| rtx_equal_p (ops[5], ops[4]));
 }
 
 /* Expand VEC_MASK_LEN_{LOAD_LANES,STORE_LANES}.  */
-- 
2.39.5

[PATCH v7 9/9] AArch64: make rules for CBZ/TBZ higher priority

2025-06-25 Thread Karl Meakin

Move the rules for CBZ/TBZ to be above the rules for
CBB/CBH/CB. We want them to have higher priority
because they can express larger displacements.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cbz1): Move
above rules for CBB/CBH/CB.
(*aarch64_tbz1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md| 163 ---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c |  28 ++--
 2 files changed, 102 insertions(+), 89 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 32e0f739ae5..fc1cbbeaa4e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -728,6 +728,19 @@ (define_constants
 ;; Conditional jumps
 ;; ---
 
+;; The order of the rules below is important.
+;; Higher priority rules are preferred because they can express larger
+;; displacements.
+;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ.
+;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ.
+;; 3) When the CMPBR extension is enabled:
+;;   a) Comparisons between two registers are handled by
+;;  CBB/CBH/CB.
+;;   b) Comparisons between a GP register and an in range immediate are
+;;  handled by CB (immediate).
+;; 4) Otherwise, emit a CMP+B sequence.
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -738,7 +751,7 @@ (define_expand "cbranch4"
   {
   if (TARGET_CMPBR && aarch64_cb_rhs (GET_CODE (operands[0]), operands[2]))
 {
-  /* Fall-through to `aarch64_cb`.  */
+  /* The branch is supported natively.  */
 }
   else
 {
@@ -784,6 +797,80 @@ (define_expand "cbranchcc4"
   ""
 )
 
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
+(define_insn "*aarch64_tbz1"
+  [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
+(const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))
+   (clobber (reg:CC CC_REGNUM))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  {
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
+ return aarch64_gen_far_branch (operands, 1, "Ltb",
+"\\t%0, , ");
+   else
+ {
+   char buf[64];
+   uint64_t val = ((uint64_t) 1)
+   << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1);
+   sprintf (buf, "tst\t%%0, %" PRId64, val);
+   output_asm_insn (buf, operands);
+   return "\t%l1";
+ }
+  }
+else
+  return "\t%0, , %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
 ;; Emit a `CB (register)` or `CB (immediate)` instruction.
 ;; The immediate range depends on the comparison code.

[PATCH v2 2/2] middle-end: Enable masked load with non-constant offset

2025-06-25 Thread Karl Meakin

The function `vect_check_gather_scatter` requires the `base` of the load
to be loop-invariant and the `off`set to be not loop-invariant. When faced
with a scenario where `base` is not loop-invariant, instead of giving up
immediately we can try swapping the `base` and `off`, if `off` is
actually loop-invariant.

Previously, it would only swap if `off` was the constant zero (and so
trivially loop-invariant). This is too conservative: we can still
perform the swap if `off` is a more complex but still loop-invariant
expression, such as a variable defined outside of the loop.

This allows loops like the function below to be vectorised, if the
target has masked loads and sufficiently large vector registers (eg
`-march=armv8-a+sve -msve-vector-bits=128`):

```c
typedef struct Array {
int elems[3];
} Array;

int loop(Array **pp, int len, int idx) {
int nRet = 0;

for (int i = 0; i < len; i++) {
Array *p = pp[i];
if (p) {
nRet += p->elems[idx];
}
}

return nRet;
}
```

gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_check_gather_scatter): Swap
`base` and `off` in more scenarios. Also assert at the end of
the function that `base` and `off` are loop-invariant and not
loop-invariant respectively.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/mask_load_2.c: Update tests.
---
 .../gcc.target/aarch64/sve/mask_load_2.c  |  4 +--
 gcc/tree-vect-data-refs.cc| 26 ---
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
index 38fcf4f7206..66d95101a14 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
@@ -19,5 +19,5 @@ int loop(Array **pp, int len, int idx) {
 return nRet;
 }
 
-// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 0 } }
-// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m}  0 } }
+// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 1 } }
+// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m}  1 } }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ee040eb9888..ea8536ec262 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4659,26 +4659,19 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
   if (off == NULL_TREE)
 off = size_zero_node;
 
-  /* If base is not loop invariant, either off is 0, then we start with just
- the constant offset in the loop invariant BASE and continue with base
- as OFF, otherwise give up.
- We could handle that case by gimplifying the addition of base + off
- into some SSA_NAME and use that as off, but for now punt.  */
+  /* BASE must be loop invariant.  If it is not invariant, but OFF is, then we
+   * can fix that by swapping BASE and OFF.  */
   if (!expr_invariant_in_loop_p (loop, base))
 {
-  if (!integer_zerop (off))
+  if (!expr_invariant_in_loop_p (loop, off))
return false;
-  off = base;
-  base = size_int (pbytepos);
-}
-  /* Otherwise put base + constant offset into the loop invariant BASE
- and continue with OFF.  */
-  else
-{
-  base = fold_convert (sizetype, base);
-  base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
+
+  std::swap (base, off);
 }
 
+  base = fold_convert (sizetype, base);
+  base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
+
   /* OFF at this point may be either a SSA_NAME or some tree expression
  from get_inner_reference.  Try to peel off loop invariants from it
  into BASE as long as possible.  */
@@ -4856,6 +4849,9 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
   offset_vectype = NULL_TREE;
 }
 
+  gcc_checking_assert (expr_invariant_in_loop_p (loop, base));
+  gcc_checking_assert (!expr_invariant_in_loop_p (loop, off));
+
   info->ifn = ifn;
   info->decl = decl;
   info->base = base;
-- 
2.45.2

Re: [Fortran, Patch, v1] 3/(3) Prevent creating tree that is never used.

2025-06-25 Thread Harald Anlauf


Am 25.06.25 um 13:45 schrieb Andre Vehreschild:

Hi,

while hunting for pr120711 I found a construct where a call-tree was created
and never used. The patch now just suppresses the tree creation and instead
uses directly the tree that is desired.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?


This is OK.

Thanks for the patch!

Harald


Regards,
Andre

Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]

2025-06-25 Thread Jason Merrill


On 6/25/25 12:49 PM, Jakub Jelinek wrote:

On Wed, Jun 25, 2025 at 12:37:33PM -0400, Jason Merrill wrote:

Ah, looks like fixed_type_or_null needs to handle a CALL_EXPR of class type
like a TARGET_EXPR.  I also wonder why the call isn't already wrapped in a
TARGET_EXPR by build_cxx_call=>build_cplus_new at this point.


Wonder if it has anything to do with being in unevaluated context


It seems to be bugginess in the handling of decltype_p, which is 
supposed to only apply to the immediate operand of decltype; the 
attached fixes the testcase.  I think we also still want the change to 
fixed_type_or_null.



(and
whether perhaps cp_build_addr_expr isn't undesirable for that case, because
that can make vars odr-used etc.; are are odr uses in unevaluated context
also supposed to make vars odr-used?).


That's fine, mark_used handles not actually odr-using things in 
unevaluated context.From 2cf9705f22ce2edcf749ef6721b1ee6c1200 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Wed, 25 Jun 2025 16:26:56 -0400
Subject: [PATCH] c++: fix decltype_p
To: gcc-patches@gcc.gnu.org

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_binary_expression): Don't pass decltype_p
	to the operands.
---
 gcc/cp/parser.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 80fd7990bbb..ba12c50fa7b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -10791,7 +10791,7 @@ cp_parser_binary_expression (cp_parser* parser, bool cast_p,
   current.lhs_type = (cp_lexer_next_token_is (parser->lexer, CPP_NOT)
 		  ? TRUTH_NOT_EXPR : ERROR_MARK);
   current.lhs = cp_parser_cast_expression (parser, /*address_p=*/false,
-	   cast_p, decltype_p, pidk);
+	   cast_p, /*decltype_p*/false, pidk);
   current.prec = prec;
 
   if (cp_parser_error_occurred (parser))
-- 
2.49.0

[PATCH v7 2/9] AArch64: reformat branch instruction rules

2025-06-25 Thread Karl Meakin

Make the formatting of the RTL templates in the rules for branch
instructions more consistent with each other.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Reformat.
(cbranchcc4): Likewise.
(condjump): Likewise.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 68 +--
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fcc24e300e6..ee4c609ae0f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -714,7 +714,7 @@ (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
-  (label_ref (match_operand 3 "" ""))
+  (label_ref (match_operand 3))
   (pc)))]
   ""
   "
@@ -729,30 +729,31 @@ (define_expand "cbranch4"
(match_operator 0 "aarch64_comparison_operator"
 [(match_operand:GPF_F16 1 "register_operand")
  (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
-   (label_ref (match_operand 3 "" ""))
+   (label_ref (match_operand 3))
(pc)))]
   ""
-  "
+  {
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 operands[2]);
   operands[2] = const0_rtx;
-  "
+  }
 )
 
 (define_expand "cbranchcc4"
-  [(set (pc) (if_then_else
- (match_operator 0 "aarch64_comparison_operator"
-  [(match_operand 1 "cc_register")
-   (match_operand 2 "const0_operand")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register")
+(match_operand 2 "const0_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "")
+  ""
+)
 
 (define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-   [(match_operand 1 "cc_register" "") (const_int 0)])
-  (label_ref (match_operand 2 "" ""))
+   [(match_operand 1 "cc_register")
+(const_int 0)])
+  (label_ref (match_operand 2))
   (pc)))]
   ""
   {
@@ -789,10 +790,9 @@ (define_insn "condjump"
 ;; subsx0, x0, #(CST & 0x000fff)
 ;; b .Label
 (define_insn_and_split "*compare_condjump"
-  [(set (pc) (if_then_else (EQL
- (match_operand:GPI 0 "register_operand" "r")
- (match_operand:GPI 1 "aarch64_imm24" "n"))
-  (label_ref:P (match_operand 2 "" ""))
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2))
   (pc)))]
   "!aarch64_move_imm (INTVAL (operands[1]), mode)
&& !aarch64_plus_operand (operands[1], mode)
@@ -816,8 +816,8 @@ (define_insn_and_split "*compare_condjump"
 
 (define_insn "aarch64_cb1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
-   (const_int 0))
-  (label_ref (match_operand 1 "" ""))
+   (const_int 0))
+  (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
@@ -841,8 +841,8 @@ (define_insn "aarch64_cb1"
 
 (define_insn "*cb1"
   [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
-(const_int 0))
-  (label_ref (match_operand 1 "" ""))
+(const_int 0))
+  (label_ref (match_operand 1))
   (pc)))
(clobber (reg:CC CC_REGNUM))]
   "!aarch64_track_speculation"
@@ -883,11 +883,11 @@ (define_insn "*cb1"
 ;; ---
 
 (define_expand "tbranch_3"
-  [(set (pc) (if_then_else
- (EQL (match_operand:SHORT 0 "register_operand")
-  (match_operand 1 "const0_operand"))
- (label_ref (match_operand 2 ""))
- (pc)))]
+  [(set (pc) (if_then_else (EQL
+(match_operand:SHORT 0 "register_operand")
+(match_operand 1 "const0_operand"))
+

[PATCH v6 7/9] AArch64: precommit test for CMPBR instructions

2025-06-25 Thread Karl Meakin

Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1841 ++
 gcc/testsuite/lib/target-supports.exp|   14 +-
 2 files changed, 1849 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c 
b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..b8925f14433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1841 @@
+// Test that the instructions added by FEAT_CMPBR are emitted
+// { dg-do compile }
+// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } }
+// { dg-options "-march=armv9.5-a+cmpbr -O2" }
+// { dg-final { check-function-bodies "**" "*/" "" { target *-*-* } 
{\.L[0-9]+} } }
+
+#include 
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs) 
\
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) {   
\
+return (x0 op rhs) ? taken() : not_taken();
\
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs)   
\
+  COMPARE(unsigned_ty, eq, ==, rhs);   
\
+  COMPARE(unsigned_ty, ne, !=, rhs);   
\
+   
\
+  COMPARE(unsigned_ty, ult, <, rhs);   
\
+  COMPARE(unsigned_ty, ule, <=, rhs);  
\
+  COMPARE(unsigned_ty, ugt, >, rhs);   
\
+  COMPARE(unsigned_ty, uge, >=, rhs);  
\
+   
\
+  COMPARE(signed_ty, slt, <, rhs); 
\
+  COMPARE(signed_ty, sle, <=, rhs);
\
+  COMPARE(signed_ty, sgt, >, rhs); 
\
+  COMPARE(signed_ty, sge, >=, rhs);
+
+//  CBB (register) 
+COMPARE_ALL(u8, i8, x1);
+
+//  CBH (register) 
+COMPARE_ALL(u16, i16, x1);
+
+//  CB (register) 
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+//  CB (immediate) 
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+//  Special cases 
+// Comparisons against the immediate 0 can be done for all types,
+// because we can use the wzr/xzr register as one of the operands.
+// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible,
+// because they have larger range.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+// CBB and CBH cannot have immediate operands.
+// Instead we have to do a MOV+CB.
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 64 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead,
+//   because B has a longer range than CB.
+COMPARE_ALL(u8, i8, 64);
+COMPARE_ALL(u16, i16, 64);
+COMPARE_ALL(u32, i32, 64);
+COMPARE_ALL(u64, i64, 64);
+
+// 4098 is out of the range for CMP (0 to 4095, optionally shifted by left by 
12
+// bits), but it can be materialized in a single MOV.
+COMPARE_ALL(u16, i16, 4098);
+COMPARE_ALL(u32, i32, 4098);
+COMPARE_ALL(u64, i64, 4098);
+
+/*
+** u8_x0_eq_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L4
+** b   not_taken
+** .L4:
+** b   taken
+*/
+
+/*
+** u8_x0_ne_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L6
+** b   taken
+** .L6:
+** b   not_taken
+*/
+
+/*
+** u8_x0_ult_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bls .L8
+** b   taken
+** .L8:
+** b   not_taken
+*/
+
+/*
+** u8_x0_ule_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bcc .L10
+** b   taken
+** .L10:
+** b   not_taken
+*/
+
+/*
+** u8_x0_ugt_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bcs .L12
+** b   taken
+** .L12:
+** b   not_taken
+*/
+
+/*
+** u8_x0_uge_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bhi .L14
+** b   taken
+** .L14:
+** b   not_taken
+*/
+
+/*
+** i8_x0_slt_x1:
+** sxtbw1, w1
+** cmp w1, w0, sxtb
+** ble

1 2 >

1 - 100 of 157 matches

Mail list logo