[to-be-committed][RISC-V][PR target/118886] Refine when two insns are signaled as fusion candidates

2025-07-02 Thread Jeff Law
A number of folks have had their fingers in this code and it's going to 
take a few submissions to do everything we want to do.


This patch is primarily concerned with avoiding signaling that fusion 
can occur in cases where it obviously should not be signaling fusion.


First, every DEC-based fusion I'm aware of requires the first instruction to 
set a destination register that is both used and set again by the second 
instruction.  If the two instructions set different registers, then the 
destination of the first instruction is not dead and its result would 
still need to be produced.


This is complicated by the fact that we have pseudo registers prior to 
reload, and two distinct pseudos may still be allocated to the same hard 
register.  So the approach we take is to signal fusion prior to reload 
even if the destination registers don't match.  Post reload we require 
them to match.
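The destination-register rule can be illustrated with a minimal C++ sketch. It uses a toy `Insn` record with one destination register and one source register. That record is hypothetical: GCC itself works on RTL through `single_set`, `SET_DEST` and `REGNO`. Before reload the check is permissive, because two distinct pseudos may still be allocated to the same hard register. After reload, the second insn must overwrite the first insn's destination and also read it.

```cpp
#include <cassert>

// Toy stand-in for a single_set insn (hypothetical, for illustration only).
struct Insn {
    unsigned dest;  // register written
    unsigned src;   // register read
};

// Returns true if PREV/CURR may be signalled as a fusion pair as far as
// the destination-register rule is concerned.
bool dest_rule_allows_fusion(const Insn& prev, const Insn& curr,
                             bool before_reload) {
    // Before reload, distinct pseudos may still end up in the same hard
    // register, so do not reject the pair on register numbers alone.
    if (before_reload)
        return true;
    // After reload: CURR must overwrite PREV's destination and consume it,
    // so PREV's result is dead after CURR and never needs to be materialized.
    return prev.dest == curr.dest && curr.src == prev.dest;
}
```

In the patch itself, only the `sched1 || prev_dest_regno == curr_dest_regno` part appears as a top-level condition. The requirement that the second insn reads the register is checked inside each fusion pattern.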


That allows us to clean up the code ever-so-slightly.

Second, we sometimes signaled fusion into loads that weren't scalar 
integer loads.  I'm not aware of a design that's fusing into FP loads or 
vector loads.  So those get rejected explicitly.
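The load filter can be pictured as a check on the mode class of the loaded value. In GCC terms this is roughly the kind of test `SCALAR_INT_MODE_P` performs. The enum below is a hypothetical stand-in, not the patch's code.

```cpp
#include <cassert>

// Hypothetical classification of a load's machine mode, loosely modelled
// on GCC's mode classes (MODE_INT, MODE_FLOAT, MODE_VECTOR_INT, ...).
enum class ModeClass { ScalarInt, ScalarFloat, VectorInt, VectorFloat };

// Only scalar integer loads are treated as fusion targets; FP and vector
// loads are rejected up front.
bool load_is_fusion_target(ModeClass mc) {
    return mc == ModeClass::ScalarInt;
}
```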


Third, the store pair "fusion" code is cleaned up a little.  We use 
fusion to model store pair commits since the basic properties for 
detection are the same.  The point where they "fuse" is different.  Also 
this code liked to "return false" at each step along the way if fusion 
wasn't possible.  Future work for additional fusion cases makes that 
behavior undesirable.  So the logic gets reworked a little bit to be 
more friendly to future work.
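The restructuring described above can be sketched as follows: replace a series of early `return false` statements with a chain of conditions whose result the caller tests. Later fusion checks then stay reachable. The `MemAccess` record and the adjacency criteria below are hypothetical illustrations. They are not the exact conditions used in `riscv.cc`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of a memory access (not a GCC data structure).
struct MemAccess {
    bool is_store;
    unsigned base_reg;
    std::int64_t offset;
    unsigned size;  // bytes
};

// Store-pair "commit" detection as a chain of conditions rather than a
// sequence of early returns.
bool store_pair_candidate_p(const MemAccess& a, const MemAccess& b) {
    bool ok = a.is_store && b.is_store;
    ok = ok && a.base_reg == b.base_reg;
    ok = ok && a.size == b.size;
    // Adjacent slots, in either order.
    ok = ok && (b.offset == a.offset + a.size
                || a.offset == b.offset + b.size);
    return ok;
}

// Placeholder for future fusion cases (hypothetical).
bool other_fusion_p(const MemAccess&, const MemAccess&) { return false; }

bool macro_fusion_pair_p(const MemAccess& prev, const MemAccess& curr) {
    if (store_pair_candidate_p(prev, curr))
        return true;
    // Because the store-pair check did not bail out, later cases still run.
    return other_fusion_p(prev, curr);
}
```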


Fourth, if we already fused the previous instruction, then we can't fuse 
it with the current instruction as well.  Signaling fusion in that case 
is, umm, bad as it creates an atomic blob of code from a scheduling 
standpoint.
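A toy model shows why the `SCHED_GROUP_P (prev)` check matters. Here `glued[i]` plays the role of SCHED_GROUP_P on insn `i`, meaning it is scheduled as a unit with insn `i - 1`. With every pair fusible, omitting the check glues an entire run of insns into one group. With the check, groups stay at two insns. This is a hypothetical model, not GCC's scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// glued[i] == true means insn i is glued to insn i - 1 (toy model).
std::vector<bool> mark_fusion(const std::vector<int>& insns,
                              const std::function<bool(int, int)>& fusible,
                              bool refuse_if_prev_fused) {
    std::vector<bool> glued(insns.size(), false);
    for (std::size_t i = 1; i < insns.size(); ++i) {
        // If PREV is already glued to its own predecessor, fusing CURR onto
        // it would grow the group into a larger atomic blob.
        if (refuse_if_prev_fused && glued[i - 1])
            continue;
        if (fusible(insns[i - 1], insns[i]))
            glued[i] = true;
    }
    return glued;
}
```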


Hopefully I got everything correct with extracting this work out of a 
larger set of changes :-)  We will contribute some instrumentation & 
testing code so if I botched things in a major way we'll soon have a way 
to test that and I'll be on the hook to fix any goofs.


From a correctness standpoint this should be a big fat nop.  We've seen 
this make measurable differences in pico benchmarks, but obviously as 
you scale up to bigger stuff the gains largely disappear into the noise.


This has been through Ventana's internal CI and my tester.  I'll 
obviously wait for a verdict from the pre-commit tester.



Jeff



PR target/118886
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Check
for fusion being disabled earlier.  If PREV is already fused,
then it can't be fused again.  Be more selective about fusing
when the destination registers do not match.  Don't fuse into
loads that aren't scalar integer modes.  Revamp store pair
commit support.


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index cd6d6b992b50..aeba31f8176a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10209,17 +10209,31 @@ riscv_fusion_enabled_p(enum riscv_fusion_pairs op)
 static bool
 riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 {
+  /* If fusion is not enabled, then there's nothing to do.  */
+  if (!riscv_macro_fusion_p ())
+    return false;
+
+  /* If PREV is already marked as fused, then we can't fuse CURR with PREV
+     and if we were to fuse them we'd end up with a blob of insns that
+     essentially are an atomic unit which is bad for scheduling.  */
+  if (SCHED_GROUP_P (prev))
+    return false;
+
   rtx prev_set = single_set (prev);
   rtx curr_set = single_set (curr);
   /* prev and curr are simple SET insns i.e. no flag setting or branching.  */
   bool simple_sets_p = prev_set && curr_set && !any_condjump_p (curr);
+  bool sched1 = can_create_pseudo_p ();
 
-  if (!riscv_macro_fusion_p ())
-    return false;
+  unsigned int prev_dest_regno
+    = REG_P (SET_DEST (prev_set)) ? REGNO (SET_DEST (prev_set)) : FIRST_PSEUDO_REGISTER;
+  unsigned int curr_dest_regno
+    = REG_P (SET_DEST (curr_set)) ? REGNO (SET_DEST (curr_set)) : FIRST_PSEUDO_REGISTER;
 
   if (simple_sets_p
       && (riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW)
-          || riscv_fusion_enabled_p (RISCV_FUSE_ZEXTWS)))
+          || riscv_fusion_enabled_p (RISCV_FUSE_ZEXTWS))
+      && (sched1 || prev_dest_regno == curr_dest_regno))
 {
   /* We are trying to match the following:
   prev (slli) == (set (reg:DI rD)
@@ -10233,8 +10247,7 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && GET_CODE (SET_SRC (curr_set)) == LSHIFTRT
  && REG_P (SET_DEST (prev_set))
  && REG_P (SET_DEST (curr_set))
- && REGNO (SET_DEST (prev_set)) == REGNO (SET_DEST (curr_set))
- && REGNO (XEXP (SET_SRC (curr_set), 0)) == REGNO(SET_DEST (curr_set))
+ && REGNO (XEXP (SET_SRC (curr_set), 0)) == curr_dest_regno
  && CONST_INT_P (XEXP (SET_SRC (prev_set), 1))
  && CONST_INT_P (XEXP (SET_SRC (curr_set), 1))
  && INTVAL (XEXP (SET_SRC (prev_set), 1)) == 32
@@ -10245,7 +10258,8
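The ZEXTW case in the hunk above matches `slli rD, rS, 32` followed by `srli rD, rD, 32`. The following minimal sketch models two things: a matcher over a hypothetical decoded `ShiftInsn` record, and the value the pair computes on RV64, which is a zero extension of the low 32 bits. It assumes the post-reload form, where both insns write the same register. The ZEXTWS variant is not modelled.

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Slli, Srli, Other };

// Hypothetical decoded shift-immediate instruction.
struct ShiftInsn {
    Op op;
    unsigned rd;
    unsigned rs1;
    unsigned shamt;
};

// Matches  slli rD, rS, 32 ; srli rD, rD, 32  (the ZEXTW idiom).
bool zextw_pair_p(const ShiftInsn& prev, const ShiftInsn& curr) {
    return prev.op == Op::Slli && curr.op == Op::Srli
           && prev.rd == curr.rd       // same destination
           && curr.rs1 == curr.rd      // srli consumes the slli result
           && prev.shamt == 32 && curr.shamt == 32;
}

// What the pair computes: zero-extend the low 32 bits of rS.
std::uint64_t zextw_semantics(std::uint64_t rs) {
    std::uint64_t t = rs << 32;  // slli
    return t >> 32;              // srli (logical shift right)
}
```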

Re: [PATCH v2] libstdc++: construct bitset from string_view (P2697) [PR119742]

2025-07-02 Thread Jonathan Wakely
On Thu, 3 Jul 2025, 00:56 Nathan Myers,  wrote:

> Changes in V2:
> * Generalize private member _M_check_initial_position for use with
>   both string and string_view arguments.
> * Remove unnecessary #if guards for version and hostedness.
> * Remove redundant "std::" qualifications in new code.
> * Improve Doxygen source readability.
> * Clarify commit message text.
> * Fix ChangeLog style.
>

OK for trunk, thanks



> Add a bitset constructor from string_view, per P2697. Fix existing
> tests that would fail to detect incorrect exception behavior.
>
> Argument checks that throw exceptions, previously guarded by "#if HOSTED",
> are made unguarded because the functions called to throw just call
> terminate() in freestanding builds. Improve readability in Doxygen
> comments. Generalize a private member argument-checking function
> to work with string and string_view without mentioning either,
> obviating the need for guards.
>
> The version.h symbol is not "hosted" because string_view, though
> not specified to be available in freestanding builds, is defined
> there and the feature is useful there.
>
> libstdc++-v3/ChangeLog:
> PR libstdc++/119742
> * include/bits/version.def: Add preprocessor symbol.
> * include/bits/version.h: Add preprocessor symbol.
> * include/std/bitset: Add constructor.
> * testsuite/20_util/bitset/cons/1.cc: Fix.
> * testsuite/20_util/bitset/cons/6282.cc: Fix.
> * testsuite/20_util/bitset/cons/string_view.cc: Test new ctor.
> * testsuite/20_util/bitset/cons/string_view_wide.cc: Test new ctor.
> ---
>  libstdc++-v3/include/bits/version.def |   8 ++
>  libstdc++-v3/include/bits/version.h   |  10 ++
>  libstdc++-v3/include/std/bitset   |  82 +++
>  .../testsuite/20_util/bitset/cons/1.cc|   1 +
>  .../testsuite/20_util/bitset/cons/6282.cc |   5 +-
>  .../20_util/bitset/cons/string_view.cc| 132 ++
>  .../20_util/bitset/cons/string_view_wide.cc   |   8 ++
>  7 files changed, 215 insertions(+), 31 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/bitset/cons/string_view.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/bitset/cons/string_view_wide.cc
>
> diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def
> index f4ba501c403..b89b287e8e8 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -2030,6 +2030,14 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = bitset  // ...construct from string_view
> +  values = {
> +v = 202306;
> +cxxmin = 26;
> +  };
> +};
> +
>  // Standard test specifications.
>  stds[97] = ">= 199711L";
>  stds[03] = ">= 199711L";
> diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h
> index dc8ac07be16..a70a7ede68c 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -2273,4 +2273,14 @@
>  #endif /* !defined(__cpp_lib_exception_ptr_cast) && defined(__glibcxx_want_exception_ptr_cast) */
>  #undef __glibcxx_want_exception_ptr_cast
>
> +#if !defined(__cpp_lib_bitset)
> +# if (__cplusplus >  202302L)
> +#  define __glibcxx_bitset 202306L
> +#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_bitset)
> +#   define __cpp_lib_bitset 202306L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_bitset) && defined(__glibcxx_want_bitset) */
> +#undef __glibcxx_bitset
> +
>  #undef __glibcxx_want_all
> diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
> index 8b5d270c2a9..1c1e1670c33 100644
> --- a/libstdc++-v3/include/std/bitset
> +++ b/libstdc++-v3/include/std/bitset
> @@ -61,8 +61,13 @@
>  #endif
>
>  #define __glibcxx_want_constexpr_bitset
> +#define __glibcxx_want_bitset  // ...construct from string_view
>  #include 
>
> +#ifdef __cpp_lib_bitset // ...construct from string_view
> +# include 
> +#endif
> +
>  #define _GLIBCXX_BITSET_BITS_PER_WORD  (__CHAR_BIT__ * __SIZEOF_LONG__)
>  #define _GLIBCXX_BITSET_WORDS(__n) \
>((__n) / _GLIBCXX_BITSET_BITS_PER_WORD + \
> @@ -752,7 +757,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> *  (Note that %bitset does @e not meet the formal requirements of a
> *  container.  Mainly, it lacks iterators.)
> *
> -   *  The template argument, @a Nb, may be any non-negative number,
> +   *  The template argument, `Nb`, may be any non-negative number,
> *  specifying the number of bits (e.g., "0", "12", "1024*1024").
> *
> *  In the general unoptimized case, storage is allocated in word-sized
> @@ -816,28 +821,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>typedef _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)> _Base;
>typedef unsigned long _WordT;
>
> -#if _GLIBCXX_HOSTED
> -  template
> -  _GLIBCXX23_CONSTEXPR
> -  void
> -  _M_check_initial_position(const std::basic_string<_CharT, _Traits, _Alloc>& __s,
> -   
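The P2697 constructor behaves like the existing `basic_string` constructor, but takes a `basic_string_view`. It throws `out_of_range` if `pos > str.size()`. It throws `invalid_argument` if any of the `rlen = min(n, str.size() - pos)` characters is neither `zero` nor `one`. It uses only the first `M = min(N, rlen)` characters, with character `pos + M - 1` mapping to bit 0. Below is a C++17 sketch of those semantics for compilers without C++26 support. `bitset_from_sv` is a hypothetical helper, not the libstdc++ implementation.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical helper approximating the C++26 bitset(basic_string_view, ...)
// constructor from P2697.
template <std::size_t N, class CharT, class Traits>
std::bitset<N> bitset_from_sv(std::basic_string_view<CharT, Traits> str,
                              std::size_t pos = 0,
                              std::size_t n = std::basic_string_view<CharT, Traits>::npos,
                              CharT zero = CharT('0'), CharT one = CharT('1')) {
    if (pos > str.size())
        throw std::out_of_range("bitset_from_sv: pos > str.size()");
    const std::size_t rlen = std::min(n, str.size() - pos);
    // Every one of the rlen characters must be ZERO or ONE.
    for (std::size_t i = 0; i < rlen; ++i) {
        const CharT c = str[pos + i];
        if (!Traits::eq(c, zero) && !Traits::eq(c, one))
            throw std::invalid_argument("bitset_from_sv: invalid character");
    }
    // Only the first M = min(N, rlen) characters are used; character
    // pos + M - 1 maps to bit 0.
    const std::size_t m = std::min(N, rlen);
    std::bitset<N> result;
    for (std::size_t i = 0; i < m; ++i)
        if (Traits::eq(str[pos + i], one))
            result.set(m - 1 - i);
    return result;
}
```

The string must be passed as a `basic_string_view` so that `CharT` and `Traits` can be deduced. For example, `bitset_from_sv<8>(std::string_view("1010"))` gives the same value as `std::bitset<8>("1010")`.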

Re: [PATCH] check-function-bodies: Support "^[0-9]+:"

2025-07-02 Thread H.J. Lu
On Wed, Jul 2, 2025 at 9:12 AM H.J. Lu  wrote:
>
> While working on
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120881
>
> I tried to use check-function-bodies to verify that
>
> 1: call mcount
>
> generated by "-pg" is placed at the function entry.  Add "^[0-9]+:" to
> check-function-bodies to allow:
>
> 1: call mcount
>
> PR testsuite/120881
> * lib/scanasm.exp (check-function-bodies): Allow "^[0-9]+:".
>
> OK for master?
>

Any comments on this simple change:

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 97935cb23c3..a2311de5704 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -1109,6 +1109,8 @@ proc check-function-bodies { args } {
  append function_regexp ".*"
  } elseif { [regexp {^\.L} $line] } {
  append function_regexp $line "\n"
+ } elseif { [regexp {^[0-9]+:} $line] } {
+ append function_regexp $line "\n"
  } else {
  append function_regexp $config(line_prefix) $line "\n"
  }

This blocks other patches I am working on.

Thanks.

-- 
H.J.
From 8ebcd475a287188cc242d886977a17e1559861da Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 2 Jul 2025 08:51:47 +0800
Subject: [PATCH] check-function-bodies: Support "^[0-9]+:"

While working on

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120936

I tried to use check-function-bodies to verify that the label for mcount
and __fentry__ is generated by "-pg" only if it is used by the __mcount_loc
section:

1:	call	mcount
	.section __mcount_loc, "a",@progbits
	.quad 1b
	.previous

Add "^[0-9]+:" to check-function-bodies to allow:

1:	call	mcount

	PR testsuite/120881
	* lib/scanasm.exp (check-function-bodies): Allow "^[0-9]+:".

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/lib/scanasm.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 97935cb23c3..a2311de5704 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -1109,6 +1109,8 @@ proc check-function-bodies { args } {
 		append function_regexp ".*"
 	} elseif { [regexp {^\.L} $line] } {
 		append function_regexp $line "\n"
+	} elseif { [regexp {^[0-9]+:} $line] } {
+		append function_regexp $line "\n"
 	} else {
 		append function_regexp $config(line_prefix) $line "\n"
 	}
-- 
2.50.0
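The patch extends the per-line dispatch in `check-function-bodies`. The following hypothetical C++ rendering covers only the three branches visible in the hunk. Local labels such as `.L2:` and, with the patch, numeric labels such as `1:` are matched at column 0. Every other expected line gets the configured line prefix (for example a tab) prepended. The real Tcl proc also handles other cases, such as `...`, which are not modelled here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Builds the regexp fragment for one expected line (toy model).
std::string line_to_regexp(const std::string& line,
                           const std::string& line_prefix) {
    static const std::regex dot_label("^\\.L");
    static const std::regex numeric_label("^[0-9]+:");
    // Labels are emitted at column 0, so they must not get the prefix.
    if (std::regex_search(line, dot_label)
        || std::regex_search(line, numeric_label))
        return line + "\n";
    return line_prefix + line + "\n";
}
```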



Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-02 Thread Steve Kargl
On Thu, Jul 03, 2025 at 12:56:24AM +0800, Yuao Ma wrote:
> 
> This patch adds the required function for Fortran trigonometric functions to
> work with glibc versions prior to 2.26. It's based on glibc source commit
> 632d895f3e5d98162f77b9c3c1da4ec19968b671.
> 
> I've built it successfully on my end. Documentation is also included.
> 
> Please take a look when you have a moment.
> 

...

>   * math/acospiq.c: New file.
>   * math/cospiq.c: New file.

I have only looked at these 2 functions, and only quickly, as I
cannot read (L)GPL implementations of special functions
for fear of taint.

Have you done any benchmarks with respect to numerical
accuracy?  I suspect these functions have unnecessary
issues with precision.
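One common source of avoidable precision loss in `cospi`-style functions is forming `pi * x` before reducing the argument. The following sketch shows the alternative technique: reduce `x` exactly first, using periodicity and symmetry, and only then multiply by pi. It works in `double` rather than `__float128`. It is an independent illustration, not the glibc or libquadmath algorithm, and it makes no claim about what the submitted code does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative double-precision cos(pi * x) with exact argument reduction.
double cospi_sketch(double x) {
    if (std::isnan(x))
        return x;
    if (std::isinf(x))
        return std::numeric_limits<double>::quiet_NaN();
    constexpr double pi = 3.14159265358979323846;
    // cos(pi*x) is even with period 2; fmod is exact, so this step adds no
    // rounding error, unlike computing pi * x for large x.
    double r = std::fmod(std::fabs(x), 2.0);  // r in [0, 2)
    if (r > 1.0)
        r = 2.0 - r;                          // exact; cos(pi*(2-r)) == cos(pi*r)
    double sign = 1.0;
    if (r > 0.5) {
        r = 1.0 - r;                          // exact; cos(pi*r) == -cos(pi*(1-r))
        sign = -1.0;
    }
    // r in [0, 0.5]: keep the argument passed to cos/sin small.
    const double y = (r <= 0.25) ? std::cos(pi * r)
                                 : std::sin(pi * (0.5 - r));
    return sign * y;
}
```

With this reduction, half-integers give exactly 0 and integers give exactly +1 or -1. By contrast, `std::cos(pi * 0.5)` returns a small nonzero value.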

-- 
steve


[PATCH v8 1/9] AArch64: place branch instruction rules together

2025-07-02 Thread Karl Meakin
The rules for conditional branches were spread throughout `aarch64.md`.
Group them together so it is easier to understand how `cbranch4`
is lowered to RTL.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Move.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 387 ++
 1 file changed, 201 insertions(+), 186 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e11e13033d2..fcc24e300e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -682,6 +682,10 @@ (define_insn "aarch64_write_sysregti"
  "msrr\t%x0, %x1, %H1"
 )
 
+;; ---
+;; Unconditional jumps
+;; ---
+
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
@@ -700,6 +704,12 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+
+
+;; ---
+;; Conditional jumps
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -739,6 +749,197 @@ (define_expand "cbranchcc4"
   ""
   "")
 
+(define_insn "condjump"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register" "") (const_int 0)])
+  (label_ref (match_operand 2 "" ""))
+  (pc)))]
+  ""
+  {
+/* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
+   but the "." is required for SVE conditions.  */
+bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond",
+use_dot_p ? "b.%M0\\t" : "b%M0\\t");
+else
+  return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 0)
+ (const_int 1)))]
+)
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov x0, #imm1
+;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp x1, x0
+;; b .Label
+;; into the shorter:
+;; sub x0, x1, #(CST & 0xfff000)
+;; subs x0, x0, #(CST & 0x000fff)
+;; b .Label
+(define_insn_and_split "*compare_condjump"
+  [(set (pc) (if_then_else (EQL
+ (match_operand:GPI 0 "register_operand" "r")
+ (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2 "" ""))
+  (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), mode)
+   && !aarch64_plus_operand (operands[1], mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+rtx tmp = gen_reg_rtx (mode);
+emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
+emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
+ cc_reg, const0_rtx);
+emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+DONE;
+  }
+)
+
+(define_insn "aarch64_cb1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minu
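The `*compare_condjump` split relies on a simple identity. A 24-bit constant `CST` is the sum of `CST & 0xfff000` and `CST & 0x000fff`, and each part fits an ADD/SUB 12-bit immediate (the high part with `LSL #12`). So `(x - hi) - lo == 0` holds exactly when `x == CST`, which replaces `mov`/`movk`/`cmp` with `sub`/`subs`. Below is a minimal sketch of that arithmetic with hypothetical helper names. It assumes `CST` already satisfies the 24-bit (`aarch64_imm24`) precondition.

```cpp
#include <cassert>
#include <cstdint>

struct Imm24Split {
    std::uint64_t hi;  // bits 12..23: encodable as imm12, LSL #12
    std::uint64_t lo;  // bits 0..11: encodable as imm12
};

Imm24Split split_imm24(std::uint64_t cst) {
    return {cst & 0xfff000, cst & 0x000fff};
}

// Models  sub tmp, x, #hi ; subs tmp, tmp, #lo ; b.eq/b.ne
// The Z flag is set exactly when the final result is zero.
bool eq_via_split(std::uint64_t x, std::uint64_t cst) {
    const Imm24Split s = split_imm24(cst);
    std::uint64_t tmp = x - s.hi;  // sub
    tmp -= s.lo;                   // subs
    return tmp == 0;
}
```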

[PATCH v8 9/9] AArch64: make rules for CBZ/TBZ higher priority

2025-07-02 Thread Karl Meakin
Move the rules for CBZ/TBZ to be above the rules for
CBB/CBH/CB. We want them to have higher priority
because they can express larger displacements.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cbz1): Move
above rules for CBB/CBH/CB.
(*aarch64_tbz1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md| 161 ---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c |  28 ++--
 2 files changed, 101 insertions(+), 88 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c50c41753a7..509ef4c0f2f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -724,6 +724,19 @@ (define_constants
 ;; Conditional jumps
 ;; ---
 
+;; The order of the rules below is important.
+;; Higher priority rules are preferred because they can express larger
+;; displacements.
+;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ.
+;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ.
+;; 3) When the CMPBR extension is enabled:
+;;   a) Comparisons between two registers are handled by
+;;  CBB/CBH/CB.
+;;   b) Comparisons between a GP register and an in range immediate are
+;;  handled by CB (immediate).
+;; 4) Otherwise, emit a CMP+B sequence.
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -780,6 +793,80 @@ (define_expand "cbranchcc4"
   ""
 )
 
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
+(define_insn "*aarch64_tbz1"
+  [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
+(const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))
+   (clobber (reg:CC CC_REGNUM))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  {
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
+ return aarch64_gen_far_branch (operands, 1, "Ltb",
+"\\t%0, , ");
+   else
+ {
+   char buf[64];
+   uint64_t val = ((uint64_t) 1)
+   << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1);
+   sprintf (buf, "tst\t%%0, %" PRId64, val);
+   output_asm_insn (buf, operands);
+   return "\t%l1";
+ }
+  }
+else
+  return "\t%0, , %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
 ;; Emit a `CB (register)` or `CB (immediate)` instruction.
 ;; The immediate range depends on the comparison code.
 ;; Comparisons against immediates outside this range fall back to
@@ -916,80 +1003,6 @@ (define_insn_and_split "*aarch64_bcond_wide_imm"
   }
 )
 
-;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
-(define_insn "aarch64_cbz1"
-  [(set (pc)
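The priority order documented in the new comment block can be read as a decision procedure. CBZ/CBNZ reach about +/-1 MiB and TBZ/TBNZ about +/-32 KiB (the `BRANCH_LEN_*` bounds in the length attributes). Per the commit message, both reach further than the CMPBR forms, which is why they are tried first. The sketch below is a hypothetical model, not the md patterns themselves. In particular, `imm_in_range` stands in for the per-comparison immediate range check.

```cpp
#include <cassert>

enum class Cmp { EQ, NE, LT, GE, Other };
enum class Lowering { CBZ_CBNZ, TBZ_TBNZ, CB_Reg, CB_Imm, CMP_B };

// rhs_is_reg: compare against a register; otherwise against the immediate imm.
Lowering choose_lowering(Cmp code, bool rhs_is_reg, long imm,
                         bool imm_in_range, bool cmpbr) {
    const bool vs_zero = !rhs_is_reg && imm == 0;
    // 1) EQ/NE against zero: CBZ/CBNZ.
    if (vs_zero && (code == Cmp::EQ || code == Cmp::NE))
        return Lowering::CBZ_CBNZ;
    // 2) LT/GE against zero: TBNZ/TBZ on the sign bit.
    if (vs_zero && (code == Cmp::LT || code == Cmp::GE))
        return Lowering::TBZ_TBNZ;
    // 3a) CMPBR enabled, register vs register.
    if (cmpbr && rhs_is_reg)
        return Lowering::CB_Reg;
    // 3b) CMPBR enabled, register vs in-range immediate.
    if (cmpbr && !rhs_is_reg && imm_in_range)
        return Lowering::CB_Imm;
    // 4) Fallback: CMP followed by B.cond.
    return Lowering::CMP_B;
}
```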

Re: [Fortran, Patch, PR120843, v3] Fix reject valid, because of inconformable coranks

2025-07-02 Thread Jerry D

On 7/2/25 9:40 AM, Jerry D wrote:

On 7/2/25 3:14 AM, Andre Vehreschild wrote:

Hi all,

I successfully created a big mess with the previous patch: first by
applying an outdated one, and second by adding the conformance checks for
coranks in a3f1cdd8ed46f9816b31ab162ae4dac547d34ebc. Checking the standard, even
using AI (haha), to figure out whether coranks of an expression have restrictions
on them failed. I found nothing. AI fantasized about restrictions that did not
exist. Therefore the current approach is to remove the conformance check and
just use the computed coranks in expressions to prevent recomputation whenever
they are needed.

Jerry, Harald: Sorry for all the bother and all my mistakes. I am really sorry
to have wasted your time.

The patch has been regtested fine on x86_64-pc-linux-gnu / F41. Ok for mainline
and later backport to gcc-15?

Regards,
Andre


--- snip ---

With this fixer patch, I can successfully compile Toon's test case.

The patch also regression tests here OK.

OK to push.

Jerry


As a followup, with the fixer patch applied, OpenCoarrays builds.  However, make
test hangs at Test 4.


 3/88 Test  #3: register_vector ...   Passed    0.42 sec

  Start  4: register_alloc_vector
^Cmake[3]: *** [CMakeFiles/check.dir/build.make:70: CMakeFiles/check] Interrupt
make[2]: *** [CMakeFiles/Makefile2:962: CMakeFiles/check.dir/all] Interrupt
make[1]: *** [CMakeFiles/Makefile2:969: CMakeFiles/check.dir/rule] Interrupt
make: *** [Makefile:205: check] Interrupt

I waited about 20 minutes.  This may be another bug. I built and tested
OpenCoarrays about 2 days ago with gfortran 14 with no problems.


I am able to compile and run Toon's test case with this OpenCoarrays. So we 
still have some breakage on 15 and 16 remaining.  I have not tried building 
OpenCoarrays with the shmem patch applied yet. This will be my next step here.


Regards,

Jerry



[PATCH v9 3/9] AArch64: rename branch instruction rules

2025-07-02 Thread Karl Meakin
Give the `define_insn` rules used in lowering `cbranch4` to RTL
more descriptive and consistent names: from now on, each rule is named
after the AArch64 instruction that it generates. Also add comments to
document each rule.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Rename to ...
(aarch64_bcond): ...here.
(*compare_condjump): Rename to ...
(*aarch64_bcond_wide_imm): ...here.
(aarch64_cb): Rename to ...
(aarch64_cbz1): ...here.
(*cb1): Rename to ...
(*aarch64_tbz1): ...here.
(@aarch64_tb): Rename to ...
(@aarch64_tbz): ...here.
(restore_stack_nonlocal): Handle rename.
(stack_protect_combined_test): Likewise.
* config/aarch64/aarch64-simd.md (cbranch4): Likewise.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
---
 gcc/config/aarch64/aarch64-simd.md |  2 +-
 gcc/config/aarch64/aarch64-sme.md  |  2 +-
 gcc/config/aarch64/aarch64.cc  |  6 +++---
 gcc/config/aarch64/aarch64.md  | 23 +--
 4 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index af574d5bb0a..8de79caa86d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3966,7 +3966,7 @@ (define_expand "cbranch4"
 
   rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
   rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[3]));
   DONE;
 })
 
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index f7958c90eae..b8bb4cc14b6 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -391,7 +391,7 @@ (define_insn_and_split "aarch64_restore_za"
 auto label = gen_label_rtx ();
 auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM);
 emit_insn (gen_aarch64_read_tpidr2 (tpidr2));
-auto jump = emit_likely_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label));
+auto jump = emit_likely_jump_insn (gen_aarch64_cbznedi1 (tpidr2, label));
 JUMP_LABEL (jump) = label;
 
 aarch64_restore_za (operands[0]);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index abbb97768f5..2cd03b941bd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2884,10 +2884,10 @@ aarch64_gen_test_and_branch (rtx_code code, rtx x, int bitnum,
   emit_insn (gen_aarch64_and3nr_compare0 (mode, x, mask));
   rtx cc_reg = gen_rtx_REG (CC_NZVmode, CC_REGNUM);
   rtx x = gen_rtx_fmt_ee (code, CC_NZVmode, cc_reg, const0_rtx);
-  return gen_condjump (x, cc_reg, label);
+  return gen_aarch64_bcond (x, cc_reg, label);
 }
-  return gen_aarch64_tb (code, mode, mode,
-x, gen_int_mode (bitnum, mode), label);
+  return gen_aarch64_tbz (code, mode, mode,
+  x, gen_int_mode (bitnum, mode), label);
 }
 
 /* Consider the operation:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 25286add0c8..8ce991e2f35 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -749,7 +749,8 @@ (define_expand "cbranchcc4"
   ""
 )
 
-(define_insn "condjump"
+;; Emit `B`, assuming that the condition is already in the CC register.
+(define_insn "aarch64_bcond"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (const_int 0)])
@@ -789,7 +790,7 @@ (define_insn "condjump"
 ;; sub x0, x1, #(CST & 0xfff000)
;; subs x0, x0, #(CST & 0x000fff)
 ;; b .Label
-(define_insn_and_split "*compare_condjump"
+(define_insn_and_split "*aarch64_bcond_wide_imm"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(match_operand:GPI 1 "aarch64_imm24" "n"))
   (label_ref:P (match_operand 2))
@@ -809,12 +810,13 @@ (define_insn_and_split "*compare_condjump"
 rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
 rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
  cc_reg, const0_rtx);
-emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[2]));
 DONE;
   }
 )
 
-(define_insn "aarch64_cb1"
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(const_int 0))
   (label_ref (match_operand 1))
@@ -839,7 +841,8 @@ (define_insn "aarch64_cb1"
  (const_int 1)))]
 )
 
-(define_insn "*cb1"
+;; For an 

[PATCH v9 1/9] AArch64: place branch instruction rules together

2025-07-02 Thread Karl Meakin
The rules for conditional branches were spread throughout `aarch64.md`.
Group them together so it is easier to understand how `cbranch4`
is lowered to RTL.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Move.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 387 ++
 1 file changed, 201 insertions(+), 186 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e11e13033d2..fcc24e300e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -682,6 +682,10 @@ (define_insn "aarch64_write_sysregti"
  "msrr\t%x0, %x1, %H1"
 )
 
+;; ---
+;; Unconditional jumps
+;; ---
+
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
@@ -700,6 +704,12 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+
+
+;; ---
+;; Conditional jumps
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -739,6 +749,197 @@ (define_expand "cbranchcc4"
   ""
   "")
 
+(define_insn "condjump"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register" "") (const_int 0)])
+  (label_ref (match_operand 2 "" ""))
+  (pc)))]
+  ""
+  {
+/* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
+   but the "." is required for SVE conditions.  */
+bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond",
+use_dot_p ? "b.%M0\\t" : "b%M0\\t");
+else
+  return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 0)
+ (const_int 1)))]
+)
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov x0, #imm1
+;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp x1, x0
+;; b .Label
+;; into the shorter:
+;; sub x0, x1, #(CST & 0xfff000)
+;; subs x0, x0, #(CST & 0x000fff)
+;; b .Label
+(define_insn_and_split "*compare_condjump"
+  [(set (pc) (if_then_else (EQL
+ (match_operand:GPI 0 "register_operand" "r")
+ (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2 "" ""))
+  (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), mode)
+   && !aarch64_plus_operand (operands[1], mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+rtx tmp = gen_reg_rtx (mode);
+emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
+emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
+ cc_reg, const0_rtx);
+emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+DONE;
+  }
+)
+
+(define_insn "aarch64_cb1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minu
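The `length` attribute on `condjump` picks between a 4-byte `b.cond` and an 8-byte far-branch sequence. B.cond encodes a signed 19-bit word offset, about +/-1 MiB. Below is a minimal sketch that mirrors the pattern's bounds, including its exclusive upper bound of 1048572. `bcond_length` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the condjump "length" attribute: 4 bytes when the label is within
// [-1048576, 1048572) of the branch, otherwise 8 bytes for the far-branch
// sequence (an inverted b.cond around an unconditional b).
int bcond_length(std::int64_t offset) {
    if (offset >= -1048576 && offset < 1048572)
        return 4;
    return 8;
}
```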

[PATCH v9 6/9] AArch64: recognize `+cmpbr` option

2025-07-02 Thread Karl Meakin
Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
 gcc/config/aarch64/aarch64.h | 3 +++
 gcc/doc/invoke.texi  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index dbbb021f05a..1c3e69799f5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -249,6 +249,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("cmpbr", CMPBR, (), (), (), "cmpbr")
+
 AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
 
 AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e8bd8c73c12..d5c4a42e96d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -410,6 +410,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 /* CSSC instructions are enabled through +cssc.  */
 #define TARGET_CSSC AARCH64_HAVE_ISA (CSSC)
 
+/* CB instructions are enabled through +cmpbr.  */
+#define TARGET_CMPBR AARCH64_HAVE_ISA (CMPBR)
+
 /* Make sure this is always defined so we don't have to check for ifdefs
but rather use normal ifs.  */
 #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8163c3a185c..ea26f45bb4d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -22461,6 +22461,9 @@ Enable the FlagM2 flag conversion instructions.
 Enable the Pointer Authentication Extension.
 @item cssc
 Enable the Common Short Sequence Compression instructions.
+@item cmpbr
+Enable the shorter compare and branch instructions, @code{cbb}, @code{cbh} and
+@code{cb}.
 @item sme
 Enable the Scalable Matrix Extension.  This is only supported when SVE2 is also
 enabled.
-- 
2.48.1



[PATCH v9 9/9] AArch64: make rules for CBZ/TBZ higher priority

2025-07-02 Thread Karl Meakin
Move the rules for CBZ/TBZ to be above the rules for
CBB/CBH/CB. We want them to have higher priority
because they can express larger displacements.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cbz1): Move
above rules for CBB/CBH/CB.
(*aarch64_tbz1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md| 161 ---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c |  36 ++---
 2 files changed, 105 insertions(+), 92 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c50c41753a7..509ef4c0f2f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -724,6 +724,19 @@ (define_constants
 ;; Conditional jumps
 ;; ---
 
+;; The order of the rules below is important.
+;; Higher priority rules are preferred because they can express larger
+;; displacements.
+;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ.
+;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ.
+;; 3) When the CMPBR extension is enabled:
+;;   a) Comparisons between two registers are handled by
+;;  CBB/CBH/CB.
+;;   b) Comparisons between a GP register and an in-range immediate are
+;;  handled by CB (immediate).
+;; 4) Otherwise, emit a CMP+B sequence.
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -780,6 +793,80 @@ (define_expand "cbranchcc4"
   ""
 )
 
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
+(define_insn "*aarch64_tbz1"
+  [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
+(const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))
+   (clobber (reg:CC CC_REGNUM))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  {
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
+ return aarch64_gen_far_branch (operands, 1, "Ltb",
+"\\t%0, , ");
+   else
+ {
+   char buf[64];
+   uint64_t val = ((uint64_t) 1)
+   << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1);
+   sprintf (buf, "tst\t%%0, %" PRId64, val);
+   output_asm_insn (buf, operands);
+   return "\t%l1";
+ }
+  }
+else
+  return "\t%0, , %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
 ;; Emit a `CB (register)` or `CB (immediate)` instruction.
 ;; The immediate range depends on the comparison code.
 ;; Comparisons against immediates outside this range fall back to
@@ -916,80 +1003,6 @@ (define_insn_and_split "*aarch64_bcond_wide_imm"
   }
 )
 
-;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
-(define_insn "aarch64_cbz1"
-  [(set (pc
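
The priority list added at the top of the conditional-jump section can be read as a decision procedure. The C sketch below restates it for a 32/64-bit comparison. It is only a model under stated assumptions. The enums and `select_branch` are hypothetical. Whether an immediate suits CB (immediate) is passed in rather than computed, because in the patch that is the job of `aarch64_cb_rhs`. The 8/16-bit CBB/CBH paths are not modeled. In the md file itself the choice falls out of pattern order, which is why this patch moves the CBZ/TBZ rules up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not GCC types.  CMP_OTHER covers every
   comparison code without a zero-operand special case.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_GE, CMP_OTHER };
enum branch_seq { SEQ_CBZ, SEQ_TBZ, SEQ_CB_REG, SEQ_CB_IMM, SEQ_CMP_BCOND };

/* RHS_IS_REG: the right-hand side is a register.  Otherwise RHS_IMM is the
   constant and IMM_IN_CB_RANGE says whether CB (immediate) accepts it.  */
static enum branch_seq
select_branch (bool cmpbr, enum cmp_code code, bool rhs_is_reg,
               int64_t rhs_imm, bool imm_in_cb_range)
{
  bool vs_zero = !rhs_is_reg && rhs_imm == 0;

  /* 1) EQ/NE against zero: CBZ/CBNZ, +/-1MiB.  */
  if (vs_zero && (code == CMP_EQ || code == CMP_NE))
    return SEQ_CBZ;
  /* 2) LT/GE against zero: TBZ/TBNZ on the sign bit, +/-32KiB.  */
  if (vs_zero && (code == CMP_LT || code == CMP_GE))
    return SEQ_TBZ;
  /* 3) With CMPBR: CB (register) or CB (immediate), +/-1KiB.  */
  if (cmpbr && rhs_is_reg)
    return SEQ_CB_REG;
  if (cmpbr && imm_in_cb_range)
    return SEQ_CB_IMM;
  /* 4) Otherwise: CMP followed by B.cond.  */
  return SEQ_CMP_BCOND;
}
```

For example, `x == 0` still selects CBZ even with CMPBR enabled, although 0 is also a valid CB immediate, because CBZ reaches much further.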

[PATCH v9 8/9] AArch64: rules for CMPBR instructions

2025-07-02 Thread Karl Meakin
Add rules for lowering `cbranch4` to CBB/CBH/CB when
CMPBR extension is enabled.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
* config/aarch64/aarch64.md (cbranch4): Rename to ...
(cbranch4): ...here, and emit CMPBR if possible.
(cbranch4): New expand rule.
(aarch64_cb): New insn rule.
(aarch64_cb): Likewise.
* config/aarch64/constraints.md (Uc0): New constraint.
(Uc1): Likewise.
(Uc2): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
(INT_CMP): New code iterator.
(cmpbr_imm_constraint): New code attr.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c:
---
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64.cc|  33 ++
 gcc/config/aarch64/aarch64.md|  95 ++-
 gcc/config/aarch64/constraints.md|  18 +
 gcc/config/aarch64/iterators.md  |  30 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 719 +--
 6 files changed, 450 insertions(+), 447 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 31f2f5b8bd2..e946e8da11d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, vec,
 unsigned int, tree, unsigned int,
 tree *);
 
+bool aarch64_cb_rhs (rtx_code op_code, rtx rhs);
+
 namespace aarch64 {
   void report_non_ice (location_t, tree, unsigned int);
   void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2cd03b941bd..f3ce3a15b09 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern)
   gcc_unreachable ();
 }
 
+/* Return true if RHS is an operand suitable for a CB (immediate)
+   instruction.  OP_CODE determines the type of the comparison.  */
+bool
+aarch64_cb_rhs (rtx_code op_code, rtx rhs)
+{
+  if (!CONST_INT_P (rhs))
+return REG_P (rhs);
+
+  HOST_WIDE_INT rhs_val = INTVAL (rhs);
+
+  switch (op_code)
+{
+case EQ:
+case NE:
+case GT:
+case GTU:
+case LT:
+case LTU:
+  return IN_RANGE (rhs_val, 0, 63);
+
+case GE:  /* CBGE:   signed greater than or equal */
+case GEU: /* CBHS: unsigned greater than or equal */
+  return IN_RANGE (rhs_val, 1, 64);
+
+case LE:  /* CBLE:   signed less than or equal */
+case LEU: /* CBLS: unsigned less than or equal */
+  return IN_RANGE (rhs_val, -1, 62);
+
+default:
+  return false;
+}
+}
+
 /* Return the location of a piece that is known to be passed or returned
in registers.  FIRST_ZR is the first unused vector argument register
and FIRST_PR is the first unused predicate argument register.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0169ec5cf24..c50c41753a7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -713,6 +713,10 @@ (define_constants
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
+
+;; +/- 1KiB.  Used by CBB, CBH, CB.
+(BRANCH_LEN_P_1Kib  1020)
+(BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
@@ -720,7 +724,7 @@ (define_constants
 ;; Conditional jumps
 ;; ---
 
-(define_expand "cbranch4"
+(define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
@@ -728,12 +732,29 @@ (define_expand "cbranch4"
   (pc)))]
   ""
   {
-operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-  operands[2]);
-operands[2] = const0_rtx;
+if (TARGET_CMPBR && aarch64_cb_rhs (GET_CODE (operands[0]), operands[2]))
+  {
+   /* The branch is supported natively.  */
+  }
+else
+  {
+operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+  operands[1], operands[2]);
+operands[2] = const0_rtx;
+  }
   }
 )
 
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:SHORT 1 "register_operand")
+(match_operand:SHORT 2 "aarch64_reg_or_zero")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  ""
+)
+
 (define_expand "cbranch4"
   [(set (pc) 
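
The immediate ranges in `aarch64_cb_rhs` appear to follow from rewriting the non-strict comparisons as strict ones. `x >= imm` is `x > imm - 1` and `x <= imm` is `x < imm + 1`, so shifting the 0..63 range accepted for GT/LT gives 1..64 for GE and -1..62 for LE. The C sketch below mirrors only the CONST_INT branch of that function and omits the register case. `enum cmp_code` is a hypothetical stand-in for GCC's `rtx_code`, and the rewrite comments are stated for signed comparisons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rtx_code values handled by aarch64_cb_rhs.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_GTU, CMP_LT, CMP_LTU,
                CMP_GE, CMP_GEU, CMP_LE, CMP_LEU };

/* Mirrors the immediate ranges of aarch64_cb_rhs.  */
static bool
cb_imm_ok (enum cmp_code code, int64_t imm)
{
  switch (code)
    {
    case CMP_EQ: case CMP_NE: case CMP_GT: case CMP_GTU:
    case CMP_LT: case CMP_LTU:
      return imm >= 0 && imm <= 63;
    case CMP_GE: case CMP_GEU:
      /* For signed x, x >= imm is x > imm - 1: GT's 0..63 becomes 1..64.  */
      return imm >= 1 && imm <= 64;
    case CMP_LE: case CMP_LEU:
      /* For signed x, x <= imm is x < imm + 1: LT's 0..63 becomes -1..62.  */
      return imm >= -1 && imm <= 62;
    }
  return false;
}
```

For example, `x >= 0` is not a CB (immediate) candidate at all. That matches the test comment that comparisons against zero should prefer CBZ/CBNZ or TBZ/TBNZ.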

[PATCH v9 7/9] AArch64: precommit test for CMPBR instructions

2025-07-02 Thread Karl Meakin
Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1991 ++
 gcc/testsuite/lib/target-supports.exp|   14 +-
 2 files changed, 1999 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..83ca348bfc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1991 @@
+// Test that the instructions added by FEAT_CMPBR are emitted
+// { dg-do compile }
+// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } }
+// { dg-options "-march=armv9.5-a+cmpbr -O2" }
+// { dg-final { check-function-bodies "**" "*/" "" { target *-*-* } {\.L[0-9]+} } }
+
+#include 
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs) \
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) { \
+    return __builtin_expect(x0 op rhs, 0) ? taken() : not_taken(); \
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs) \
+  COMPARE(unsigned_ty, eq, ==, rhs); \
+  COMPARE(unsigned_ty, ne, !=, rhs); \
+ \
+  COMPARE(unsigned_ty, ult, <, rhs); \
+  COMPARE(unsigned_ty, ule, <=, rhs); \
+  COMPARE(unsigned_ty, ugt, >, rhs); \
+  COMPARE(unsigned_ty, uge, >=, rhs); \
+ \
+  COMPARE(signed_ty, slt, <, rhs); \
+  COMPARE(signed_ty, sle, <=, rhs); \
+  COMPARE(signed_ty, sgt, >, rhs); \
+  COMPARE(signed_ty, sge, >=, rhs);
+
+//  CBB (register) 
+COMPARE_ALL(u8, i8, x1);
+
+//  CBH (register) 
+COMPARE_ALL(u16, i16, x1);
+
+//  CB (register) 
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+//  CB (immediate) 
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+//  Special cases 
+// Comparisons against the immediate 0 can be done for all types,
+// because we can use the wzr/xzr register as one of the operands.
+// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible,
+// because they have a larger range.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+// CBB and CBH cannot have immediate operands.
+// Instead we have to do a MOV+CB.
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 64 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead,
+//   because B has a longer range than CB.
+COMPARE_ALL(u8, i8, 64);
+COMPARE_ALL(u16, i16, 64);
+COMPARE_ALL(u32, i32, 64);
+COMPARE_ALL(u64, i64, 64);
+
+// 4098 is out of the range for CMP (0 to 4095, optionally shifted left by 12
+// bits), but it can be materialized in a single MOV.
+COMPARE_ALL(u16, i16, 4098);
+COMPARE_ALL(u32, i32, 4098);
+COMPARE_ALL(u64, i64, 4098);
+
+// If the branch destination is out of range (1KiB), we have to generate an
+// extra B instruction (which can handle larger displacements) and branch
+// around it.
+
+// clang-format off
+#define STORE_1()   z = 0;
+#define STORE_2()   STORE_1()   STORE_1()
+#define STORE_4()   STORE_2()   STORE_2()
+#define STORE_8()   STORE_4()   STORE_4()
+#define STORE_16()  STORE_8()   STORE_8()
+#define STORE_32()  STORE_16()  STORE_16()
+#define STORE_64()  STORE_32()  STORE_32()
+#define STORE_128() STORE_64()  STORE_64()
+#define STORE_256() STORE_128() STORE_128()
+// clang-format on
+
+#define FAR_BRANCH(ty, rhs) \
+  int far_branch_##ty##_x0_eq_##rhs(ty x0, ty x1) { \
+    volatile int z = 0; \
+    if (__builtin_expect(x0 == rhs, 1)) { \
+      STORE_256(); \
+}  
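
The sketch below makes the naming scheme of the generated test functions concrete. It is a small standalone C program that reuses the shape of the `COMPARE` macro. The bodies of `taken` and `not_taken` are assumptions for illustration, since the test file only declares them. `COMPARE (u8, eq, ==, x1)` defines `u8_x0_eq_x1`, and `COMPARE (i32, sge, >=, 42)` defines `i32_x0_sge_42`.

```c
#include <stdint.h>

typedef uint8_t u8;
typedef int32_t i32;

/* Stub bodies: an assumption for this sketch; cmpbr.c only declares these.  */
static int taken (void) { return 1; }
static int not_taken (void) { return 0; }

/* Same shape as the COMPARE macro in cmpbr.c.  */
#define COMPARE(ty, name, op, rhs) \
  int ty##_x0_##name##_##rhs (ty x0, ty x1) \
  { \
    return __builtin_expect (x0 op rhs, 0) ? taken () : not_taken (); \
  }

COMPARE (u8, eq, ==, x1)   /* defines u8_x0_eq_x1 */
COMPARE (i32, sge, >=, 42) /* defines i32_x0_sge_42 */
```

The far-branch tests rest on simple arithmetic. `STORE_256` expands to 256 stores to a `volatile` object. If each store becomes at least one 4-byte A64 instruction, the guarded block is at least 1024 bytes long, which is beyond the 1KiB reach of CBB/CBH/CB.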

[PATCH v9 4/9] AArch64: add constants for branch displacements

2025-07-02 Thread Karl Meakin
Extract the hardcoded values for the maximum positive and negative PC-relative
displacements into named constants and document them.

gcc/ChangeLog:

* config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
(BRANCH_LEN_N_128MiB): Likewise.
(BRANCH_LEN_P_1MiB): Likewise.
(BRANCH_LEN_N_1MiB): Likewise.
(BRANCH_LEN_P_32KiB): Likewise.
(BRANCH_LEN_N_32KiB): Likewise.
---
 gcc/config/aarch64/aarch64.md | 60 +--
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8ce991e2f35..3f37ea6cff7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -704,7 +704,19 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+;; Maximum PC-relative positive/negative displacements for various branching
+;; instructions.
+(define_constants
+  [
+;; +/- 1MiB.  Used by B., CBZ, CBNZ.
+(BRANCH_LEN_P_1MiB  1048572)
+(BRANCH_LEN_N_1MiB -1048576)
 
+;; +/- 32KiB.  Used by TBZ, TBNZ.
+(BRANCH_LEN_P_32KiB  32764)
+(BRANCH_LEN_N_32KiB -32768)
+  ]
+)
 
 ;; ---
 ;; Conditional jumps
@@ -769,13 +781,17 @@ (define_insn "aarch64_bcond"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -830,13 +846,17 @@ (define_insn "aarch64_cbz1"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -870,13 +890,17 @@ (define_insn "*aarch64_tbz1"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768))
-  (lt (minus (match_dup 1) (pc)) (const_int 32764)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -931,13 +955,17 @@ (define_insn "@aarch64_tbz"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
-  (lt (minus (match_dup 2) (pc)) (const_int 32764)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_i
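
The constants follow from the offset widths of the A64 branch encodings. A signed N-bit offset counts 4-byte instructions, so its reach is -2^(N-1)*4 to (2^(N-1)-1)*4 bytes. The C sketch below reproduces the values above, assuming a 19-bit offset for B.cond/CBZ/CBNZ and a 14-bit offset for TBZ/TBNZ. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: byte reach of a PC-relative branch whose offset is
   a signed IMM_BITS-bit count of 4-byte instructions.  */
static int64_t
branch_max_bytes (unsigned imm_bits)
{
  return (((int64_t) 1 << (imm_bits - 1)) - 1) * 4;
}

static int64_t
branch_min_bytes (unsigned imm_bits)
{
  return -((int64_t) 1 << (imm_bits - 1)) * 4;
}
```

`branch_max_bytes (19)` gives 1048572 and `branch_min_bytes (19)` gives -1048576, matching `BRANCH_LEN_P_1MiB`/`BRANCH_LEN_N_1MiB`. With 14 bits the results are 32764 and -32768, matching the 32KiB pair. With 9 bits the results are 1020 and -1024, the values later used for CBB/CBH/CB. That is consistent with a 9-bit offset for those instructions.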

[PATCH v9 5/9] AArch64: make `far_branch` attribute a boolean

2025-07-02 Thread Karl Meakin
The `far_branch` attribute only ever takes the values 0 or 1, so make it
a `no/yes` valued string attribute instead.

gcc/ChangeLog:

* config/aarch64/aarch64.md (far_branch): Replace 0/1 with
no/yes.
(aarch64_bcond): Handle rename.
(aarch64_cbz1): Likewise.
(*aarch64_tbz1): Likewise.
(@aarch64_tbz): Likewise.
---
 gcc/config/aarch64/aarch64.md | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3f37ea6cff7..0169ec5cf24 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -569,9 +569,7 @@ (define_attr "enabled" "no,yes"
 ;; Attribute that specifies whether we are dealing with a branch to a
 ;; label that is far away, i.e. further away than the maximum/minimum
 ;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
+(define_attr "far_branch" "no,yes" (const_string "no"))
 
 ;; Attribute that specifies whether the alternative uses MOVPRFX.
 (define_attr "movprfx" "no,yes" (const_string "no"))
@@ -792,8 +790,8 @@ (define_insn "aarch64_bcond"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
@@ -857,8 +855,8 @@ (define_insn "aarch64_cbz1"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
@@ -872,7 +870,7 @@ (define_insn "*aarch64_tbz1"
   {
 if (get_attr_length (insn) == 8)
   {
-   if (get_attr_far_branch (insn) == 1)
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
  return aarch64_gen_far_branch (operands, 1, "Ltb",
 "\\t%0, , ");
else
@@ -901,8 +899,8 @@ (define_insn "*aarch64_tbz1"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; ---
@@ -966,8 +964,8 @@ (define_insn "@aarch64_tbz"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 
 )
 
-- 
2.48.1
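
With `far_branch` as a `no/yes` attribute, the output code of `*aarch64_tbz1` reads as a three-way choice. It emits a single TBZ/TBNZ when the target is within the 32KiB range. Within the 1MiB range it emits a TST of the sign bit followed by a conditional branch. Otherwise it uses the `aarch64_gen_far_branch` sequence. The C sketch below models that decision. `enum far_branch_attr` is a hand-written stand-in for the value `get_attr_far_branch` returns, and the other names are hypothetical. The bounds are the BRANCH_LEN constants from patch 4/9, with an inclusive lower bound and an exclusive upper bound as in the `ge`/`lt` conditions.

```c
#include <stdint.h>

/* Stand-in for the attribute enum; the patch compares against FAR_BRANCH_YES.  */
enum far_branch_attr { FAR_BRANCH_NO, FAR_BRANCH_YES };
enum tbz_output { OUT_TBZ, OUT_TST_BCOND, OUT_FAR_BRANCH };

/* Same shape as the length/far_branch conditions.  */
static int
within (int64_t disp, int64_t lo, int64_t hi)
{
  return disp >= lo && disp < hi;
}

/* Models the output choice of *aarch64_tbz1 for a displacement DISP.  */
static enum tbz_output
tbz_output_for (int64_t disp)
{
  int length = within (disp, -32768, 32764) ? 4 : 8;           /* 32KiB */
  enum far_branch_attr far = within (disp, -1048576, 1048572)  /* 1MiB */
                             ? FAR_BRANCH_NO : FAR_BRANCH_YES;
  if (length == 4)
    return OUT_TBZ;
  if (far == FAR_BRANCH_YES)
    return OUT_FAR_BRANCH;
  return OUT_TST_BCOND;
}

/* The TST immediate: the sign bit of a MODE_BYTES-byte mode.  */
static uint64_t
tst_sign_mask (unsigned mode_bytes)
{
  return (uint64_t) 1 << (mode_bytes * 8 - 1);
}
```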



[PATCH v9 2/9] AArch64: reformat branch instruction rules

2025-07-02 Thread Karl Meakin
Make the formatting of the RTL templates in the rules for branch
instructions more consistent with each other.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Reformat.
(cbranchcc4): Likewise.
(condjump): Likewise.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 84 +--
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fcc24e300e6..25286add0c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -714,14 +714,14 @@ (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
-  (label_ref (match_operand 3 "" ""))
+  (label_ref (match_operand 3))
   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+  operands[2]);
+operands[2] = const0_rtx;
+  }
 )
 
 (define_expand "cbranch4"
@@ -729,30 +729,31 @@ (define_expand "cbranch4"
(match_operator 0 "aarch64_comparison_operator"
 [(match_operand:GPF_F16 1 "register_operand")
  (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
-   (label_ref (match_operand 3 "" ""))
+   (label_ref (match_operand 3))
(pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+  operands[2]);
+operands[2] = const0_rtx;
+  }
 )
 
 (define_expand "cbranchcc4"
-  [(set (pc) (if_then_else
- (match_operator 0 "aarch64_comparison_operator"
-  [(match_operand 1 "cc_register")
-   (match_operand 2 "const0_operand")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register")
+(match_operand 2 "const0_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "")
+  ""
+)
 
 (define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-   [(match_operand 1 "cc_register" "") (const_int 0)])
-  (label_ref (match_operand 2 "" ""))
+   [(match_operand 1 "cc_register")
+(const_int 0)])
+  (label_ref (match_operand 2))
   (pc)))]
   ""
   {
@@ -789,10 +790,9 @@ (define_insn "condjump"
 ;; subs x0, x0, #(CST & 0x000fff)
 ;; b .Label
 (define_insn_and_split "*compare_condjump"
-  [(set (pc) (if_then_else (EQL
- (match_operand:GPI 0 "register_operand" "r")
- (match_operand:GPI 1 "aarch64_imm24" "n"))
-  (label_ref:P (match_operand 2 "" ""))
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2))
   (pc)))]
   "!aarch64_move_imm (INTVAL (operands[1]), mode)
&& !aarch64_plus_operand (operands[1], mode)
@@ -816,8 +816,8 @@ (define_insn_and_split "*compare_condjump"
 
 (define_insn "aarch64_cb1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
-   (const_int 0))
-  (label_ref (match_operand 1 "" ""))
+   (const_int 0))
+  (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
@@ -841,8 +841,8 @@ (define_insn "aarch64_cb1"
 
 (define_insn "*cb1"
   [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
-(const_int 0))
-  (label_ref (match_operand 1 "" ""))
+(const_int 0))
+  (label_ref (match_operand 1))
   (pc)))
(clobber (reg:CC CC_REGNUM))]
   "!aarch64_track_speculation

[PATCH v9 0/9] AArch64: CMPBR support

2025-07-02 Thread Karl Meakin
This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Changelog:
* v9:
  - Mark the non-far branches unlikely, so that the branch is consistently generated as:
  ```asm
branch-if-true .L123
b  not_taken
.L123:
b  taken
  ```
* v8:
  - Support far branches for the `CBB` and `CBH` instructions, and add tests for them.
  - Mark the branch in the far branch tests likely, so that the optimizer does
not invert the condition.
  - Use regex captures for register and label names so that the tests are less fragile.
  - Minor formatting fixes.
* v7:
  - Support far branches and add a test for them.
  - Replace `aarch64_cb_short_operand` with `aarch64_reg_or_zero_operand`.
  - Delete the new predicates that aren't needed anymore.
  - Minor formatting and comment fixes.
* v6:
  - Correct the constraint string for immediate operands.
  - Drop the commit for adding `%j` format specifiers. The suffix for
the `cb` instruction is now calculated by the `cmp_op` code
attribute.
* v5:
  - Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for
CMPBR...). Every commit in the series should now produce a correct
compiler.
  - Reduce excessive diff context by not passing `--function-context` to
`git format-patch`.
* v4:
  - Added a commit to use HS/LO instead of CS/CC mnemonics.
  - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE,
CBHS, CBLE and CBLS have different ranges of allowed immediates than
the other comparisons.

Karl Meakin (9):
  AArch64: place branch instruction rules together
  AArch64: reformat branch instruction rules
  AArch64: rename branch instruction rules
  AArch64: add constants for branch displacements
  AArch64: make `far_branch` attribute a boolean
  AArch64: recognize `+cmpbr` option
  AArch64: precommit test for CMPBR instructions
  AArch64: rules for CMPBR instructions
  AArch64: make rules for CBZ/TBZ higher priority

 .../aarch64/aarch64-option-extensions.def |2 +
 gcc/config/aarch64/aarch64-protos.h   |2 +
 gcc/config/aarch64/aarch64-simd.md|2 +-
 gcc/config/aarch64/aarch64-sme.md |2 +-
 gcc/config/aarch64/aarch64.cc |   39 +-
 gcc/config/aarch64/aarch64.h  |3 +
 gcc/config/aarch64/aarch64.md |  570 --
 gcc/config/aarch64/constraints.md |   18 +
 gcc/config/aarch64/iterators.md   |   30 +
 gcc/doc/invoke.texi   |3 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1824 +
 gcc/testsuite/lib/target-supports.exp |   14 +-
 12 files changed, 2285 insertions(+), 224 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

--
2.48.1


Re: [PATCH] x86-64: Add RDI clobber to tls_local_dynamic_64 patterns

2025-07-02 Thread Uros Bizjak
On Thu, Jul 3, 2025 at 6:32 AM H.J. Lu  wrote:
>
> *tls_local_dynamic_64_ uses RDI as the __tls_get_addr argument.
> Add RDI clobber to tls_local_dynamic_64 patterns to show it.
>
> PR target/120908
> * config/i386/i386.cc (legitimize_tls_address): Pass RDI to
> gen_tls_local_dynamic_64.
> * config/i386/i386.md (*tls_local_dynamic_64_): Add RDI
> clobber and use it to generate LEA.
> (@tls_local_dynamic_64_): Add a clobber.

*tls_local_dynamic_base_64_largepic needs the same treatment.

> OK for master?

OK with *tls_local_dynamic_base_64_largepic also fixed.

Thanks,
Uros.


Re: [PATCH v3] tree-optimization/120780: Support object size for containing objects

2025-07-02 Thread Richard Biener
On Wed, Jul 2, 2025 at 11:32 PM Siddhesh Poyarekar  wrote:
>
> MEM_REF cast of a subobject to its containing object has negative
> offsets, which objsz sees as an invalid access.  Support this use case
> by peeking into the structure to validate that the containing object
> indeed contains a type of the subobject at that offset and if present,
> adjust the wholesize for the object to allow the negative offset.

This variant works for me.

> gcc/ChangeLog:
>
> PR tree-optimization/120780
> * tree-object-size.cc (inner_at_offset,
> get_wholesize_for_memref): New functions.
> (addr_object_size): Call GET_WHOLESIZE_FOR_MEMREF.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/120780
> * gcc.dg/builtin-dynamic-object-size-pr120780.c: New test case.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
> Changes from v2:
> * Skip over sub-byte offsets
>
> Changes from v1:
> * Use byte_position to get byte position of a field
>
> Testing:
> - x86_64 bootstrap and test
> - i686 build and test
> - config=ubsan bootstrap
>
>  .../builtin-dynamic-object-size-pr120780.c| 233 ++
>  gcc/tree-object-size.cc   |  90 ++-
>  2 files changed, 322 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
>
> diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
> new file mode 100644
> index 000..0d6593ec828
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
> @@ -0,0 +1,233 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include "builtin-object-size-common.h"
> +typedef __SIZE_TYPE__ size_t;
> +#define NUM_MCAST_RATE 6
> +
> +#define MIN(a,b) ((a) < (b) ? (a) : (b))
> +#define MAX(a,b) ((a) > (b) ? (a) : (b))
> +
> +struct inner
> +{
> +  int dummy[4];
> +};
> +
> +struct container
> +{
> +  int mcast_rate[NUM_MCAST_RATE];
> +  struct inner mesh;
> +};
> +
> +static void
> +test1_child (struct inner *ifmsh, size_t expected)
> +{
> +  struct container *sdata =
> +(struct container *) ((void *) ifmsh
> + - __builtin_offsetof (struct container, mesh));
> +
> +  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
> +  != sizeof (sdata->mcast_rate))
> +FAIL ();
> +
> +  if (__builtin_dynamic_object_size (&sdata->mesh, 1) != expected)
> +FAIL ();
> +}
> +
> +void
> +__attribute__((noinline))
> +test1 (size_t sz)
> +{
> +  struct container *sdata = __builtin_malloc (sz);
> +  struct inner *ifmsh = &sdata->mesh;
> +
> +  test1_child (ifmsh,
> +  (sz > sizeof (sdata->mcast_rate)
> +   ? sz - sizeof (sdata->mcast_rate) : 0));
> +
> +  __builtin_free (sdata);
> +}
> +
> +struct container2
> +{
> +  int mcast_rate[NUM_MCAST_RATE];
> +  union
> +{
> +  int dummy;
> +  double dbl;
> +  struct inner mesh;
> +} u;
> +};
> +
> +static void
> +test2_child (struct inner *ifmsh, size_t sz)
> +{
> +  struct container2 *sdata =
> +(struct container2 *) ((void *) ifmsh
> +  - __builtin_offsetof (struct container2, u.mesh));
> +
> +  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
> +  != sizeof (sdata->mcast_rate))
> +FAIL ();
> +
> +  size_t diff = sizeof (*sdata) - sz;
> +  size_t expected = MIN(sizeof (double), MAX (sizeof (sdata->u), diff) - diff);
> +
> +  if (__builtin_dynamic_object_size (&sdata->u.dbl, 1) != expected)
> +FAIL ();
> +
> +  expected = MAX (sizeof (sdata->u.mesh), diff) - diff;
> +  if (__builtin_dynamic_object_size (&sdata->u.mesh, 1) != expected)
> +FAIL ();
> +}
> +
> +void
> +__attribute__((noinline))
> +test2 (size_t sz)
> +{
> +  struct container2 *sdata = __builtin_malloc (sz);
> +  struct inner *ifmsh = &sdata->u.mesh;
> +
> +  test2_child (ifmsh, sz);;
> +
> +  __builtin_free (sdata);
> +}
> +
> +struct container3
> +{
> +  int mcast_rate[NUM_MCAST_RATE];
> +  char mesh[8];
> +};
> +
> +static void
> +test3_child (char ifmsh[], size_t expected)
> +{
> +  struct container3 *sdata =
> +(struct container3 *) ((void *) ifmsh
> +  - __builtin_offsetof (struct container3, mesh));
> +
> +  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
> +  != sizeof (sdata->mcast_rate))
> +FAIL ();
> +
> +  if (__builtin_dynamic_object_size (sdata->mesh, 1) != expected)
> +FAIL ();
> +}
> +
> +void
> +__attribute__((noinline))
> +test3 (size_t sz)
> +{
> +  struct container3 *sdata = __builtin_malloc (sz);
> +  char *ifmsh = sdata->mesh;
> +  size_t diff = sizeof (*sdata) - sz;
> +
> +  test3_child (ifmsh, MAX(sizeof (sdata->mesh), diff) - diff);
> +
> +  __builtin_free (sdata);
> +}
> +
> +
> +struct container4
> +{
> +  int mcast_rate[NUM_MCAST_RATE];
> +  struct
> +{
> +  int dummy;
> +  struct inner mesh;
> +} s;
> +};
> +
> +static void
> +test4_chi

Re: [PATCH v4 2/6] dwarf: create annotation DIEs for btf tags

2025-07-02 Thread Richard Biener
On Wed, Jul 2, 2025 at 7:17 PM David Faust  wrote:
>
>
>
> On 7/2/25 00:35, Richard Biener wrote:
> > On Tue, Jul 1, 2025 at 11:20 PM David Faust  wrote:
> >>
> >>
> >>
> >> On 7/1/25 01:02, Richard Biener wrote:
> >>> On Mon, Jun 30, 2025 at 9:12 PM David Faust  wrote:
> 
> 
> 
>  On 6/30/25 06:11, Richard Biener wrote:
> >> +static void
> >> +gen_btf_decl_tag_dies (tree t, dw_die_ref target, dw_die_ref context_die)
> >> +{
> >> +  if (t == NULL_TREE || !DECL_P (t) || !target)
> >> +return;
> >> +
> >> +  tree attr = lookup_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
> >> +  if (attr == NULL_TREE)
> >> +return;
> >> +
> >> +  gen_btf_tag_dies (attr, target, context_die);
> >> +
> >> +  /* Strip the decl tag attribute once we have created the annotation DIEs
> >> + to avoid attempting process it multiple times.  Global variable
> >> + declarations may reach this function more than once.  */
> >> +  DECL_ATTRIBUTES (t)
> >> += remove_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
> > I do not like modifying trees as part of dwarf2out.  You should be able to
> > see whether a DIE already has the respective attribute applied?
> 
>  Yes, you're right. For decl_tag the case is simple and better handled by
>  consulting the hash table. Simple fix and this remove_attribute can be
>  deleted.
> 
>  Understood re: modifying trees in dwarf2out. I agree it's not ideal.
> 
>  For this case the remove_attribute can be deleted. For the two below,
>  one is already immediately restored and the other could be as well so
>  that there are no lasting changes in the tree at all.
> 
>  I will explain the reasoning some more below.
> 
> >
> >> +}
> >> +
> >>  /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
> >> entry that chains the modifiers specified by CV_QUALS in front of the
> >> given type.  REVERSE is true if the type is to be interpreted in the
> >> @@ -13674,6 +13894,7 @@ modified_type_die (tree type, int cv_quals, bool reverse,
> >>tree item_type = NULL;
> >>tree qualified_type;
> >>tree name, low, high;
> >> +  tree tags;
> >>dw_die_ref mod_scope;
> >>struct array_descr_info info;
> >>/* Only these cv-qualifiers are currently handled.  */
> >> @@ -13783,10 +14004,62 @@ modified_type_die (tree type, int cv_quals, bool reverse,
> >>   dquals &= cv_qual_mask;
> >>   if ((dquals & ~cv_quals) != TYPE_UNQUALIFIED
> >>   || (cv_quals == dquals && DECL_ORIGINAL_TYPE (name) != type))
> >> -   /* cv-unqualified version of named type.  Just use
> >> -  the unnamed type to which it refers.  */
> >> -   return modified_type_die (DECL_ORIGINAL_TYPE (name), cv_quals,
> >> - reverse, context_die);
> >> +   {
> >> + tree dtags = lookup_attribute ("btf_type_tag",
> >> +TYPE_ATTRIBUTES (dtype));
> >> + if ((tags = lookup_attribute ("btf_type_tag",
> >> +   TYPE_ATTRIBUTES (type)))
> >> + && !attribute_list_equal (tags, dtags))
> >> +   {
> >> + /* Use of a typedef with additional btf_type_tags.
> >> +Create a new typedef DIE to which we can attach the
> >> +additional type_tag DIEs without disturbing other users of
> >> +the underlying typedef.  */
> >> + dw_die_ref mod_die = modified_type_die (dtype, cv_quals,
> >> + reverse, context_die);
> >> + mod_die = clone_die (mod_die);
> >> + add_child_die (comp_unit_die (), mod_die);
> >> + if (!lookup_type_die (type))
> >> +   equate_type_number_to_die (type, mod_die);
> >> +
> >> + /* 'tags' is an accumulated list of type_tag attributes
> >> +for the typedef'd type on both sides of the typedef.
> >> +'dtags' is the set of type_tag attributes only appearing
> >> +in the typedef itself.
> >> +Find the set of type_tags only on the _use_ of the
> >> +typedef, i.e. (tags - dtags).  By construction these
> >> +additional type_tags have been chained onto the head of
> >> +the attribute list of the o
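
The patch comment quoted above computes the type_tags that appear only on the use of a typedef as (tags - dtags), relying on those extra tags having been chained onto the head of the attribute list. A minimal sketch of that idea, assuming a plain singly linked list in place of GCC's tree attribute chains (the `attr` struct and `tags_only_on_use` are hypothetical, not GCC API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an attribute chain (not GCC's tree API).
struct attr
{
  const char *value;
  const attr *next;
};

// Return the entries of TAGS that come before DTAGS.  This assumes DTAGS is
// a suffix of TAGS, i.e. the use-site tags were chained onto the head of the
// list, so walking until we reach DTAGS yields exactly (tags - dtags).
std::vector<const char *>
tags_only_on_use (const attr *tags, const attr *dtags)
{
  std::vector<const char *> extra;
  for (const attr *a = tags; a != nullptr && a != dtags; a = a->next)
    extra.push_back (a->value);
  return extra;
}
```

If DTAGS were not actually a suffix of TAGS, the loop would run to the end and return every tag, which is why the head-chaining property the comment mentions matters.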

[PATCH] x86-64: Add RDI clobber to tls_global_dynamic_64 patterns

2025-07-02 Thread H.J. Lu
*tls_global_dynamic_64_ uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_global_dynamic_64 patterns to show it.

PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_global_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_): Add RDI
clobber and use it to generate LEA.
(@tls_global_dynamic_64_): Add a clobber.

OK for master?

Thanks.

-- 
H.J.
From fb6b52e78caf70b2e3f9939952bda0604295cfce Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 1 Jul 2025 17:17:06 +0800
Subject: [PATCH] x86-64: Add RDI clobber to tls_global_dynamic_64 patterns

*tls_global_dynamic_64_ uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_global_dynamic_64 patterns to show it.

	PR target/120908
	* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
	gen_tls_global_dynamic_64.
	* config/i386/i386.md (*tls_global_dynamic_64_): Add RDI
	clobber and use it to generate LEA.
	(@tls_global_dynamic_64_): Add a clobber.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 3 ++-
 gcc/config/i386/i386.md | 8 +---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 44763c8eb01..9657c6ae31f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -12562,11 +12562,12 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  if (TARGET_64BIT)
 	{
 	  rtx rax = gen_rtx_REG (Pmode, AX_REG);
+	  rtx rdi = gen_rtx_REG (Pmode, DI_REG);
 	  rtx_insn *insns;
 
 	  start_sequence ();
 	  emit_call_insn
-		(gen_tls_global_dynamic_64 (Pmode, rax, x, caddr));
+		(gen_tls_global_dynamic_64 (Pmode, rax, x, caddr, rdi));
 	  insns = end_sequence ();
 
 	  if (GET_MODE (x) != Pmode)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index adff2af4563..370e79bb511 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23201,7 +23201,8 @@ (define_insn "*tls_global_dynamic_64_"
 	 (match_operand 3)))
(unspec:P [(match_operand 1 "tls_symbolic_operand")
 	  (reg:P SP_REG)]
-	 UNSPEC_TLS_GD)]
+	 UNSPEC_TLS_GD)
+   (clobber (match_operand:P 4 "register_operand" "=D"))]
   "TARGET_64BIT"
 {
   if (!TARGET_X32)
@@ -23218,7 +23219,7 @@ (define_insn "*tls_global_dynamic_64_"
Use data16 prefix instead, which doesn't have this problem.  */
 fputs ("\tdata16", asm_out_file);
   output_asm_insn
-("lea{q}\t{%E1@tlsgd(%%rip), %%rdi|rdi, %E1@tlsgd[rip]}", operands);
+("lea{q}\t{%E1@tlsgd(%%rip), %q4|%q4, %E1@tlsgd[rip]}", operands);
   if (TARGET_SUN_TLS || flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
 fputs (ASM_SHORT "0x\n", asm_out_file);
   else
@@ -23265,7 +23266,8 @@ (define_expand "@tls_global_dynamic_64_"
 	   (const_int 0)))
  (unspec:P [(match_operand 1 "tls_symbolic_operand")
 		(reg:P SP_REG)]
-	   UNSPEC_TLS_GD)])]
+	   UNSPEC_TLS_GD)
+ (clobber (match_operand:P 3 "register_operand"))])]
   "TARGET_64BIT"
   "ix86_tls_descriptor_calls_expanded_in_cfun = true;")
 
-- 
2.50.0
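
For context, the pattern changed above handles general-dynamic TLS accesses on x86-64. A minimal sketch of source that should reach it when compiled as position-independent code (for example with `-O2 -fPIC`); `tls_model` is a real GCC variable attribute, while `counter` and `bump` are illustrative names. Per the patch, the access is a `lea ...@tlsgd(%rip)` into %rdi followed by a call to `__tls_get_addr`, which is the RDI use the new clobber makes explicit:

```cpp
#include <cassert>

// GNU __thread variable forced to the global-dynamic TLS model; in a PIC
// build each access is expected to go through __tls_get_addr.
__thread int counter __attribute__ ((tls_model ("global-dynamic")));

// Each call increments the calling thread's copy of counter.
int
bump ()
{
  return ++counter;
}
```

Compiling with `-S` and looking for the `@tlsgd` relocation is an easy way to see the sequence the pattern emits.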



Re: [PATCH] testsuite, powerpc, v2: Fix vsx-vectorize-* after alignment peeling [PR118567]

2025-07-02 Thread Segher Boessenkool
On Wed, Jul 02, 2025 at 11:06:38AM +0200, Jakub Jelinek wrote:
> On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> > No tests become good tests without effort.  And tests that are not good
> > tests require constant maintenance!
> 
> Here are two patches, either just the first one or both can be used
> and both were tested on powerpc64le-linux.
> 
> The first one removes all the checking etc. stuff from the testcases,
> as they are just dg-do compile, for the vectorize dump checks all we
> care about are the vectorized loops they want to test.
> 
> The second one adds further 8 tests, which are dg-do run which #include
> the former tests, don't do any dump tests and just define the checking/main
> for those.
> 
> Ok for trunk (both or just the first one)?

Why does it remove the includes?  They just aren't necessary and get in
the way?

Okay for trunk, and also okay wherever you want it backported.  Thanks
for the work!


Segher


[committed] [PR rtl-optimization/120242] Fix SUBREG_PROMOTED_VAR_P after ext-dce's actions

2025-07-02 Thread Jeff Law

[ Whoops, forgot to push the send button... ]

I've gone back and forth on these problems multiple times.  We have two
passes, ext-dce and combine, which eliminate extensions using totally
different mechanisms.


ext-dce looks for cases where the upper bits of an object aren't
observable and, if they aren't observable, eliminates extensions which
set those bits.


combine looks for cases where we know the state of the upper bits and 
can prove an extension is just setting those bits to their prior value. 
Combine also looks for cases where the precise extension isn't really
important; all that is needed is the knowledge that the upper bits are
zero or sign extended from a narrower mode.


Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job. 
If the actions of ext-dce (or any other pass for that matter) make 
SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, 
then combine can end up generating incorrect code.


--

When ext-dce eliminates an extension, it turns it into a subreg copy
(without any known SUBREG_PROMOTED_VAR state).  Since we can no longer
guarantee the destination object has any known extension state, we
scurry around and wipe SUBREG_PROMOTED_VAR state for the destination object.



That's fine and dandy, but ultimately insufficient.  Consider if the 
destination of the optimized extension was used as a source in a simple 
copy insn.  Furthermore assume that the destination of that copy is used 
within a SUBREG expression with SUBREG_PROMOTED_VAR set.  ext-dce's 
actions have clobbered the SUBREG_PROMOTED_VAR state on the destination 
of that copy, albeit indirectly.


This patch addresses this problem by taking the set of pseudos directly
impacted by ext-dce's actions and expanding that set by building a
transitive closure for pseudos connected via copies.  We then scurry
around finding SUBREG_PROMOTED_VAR state to wipe for everything in that
expanded set of pseudos.  Voila, everything just works.
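
The closure step can be sketched as a small worklist algorithm. This is a hypothetical, simplified model rather than the ext-dce code: pseudos are plain integers, copies are (destination, source) pairs, and the propagation direction (from a copy's source to its destination, as in the insn 26 example below where (reg 144) is copied from (reg 134)) is an assumption:

```cpp
#include <cassert>
#include <set>
#include <vector>

// A copy insn "(set (reg DST) (reg SRC))", reduced to its two pseudo numbers.
struct copy_insn
{
  unsigned dst, src;
};

// Grow SEEDS (pseudos whose extension state ext-dce changed) to a fixed
// point: whenever a copy's source is in the set, its destination joins too.
std::set<unsigned>
close_over_copies (std::set<unsigned> seeds,
                   const std::vector<copy_insn> &copies)
{
  std::vector<unsigned> worklist (seeds.begin (), seeds.end ());
  while (!worklist.empty ())
    {
      unsigned r = worklist.back ();
      worklist.pop_back ();
      for (const copy_insn &c : copies)
        if (c.src == r && seeds.insert (c.dst).second)
          worklist.push_back (c.dst);   // newly added; follow its copies too
    }
  return seeds;
}
```

Every pseudo in the returned set would then have its SUBREG_PROMOTED_VAR state wiped.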


--

The other approach here would be to further expand the liveness sets
inside ext-dce.  That's a simpler path forward, but ultimately regresses
the quality of code we do care about.


One good piece of news is that with the transitive closure bits in 
place, we can eliminate a bit of the live set expansion we had in place 
for SUBREG_PROMOTED_VAR objects.


--

So let's take one case of the 5 that have been reported.

In ext-dce we have this insn:


(insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
(zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
 (expr_list:REG_DEAD (reg:DI 162)
(nil)))



There are reachable uses of (reg 134):


(insn 49 47 52 6 (set (mem/c:HI (lo_sum:DI (reg/f:DI 186)
(symbol_ref:DI ("al") [flags 0x86]  )) [2 al+0 S2 A16])
(subreg/s/v:HI (reg:DI 134 [ al_lsm.9 ]) 0)) 279 {*movhi_internal}
 (expr_list:REG_DEAD (reg/f:DI 186)
(nil)))

Obviously safe if we were to remove the extension.


(insn 52 49 53 6 (set (reg:DI 176)
(and:DI (reg:DI 134 [ al_lsm.9 ])
(const_int 5 [0x5]))) "j.c":21:12 106 {*anddi3}
 (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
(nil)))
(insn 53 52 56 6 (set (reg:SI 177 [ _8 ])
(zero_extend:SI (subreg:HI (reg:DI 176) 0))) "j.c":21:12 551 {*zero_extendhisi2_bitmanip}
 (expr_list:REG_DEAD (reg:DI 176)
(nil)))

Safe to remove the extension as we only read the low 16 bits from the
destination register (reg 176) in insn 53.
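
The insn 52/53 argument can be checked numerically. A minimal sketch, modelling the DImode registers as 64-bit integers (the function name is illustrative): whether or not the zero extension at insn 29 cleared the upper bits of (reg 134), the 16 bits that insn 53 reads come out the same.

```cpp
#include <cassert>
#include <cstdint>

// Model of insns 52 and 53: (reg 176) = (reg 134) & 5, after which only the
// low 16 bits of (reg 176) are read.
std::uint16_t
low16_after_and5 (std::uint64_t r134)
{
  std::uint64_t r176 = r134 & 5;
  return static_cast<std::uint16_t> (r176);
}
```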



(insn 27 26 29 3 (set (reg:DI 162)
(sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 134 [ al_lsm.9 ]) 0)
(const_int 1 [0x1])))) "j.c":17:17 8 {addsi3_extended}
 (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
(nil)))
(insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
(zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
 (expr_list:REG_DEAD (reg:DI 162)
(nil)))


Again, not as obvious as the first case, but we only read the low 16 
bits from (reg 162) in insn 29.  So those upper bits in (reg 134) don't 
matter.



(insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
(reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
 (nil))  



(insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
(sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
 (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
(nil)))

Also safe in isolation.  But it is worth noting that if we remove the
extension at insn 29, then the promoted status on (reg:DI 144) in insn
30 is no longer valid.
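
A simplified numeric model of why stale promoted state is dangerous, assuming that a pass which trusts the flag replaces the sign extension with a plain use of the register. The function names are hypothetical and this is not how combine is written; it only shows that the shortcut is correct when the upper bits really hold the extension of the low 16 bits, and wrong once those bits are left unspecified:

```cpp
#include <cassert>
#include <cstdint>

// Honest value of (sign_extend:DI (subreg:HI (reg X) 0)).
std::int64_t
sext_low16 (std::uint64_t x)
{
  return static_cast<std::int16_t> (static_cast<std::uint16_t> (x));
}

// What a pass could produce if it trusts a promoted-subreg flag claiming X
// already holds its low 16 bits sign-extended: the extension is dropped.
std::int64_t
trust_promoted_flag (std::uint64_t x)
{
  return static_cast<std::int64_t> (x);
}
```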


Setting aside the promoted state of (reg:DI 144) at insn 30 for a 
minute, let's look into combine.



(insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
(reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
 (nil))   

[ ... ]


(insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
(sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) 
"

Re: [PATCH V3] x86: Enable separate shrink wrapping

2025-07-02 Thread Segher Boessenkool
On Wed, Jul 02, 2025 at 01:32:37PM +, Cui, Lili wrote:
> > > +  /* Don't mess with the following registers.  */
> > > +  if (frame_pointer_needed)
> > > +    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> > 
> > What is that about?  Isn't that one of the bigger possible wins?
> 
> Good question!

I know :-)

> Initially, I looked at other architectures and disabled the hard frame pointer,

Like aarch64?  Yeah I always wondered why they don't do it.  I decided
that that is because, given their ABI and architecture, they can save
and restore their frame reg (r29) with the same insn as they use for the
link reg (r30).  Of course they could write code to make tradeoffs there, but
apparently they did not see the use for that, or perhaps knew from
experience which way this would fall in the end.

> but after reconsidering, I realized your point makes sense. If the hard frame
> pointer were enabled, we would typically emit push %rbp and mov %rsp, %rbp
> at the start of the prologue, so there is no room for separate shrink-wrapping,
> but if the function itself also uses rbp, there might be room for optimization,

Yup, when using a frame pointer (hard or otherwise, and a very bad plan
nowadays, a 1970's thing) you typically get the frame pointer
established very first thing, anything that touches the frame needs it
after all!

But not all code accesses the frame, many early-out paths do not for
example.
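
As a rough illustration of the early-out case, here is a hypothetical function whose fast path touches no stack memory while the slow path needs a frame. Whether GCC actually shrink-wraps it (and which registers it saves separately) depends on the target, options and register allocation; this is only the shape of code where the optimization can pay off:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical example: the early return needs no frame at all; only the
// slow path uses a stack buffer.
int
checksum (const char *s)
{
  if (s == nullptr || *s == '\0')   // fast path: no stack access
    return 0;

  char buf[64];                     // slow path: uses the stack frame
  std::strncpy (buf, s, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';

  int sum = 0;
  for (std::size_t i = 0; buf[i] != '\0'; ++i)
    sum += static_cast<unsigned char> (buf[i]);
  return sum;
}
```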

> I took out these two lines and ran some tests, and everything seems fine. I
> will do more testing and try to find a case where the optimization is
> really made.

For x86 all insns that access the frame explicitly refer to the (hard)
frame pointer register I think?  So yeah, then things should just work
like that :-)

Good luck, have fun, don't do cargo-cult,


Segher


Re: [PATCH v2] libstdc++: Use hidden friends for __normal_iterator operators

2025-07-02 Thread Patrick Palka


On Wed, 11 Jun 2025, Jonathan Wakely wrote:

> As suggested by Jason, this makes all __normal_iterator operators into
> friends so they can be found by ADL and don't need to be separately
> exported in module std.
> 
> The operator<=> comparing two iterators of the same type is removed
> entirely, instead of being made a hidden friend. That overload was added
> by r12-5882-g2c7fb16b5283cf to deal with unconstrained operator
> overloads found by ADL, as defined in the testsuite_greedy_ops.h header.
> We don't actually test that case as there's no unconstrained <=> in that
> header, and it doesn't seem reasonable for anybody to define such an
> operator<=> in C++20 when they should constrain their overloads properly
> (e.g. using a requires-clause). The heterogeneous operator<=> overloads
> added for reverse_iterator and move_iterator could also be removed, but
> that's not part of this commit.
> 
> I also had to reorder the __attribute__((always_inline)) and
> [[nodiscard]] attributes, which have to be in a particular order when
> used on friend functions.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/stl_iterator.h (__normal_iterator): Make all
>   non-member operators hidden friends, except ...
>   (operator<=>(__normal_iterator, __normal_iterator)):

__normal_iterator, __normal_iterator rather (i.e. the
heterogeneous overload)?  LGTM besides that

>   Remove.
>   * src/c++11/string-inst.cc: Remove explicit instantiations of
>   operators that are no longer templates.
>   * src/c++23/std.cc.in (__gnu_cxx): Do not export operators for
>   __normal_iterator.
> ---
> 
> v2: removed the unnecessary operator<=>, removed std.cc exports, fixed
> other minor issues noticed by Patrick.
> 
> Tested x86_64-linux.
> 
>  libstdc++-v3/include/bits/stl_iterator.h | 327 ---
>  libstdc++-v3/src/c++11/string-inst.cc|  11 -
>  libstdc++-v3/src/c++23/std.cc.in |   9 -
>  3 files changed, 169 insertions(+), 178 deletions(-)
> 
> diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
> index 478a98fe8a4f..a7188f46f6db 100644
> --- a/libstdc++-v3/include/bits/stl_iterator.h
> +++ b/libstdc++-v3/include/bits/stl_iterator.h
> @@ -1164,188 +1164,199 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>const _Iterator&
>base() const _GLIBCXX_NOEXCEPT
>{ return _M_current; }
> -};
>  
> -  // Note: In what follows, the left- and right-hand-side iterators are
> -  // allowed to vary in types (conceptually in cv-qualification) so that
> -  // comparison between cv-qualified and non-cv-qualified iterators be
> -  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> -  // will make overload resolution ambiguous (when in scope) if we don't
> -  // provide overloads whose operands are of the same type.  Can someone
> -  // remind me what generic programming is about? -- Gaby
> +private:
> +  // Note: In what follows, the left- and right-hand-side iterators are
> +  // allowed to vary in types (conceptually in cv-qualification) so that
> +  // comparison between cv-qualified and non-cv-qualified iterators be
> +  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> +  // will make overload resolution ambiguous (when in scope) if we don't
> +  // provide overloads whose operands are of the same type.  Can someone
> +  // remind me what generic programming is about? -- Gaby
>  
>  #ifdef __cpp_lib_three_way_comparison
> -  template
> -[[nodiscard, __gnu__::__always_inline__]]
> -constexpr bool
> -operator==(const __normal_iterator<_IteratorL, _Container>& __lhs,
> -const __normal_iterator<_IteratorR, _Container>& __rhs)
> -noexcept(noexcept(__lhs.base() == __rhs.base()))
> -requires requires {
> -  { __lhs.base() == __rhs.base() } -> std::convertible_to;
> -}
> -{ return __lhs.base() == __rhs.base(); }
> +  template
> + [[nodiscard, __gnu__::__always_inline__]]
> + friend
> + constexpr bool
> + operator==(const __normal_iterator& __lhs,
> +const __normal_iterator<_Iter, _Container>& __rhs)
> + noexcept(noexcept(__lhs.base() == __rhs.base()))
> + requires requires {
> +   { __lhs.base() == __rhs.base() } -> std::convertible_to;
> + }
> + { return __lhs.base() == __rhs.base(); }
>  
> -  template
> -[[nodiscard, __gnu__::__always_inline__]]
> -constexpr std::__detail::__synth3way_t<_IteratorR, _IteratorL>
> -operator<=>(const __normal_iterator<_IteratorL, _Container>& __lhs,
> - const __normal_iterator<_IteratorR, _Container>& __rhs)
> -noexcept(noexcept(std::__detail::__synth3way(__lhs.base(), __rhs.base(
> -{ return std::__detail::__synth3way(__lhs.base(), __rhs.base()); }
> +  [[nodiscard, __gnu__::__always_inline__]]
> +  friend
> +  constexpr bool
> +  operator==(const __normal_iterator& 
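
A minimal sketch of the hidden-friend technique the patch applies to __normal_iterator, using a hypothetical `wrapper` class rather than the libstdc++ code. The friend `operator==` is defined inside the class, so it is found only by argument-dependent lookup when a `wrapper` is an operand; that is why such operators need no namespace-scope declaration or separate module export:

```cpp
#include <cassert>

namespace demo
{
  template<typename It>
  class wrapper
  {
    It cur_;

  public:
    explicit constexpr wrapper (It it) : cur_ (it) { }
    constexpr It base () const { return cur_; }

    // Hidden friend: not visible to ordinary lookup, only to ADL.
    // The second operand may wrap a different iterator type
    // (e.g. int* vs const int*), mirroring the heterogeneous overload.
    template<typename It2>
    friend constexpr bool
    operator== (const wrapper &lhs, const wrapper<It2> &rhs)
    { return lhs.base () == rhs.base (); }
  };
}
```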

Re: [Fortran, Patch, PR120843, v3] Fix reject valid, because of inconformable coranks

2025-07-02 Thread Jerry D

On 7/2/25 3:14 AM, Andre Vehreschild wrote:

Hi all,

I successfully created a big mess with the previous patch. First of all by
applying an outdated one and secondly by adding the conformance checks for
coranks in a3f1cdd8ed46f9816b31ab162ae4dac547d34ebc. Checking the standard, even
using AI (haha), to figure out whether coranks of an expression have restrictions
on them failed. I found nothing. AI fantasized about restrictions that did not
exist. Therefore the current approach is to remove the conformance check and
just use the computed coranks in expressions to prevent recomputation whenever
they are needed.

Jerry, Harald: Sorry for all the bother and all my mistakes. I am really sorry
to have wasted your time.

The patch has been regtested fine on x86_64-pc-linux-gnu / F41. Ok for mainline
and later backport to gcc-15?

Regards,
Andre


--- snip ---

With this fixer patch, I can successfully compile Toon's test case.

The patch also regression tests here OK.

OK to push.

Jerry


[PATCH] libquadmath: add quad support for trig-pi functions

2025-07-02 Thread Yuao Ma

Hi all,

This patch adds the functions required for the Fortran trigonometric
functions to work with glibc versions prior to 2.26. It's based on glibc
source commit 632d895f3e5d98162f77b9c3c1da4ec19968b671.
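
The trig-pi functions compute, for example, sinpi(x) = sin(πx). A minimal double-precision sketch of why a dedicated function is worthwhile; it is not glibc's or libquadmath's algorithm (which work on `__float128`), and `sinpi_sketch` is a hypothetical name. Every reduction step below is exact, so integer arguments give exact zeros, whereas `sin(pi * 1.0)` does not:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative sinpi(x) = sin(pi * x) with exact argument reduction.
double
sinpi_sketch (double x)
{
  if (!std::isfinite (x))
    return std::numeric_limits<double>::quiet_NaN ();
  const double pi = 3.14159265358979323846;
  const double sx = std::signbit (x) ? -1.0 : 1.0;  // sinpi is odd
  double r = std::fmod (std::fabs (x), 2.0);         // exact, r in [0, 2)
  double s = sx;
  if (r >= 1.0)
    {
      r -= 1.0;          // exact; sinpi (r + 1) == -sinpi (r)
      s = -s;
    }
  if (r == 0.0)
    return sx * 0.0;     // integers: a zero carrying the sign of x
  if (r > 0.5)
    r = 1.0 - r;         // exact; sinpi (1 - r) == sinpi (r)
  return s * std::sin (pi * r);
}
```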


I've built it successfully on my end. Documentation is also included.

Please take a look when you have a moment.

Best regards,
Yuao

From 46ed3a1817e87567a7510eb4ca918589afcc9c3c Mon Sep 17 00:00:00 2001
From: Yuao Ma 
Date: Thu, 3 Jul 2025 00:40:58 +0800
Subject: [PATCH] libquadmath: add quad support for trig-pi functions

These functions are required for Fortran trigonometric functions with glibc < 2.26.
Based on glibc commit 632d895f3e5d98162f77b9c3c1da4ec19968b671.

libquadmath/ChangeLog:

* Makefile.am: Add sources to makefile.
* Makefile.in: Regen makefile.
* libquadmath.texi: Add doc for trig-pi funcs.
* update-quadmath.py: Update generation script.
* math/acospiq.c: New file.
* math/asinpiq.c: New file.
* math/atan2piq.c: New file.
* math/atanpiq.c: New file.
* math/cospiq.c: New file.
* math/sinpiq.c: New file.
* math/tanpiq.c: New file.

Signed-off-by: Yuao Ma 
---
 libquadmath/Makefile.am|  2 ++
 libquadmath/Makefile.in| 26 --
 libquadmath/libquadmath.texi   |  7 
 libquadmath/math/acospiq.c | 33 ++
 libquadmath/math/asinpiq.c | 40 ++
 libquadmath/math/atan2piq.c| 36 
 libquadmath/math/atanpiq.c | 35 +++
 libquadmath/math/cospiq.c  | 37 
 libquadmath/math/sinpiq.c  | 44 
 libquadmath/math/tanpiq.c  | 62 ++
 libquadmath/update-quadmath.py | 48 ++
 11 files changed, 352 insertions(+), 18 deletions(-)
 create mode 100644 libquadmath/math/acospiq.c
 create mode 100644 libquadmath/math/asinpiq.c
 create mode 100644 libquadmath/math/atan2piq.c
 create mode 100644 libquadmath/math/atanpiq.c
 create mode 100644 libquadmath/math/cospiq.c
 create mode 100644 libquadmath/math/sinpiq.c
 create mode 100644 libquadmath/math/tanpiq.c

diff --git a/libquadmath/Makefile.am b/libquadmath/Makefile.am
index 93806106abb..a1b17fd7897 100644
--- a/libquadmath/Makefile.am
+++ b/libquadmath/Makefile.am
@@ -70,6 +70,8 @@ libquadmath_la_SOURCES = \
   math/llrintq.c math/log2q.c math/lrintq.c math/nearbyintq.c math/remquoq.c \
   math/ccoshq.c math/cexpq.c math/clog10q.c math/clogq.c math/csinq.c \
   math/csinhq.c math/csqrtq.c math/ctanq.c math/ctanhq.c \
+  math/acospiq.c math/asinpiq.c math/atanpiq.c math/atan2piq.c \
+  math/cospiq.c math/sinpiq.c math/tanpiq.c \
   printf/addmul_1.c printf/add_n.c printf/cmp.c printf/divrem.c \
   printf/flt1282mpn.c printf/fpioconst.c printf/lshift.c printf/mul_1.c \
   printf/mul_n.c printf/mul.c printf/printf_fphex.c printf/printf_fp.c \
diff --git a/libquadmath/Makefile.in b/libquadmath/Makefile.in
index ff3373064b1..b1d542c 100644
--- a/libquadmath/Makefile.in
+++ b/libquadmath/Makefile.in
@@ -197,9 +197,13 @@ am__dirstamp = $(am__leading_dot)dirstamp
 @BUILD_LIBQUADMATH_TRUE@   math/clog10q.lo math/clogq.lo \
 @BUILD_LIBQUADMATH_TRUE@   math/csinq.lo math/csinhq.lo \
 @BUILD_LIBQUADMATH_TRUE@   math/csqrtq.lo math/ctanq.lo \
-@BUILD_LIBQUADMATH_TRUE@   math/ctanhq.lo printf/addmul_1.lo \
-@BUILD_LIBQUADMATH_TRUE@   printf/add_n.lo printf/cmp.lo \
-@BUILD_LIBQUADMATH_TRUE@   printf/divrem.lo printf/flt1282mpn.lo \
+@BUILD_LIBQUADMATH_TRUE@   math/ctanhq.lo math/acospiq.lo \
+@BUILD_LIBQUADMATH_TRUE@   math/asinpiq.lo math/atanpiq.lo \
+@BUILD_LIBQUADMATH_TRUE@   math/atan2piq.lo math/cospiq.lo \
+@BUILD_LIBQUADMATH_TRUE@   math/sinpiq.lo math/tanpiq.lo \
+@BUILD_LIBQUADMATH_TRUE@   printf/addmul_1.lo printf/add_n.lo \
+@BUILD_LIBQUADMATH_TRUE@   printf/cmp.lo printf/divrem.lo \
+@BUILD_LIBQUADMATH_TRUE@   printf/flt1282mpn.lo \
 @BUILD_LIBQUADMATH_TRUE@   printf/fpioconst.lo printf/lshift.lo \
 @BUILD_LIBQUADMATH_TRUE@   printf/mul_1.lo printf/mul_n.lo \
 @BUILD_LIBQUADMATH_TRUE@   printf/mul.lo printf/printf_fphex.lo \
@@ -495,6 +499,8 @@ AUTOMAKE_OPTIONS = foreign info-in-builddir
 @BUILD_LIBQUADMATH_TRUE@  math/llrintq.c math/log2q.c math/lrintq.c math/nearbyintq.c math/remquoq.c \
 @BUILD_LIBQUADMATH_TRUE@  math/ccoshq.c math/cexpq.c math/clog10q.c math/clogq.c math/csinq.c \
 @BUILD_LIBQUADMATH_TRUE@  math/csinhq.c math/csqrtq.c math/ctanq.c math/ctanhq.c \
+@BUILD_LIBQUADMATH_TRUE@  math/acospiq.c math/asinpiq.c math/atanpiq.c math/atan2piq.c \
+@BUILD_LIBQUADMATH_TRUE@  math/cospiq.c math/sinpiq.c math/tanpiq.c \
 @BUILD_LIBQUADMATH_TRUE@  printf/addmul_1.c printf/add_n.c printf/cmp.c printf/divrem.c \
 @BUILD_LIBQUADMATH_TRUE@  printf/flt1282mpn.c printf/fpioconst.c printf/lshift.c printf/mul_1.c \
 @BUILD_LIBQUADMATH_TRUE@  printf/mul_n.c printf/mul.c printf/printf_fphex.c printf/printf_

Re: [PATCH] s390: Add -fno-stack-protector to 3 tests

2025-07-02 Thread Andreas Schwab
On Jul 02 2025, Stefan Schulze Frielinghaus wrote:

> I'm pretty new to tcl and didn't do extensive testing but for my few
> experiments it worked so far.  I guess `string match` uses globbing so
> something like "* -f(no-)?stack-protector* *" doesn't work which is why
> I used two matches.

You can also use lsearch -regexp.
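
The point is that a regular expression can express the optional `no-` directly, so one match replaces the two glob matches. The same single-pattern idea, sketched in C++ with `std::regex` rather than in the testsuite's Tcl (the function name is hypothetical):

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if FLAGS contains -fstack-protector... or -fno-stack-protector...
// as a separate option (at the start or after whitespace).
bool
mentions_stack_protector (const std::string &flags)
{
  static const std::regex re (R"((^|\s)-f(no-)?stack-protector)");
  return std::regex_search (flags, re);
}
```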

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [Fortran, Patch, PR120843, v3] Fix reject valid, because of inconformable coranks

2025-07-02 Thread Andre Vehreschild
Hi all,

I successfully created a big mess with the previous patch. First of all by
applying an outdated one and secondly by adding the conformance checks for
coranks in a3f1cdd8ed46f9816b31ab162ae4dac547d34ebc. Checking the standard, even
using AI (haha), to figure out whether coranks of an expression have restrictions
on them failed. I found nothing. AI fantasized about restrictions that did not
exist. Therefore the current approach is to remove the conformance check and
just use the computed coranks in expressions to prevent recomputation whenever
they are needed.

Jerry, Harald: Sorry for all the bother and all my mistakes. I am really sorry
to have wasted your time.

The patch has been regtested fine on x86_64-pc-linux-gnu / F41. Ok for mainline
and later backport to gcc-15?

Regards,
Andre

On Tue, 1 Jul 2025 11:17:58 +0200
Andre Vehreschild  wrote:

> Hi Harald,
> 
> thanks for the review. Committed as gcc-16-1885-g1b0930e9046.
> 
> Will backport to gcc-15 in about a week.
> 
> Thanks again.
> 
> Regards,
>   Andre
> 
> On Mon, 30 Jun 2025 22:31:08 +0200
> Harald Anlauf  wrote:
> 
> > Am 30.06.25 um 15:25 schrieb Andre Vehreschild:  
> > > Hi all,
> > > 
> > > here now the version of the patch that seems to be more complete.
> > > 
> > > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline and later
> > > backport to gcc-15?
> > 
> > This looks good to me.  OK for both.
> > 
> > Thanks for the patch!
> > 
> > Harald
> >   
> > > Regards,
> > >   Andre
> > > 
> > > On Fri, 27 Jun 2025 15:44:20 +0200
> > > Andre Vehreschild  wrote:
> > > 
> > >> I take this patch back. It seems to be incomplete.
> > >>
> > >> - Andre
> > >>
> > >> On Fri, 27 Jun 2025 14:45:36 +0200
> > >> Andre Vehreschild  wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> this patch fixes a reject valid when the coranks of two operands do not
> > >>> match and no coindex is given. I.e. when only an implicit this_image
> > >>> co-ref is used.
> > >>>
> > >>> Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> > >>>
> > >>> Regards,
> > >>> Andre
> > >>
> > >>
> > > 
> > > 
> >   
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From 3ad3d551fb457698b61ed459afa0b58bf8574df8 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 2 Jul 2025 11:06:17 +0200
Subject: [PATCH] Fortran: Remove corank conformability checks [PR120843]

Remove the checks on corank conformability in expressions,
because there is nothing in the standard about it.  When a coarray
has no coindexes it is treated like a non-coarray; when it has
a full-corank coindex its result is a regular array.  So nothing
to check for corank conformability.

	PR fortran/120843

gcc/fortran/ChangeLog:

	* resolve.cc (resolve_operator): Remove conformability check,
	because it is not in the standard.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/coindexed_6.f90: Enhance test to have
	coarray components covered.
---
 gcc/fortran/resolve.cc| 29 ---
 .../gfortran.dg/coarray/coindexed_6.f90   | 13 +++--
 2 files changed, 10 insertions(+), 32 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 50a6fe7fc52..4a6e951cdf1 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -4807,35 +4807,6 @@ resolve_operator (gfc_expr *e)
 	  return false;
 	}
 	}
-
-  /* coranks have to be equal or one has to be zero to be combinable.  */
-  if (op1->corank == op2->corank || (op1->corank != 0 && op2->corank == 0))
-	{
-	  e->corank = op1->corank;
-	  /* Only do this, when regular array has not set a shape yet.  */
-	  if (e->shape == NULL)
-	{
-	  if (op1->corank != 0)
-		{
-		  e->shape = gfc_copy_shape (op1->shape, op1->corank);
-		}
-	}
-	}
-  else if (op1->corank == 0 && op2->corank != 0)
-	{
-	  e->corank = op2->corank;
-	  /* Only do this, when regular array has not set a shape yet.  */
-	  if (e->shape == NULL)
-	e->shape = gfc_copy_shape (op2->shape, op2->corank);
-	}
-  else if ((op1->ref && !gfc_ref_this_image (op1->ref))
-	   || (op2->ref && !gfc_ref_this_image (op2->ref)))
-	{
-	  gfc_error ("Inconsistent coranks for operator at %L and %L",
-		 &op1->where, &op2->where);
-	  return false;
-	}
-
   break;
 
 case INTRINSIC_PARENTHESES:
diff --git a/gcc/testsuite/gfortran.dg/coarray/coindexed_6.f90 b/gcc/testsuite/gfortran.dg/coarray/coindexed_6.f90
index 8f5dcabb859..d566c504134 100644
--- a/gcc/testsuite/gfortran.dg/coarray/coindexed_6.f90
+++ b/gcc/testsuite/gfortran.dg/coarray/coindexed_6.f90
@@ -5,13 +5,20 @@
 program p
   implicit none
 
-  integer, allocatable :: arr(:,:) [:,:]
+  type T
+integer, allocatable :: arr(:,:) [:,:]
+  end type
+
+  type(T) :: o
+  integer, allocatable :: vec(:)[:,:]
   integer :: c[*]
 
   c = 7
 
-  allocate(arr(4,3)[2,*], source=6)
+  allocate(o%arr(4,3)[2,*], source=6)
+  allocate(vec(10)[1,*], source=7)
 
-  if (arr(2,2)* c /= 42) stop 1
+  if (vec(

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 14:45, Mateusz Zych  wrote:
>
> > Oh actually the radix members should be of type int, not bool. I can fix that.
>
> Yes - thank you very much Jonathan for catching that!
> It was my honest oversight - I am so used to using auto that I have
> accidentally copied the wrong type.
> Also, thank you for adding tests - I should have added them myself in the
> first place.
>
> > Thanks, I don't think there was any reason to omit these members,
> > and I agree we should add them.
>
> Regarding adding missing members,
> I am wondering whether the template specializations of std::numeric_limits<>
> for integer-class types should define the remaining static data members and
> static member functions, that is:
> - max_digits10
> - traps
> - is_iec559
> - round_style
> - has_infinity
> - has_quiet_NaN
> - has_signaling_NaN
> - has_denorm
> - has_denorm_loss
> - min_exponent
> - min_exponent10
> - max_exponent
> - max_exponent10
> - tinyness_before
> - epsilon()
> - round_error()
> - infinity()
> - quiet_NaN()
> - signaling_NaN()
> - denorm_min()
>
> Here are the relevant sections of the C++ standard:
>
>   25.3.4.4 Concept weakly_incrementable  [iterator.concept.winc]
>
>   (5) For every integer-class type I,
>   let B(I) be a unique hypothetical extended integer type
>   of the same signedness with the same width as I.
>
>   [Note 2: The corresponding
>    hypothetical specialization numeric_limits<B(I)>
>meets the requirements on
>numeric_limits specializations for integral types.]
>
>  (11) For every (possibly cv-qualified) integer-class type I,
>   numeric_limits<I> is specialized such that:
>
>   - each static data member m
>     has the same value as numeric_limits<B(I)>::m, and
>
>   - each static member function f
>     returns I(numeric_limits<B(I)>::f()).
>
> In short, std::numeric_limits<> specializations for integer-class types
> should be defined identically to std::numeric_limits<> specializations for
> extended integer types, and thus define all static data members and static
> member functions.
> Am I reading this correctly?

Ah yes, if we're missing those ones too then we need to add them.
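The rule quoted above amounts to "forward every member to numeric_limits<B(I)>". Below is a minimal sketch of that pattern. It assumes a hypothetical `WideUnsigned` class standing in for an integer-class type I, and it assumes `unsigned long long` plays the role of B(I). This is not the libstdc++ implementation, which lives in bits/max_size_type.h and covers __max_size_type and __max_diff_type.

```cpp
#include <limits>

// Hypothetical stand-in for an integer-class type I. Real integer-class
// types are provided by the implementation; this toy wrapper only
// illustrates the forwarding rule of [iterator.concept.winc]/11.
struct WideUnsigned
{
  unsigned long long value = 0;

  constexpr WideUnsigned() = default;
  constexpr explicit WideUnsigned(unsigned long long v) : value(v) { }
};

namespace std
{
  // Assumption for this sketch: unsigned long long acts as B(I), the
  // hypothetical extended integer type with I's signedness and width.
  template<>
    struct numeric_limits<WideUnsigned>
    {
    private:
      using B = numeric_limits<unsigned long long>;

    public:
      // Each static data member m has the same value as B's m.
      static constexpr bool is_specialized = B::is_specialized;
      static constexpr bool is_signed = B::is_signed;
      static constexpr bool is_integer = B::is_integer;
      static constexpr bool is_exact = B::is_exact;
      static constexpr bool is_bounded = B::is_bounded;
      static constexpr bool is_modulo = B::is_modulo;
      static constexpr int radix = B::radix;  // an int, not a bool
      static constexpr int digits = B::digits;
      static constexpr int digits10 = B::digits10;
      static constexpr int max_digits10 = B::max_digits10;
      static constexpr bool traps = B::traps;
      static constexpr bool is_iec559 = B::is_iec559;
      static constexpr float_round_style round_style = B::round_style;
      static constexpr bool has_infinity = B::has_infinity;
      static constexpr bool has_quiet_NaN = B::has_quiet_NaN;
      static constexpr bool has_signaling_NaN = B::has_signaling_NaN;
      static constexpr int min_exponent = B::min_exponent;
      static constexpr int min_exponent10 = B::min_exponent10;
      static constexpr int max_exponent = B::max_exponent;
      static constexpr int max_exponent10 = B::max_exponent10;
      static constexpr bool tinyness_before = B::tinyness_before;
      // has_denorm and has_denorm_loss are left out of this sketch only
      // because they are deprecated in C++23; a complete specialization
      // would forward them too.

      // Each static member function f returns I(B::f()).
      static constexpr WideUnsigned min() noexcept
      { return WideUnsigned(B::min()); }
      static constexpr WideUnsigned max() noexcept
      { return WideUnsigned(B::max()); }
      static constexpr WideUnsigned lowest() noexcept
      { return WideUnsigned(B::lowest()); }
      static constexpr WideUnsigned epsilon() noexcept
      { return WideUnsigned(B::epsilon()); }
      static constexpr WideUnsigned round_error() noexcept
      { return WideUnsigned(B::round_error()); }
      static constexpr WideUnsigned infinity() noexcept
      { return WideUnsigned(B::infinity()); }
      static constexpr WideUnsigned quiet_NaN() noexcept
      { return WideUnsigned(B::quiet_NaN()); }
      static constexpr WideUnsigned signaling_NaN() noexcept
      { return WideUnsigned(B::signaling_NaN()); }
      static constexpr WideUnsigned denorm_min() noexcept
      { return WideUnsigned(B::denorm_min()); }
    };
}
```

Forwarding keeps every value in lockstep with B(I), so members such as has_infinity or epsilon() get the integral answers (false, zero) without being spelled out by hand.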


>
> Thank you, Mateusz Zych
>
> On Wed, Jul 2, 2025 at 1:59 PM Jonathan Wakely  wrote:
>>
>> On 02/07/25 11:52 +0100, Jonathan Wakely wrote:
>> >On Wed, 2 Jul 2025 at 10:50, Jonathan Wakely  wrote:
>> >>
>> >> On 02/07/25 03:36 +0300, Mateusz Zych wrote:
>> >> >Hello libstdc++ Team!
>> >> >
>> >> >I have recently found a bug in libstdc++, that is,
>> >> >the std::numeric_limits<> template specializations for integer-class types
>> >> >are missing some of static data members,
>> >> >which results in compilation errors of valid C++ code:
>> >> >
>> >> >   - Compiler Explorer: https://godbolt.org/z/E7z4WYfj4
>> >> >
>> >> >Since adding missing member constants, which are the most relevant to
>> >> >integer-like types,
>> >> >was not a lot of code, I have prepared a Git patch with relevant changes.
>> >> >
>> >> >I hope this patch is useful, Mateusz Zych
>> >>
>> >> Thanks, I don't think there was any reason to omit these members, and I
>> >> agree we should add them.
>> >>
>> >> The patch is simple and obvious enough that I don't think we need a
>> >> copyright assignment or DCO sign-off, so I'll push this to the
>> >> relevant branches. Thanks!
>> >
>> >Oh actually the radix members should be of type int, not bool. I can fix that.
>>
>> Here's what I'm testing:
>>
>>
>> commit ddd5b88db4fe99166835fe1b94beca451bc1ce30
>> Author: Mateusz Zych 
>> AuthorDate: Tue Jul 1 23:51:40 2025
>> Commit: Jonathan Wakely 
>> CommitDate: Wed Jul 2 11:57:45 2025
>>
>>  libstdc++: Add missing members to numeric_limits specializations for integer-class types
>>
>>  [iterator.concept.winc]/11 says that std::numeric_limits should be
>>  specialized for integer-class types, with each member defined
>>  appropriately.
>>
>>  libstdc++-v3/ChangeLog:
>>
>>  * include/bits/max_size_type.h (numeric_limits<__max_size_type>):
>>  New static data members.
>>  (numeric_limits<__max_diff_type>): Likewise.
>>  * testsuite/std/ranges/iota/max_size_type.cc: Check new members.
>>
>>  Co-authored-by: Jonathan Wakely 
>>
>> diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h
>> index 73a6d141d5bc..3ac2b8e6b878 100644
>> --- a/libstdc++-v3/include/bits/max_size_type.h
>> +++ b/libstdc++-v3/include/bits/max_size_type.h
>> @@ -775,6 +775,9 @@ namespace ranges
>> static constexpr bool is_signed = false;
>> static constexpr bool is_integer = true;
>> static constexpr bool is_exact = true;
>> +  static constexpr bool is_bounded = true;
>> +  static constexpr bool is_modulo = true;
>> +  static constexpr int radix = 2;
>> static constexpr int digits
>> = __gnu_cxx::_

Re: [PATCH] c, c++: Fix unused result for empty types [PR82134]

2025-07-02 Thread Patrick Palka
On Mon, 9 Jun 2025, Jeremy Rifkin wrote:

> Hi,
> This fixes PR c/82134 which concerns gcc emitting an incorrect unused
> result diagnostic for empty types. This diagnostic is emitted from
> tree-cfg.cc because of a couple of code paths which attempt to avoid
> copying empty types, resulting in GIMPLE that isn't using the returned
> value of a call. To fix this I've added suppress_warning in three locations
> and a corresponding check in do_warn_unused_result.

Thanks for the patch (and the ping)!  Your patch looks fine to me,
though I can't formally approve it myself.  CC'ing Jason.

> 
> Cheers,
> Jeremy
> 
> 
> PR c/82134
> 
> gcc/cp/ChangeLog:
> 
> * call.cc (build_call_a): Add suppress_warning.
> * cp-gimplify.cc (cp_gimplify_expr): Add suppress_warning.
> 
> gcc/ChangeLog:
> 
> * gimplify.cc (gimplify_modify_expr): Add suppress_warning.
> * tree-cfg.cc (do_warn_unused_result): Check warning_suppressed_p.
> 
> gcc/testsuite/ChangeLog:
> 
> * c-c++-common/attr-warn-unused-result-2.c: New test.
> 
> Signed-off-by: Jeremy Rifkin 
> ---
>  gcc/cp/call.cc|  1 +
>  gcc/cp/cp-gimplify.cc |  1 +
>  gcc/gimplify.cc   |  1 +
>  .../c-c++-common/attr-warn-unused-result-2.c  | 15 +++
>  gcc/tree-cfg.cc   |  2 ++
>  5 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index 2c3ef3dfc35..a70fc13c6a4 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -412,6 +412,7 @@ build_call_a (tree function, int n, tree *argarray)
>/* We're disconnecting the initializer from its target,
>   don't create a temporary.  */
>arg = TARGET_EXPR_INITIAL (arg);
> +suppress_warning (arg, OPT_Wunused_result);
>  tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (arg));
>  arg = build2 (COMPOUND_EXPR, TREE_TYPE (t), arg, t);
>  CALL_EXPR_ARG (function, i) = arg;
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index 0fcfa16d2c5..2a21e960994 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -690,6 +690,7 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
>  && (REFERENCE_CLASS_P (op1) || DECL_P (op1)))
>op1 = build_fold_addr_expr (op1);
>  
> +suppress_warning (op1, OPT_Wunused_result);
>  gimplify_and_add (op1, pre_p);
>}
>  gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 9f9ff92d064..fa9890e7cea 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -7305,6 +7305,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>&& !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
> && TREE_CODE (*from_p) == CALL_EXPR))
>  {
> +  suppress_warning (*from_p, OPT_Wunused_result);
>gimplify_stmt (from_p, pre_p);
>gimplify_stmt (to_p, pre_p);
>*expr_p = NULL_TREE;
> diff --git a/gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c b/gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c
> new file mode 100644
> index 000..09be1a933c9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c
> @@ -0,0 +1,15 @@
> +// PR c/82134
> +/* { dg-do compile } */
> +
> +struct S {};
> +
> +__attribute__((warn_unused_result)) struct S foo();
> +
> +void use_s(struct S);
> +
> +void
> +test (void)
> +{
> +  struct S s = foo(); /* { dg-bogus "ignoring return value of" } */
> +  use_s(foo()); /* { dg-bogus "ignoring return value of" } */
> +}
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index fad308e7f7b..5cd72ed1b3e 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -9961,6 +9961,8 @@ do_warn_unused_result (gimple_seq seq)
>  break;
>if (gimple_call_internal_p (g))
>  break;
> +  if (warning_suppressed_p (g, OPT_Wunused_result))
> +break;
>  
>/* This is a naked call, as opposed to a GIMPLE_CALL with an
>   LHS.  All calls whose value is ignored should be
> -- 
> 2.43.0
> 
> 



Re: [PATCH v2] libstdc++: Use hidden friends for __normal_iterator operators

2025-07-02 Thread Patrick Palka
On Wed, 2 Jul 2025, Jonathan Wakely wrote:

> On Wed, 2 Jul 2025 at 15:29, Patrick Palka  wrote:
> >
> >
> > On Wed, 11 Jun 2025, Jonathan Wakely wrote:
> >
> > > As suggested by Jason, this makes all __normal_iterator operators into
> > > friends so they can be found by ADL and don't need to be separately
> > > exported in module std.
> > >
> > > The operator<=> comparing two iterators of the same type is removed
> > > entirely, instead of being made a hidden friend. That overload was added
> > > by r12-5882-g2c7fb16b5283cf to deal with unconstrained operator
> > > overloads found by ADL, as defined in the testsuite_greedy_ops.h header.
> > > We don't actually test that case as there's no unconstrained <=> in that
> > > header, and it doesn't seem reasonable for anybody to define such an
> > > operator<=> in C++20 when they should constrain their overloads properly
> > > (e.g. using a requires-clause). The heterogeneous operator<=> overloads

Do you mean homogenous, not heterogeneous, here?

> > > added for reverse_iterator and move_iterator could also be removed, but
> > > that's not part of this commit.
> > >
> > > I also had to reorder the __attribute__((always_inline)) and
> > > [[nodiscard]] attributes, which have to be in a particular order when
> > > used on friend functions.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   * include/bits/stl_iterator.h (__normal_iterator): Make all
> > >   non-member operators hidden friends, except ...
> > >   (operator<=>(__normal_iterator, __normal_iterator)):
> >
> > __normal_iterator, __normal_iterator rather (i.e. the
> > heterogeneous overload)?  LGTM besides that
> 
> Unless I've really confused myself, the heterogeneous overload is
> retained, and the homogeneous one is removed, so the ChangeLog is
> right (and matches the description above it in the commit msg)

D'oh, sorry about that.  LGTM as-is

> 
> 
> >
> > >   Remove.
> > >   * src/c++11/string-inst.cc: Remove explicit instantiations of
> > >   operators that are no longer templates.
> > >   * src/c++23/std.cc.in (__gnu_cxx): Do not export operators for
> > >   __normal_iterator.
> > > ---
> > >
> > > v2: removed the unnecessary operator<=>, removed std.cc exports, fixed
> > > other minor issues noticed by Patrick.
> > >
> > > Tested x86_64-linux.
> > >
> > >  libstdc++-v3/include/bits/stl_iterator.h | 327 ---
> > >  libstdc++-v3/src/c++11/string-inst.cc|  11 -
> > >  libstdc++-v3/src/c++23/std.cc.in |   9 -
> > >  3 files changed, 169 insertions(+), 178 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
> > > index 478a98fe8a4f..a7188f46f6db 100644
> > > --- a/libstdc++-v3/include/bits/stl_iterator.h
> > > +++ b/libstdc++-v3/include/bits/stl_iterator.h
> > > @@ -1164,188 +1164,199 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >const _Iterator&
> > >base() const _GLIBCXX_NOEXCEPT
> > >{ return _M_current; }
> > > -};
> > >
> > > -  // Note: In what follows, the left- and right-hand-side iterators are
> > > -  // allowed to vary in types (conceptually in cv-qualification) so that
> > > -  // comparison between cv-qualified and non-cv-qualified iterators be
> > > -  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> > > -  // will make overload resolution ambiguous (when in scope) if we don't
> > > -  // provide overloads whose operands are of the same type.  Can someone
> > > -  // remind me what generic programming is about? -- Gaby
> > > +private:
> > > +  // Note: In what follows, the left- and right-hand-side iterators are
> > > +  // allowed to vary in types (conceptually in cv-qualification) so that
> > > +  // comparison between cv-qualified and non-cv-qualified iterators be
> > > +  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> > > +  // will make overload resolution ambiguous (when in scope) if we don't
> > > +  // provide overloads whose operands are of the same type.  Can someone
> > > +  // remind me what generic programming is about? -- Gaby
> > >
> > >  #ifdef __cpp_lib_three_way_comparison
> > > -  template<typename _IteratorL, typename _IteratorR, typename _Container>
> > > -[[nodiscard, __gnu__::__always_inline__]]
> > > -constexpr bool
> > > -operator==(const __normal_iterator<_IteratorL, _Container>& __lhs,
> > > -const __normal_iterator<_IteratorR, _Container>& __rhs)
> > > -noexcept(noexcept(__lhs.base() == __rhs.base()))
> > > -requires requires {
> > > -  { __lhs.base() == __rhs.base() } -> std::convertible_to<bool>;
> > > -}
> > > -{ return __lhs.base() == __rhs.base(); }
> > > +  template<typename _Iter>
> > > + [[nodiscard, __gnu__::__always_inline__]]
> > > + friend
> > > + constexpr bool
> > > + operator==(const __normal_iterator& __lhs,
> > > +const __normal_iterator<_Iter, _Container>& __rh

Re: [PATCH v4 2/6] dwarf: create annotation DIEs for btf tags

2025-07-02 Thread David Faust



On 7/2/25 00:35, Richard Biener wrote:
> On Tue, Jul 1, 2025 at 11:20 PM David Faust  wrote:
>>
>>
>>
>> On 7/1/25 01:02, Richard Biener wrote:
>>> On Mon, Jun 30, 2025 at 9:12 PM David Faust  wrote:



 On 6/30/25 06:11, Richard Biener wrote:
>> +static void
>> +gen_btf_decl_tag_dies (tree t, dw_die_ref target, dw_die_ref context_die)
>> +{
>> +  if (t == NULL_TREE || !DECL_P (t) || !target)
>> +return;
>> +
>> +  tree attr = lookup_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
>> +  if (attr == NULL_TREE)
>> +return;
>> +
>> +  gen_btf_tag_dies (attr, target, context_die);
>> +
>> +  /* Strip the decl tag attribute once we have created the annotation DIEs
>> + to avoid attempting to process it multiple times.  Global variable
>> + declarations may reach this function more than once.  */
>> +  DECL_ATTRIBUTES (t)
>> += remove_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
> I do not like modifying trees as part of dwarf2out.  You should be able to
> see whether a DIE already has the respective attribute applied?

 Yes, you're right. For decl_tag the case is simple and better handled by
 consulting the hash table. Simple fix and this remove_attribute can be
 deleted.

 Understood re: modifying trees in dwarf2out. I agree it's not ideal.

 For this case the remove_attribute can be deleted. For the two below,
 one is already immediately restored and the other could be as well so
 that there are no lasting changes in the tree at all.

 I will explain the reasoning some more below.

>
>> +}
>> +
>>  /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
>> entry that chains the modifiers specified by CV_QUALS in front of the
>> given type.  REVERSE is true if the type is to be interpreted in the
>> @@ -13674,6 +13894,7 @@ modified_type_die (tree type, int cv_quals, bool reverse,
>>tree item_type = NULL;
>>tree qualified_type;
>>tree name, low, high;
>> +  tree tags;
>>dw_die_ref mod_scope;
>>struct array_descr_info info;
>>/* Only these cv-qualifiers are currently handled.  */
>> @@ -13783,10 +14004,62 @@ modified_type_die (tree type, int cv_quals, bool reverse,
>>   dquals &= cv_qual_mask;
>>   if ((dquals & ~cv_quals) != TYPE_UNQUALIFIED
>>   || (cv_quals == dquals && DECL_ORIGINAL_TYPE (name) != type))
>> -   /* cv-unqualified version of named type.  Just use
>> -  the unnamed type to which it refers.  */
>> -   return modified_type_die (DECL_ORIGINAL_TYPE (name), cv_quals,
>> - reverse, context_die);
>> +   {
>> + tree dtags = lookup_attribute ("btf_type_tag",
>> +TYPE_ATTRIBUTES (dtype));
>> + if ((tags = lookup_attribute ("btf_type_tag",
>> +   TYPE_ATTRIBUTES (type)))
>> + && !attribute_list_equal (tags, dtags))
>> +   {
>> + /* Use of a typedef with additional btf_type_tags.
>> +Create a new typedef DIE to which we can attach the
>> +additional type_tag DIEs without disturbing other users of
>> +the underlying typedef.  */
>> + dw_die_ref mod_die = modified_type_die (dtype, cv_quals,
>> + reverse, context_die);
>> + mod_die = clone_die (mod_die);
>> + add_child_die (comp_unit_die (), mod_die);
>> + if (!lookup_type_die (type))
>> +   equate_type_number_to_die (type, mod_die);
>> +
>> + /* 'tags' is an accumulated list of type_tag attributes
>> +for the typedef'd type on both sides of the typedef.
>> +'dtags' is the set of type_tag attributes only appearing
>> +in the typedef itself.
>> +Find the set of type_tags only on the _use_ of the
>> +typedef, i.e. (tags - dtags).  By construction these
>> +additional type_tags have been chained onto the head of
>> +the attribute list of the original typedef.  */
>> + tree t = tags;
>> + bool altered_chain = false;
>> + while (t)
>> +   {
>> + if (TREE_CHAIN (t) == dtags)
>> +   {
>> + TREE_CHAIN (t) = NULL_TREE;
>>

Re: [PATCH v2] libstdc++: Use hidden friends for __normal_iterator operators

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 18:03, Patrick Palka  wrote:
>
> On Wed, 2 Jul 2025, Jonathan Wakely wrote:
>
> > On Wed, 2 Jul 2025 at 15:29, Patrick Palka  wrote:
> > >
> > >
> > > On Wed, 11 Jun 2025, Jonathan Wakely wrote:
> > >
> > > > As suggested by Jason, this makes all __normal_iterator operators into
> > > > friends so they can be found by ADL and don't need to be separately
> > > > exported in module std.
> > > >
> > > > The operator<=> comparing two iterators of the same type is removed
> > > > entirely, instead of being made a hidden friend. That overload was added
> > > > by r12-5882-g2c7fb16b5283cf to deal with unconstrained operator
> > > > overloads found by ADL, as defined in the testsuite_greedy_ops.h header.
> > > > We don't actually test that case as there's no unconstrained <=> in that
> > > > header, and it doesn't seem reasonable for anybody to define such an
> > > > operator<=> in C++20 when they should constrain their overloads properly
> > > > (e.g. using a requires-clause). The heterogeneous operator<=> overloads
>
> Do you mean homogenous, not heterogeneous, here?

Ah yes, so I did confuse myself in at least one place!

I'll change that, thanks.


> > > > added for reverse_iterator and move_iterator could also be removed, but
> > > > that's not part of this commit.
> > > >
> > > > I also had to reorder the __attribute__((always_inline)) and
> > > > [[nodiscard]] attributes, which have to be in a particular order when
> > > > used on friend functions.
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > >   * include/bits/stl_iterator.h (__normal_iterator): Make all
> > > >   non-member operators hidden friends, except ...
> > > >   (operator<=>(__normal_iterator, __normal_iterator)):
> > >
> > > __normal_iterator, __normal_iterator rather (i.e. the
> > > heterogeneous overload)?  LGTM besides that
> >
> > Unless I've really confused myself, the heterogeneous overload is
> > retained, and the homogeneous one is removed, so the ChangeLog is
> > right (and matches the description above it in the commit msg)
>
> D'oh, sorry about that.  LGTM as-is
>
> >
> >
> > >
> > > >   Remove.
> > > >   * src/c++11/string-inst.cc: Remove explicit instantiations of
> > > >   operators that are no longer templates.
> > > >   * src/c++23/std.cc.in (__gnu_cxx): Do not export operators for
> > > >   __normal_iterator.
> > > > ---
> > > >
> > > > v2: removed the unnecessary operator<=>, removed std.cc exports, fixed
> > > > other minor issues noticed by Patrick.
> > > >
> > > > Tested x86_64-linux.
> > > >
> > > >  libstdc++-v3/include/bits/stl_iterator.h | 327 ---
> > > >  libstdc++-v3/src/c++11/string-inst.cc|  11 -
> > > >  libstdc++-v3/src/c++23/std.cc.in |   9 -
> > > >  3 files changed, 169 insertions(+), 178 deletions(-)
> > > >
> > > > diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
> > > > index 478a98fe8a4f..a7188f46f6db 100644
> > > > --- a/libstdc++-v3/include/bits/stl_iterator.h
> > > > +++ b/libstdc++-v3/include/bits/stl_iterator.h
> > > > @@ -1164,188 +1164,199 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >const _Iterator&
> > > >base() const _GLIBCXX_NOEXCEPT
> > > >{ return _M_current; }
> > > > -};
> > > >
> > > > -  // Note: In what follows, the left- and right-hand-side iterators are
> > > > -  // allowed to vary in types (conceptually in cv-qualification) so that
> > > > -  // comparison between cv-qualified and non-cv-qualified iterators be
> > > > -  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> > > > -  // will make overload resolution ambiguous (when in scope) if we don't
> > > > -  // provide overloads whose operands are of the same type.  Can someone
> > > > -  // remind me what generic programming is about? -- Gaby
> > > > +private:
> > > > +  // Note: In what follows, the left- and right-hand-side iterators are
> > > > +  // allowed to vary in types (conceptually in cv-qualification) so that
> > > > +  // comparison between cv-qualified and non-cv-qualified iterators be
> > > > +  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> > > > +  // will make overload resolution ambiguous (when in scope) if we don't
> > > > +  // provide overloads whose operands are of the same type.  Can someone
> > > > +  // remind me what generic programming is about? -- Gaby
> > > >
> > > >  #ifdef __cpp_lib_three_way_comparison
> > > > -  template<typename _IteratorL, typename _IteratorR, typename _Container>
> > > > -[[nodiscard, __gnu__::__always_inline__]]
> > > > -constexpr bool
> > > > -operator==(const __normal_iterator<_IteratorL, _Container>& __lhs,
> > > > -const __normal_iterator<_IteratorR, _Container>& __rhs)
> > > > -noexcept(noexcept(__lhs.base() == __rhs.base()))
> > > > -requires re

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Mateusz Zych
> Oh actually the radix members should be of type int, not bool. I can fix that.

Yes - thank you very much Jonathan for catching that!
It was my honest oversight - I am so used to using auto that I have
accidentally copied the wrong type.
Also, thank you for adding tests - I should have added them myself in the
first place.

> Thanks, I don't think there was any reason to omit these members,
> and I agree we should add them.

Regarding adding missing members,
I am wondering whether the template specializations of std::numeric_limits<>
for integer-class types should define the remaining static data members and
static member functions, that is:
- max_digits10
- traps
- is_iec559
- round_style
- has_infinity
- has_quiet_NaN
- has_signaling_NaN
- has_denorm
- has_denorm_loss
- min_exponent
- min_exponent10
- max_exponent
- max_exponent10
- tinyness_before
- epsilon()
- round_error()
- infinity()
- quiet_NaN()
- signaling_NaN()
- denorm_min()

Here are the relevant sections of the C++ standard:

  25.3.4.4 Concept weakly_incrementable  [iterator.concept.winc]

  (5) For every integer-class type I,
  let B(I) be a unique hypothetical extended integer type
  of the same signedness with the same width as I.

  [Note 2: The corresponding
   hypothetical specialization numeric_limits<B(I)>
   meets the requirements on
   numeric_limits specializations for integral types.]

 (11) For every (possibly cv-qualified) integer-class type I,
  numeric_limits<I> is specialized such that:

  - each static data member m
    has the same value as numeric_limits<B(I)>::m, and

  - each static member function f
    returns I(numeric_limits<B(I)>::f()).

In short, std::numeric_limits<> specializations for integer-class types
should be defined identically to std::numeric_limits<> specializations for
extended integer types, and thus define all static data members and static
member functions.
Am I reading this correctly?

Thank you, Mateusz Zych

On Wed, Jul 2, 2025 at 1:59 PM Jonathan Wakely  wrote:

> On 02/07/25 11:52 +0100, Jonathan Wakely wrote:
> >On Wed, 2 Jul 2025 at 10:50, Jonathan Wakely  wrote:
> >>
> >> On 02/07/25 03:36 +0300, Mateusz Zych wrote:
> >> >Hello libstdc++ Team!
> >> >
> >> >I have recently found a bug in libstdc++, that is,
> >> >the std::numeric_limits<> template specializations for integer-class types
> >> >are missing some of static data members,
> >> >which results in compilation errors of valid C++ code:
> >> >
> >> >   - Compiler Explorer: https://godbolt.org/z/E7z4WYfj4
> >> >
> >> >Since adding missing member constants, which are the most relevant to
> >> >integer-like types,
> >> >was not a lot of code, I have prepared a Git patch with relevant changes.
> >> >
> >> >I hope this patch is useful, Mateusz Zych
> >>
> >> Thanks, I don't think there was any reason to omit these members, and I
> >> agree we should add them.
> >>
> >> The patch is simple and obvious enough that I don't think we need a
> >> copyright assignment or DCO sign-off, so I'll push this to the
> >> relevant branches. Thanks!
> >
> >Oh actually the radix members should be of type int, not bool. I can fix that.
>
> Here's what I'm testing:
>
>
> commit ddd5b88db4fe99166835fe1b94beca451bc1ce30
> Author: Mateusz Zych 
> AuthorDate: Tue Jul 1 23:51:40 2025
> Commit: Jonathan Wakely 
> CommitDate: Wed Jul 2 11:57:45 2025
>
>  libstdc++: Add missing members to numeric_limits specializations for integer-class types
>
>  [iterator.concept.winc]/11 says that std::numeric_limits should be
>  specialized for integer-class types, with each member defined
>  appropriately.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/bits/max_size_type.h (numeric_limits<__max_size_type>):
>  New static data members.
>  (numeric_limits<__max_diff_type>): Likewise.
>  * testsuite/std/ranges/iota/max_size_type.cc: Check new members.
>
>  Co-authored-by: Jonathan Wakely 
>
> diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h
> index 73a6d141d5bc..3ac2b8e6b878 100644
> --- a/libstdc++-v3/include/bits/max_size_type.h
> +++ b/libstdc++-v3/include/bits/max_size_type.h
> @@ -775,6 +775,9 @@ namespace ranges
> static constexpr bool is_signed = false;
> static constexpr bool is_integer = true;
> static constexpr bool is_exact = true;
> +  static constexpr bool is_bounded = true;
> +  static constexpr bool is_modulo = true;
> +  static constexpr int radix = 2;
> static constexpr int digits
> = __gnu_cxx::__int_traits<_Sp::__rep>::__digits + 1;
> static constexpr int digits10
> @@ -802,6 +805,9 @@ namespace ranges
> static constexpr bool is_signed = true;
> static constexpr bool is_integer = true;
> static constexpr bool is_exact = true;
> +  static constexpr bool is_bounded = true;
> +  static c

Re: [PATCH 1/1] contrib: add vmtest-tool to test BPF programs

2025-07-02 Thread Jose E. Marchesi


> Hi Jose,
> Apologies for the late reply; I haven't been feeling well for the past few days.

No worries!

>> > This patch adds the vmtest-tool subdirectory under contrib which tests
>> > BPF programs under a live kernel using a QEMU VM.  It automatically
>> > builds the specified kernel version with eBPF support enabled
>> > and stores it under "~/.vmtest-tool", which is reused for future
>> > invocations.
>>
>> I wonder, would it be a good idea to have "bpf" as part of the name of
>> the directory.  Something like bpf-vmtest-tool?
>
> I think adding the "bpf" prefix is a good idea. Should we also rename
> the directory under contrib to include the bpf prefix for consistency?

Yes I would say so.

>> > +To run a BPF source file in the VM:
>> > +
>> > +python main.py --kernel-image 6.15 --bpf-src fail.c
>> > +
>>
>> Wouldn't --kernel-image expect the path to a kernel image?  Typo?
>
> Thanks for catching that. I’ll fix it in the revision.

Thanks.

>> > +DEVELOPMENT
>> > +===
>> > +
>> > +This tool uses `uv` (https://github.com/astral-sh/uv) for virtual environment
>> > +and dependency management.
>> > +
>> > +To install development dependencies:
>> > +
>> > +uv sync
>> > +
>> > +To run the test suite:
>> > +
>> > +uv run pytest
>> > +
>> > +A `.pre-commit-config.yaml` is provided to assist with development.
>> > +Pre-commit hooks will auto-generate `requirements-dev.txt` and lint python
>> > +files.
>> > +
>> > +To enable pre-commit hooks:
>> > +
>> > +uv run pre-commit install
>>
>> Having uv installed would only be necessary for vmtest-tool development
>> purposes, right?  Not to run the script.  I see that all the
>> dependencies in vmtest-tool/pyproject.toml are related to testing and
>> python linting.
>
> Yes, uv is only needed for the development setup. The main script only 
> uses the standard library. Since all metadata is in the pyproject.toml, 
> any standard Python package manager (like pip, poetry, etc.) can be 
> used; there's no strict requirement to use uv. Only the pre-commit 
> script depends on uv.

I think it would be good to stick to simplicity whenever possible here.
The fewer dependencies the better.

>> Is the 3.10 minimum Python version requirement associated with any of
>> these dependencies?  If so, we could maybe relax the minimum version
>> requirement for users of the script?  My Debian-like system has python
>> 3.9.2, for example.
>
> I chose Python 3.10 as the minimum because 3.9 was nearing EOL in a few
> months. The script only uses the standard library, so it works with 3.9
> after the following change, since the | type union syntax isn't supported
> in 3.9. I'll update the next revision to support 3.9.

Thanks.  That will help many users.

> diff --git a/contrib/vmtest-tool/bpf.py b/contrib/vmtest-tool/bpf.py
> index 291e251e64c..91bfc0c5ae1 100644
> --- a/contrib/vmtest-tool/bpf.py
> +++ b/contrib/vmtest-tool/bpf.py
> @@ -3,6 +3,7 @@ import subprocess
>   import logging
>   from pathlib import Path
>   import tempfile
> +from typing import Optional
>   import utils
>   import config
>
> @@ -25,8 +26,8 @@ class BPFProgram:
>
>   def __init__(
>   self,
> -source_path: Path | None = None,
> -bpf_bytecode_path: Path | None = None,
> +source_path: Optional[Path] = None,
> +bpf_bytecode_path: Optional[Path] = None,
>   use_temp_dir: bool = False,
>   ):
>   path = source_path or bpf_bytecode_path
>
>> > +
>> > +def _compile_bpf(self):
>> > +"""Compile the eBPF program using gcc"""
>> > +logger.info(f"Compiling eBPF source: {self.bpf_src}")
>> > +cmd = [
>> > +"bpf-unknown-none-gcc",
>> > +"-g",
>> > +"-O2",
>> > +"-std=gnu17",
>> > +f"-D__TARGET_ARCH_{config.ARCH}",
>> > +"-gbtf",
>> > +"-Wno-error=attributes",
>> > +"-Wno-error=address-of-packed-member",
>> > +"-Wno-compare-distinct-pointer-types",
>> > +*BPF_INCLUDES,
>> > +"-c",
>> > +str(self.bpf_src),
>> > +"-o",
>> > +str(self.bpf_obj),
>> > +]
>>
>> It should definitely be possible to specify the set of compilation flags
>> using some environment variables like BPF_CFLAGS and BPF_CPPFLAGS, or
>> arguments to the script.  The GCC testsuite will want to do torture-like
>> testing of the bpf programs by compiling and running them using
>> different set of optimization options (-O0, -Os, -O2, other options...)
>> any of which may break verifiability.
>
> Thanks for the suggestion. I'll add support for this in the next 
> revision.

Thanks.

> One can also pass a precompiled BPF object with the desired
> optimization options to the script to check it with the verifier.

Yes, but AFAIK building objects that can be actually loaded in the
kernel and verified requires in practice including kernel headers, which
is what this tool 

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Mateusz Zych
OK, then I’ll prepare an appropriate patch with tests and send it when I’m
done implementing it.

Thanks, Mateusz Zych

On Wed, 2 Jul 2025 at 16:59, Jonathan Wakely  wrote:

> On Wed, 2 Jul 2025 at 14:45, Mateusz Zych  wrote:
> >
> > > Oh actually the radix members should be of type int, not bool. I can fix that.
> >
> > Yes - thank you very much Jonathan for catching that!
> > It was an honest oversight on my part - I am so used to using auto that I accidentally copied the wrong type.
> > Also, thank you for adding tests - I should have added them myself in the first place.
> >
> > > Thanks, I don't think there was any reason to omit these members,
> > > and I agree we should add them.
> >
> > Regarding adding missing members,
> > I am wondering whether template specializations of std::numeric_limits<> for integer-class types
> > should define the remaining static data members and static member functions, that is:
> > - max_digits10
> > - traps
> > - is_iec559
> > - round_style
> > - has_infinity
> > - has_quiet_NaN
> > - has_signaling_NaN
> > - has_denorm
> > - has_denorm_loss
> > - min_exponent
> > - min_exponent10
> > - max_exponent
> > - max_exponent10
> > - tinyness_before
> > - epsilon()
> > - round_error()
> > - infinity()
> > - quiet_NaN()
> > - signaling_NaN()
> > - denorm_min()
> >
> > Here are the relevant sections of the C++ standard:
> >
> >   25.3.4.4 Concept weakly_incrementable  [iterator.concept.winc]
> >
> >   (5) For every integer-class type I,
> >   let B(I) be a unique hypothetical extended integer type
> >   of the same signedness with the same width as I.
> >
> >   [Note 2: The corresponding
> >    hypothetical specialization numeric_limits
> >    meets the requirements on
> >    numeric_limits specializations for integral types.]
> >
> >  (11) For every (possibly cv-qualified) integer-class type I,
> >   numeric_limits is specialized such that:
> >
> >   - each static data member m
> > has the same value as numeric_limits::m, and
> >
> >   - each static member function f
> > returns I(numeric_limits::f()).
> >
> > In short, std::numeric_limits<> specializations for integer-class types should be defined
> > identically to std::numeric_limits<> specializations for extended integer types,
> > and thus define all static data members and static member functions.
> > Am I reading this correctly?
>
> Ah yes, if we're missing those ones too then we need to add them.
>
>
> >
> > Thank you, Mateusz Zych
> >
> > On Wed, Jul 2, 2025 at 1:59 PM Jonathan Wakely wrote:
> >>
> >> On 02/07/25 11:52 +0100, Jonathan Wakely wrote:
> >> >On Wed, 2 Jul 2025 at 10:50, Jonathan Wakely wrote:
> >> >>
> >> >> On 02/07/25 03:36 +0300, Mateusz Zych wrote:
> >> >> >Hello libstdc++ Team!
> >> >> >
> >> >> >I have recently found a bug in libstdc++, that is,
> >> >> >the std::numeric_limits<> template specializations for integer-class types
> >> >> >are missing some static data members,
> >> >> >which results in compilation errors of valid C++ code:
> >> >> >
> >> >> >   - Compiler Explorer: https://godbolt.org/z/E7z4WYfj4
> >> >> >
> >> >> >Since adding the missing member constants, which are the most relevant to
> >> >> >integer-like types,
> >> >> >was not a lot of code, I have prepared a Git patch with the relevant changes.
> >> >> >
> >> >> >I hope this patch is useful, Mateusz Zych
> >> >>
> >> >> Thanks, I don't think there was any reason to omit these members, and I
> >> >> agree we should add them.
> >> >>
> >> >> The patch is simple and obvious enough that I don't think we need a
> >> >> copyright assignment or DCO sign-off, so I'll push this to the
> >> >> relevant branches. Thanks!
> >> >
> >> >Oh actually the radix members should be of type int, not bool. I can fix that.
> >>
> >> Here's what I'm testing:
> >>
> >>
> >> commit ddd5b88db4fe99166835fe1b94beca451bc1ce30
> >> Author: Mateusz Zych 
> >> AuthorDate: Tue Jul 1 23:51:40 2025
> >> Commit: Jonathan Wakely 
> >> CommitDate: Wed Jul 2 11:57:45 2025
> >>
> >>  libstdc++: Add missing members to numeric_limits specializations for integer-class types
> >>
> >>  [iterator.concept.winc]/11 says that std::numeric_limits should be
> >>  specialized for integer-class types, with each member defined
> >>  appropriately.
> >>
> >>  libstdc++-v3/ChangeLog:
> >>
> >>  * include/bits/max_size_type.h (numeric_limits<__max_size_type>):
> >>  New static data members.
> >>  (numeric_limits<__max_diff_type>): Likewise.
> >>  * testsuite/std/ranges/iota/max_size_type.cc: Check new members.
> >>
> >>  Co-authored-by: Jonathan Wakely 
> >>
> >> diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h
> >> index 73a6d141d5bc..3ac2b8e6b878 100644
> >> --- a/libstdc++-v3/include/bits/max_size_type.h
> >> +++ b/libstdc++-v3/include/bits/max_size_

RE: [PATCH V3] x86: Enable separate shrink wrapping

2025-07-02 Thread Cui, Lili



> -----Original Message-----
> From: Segher Boessenkool 
> Sent: Monday, June 30, 2025 6:23 AM
> To: Cui, Lili 
> Cc: ubiz...@gmail.com; gcc-patches@gcc.gnu.org; Liu, Hongtao
> ; richard.guent...@gmail.com; Michael Matz
> 
> Subject: Re: [PATCH V3] x86: Enable separate shrink wrapping
> 
> Hi!
> 
> On Tue, Jun 17, 2025 at 10:03:28PM +0800, Cui, Lili wrote:
> > Collected SPEC2017 performance on ZNVER5, EMR and ICELAKE. No performance regression was observed.
> > For -O2 multi-copy:
> > 511.povray_r improved by 2.8% on ZNVER5.
> > 511.povray_r improved by 4.2% on EMR.
> 
> No huge improvement, but none was expected anyway; x86 is a target with
> only a few registers.
> 
> > Tested against SPEC CPU 2017, this change always has a net-positive
> > effect on the dynamic instruction count.  See the following table for
> > the breakdown on how this reduces the number of dynamic instructions
> > per workload on a like-for-like (with/without this commit):
> >
> > instruction count  base            with commit     (commit-base)/base
> > 502.gcc_r          98666845943     96891561634     -1.80%
> > 526.blender_r      6.21226E+11     6.12992E+11     -1.33%
> > 520.omnetpp_r      1.1241E+11      1.11093E+11     -1.17%
> > 500.perlbench_r    1271558717      1263268350      -0.65%
> > 523.xalancbmk_r    2.20103E+11     2.18836E+11     -0.58%
> > 531.deepsjeng_r    2.73591E+11     2.72114E+11     -0.54%
> > 500.perlbench_r    64195557393     63881512409     -0.49%
> > 541.leela_r        2.99097E+11     2.98245E+11     -0.29%
> > 548.exchange2_r    1.27976E+11     1.27784E+11     -0.15%
> > 527.cam4_r         88981458425     7334679         -0.11%
> > 554.roms_r         2.60072E+11     2.59809E+11     -0.10%
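The percentage column can be re-derived from the raw counts. The short Python check below is not part of the original mail; it takes the two full-precision rows (502.gcc_r and the first 500.perlbench_r entry) and compares dividing the change by the base count with dividing it by the new count. Only the base-relative figures match the percentages in the table.

```python
def pct_change(base: int, new: int, denom: int) -> float:
    """Return (new - base) / denom as a percentage."""
    return (new - base) / denom * 100.0


# Full-precision rows copied from the table: (base, with commit).
ROWS = {
    "502.gcc_r": (98666845943, 96891561634),
    "500.perlbench_r": (1271558717, 1263268350),
}

for name, (base, new) in ROWS.items():
    # Print both candidate denominators side by side.
    print(f"{name}: vs base {pct_change(base, new, base):.2f}%, "
          f"vs commit {pct_change(base, new, new):.2f}%")
```

For 502.gcc_r this gives -1.80% against the base count and -1.83% against the new count.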
> 
> The spec tests most representative for real-life code are perl and gcc, so those
> are nice results :-)
>
> This is code size, dynamic or static does not matter much here.  Nice to see it
> improve anyway :-)
> 
> > +  /* Don't mess with the following registers.  */
> > +  if (frame_pointer_needed)
> > +    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> 
> What is that about?  Isn't that one of the bigger possible wins?

Good question!  Initially, I looked at other architectures and disabled the
hard frame pointer, but after reconsidering, I realized your point makes sense.
If the hard frame pointer is needed, we would typically emit push %rbp and
mov %rsp, %rbp at the start of the prologue, so there is no room for separate
shrink wrapping; but if the function itself also uses rbp, there might be room
for optimization.  I took out these two lines and ran some tests, and everything
seems fine.  I will do more testing and try to find a case where the
optimization actually takes effect.

Thanks,
Lili.

> 
> Anyway, nice to see SWS finally used for x86 as well!
> 
> 
> Segher


Re: [PATCH v2] libstdc++: Use hidden friends for __normal_iterator operators

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 15:29, Patrick Palka  wrote:
>
>
> On Wed, 11 Jun 2025, Jonathan Wakely wrote:
>
> > As suggested by Jason, this makes all __normal_iterator operators into
> > friends so they can be found by ADL and don't need to be separately
> > exported in module std.
> >
> > The operator<=> comparing two iterators of the same type is removed
> > entirely, instead of being made a hidden friend. That overload was added
> > by r12-5882-g2c7fb16b5283cf to deal with unconstrained operator
> > overloads found by ADL, as defined in the testsuite_greedy_ops.h header.
> > We don't actually test that case as there's no unconstrained <=> in that
> > header, and it doesn't seem reasonable for anybody to define such an
> > operator<=> in C++20 when they should constrain their overloads properly
> > (e.g. using a requires-clause). The heterogeneous operator<=> overloads
> > added for reverse_iterator and move_iterator could also be removed, but
> > that's not part of this commit.
> >
> > I also had to reorder the __attribute__((always_inline)) and
> > [[nodiscard]] attributes, which have to be in a particular order when
> > used on friend functions.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/bits/stl_iterator.h (__normal_iterator): Make all
> >   non-member operators hidden friends, except ...
> >   (operator<=>(__normal_iterator, __normal_iterator)):
>
> __normal_iterator, __normal_iterator rather (i.e. the
> heterogeneous overload)?  LGTM besides that

Unless I've really confused myself, the heterogeneous overload is
retained, and the homogeneous one is removed, so the ChangeLog is
right (and matches the description above it in the commit msg).


>
> >   Remove.
> >   * src/c++11/string-inst.cc: Remove explicit instantiations of
> >   operators that are no longer templates.
> >   * src/c++23/std.cc.in (__gnu_cxx): Do not export operators for
> >   __normal_iterator.
> > ---
> >
> > v2: removed the unnecessary operator<=>, removed std.cc exports, fixed
> > other minor issues noticed by Patrick.
> >
> > Tested x86_64-linux.
> >
> >  libstdc++-v3/include/bits/stl_iterator.h | 327 ---
> >  libstdc++-v3/src/c++11/string-inst.cc|  11 -
> >  libstdc++-v3/src/c++23/std.cc.in |   9 -
> >  3 files changed, 169 insertions(+), 178 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
> > b/libstdc++-v3/include/bits/stl_iterator.h
> > index 478a98fe8a4f..a7188f46f6db 100644
> > --- a/libstdc++-v3/include/bits/stl_iterator.h
> > +++ b/libstdc++-v3/include/bits/stl_iterator.h
> > @@ -1164,188 +1164,199 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >const _Iterator&
> >base() const _GLIBCXX_NOEXCEPT
> >{ return _M_current; }
> > -};
> >
> > -  // Note: In what follows, the left- and right-hand-side iterators are
> > -  // allowed to vary in types (conceptually in cv-qualification) so that
> > -  // comparison between cv-qualified and non-cv-qualified iterators be
> > -  // valid.  However, the greedy and unfriendly operators in std::rel_ops
> > -  // will make overload resolution ambiguous (when in scope) if we don't
> > -  // provide overloads whose operands are of the same type.  Can someone
> > -  // remind me what generic programming is about? -- Gaby
> > +private:
> > +  // Note: In what follows, the left- and right-hand-side iterators are
> > +  // allowed to vary in types (conceptually in cv-qualification) so 
> > that
> > +  // comparison between cv-qualified and non-cv-qualified iterators be
> > +  // valid.  However, the greedy and unfriendly operators in 
> > std::rel_ops
> > +  // will make overload resolution ambiguous (when in scope) if we 
> > don't
> > +  // provide overloads whose operands are of the same type.  Can 
> > someone
> > +  // remind me what generic programming is about? -- Gaby
> >
> >  #ifdef __cpp_lib_three_way_comparison
> > -  template
> > -[[nodiscard, __gnu__::__always_inline__]]
> > -constexpr bool
> > -operator==(const __normal_iterator<_IteratorL, _Container>& __lhs,
> > -const __normal_iterator<_IteratorR, _Container>& __rhs)
> > -noexcept(noexcept(__lhs.base() == __rhs.base()))
> > -requires requires {
> > -  { __lhs.base() == __rhs.base() } -> std::convertible_to;
> > -}
> > -{ return __lhs.base() == __rhs.base(); }
> > +  template
> > + [[nodiscard, __gnu__::__always_inline__]]
> > + friend
> > + constexpr bool
> > + operator==(const __normal_iterator& __lhs,
> > +const __normal_iterator<_Iter, _Container>& __rhs)
> > + noexcept(noexcept(__lhs.base() == __rhs.base()))
> > + requires requires {
> > +   { __lhs.base() == __rhs.base() } -> std::convertible_to;
> > + }
> > + { return __lhs.base() == __rhs.base(); }
> >
> > -  template
> > -[[nodiscard, __gnu__::__always_inline__]]
> > -constexpr std::__detail::__synth3way_t<_IteratorR,

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 17:15, Mateusz Zych wrote:
>
> OK, then I’ll prepare appropriate patch with tests and send it when I’m done 
> implementing it.

That would be great, thanks. I won't push the initial patch, we can
wait for you to prepare the complete fix.

Please note that for a more significant change, we have some legal
prerequisites for contributions, as documented at:
https://gcc.gnu.org/contribute.html#legal

If you want to contribute under the DCO terms, please read
https://gcc.gnu.org/dco.html so that you understand exactly what the
Signed-off-by: trailer means.

Thanks!



Re: [PATCH 1/1] contrib: add vmtest-tool to test BPF programs

2025-07-02 Thread Piyush Raj

Hi Jose,
Apologies for the late reply; I haven't been feeling well for the past
few days.

> This patch adds the vmtest-tool subdirectory under contrib which tests
> BPF programs under a live kernel using a QEMU VM.  It automatically
> builds the specified kernel version with eBPF support enabled
> and stores it under "~/.vmtest-tool", which is reused for future
> invocations.

> I wonder, would it be a good idea to have "bpf" as part of the name of
> the directory.  Something like bpf-vmtest-tool?


I think adding the "bpf" prefix is a good idea. Should we also rename 
the directory under contrib to include the bpf prefix for consistency?



> +To run a BPF source file in the VM:
> +
> +python main.py --kernel-image 6.15 --bpf-src fail.c
> +

> Wouldn't --kernel-image expect the path to a kernel image?  Typo?


Thanks for catching that. I’ll fix it in the revision.


> +DEVELOPMENT
> +===
> +
> +This tool uses `uv` (https://github.com/astral-sh/uv) for virtual environment
> +and dependency management.
> +
> +To install development dependencies:
> +
> +uv sync
> +
> +To run the test suite:
> +
> +uv run pytest
> +
> +A `.pre-commit-config.yaml` is provided to assist with development.
> +Pre-commit hooks will auto-generate `requirements-dev.txt` and lint python
> +files.
> +
> +To enable pre-commit hooks:
> +
> +uv run pre-commit install

> Having uv installed would only be necessary for vmtest-tool development
> purposes, right?  Not to run the script.  I see that all the
> dependencies in vmtest-tool/pyproject.toml are related to testing and
> python linting.


Yes, uv is only needed for the development setup. The main script only
uses the standard library. Since all metadata is in the pyproject.toml,
any standard Python package manager (like pip, poetry, etc.) can be
used; there's no strict requirement to use uv. Only the pre-commit
script depends on uv.



> Is the 3.10 minimum Python version requirement associated with any of
> these dependencies?  If so, we could maybe relax the minimum version
> requirement for users of the script?  My Debian-like system has python
> 3.9.2, for example.


I chose Python 3.10 as the minimum because 3.9 was nearing EOL in a few
months. The script only uses the standard library, so it works with 3.9
after the following change, since the | type-union syntax isn't supported
in 3.9. I'll update the next revision to support 3.9.


diff --git a/contrib/vmtest-tool/bpf.py b/contrib/vmtest-tool/bpf.py
index 291e251e64c..91bfc0c5ae1 100644
--- a/contrib/vmtest-tool/bpf.py
+++ b/contrib/vmtest-tool/bpf.py
@@ -3,6 +3,7 @@ import subprocess
 import logging
 from pathlib import Path
 import tempfile
+from typing import Optional
 import utils
 import config

@@ -25,8 +26,8 @@ class BPFProgram:

     def __init__(
         self,
-        source_path: Path | None = None,
-        bpf_bytecode_path: Path | None = None,
+        source_path: Optional[Path] = None,
+        bpf_bytecode_path: Optional[Path] = None,
         use_temp_dir: bool = False,
     ):
         path = source_path or bpf_bytecode_path


> +
> +    def _compile_bpf(self):
> +        """Compile the eBPF program using gcc"""
> +        logger.info(f"Compiling eBPF source: {self.bpf_src}")
> +        cmd = [
> +            "bpf-unknown-none-gcc",
> +            "-g",
> +            "-O2",
> +            "-std=gnu17",
> +            f"-D__TARGET_ARCH_{config.ARCH}",
> +            "-gbtf",
> +            "-Wno-error=attributes",
> +            "-Wno-error=address-of-packed-member",
> +            "-Wno-compare-distinct-pointer-types",
> +            *BPF_INCLUDES,
> +            "-c",
> +            str(self.bpf_src),
> +            "-o",
> +            str(self.bpf_obj),
> +        ]

> It shall definitely be possible to specify the set of compilation flags
> using some environment variables like BPF_CFLAGS and BPF_CPPFLAGS, or
> arguments to the script.  The GCC testsuite will want to do torture-like
> testing of the bpf programs by compiling and running them using
> different sets of optimization options (-O0, -Os, -O2, other options...)
> any of which may break verifiability.


Thanks for the suggestion. I'll add support for this in the next
revision. One can also pass a precompiled BPF object with the desired
optimization options to the script to check it with the verifier.
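One possible shape for the BPF_CFLAGS/BPF_CPPFLAGS override discussed above, sketched under stated assumptions: `build_compile_cmd`, `DEFAULT_BPF_CFLAGS` and `FIXED_FLAGS` are hypothetical names, the defaults are copied from the hard-coded list in `_compile_bpf()`, and `BPF_CFLAGS` replaces the optimization/debug defaults while the warning flags are always kept. It uses `typing.Optional` so it still runs on Python 3.9.

```python
import os
import shlex
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Hypothetical defaults, copied from the patch's hard-coded flag list.
DEFAULT_BPF_CFLAGS = ["-g", "-O2", "-std=gnu17", "-gbtf"]
# Assumption: warning tweaks for kernel headers are not user-overridable.
FIXED_FLAGS = [
    "-Wno-error=attributes",
    "-Wno-error=address-of-packed-member",
    "-Wno-compare-distinct-pointer-types",
]


def build_compile_cmd(
    src: Path,
    obj: Path,
    arch: str,
    includes: Sequence[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Build the gcc command line, honouring BPF_CFLAGS and BPF_CPPFLAGS."""
    env = os.environ if env is None else env
    # BPF_CFLAGS, when set (even to ""), replaces the default flags entirely.
    if "BPF_CFLAGS" in env:
        cflags = shlex.split(env["BPF_CFLAGS"])
    else:
        cflags = list(DEFAULT_BPF_CFLAGS)
    # BPF_CPPFLAGS is appended after the preprocessor flags the tool always needs.
    cppflags = [f"-D__TARGET_ARCH_{arch}", *includes,
                *shlex.split(env.get("BPF_CPPFLAGS", ""))]
    return ["bpf-unknown-none-gcc", *cflags, *cppflags, *FIXED_FLAGS,
            "-c", str(src), "-o", str(obj)]
```

A torture-style driver could then call it once per option set, for example with `env={"BPF_CFLAGS": "-g -gbtf -O0"}`, then `-Os`, then `-O2`, compiling and verifying each resulting object.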



> +
> +// STEP 3: Load the program (this will trigger verifier log output)
> +err = {self.name}__load(skel);
> +fprintf(
> + stderr,
> + "--- Verifier log start ---\\n"
> + "%s\\n"
> + "--- Verifier log end ---\\n",
> + log_buf

> Eventually, the output of the loader when the verifier rejects a program
> ought to be suitable for our dejagnu glue code to interpret it as a pass
> or a fail.


Yes, for now this is meant to print to stdout for the user. This will 
change with DejaGNU integration.
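As a rough idea of how the loader output might later be turned into a DejaGNU verdict, here is a minimal sketch. It assumes the loader keeps printing the `--- Verifier log start ---` / `--- Verifier log end ---` delimiters from the patch and, hypothetically, exits non-zero when loading fails; `extract_verifier_log` and `classify` are made-up names, not part of vmtest-tool.

```python
from typing import Optional

# Delimiters printed by the loader template in the patch.
LOG_START = "--- Verifier log start ---"
LOG_END = "--- Verifier log end ---"


def extract_verifier_log(output: str) -> Optional[str]:
    """Return the text between the verifier-log delimiters, or None if absent."""
    start = output.find(LOG_START)
    if start == -1:
        return None
    end = output.find(LOG_END, start)
    if end == -1:
        return None
    return output[start + len(LOG_START):end].strip()


def classify(returncode: int, output: str) -> str:
    """Map one loader run to a DejaGNU-style result keyword.

    Hypothetical convention: exit status 0 means the program was loaded.
    A failure with a verifier log is a FAIL; any other failure is UNRESOLVED.
    """
    if returncode == 0:
        return "PASS"
    return "FAIL" if extract_verifier_log(output) is not None else "UNRESOLVED"
```

Keeping the extraction separate from the verdict would also let the glue code attach the verifier log to the test's .log file on failure.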


Thanks!



Re: [PATCH 1/1] [RFC][AutoFDO] Propagate information to outline copies if not inlined

2025-07-02 Thread Jan Hubicka
> On 02/07/25 07:26, Kugan Vivekanandarajah wrote:
> > 
> > 
> > > 
> > > Given the latest few patches that you have committed, is this patch 
> > > necessary
> > > anymore? I have not fully understood the new logic as I was on holiday 
> > > last
> > > week, but it looks like the propagation is occurring correctly now?
> > > 
> > 
> > I think you are referring to the patch “Avoid some lost AFDO profiles with 
> > LTO” which introduces pass_ipa_auto_profile_offline. I don't think it offlines
> > functions for which afdo_callsite_hot_enough_for_early_inline is false.  However, it
> > should be easier now as the early_inline is moved out of auto-profile.
> 
> Hmm, I was referring to the "Fix afdo profiles for functions that
> was not early-inlined" patch which introduces the
> 
> void
> autofdo_source_profile::offline_unrealized_inlines ()
> 
> function. This seems to merge profiles to offline definitions, and it
> is called from the main auto_profile function.

Sorry for the confusion. Indeed, those two patches should make sure that
all functions that were not inlined are merged into their offline
versions.  With -flto training the problem is a bit more complex than just
tracking down failed early inlining, since the function may be inlined
cross-module and the offline copies of functions may also have functions
inlined into them.

So there are two passes now.  The offline pass (run before early opts)
takes care of reading the auto-profile, offlining all cross-module
inlines, and removing unnecessary parts of the profile (to save
memory).  This makes sure that the profile considered by afdo inlining
during early opts will not ignore functions inlined cross-module during
the train run.

After all early inlining is done, the afdo pass offlines the remaining
functions (in offline_unrealized_inlines).

Offlining is now recursive and also merges the profiles of functions that
are inlined into functions being offlined.
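To make the recursive merge concrete, here is a toy Python model. It is purely illustrative and does not reflect GCC's actual data structures: `Profile`, `merge` and `offline_unrealized` are hypothetical stand-ins (the last loosely named after `offline_unrealized_inlines`). An inline instance that was not realized in this build is removed from its caller and merged into the callee's offline profile, and the merge recurses into whatever had been inlined into that instance during training.

```python
from typing import Dict, Optional, Set


class Profile:
    """Toy profile: a sample count plus profiles of callees inlined into it."""

    def __init__(self, count: int = 0,
                 inlined: Optional[Dict[str, "Profile"]] = None):
        self.count = count
        self.inlined = inlined if inlined is not None else {}


def merge(dst: Profile, src: Profile) -> None:
    """Add src into dst, recursing into nested inline instances."""
    dst.count += src.count
    for callee, sub in src.inlined.items():
        merge(dst.inlined.setdefault(callee, Profile()), sub)


def offline_unrealized(caller: Profile, realized: Set[str],
                       offline: Dict[str, Profile]) -> None:
    """Move inline instances not inlined in this build to the offline table."""
    for callee in list(caller.inlined):
        if callee not in realized:
            # Nested inlines travel with the instance and are merged too.
            merge(offline.setdefault(callee, Profile()),
                  caller.inlined.pop(callee))
```

In this model, if `g` was inlined into `f` during training (with `h` inlined into that copy of `g`) but `g` is not inlined in the current build, `g`'s samples land in its offline profile and `h`'s samples follow them under that offline `g`.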

Honza


Re: [PATCH] x86-64: Add RDI clobber to tls_global_dynamic_64 patterns

2025-07-02 Thread Uros Bizjak
On Wed, Jul 2, 2025 at 2:43 PM H.J. Lu  wrote:
>
> *tls_global_dynamic_64_ uses RDI as the __tls_get_addr argument.
> Add RDI clobber to tls_global_dynamic_64 patterns to show it.
>
> PR target/120908
> * config/i386/i386.cc (legitimize_tls_address): Pass RDI to
> gen_tls_global_dynamic_64.
> * config/i386/i386.md (*tls_global_dynamic_64_): Add RDI
> clobber and use it to generate LEA.
> (@tls_global_dynamic_64_): Add a clobber.
>
> OK for master?

OK.

Thanks,
Uros.


Re: [Fortran, Patch, PR120843, v3] Fix reject valid, because of inconformable coranks

2025-07-02 Thread Steve Kargl
On Wed, Jul 02, 2025 at 04:36:38AM -0700, Damian Rouson wrote:
> git branch
> git checkout
> git add
> git commit
> git rebase
> git push
> 
> It’s time to move beyond emailing patches!  (Please.)

I don't use git other than 'git clone', 'git reset --hard',
and 'git diff'.  If gfortran development goes this route,
I am done.

--
steve


Re: [PATCH] libstdc++: construct bitset from string_view (P2697) [PR119742]

2025-07-02 Thread Jonathan Wakely
On Tue, 1 Jul 2025 at 23:36, Nathan Myers  wrote:
>
> Add a bitset constructor from string_view, with other arguments
> matching the constructor from string. Test in ways that exercise
> code paths not checked in existing tests for other constructors.
> Fix existing tests that would fail to detect incorrect exception
> behavior.
>
> libstdc++-v3/ChangeLog:
> PR libstdc++/119742
> * include/bits/version.def: new preprocessor symbol
> * include/bits/version.h: new preprocessor symbol
> * include/std/bitset: new constructor
> * testsuite/20_util/bitset/cons/1.cc: fix
> * testsuite/20_util/bitset/cons/6282.cc: fix
> * testsuite/20_util/bitset/cons/string_view.cc: new tests
> * testsuite/20_util/bitset/cons/string_view_wide.cc: new tests

Full sentences in these ChangeLog lines please, as with the explicit
ctors patch.

> ---
>  libstdc++-v3/include/bits/version.def |   8 ++
>  libstdc++-v3/include/bits/version.h   |  10 ++
>  libstdc++-v3/include/std/bitset   |  49 +++
>  .../testsuite/20_util/bitset/cons/1.cc|   1 +
>  .../testsuite/20_util/bitset/cons/6282.cc |   5 +-
>  .../20_util/bitset/cons/string_view.cc| 132 ++
>  .../20_util/bitset/cons/string_view_wide.cc   |   8 ++
>  7 files changed, 210 insertions(+), 3 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/bitset/cons/string_view.cc
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/bitset/cons/string_view_wide.cc
>
> diff --git a/libstdc++-v3/include/bits/version.def 
> b/libstdc++-v3/include/bits/version.def
> index 880586e9126..d82585e1238 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -2012,6 +2012,14 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = bitset  // ...construct from string_view

This comment is useful, thanks.

> +  values = {
> +v = 202306;
> +cxxmin = 26;

I wondered whether we wanted to use hosted=yes here, because
 isn't required to be freestanding, but we do define it
unconditionally (and it seems useful to have in a freestanding env).
So not using hosted=yes here is correct.

> +  };
> +};
> +
>  // Standard test specifications.
>  stds[97] = ">= 199711L";
>  stds[03] = ">= 199711L";
> diff --git a/libstdc++-v3/include/bits/version.h 
> b/libstdc++-v3/include/bits/version.h
> index 4300adb2276..c0d3cbe623f 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -2253,4 +2253,14 @@
>  #endif /* !defined(__cpp_lib_sstream_from_string_view) && 
> defined(__glibcxx_want_sstream_from_string_view) */
>  #undef __glibcxx_want_sstream_from_string_view
>
> +#if !defined(__cpp_lib_bitset)
> +# if (__cplusplus >  202302L)
> +#  define __glibcxx_bitset 202306L
> +#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_bitset)
> +#   define __cpp_lib_bitset 202306L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_bitset) && defined(__glibcxx_want_bitset) */
> +#undef __glibcxx_bitset
> +
>  #undef __glibcxx_want_all
> diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
> index 8b5d270c2a9..f4947902384 100644
> --- a/libstdc++-v3/include/std/bitset
> +++ b/libstdc++-v3/include/std/bitset
> @@ -61,8 +61,13 @@
>  #endif
>
>  #define __glibcxx_want_constexpr_bitset
> +#define __glibcxx_want_bitset  // ...construct from string_view
>  #include 
>
> +#ifdef __cpp_lib_bitset // ...construct from string_view
> +# include 
> +#endif
> +
>  #define _GLIBCXX_BITSET_BITS_PER_WORD  (__CHAR_BIT__ * __SIZEOF_LONG__)
>  #define _GLIBCXX_BITSET_WORDS(__n) \
>((__n) / _GLIBCXX_BITSET_BITS_PER_WORD + \
> @@ -831,6 +836,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>}
>  #endif // HOSTED
>
> +#ifdef __cpp_lib_bitset
> +  template

I was going to say that this should use 'typename' not 'class' but I
see that  is consistently "wrong" and uses 'class' everywhere.
Let's not change that now.

> +  constexpr void
> +  _M_check_initial_position(
> +   std::basic_string_view<_CharT, _Traits> __s,
> +   typename std::basic_string_view<_CharT, _Traits>::size_type __position

We don't need the std:: qualification on these names. That's also
consistently wrong in this file though. I don't mind if you change it
now for the new constructors, or keep the std:: here and we can fix it
throughout the file in a later commit.

> +  ) const
> +  {
> +# if _GLIBCXX_HOSTED

We define __throw_out_of_range_fmt for non-hosted, it just calls
terminate() (see include/bits/functexcept.h) so I think we can do this
unconditionally. Otherwise, we wouldn't have these range checks at all
for non-hosted. (Looks as though we should remove a similar HOSTED
check in the bitset(const charT*, ...) constructor).


> +   if (__position > __s.size())
> + __throw_out_of_range_fmt(__N("bitset::bitset: __position "
> +  "(which i

Re: [PATCH] [RISC-V] Fix shift type for RVV interleaved stepped patterns [PR120356]

2025-07-02 Thread Jeff Law




On 7/2/25 2:16 AM, Robin Dapp wrote:

> > CI testing failed:
> > https://github.com/ewlu/gcc-precommit-ci/issues/3585#issuecomment-3022157670
> > for sat_u_add-5-u32.c and vect-reduc-sad-1.c. These failures are compile
> > issues that appeared due to the afdo-crossmodule-1b.c file. For some reason,
> > in both cases the following snippets are being inserted into the compile lines:
> >   /home/ewlu/.../testsuite/gcc.target/riscv/sat/afdo-crossmodule-1b.c -dumpbase ""
> >   for sat_u_add-5-u32.c
> >   /home/ewlu/.../testsuite/gcc.dg/vect/afdo-crossmodule-1b.c -dumpbase ""
> >   for vect-reduc-sad-1.c
> > ... which causes the failures.
> >
> > I've tried to reproduce it locally, but both test cases are passing
> > in 100% of runs.
>
> IMHO you've done the necessary testing.  It could well be a glitch and
> the change is obvious enough (as it was clearly a copy-and-paste
> mistake in the code).  So I'd say let's go ahead and fix fallout if any.

Agreed and pushed to the trunk on Alexey's behalf.

Jeff


Re: [PATCH v7 8/9] AArch64: rules for CMPBR instructions

2025-07-02 Thread Karl Meakin



On 01/07/2025 11:02, Richard Sandiford wrote:

Karl Meakin  writes:

@@ -763,6 +784,68 @@ (define_expand "cbranchcc4"
""
  )
  
+;; Emit a `CB (register)` or `CB (immediate)` instruction.

+;; The immediate range depends on the comparison code.
+;; Comparisons against immediates outside this range fall back to
+;; CMP + B.
+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (INT_CMP
+(match_operand:GPI 0 "register_operand" "r")
+(match_operand:GPI 1 "nonmemory_operand"
+   "r"))
+  (label_ref (match_operand 2))
+  (pc)))]
+  "TARGET_CMPBR && aarch64_cb_rhs (, operands[1])"
+  {
+if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
+  return aarch64_gen_far_branch (operands, 2, "L",
+ "cb\\t%0, %1, 
");
+else
+  return "cb\\t%0, %1, %l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; Emit a `CBB (register)` or `CBH (register)` instruction.
+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (INT_CMP
+(match_operand:SHORT 0 "register_operand" "r")
+(match_operand:SHORT 1 "aarch64_reg_or_zero" "rZ"))
+  (label_ref (match_operand 2))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb\\t%0, %1, %l2"

> This instruction also needs to handle far branches, in a similar way
> to the GPI one.  (It would be good to have a test for that too).
>
> Why does the code for u32_x0_uge_64 etc. not change?  I would have
> expected 64 to be in range for that, whether it's treated as >= 64
> or as > 63.

Because the comparison is normalized to `x0 <= 63`, which does not fit in
the range for `CBLS` (-1 to 62).


Re: [PATCH] AArch64: Use correct cost for shifted halfword load/stores

2025-07-02 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero
> for these.
>
> Passes regress, OK for commit?
>
> gcc:
> * config/aarch64/tuning_models/generic_armv9_a.h
> (generic_armv9_a_addrcost_table): Use zero cost for himode.

OK if there are no objections in 24 hours.

I suppose at some point we should also update the cortex-a510 and cortex-a520
entries in aarch64-cores.def to use a more up-to-date tuning structure.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h 
> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> index 
> f76a2506f3841a05a89285e50ad96c8665732955..9eb1a20d3c4e5d5a3b6c8bf8a2c341a82ac8f7da
>  100644
> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> @@ -26,7 +26,7 @@
>  static const struct cpu_addrcost_table generic_armv9_a_addrcost_table =
>  {
>  {
> -  1, /* hi  */
> +  0, /* hi  */
>0, /* si  */
>0, /* di  */
>1, /* ti  */


Re: [PATCH v2] vect: Misalign checks for gather/scatter.

2025-07-02 Thread Robin Dapp

> I'm not sure?  I'd prefer some refactoring to make this more obvious
> (and the split between the two functions doesn't help ...).
>
> If you're sure it's all covered then ignore this comment, I can do
> the refactoring as followup.  It just wasn't obvious to me.


Ah, I think I misread your original comment slightly, thinking what you 
suggested was part of your planned refactoring and not something the patch 
could still do.


I'll split everything into maybe three to four patches and include the pre 
transform for a v3.


--
Regards
Robin



Re: [PATCH] [RISC-V] Fix shift type for RVV interleaved stepped patterns [PR120356]

2025-07-02 Thread Robin Dapp

> CI testing failed:
> https://github.com/ewlu/gcc-precommit-ci/issues/3585#issuecomment-3022157670
> for sat_u_add-5-u32.c and vect-reduc-sad-1.c. These failures are compile issues
> that appeared due to the afdo-crossmodule-1b.c file. For some reason, in both cases
> the following snippets are being inserted into the compile lines:
>   /home/ewlu/.../testsuite/gcc.target/riscv/sat/afdo-crossmodule-1b.c -dumpbase ""
>   for sat_u_add-5-u32.c
>   /home/ewlu/.../testsuite/gcc.dg/vect/afdo-crossmodule-1b.c -dumpbase ""
>   for vect-reduc-sad-1.c
> ... which causes the failures.
>
> I've tried to reproduce it locally, but both test cases are passing
> in 100% of runs.


IMHO you've done the necessary testing.  It could well be a glitch and the
change is obvious enough (as it was clearly a copy-and-paste mistake in the
code).  So I'd say let's go ahead and fix fallout if any.


--
Regards
Robin



[PING][PATCH] Add string_slice class.

2025-07-02 Thread Alfie Richards

Ping for this patch.

Thanks,
Alfie

On 20/06/2025 14:23, Alfie Richards wrote:

Thanks for the pointer, Joseph.

This update adds tests to gcc/testsuite/g++.dg/warn/Wformat-gcc_diag-1.C,
as this seems to be where similar tests are done (e.g., %D for tree).

I couldn't find any tests for the actual output of string slice debug
statements for the other format specifiers so haven't included any. The
formatting is tested indirectly via the later diagnostic tests.

Thanks,
Alfie

-- >8 --

The string_slice inherits from array_slice and is used to refer to a
substring of an array that is memory managed elsewhere without modifying
the underlying array.

For example, this is useful when referring to a substring of an attribute
in the syntax tree.

This patch also adds some minimal helper functions for string_slice, such
as a strtok alternative, equality operators, strcmp, and a function to
strip whitespace from the beginning and end of a string_slice.
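To make the helpers described above concrete, here is a minimal, hypothetical sketch of a non-owning slice with a strtok-style tokenizer, whitespace stripping and equality. It is written against std::string_view purely for illustration: the names `slice`, `tokenize`, `strip` and `count_tokens` and their signatures are assumptions, not the interface this patch adds to gcc/vec.h.

```cpp
#include <string_view>

// Hypothetical analogue of a string slice: a view onto characters owned
// elsewhere.  This is NOT GCC's string_slice API.
struct slice
{
  std::string_view sv;

  // Return a copy with leading and trailing spaces/tabs removed; the
  // underlying characters are never modified.
  constexpr slice strip () const
  {
    std::string_view s = sv;
    while (!s.empty () && (s.front () == ' ' || s.front () == '\t'))
      s.remove_prefix (1);
    while (!s.empty () && (s.back () == ' ' || s.back () == '\t'))
      s.remove_suffix (1);
    return {s};
  }

  // strtok-like: return the text before the first delimiter in *REST and
  // advance *REST past that delimiter.  If no delimiter is found, the whole
  // remainder is returned and *REST becomes empty.
  static constexpr slice tokenize (slice *rest, std::string_view delims)
  {
    std::string_view s = rest->sv;
    std::string_view::size_type pos = s.find_first_of (delims);
    if (pos == std::string_view::npos)
      {
        *rest = {s.substr (s.size ())};
        return {s};
      }
    *rest = {s.substr (pos + 1)};
    return {s.substr (0, pos)};
  }

  friend constexpr bool operator== (slice a, slice b) { return a.sv == b.sv; }
};

// Count the tokens produced until the remaining input is exhausted.
constexpr int count_tokens (std::string_view text, std::string_view delims)
{
  slice rest{text};
  int n = 0;
  while (!rest.sv.empty ())
    {
      slice::tokenize (&rest, delims);
      ++n;
    }
  return n;
}
```

Because every operation only narrows a view, tokenizing and stripping never copy or modify the characters, which is the property the description relies on when slicing attribute strings owned by the syntax tree.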

gcc/c-family/ChangeLog:

* c-format.cc (local_string_slice_node): New node type.
(asm_fprintf_char_table): New entry.
(init_dynamic_diag_info): Add support for string_slice.
* c-format.h (T_STRING_SLICE): New node type.

gcc/ChangeLog:

* pretty-print.cc (format_phase_2): Add support for string_slice.
* vec.cc (string_slice::tokenize): New static method.
(string_slice::strcmp): New static method.
(string_slice::strip): New method.
(test_string_slice_initializers): New test.
(test_string_slice_tokenize): Ditto.
(test_string_slice_strcmp): Ditto.
(test_string_slice_equality): Ditto.
(test_string_slice_inequality): Ditto.
(test_string_slice_invalid): Ditto.
(test_string_slice_strip): Ditto.
(vec_cc_tests): Add new tests.
* vec.h (class string_slice): New class.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wformat-gcc_diag-1.C: Add string_slice "%B" format tests.
---
  gcc/c-family/c-format.cc                      |   7 +
  gcc/c-family/c-format.h                       |   1 +
  gcc/pretty-print.cc                           |  10 +
  .../g++.dg/warn/Wformat-gcc_diag-1.C          |  21 +-
  gcc/vec.cc                                    | 228 ++
  gcc/vec.h                                     |  46
  6 files changed, 309 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index a44249a0222..80430e9a8f7 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -70,6 +70,7 @@ static GTY(()) tree local_event_ptr_node;
  static GTY(()) tree local_pp_element_ptr_node;
  static GTY(()) tree local_gimple_ptr_node;
  static GTY(()) tree local_cgraph_node_ptr_node;
+static GTY(()) tree local_string_slice_node;
  static GTY(()) tree locus;
  
  static bool decode_format_attr (const_tree, tree, tree, function_format_info *,

@@ -770,6 +771,7 @@ static const format_char_info asm_fprintf_char_table[] =
{ "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "c",  NULL }, \
{ "r",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","//cR",   NULL }, \
{ "@",   1, STD_C89, { T_EVENT_PTR,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL }, \
+  { "B",   1, STD_C89, { T_STRING_SLICE,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q", "",   NULL }, \
{ "e",   1, STD_C89, { T_PP_ELEMENT_PTR,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"", NULL }, \
{ "<",   0, STD_C89, NOARGUMENTS, "",  "<",   NULL }, \
{ ">",   0, STD_C89, NOARGUMENTS, "",  ">",   NULL }, \
@@ -5211,6 +5213,11 @@ init_dynamic_diag_info (void)
|| local_cgraph_node_ptr_node == void_type_node)
  local_cgraph_node_ptr_node = get_named_type ("cgraph_node");
  
+  /* Similar to the above but for string_slice*.  */
+  if (!local_string_slice_node
+      || local_string_slice_node == void_type_node)
+    local_string_slice_node = get_named_type ("string_slice");
+
/* Similar to the above but for diagnostic_event_id_t*.  */
if (!local_event_ptr_node
|| local_event_ptr_node == void_type_node)
diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
index 323338cb8e7..d44d3862d83 100644
--- a/gcc/c-family/c-format.h
+++ b/gcc/c-family/c-format.h
@@ -317,6 +317,7 @@ struct format_kind_info
  #define T89_G   { STD_C89, NULL, &local_gimple_ptr_node }
  #define T_CGRAPH_NODE   { STD_C89, NULL, &local_cgraph_node_ptr_node }
  #define T_EVENT_PTR{ STD_C89, NULL, &local_event_ptr_node }
+#define T_STRING_SLICE{ STD_C89, NULL, &local_string_slice_node }
  #define T_PP_ELEMENT_PTR{ STD_C89, NULL, &local_pp_element_ptr_node }
  #define T89_T   { STD_C89, NULL, &local_tre

[PATCH] testsuite, powerpc, v2: Fix vsx-vectorize-* after alignment peeling [PR118567]

2025-07-02 Thread Jakub Jelinek
On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort.  And tests that are not good
> tests require constant maintenance!

Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.

The first one removes all the checking etc. from the testcases: as they
are just dg-do compile, all we care about for the vectorize dump checks
are the vectorized loops they want to test.

The second one adds 8 further tests, which are dg-do run, #include
the former tests, don't do any dump tests and just define the checking/main
for those.

Ok for trunk (both or just the first one)?

Jakub
2025-07-02  Jakub Jelinek  

PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-1.c: Remove includes, checking
part of main1 and main.
* gcc.target/powerpc/vsx-vectorize-2.c: Remove includes, replace
bar definition with declaration, remove main.
* gcc.target/powerpc/vsx-vectorize-3.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-4.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-5.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-6.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-7.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-8.c: Likewise.

--- gcc/testsuite/gcc.target/powerpc/vsx-vectorize-1.c.jj   2024-06-03 20:17:52.287099944 +0200
+++ gcc/testsuite/gcc.target/powerpc/vsx-vectorize-1.c  2025-07-02 10:42:31.855099110 +0200
@@ -4,8 +4,6 @@
 /* { dg-require-effective-target powerpc_vsx } */
 
 /* Taken from vect/vect-align-1.c.  */
-#include 
-#include 
 
 /* Compile time known misalignment. Cannot use loop peeling to align
the store.  */
@@ -28,23 +26,6 @@ main1 (struct foo * __restrict__ p)
 {
   p->y[i] = x[i];
 }
-
-  /* check results:  */
-  for (i = 0; i < N; i++)
-{
-  if (p->y[i] != x[i])
-   abort ();
-}
-  return 0;
-}
-
-
-int main (void)
-{
-  int i;
-  struct foo *p = malloc (2*sizeof (struct foo));
-  
-  main1 (p);
   return 0;
 }
 
--- gcc/testsuite/gcc.target/powerpc/vsx-vectorize-2.c.jj   2024-06-03 20:17:52.287099944 +0200
+++ gcc/testsuite/gcc.target/powerpc/vsx-vectorize-2.c  2025-07-02 10:43:12.688576248 +0200
@@ -4,28 +4,10 @@
 /* { dg-require-effective-target powerpc_vsx } */
 
 /* Taken from vect/vect-95.c.  */
-#include 
-#include 
 
 #define N 256
 
-__attribute__ ((noinline))
-void bar (float *pd, float *pa, float *pb, float *pc) 
-{
-  int i;
-
-  /* check results:  */
-  for (i = 0; i < N; i++)
-{
-  if (pa[i] != (pb[i] * pc[i]))
-   abort ();
-  if (pd[i] != 5.0)
-   abort ();
-}
-
-  return;
-}
-
+void bar (float *pd, float *pa, float *pb, float *pc);
 
 __attribute__ ((noinline)) int
 main1 (int n, float * __restrict__ pd, float * __restrict__ pa, float * __restrict__ pb, float * __restrict__ pc)
@@ -42,20 +24,6 @@ main1 (int n, float * __restrict__ pd, f
 
   return 0;
 }
-
-int main (void)
-{
-  int i;
-  float a[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
-  float d[N+1] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
-  float b[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57};
-  float c[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19};
-
-  main1 (N,&d[1],a,b,c);
-  main1 (N-2,&d[1],a,b,c);
-
-  return 0;
-}
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" {xfail { {! vect_hw_misalign } || powerpc*-*-* } } } } */
--- gcc/testsuite/gcc.target/powerpc/vsx-vectorize-3.c.jj   2024-06-03 20:17:52.287099944 +0200
+++ gcc/testsuite/gcc.target/powerpc/vsx-vectorize-3.c  2025-07-02 10:43:32.358324384 +0200
@@ -4,26 +4,10 @@
 /* { dg-require-effective-target powerpc_vsx } */
 
 /* Taken from vect/vect-95.c.  */
-#include 
-#include 
 
 #define N 256
 
-__attribute__ ((noinline))
-void bar (short *pa, short *pb, short *pc) 
-{
-  int i;
-
-  /* check results:  */
-  for (i = 0; i < N; i++)
-{
-  if (pa[i] != (pb[i] * pc[i]))
-   abort ();
-}
-
-  return;
-}
-
+void bar (short *pa, short *pb, short *pc);
 
 __attribute__ ((noinline)) int
 main1 (int n, short * __restrict__ pa, short * __restrict__ pb, short * __restrict__ pc)
@@ -39,19 +23,6 @@ main1 (int n, short * __restrict__ pa, s
 
   return 0;
 }
-
-int main (void)
-{
-  int i;
-  short a[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
-  short b[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57};
-  short c[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19};
-
-  main1 (N,a,b,c);
-  main1 (N-2,a,b,c);
-
-  return 0;
-}
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" } } */
--- gcc/testsuite/gcc.target/powerpc/vsx-vectorize-4.c.jj   2024-06-03 
20

Re: [PATCH v2] libstdc++: Lift locale initialization in main chrono format loop [PR110739]

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 10:33, Tomasz Kaminski  wrote:
>
>
>
> On Wed, Jul 2, 2025 at 11:27 AM Jonathan Wakely  wrote:
>>
>> On 01/07/25 16:54 +0200, Tomasz Kamiński wrote:
>> >This patch lifts locale initialization from locale-specific handling methods
>> >into the _M_format_to function, and passes the locale by const reference.
>> >To avoid unnecessary computation of locale::classic(), we use
>> >_Optional_locale, and emplace into it only for localized formatting
>> >(_M_spec._M_localized) or if the chrono-spec contains locale-specific
>> >specifiers (_M_spec._M_locale_specific).  The latter constructs
>> >locale::classic() in more cases than strictly necessary, as only a subset
>> >of locale-specific specifiers (%a, %A, %b, %B, %c, %p, %r) needs the
>> >locale, while _M_locale_specific is also set for %x, %X and when O/E
>> >modifiers are used.  However, none of the default outputs are affected,
>> >so I believe this is acceptable.
>> >
>> >In _M_S we no longer guard querying of the numpunct facet with a check
>> >that requires a potentially equally expensive construction of
>> >locale::classic().  We also mark the localized path as unlikely.
>> >
>> >The _M_locale method is no longer used in __formatter_chrono, and thus was
>> >moved to __formatter_duration.
>> >
>> >libstdc++-v3/ChangeLog:
>> >
>> >   * include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
>> >   Compute locale and pass it to specifiers method.
>> >   (__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
>> >   (__formatter_chrono::_M_c, __formatter_chrono::_M_p)
>> >   (__formatter_chrono::_M_r): Accept locale instead of format context.
>> >   (__formatter_chrono::_M_subsecs): Call __ctx.locale() directly,
>> >   instead of _M_locale and do not compare with locale::classic().
>> >   Add [[unlikely]] attributes.
>> >   (__formatter_chrono::_M_locale): Move to __formatter_duration.
>> >   (__formatter_duration::_M_locale): Moved from __formatter_chrono.
>> >---
>> >v2 updates the commit message text only in hope to make it more readable.
>> >
>> > libstdc++-v3/include/bits/chrono_io.h | 71 ---
>> > 1 file changed, 43 insertions(+), 28 deletions(-)
>> >
>> >diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
>> >index bcf9830fb9e..a25cb9ada01 100644
>> >--- a/libstdc++-v3/include/bits/chrono_io.h
>> >+++ b/libstdc++-v3/include/bits/chrono_io.h
>> >@@ -964,10 +964,16 @@ namespace __format
>> >   return std::move(__out);
>> > };
>> >
>> >+_Optional_locale __loc;
>> We could add:
>>
>>bool __loc_is_classic = false;
>>
>> >+if (_M_spec._M_localized)
>> >+  __loc = __fc.locale();
>>
>> and set it to __loc == locale::classic() here,
>>
>> >+else if (_M_spec._M_locale_specific)
>> >+  __loc = locale::classic();
>>
>> and set it to true here.
>>
>> >+
>> > struct tm __tm{};
>> > bool __use_locale_fmt = false;
>> > if (_M_spec._M_localized && _M_spec._M_locale_specific)
>> >-  if (__fc.locale() != locale::classic())
>> >+  if (__loc.value() != locale::classic())
>>
>> Then we could just test __loc_is_classic here. That would avoid a
>> second call to classic() for the case where we explicitly set __loc to
>> classic().
>
> I thought about it, but I think that the number of calls to
> locale::classic() remains the same, and we call it only once in both
> implementations.
> My reasoning: the above is performed only if _M_spec._M_localized is true,
> in which case we initialize __loc with __fc.locale(), and would perform
> that check anyway.
> So I do not think extracting a bool is worth the complexity.
>
> Or in other words, we never compare __loc with locale::classic() in the
> case where we initialize it with it.

Makes sense, thanks.

OK for trunk with the redundant __formatter_chrono::_M_locale removed.
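The control flow agreed on above can be sketched outside libstdc++ with std::optional<std::locale> standing in for _Optional_locale. Everything here is hypothetical (`counted_classic`, `classic_calls`, `spec`, `use_locale_fmt`); the counter only exists to make Tomasz's argument checkable: on each path locale::classic() is fetched at most once, because the comparison runs only when the locale came from the format context.

```cpp
#include <cassert>
#include <locale>
#include <optional>

// Hypothetical counter used to observe how often the classic locale is fetched.
inline int classic_calls = 0;

inline std::locale counted_classic ()
{
  ++classic_calls;
  return std::locale::classic ();
}

// Hypothetical stand-ins for _M_spec._M_localized / _M_spec._M_locale_specific.
struct spec
{
  bool localized;
  bool locale_specific;
};

// Fill LOC only when some specifier needs a locale, and report whether the
// locale-dependent formatting path should be used.
inline bool use_locale_fmt (const spec& s, const std::locale& ctx_loc,
                            std::optional<std::locale>& loc)
{
  if (s.localized)
    loc = ctx_loc;              // 'L' present: take the format context's locale
  else if (s.locale_specific)
    loc = counted_classic ();   // locale-specific specifiers without 'L'
  // Only reached with s.localized true, i.e. when LOC came from the context,
  // so classic() is never fetched twice on the same path.
  return s.localized && s.locale_specific && *loc != counted_classic ();
}
```

With this shape a separate "is classic" flag would only save a call on a path that never makes one, which is why the extra bool was judged not worth it.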


>>
>>
>> > {
>> >   __use_locale_fmt = true;
>> >
>> >@@ -1004,7 +1010,7 @@ namespace __format
>> >   {
>> > _CharT __c = *__first++;
>> > if (__use_locale_fmt && _S_localized_spec(__c, __mod)) [[unlikely]]
>> >-  __out = _M_locale_fmt(std::move(__out), __fc.locale(),
>> >+  __out = _M_locale_fmt(std::move(__out), __loc.value(),
>> > __tm, __c, __mod);
>> > else switch (__c)
>> >   {
>> >@@ -1014,15 +1020,17 @@ namespace __format
>> > break;
>> >   case 'a':
>> >   case 'A':
>> >-__out = _M_a_A(__t._M_weekday, std::move(__out), __fc, __c == 'A');
>> >+__out = _M_a_A(__t._M_weekday, std::move(__out),
>> >+   __loc.value(), __c == 'A');
>> > break;
>> >   case 'b':
>> >   case 'h':
>> >   case 'B':
>> >-__out = _M_b_B(__t._

Re: [PATCH] libstdc++: make range view ctors explicit (P2711) [PR119744]

2025-07-02 Thread Jonathan Wakely
On Tue, 1 Jul 2025 at 23:34, Nathan Myers  wrote:
>
> Make range view constructors explicit, per P2711. Technically, this
> is a breaking change, but it is unlikely to break any production
> code, as reliance on non-explicit construction is unidiomatic.
>
> libstdc++-v3/ChangeLog
> PR libstdc++/119744
> * include/std/ranges: view ctors become explicit

The "view ctors become explicit" should be a complete sentence here
please, so an uppercase letter and period.

OK for trunk with the commit message amended like that, thanks.

> ---
>  libstdc++-v3/include/std/ranges | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 210ac8274fc..f764aa7512e 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -660,7 +660,7 @@ namespace ranges
>: _M_value(__value)
>{ }
>
> -  constexpr
> +  constexpr explicit
>iota_view(type_identity_t<_Winc> __value,
> type_identity_t<_Bound> __bound)
>: _M_value(__value), _M_bound(__bound)
> @@ -669,19 +669,19 @@ namespace ranges
>   __glibcxx_assert( bool(__value <= __bound) );
>}
>
> -  constexpr
> +  constexpr explicit
>iota_view(_Iterator __first, _Iterator __last)
> requires same_as<_Winc, _Bound>
> : iota_view(__first._M_value, __last._M_value)
>{ }
>
> -  constexpr
> +  constexpr explicit
>iota_view(_Iterator __first, unreachable_sentinel_t __last)
> requires same_as<_Bound, unreachable_sentinel_t>
> : iota_view(__first._M_value, __last)
>{ }
>
> -  constexpr
> +  constexpr explicit
>iota_view(_Iterator __first, _Sentinel __last)
> requires (!same_as<_Winc, _Bound>) && (!same_as<_Bound, 
> unreachable_sentinel_t>)
> : iota_view(__first._M_value, __last._M_bound)
> @@ -1811,7 +1811,7 @@ namespace views::__adaptor
>   && default_initializable<_Pred>)
> = default;
>
> -  constexpr
> +  constexpr explicit
>filter_view(_Vp __base, _Pred __pred)
> : _M_base(std::move(__base)), _M_pred(std::move(__pred))
>{ }
> @@ -2188,7 +2188,7 @@ namespace views::__adaptor
>  && default_initializable<_Fp>)
> = default;
>
> -  constexpr
> +  constexpr explicit
>transform_view(_Vp __base, _Fp __fun)
> : _M_base(std::move(__base)), _M_fun(std::move(__fun))
>{ }
> @@ -2323,7 +2323,7 @@ namespace views::__adaptor
>  public:
>take_view() requires default_initializable<_Vp> = default;
>
> -  constexpr
> +  constexpr explicit
>take_view(_Vp __base, range_difference_t<_Vp> __count)
> : _M_base(std::move(__base)), _M_count(std::move(__count))
>{ }
> @@ -2562,7 +2562,7 @@ namespace views::__adaptor
>   && default_initializable<_Pred>)
> = default;
>
> -  constexpr
> +  constexpr explicit
>take_while_view(_Vp __base, _Pred __pred)
> : _M_base(std::move(__base)), _M_pred(std::move(__pred))
>{ }
> @@ -2650,7 +2650,7 @@ namespace views::__adaptor
>  public:
>drop_view() requires default_initializable<_Vp> = default;
>
> -  constexpr
> +  constexpr explicit
>drop_view(_Vp __base, range_difference_t<_Vp> __count)
> : _M_base(std::move(__base)), _M_count(__count)
>{ __glibcxx_assert(__count >= 0); }
> @@ -2804,7 +2804,7 @@ namespace views::__adaptor
>   && default_initializable<_Pred>)
> = default;
>
> -  constexpr
> +  constexpr explicit
>drop_while_view(_Vp __base, _Pred __pred)
> : _M_base(std::move(__base)), _M_pred(std::move(__pred))
>{ }
> @@ -3641,7 +3641,7 @@ namespace views::__adaptor
>   && default_initializable<_Pattern>)
> = default;
>
> -  constexpr
> +  constexpr explicit
>lazy_split_view(_Vp __base, _Pattern __pattern)
> : _M_base(std::move(__base)), _M_pattern(std::move(__pattern))
>{ }
> @@ -3649,7 +3649,7 @@ namespace views::__adaptor
>template
> requires constructible_from<_Vp, views::all_t<_Range>>
>   && constructible_from<_Pattern, single_view>>
> -   constexpr
> +   constexpr explicit
> lazy_split_view(_Range&& __r, range_value_t<_Range> __e)
>   : _M_base(views::all(std::forward<_Range>(__r))),
> _M_pattern(views::single(std::move(__e)))
> @@ -3766,7 +3766,7 @@ namespace views::__adaptor
>&& default_initializable<_Pattern>)
>= default;
>
> -constexpr
> +constexpr explicit
>  split_view(_Vp __base, _Pattern __pattern)
>: _M_base(std::move(__base)), _M_pattern(std::move(__pattern))
>  

Re: [PATCH] s390: Add -fno-stack-protector to 3 tests

2025-07-02 Thread Stefan Schulze Frielinghaus
On Tue, Jul 01, 2025 at 06:33:12PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 01, 2025 at 03:47:53PM +0200, Stefan Schulze Frielinghaus wrote:
> > In the past years I have started to use more and more function body
> > checks whenever gcc emits optimal code for a function.  With that I
> > wanted to make sure that we do not regress like introducing unnecessary
> > extends or whatever which might not have been caught by only testing the
> > "interesting"/actual part of a patch.  Thus, as long as those function
> > body checks are stable enough, i.e., not subject to insn reordering or
> > the like, I would like to make use of them in the future, too.   That
> > being said I'm wondering whether it would make sense to automatically
> > add option -fno-stack-protector for tests which make use function-body
> > checks?  If the testsuite infrastructure doesn't provide this
> > functionality trivially, I will try to keep this in mind and always add
> > the option manually.
> 
> I think even better would be to make the check-function-body UNSUPPORTED
> if there is no explicit -f{,no-}stack-protector* among
> dg-options/dg-additional-options and the option is still used.
> Because the test can be also dg-do run and it might be useful to run the
> test.

So maybe something along the lines?

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 97935cb23c3..bb16ae897c1 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -1042,6 +1042,15 @@ proc check-function-bodies { args } {
 # The name might include a list of options; extract the file name.
 set filename [lindex $testcase 0]

+global compiler_flags
+set current_compiler_flags [current_compiler_flags]
+if { [string match "* -fstack-protector* *" " ${compiler_flags} "]
+&& ![string match "* -fstack-protector* *" " ${current_compiler_flags} "]
+&& ![string match "* -fno-stack-protector* *" " ${current_compiler_flags} "] } {
+   unsupported "$testcase: skip check-function-bodies due to stack protector"
+   return
+}
+
 global srcdir
 set input_filename "$srcdir/$filename"
 set output_filename "[file rootname [file tail $filename]]"

I'm pretty new to tcl and didn't do extensive testing but for my few
experiments it worked so far.  I guess `string match` uses globbing so
something like "* -f(no-)?stack-protector* *" doesn't work which is why
I used two matches.

Cheers,
Stefan

> 
> Or another way could be just add a new effective target whether any kind of
> -fstack-protector{,-strong,-all} is enabled and guard the
> check-function-body directives explicitly with negation of that effective
> target.  Or perhaps have no_stack_protection effective target and use that
> to guard some directives manually.
> 
>   Jakub
> 


Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Jonathan Wakely

On 02/07/25 03:36 +0300, Mateusz Zych wrote:

Hello libstdc++ Team!

I have recently found a bug in libstdc++, that is,
the std::numeric_limits<> template specializations for integer-class types
are missing some of the static data members,
which results in compilation errors of valid C++ code:

  - Compiler Explorer: https://godbolt.org/z/E7z4WYfj4

Since adding missing member constants, which are the most relevant to
integer-like types,
was not a lot of code, I have prepared a Git patch with relevant changes.

I hope this patch is useful, Mateusz Zych


Thanks, I don't think there was any reason to omit these members, and I
agree we should add them.

The patch is simple and obvious enough that I don't think we need a
copyright assignment or DCO sign-off, so I'll push this to the
relevant branches. Thanks!



From 1e83287bbd6adf6ad8f483bd2f891692e0bed0c7 Mon Sep 17 00:00:00 2001
From: Mateusz Zych 
Date: Wed, 2 Jul 2025 01:51:40 +0300
Subject: [PATCH] libstdc++: Added missing member constants to numeric_limits
specializations for integer-class types.

25.3.4.4 Concept weakly_incrementable  [iterator.concept.winc]

 (5) For every integer-class type I,
 let B(I) be a unique hypothetical extended integer type
 of the same signedness with the same width as I.

 [Note 2: The corresponding
  hypothetical specialization numeric_limits<B(I)>
  meets the requirements on
  numeric_limits specializations for integral types.]

(11) For every (possibly cv-qualified) integer-class type I,
 numeric_limits<I> is specialized such that:

 - each static data member m
   has the same value as numeric_limits<B(I)>::m, and

 - each static member function f
   returns I(numeric_limits<B(I)>::f()).

libstdc++-v3/ChangeLog:

* include/bits/max_size_type.h
(numeric_limits<__max_size_type>): New static data members.
(numeric_limits<__max_diff_type>): Likewise.
---
libstdc++-v3/include/bits/max_size_type.h | 6 ++
1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h
index 73a6d141d5b..bc7700506a2 100644
--- a/libstdc++-v3/include/bits/max_size_type.h
+++ b/libstdc++-v3/include/bits/max_size_type.h
@@ -775,6 +775,9 @@ namespace ranges
  static constexpr bool is_signed = false;
  static constexpr bool is_integer = true;
  static constexpr bool is_exact = true;
+  static constexpr bool is_bounded = true;
+  static constexpr bool is_modulo = true;
+  static constexpr bool radix = 2;
  static constexpr int digits
= __gnu_cxx::__int_traits<_Sp::__rep>::__digits + 1;
  static constexpr int digits10
@@ -802,6 +805,9 @@ namespace ranges
  static constexpr bool is_signed = true;
  static constexpr bool is_integer = true;
  static constexpr bool is_exact = true;
+  static constexpr bool is_bounded = true;
+  static constexpr bool is_modulo = false;
+  static constexpr bool radix = 2;
  static constexpr int digits = numeric_limits<_Sp>::digits - 1;
  static constexpr int digits10
= static_cast(digits * numbers::ln2 / numbers::ln10);
--
2.48.1





Re: [PATCH] s390: Add -fno-stack-protector to 3 tests

2025-07-02 Thread Jakub Jelinek
On Wed, Jul 02, 2025 at 11:43:13AM +0200, Stefan Schulze Frielinghaus wrote:
> On Tue, Jul 01, 2025 at 06:33:12PM +0200, Jakub Jelinek wrote:
> > On Tue, Jul 01, 2025 at 03:47:53PM +0200, Stefan Schulze Frielinghaus wrote:
> > > In the past years I have started to use more and more function body
> > > checks whenever gcc emits optimal code for a function.  With that I
> > > wanted to make sure that we do not regress like introducing unnecessary
> > > extends or whatever which might not have been caught by only testing the
> > > "interesting"/actual part of a patch.  Thus, as long as those function
> > > body checks are stable enough, i.e., not subject to insn reordering or
> > > the like, I would like to make use of them in the future, too.   That
> > > being said I'm wondering whether it would make sense to automatically
> > > add option -fno-stack-protector for tests which make use of function-body
> > > checks?  If the testsuite infrastructure doesn't provide this
> > > functionality trivially, I will try to keep this in mind and always add
> > > the option manually.
> > 
> > I think even better would be to make the check-function-body UNSUPPORTED
> > if there is no explicit -f{,no-}stack-protector* among
> > dg-options/dg-additional-options and the option is still used.
> > Because the test can be also dg-do run and it might be useful to run the
> > test.
> 
> So maybe something along the lines?
> 
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 97935cb23c3..bb16ae897c1 100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -1042,6 +1042,15 @@ proc check-function-bodies { args } {
>  # The name might include a list of options; extract the file name.
>  set filename [lindex $testcase 0]
> 
> +global compiler_flags
> +set current_compiler_flags [current_compiler_flags]
> +if { [string match "* -fstack-protector* *" " ${compiler_flags} "]
> +&& ![string match "* -fstack-protector* *" " ${current_compiler_flags} "]
> +&& ![string match "* -fno-stack-protector* *" " ${current_compiler_flags} "] } {
> +   unsupported "$testcase: skip check-function-bodies due to stack protector"
> +   return
> +}
> +
>  global srcdir
>  set input_filename "$srcdir/$filename"
>  set output_filename "[file rootname [file tail $filename]]"
> 
> I'm pretty new to tcl and didn't do extensive testing but for my few
> experiments it worked so far.  I guess `string match` uses globbing so
> something like "* -f(no-)?stack-protector* *" doesn't work which is why
> I used two matches.

Yes.  Though maybe it could/should include other options known to change
the prologue/epilogue substantially, I think e.g. -fstack-clash-protection
or -fhardened or -fstack-check* might be other candidates.
I admit I haven't looked at what compiler_flags and current_compiler_flags
are at that point: which one is the effective command line after all
RUNTESTFLAGS and dg-options/dg-additional-options options, and which one
comes just from the latter two.

And we'd need to document it.

That said, I think some distros (Debian?) configure with -fhardened on by
default or similar options, not sure if those tests fail in all those cases
and if those distros have been just ignoring it or what.

Jakub



Re: [PATCH] libstdc++: Format chrono %a/%A/%b/%B/%p using locale's time_put [PR117214]

2025-07-02 Thread Tomasz Kaminski
To do that, you may need to extract the debug/throw checks from these
functions, i.e. have something like:
if (_M_check_ok(__t, __conv))
   { }
else if (__use_locale_fmt && ...)
   ...
else switch (...)

where _M_check_ok() would have a switch that checks whether the value is
ok() for the given specifier, and throws or prints a message.
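The suggested ordering (validate first, then the locale-driven path, then the built-in switch) can be sketched with hypothetical helpers. `value_rejected`, `localized_spec` and `dispatch` are illustrative names, not libstdc++ members, and the range checks assume weekdays encoded 0 to 6 and months 1 to 12, as in std::chrono::weekday::ok() and std::chrono::month::ok().

```cpp
// Hypothetical outcome of handling one conversion specifier.
enum class path { rejected, localized, builtin };

// Stand-in for the proposed _M_check_ok: true when the value is not ok()
// for the specifier, so the caller reports it instead of formatting.
constexpr bool value_rejected (char spec, unsigned value)
{
  switch (spec)
    {
    case 'a': case 'A':
      return value > 6;                 // weekday c_encoding() range
    case 'b': case 'B':
      return value < 1 || value > 12;   // month range
    default:
      return false;
    }
}

// Specifiers that would be routed through the locale's time_put.
constexpr bool localized_spec (char spec)
{
  switch (spec)
    {
    case 'a': case 'A': case 'b': case 'B': case 'p':
      return true;
    default:
      return false;
    }
}

constexpr path dispatch (char spec, unsigned value, bool use_locale_fmt)
{
  if (value_rejected (spec, value))
    return path::rejected;
  else if (use_locale_fmt && localized_spec (spec))
    return path::localized;
  else
    return path::builtin;               // the existing switch over __c
}
```

Putting the check first means the locale path never sees an out-of-range value, so the specifier methods themselves no longer need to carry the diagnostics.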

On Wed, Jul 2, 2025 at 11:29 AM Tomasz Kaminski  wrote:

>
>
> On Wed, Jul 2, 2025 at 9:13 AM XU Kailiang  wrote:
>
>>
>> C++ formatting locale could have a custom time_put that performs
>> differently from the C locale, so do not use __timepunct directly.
>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/117214
>> * include/bits/chrono_io.h (__formatter_chrono::_M_a_A,
>> __formatter_chrono::_M_b_B, __formatter_chrono::_M_p): use
>> _M_locale_fmt to format %a/%A/%b/%B/%p.
>> * testsuite/std/time/format/pr117214_custom_timeput.cc: New
>> test.
>>
>> Signed-off-by: XU Kailiang 
>> ---
>>  libstdc++-v3/include/bits/chrono_io.h | 31 ++--
>>  .../time/format/pr117214_custom_timeput.cc| 36 +++
>>  2 files changed, 47 insertions(+), 20 deletions(-)
>>  create mode 100644 libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>>
>> diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
>> index abbf4efcc3b..8358105c26b 100644
>> --- a/libstdc++-v3/include/bits/chrono_io.h
>> +++ b/libstdc++-v3/include/bits/chrono_io.h
>> @@ -905,14 +905,10 @@ namespace __format
>> }
>>
>>   locale __loc = _M_locale(__ctx);
>> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
>> - const _CharT* __days[7];
>> - if (__full)
>> -   __tp._M_days(__days);
>> - else
>> -   __tp._M_days_abbreviated(__days);
>> - __string_view __str(__days[__wd.c_encoding()]);
>> - return _M_write(std::move(__out), __loc, __str);
>> + struct tm __tm{};
>> + __tm.tm_wday = __wd.c_encoding();
>> + return _M_locale_fmt(std::move(__out), __loc, __tm,
>> +  __full ? 'A' : 'a', 0);
>>
> I have recently moved all calls to _M_locale_fmt from inside the
> specifiers into the format loop, instead of calling it for each specifier
> individually.
> I think we should follow the same approach here, by updating
> _S_localized_spec to return true for all of the above specifiers.
> Then we can remove the calls to _M_write and replace them with
> __format_writes, as these functions will only be used for the C locale.
>
>> }
>>
>>template
>> @@ -936,14 +932,10 @@ namespace __format
>> }
>>
>>   locale __loc = _M_locale(__ctx);
>> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
>> - const _CharT* __months[12];
>> - if (__full)
>> -   __tp._M_months(__months);
>> - else
>> -   __tp._M_months_abbreviated(__months);
>> - __string_view __str(__months[(unsigned)__m - 1]);
>> - return _M_write(std::move(__out), __loc, __str);
>> + struct tm __tm{};
>> + __tm.tm_mon = (unsigned)__m - 1;
>> + return _M_locale_fmt(std::move(__out), __loc, __tm,
>> +  __full ? 'B' : 'b', 0);
>> }
>>
>>template
>> @@ -1329,10 +1321,9 @@ namespace __format
>> __hi %= 24;
>>
>>   locale __loc = _M_locale(__ctx);
>> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
>> - const _CharT* __ampm[2];
>> - __tp._M_am_pm(__ampm);
>> - return _M_write(std::move(__out), __loc, __ampm[__hi >= 12]);
>> + struct tm __tm{};
>> + __tm.tm_hour = __hi;
>> + return _M_locale_fmt(std::move(__out), __loc, __tm, 'p', 0);
>> }
>>
>>template
>> diff --git a/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>> new file mode 100644
>> index 000..8c9f3d29bc6
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>> @@ -0,0 +1,36 @@
>> +// { dg-do run { target c++20 } }
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +struct custom_time_put : std::time_put
>> +{
>> +  iter_type
>> +  do_put(iter_type out, std::ios_base& io, char_type fill, const tm* t,
>> +char format, char modifier) const override
>> +  {
>> +using Base = std::time_put;
>> +
>> +switch (format) {
>> +  case 'a': case 'A': case 'b': case 'B': case 'p':
>> +   *out++ = '[';
>> +   *out++ = format;
>> +   *out++ = ']';
>> +}
>> +return Base::do_put(out, io, fill, t, format, modifier);
>> +  }
>> +};
>> +
>> +int main()
>> +{
>> +  using namespace std::chrono;
>> +  std::locale loc(std::locale::classic(), new custom_time_put);
>> +#define test(t, fmt, exp) VERIFY( std::format(loc, fmt, t) == exp )
>> +  test(Monday,  "{:L%a}", "[a]

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 10:50, Jonathan Wakely  wrote:
>
> On 02/07/25 03:36 +0300, Mateusz Zych wrote:
> >Hello libstdc++ Team!
> >
> >I have recently found a bug in libstdc++: the std::numeric_limits<>
> >template specializations for integer-class types are missing some
> >static data members, which results in compilation errors for valid C++ code:
> >
> >   - Compiler Explorer: https://godbolt.org/z/E7z4WYfj4
> >
> >Since adding the missing member constants, which are the most relevant to
> >integer-like types, was not a lot of code, I have prepared a Git patch with
> >the relevant changes.
> >
> >I hope this patch is useful, Mateusz Zych
>
> Thanks, I don't think there was any reason to omit these members, and I
> agree we should add them.
>
> The patch is simple and obvious enough that I don't think we need a
> copyright assignment or DCO sign-off, so I'll push this to the
> relevant branches. Thanks!

Oh actually the radix members should be of type int, not bool. I can fix that.
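A minimal illustration of why the type matters, using two hypothetical structs rather than the real libstdc++ specializations: initializing a `bool` member with `2` silently converts the value to `true`, so `radix` would read back as 1 instead of 2.

```cpp
#include <cassert>

// Hypothetical mini-traits mirroring the two declarations discussed:
// the submitted patch declared radix as bool, the fix declares it as int.
struct radix_as_bool { static constexpr bool radix = 2; }; // 2 becomes true
struct radix_as_int  { static constexpr int  radix = 2; };

// A bool can only hold true or false, so the value 2 is lost:
// reading it back as an int yields 1.
static_assert(static_cast<int>(radix_as_bool::radix) == 1);
static_assert(radix_as_int::radix == 2);
```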


>
> >From 1e83287bbd6adf6ad8f483bd2f891692e0bed0c7 Mon Sep 17 00:00:00 2001
> >From: Mateusz Zych 
> >Date: Wed, 2 Jul 2025 01:51:40 +0300
> >Subject: [PATCH] libstdc++: Added missing member constants to numeric_limits
> > specializations for integer-class types.
> >
> >25.3.4.4 Concept weakly_incrementable  [iterator.concept.winc]
> >
> >  (5) For every integer-class type I,
> >  let B(I) be a unique hypothetical extended integer type
> >  of the same signedness with the same width as I.
> >
> >  [Note 2: The corresponding
> >   hypothetical specialization numeric_limits
> >   meets the requirements on
> >   numeric_limits specializations for integral types.]
> >
> > (11) For every (possibly cv-qualified) integer-class type I,
> >  numeric_limits is specialized such that:
> >
> >  - each static data member m
> >has the same value as numeric_limits::m, and
> >
> >  - each static member function f
> >returns I(numeric_limits::f()).
> >
> >libstdc++-v3/ChangeLog:
> >
> >   * include/bits/max_size_type.h
> >   (numeric_limits<__max_size_type>): New static data members.
> >   (numeric_limits<__max_diff_type>): Likewise.
> >---
> > libstdc++-v3/include/bits/max_size_type.h | 6 ++
> > 1 file changed, 6 insertions(+)
> >
> >diff --git a/libstdc++-v3/include/bits/max_size_type.h 
> >b/libstdc++-v3/include/bits/max_size_type.h
> >index 73a6d141d5b..bc7700506a2 100644
> >--- a/libstdc++-v3/include/bits/max_size_type.h
> >+++ b/libstdc++-v3/include/bits/max_size_type.h
> >@@ -775,6 +775,9 @@ namespace ranges
> >   static constexpr bool is_signed = false;
> >   static constexpr bool is_integer = true;
> >   static constexpr bool is_exact = true;
> >+  static constexpr bool is_bounded = true;
> >+  static constexpr bool is_modulo = true;
> >+  static constexpr bool radix = 2;
> >   static constexpr int digits
> >   = __gnu_cxx::__int_traits<_Sp::__rep>::__digits + 1;
> >   static constexpr int digits10
> >@@ -802,6 +805,9 @@ namespace ranges
> >   static constexpr bool is_signed = true;
> >   static constexpr bool is_integer = true;
> >   static constexpr bool is_exact = true;
> >+  static constexpr bool is_bounded = true;
> >+  static constexpr bool is_modulo = false;
> >+  static constexpr bool radix = 2;
> >   static constexpr int digits = numeric_limits<_Sp>::digits - 1;
> >   static constexpr int digits10
> >   = static_cast(digits * numbers::ln2 / numbers::ln10);
> >--
> >2.48.1
> >
>
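As a rough sketch of what [iterator.concept.winc]/11 asks for, assuming a hypothetical program-defined wrapper `u64_like` whose B(I) is taken to be `std::uint64_t` (a user type is not a real integer-class type, so this is only an analogy): every data member mirrors `numeric_limits<std::uint64_t>`, each function returns `u64_like(numeric_limits<std::uint64_t>::f())`, and generic code such as the hypothetical `is_binary_modular` compiles only when `is_bounded`, `is_modulo` and `radix` are all present.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical integer-like type standing in for an integer-class type I.
struct u64_like
{
  std::uint64_t value;
};

namespace std
{
  // Each data member copies numeric_limits<uint64_t>::m and each function
  // returns u64_like(numeric_limits<uint64_t>::f()).
  template<>
  struct numeric_limits<u64_like>
  {
    static constexpr bool is_specialized = true;
    static constexpr bool is_signed = numeric_limits<uint64_t>::is_signed;
    static constexpr bool is_integer = numeric_limits<uint64_t>::is_integer;
    static constexpr bool is_exact = numeric_limits<uint64_t>::is_exact;
    static constexpr bool is_bounded = numeric_limits<uint64_t>::is_bounded;
    static constexpr bool is_modulo = numeric_limits<uint64_t>::is_modulo;
    static constexpr int radix = numeric_limits<uint64_t>::radix;
    static constexpr int digits = numeric_limits<uint64_t>::digits;

    static constexpr u64_like min() noexcept
    { return {numeric_limits<uint64_t>::min()}; }
    static constexpr u64_like max() noexcept
    { return {numeric_limits<uint64_t>::max()}; }
  };
}

// Generic code of the kind that fails to compile when a member is missing.
template<typename T>
constexpr bool is_binary_modular()
{
  using L = std::numeric_limits<T>;
  return L::is_bounded && L::is_modulo && L::radix == 2;
}
```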



Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-02 Thread Jonathan Wakely

On 02/07/25 11:52 +0100, Jonathan Wakely wrote:

On Wed, 2 Jul 2025 at 10:50, Jonathan Wakely  wrote:


On 02/07/25 03:36 +0300, Mateusz Zych wrote:
>Hello libstdc++ Team!
>
>I have recently found a bug in libstdc++: the std::numeric_limits<>
>template specializations for integer-class types are missing some
>static data members, which results in compilation errors for valid C++ code:
>
>   - Compiler Explorer: https://godbolt.org/z/E7z4WYfj4
>
>Since adding the missing member constants, which are the most relevant to
>integer-like types, was not a lot of code, I have prepared a Git patch with
>the relevant changes.
>
>I hope this patch is useful, Mateusz Zych

Thanks, I don't think there was any reason to omit these members, and I
agree we should add them.

The patch is simple and obvious enough that I don't think we need a
copyright assignment or DCO sign-off, so I'll push this to the
relevant branches. Thanks!


Oh actually the radix members should be of type int, not bool. I can fix that.


Here's what I'm testing:


commit ddd5b88db4fe99166835fe1b94beca451bc1ce30
Author: Mateusz Zych 
AuthorDate: Tue Jul 1 23:51:40 2025
Commit: Jonathan Wakely 
CommitDate: Wed Jul 2 11:57:45 2025

libstdc++: Add missing members to numeric_limits specializations for 
integer-class types

[iterator.concept.winc]/11 says that std::numeric_limits should be
specialized for integer-class types, with each member defined
appropriately.

libstdc++-v3/ChangeLog:

* include/bits/max_size_type.h (numeric_limits<__max_size_type>):
New static data members.
(numeric_limits<__max_diff_type>): Likewise.
* testsuite/std/ranges/iota/max_size_type.cc: Check new members.

Co-authored-by: Jonathan Wakely 


diff --git a/libstdc++-v3/include/bits/max_size_type.h 
b/libstdc++-v3/include/bits/max_size_type.h
index 73a6d141d5bc..3ac2b8e6b878 100644
--- a/libstdc++-v3/include/bits/max_size_type.h
+++ b/libstdc++-v3/include/bits/max_size_type.h
@@ -775,6 +775,9 @@ namespace ranges
   static constexpr bool is_signed = false;
   static constexpr bool is_integer = true;
   static constexpr bool is_exact = true;
+  static constexpr bool is_bounded = true;
+  static constexpr bool is_modulo = true;
+  static constexpr int radix = 2;
   static constexpr int digits
= __gnu_cxx::__int_traits<_Sp::__rep>::__digits + 1;
   static constexpr int digits10
@@ -802,6 +805,9 @@ namespace ranges
   static constexpr bool is_signed = true;
   static constexpr bool is_integer = true;
   static constexpr bool is_exact = true;
+  static constexpr bool is_bounded = true;
+  static constexpr bool is_modulo = false;
+  static constexpr int radix = 2;
   static constexpr int digits = numeric_limits<_Sp>::digits - 1;
   static constexpr int digits10
= static_cast(digits * numbers::ln2 / numbers::ln10);
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc 
b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
index 4739d9e2f790..fc5284594c7e 100644
--- a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
+++ b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
@@ -352,6 +352,9 @@ static_assert(numeric_limits::is_specialized);
 static_assert(!numeric_limits::is_signed);
 static_assert(numeric_limits::is_integer);
 static_assert(numeric_limits::is_exact);
+static_assert(numeric_limits::is_bounded);
+static_assert(numeric_limits::is_modulo);
+static_assert(numeric_limits::radix == 2);
 // We can't unconditionally use numeric_limits here because __int128 is an
 // integral type only in GNU mode.
 #if __SIZEOF_INT128__
@@ -379,6 +382,9 @@ static_assert(numeric_limits::is_specialized);
 static_assert(numeric_limits::is_signed);
 static_assert(numeric_limits::is_integer);
 static_assert(numeric_limits::is_exact);
+static_assert(numeric_limits::is_bounded);
+static_assert(!numeric_limits::is_modulo);
+static_assert(numeric_limits::radix == 2);
 static_assert(numeric_limits::digits
  == numeric_limits::digits - 1);
 static_assert(numeric_limits::digits10



Re: [Fortran, Patch, PR120843, v3] Fix reject valid, because of inconformable coranks

2025-07-02 Thread Damian Rouson
git branch
git checkout
git add
git commit
git rebase
git push

It’s time to move beyond emailing patches!  (Please.)

Damian

On Wed, Jul 2, 2025 at 03:17 Andre Vehreschild  wrote:

> Hi all,
>
> I successfully created a big mess with the previous patch: first by
> applying an outdated version, and second by adding the conformance checks
> for coranks in a3f1cdd8ed46f9816b31ab162ae4dac547d34ebc. Checking the
> standard, even using AI (haha), to figure out whether the coranks of an
> expression have restrictions on them failed: I found nothing. The AI
> fantasized about restrictions that do not exist. Therefore the current
> approach is to remove the conformance check and just use the coranks
> computed in expressions, to prevent recomputation whenever they are needed.
>
> Jerry, Harald: Sorry for all the bother and all my mistakes. I am really
> sorry to have wasted your time.
>
> The patch has been regtested fine on x86_64-pc-linux-gnu / F41. Ok for
> mainline and later backport to gcc-15?
>
> Regards,
> Andre
>
> On Tue, 1 Jul 2025 11:17:58 +0200
> Andre Vehreschild  wrote:
>
> > Hi Harald,
> >
> > thanks for the review. Committed as gcc-16-1885-g1b0930e9046.
> >
> > Will backport to gcc-15 in about a week.
> >
> > Thanks again.
> >
> > Regards,
> >   Andre
> >
> > On Mon, 30 Jun 2025 22:31:08 +0200
> > Harald Anlauf  wrote:
> >
> > > Am 30.06.25 um 15:25 schrieb Andre Vehreschild:
> > > > Hi all,
> > > >
> > > > here now the version of the patch that seems to be more complete.
> > > >
> > > > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline and later
> > > > backport to gcc-15?
> > >
> > > This looks good to me.  OK for both.
> > >
> > > Thanks for the patch!
> > >
> > > Harald
> > >
> > > > Regards,
> > > >   Andre
> > > >
> > > > On Fri, 27 Jun 2025 15:44:20 +0200
> > > > Andre Vehreschild  wrote:
> > > >
> > > >> I take this patch back. It seems to be incomplete.
> > > >>
> > > >> - Andre
> > > >>
> > > >> On Fri, 27 Jun 2025 14:45:36 +0200
> > > >> Andre Vehreschild  wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> this patch fixes a reject valid when the coranks of two operands do not
> > > >>> match and no coindex is given, i.e. when only an implicit this_image
> > > >>> co-ref is used.
> > > >>>
> > > >>> Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> > > >>>
> > > >>> Regards,
> > > >>> Andre
> > > >>
> > > >>
> > > >
> > > >
> > >
> >
> >
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [PATCH v2] libstdc++: Lift locale initialization in main chrono format loop [PR110739]

2025-07-02 Thread Jonathan Wakely

On 01/07/25 16:54 +0200, Tomasz Kamiński wrote:

This patch lifts locale initialization from the locale-specific handling methods
into the _M_format_to function, and passes the locale by const reference.
To avoid unnecessary computation of locale::classic(), we use _Optional_locale,
and emplace into it only for localized formatting (_M_spec._M_localized) or if the
chrono-spec contains locale-specific specifiers (_M_spec._M_locale_specific).
The latter constructs locale::classic() in more cases than strictly necessary,
as only a subset of the locale-specific specifiers (%a, %A, %b, %B, %c, %p, %r)
needs a locale, while _M_locale_specific is also set for %x, %X and when O/E
modifiers are used. However, none of the default outputs are affected, so I
believe this is acceptable.
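To make the over-approximation concrete, here is a hypothetical scanner (not the libstdc++ parser) that computes two flags for a chrono-spec: `locale_specific`, set the way the text above describes _M_locale_specific, and `needs_locale`, set only for %a, %A, %b, %B, %c, %p and %r. Specs such as %x or %OH set the first flag without the second, which is where locale::classic() would be constructed without being needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

struct locale_use
{
  bool locale_specific = false; // analogous to _M_spec._M_locale_specific
  bool needs_locale = false;    // a specifier that really reads locale data
};

// Hypothetical classifier; it only looks at conversion specifiers.
constexpr locale_use classify(std::string_view spec)
{
  locale_use r;
  for (std::size_t i = 0; i + 1 < spec.size(); ++i)
    {
      if (spec[i] != '%')
        continue;
      char c = spec[++i];
      bool modified = false;
      if ((c == 'E' || c == 'O') && i + 1 < spec.size())
        {
          modified = true; // O/E modifier: flagged as locale-specific
          c = spec[++i];
        }
      switch (c)
        {
        case 'a': case 'A': case 'b': case 'B':
        case 'c': case 'p': case 'r':
          r.locale_specific = r.needs_locale = true;
          break;
        case 'x': case 'X':
          r.locale_specific = true; // flagged, but no locale data needed
          break;
        default:
          if (modified)
            r.locale_specific = true;
          break;
        }
    }
  return r;
}
```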

In _M_S we no longer guard the query of the numpunct facet with a check that
potentially requires an equally expensive construction of locale::classic(). We
also mark the localized path as unlikely.

The _M_locale method is no longer used in __formatter_chrono, and thus was
moved to __formatter_duration.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
Compute locale and pass it to specifiers method.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
(__formatter_chrono::_M_c, __formatter_chrono::_M_p)
(__formatter_chrono::_M_r): Accept locale instead of format context.
(__formatter_chrono::_M_subsecs): Call __ctx.locale() directly,
instead of _M_locale and do not compare with locale::classic().
Add [[unlikely]] attributes.
(__formatter_chrono::_M_locale): Move to __formatter_duration.
(__formatter_duration::_M_locale): Moved from __formatter_chrono.
---
v2 updates the commit message text only, in the hope of making it more readable.

libstdc++-v3/include/bits/chrono_io.h | 71 ---
1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index bcf9830fb9e..a25cb9ada01 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -964,10 +964,16 @@ namespace __format
return std::move(__out);
  };

+ _Optional_locale __loc;

We could add:

  bool __loc_is_classic = false;


+ if (_M_spec._M_localized)
+   __loc = __fc.locale();


and set it to __loc == locale::classic() here,


+ else if (_M_spec._M_locale_specific)
+   __loc = locale::classic();


and set it to true here.


+
  struct tm __tm{};
  bool __use_locale_fmt = false;
  if (_M_spec._M_localized && _M_spec._M_locale_specific)
-   if (__fc.locale() != locale::classic())
+   if (__loc.value() != locale::classic())


Then we could just test __loc_is_classic here. That would avoid a
second call to classic() for the case where we explicitly set __loc to
classic().


  {
__use_locale_fmt = true;

@@ -1004,7 +1010,7 @@ namespace __format
{
  _CharT __c = *__first++;
  if (__use_locale_fmt && _S_localized_spec(__c, __mod)) [[unlikely]]
-   __out = _M_locale_fmt(std::move(__out), __fc.locale(),
+   __out = _M_locale_fmt(std::move(__out), __loc.value(),
  __tm, __c, __mod);
  else switch (__c)
{
@@ -1014,15 +1020,17 @@ namespace __format
  break;
case 'a':
case 'A':
- __out = _M_a_A(__t._M_weekday, std::move(__out), __fc, __c == 'A');
+ __out = _M_a_A(__t._M_weekday, std::move(__out),
+__loc.value(), __c == 'A');
  break;
case 'b':
case 'h':
case 'B':
- __out = _M_b_B(__t._M_month, std::move(__out), __fc, __c == 'B');
+ __out = _M_b_B(__t._M_month, std::move(__out),
+__loc.value(), __c == 'B');
  break;
case 'c':
- __out = _M_c(__t, std::move(__out), __fc);
+ __out = _M_c(__t, std::move(__out), __loc.value());
  break;
case 'C':
case 'y':
@@ -1058,7 +1066,7 @@ namespace __format
  __out = _M_M(__t._M_minutes, __print_sign());
  break;
case 'p':
- __out = _M_p(__t._M_hours, std::move(__out), __fc);
+ __out = _M_p(__t._M_hours, std::move(__out), __loc.value());
  break;
case 'q':
  __out = _M_q(__t._M_unit_suffix, std::move(__out));
@@ -1067,7 +1075,7 @@ namespace __format
  __out = _M_Q(__t, __print_sign(), __fc);
  break;
case 'r':
- __out = _M_r(__t, __print_sign(), __fc);
+ 

Re: [PATCH] libstdc++: Format chrono %a/%A/%b/%B/%p using locale's time_put [PR117214]

2025-07-02 Thread Tomasz Kaminski
On Wed, Jul 2, 2025 at 9:13 AM XU Kailiang  wrote:

>
> C++ formatting locale could have a custom time_put that performs
> differently from the C locale, so do not use __timepunct directly.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/117214
> * include/bits/chrono_io.h (__formatter_chrono::_M_a_A,
> __formatter_chrono::_M_b_B, __formatter_chrono::_M_p): use
> _M_locale_fmt to format %a/%A/%b/%B/%p.
> * testsuite/std/time/format/pr117214_custom_timeput.cc: New
> test.
>
> Signed-off-by: XU Kailiang 
> ---
>  libstdc++-v3/include/bits/chrono_io.h | 31 ++--
>  .../time/format/pr117214_custom_timeput.cc| 36 +++
>  2 files changed, 47 insertions(+), 20 deletions(-)
>  create mode 100644
> libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> index abbf4efcc3b..8358105c26b 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -905,14 +905,10 @@ namespace __format
> }
>
>   locale __loc = _M_locale(__ctx);
> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
> - const _CharT* __days[7];
> - if (__full)
> -   __tp._M_days(__days);
> - else
> -   __tp._M_days_abbreviated(__days);
> - __string_view __str(__days[__wd.c_encoding()]);
> - return _M_write(std::move(__out), __loc, __str);
> + struct tm __tm{};
> + __tm.tm_wday = __wd.c_encoding();
> + return _M_locale_fmt(std::move(__out), __loc, __tm,
> +  __full ? 'A' : 'a', 0);
>
I have recently moved all calls to _M_locale_fmt from inside the specifiers
into the format loop, instead of calling it in each specifier individually.
I think we should follow the same approach here, by updating _S_localized_spec
to return true for all of the above specifiers.
Then we can remove the calls to _M_write and replace them with __format_writes,
as these functions will only be used for the C locale.

> }
>
>template
> @@ -936,14 +932,10 @@ namespace __format
> }
>
>   locale __loc = _M_locale(__ctx);
> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
> - const _CharT* __months[12];
> - if (__full)
> -   __tp._M_months(__months);
> - else
> -   __tp._M_months_abbreviated(__months);
> - __string_view __str(__months[(unsigned)__m - 1]);
> - return _M_write(std::move(__out), __loc, __str);
> + struct tm __tm{};
> + __tm.tm_mon = (unsigned)__m - 1;
> + return _M_locale_fmt(std::move(__out), __loc, __tm,
> +  __full ? 'B' : 'b', 0);
> }
>
>template
> @@ -1329,10 +1321,9 @@ namespace __format
> __hi %= 24;
>
>   locale __loc = _M_locale(__ctx);
> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
> - const _CharT* __ampm[2];
> - __tp._M_am_pm(__ampm);
> - return _M_write(std::move(__out), __loc, __ampm[__hi >= 12]);
> + struct tm __tm{};
> + __tm.tm_hour = __hi;
> + return _M_locale_fmt(std::move(__out), __loc, __tm, 'p', 0);
> }
>
>template
> diff --git
> a/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
> b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
> new file mode 100644
> index 000..8c9f3d29bc6
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
> @@ -0,0 +1,36 @@
> +// { dg-do run { target c++20 } }
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct custom_time_put : std::time_put
> +{
> +  iter_type
> +  do_put(iter_type out, std::ios_base& io, char_type fill, const tm* t,
> +char format, char modifier) const override
> +  {
> +using Base = std::time_put;
> +
> +switch (format) {
> +  case 'a': case 'A': case 'b': case 'B': case 'p':
> +   *out++ = '[';
> +   *out++ = format;
> +   *out++ = ']';
> +}
> +return Base::do_put(out, io, fill, t, format, modifier);
> +  }
> +};
> +
> +int main()
> +{
> +  using namespace std::chrono;
> +  std::locale loc(std::locale::classic(), new custom_time_put);
> +#define test(t, fmt, exp) VERIFY( std::format(loc, fmt, t) == exp )
> +  test(Monday,  "{:L%a}", "[a]Mon");
> +  test(Monday,  "{:L%A}", "[A]Monday");
> +  test(January, "{:L%b}", "[b]Jan");
> +  test(January, "{:L%B}", "[B]January");
> +  test(1h,  "{:L%p}", "[p]AM");
> +}
> --
> 2.50.0
>
>
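The patch relies on the standard std::time_put facet mechanism: a facet installed in the formatting locale overrides `do_put`, and a `std::tm` with only the relevant field filled in (`tm_wday`, `tm_mon` or `tm_hour`) is enough for %a/%A, %b/%B and %p. The sketch below calls `time_put::put` directly rather than going through std::format; `tagging_time_put` and `put_one` are hypothetical names, and the expected strings assume the classic "C" locale.

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: tags a few conversions, then defers to the base class.
struct tagging_time_put : std::time_put<char>
{
  iter_type do_put(iter_type out, std::ios_base& io, char_type fill,
                   const std::tm* t, char format, char modifier) const override
  {
    switch (format)
      {
      case 'a': case 'A': case 'b': case 'B': case 'p':
        *out++ = '[';
        *out++ = format;
        *out++ = ']';
        break;
      default:
        break;
      }
    return std::time_put<char>::do_put(out, io, fill, t, format, modifier);
  }
};

// Formats a single conversion through whatever time_put facet `loc` holds.
std::string put_one(const std::locale& loc, const std::tm& tm, char spec)
{
  std::ostringstream os;
  os.imbue(loc);
  const auto& tp = std::use_facet<std::time_put<char>>(loc);
  tp.put(std::ostreambuf_iterator<char>(os), os, os.fill(), &tm, spec, 0);
  return os.str();
}
```

Because `tagging_time_put` inherits `std::time_put<char>::id`, constructing `std::locale(std::locale::classic(), new tagging_time_put)` replaces the locale's `time_put<char>` facet, so `put_one` picks up the override.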


Re: [PATCH] libstdc++: Format chrono %a/%A/%b/%B/%p using locale's time_put [PR117214]

2025-07-02 Thread Jonathan Wakely
On Wed, 2 Jul 2025 at 10:30, Tomasz Kaminski  wrote:
>
>
>
> On Wed, Jul 2, 2025 at 9:13 AM XU Kailiang  wrote:
>>
>>
>> C++ formatting locale could have a custom time_put that performs
>> differently from the C locale, so do not use __timepunct directly.
>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/117214
>> * include/bits/chrono_io.h (__formatter_chrono::_M_a_A,
>> __formatter_chrono::_M_b_B, __formatter_chrono::_M_p): use
>> _M_locale_fmt to format %a/%A/%b/%B/%p.
>> * testsuite/std/time/format/pr117214_custom_timeput.cc: New
>> test.
>>
>> Signed-off-by: XU Kailiang 
>> ---
>>  libstdc++-v3/include/bits/chrono_io.h | 31 ++--
>>  .../time/format/pr117214_custom_timeput.cc| 36 +++
>>  2 files changed, 47 insertions(+), 20 deletions(-)
>>  create mode 100644 
>> libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>>
>> diff --git a/libstdc++-v3/include/bits/chrono_io.h 
>> b/libstdc++-v3/include/bits/chrono_io.h
>> index abbf4efcc3b..8358105c26b 100644
>> --- a/libstdc++-v3/include/bits/chrono_io.h
>> +++ b/libstdc++-v3/include/bits/chrono_io.h
>> @@ -905,14 +905,10 @@ namespace __format
>> }
>>
>>   locale __loc = _M_locale(__ctx);
>> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
>> - const _CharT* __days[7];
>> - if (__full)
>> -   __tp._M_days(__days);
>> - else
>> -   __tp._M_days_abbreviated(__days);
>> - __string_view __str(__days[__wd.c_encoding()]);
>> - return _M_write(std::move(__out), __loc, __str);
>> + struct tm __tm{};
>> + __tm.tm_wday = __wd.c_encoding();
>> + return _M_locale_fmt(std::move(__out), __loc, __tm,
>> +  __full ? 'A' : 'a', 0);
>
> I have recently moved all calls to _M_locale_fmt from inside the specifiers
> into the format loop, instead of calling it in each specifier individually.
> I think we should follow the same approach here, by updating _S_localized_spec
> to return true for all of the above specifiers.

Yes please.

> Then we can remove the calls to _M_write and replace them with __format_writes,
> as these functions will only be used for the C locale.
>>
>> }
>>
>>template
>> @@ -936,14 +932,10 @@ namespace __format
>> }
>>
>>   locale __loc = _M_locale(__ctx);
>> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
>> - const _CharT* __months[12];
>> - if (__full)
>> -   __tp._M_months(__months);
>> - else
>> -   __tp._M_months_abbreviated(__months);
>> - __string_view __str(__months[(unsigned)__m - 1]);
>> - return _M_write(std::move(__out), __loc, __str);
>> + struct tm __tm{};
>> + __tm.tm_mon = (unsigned)__m - 1;
>> + return _M_locale_fmt(std::move(__out), __loc, __tm,
>> +  __full ? 'B' : 'b', 0);
>> }
>>
>>template
>> @@ -1329,10 +1321,9 @@ namespace __format
>> __hi %= 24;
>>
>>   locale __loc = _M_locale(__ctx);
>> - const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
>> - const _CharT* __ampm[2];
>> - __tp._M_am_pm(__ampm);
>> - return _M_write(std::move(__out), __loc, __ampm[__hi >= 12]);
>> + struct tm __tm{};
>> + __tm.tm_hour = __hi;
>> + return _M_locale_fmt(std::move(__out), __loc, __tm, 'p', 0);
>> }
>>
>>template
>> diff --git 
>> a/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc 
>> b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>> new file mode 100644
>> index 000..8c9f3d29bc6
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
>> @@ -0,0 +1,36 @@
>> +// { dg-do run { target c++20 } }
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +struct custom_time_put : std::time_put
>> +{
>> +  iter_type
>> +  do_put(iter_type out, std::ios_base& io, char_type fill, const tm* t,
>> +char format, char modifier) const override
>> +  {
>> +using Base = std::time_put;
>> +
>> +switch (format) {
>> +  case 'a': case 'A': case 'b': case 'B': case 'p':
>> +   *out++ = '[';
>> +   *out++ = format;
>> +   *out++ = ']';
>> +}
>> +return Base::do_put(out, io, fill, t, format, modifier);
>> +  }
>> +};
>> +
>> +int main()
>> +{
>> +  using namespace std::chrono;
>> +  std::locale loc(std::locale::classic(), new custom_time_put);
>> +#define test(t, fmt, exp) VERIFY( std::format(loc, fmt, t) == exp )
>> +  test(Monday,  "{:L%a}", "[a]Mon");
>> +  test(Monday,  "{:L%A}", "[A]Monday");
>> +  test(January, "{:L%b}", "[b]Jan");
>> +  test(January, "{:L%B}", "[B]January");
>> +  test(1h,  "{:L%p}", "[p]AM");
>> +}
>> --
>> 2.50.0
>>



Re: [PATCH v2] libstdc++: Lift locale initialization in main chrono format loop [PR110739]

2025-07-02 Thread Tomasz Kaminski
On Wed, Jul 2, 2025 at 11:27 AM Jonathan Wakely  wrote:

> On 01/07/25 16:54 +0200, Tomasz Kamiński wrote:
> >This patch lifts locale initialization from the locale-specific handling
> >methods into the _M_format_to function, and passes the locale by const
> >reference. To avoid unnecessary computation of locale::classic(), we use
> >_Optional_locale, and emplace into it only for localized formatting
> >(_M_spec._M_localized) or if the chrono-spec contains locale-specific
> >specifiers (_M_spec._M_locale_specific).
> >The latter constructs locale::classic() in more cases than strictly
> >necessary, as only a subset of the locale-specific specifiers (%a, %A, %b,
> >%B, %c, %p, %r) needs a locale, while _M_locale_specific is also set for
> >%x, %X and when O/E modifiers are used. However, none of the default
> >outputs are affected, so I believe this is acceptable.
> >
> >In _M_S we no longer guard the query of the numpunct facet with a check
> >that potentially requires an equally expensive construction of
> >locale::classic(). We also mark the localized path as unlikely.
> >
> >The _M_locale method is no longer used in __formatter_chrono, and thus was
> >moved to __formatter_duration.
> >
> >libstdc++-v3/ChangeLog:
> >
> >   * include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
> >   Compute locale and pass it to specifiers method.
> >   (__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
> >   (__formatter_chrono::_M_c, __formatter_chrono::_M_p)
> >   (__formatter_chrono::_M_r): Accept locale instead of format context.
> >   (__formatter_chrono::_M_subsecs): Call __ctx.locale() directly,
> >   instead of _M_locale and do not compare with locale::classic().
> >   Add [[unlikely]] attributes.
> >   (__formatter_chrono::_M_locale): Move to __formatter_duration.
> >   (__formatter_duration::_M_locale): Moved from __formatter_chrono.
> >---
> >v2 updates the commit message text only, in the hope of making it more readable.
> >
> > libstdc++-v3/include/bits/chrono_io.h | 71 ---
> > 1 file changed, 43 insertions(+), 28 deletions(-)
> >
> >diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> >index bcf9830fb9e..a25cb9ada01 100644
> >--- a/libstdc++-v3/include/bits/chrono_io.h
> >+++ b/libstdc++-v3/include/bits/chrono_io.h
> >@@ -964,10 +964,16 @@ namespace __format
> >   return std::move(__out);
> > };
> >
> >+_Optional_locale __loc;
> We could add:
>
>bool __loc_is_classic = false;
>
> >+if (_M_spec._M_localized)
> >+  __loc = __fc.locale();
>
> and set it to __loc == locale::classic() here,
>
> >+else if (_M_spec._M_locale_specific)
> >+  __loc = locale::classic();
>
> and set it to true here.
>
> >+
> > struct tm __tm{};
> > bool __use_locale_fmt = false;
> > if (_M_spec._M_localized && _M_spec._M_locale_specific)
> >-  if (__fc.locale() != locale::classic())
> >+  if (__loc.value() != locale::classic())
>
> Then we could just test __loc_is_classic here. That would avoid a
> second call to classic() for the case where we explicitly set __loc to
> classic().
>
I thought about it, but I think that the number of calls to locale::classic()
remains the same, and we call it only once in both implementations.
My reasoning: the above is performed only if _M_spec._M_localized is true, and
in that case we initialize __loc with __fc.locale() and would perform that
check anyway. So I do not think extracting the bool is worth the complexity.

Or in other words, we never compare __loc with locale::classic() in the case
where we initialize it with it.
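A minimal sketch of this reasoning, with hypothetical stand-ins (`spec_flags`, `counted_classic`, `use_locale_fmt`) for the libstdc++ internals and `std::optional<std::locale>` in place of _Optional_locale: counting calls shows that every combination of flags reaches locale::classic() at most once, because the comparison only happens on the localized path, where __loc was initialized from the context's locale.

```cpp
#include <cassert>
#include <locale>
#include <optional>

// Hypothetical stand-ins for _M_spec._M_localized and
// _M_spec._M_locale_specific.
struct spec_flags
{
  bool localized;
  bool locale_specific;
};

// Counts how often the classic locale is requested.
inline int classic_calls = 0;

inline const std::locale& counted_classic()
{
  ++classic_calls;
  return std::locale::classic();
}

// Mirrors the shape of the patched prologue: choose the locale lazily, then
// decide whether locale-specific specifiers go through the locale's facets.
inline bool use_locale_fmt(spec_flags s, const std::locale& ctx_loc)
{
  std::optional<std::locale> loc;
  if (s.localized)
    loc = ctx_loc;             // localized: take the context's locale
  else if (s.locale_specific)
    loc = counted_classic();   // not localized: classic() requested once

  bool use = false;
  if (s.localized && s.locale_specific)
    // Only reached on the localized path, where loc holds ctx_loc, so this
    // is the only classic() call in that case.
    use = (*loc != counted_classic());
  return use;
}
```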

>
> > {
> >   __use_locale_fmt = true;
> >
> >@@ -1004,7 +1010,7 @@ namespace __format
> >   {
> > _CharT __c = *__first++;
> > if (__use_locale_fmt && _S_localized_spec(__c, __mod)) [[unlikely]]
> >-  __out = _M_locale_fmt(std::move(__out), __fc.locale(),
> >+  __out = _M_locale_fmt(std::move(__out), __loc.value(),
> > __tm, __c, __mod);
> > else switch (__c)
> >   {
> >@@ -1014,15 +1020,17 @@ namespace __format
> > break;
> >   case 'a':
> >   case 'A':
> >-__out = _M_a_A(__t._M_weekday, std::move(__out), __fc, __c == 'A');
> >+__out = _M_a_A(__t._M_weekday, std::move(__out),
> >+   __loc.value(), __c == 'A');
> > break;
> >   case 'b':
> >   case 'h':
> >   case 'B':
> >-__out = _M_b_B(__t._M_month, std::move(__out), __fc, __c == 'B');
> >+__out = _M_b_B(__t._M_month, std::move(__out),
> >+   __loc.value(), __c == 'B');
> > break;
> >   case 'c':
> >-__out = _M_c(__t, std::move(__out), __fc);
> >+   

Re: [PATCH v2] vect: Misalign checks for gather/scatter.

2025-07-02 Thread Richard Biener
On Wed, 2 Jul 2025, Robin Dapp wrote:

> > The else (get_group_load_store_type) can end up returning
> > VMAT_GATHER_SCATTER and thus require the above checking as well.
> 
> Isn't this already covered by
> 
>  if (*memory_access_type == VMAT_ELEMENTWISE
>  || (*memory_access_type == VMAT_GATHER_SCATTER
> && GATHER_SCATTER_LEGACY_P (*gs_info))
>  || *memory_access_type == VMAT_STRIDED_SLP
>  || *memory_access_type == VMAT_INVARIANT)
>{
>  *alignment_support_scheme = dr_unaligned_supported;
>  *misalignment = DR_MISALIGNMENT_UNKNOWN;
>}
>  else
>{
>  *misalignment = dr_misalignment (first_dr_info, vectype, *poffset);
>  *alignment_support_scheme
>   = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype,
>*misalignment);
>}
> 
> (now that non-legacy gather/scatter is not exempt from the alignment check any
> more)?

I'm not sure?  I'd prefer some refactoring to make this more obvious
(and the split between the two functions doesn't help ...).

If you're sure it's all covered then ignore this comment, I can do
the refactoring as followup.  It just wasn't obvious to me.

Richard.


Re: [PATCH v4 1/6] c-family: add btf_type_tag and btf_decl_tag attributes

2025-07-02 Thread Richard Biener
On Tue, Jun 10, 2025 at 11:40 PM David Faust  wrote:
>
> Add two new c-family attributes, "btf_type_tag" and "btf_decl_tag"
> along with a simple shared handler for them.
>
> gcc/c-family/
> * c-attribs.cc (c_common_attribute_table): Add btf_decl_tag and
> btf_type_tag attributes.
> (handle_btf_tag_attribute): New handler for both new attributes.
> ---
>  gcc/c-family/c-attribs.cc | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index 5a0e3d328ba..cc1efaeaaec 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -189,6 +189,8 @@ static tree handle_fd_arg_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_flag_enum_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_null_terminated_string_arg_attribute (tree *, tree, tree, int, bool *);
>
> +static tree handle_btf_tag_attribute (tree *, tree, tree, int, bool *);
> +
>  /* Helper to define attribute exclusions.  */
>  #define ATTR_EXCL(name, function, type, variable)  \
>{ name, function, type, variable }
> @@ -640,7 +642,11 @@ const struct attribute_spec c_common_gnu_attributes[] =
>{ "flag_enum", 0, 0, false, true, false, false,
>   handle_flag_enum_attribute, NULL },
>{ "null_terminated_string_arg", 1, 1, false, true, true, false,
> - handle_null_terminated_string_arg_attribute, NULL}
> + handle_null_terminated_string_arg_attribute, NULL},
> +  { "btf_type_tag",  1, 1, false, true, false, false,
> + handle_btf_tag_attribute, NULL},
> +  { "btf_decl_tag",  1, 1, true, false, false, false,
> + handle_btf_tag_attribute, NULL}
>  };
>
>  const struct scoped_attribute_specs c_common_gnu_attribute_table =
> @@ -5101,6 +5107,23 @@ handle_null_terminated_string_arg_attribute (tree *node, tree name, tree args,
>return NULL_TREE;
>  }
>
> +/* Handle the "btf_decl_tag" and "btf_type_tag" attributes.  */
> +
> +static tree
> +handle_btf_tag_attribute (tree * ARG_UNUSED (node), tree name, tree args,
> + int ARG_UNUSED (flags), bool *no_add_attrs)
> +{
> +  if (!args)
> +*no_add_attrs = true;
> +  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
> +{
> +  error ("%qE attribute requires a string", name);
> +  *no_add_attrs = true;
> +}
> +

So with respect to the dwarf2out patch discussion, I think attribute handling
should be similar to how we handle the aligned attribute, which makes sure to
build a new type variant to apply the attribute to if not ATTR_FLAG_TYPE_IN_PLACE.

Richard.

> +  return NULL_TREE;
> +}
> +
>  /* Handle the "nonstring" variable attribute.  */
>
>  static tree
> --
> 2.47.2
>
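Richard's point about building a type variant can be illustrated with a toy model. This is plain C++, not GCC's tree API: `toy_type` and `apply_type_attribute` are hypothetical names, and the `type_in_place` flag only mirrors the role of ATTR_FLAG_TYPE_IN_PLACE. The sketch shows why attaching a tag directly to a shared type would leak it to every other user of that type.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy type node (not GCC's tree): several declarations can share one.
struct toy_type
{
  std::string name;
  std::vector<std::string> attrs;
};

// Attach ATTR to the type referenced by NODE.  Unless the caller allows
// in-place modification (cf. ATTR_FLAG_TYPE_IN_PLACE), build a variant
// copy first so other users of the shared type are unaffected.
void apply_type_attribute(std::shared_ptr<toy_type>& node,
                          const std::string& attr, bool type_in_place)
{
  if (!type_in_place)
    node = std::make_shared<toy_type>(*node);  // new variant
  node->attrs.push_back(attr);
}
```

With `type_in_place == false`, only the declaration whose node was replaced sees the tag; every other holder of the original `int` node is untouched.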


Re: [PATCH 1/2] doc: Clarify mode of else operand for vec_mask_load_lanesmn

2025-07-02 Thread Richard Biener
On Tue, 1 Jul 2025, Alex Coplan wrote:

> This extends the documentation of the vec_mask_load_lanes optab to
> explicitly state that the mode of the else operand is n, i.e. the mode
> of a single subvector.
> 
> OK to install?

OK.

> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   * doc/md.texi (Standard Names): Clarify mode of else operand for
>   vec_mask_load_lanesmn optab.
> ---
>  gcc/doc/md.texi | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] vect: Misalign checks for gather/scatter.

2025-07-02 Thread Robin Dapp

The else (get_group_load_store_type) can end up returning
VMAT_GATHER_SCATTER and thus require the above checking as well.


Isn't this already covered by

 if (*memory_access_type == VMAT_ELEMENTWISE
 || (*memory_access_type == VMAT_GATHER_SCATTER
  && GATHER_SCATTER_LEGACY_P (*gs_info))
 || *memory_access_type == VMAT_STRIDED_SLP
 || *memory_access_type == VMAT_INVARIANT)
   {
 *alignment_support_scheme = dr_unaligned_supported;
 *misalignment = DR_MISALIGNMENT_UNKNOWN;
   }
 else
   {
 *misalignment = dr_misalignment (first_dr_info, vectype, *poffset);
 *alignment_support_scheme
= vect_supportable_dr_alignment (vinfo, first_dr_info, vectype,
 *misalignment);
   }

(now that non-legacy gather/scatter is not exempt from the alignment check any more)?
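To make the branch structure above concrete, here is a simplified model of which access kinds reach the real misalignment computation. It is plain C++, not GCC code: `access_kind`, `needs_alignment_check` and the `contiguous` kind are hypothetical stand-ins for the VMAT_* values in the snippet.

```cpp
#include <cassert>

// Simplified stand-ins for the vectorizer's memory access kinds.
enum class access_kind
{
  elementwise, gather_scatter, strided_slp, invariant, contiguous
};

// Returns true when the access falls into the else branch (real
// misalignment computation), false when it is unconditionally treated
// as dr_unaligned_supported.
bool needs_alignment_check(access_kind kind, bool legacy_gather_scatter)
{
  if (kind == access_kind::elementwise
      || (kind == access_kind::gather_scatter && legacy_gather_scatter)
      || kind == access_kind::strided_slp
      || kind == access_kind::invariant)
    return false;
  return true;
}
```

Under this model a non-legacy VMAT_GATHER_SCATTER returned from get_group_load_store_type already goes through the checking branch.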


--
Regards
Robin



Re: [PATCH 2/2] aarch64: Drop const_int from aarch64_maskload_else_operand

2025-07-02 Thread Kyrylo Tkachov



> On 1 Jul 2025, at 18:37, Alex Coplan  wrote:
> 
> The "else operand" to maskload should always be a const_vector, never a
> const_int.
> 
> This was just an issue I noticed while looking through the code; I don't
> have a testcase which shows a concrete problem due to this.
> 
> Testing of that change alone showed ICEs with load lanes vectorization
> and SVE.  That turned out to be because the backend pattern was missing
> a mode for the else operand (causing the middle-end to choose a
> const_int during expansion), fixed thusly.  That in turn exposed an
> issue with the unpredicated load lanes expander which was using the
> wrong mode for the else operand, so fixed that too.
> 
> Bootstrapped/tested on aarch64-linux-gnu, OK for trunk?
> 

Ok.
Thanks,
Kyrill

> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve.md
> (vec_load_lanes): Expand else operand in
> subvector mode, as per optab documentation.
> (vec_mask_load_lanes): Add missing mode for
> operand 3.
> * config/aarch64/predicates.md (aarch64_maskload_else_operand):
> Remove const_int.
> ---
> gcc/config/aarch64/aarch64-sve.md | 4 ++--
> gcc/config/aarch64/predicates.md  | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> <0002-aarch64-Drop-const_int-from-aarch64_maskload_else_op.patch>



Re: [PATCH] [RISC-V] Fix shift type for RVV interleaved stepped patterns [PR120356]

2025-07-02 Thread Alexey Merzlyakov
On Tue, Jul 01, 2025 at 09:28:34AM +0200, Robin Dapp wrote:
> > It corrects the shift type of interleaved stepped patterns for const vector
> > expanding in LRA. The shift instruction was initially LSHIFTRT, and it seems
> > still should be the same type for both LRA and other cases.
> 
> This is OK, thanks.
> 
> -- 
> Regards
> Robin
> 

CI testing failed:
https://github.com/ewlu/gcc-precommit-ci/issues/3585#issuecomment-3022157670
for sat_u_add-5-u32.c and vect-reduc-sad-1.c. These failures are compile errors
caused by the afdo-crossmodule-1b.c file. For some reason, in both cases
the following snippets are inserted into the compile lines:
  /home/ewlu/.../testsuite/gcc.target/riscv/sat/afdo-crossmodule-1b.c -dumpbase ""
  for sat_u_add-5-u32.c
  /home/ewlu/.../testsuite/gcc.dg/vect/afdo-crossmodule-1b.c -dumpbase ""
  for vect-reduc-sad-1.c
... which causes the failures.

I've tried to reproduce it locally, but both test cases pass
in 100% of runs.

Could we re-trigger CI testing for the patch?

Best regards,
Merzlyakov Alexey


[PATCH v2] c++: Fix FMV return type ambiguation

2025-07-02 Thread Alfie Richards
Hi Jason,

Thanks for the feedback, see below an updated patch.

Again reg-tested on Aarch64 and x86.

Thanks,
Alfie

-- >8 --

Add logic for the case of two FMV-annotated functions whose signatures are
identical except for the return type.

Previously this case was ignored; this patch changes the behavior to emit a
diagnostic.

gcc/cp/ChangeLog:
PR c++/119498
* decl.cc (duplicate_decls): Change logic to not always exclude FMV
annotated functions in cases of return type non-ambiguation.

gcc/testsuite/ChangeLog:
PR c++/119498
* g++.target/aarch64/pr119498.C: New test.
---
 gcc/cp/decl.cc  |  6 --
 gcc/testsuite/g++.target/aarch64/pr119498.C | 19 +++
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/pr119498.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 83c8e283b56..be26bd39b22 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2014,8 +2014,10 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
}
  /* For function versions, params and types match, but they
 are not ambiguous.  */
- else if ((!DECL_FUNCTION_VERSIONED (newdecl)
-   && !DECL_FUNCTION_VERSIONED (olddecl))
+ else if (((!DECL_FUNCTION_VERSIONED (newdecl)
+&& !DECL_FUNCTION_VERSIONED (olddecl))
+   || !same_type_p (fndecl_declared_return_type (newdecl),
+fndecl_declared_return_type (olddecl)))
   /* Let constrained hidden friends coexist for now, we'll
  check satisfaction later.  */
   && !member_like_constrained_friend_p (newdecl)
diff --git a/gcc/testsuite/g++.target/aarch64/pr119498.C b/gcc/testsuite/g++.target/aarch64/pr119498.C
new file mode 100644
index 000..03f1659068d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/pr119498.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
+
+__attribute__ ((target_version ("default"))) int
+foo ();
+
+__attribute__ ((target_version ("default"))) int
+foo () { return 1; } /* { dg-message "old declaration" } */
+
+__attribute__ ((target_version ("dotprod"))) float
+foo () { return 3; } /* { dg-error "ambiguating new declaration" } */
+
+__attribute__ ((target_version ("sve"))) int
+foo2 () { return 1; } /* { dg-message "old declaration" } */
+
+__attribute__ ((target_version ("dotprod"))) float
+foo2 () { return 3; } /* { dg-error "ambiguating new declaration of" } */
-- 
2.34.1
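The effect of the changed condition in duplicate_decls can be sketched as a plain boolean function. This is a hypothetical model covering only the two terms shown in the hunk (it ignores the member_like_constrained_friend_p and other conjuncts). `treat_as_ambiguating` is an illustrative name, not a GCC function.

```cpp
#include <cassert>

// Returns true when a redeclaration with matching parameters goes down
// the "ambiguating new declaration" path instead of being accepted as a
// distinct function version.
bool treat_as_ambiguating(bool new_versioned, bool old_versioned,
                          bool same_return_type)
{
  // Old condition: only when neither declaration is versioned.
  bool neither_versioned = !new_versioned && !old_versioned;
  // New condition: also when the declared return types differ.
  return neither_versioned || !same_return_type;
}
```

Two versioned `foo` declarations returning `int` and `float` now take the diagnostic path, which is what the new pr119498.C test expects.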



[PATCH] Do not query further vector epilogues after a masked epilogue

2025-07-02 Thread Richard Biener
With --param vect-partial-vector-usage=1 we would keep asking
the target whether it wants more vector epilogues, and when it comes
back with a suggestion we might iterate endlessly.  Do not
ask the target at all once we have decided that the last epilogue
uses partial vectors.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This also affects 15.

* tree-vect-loop.cc (vect_analyze_loop): Stop querying
further epilogues after one with partial vectors.
---
 gcc/tree-vect-loop.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fd6e0f91214..9cee5195077 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3809,6 +3809,7 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
 suggests to have another one.  */
   masked_p = -1;
   if (!unlimited_cost_model (loop)
+ && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (orig_loop_vinfo)
  && (orig_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
  != VOIDmode))
{
-- 
2.43.0
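The termination problem can be seen in a toy model of the epilogue search. This is plain C++, not GCC code: `toy_target`, `count_epilogues` and the `max_iters` safety cap are hypothetical, and the model simply assumes each analyzed epilogue uses partial vectors and that the target always suggests another one.

```cpp
#include <cassert>

// Stand-in for a target whose suggested_epilogue_mode never returns
// VOIDmode, i.e. it always asks for another epilogue.
struct toy_target
{
  bool suggest_more() const { return true; }
};

// Counts how many epilogues get analyzed.  MAX_ITERS exists only so the
// unfixed variant terminates in this model.
int count_epilogues(const toy_target& target, bool with_fix, int max_iters)
{
  int epilogues = 0;
  bool last_used_partial_vectors = false;
  while (epilogues < max_iters)
    {
      // With the fix, the target is not even asked once the last
      // epilogue uses partial (masked) vectors.
      if (with_fix && last_used_partial_vectors)
        break;
      if (!target.suggest_more())
        break;
      ++epilogues;
      last_used_partial_vectors = true;  // model assumption
    }
  return epilogues;
}
```

Without the check the loop only stops at the artificial cap; with it, the search ends after the first masked epilogue.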


[PATCH v8 2/9] AArch64: reformat branch instruction rules

2025-07-02 Thread Karl Meakin
Make the formatting of the RTL templates in the rules for branch
instructions more consistent with each other.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Reformat.
(cbranchcc4): Likewise.
(condjump): Likewise.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 84 +--
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fcc24e300e6..25286add0c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -714,14 +714,14 @@ (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
-  (label_ref (match_operand 3 "" ""))
+  (label_ref (match_operand 3))
   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+  operands[2]);
+operands[2] = const0_rtx;
+  }
 )
 
 (define_expand "cbranch4"
@@ -729,30 +729,31 @@ (define_expand "cbranch4"
(match_operator 0 "aarch64_comparison_operator"
 [(match_operand:GPF_F16 1 "register_operand")
  (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
-   (label_ref (match_operand 3 "" ""))
+   (label_ref (match_operand 3))
(pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+  operands[2]);
+operands[2] = const0_rtx;
+  }
 )
 
 (define_expand "cbranchcc4"
-  [(set (pc) (if_then_else
- (match_operator 0 "aarch64_comparison_operator"
-  [(match_operand 1 "cc_register")
-   (match_operand 2 "const0_operand")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register")
+(match_operand 2 "const0_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "")
+  ""
+)
 
 (define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-   [(match_operand 1 "cc_register" "") (const_int 0)])
-  (label_ref (match_operand 2 "" ""))
+   [(match_operand 1 "cc_register")
+(const_int 0)])
+  (label_ref (match_operand 2))
   (pc)))]
   ""
   {
@@ -789,10 +790,9 @@ (define_insn "condjump"
 ;; subsx0, x0, #(CST & 0x000fff)
 ;; b .Label
 (define_insn_and_split "*compare_condjump"
-  [(set (pc) (if_then_else (EQL
- (match_operand:GPI 0 "register_operand" "r")
- (match_operand:GPI 1 "aarch64_imm24" "n"))
-  (label_ref:P (match_operand 2 "" ""))
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2))
   (pc)))]
   "!aarch64_move_imm (INTVAL (operands[1]), mode)
&& !aarch64_plus_operand (operands[1], mode)
@@ -816,8 +816,8 @@ (define_insn_and_split "*compare_condjump"
 
 (define_insn "aarch64_cb1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
-   (const_int 0))
-  (label_ref (match_operand 1 "" ""))
+   (const_int 0))
+  (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
@@ -841,8 +841,8 @@ (define_insn "aarch64_cb1"
 
 (define_insn "*cb1"
   [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
-(const_int 0))
-  (label_ref (match_operand 1 "" ""))
+(const_int 0))
+  (label_ref (match_operand 1))
   (pc)))
(clobber (reg:CC CC_REGNUM))]
   "!aarch64_track_speculation

[pushed] c++: uninitialized TARGET_EXPR and constexpr [PR120684]

2025-07-02 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In r15-7532 for PR118856 I introduced a TARGET_EXPR with a
TARGET_EXPR_INITIAL of void_node to express that no initialization is done.
And indeed evaluating that doesn't store a value for the TARGET_EXPR_SLOT
variable.

But then at the end of the full-expression, destroy_value stores void_node
to express that its lifetime has ended.  If we evaluate the same
full-expression again, global_ctx->values still holds the void_node, causing
confusion when we try to destroy it again.  So clear out any value before
evaluating a TARGET_EXPR_INITIAL of void_type.

PR c++/120684
PR c++/118856

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression) [TARGET_EXPR]: Clear
the value first if is_complex.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/range-for10.C: New test.
---
 gcc/cp/constexpr.cc  | 10 --
 gcc/testsuite/g++.dg/cpp23/range-for10.C | 23 +++
 2 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/range-for10.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 704d936f2ec..f9066bc7932 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8114,14 +8114,20 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
ctx->global->put_value (new_ctx.object, new_ctx.ctor);
ctx = &new_ctx;
  }
+
+   /* If the initializer is complex, evaluate it to initialize slot.  */
+   bool is_complex = target_expr_needs_replace (t);
+   if (is_complex)
+ /* In case no initialization actually happens, clear out any
+void_node from a previous evaluation.  */
+ ctx->global->put_value (slot, NULL_TREE);
+
/* Pass vc_prvalue because this indicates
   initialization of a temporary.  */
r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), vc_prvalue,
  non_constant_p, overflow_p);
if (*non_constant_p)
  break;
-   /* If the initializer is complex, evaluate it to initialize slot.  */
-   bool is_complex = target_expr_needs_replace (t);
if (!is_complex)
  {
r = unshare_constructor (r);
diff --git a/gcc/testsuite/g++.dg/cpp23/range-for10.C b/gcc/testsuite/g++.dg/cpp23/range-for10.C
new file mode 100644
index 000..96eab006c32
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/range-for10.C
@@ -0,0 +1,23 @@
+// PR c++/120684
+// { dg-do compile { target c++20 } }
+
+struct basic_string {
+  constexpr ~basic_string() {}
+};
+template  struct lazy_split_view {
+  _Vp _M_base;
+  constexpr int* begin() { return nullptr; }
+  constexpr int* end() { return nullptr; }
+};
+constexpr void test_with_piping() {
+  basic_string input;
+  for (auto e : lazy_split_view(input))
+;
+}
+constexpr bool main_test() {
+  test_with_piping();
+  test_with_piping();
+  return true;
+}
+//int main() { main_test(); }
+static_assert(main_test());

base-commit: dc2797bb44333d5588c14d51c918df51c664d46c
-- 
2.49.0
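The stale-value problem described in the commit message can be reproduced in a toy model. This is plain C++, not the constexpr evaluator: `toy_evaluator` and `slot_state` are hypothetical, `dead` plays the role of the void_node stored by destroy_value, and `values.erase` stands in for `put_value (slot, NULL_TREE)`.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class slot_state { live, dead };

struct toy_evaluator
{
  std::map<std::string, slot_state> values;
  bool confused = false;

  // Evaluate a full-expression containing a temporary whose initializer
  // performs no initialization (TARGET_EXPR_INITIAL is void_node).
  void eval_full_expr(const std::string& slot, bool clear_first)
  {
    if (clear_first)
      values.erase(slot);  // the fix: drop any value from a previous run
    // No initialization happens, so nothing is stored for SLOT here.
    destroy(slot);         // end of the full-expression
  }

  void destroy(const std::string& slot)
  {
    auto it = values.find(slot);
    if (it != values.end() && it->second == slot_state::dead)
      confused = true;     // "destroying" an already-dead object
    values[slot] = slot_state::dead;
  }
};
```

Evaluating the same full-expression twice, as test_with_piping does in the new test, trips the double-destroy only when the slot is not cleared first.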



[PATCH] x86-64: Add RDI clobber to tls_local_dynamic_64 patterns

2025-07-02 Thread H.J. Lu
*tls_local_dynamic_64_ uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_local_dynamic_64 patterns to show it.

PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_local_dynamic_64.
* config/i386/i386.md (*tls_local_dynamic_64_): Add RDI
clobber and use it to generate LEA.
(@tls_local_dynamic_64_): Add a clobber.

OK for master?

-- 
H.J.
From fcd3aedec394b514855a7a408fd20d394f39bbeb Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 3 Jul 2025 10:54:39 +0800
Subject: [PATCH] x86-64: Add RDI clobber to tls_local_dynamic_64 patterns

*tls_local_dynamic_64_ uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_local_dynamic_64 patterns to show it.

	PR target/120908
	* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
	gen_tls_local_dynamic_64.
	* config/i386/i386.md (*tls_local_dynamic_64_): Add RDI
	clobber and use it to generate LEA.
	(@tls_local_dynamic_64_): Add a clobber.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 3 ++-
 gcc/config/i386/i386.md | 8 +---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9657c6ae31f..24aedc136a6 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -12616,12 +12616,13 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  if (TARGET_64BIT)
 	{
 	  rtx rax = gen_rtx_REG (Pmode, AX_REG);
+	  rtx rdi = gen_rtx_REG (Pmode, DI_REG);
 	  rtx_insn *insns;
 	  rtx eqv;
 
 	  start_sequence ();
 	  emit_call_insn
-		(gen_tls_local_dynamic_base_64 (Pmode, rax, caddr));
+		(gen_tls_local_dynamic_base_64 (Pmode, rax, caddr, rdi));
 	  insns = end_sequence ();
 
 	  /* Attach a unique REG_EQUAL, to allow the RTL optimizers to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 370e79bb511..07d9a4cb653 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23318,11 +23318,12 @@ (define_insn "*tls_local_dynamic_base_64_"
 	(call:P
 	 (mem:QI (match_operand 1 "constant_call_address_operand" "Bz"))
 	 (match_operand 2)))
-   (unspec:P [(reg:P SP_REG)] UNSPEC_TLS_LD_BASE)]
+   (unspec:P [(reg:P SP_REG)] UNSPEC_TLS_LD_BASE)
+   (clobber (match_operand:P 3 "register_operand" "=D"))]
   "TARGET_64BIT"
 {
   output_asm_insn
-("lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}", operands);
+("lea{q}\t{%&@tlsld(%%rip), %q3|%q3, %&@tlsld[rip]}", operands);
   if (TARGET_SUN_TLS)
 return "call\t%p1@plt";
   if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
@@ -23359,7 +23360,8 @@ (define_expand "@tls_local_dynamic_base_64_"
 	   (call:P
 	(mem:QI (match_operand 1))
 	(const_int 0)))
-  (unspec:P [(reg:P SP_REG)] UNSPEC_TLS_LD_BASE)])]
+  (unspec:P [(reg:P SP_REG)] UNSPEC_TLS_LD_BASE)
+  (clobber (match_operand:P 2 "register_operand"))])]
   "TARGET_64BIT"
   "ix86_tls_descriptor_calls_expanded_in_cfun = true;")
 
-- 
2.50.0



[PATCH v4] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

2025-07-02 Thread jeevitha
Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

Changes from V3:
* Replaced named operands with positional operands in inline assembly for
better readability.
* Used _ADDR[0] and _ADDR[1] to make the memory reads explicit to
the compiler.
* Cleaned up formatting to improve code clarity.

Changes from V2:
Replaced eight consecutive spaces with tabs in amo6.c and amo7.c.

Changes from V1:
Corrected the ISA version in the test cases.

Changes to amo.h include the addition of the following load atomic operations:
Compare and Swap Not Equal, Fetch and Increment Bounded, Fetch and Increment
Equal, and Fetch and Decrement Bounded. Additionally, Store Twin is added for
store atomic operations.
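For readers unfamiliar with the bounded operations, here is a non-atomic reference model of Fetch and Increment Bounded. It is written from our reading of the Power ISA 3.0 description and should be checked against the ISA before relying on it. `model_inc_bounded` and `amo_bound_reached` are hypothetical names; the memory layout matches the `_ADDR[0]` ("+Q") and `_ADDR[1]` ("Q") operands of the _AMO_LD_INCREMENT macro in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed return value when the bound is reached: only the sign bit set
// (word-sized form).
constexpr std::uint32_t amo_bound_reached = 0x80000000u;

// addr[0] is the counter that may be updated, addr[1] is the bound.
// Not atomic: real code must use lwat (e.g. amo_lwat_inc_bounded).
std::uint32_t model_inc_bounded(std::uint32_t* addr)
{
  std::uint32_t old = addr[0];
  if (old != addr[1])
    {
      addr[0] = old + 1;       // below the bound: increment
      return old;              // and return the previous value
    }
  return amo_bound_reached;    // counter == bound: memory unchanged
}
```

Fetch and Decrement Bounded is the mirror image, which is why _AMO_LD_DECREMENT updates `_ADDR[1]` and reads `_ADDR[0]` as the bound.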

2025-06-30  Peter Bergner  
Jeevitha Palanisamy  

gcc/
* config/rs6000/amo.h: Add missing atomic memory operations.
* doc/extend.texi (PowerPC Atomic Memory Operation Functions):
Document new functions.

gcc/testsuite/
* gcc.target/powerpc/amo3.c: New test.
* gcc.target/powerpc/amo4.c: Likewise.
* gcc.target/powerpc/amo5.c: Likewise.
* gcc.target/powerpc/amo6.c: Likewise.
* gcc.target/powerpc/amo7.c: Likewise.

diff --git a/gcc/config/rs6000/amo.h b/gcc/config/rs6000/amo.h
index 25ab1c7b4c4..d6743ec8a8a 100644
--- a/gcc/config/rs6000/amo.h
+++ b/gcc/config/rs6000/amo.h
@@ -71,6 +71,51 @@ NAME (TYPE *_PTR, TYPE _VALUE) \
   return _RET; \
 }
 
+/* Implementation of the LWAT/LDAT operations that take two input registers
+   and modify one word or double-word of memory and return the value that was
+   previously in the memory location.  The destination and two source
+   registers are encoded with only one register number, so we need three
+   consecutive GPR registers and there is no C/C++ type that will give
+   us that, so we have to use register asm variables to achieve that.
+
+   The LWAT/LDAT opcode requires the address to be a single register,
+   and that points to a suitably aligned memory location.  */
+
+#define _AMO_LD_CMPSWP(NAME, TYPE, OPCODE, FC) \
+static __inline__ TYPE \
+NAME (TYPE *_ADDR, TYPE _COND, TYPE _VALUE)\
+{  \
+  register TYPE _ret asm ("r8");   \
+  register TYPE _cond asm ("r9") = _COND;  \
+  register TYPE _value asm ("r10") = _VALUE;   \
+  __asm__ volatile (OPCODE " %0,%P1,%4\n"  \
+   : "=r" (_ret), "+Q" (*_ADDR)\
+   : "r" (_cond), "r" (_value), "n" (FC)); \
+  return _ret; \
+}
+
+#define _AMO_LD_INCREMENT(NAME, TYPE, OPCODE, FC)  \
+static __inline__ TYPE \
+NAME (TYPE *_ADDR) \
+{  \
+  TYPE _RET;   \
+  __asm__ volatile (OPCODE " %0,%P1,%3\n"  \
+   : "=r" (_RET), "+Q" (_ADDR[0])  \
+   : "Q" (_ADDR[1]), "n" (FC));\
+  return _RET; \
+}
+
+#define _AMO_LD_DECREMENT(NAME, TYPE, OPCODE, FC)  \
+static __inline__ TYPE \
+NAME (TYPE *_ADDR) \
+{  \
+  TYPE _RET;   \
+  __asm__ volatile (OPCODE " %0,%P1,%3\n"  \
+   : "=r" (_RET), "+Q" (_ADDR[1])  \
+   : "Q" (_ADDR[0]), "n" (FC));\
+  return _RET; \
+}
+
 _AMO_LD_SIMPLE (amo_lwat_add,   uint32_t, "lwat", _AMO_LD_ADD)
 _AMO_LD_SIMPLE (amo_lwat_xor,   uint32_t, "lwat", _AMO_LD_XOR)
 _AMO_LD_SIMPLE (amo_lwat_ior,   uint32_t, "lwat", _AMO_LD_IOR)
@@ -78,11 +123,19 @@ _AMO_LD_SIMPLE (amo_lwat_and,   uint32_t, "lwat", _AMO_LD_AND)
 _AMO_LD_SIMPLE (amo_lwat_umax,  uint32_t, "lwat", _AMO_LD_UMAX)
 _AMO_LD_SIMPLE (amo_lwat_umin,  uint32_t, "lwat", _AMO_LD_UMIN)
 _AMO_LD_SIMPLE (amo_lwat_swap,  uint32_t, "lwat", _AMO_LD_SWAP)
+_AMO_LD_CMPSWP(amo_lwat_cas_neq, uint32_t, "lwat", _AMO_LD_CS_NE)
+_AMO_LD_INCREMENT (amo_lwat_inc_eq,  uint32_t, "lwat", _AMO_LD_INC_EQUAL)
+_AMO_LD_INCREMENT (amo_lwat_inc_bounded, uint3

[PATCH v3] RISC-V: Mips P8700 Conditional Move Support.

2025-07-02 Thread Umesh Kalappa
Indentation has been updated accordingly, and no regressions were found.

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_CORE): Update the supported march.
	* config/riscv/riscv-ext-mips.def (DEFINE_RISCV_EXT): New file for the
	MIPS conditional move extension.
	* config/riscv/riscv-ext.def: Likewise.
	* config/riscv/t-riscv: Generate riscv-ext.opt.
	* config/riscv/riscv-ext.opt: Generated file.
	* config/riscv/riscv.cc (riscv_expand_conditional_move): Update for MIPS
	cmov and outline some code that handles arch conditional move.
	* config/riscv/riscv.md (movcc): Update expander for MIPS CCMOV.
	* config/riscv/mips-insn.md: New file for mips-p8700 ccmov insn.
	* doc/riscv-ext.texi: Update for MIPS cmov.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/mipscondmov.c: Test file for mips.ccmov insn.
---
 gcc/config/riscv/mips-insn.md|  36 +++
 gcc/config/riscv/riscv-cores.def |   3 +-
 gcc/config/riscv/riscv-ext-mips.def  |  35 ++
 gcc/config/riscv/riscv-ext.def   |   1 +
 gcc/config/riscv/riscv-ext.opt   |   4 +
 gcc/config/riscv/riscv.cc| 107 +--
 gcc/config/riscv/riscv.md|   3 +-
 gcc/config/riscv/t-riscv |   3 +-
 gcc/doc/riscv-ext.texi   |   4 +
 gcc/testsuite/gcc.target/riscv/mipscondmov.c |  30 ++
 10 files changed, 189 insertions(+), 37 deletions(-)
 create mode 100644 gcc/config/riscv/mips-insn.md
 create mode 100644 gcc/config/riscv/riscv-ext-mips.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/mipscondmov.c

diff --git a/gcc/config/riscv/mips-insn.md b/gcc/config/riscv/mips-insn.md
new file mode 100644
index 000..de53638d587
--- /dev/null
+++ b/gcc/config/riscv/mips-insn.md
@@ -0,0 +1,36 @@
+;; Machine description for MIPS custom instructions.
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_insn "*movcc_bitmanip"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (if_then_else:GPR
+ (any_eq:X (match_operand:X 1 "register_operand" "r")
+(match_operand:X 2 "const_0_operand" "J"))
+(match_operand:GPR 3 "reg_or_0_operand" "rJ")
+(match_operand:GPR 4 "reg_or_0_operand" "rJ")))]
+  "TARGET_XMIPSCMOV"
+{
+  enum rtx_code code = ;
+  if (code == NE)
+return "mips.ccmov\t%0,%1,%z3,%z4";
+  else
+return "mips.ccmov\t%0,%1,%z4,%z3";
+}
+[(set_attr "type" "condmove")
+ (set_attr "mode" "")])
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2096c0095d4..98f347034fb 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -169,7 +169,6 @@ RISCV_CORE("xiangshan-kunminghu",   "rv64imafdcbvh_sdtrig_sha_shcounterenw_"
  "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
  "xiangshan-kunminghu")
 
-RISCV_CORE("mips-p8700",   "rv64imafd_zicsr_zmmul_"
- "zaamo_zalrsc_zba_zbb",
+RISCV_CORE("mips-p8700",  "rv64imfd_zicsr_zifencei_zalrsc_zba_zbb",
  "mips-p8700")
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-ext-mips.def b/gcc/config/riscv/riscv-ext-mips.def
new file mode 100644
index 000..f24507139f6
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-mips.def
@@ -0,0 +1,35 @@
+/* MIPS extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+Please run `make riscv-regen` in build folder to make sure updated anything.
+
+Format of DEFINE_RISCV_EXT, please refer to riscv-ext.def.  */
+
+DEFINE

Re: [PATCH v6 1/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-*warning messages due to code movements from compiler transformation (Part 1) [PR109071, PR85788, PR88771, PR106762,

2025-07-02 Thread Richard Biener
On Tue, Jul 1, 2025 at 5:17 PM Qing Zhao  wrote:
>
>
>
> > On Jul 1, 2025, at 03:14, Richard Biener  wrote:
> >
> > On Mon, Jun 30, 2025 at 10:37 PM Qing Zhao  wrote:
> >>
> >> Hi, David,
> >>
> >> Thank you for the info.
> >>
> >> Yes, this does sound like a general issue in this area.
> >>
> >> Is there any mechanism in GCC currently that records such original source 
> >> code information for IRs
> >> after the compiler transformation?
> >>
> >> If not, shall we add such information in the IR for this purpose? Is doing 
> >> this very expensive?
> >>
> >>> On Jun 30, 2025, at 12:23, David Malcolm  wrote:
> >>>
> >>> On Mon, 2025-06-30 at 16:47 +, Qing Zhao wrote:
> >>>
> >>> [...snip...]
> >>>
>  The output with -fdiagnostics-show-context=1 is:
> 
>  /home/opc/Work/GCC/latest-gcc-
>  write/gcc/testsuite/gcc.dg/pr109071_7.c: In function ‘foo’:
>  /home/opc/Work/GCC/latest-gcc-
>  write/gcc/testsuite/gcc.dg/pr109071_7.c:12:6: warning: array
>  subscript -1 is below array bounds of ‘int[10]’ [-Warray-bounds=]
>    12 | a[i] = -1; /* { dg-warning "is below array bounds of" }
>  */
>   | ~^~~
>   ‘foo’: events 1-2
>    11 |   if (i == -1)
>   |  ^
>   |  |
>   |  (1) when the condition is evaluated to true
> >>>
> >>> Looks great, but one caution: presumably "true" in this context refers
> >>> to the state of the IR when the warning is emitted, rather than what
> >>> the user wrote.  I've run into this in the analyzer;
> >>> see PR analyzer/100116 (which I don't have a good solution for, alas).
> >>>
> >>> Is there a way to get at the original sense of the condition, in terms
> >>> of what the user wrote?  I'm guessing that this has been canonicalized
> >>> away long before the middle-end warnings see this.
> >>
> >> Once such information is lost during transformation, there is no way
> >> to get the original source code, I guess.
> >
> > We could try to track condition inversions done with a flag on the gcond.
> > For example we canonicalize
> >
> >  if (bool_var == 0)
> >
> > to
> >
> > if (bool_var != 0)
> >
> > with the true/false sense of the edges swapped.
> One flag is enough for such a transformation.
> >
> > Instead of "when the condition is evaluated to true" we could also
> > print "when execution continues to" and then point to the first
> > stmt location we can find on the path executed.
>
> This is doable, but I guess the information might be confusing to users too.
> >
> > Note we'll also turn
> >
> >   if (a)
> > if (b)
> >   {
> >   ...
> >
> > into
> >
> >  tem = a && b;
> >  if (tem)
> >{
> > ...
> >
> > where the location of the if retained is usually the outer one and the
> > split out part might or might not have sensible locations.  That's also
> > an issue for debugging of course.
>
> However, for such cases, we might need a new data structure to record
> the original condition statements and attach it to the gcond in order to
> get the original source code information.
>
> Each transformation that might change the condition would need to be
> investigated and made to maintain such a new data structure.  This might
> be quite tedious and error-prone.
> >
> > Caret locations and thus "paths" will always have this kind of issues.
> > Another variant might be to collect a predicate expression up to
> > user variables and print "when `a && b' evaluates to true/false"
> > for the above case instead.
>
> Yes, this is a good workaround too.
>
> And there are likely other transformations that might change the conditions;
> in addition to the existing compiler transformations, new transformations
> that change the conditions might be added later too.
>
> >
> > I think we need to amend the documentation a bit about these
> > kind of pitfalls.
>
> Yes. This is necessary.
>
> > And fixup locations we put on stmts eventually.
>
> Yes, looks like a big project…

As they say, the perfect is the enemy of the good.

Richard.

>
> Qing
>
>
> >
> > Richard.
> >
> >> Not sure whether the LOCATION_T can include such information?
> >>>
> >>> Or perhaps the message for (1) could say exactly what condition it
> >>> considers to be true e.g.
> >>> (1) when the condition "i == -1" is evaluated to true
> >>> or somesuch, which might give the user a better clue if the sense of
> >>> the conditional has been reversed during canonicalization/optimization.
> >>
> >> This looks like a reasonable workaround based on the current IR.
> >>>
> >>> Caveat: I didn't reread the latest version of your patch, but am just
> >>> assuming it's looking at the cfg edge flags when the warning is
> >>> emitted; sorry if I'm making a false assumption here.
> >>
> >> Yes, that’s right.  You made good assumptions here -:).
> >>
> >> thanks.
> >>
> >> Qing
> >>>
> 
> >>> Dave
>
>
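Richard's idea of recording inversions with a flag, combined with David's suggestion to print the condition itself, can be sketched as a toy model. This is plain C++: `toy_cond`, its `inverted` field, `canonicalize` and `describe` are hypothetical; no such flag exists on gcond today.

```cpp
#include <cassert>
#include <string>

struct toy_cond
{
  std::string lhs;
  std::string op;         // "==" or "!=" as currently in the IR
  std::string rhs;
  bool inverted = false;  // canonicalization flipped the user's sense
};

// Canonicalize "x == 0" into "x != 0" (edges swapped) and remember it.
void canonicalize(toy_cond& c)
{
  if (c.op == "==" && c.rhs == "0")
    {
      c.op = "!=";
      c.inverted = !c.inverted;
    }
}

// Describe the path event in terms of the user's original condition;
// IR_EDGE_TRUE is the edge sense as the IR sees it.
std::string describe(const toy_cond& c, bool ir_edge_true)
{
  bool user_sense = c.inverted ? !ir_edge_true : ir_edge_true;
  std::string op = c.op;
  if (c.inverted)
    op = (op == "!=") ? "==" : "!=";
  return "when the condition \"" + c.lhs + " " + op + " " + c.rhs
         + "\" evaluates to " + (user_sense ? "true" : "false");
}
```

For `if (bool_var == 0)`, the false edge of the canonicalized `bool_var != 0` is reported as the user's condition evaluating to true.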


[PATCH] libstdc++: Format chrono %a/%A/%b/%B/%p using locale's time_put [PR117214]

2025-07-02 Thread XU Kailiang


The locale used for C++ formatting could have a custom time_put facet that
behaves differently from the C locale, so do not use __timepunct directly.
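A locale can carry a time_put facet whose output differs from the classic one. The standard-only sketch below shows this with `std::put_time`, which formats through the stream locale's time_put facet. `tagging_time_put` and `format_weekday` are hypothetical names; the sketch does not exercise std::format itself, whose %a/%A/%b/%B/%p handling is what this patch changes.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: replaces the weekday name produced for %a/%A.
struct tagging_time_put : std::time_put<char>
{
  iter_type do_put(iter_type out, std::ios_base& io, char_type fill,
                   const std::tm* t, char format, char modifier) const override
  {
    if (format == 'a' || format == 'A')
      {
        for (const char* s = "[weekday]"; *s; ++s)
          *out++ = *s;
        return out;
      }
    return std::time_put<char>::do_put(out, io, fill, t, format, modifier);
  }
};

// Format the weekday of a Monday using whatever time_put LOC provides.
std::string format_weekday(const std::locale& loc)
{
  std::tm tm{};
  tm.tm_wday = 1;  // Monday
  std::ostringstream os;
  os.imbue(loc);
  os << std::put_time(&tm, "%a");
  return os.str();
}
```

Reading names straight from __timepunct bypasses such a facet, while going through the locale's time_put honors it.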

libstdc++-v3/ChangeLog:

PR libstdc++/117214
* include/bits/chrono_io.h (__formatter_chrono::_M_a_A,
__formatter_chrono::_M_b_B, __formatter_chrono::_M_p): Use
_M_locale_fmt to format %a/%A/%b/%B/%p.
* testsuite/std/time/format/pr117214_custom_timeput.cc: New
test.

Signed-off-by: XU Kailiang 
---
 libstdc++-v3/include/bits/chrono_io.h | 31 ++--
 .../time/format/pr117214_custom_timeput.cc| 36 +++
 2 files changed, 47 insertions(+), 20 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index abbf4efcc3b..8358105c26b 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -905,14 +905,10 @@ namespace __format
}
 
  locale __loc = _M_locale(__ctx);
- const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
- const _CharT* __days[7];
- if (__full)
-   __tp._M_days(__days);
- else
-   __tp._M_days_abbreviated(__days);
- __string_view __str(__days[__wd.c_encoding()]);
- return _M_write(std::move(__out), __loc, __str);
+ struct tm __tm{};
+ __tm.tm_wday = __wd.c_encoding();
+ return _M_locale_fmt(std::move(__out), __loc, __tm,
+  __full ? 'A' : 'a', 0);
}
 
   template
@@ -936,14 +932,10 @@ namespace __format
}
 
  locale __loc = _M_locale(__ctx);
- const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
- const _CharT* __months[12];
- if (__full)
-   __tp._M_months(__months);
- else
-   __tp._M_months_abbreviated(__months);
- __string_view __str(__months[(unsigned)__m - 1]);
- return _M_write(std::move(__out), __loc, __str);
+ struct tm __tm{};
+ __tm.tm_mon = (unsigned)__m - 1;
+ return _M_locale_fmt(std::move(__out), __loc, __tm,
+  __full ? 'B' : 'b', 0);
}
 
   template
@@ -1329,10 +1321,9 @@ namespace __format
__hi %= 24;
 
  locale __loc = _M_locale(__ctx);
- const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
- const _CharT* __ampm[2];
- __tp._M_am_pm(__ampm);
- return _M_write(std::move(__out), __loc, __ampm[__hi >= 12]);
+ struct tm __tm{};
+ __tm.tm_hour = __hi;
+ return _M_locale_fmt(std::move(__out), __loc, __tm, 'p', 0);
}
 
   template
diff --git a/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
new file mode 100644
index 000..8c9f3d29bc6
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/format/pr117214_custom_timeput.cc
@@ -0,0 +1,36 @@
+// { dg-do run { target c++20 } }
+
+#include 
+#include 
+#include 
+#include 
+
+struct custom_time_put : std::time_put<char>
+{
+  iter_type
+  do_put(iter_type out, std::ios_base& io, char_type fill, const tm* t,
+         char format, char modifier) const override
+  {
+    using Base = std::time_put<char>;
+
+    switch (format) {
+      case 'a': case 'A': case 'b': case 'B': case 'p':
+        *out++ = '[';
+        *out++ = format;
+        *out++ = ']';
+    }
+    return Base::do_put(out, io, fill, t, format, modifier);
+  }
+};
+
+int main()
+{
+  using namespace std::chrono;
+  std::locale loc(std::locale::classic(), new custom_time_put);
+#define test(t, fmt, exp) VERIFY( std::format(loc, fmt, t) == exp )
+  test(Monday,  "{:L%a}", "[a]Mon");
+  test(Monday,  "{:L%A}", "[A]Monday");
+  test(January, "{:L%b}", "[b]Jan");
+  test(January, "{:L%B}", "[B]January");
+  test(1h,  "{:L%p}", "[p]AM");
+}
-- 
2.50.0



[PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-02 Thread Xi Ruoyao
The register_operand predicate can match a subreg, in which case we would
generate a subreg of a subreg, which is invalid.  Use lowpart_subreg to
avoid the nested subreg.
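A toy C model can show what goes wrong. This is not GCC's RTL API. Operand 1 may already be a subreg, for instance a paradoxical (subreg:DI (reg:SI ...) 0), which is an assumed example. Wrapping it in (subreg:SI ... 0) would then nest two subregs. The hypothetical toy_lowpart below looks through an existing subreg instead, which is the behaviour the patch relies on lowpart_subreg for. The model assumes a little-endian target, where the lowpart sits at byte 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for RTL (hypothetical, not GCC's rtx): either a register,
   or a subreg viewing BYTE bytes into INNER with a mode of SIZE bytes.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  unsigned size;                /* mode size in bytes */
  unsigned regno;               /* valid for TOY_REG */
  const struct toy_rtx *inner;  /* valid for TOY_SUBREG */
  unsigned byte;                /* valid for TOY_SUBREG */
};

/* Little-endian lowpart of X in a mode of OUTER_SIZE bytes.  An existing
   subreg is looked through, so the result is never a subreg of a subreg.  */
static struct toy_rtx
toy_lowpart (unsigned outer_size, const struct toy_rtx *x)
{
  unsigned byte = 0;
  if (x->code == TOY_SUBREG)
    {
      byte = x->byte;   /* the lowpart adds offset 0 on little-endian */
      x = x->inner;     /* strip the outer subreg */
    }
  if (byte == 0 && x->size == outer_size)
    return *x;          /* same size at offset 0: just the register */
  struct toy_rtx r = { TOY_SUBREG, outer_size, 0, x, byte };
  return r;
}
```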

gcc/ChangeLog:

* config/loongarch/loongarch.md (crc_combine): Avoid nested
subreg.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr120708.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk and
releases/gcc-15?

 gcc/config/loongarch/loongarch.md |  3 ++-
 .../gcc.c-torture/compile/pr120708.c  | 20 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr120708.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index a13398fdff4..8cf2ac90c64 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4603,9 +4603,10 @@ (define_insn_and_split "*crc_combine"
   "&& true"
   [(set (match_dup 3) (match_dup 2))
(set (match_dup 0)
-   (unspec:SI [(match_dup 3) (subreg:SI (match_dup 1) 0)] CRC))]
+   (unspec:SI [(match_dup 3) (match_dup 1)] CRC))]
   {
 operands[3] = gen_reg_rtx (mode);
+operands[1] = lowpart_subreg (SImode, operands[1], DImode);
   })
 
 ;; With normal or medium code models, if the only use of a pc-relative
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr120708.c b/gcc/testsuite/gcc.c-torture/compile/pr120708.c
new file mode 100644
index 000..9b37e608d7f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr120708.c
@@ -0,0 +1,20 @@
+typedef __UINT8_TYPE__ uint8_t;
+typedef __UINT32_TYPE__ uint32_t;
+
+typedef struct
+{
+  uint32_t dword[2];
+  uint8_t byte[8];
+} reg64_t;
+reg64_t TestF20F_opgd, TestF20F_oped;
+
+void
+TestF20F ()
+{
+  TestF20F_opgd.dword[0] ^= TestF20F_oped.byte[0];
+  for (int i = 0; i < 8; i++)
+if (TestF20F_opgd.dword[0] & 1)
+  TestF20F_opgd.dword[0] = TestF20F_opgd.dword[0] >> 1 ^ (uint32_t)2197175160UL;
+else
+  TestF20F_opgd.dword[0] = TestF20F_opgd.dword[0] >> 1;
+}
-- 
2.50.0



Re: [PATCH v4 2/6] dwarf: create annotation DIEs for btf tags

2025-07-02 Thread Richard Biener
On Tue, Jul 1, 2025 at 11:20 PM David Faust  wrote:
>
>
>
> On 7/1/25 01:02, Richard Biener wrote:
> > On Mon, Jun 30, 2025 at 9:12 PM David Faust  wrote:
> >>
> >>
> >>
> >> On 6/30/25 06:11, Richard Biener wrote:
>  +static void
>  +gen_btf_decl_tag_dies (tree t, dw_die_ref target, dw_die_ref context_die)
>  +{
>  +  if (t == NULL_TREE || !DECL_P (t) || !target)
>  +return;
>  +
>  +  tree attr = lookup_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
>  +  if (attr == NULL_TREE)
>  +return;
>  +
>  +  gen_btf_tag_dies (attr, target, context_die);
>  +
>  +  /* Strip the decl tag attribute once we have created the annotation DIEs
>  + to avoid attempting process it multiple times.  Global variable
>  + declarations may reach this function more than once.  */
>  +  DECL_ATTRIBUTES (t)
>  += remove_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
> >>> I do not like modifying trees as part of dwarf2out.  You should be able to
> >>> see whether a DIE already has the respective attribute applied?
> >>
> >> Yes, you're right. For decl_tag the case is simple and better handled by
> >> consulting the hash table. Simple fix and this remove_attribute can be
> >> deleted.
> >>
> >> Understood re: modifying trees in dwarf2out. I agree it's not ideal.
> >>
> >> For this case the remove_attribute can be deleted. For the two below,
> >> one is already immediately restored and the other could be as well so
> >> that there are no lasting changes in the tree at all.
> >>
> >> I will explain the reasoning some more below.
> >>
> >>>
>  +}
>  +
>   /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
>  entry that chains the modifiers specified by CV_QUALS in front of the
>  given type.  REVERSE is true if the type is to be interpreted in the
>  @@ -13674,6 +13894,7 @@ modified_type_die (tree type, int cv_quals, bool reverse,
> tree item_type = NULL;
> tree qualified_type;
> tree name, low, high;
>  +  tree tags;
> dw_die_ref mod_scope;
> struct array_descr_info info;
> /* Only these cv-qualifiers are currently handled.  */
>  @@ -13783,10 +14004,62 @@ modified_type_die (tree type, int cv_quals, bool reverse,
>    dquals &= cv_qual_mask;
>    if ((dquals & ~cv_quals) != TYPE_UNQUALIFIED
>    || (cv_quals == dquals && DECL_ORIGINAL_TYPE (name) != type))
>  -   /* cv-unqualified version of named type.  Just use
>  -  the unnamed type to which it refers.  */
>  -   return modified_type_die (DECL_ORIGINAL_TYPE (name), cv_quals,
>  - reverse, context_die);
>  +   {
>  + tree dtags = lookup_attribute ("btf_type_tag",
>  +TYPE_ATTRIBUTES (dtype));
>  + if ((tags = lookup_attribute ("btf_type_tag",
>  +   TYPE_ATTRIBUTES (type)))
>  + && !attribute_list_equal (tags, dtags))
>  +   {
>  + /* Use of a typedef with additional btf_type_tags.
>  +Create a new typedef DIE to which we can attach the
>  +additional type_tag DIEs without disturbing other users of
>  +the underlying typedef.  */
>  + dw_die_ref mod_die = modified_type_die (dtype, cv_quals,
>  + reverse, context_die);
>  + mod_die = clone_die (mod_die);
>  + add_child_die (comp_unit_die (), mod_die);
>  + if (!lookup_type_die (type))
>  +   equate_type_number_to_die (type, mod_die);
>  +
>  + /* 'tags' is an accumulated list of type_tag attributes
>  +for the typedef'd type on both sides of the typedef.
>  +'dtags' is the set of type_tag attributes only appearing
>  +in the typedef itself.
>  +Find the set of type_tags only on the _use_ of the
>  +typedef, i.e. (tags - dtags).  By construction these
>  +additional type_tags have been chained onto the head of
>  +the attribute list of the original typedef.  */
>  + tree t = tags;
>  + bool altered_chain = false;
>  + while (t)
>  +   {
>  + if (TREE_CHAIN (t) == dtags)
>  +   {
>  + TREE_CHAIN (t) = NULL_TREE;
>  + altered_chain = true;

[PATCH] tree-optimization/118669 - fixup wrongly aligned loads/stores

2025-07-02 Thread Richard Biener
The vectorizer tracks the alignment of datarefs with dr_aligned
and dr_unaligned_supported, but dr_aligned means aligned with respect
to the target alignment, which can be smaller than the alignment of
the mode used for the access.  The following fixes this discrepancy
for vectorizing loads and stores.  The issue is visible for
aarch64 SVE and RISC-V, where VLA vector modes have larger than
element alignment but the target handles element alignment
just fine.
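Here is a hedged model of the new condition, using hypothetical names. dr_aligned on its own only says that the access meets the target alignment. The vector type's natural alignment should therefore be kept only when the known alignment (align in the patch) is at least TYPE_ALIGN_UNIT of the type. The quoted diff cuts off what the else branch builds. Returning align here is an assumption that stands in for an aligned variant of the type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model.  ALIGN is the alignment (in bytes) known for the
   access; TYPE_ALIGN is the natural alignment of the vector type.
   Returns the alignment the memory reference should claim.  */
static unsigned
recorded_alignment (bool dr_aligned, unsigned align, unsigned type_align)
{
  /* dr_aligned is relative to the target alignment, so it only justifies
     the type's own alignment when the known alignment reaches it.  */
  if (dr_aligned && align >= type_align)
    return type_align;
  /* Otherwise claim no more than what is actually known.  */
  return align;
}
```

For an SVE-like case with 4-byte elements and a 16-byte vector type alignment, a dr_aligned access with align 4 now records 4 rather than 16.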

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

PR tree-optimization/118669
* tree-vect-stmts.cc (vectorizable_load): Emit loads
with proper (element) alignment.
(vectorizable_store): Likewise.
---
 gcc/tree-vect-stmts.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 29368c00c8d..c7e160fa29b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9682,7 +9682,8 @@ vectorizable_store (vec_info *vinfo,
  data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
  dataref_offset ? dataref_offset
  : build_int_cst (ref_type, 0));
- if (alignment_support_scheme == dr_aligned)
+ if (alignment_support_scheme == dr_aligned
+ && align >= TYPE_ALIGN_UNIT (vectype))
;
  else
TREE_TYPE (data_ref)
@@ -11715,7 +11716,8 @@ vectorizable_load (vec_info *vinfo,
  {
data_ref = fold_build2 (MEM_REF, ltype,
dataref_ptr, offset);
-   if (alignment_support_scheme == dr_aligned)
+   if (alignment_support_scheme == dr_aligned
+   && align >= TYPE_ALIGN_UNIT (ltype))
  ;
else
  TREE_TYPE (data_ref)
-- 
2.43.0


[PATCH] libstdc++: Make VERIFY a variadic macro

2025-07-02 Thread Jonathan Wakely
This defines the testsuite assertion macro VERIFY so that it allows
un-parenthesized expressions containing commas. This matches how assert
is defined in C++26, following the approval of P2264R7.

The primary motivation is to allow expressions that the preprocessor
splits into multiple arguments, e.g.
VERIFY( vec == std::vector{1,2,3,4} );

To achieve this, VERIFY is redefined as a variadic macro and then the
arguments are grouped together again through the use of __VA_ARGS__.

The implementation is complex due to the following points:

- The arguments __VA_ARGS__ are contextually converted to bool, so that
  scoped enums and types that are not contextually convertible to bool
  cannot be used with VERIFY.
- bool(__VA_ARGS__) is used so that multiple arguments (i.e. those which
  are separated by top-level commas) are ill-formed. Nested commas are
  allowed, but likely mistakes such as VERIFY( cond, "some string" ) are
  ill-formed.
- The bool(__VA_ARGS__) expression needs to be unevaluated, so that we
  don't evaluate __VA_ARGS__ more than once. The simplest way to do that
  would be just sizeof bool(__VA_ARGS__), without parentheses to avoid a
  vexing parse for VERIFY(bool(i)). However that wouldn't work for e.g.
  VERIFY( []{ return true; }() ), because lambda expressions are not
  allowed in unevaluated contexts until C++20. So we use another
  conditional expression with bool(__VA_ARGS__) as the unevaluated
  operand.
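The macro shape described above can be tried out in isolation. The sketch below copies the structure of the new VERIFY under the hypothetical name SKETCH_VERIFY. It uses std::fprintf, std::abort and __func__ in place of _VERIFY_PRINT, __builtin_abort and __PRETTY_FUNCTION__, so it needs nothing from the testsuite. run_checks and counted_true are also hypothetical. The counter shows that the condition is evaluated only once.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the patched VERIFY: same structure,
// portable pieces.
#define SKETCH_VERIFY(...)                                               \
  ((void)((__VA_ARGS__)                                                  \
          ? (void)(true ? true : bool(__VA_ARGS__))                      \
          : (std::fprintf(stderr, "%s:%d: %s: Assertion '%s' failed.\n", \
                          __FILE__, __LINE__, __func__, #__VA_ARGS__),   \
             std::abort())))

static int calls = 0;

// Returns true and counts how often it is evaluated.
static bool counted_true()
{
  ++calls;
  return true;
}

int run_checks(int i)
{
  std::vector<int> vec{1, 2, 3, 4};
  // The preprocessor splits this into four arguments at the commas
  // inside the braces; __VA_ARGS__ pastes them back together.
  SKETCH_VERIFY( vec == std::vector<int>{1, 2, 3, 4} );
  // Parsed as a cast, not as the function type bool(bool(i)).
  SKETCH_VERIFY( bool(i) );
  // The second copy of the lambda sits in a never-taken arm of ?:,
  // not in an unevaluated operand, so C++11 accepts it.
  SKETCH_VERIFY( []{ return true; }() );
  // Evaluated once: bool(counted_true()) is never executed.
  SKETCH_VERIFY( counted_true() );
  return calls;
}
```

Because bool(__VA_ARGS__) only ever appears in the unselected arm of true ? true : ..., it is type-checked but never run. That is why VERIFY(1, 2) is rejected as a compound expression in a functional cast.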

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_hooks.h (VERIFY): Define as variadic
macro.
* testsuite/ext/verify_neg.cc: New test.
---

Tested powerpc64le-linux.

 libstdc++-v3/testsuite/ext/verify_neg.cc  | 28 +++
 libstdc++-v3/testsuite/util/testsuite_hooks.h | 17 +--
 2 files changed, 35 insertions(+), 10 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/ext/verify_neg.cc

diff --git a/libstdc++-v3/testsuite/ext/verify_neg.cc b/libstdc++-v3/testsuite/ext/verify_neg.cc
new file mode 100644
index ..ce033741beeb
--- /dev/null
+++ b/libstdc++-v3/testsuite/ext/verify_neg.cc
@@ -0,0 +1,28 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+struct X { explicit operator void*() const { return nullptr; } };
+
+void
+test_VERIFY(int i)
+{
+  // This should not be parsed as a function type bool(bool(i)):
+  VERIFY( bool(i) );
+
+  // This should not produce warnings about lambda in unevaluated context:
+  VERIFY( []{ return 1; }() );
+
+  // Only one expression allowed:
+  VERIFY(1, 2); // { dg-error "in expansion of macro" }
+  // { dg-error "compound expression in functional cast" "" { target *-*-* } 0 }
+
+  // A scoped enum is not contextually convertible to bool:
+  enum class E { E0 };
+  VERIFY( E::E0 ); // { dg-error "could not convert" }
+
+  // explicit conversion to void* is not contextually convertible to bool:
+  X x;
+  VERIFY( x ); // { dg-error "in expansion of macro" }
+  // { dg-error "invalid cast .* to type 'bool'" "" { target *-*-* } 0 }
+}
diff --git a/libstdc++-v3/testsuite/util/testsuite_hooks.h b/libstdc++-v3/testsuite/util/testsuite_hooks.h
index faa01ba6abd8..bf34fd121c1b 100644
--- a/libstdc++-v3/testsuite/util/testsuite_hooks.h
+++ b/libstdc++-v3/testsuite/util/testsuite_hooks.h
@@ -58,16 +58,13 @@
 # define _VERIFY_PRINT(S, F, L, P, C) __builtin_printf(S, F, L, P, C)
 #endif
 
-#define VERIFY(fn)                                                      \
-  do                                                                    \
-  {                                                                     \
-    if (! (fn))                                                         \
-      {                                                                 \
-        _VERIFY_PRINT("%s:%d: %s: Assertion '%s' failed.\n",            \
-                      __FILE__, __LINE__, __PRETTY_FUNCTION__, #fn);    \
-        __builtin_abort();                                              \
-      }                                                                 \
-  } while (false)
+#define VERIFY(...)\
+   ((void)((__VA_ARGS__)   \
+? (void)(true ? true : bool(__VA_ARGS__))  \
+: (_VERIFY_PRINT("%s:%d: %s: Assertion '%s' failed.\n",\
+ __FILE__, __LINE__, __PRETTY_FUNCTION__,  \
+ #__VA_ARGS__),\
+   __builtin_abort())))
 
 #ifdef _GLIBCXX_HAVE_UNISTD_H
 # include 
-- 
2.50.0



[PATCH v8 7/9] AArch64: precommit test for CMPBR instructions

2025-07-02 Thread Karl Meakin
Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1991 ++
 gcc/testsuite/lib/target-supports.exp|   14 +-
 2 files changed, 1999 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..4b2408fdc84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1991 @@
+// Test that the instructions added by FEAT_CMPBR are emitted
+// { dg-do compile }
+// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } }
+// { dg-options "-march=armv9.5-a+cmpbr -O2" }
+// { dg-final { check-function-bodies "**" "*/" "" { target *-*-* } {\.L[0-9]+} } }
+
+#include 
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs)                                             \
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) {                                   \
+    return (x0 op rhs) ? taken() : not_taken();                                \
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs)                               \
+  COMPARE(unsigned_ty, eq, ==, rhs);                                           \
+  COMPARE(unsigned_ty, ne, !=, rhs);                                           \
+                                                                               \
+  COMPARE(unsigned_ty, ult, <, rhs);                                           \
+  COMPARE(unsigned_ty, ule, <=, rhs);                                          \
+  COMPARE(unsigned_ty, ugt, >, rhs);                                           \
+  COMPARE(unsigned_ty, uge, >=, rhs);                                          \
+                                                                               \
+  COMPARE(signed_ty, slt, <, rhs);                                             \
+  COMPARE(signed_ty, sle, <=, rhs);                                            \
+  COMPARE(signed_ty, sgt, >, rhs);                                             \
+  COMPARE(signed_ty, sge, >=, rhs);
+
+//  CBB (register) 
+COMPARE_ALL(u8, i8, x1);
+
+//  CBH (register) 
+COMPARE_ALL(u16, i16, x1);
+
+//  CB (register) 
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+//  CB (immediate) 
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+//  Special cases 
+// Comparisons against the immediate 0 can be done for all types,
+// because we can use the wzr/xzr register as one of the operands.
+// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible,
+// because they have larger range.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+// CBB and CBH cannot have immediate operands.
+// Instead we have to do a MOV+CB.
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 64 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead,
+//   because B has a longer range than CB.
+COMPARE_ALL(u8, i8, 64);
+COMPARE_ALL(u16, i16, 64);
+COMPARE_ALL(u32, i32, 64);
+COMPARE_ALL(u64, i64, 64);
+
+// 4098 is out of the range for CMP (0 to 4095, optionally shifted left by 12
+// bits), but it can be materialized in a single MOV.
+COMPARE_ALL(u16, i16, 4098);
+COMPARE_ALL(u32, i32, 4098);
+COMPARE_ALL(u64, i64, 4098);
+
+// If the branch destination is out of range (1KiB), we have to generate an
+// extra B instruction (which can handle larger displacements) and branch around
+// it
+
+// clang-format off
+#define STORE_1()   z = 0;
+#define STORE_2()   STORE_1()   STORE_1()
+#define STORE_4()   STORE_2()   STORE_2()
+#define STORE_8()   STORE_4()   STORE_4()
+#define STORE_16()  STORE_8()   STORE_8()
+#define STORE_32()  STORE_16()  STORE_16()
+#define STORE_64()  STORE_32()  STORE_32()
+#define STORE_128() STORE_64()  STORE_64()
+#define STORE_256() STORE_128() STORE_128()
+// clang-format on
+
+#define FAR_BRANCH(ty, rhs)                                                    \
+  int far_branch_##ty##_x0_eq_##rhs(ty x0, ty x1) {                            \
+    volatile int z = 0;                                                        \
+    if (__builtin_expect(x0 == rhs, 1)) {                                      \
+      STORE_256();                                                             \
+}  

[PATCH v8 4/9] AArch64: add constants for branch displacements

2025-07-02 Thread Karl Meakin
Extract the hardcoded values for the maximum PC-relative displacements
into named constants and document them.
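The constants follow from the width of each instruction's signed, word-scaled offset. The C sketch below shows that arithmetic, assuming a 19-bit offset for B.cond/CBZ/CBNZ and a 14-bit one for TBZ/TBNZ. The helper names are hypothetical. short_form_ok mirrors the (ge ... N) (lt ... P) shape of the length attributes. Because it uses lt, the test is conservative by one instruction at the positive end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Most negative byte displacement for a BITS-wide signed offset counted
   in 4-byte instructions.  */
static int64_t branch_min (int bits)
{
  return -((int64_t) 1 << (bits - 1)) * 4;
}

/* Most positive byte displacement for the same encoding.  */
static int64_t branch_max (int bits)
{
  return (((int64_t) 1 << (bits - 1)) - 1) * 4;
}

/* Same shape as the "length" attribute test: N <= disp < P.  */
static bool short_form_ok (int64_t disp, int bits)
{
  return disp >= branch_min (bits) && disp < branch_max (bits);
}
```

For 19 bits this gives -1048576 and 1048572 (BRANCH_LEN_N_1MiB / BRANCH_LEN_P_1MiB). For 14 bits it gives -32768 and 32764 (the 32KiB pair).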

gcc/ChangeLog:

* config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
(BRANCH_LEN_N_128MiB): Likewise.
(BRANCH_LEN_P_1MiB): Likewise.
(BRANCH_LEN_N_1MiB): Likewise.
(BRANCH_LEN_P_32KiB): Likewise.
(BRANCH_LEN_N_32KiB): Likewise.
---
 gcc/config/aarch64/aarch64.md | 60 +--
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8ce991e2f35..3f37ea6cff7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -704,7 +704,19 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+;; Maximum PC-relative positive/negative displacements for various branching
+;; instructions.
+(define_constants
+  [
+;; +/- 1MiB.  Used by B., CBZ, CBNZ.
+(BRANCH_LEN_P_1MiB  1048572)
+(BRANCH_LEN_N_1MiB -1048576)
 
+;; +/- 32KiB.  Used by TBZ, TBNZ.
+(BRANCH_LEN_P_32KiB  32764)
+(BRANCH_LEN_N_32KiB -32768)
+  ]
+)
 
 ;; ---
 ;; Conditional jumps
@@ -769,13 +781,17 @@ (define_insn "aarch64_bcond"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -830,13 +846,17 @@ (define_insn "aarch64_cbz1"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -870,13 +890,17 @@ (define_insn "*aarch64_tbz1"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768))
-  (lt (minus (match_dup 1) (pc)) (const_int 32764)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
@@ -931,13 +955,17 @@ (define_insn "@aarch64_tbz"
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
-  (lt (minus (match_dup 2) (pc)) (const_int 32764)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_i

[PATCH v3] tree-optimization/120780: Support object size for containing objects

2025-07-02 Thread Siddhesh Poyarekar
A MEM_REF that casts a subobject to its containing object has a negative
offset, which objsz treats as an invalid access.  Support this use case
by peeking into the structure to validate that the containing object
does contain a subobject of that type at that offset and, if so,
adjusting the wholesize for the object to allow the negative offset.
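The size arithmetic behind the adjustment can be sketched in C using the container layout from the new test. The sketch covers only the wholesize bookkeeping, not the type check done by the new inner_at_offset. mesh_bytes and container_wholesize are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct inner { int dummy[4]; };
struct container { int mcast_rate[6]; struct inner mesh; };

/* Bytes available from &c->mesh to the end of an SZ-byte allocation;
   this matches the "expected" value computed by test1 in the patch.  */
static size_t mesh_bytes (size_t sz)
{
  size_t off = offsetof (struct container, mesh);
  return sz > off ? sz - off : 0;
}

/* Stepping back from &c->mesh by OFF bytes lands on the container.
   Assuming the allocation reaches at least to &c->mesh, the whole object
   seen from there is the subobject's remaining size plus OFF.  */
static size_t container_wholesize (size_t mesh_remaining)
{
  return mesh_remaining + offsetof (struct container, mesh);
}
```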

gcc/ChangeLog:

PR tree-optimization/120780
* tree-object-size.cc (inner_at_offset,
get_wholesize_for_memref): New functions.
(addr_object_size): Call GET_WHOLESIZE_FOR_MEMREF.

gcc/testsuite/ChangeLog:

PR tree-optimization/120780
* gcc.dg/builtin-dynamic-object-size-pr120780.c: New test case.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v2:
* Skip over sub-byte offsets

Changes from v1:
* Use byte_position to get byte position of a field

Testing:
- x86_64 bootstrap and test
- i686 build and test
- config=ubsan bootstrap

 .../builtin-dynamic-object-size-pr120780.c| 233 ++
 gcc/tree-object-size.cc   |  90 ++-
 2 files changed, 322 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
new file mode 100644
index 000..0d6593ec828
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
@@ -0,0 +1,233 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+typedef __SIZE_TYPE__ size_t;
+#define NUM_MCAST_RATE 6
+
+#define MIN(a,b) ((a) < (b) ? (a) : (b))
+#define MAX(a,b) ((a) > (b) ? (a) : (b))
+
+struct inner
+{
+  int dummy[4];
+};
+
+struct container
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  struct inner mesh;
+};
+
+static void
+test1_child (struct inner *ifmsh, size_t expected)
+{ 
+  struct container *sdata =
+(struct container *) ((void *) ifmsh
+ - __builtin_offsetof (struct container, mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (&sdata->mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test1 (size_t sz)
+{
+  struct container *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->mesh;
+
+  test1_child (ifmsh,
+  (sz > sizeof (sdata->mcast_rate)
+   ? sz - sizeof (sdata->mcast_rate) : 0));
+
+  __builtin_free (sdata);
+}
+
+struct container2
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  union
+{
+  int dummy;
+  double dbl;
+  struct inner mesh;
+} u;
+};
+
+static void
+test2_child (struct inner *ifmsh, size_t sz)
+{ 
+  struct container2 *sdata =
+(struct container2 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container2, u.mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  size_t diff = sizeof (*sdata) - sz;
+  size_t expected = MIN(sizeof (double), MAX (sizeof (sdata->u), diff) - diff);
+
+  if (__builtin_dynamic_object_size (&sdata->u.dbl, 1) != expected)
+FAIL ();
+
+  expected = MAX (sizeof (sdata->u.mesh), diff) - diff;
+  if (__builtin_dynamic_object_size (&sdata->u.mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test2 (size_t sz)
+{
+  struct container2 *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->u.mesh;
+
+  test2_child (ifmsh, sz);
+
+  __builtin_free (sdata);
+}
+
+struct container3
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  char mesh[8];
+};
+
+static void
+test3_child (char ifmsh[], size_t expected)
+{ 
+  struct container3 *sdata =
+(struct container3 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container3, mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (sdata->mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test3 (size_t sz)
+{
+  struct container3 *sdata = __builtin_malloc (sz);
+  char *ifmsh = sdata->mesh;
+  size_t diff = sizeof (*sdata) - sz;
+
+  test3_child (ifmsh, MAX(sizeof (sdata->mesh), diff) - diff);
+
+  __builtin_free (sdata);
+}
+
+
+struct container4
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  struct
+{
+  int dummy;
+  struct inner mesh;
+} s;
+};
+
+static void
+test4_child (struct inner *ifmsh, size_t expected)
+{ 
+  struct container4 *sdata =
+(struct container4 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container4, s.mesh));
+
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (&sdata->s.mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test4 (size_t 

[PATCH v8 8/9] AArch64: rules for CMPBR instructions

2025-07-02 Thread Karl Meakin
Add rules for lowering `cbranch4` to CBB/CBH/CB when
CMPBR extension is enabled.
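The immediate ranges accepted by the new aarch64_cb_rhs can be modelled in plain C. The enum values are hypothetical stand-ins for GCC's rtx_code. The comments read the shifted GE/LE ranges as x >= k becoming x > k - 1 and x <= k becoming x < k + 1; that reading is an interpretation, not something the patch states. Under this model, 64 is accepted for GEU but 63 is rejected for LEU. This matches the explanation in the v7 review that u32_x0_uge_64 is normalized to x0 <= 63, which then falls outside the CBLS range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rtx_code values the patch switches on.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_GT, CMP_GTU, CMP_LT, CMP_LTU,
  CMP_GE, CMP_GEU, CMP_LE, CMP_LEU
};

/* Same immediate ranges as aarch64_cb_rhs in the patch.  */
static bool
cb_imm_in_range (enum cmp_code code, long k)
{
  switch (code)
    {
    case CMP_EQ: case CMP_NE:
    case CMP_GT: case CMP_GTU:
    case CMP_LT: case CMP_LTU:
      return k >= 0 && k <= 63;
    case CMP_GE: case CMP_GEU:  /* read as x > k - 1 */
      return k >= 1 && k <= 64;
    case CMP_LE: case CMP_LEU:  /* read as x < k + 1 */
      return k >= -1 && k <= 62;
    }
  return false;
}
```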

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
* config/aarch64/aarch64.md (cbranch4): Rename to ...
(cbranch4): ...here, and emit CMPBR if possible.
(cbranch4): New expand rule.
(aarch64_cb): New insn rule.
(aarch64_cb): Likewise.
* config/aarch64/constraints.md (Uc0): New constraint.
(Uc1): Likewise.
(Uc2): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
(INT_CMP): New code iterator.
(cmpbr_imm_constraint): New code attr.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c:
---
 gcc/config/aarch64/aarch64-protos.h   |   2 +
 gcc/config/aarch64/aarch64.cc |  33 +
 gcc/config/aarch64/aarch64.md |  95 ++-
 gcc/config/aarch64/constraints.md |  18 +
 gcc/config/aarch64/iterators.md   |  30 +
 gcc/testsuite/gcc.target/aarch64/cmpbr-far.c  |  52 ++
 gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 749 +++---
 gcc/testsuite/gcc.target/aarch64/cmpbr.h  |  16 +
 .../gcc.target/aarch64/sve/mask_store.c   |  28 +
 gcc/testsuite/gcc.target/aarch64/sve/sqlite.c | 205 +
 10 files changed, 766 insertions(+), 462 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr-far.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_store.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sqlite.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 31f2f5b8bd2..e946e8da11d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, vec,
 unsigned int, tree, unsigned int,
 tree *);
 
+bool aarch64_cb_rhs (rtx_code op_code, rtx rhs);
+
 namespace aarch64 {
   void report_non_ice (location_t, tree, unsigned int);
   void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2cd03b941bd..f3ce3a15b09 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern)
   gcc_unreachable ();
 }
 
+/* Return true if RHS is an operand suitable for a CB (immediate)
+   instruction.  OP_CODE determines the type of the comparison.  */
+bool
+aarch64_cb_rhs (rtx_code op_code, rtx rhs)
+{
+  if (!CONST_INT_P (rhs))
+return REG_P (rhs);
+
+  HOST_WIDE_INT rhs_val = INTVAL (rhs);
+
+  switch (op_code)
+{
+case EQ:
+case NE:
+case GT:
+case GTU:
+case LT:
+case LTU:
+  return IN_RANGE (rhs_val, 0, 63);
+
+case GE:  /* CBGE:   signed greater than or equal */
+case GEU: /* CBHS: unsigned greater than or equal */
+  return IN_RANGE (rhs_val, 1, 64);
+
+case LE:  /* CBLE:   signed less than or equal */
+case LEU: /* CBLS: unsigned less than or equal */
+  return IN_RANGE (rhs_val, -1, 62);
+
+default:
+  return false;
+}
+}
+
 /* Return the location of a piece that is known to be passed or returned
in registers.  FIRST_ZR is the first unused vector argument register
and FIRST_PR is the first unused predicate argument register.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0169ec5cf24..c50c41753a7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -713,6 +713,10 @@ (define_constants
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
+
+;; +/- 1KiB.  Used by CBB, CBH, CB.
+(BRANCH_LEN_P_1Kib  1020)
+(BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
@@ -720,7 +724,7 @@ (define_constants
 ;; Conditional jumps
 ;; ---
 
-(define_expand "cbranch4"
+(define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
@@ -728,12 +732,29 @@ (define_expand "cbranch4"
   (pc)))]
   ""
   {
-operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-  operands[2]);
-operands[2] = const0_rtx;
+if (TARGET_CMPBR && aarch64_cb_rhs (GET_CODE (operands[0]), operands[2]))
+  {
+   /* The branch is supported natively.  */
+  }
+else
+  {
+operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+ 

Re: [PATCH v7 8/9] AArch64: rules for CMPBR instructions

2025-07-02 Thread Richard Sandiford
Karl Meakin  writes:
> On 01/07/2025 11:02, Richard Sandiford wrote:
>> Karl Meakin  writes:
>>> @@ -763,6 +784,68 @@ (define_expand "cbranchcc4"
>>> ""
>>>   )
>>>   
>>> +;; Emit a `CB (register)` or `CB (immediate)` instruction.
>>> +;; The immediate range depends on the comparison code.
>>> +;; Comparisons against immediates outside this range fall back to
>>> +;; CMP + B.
>>> +(define_insn "aarch64_cb"
>>> +  [(set (pc) (if_then_else (INT_CMP
>>> +(match_operand:GPI 0 "register_operand" "r")
>>> +(match_operand:GPI 1 "nonmemory_operand"
>>> +   "r"))
>>> +  (label_ref (match_operand 2))
>>> +  (pc)))]
>>> +  "TARGET_CMPBR && aarch64_cb_rhs (, operands[1])"
>>> +  {
>>> +if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
>>> +  return aarch64_gen_far_branch (operands, 2, "L",
>>> + "cb\\t%0, %1, ");
>>> +else
>>> +  return "cb\\t%0, %1, %l2";
>>> +  }
>>> +  [(set_attr "type" "branch")
>>> +   (set (attr "length")
>>> +   (if_then_else (and (ge (minus (match_dup 2) (pc))
>>> +  (const_int BRANCH_LEN_N_1Kib))
>>> +  (lt (minus (match_dup 2) (pc))
>>> +  (const_int BRANCH_LEN_P_1Kib)))
>>> + (const_int 4)
>>> + (const_int 8)))
>>> +   (set (attr "far_branch")
>>> +   (if_then_else (and (ge (minus (match_dup 2) (pc))
>>> +  (const_int BRANCH_LEN_N_1Kib))
>>> +  (lt (minus (match_dup 2) (pc))
>>> +  (const_int BRANCH_LEN_P_1Kib)))
>>> + (const_string "no")
>>> + (const_string "yes")))]
>>> +)
>>> +
>>> +;; Emit a `CBB (register)` or `CBH (register)` instruction.
>>> +(define_insn "aarch64_cb"
>>> +  [(set (pc) (if_then_else (INT_CMP
>>> +(match_operand:SHORT 0 "register_operand" "r")
>>> +(match_operand:SHORT 1 "aarch64_reg_or_zero" "rZ"))
>>> +  (label_ref (match_operand 2))
>>> +  (pc)))]
>>> +  "TARGET_CMPBR"
>>> +  "cb\\t%0, %1, %l2"
>> This instruction also needs to handle far branches, in a similar way
>> to the GPI one.  (It would be good to have a test for that too).
>>
>> Why does the code for u32_x0_uge_64 etc. not change?  I would have
>> expected 64 to be in range for that, whether it's treated as >= 64
>> or as > 63.
> Because the comparison is normalized to `x0 <= 63`, which does not fit in
> the range for `CBLS` (-1 to 62).

Ah, ok.  But then the names seem a bit counter-intuitive.  For:

+return (x0 op rhs) ? taken() : not_taken();   

a natural implementation would be:

cmp x0 op rhs
branch-if-false 1f
b  taken
1f:
b  not_taken

where the branch will be taken if the condition is false rather than true.

So how about:

 return __builtin_expect (x0 op rhs, 0) ? taken() : not_taken();

?  That way we should get:

cmp x0 op rhs
branch-if-true 1f
b  not_taken
1f:
b  taken

Thanks,
Richard
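Written out as a complete test-style function, the suggested shape looks roughly like the sketch below. `taken` and `not_taken` are hypothetical stand-ins for the testsuite's callees (marked `noinline` so each arm stays a real call), and whether GCC actually produces the branch-if-true layout still depends on its block-ordering heuristics; the hint only makes that layout the natural one.

```cpp
#include <cstdint>

// Hypothetical callees; noinline keeps them as separate calls.
__attribute__((noinline)) int taken() { return 1; }
__attribute__((noinline)) int not_taken() { return 0; }

// Mark the comparison as unlikely, so the conditional branch is expected
// to test `x0 >= 64` itself rather than its inverse `x0 <= 63`.
int u32_x0_uge_64(std::uint32_t x0)
{
  return __builtin_expect(x0 >= 64, 0) ? taken() : not_taken();
}
```

The hint changes only the expected layout, not the result: the function still returns `taken()` exactly when `x0 >= 64`.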


[PATCH v8 0/9] AArch64: CMPBR support

2025-07-02 Thread Karl Meakin
This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Changelog:
* v8:
  - Support far branches for the `CBB` and `CBH` instructions, and add tests
    for them.
  - Mark the branch in the far branch tests likely, so that the optimizer does
    not invert the condition.
  - Use regex captures for register and label names so that the tests are less
    fragile.
  - Minor formatting fixes.
* v7:
  - Support far branches and add a test for them.
  - Replace `aarch64_cb_short_operand` with `aarch64_reg_or_zero_operand`.
  - Delete the new predicates that aren't needed anymore.
  - Minor formatting and comment fixes.
* v6:
  - Correct the constraint string for immediate operands.
  - Drop the commit for adding `%j` format specifiers. The suffix for
    the `cb` instruction is now calculated by the `cmp_op` code
    attribute.
* v5:
  - Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for
    CMPBR...). Every commit in the series should now produce a correct
    compiler.
  - Reduce excessive diff context by not passing `--function-context` to
    `git format-patch`.
* v4:
  - Added a commit to use HS/LO instead of CS/CC mnemonics.
  - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE,
    CBHS, CBLE and CBLS have different ranges of allowed immediates than
    the other comparisons.

Karl Meakin (9):
  AArch64: place branch instruction rules together
  AArch64: reformat branch instruction rules
  AArch64: rename branch instruction rules
  AArch64: add constants for branch displacements
  AArch64: make `far_branch` attribute a boolean
  AArch64: recognize `+cmpbr` option
  AArch64: precommit test for CMPBR instructions
  AArch64: rules for CMPBR instructions
  AArch64: make rules for CBZ/TBZ higher priority

 .../aarch64/aarch64-option-extensions.def |2 +
 gcc/config/aarch64/aarch64-protos.h   |2 +
 gcc/config/aarch64/aarch64-simd.md|2 +-
 gcc/config/aarch64/aarch64-sme.md |2 +-
 gcc/config/aarch64/aarch64.cc |   39 +-
 gcc/config/aarch64/aarch64.h  |3 +
 gcc/config/aarch64/aarch64.md |  570 --
 gcc/config/aarch64/constraints.md |   18 +
 gcc/config/aarch64/iterators.md   |   30 +
 gcc/doc/invoke.texi   |3 +
 gcc/testsuite/gcc.target/aarch64/cmpbr-far.c  |   52 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1824 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.h  |   16 +
 .../gcc.target/aarch64/sve/mask_store.c   |   28 +
 gcc/testsuite/gcc.target/aarch64/sve/sqlite.c |  205 ++
 gcc/testsuite/lib/target-supports.exp |   14 +-
 16 files changed, 2586 insertions(+), 224 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr-far.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_store.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sqlite.c

--
2.48.1


[PATCH v8 5/9] AArch64: make `far_branch` attribute a boolean

2025-07-02 Thread Karl Meakin
The `far_branch` attribute only ever takes the values 0 or 1, so make it
a `no/yes` valued string attribute instead.

gcc/ChangeLog:

* config/aarch64/aarch64.md (far_branch): Replace 0/1 with
no/yes.
(aarch64_bcond): Handle rename.
(aarch64_cbz1): Likewise.
(*aarch64_tbz1): Likewise.
(@aarch64_tbz): Likewise.
---
 gcc/config/aarch64/aarch64.md | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3f37ea6cff7..0169ec5cf24 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -569,9 +569,7 @@ (define_attr "enabled" "no,yes"
 ;; Attribute that specifies whether we are dealing with a branch to a
 ;; label that is far away, i.e. further away than the maximum/minimum
 ;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
+(define_attr "far_branch" "no,yes" (const_string "no"))
 
 ;; Attribute that specifies whether the alternative uses MOVPRFX.
 (define_attr "movprfx" "no,yes" (const_string "no"))
@@ -792,8 +790,8 @@ (define_insn "aarch64_bcond"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
@@ -857,8 +855,8 @@ (define_insn "aarch64_cbz1"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
@@ -872,7 +870,7 @@ (define_insn "*aarch64_tbz1"
   {
 if (get_attr_length (insn) == 8)
   {
-   if (get_attr_far_branch (insn) == 1)
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
  return aarch64_gen_far_branch (operands, 1, "Ltb",
 "\\t%0, , ");
else
@@ -901,8 +899,8 @@ (define_insn "*aarch64_tbz1"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; ---
@@ -966,8 +964,8 @@ (define_insn "@aarch64_tbz"
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 
 )
 
-- 
2.48.1
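In the `aarch64_cb` pattern quoted earlier in this thread, the `length` and `far_branch` attributes share one range test on the displacement `(minus (match_dup 2) (pc))`. A minimal C++ model of that computation follows, using the 1KiB constants from the CB patch; `BranchAttrs` and `cb_branch_attrs` are hypothetical names. It deliberately covers only the CB case: the TBZ patterns test `length` and `far_branch` against different bounds, so they do not fit this single-test model.

```cpp
#include <cstdint>
#include <string_view>

// Displacement limits copied from the define_constants hunk (CB, CBB, CBH).
constexpr std::int64_t BRANCH_LEN_P_1Kib = 1020;
constexpr std::int64_t BRANCH_LEN_N_1Kib = -1024;

// Hypothetical model of the attribute values computed for one branch.
struct BranchAttrs
{
  int length;                   // bytes: 4 = single insn, 8 = far sequence
  std::string_view far_branch;  // "no" or "yes", as in the new string attribute
};

// Mirrors (and (ge off N_1Kib) (lt off P_1Kib)), where off = label - pc.
constexpr BranchAttrs cb_branch_attrs(std::int64_t off)
{
  const bool in_range = off >= BRANCH_LEN_N_1Kib && off < BRANCH_LEN_P_1Kib;
  return in_range ? BranchAttrs{4, "no"} : BranchAttrs{8, "yes"};
}
```

Because the upper bound uses `lt`, an offset equal to `BRANCH_LEN_P_1Kib` is already treated as far, while the lower bound `BRANCH_LEN_N_1Kib` itself is still near.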



[PATCH v8 3/9] AArch64: rename branch instruction rules

2025-07-02 Thread Karl Meakin
Give the `define_insn` rules used in lowering `cbranch4` to RTL
more descriptive and consistent names: from now on, each rule is named
after the AArch64 instruction that it generates. Also add comments to
document each rule.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Rename to ...
(aarch64_bcond): ...here.
(*compare_condjump): Rename to ...
(*aarch64_bcond_wide_imm): ...here.
(aarch64_cb): Rename to ...
(aarch64_cbz1): ...here.
(*cb1): Rename to ...
(*aarch64_tbz1): ...here.
(@aarch64_tb): Rename to ...
(@aarch64_tbz): ...here.
(restore_stack_nonlocal): Handle rename.
(stack_protect_combined_test): Likewise.
* config/aarch64/aarch64-simd.md (cbranch4): Likewise.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
---
 gcc/config/aarch64/aarch64-simd.md |  2 +-
 gcc/config/aarch64/aarch64-sme.md  |  2 +-
 gcc/config/aarch64/aarch64.cc  |  6 +++---
 gcc/config/aarch64/aarch64.md  | 23 +--
 4 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index af574d5bb0a..8de79caa86d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3966,7 +3966,7 @@ (define_expand "cbranch4"
 
   rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
   rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[3]));
   DONE;
 })
 
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index f7958c90eae..b8bb4cc14b6 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -391,7 +391,7 @@ (define_insn_and_split "aarch64_restore_za"
 auto label = gen_label_rtx ();
 auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM);
 emit_insn (gen_aarch64_read_tpidr2 (tpidr2));
-auto jump = emit_likely_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label));
+auto jump = emit_likely_jump_insn (gen_aarch64_cbznedi1 (tpidr2, label));
 JUMP_LABEL (jump) = label;
 
 aarch64_restore_za (operands[0]);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index abbb97768f5..2cd03b941bd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2884,10 +2884,10 @@ aarch64_gen_test_and_branch (rtx_code code, rtx x, int bitnum,
   emit_insn (gen_aarch64_and3nr_compare0 (mode, x, mask));
   rtx cc_reg = gen_rtx_REG (CC_NZVmode, CC_REGNUM);
   rtx x = gen_rtx_fmt_ee (code, CC_NZVmode, cc_reg, const0_rtx);
-  return gen_condjump (x, cc_reg, label);
+  return gen_aarch64_bcond (x, cc_reg, label);
 }
-  return gen_aarch64_tb (code, mode, mode,
-x, gen_int_mode (bitnum, mode), label);
+  return gen_aarch64_tbz (code, mode, mode,
+  x, gen_int_mode (bitnum, mode), label);
 }
 
 /* Consider the operation:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 25286add0c8..8ce991e2f35 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -749,7 +749,8 @@ (define_expand "cbranchcc4"
   ""
 )
 
-(define_insn "condjump"
+;; Emit `B`, assuming that the condition is already in the CC register.
+(define_insn "aarch64_bcond"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (const_int 0)])
@@ -789,7 +790,7 @@ (define_insn "condjump"
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subsx0, x0, #(CST & 0x000fff)
 ;; b .Label
-(define_insn_and_split "*compare_condjump"
+(define_insn_and_split "*aarch64_bcond_wide_imm"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(match_operand:GPI 1 "aarch64_imm24" "n"))
   (label_ref:P (match_operand 2))
@@ -809,12 +810,13 @@ (define_insn_and_split "*compare_condjump"
 rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
 rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
  cc_reg, const0_rtx);
-emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[2]));
 DONE;
   }
 )
 
-(define_insn "aarch64_cb1"
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(const_int 0))
   (label_ref (match_operand 1))
@@ -839,7 +841,8 @@ (define_insn "aarch64_cb1"
  (const_int 1)))]
 )
 
-(define_insn "*cb1"
+;; For an 

[PATCH v8 6/9] AArch64: recognize `+cmpbr` option

2025-07-02 Thread Karl Meakin
Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
 gcc/config/aarch64/aarch64.h | 3 +++
 gcc/doc/invoke.texi  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index dbbb021f05a..1c3e69799f5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -249,6 +249,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("cmpbr", CMPBR, (), (), (), "cmpbr")
+
 AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
 
 AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e8bd8c73c12..d5c4a42e96d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -410,6 +410,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 /* CSSC instructions are enabled through +cssc.  */
 #define TARGET_CSSC AARCH64_HAVE_ISA (CSSC)
 
+/* CB instructions are enabled through +cmpbr.  */
+#define TARGET_CMPBR AARCH64_HAVE_ISA (CMPBR)
+
 /* Make sure this is always defined so we don't have to check for ifdefs
but rather use normal ifs.  */
 #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8163c3a185c..ea26f45bb4d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -22461,6 +22461,9 @@ Enable the FlagM2 flag conversion instructions.
 Enable the Pointer Authentication Extension.
 @item cssc
 Enable the Common Short Sequence Compression instructions.
+@item cmpbr
+Enable the shorter compare and branch instructions, @code{cbb}, @code{cbh} and
+@code{cb}.
 @item sme
 Enable the Scalable Matrix Extension.  This is only supported when SVE2 is also
 enabled.
-- 
2.48.1
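The option is wired up through an X-macro style `.def` file: one `AARCH64_OPT_EXTENSION` line per extension, with `TARGET_CMPBR` then testing the corresponding ISA flag. The toy sketch below illustrates that pattern under simplifying assumptions. `TOY_EXTENSIONS`, `TOY_FL_*`, `TOY_HAVE_ISA` and `toy_flag_for` are all hypothetical and much simpler than GCC's real expansion, which carries additional fields per entry.

```cpp
#include <cstdint>
#include <string_view>

// Toy extension list: each entry names an extension once and is expanded
// several times with different definitions of X.
#define TOY_EXTENSIONS(X) \
  X("cssc", CSSC)         \
  X("cmpbr", CMPBR)       \
  X("lse128", LSE128)

// Expansion 1: an index per extension.
enum ToyExtIndex : unsigned
{
#define X(NAME, IDENT) TOY_IDX_##IDENT,
  TOY_EXTENSIONS(X)
#undef X
  TOY_NUM_EXTENSIONS
};

// Expansion 2: one feature bit per extension.
#define X(NAME, IDENT) \
  constexpr std::uint64_t TOY_FL_##IDENT = std::uint64_t{1} << TOY_IDX_##IDENT;
TOY_EXTENSIONS(X)
#undef X

// Simplified analogue of AARCH64_HAVE_ISA, taking the enabled mask explicitly.
#define TOY_HAVE_ISA(ISA, MASK) (((MASK) & TOY_FL_##ISA) != 0)
#define TOY_TARGET_CMPBR(MASK) TOY_HAVE_ISA (CMPBR, MASK)

// Expansion 3: map a "+name" option string to its bit (0 if unknown).
constexpr std::uint64_t toy_flag_for(std::string_view name)
{
#define X(NAME, IDENT) if (name == NAME) return TOY_FL_##IDENT;
  TOY_EXTENSIONS(X)
#undef X
  return 0;
}
```

Adding an extension then means adding one list entry, which is roughly what the one-line `.def` change in this patch does for `cmpbr`.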



[PATCH v2] libstdc++: construct bitset from string_view (P2697) [PR119742]

2025-07-02 Thread Nathan Myers
Changes in V2:
* Generalize private member _M_check_initial_position for use with
  both string and string_view arguments.
* Remove unnecessary #if guards for version and hostedness.
* Remove redundant "std::" qualifications in new code.
* Improve Doxygen source readability.
* Clarify commit message text.
* Fix ChangeLog style.

Add a bitset constructor from string_view, per P2697. Fix existing
tests that would fail to detect incorrect exception behavior.

Argument checks that result in exceptions guarded by "#if HOSTED"
are made unguarded because the functions called to throw just call
terminate() in free-standing builds. Improve readability in Doxygen
comments. Generalize a private member argument-checking function
to work with string and string_view without mentioning either,
obviating need for guards.

The version.h symbol is not "hosted" because string_view, though
not specified to be available in free-standing builds, is defined
there and the feature is useful there.

libstdc++-v3/ChangeLog:
PR libstdc++/119742
* include/bits/version.def: Add preprocessor symbol.
* include/bits/version.h: Add preprocessor symbol.
* include/std/bitset: Add constructor.
* testsuite/20_util/bitset/cons/1.cc: Fix.
* testsuite/20_util/bitset/cons/6282.cc: Fix.
* testsuite/20_util/bitset/cons/string_view.cc: Test new ctor.
* testsuite/20_util/bitset/cons/string_view_wide.cc: Test new ctor.
---
 libstdc++-v3/include/bits/version.def |   8 ++
 libstdc++-v3/include/bits/version.h   |  10 ++
 libstdc++-v3/include/std/bitset   |  82 +++
 .../testsuite/20_util/bitset/cons/1.cc|   1 +
 .../testsuite/20_util/bitset/cons/6282.cc |   5 +-
 .../20_util/bitset/cons/string_view.cc| 132 ++
 .../20_util/bitset/cons/string_view_wide.cc   |   8 ++
 7 files changed, 215 insertions(+), 31 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/cons/string_view.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/cons/string_view_wide.cc

diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def
index f4ba501c403..b89b287e8e8 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -2030,6 +2030,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = bitset  // ...construct from string_view
+  values = {
+v = 202306;
+cxxmin = 26;
+  };
+};
+
 // Standard test specifications.
 stds[97] = ">= 199711L";
 stds[03] = ">= 199711L";
diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h
index dc8ac07be16..a70a7ede68c 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2273,4 +2273,14 @@
 #endif /* !defined(__cpp_lib_exception_ptr_cast) && defined(__glibcxx_want_exception_ptr_cast) */
 #undef __glibcxx_want_exception_ptr_cast
 
+#if !defined(__cpp_lib_bitset)
+# if (__cplusplus >  202302L)
+#  define __glibcxx_bitset 202306L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_bitset)
+#   define __cpp_lib_bitset 202306L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_bitset) && defined(__glibcxx_want_bitset) */
+#undef __glibcxx_bitset
+
 #undef __glibcxx_want_all
diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index 8b5d270c2a9..1c1e1670c33 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -61,8 +61,13 @@
 #endif
 
 #define __glibcxx_want_constexpr_bitset
+#define __glibcxx_want_bitset  // ...construct from string_view
 #include 
 
+#ifdef __cpp_lib_bitset // ...construct from string_view
+# include 
+#endif
+
 #define _GLIBCXX_BITSET_BITS_PER_WORD  (__CHAR_BIT__ * __SIZEOF_LONG__)
 #define _GLIBCXX_BITSET_WORDS(__n) \
   ((__n) / _GLIBCXX_BITSET_BITS_PER_WORD + \
@@ -752,7 +757,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  (Note that %bitset does @e not meet the formal requirements of a
*  container.  Mainly, it lacks iterators.)
*
-   *  The template argument, @a Nb, may be any non-negative number,
+   *  The template argument, `Nb`, may be any non-negative number,
*  specifying the number of bits (e.g., "0", "12", "1024*1024").
*
*  In the general unoptimized case, storage is allocated in word-sized
@@ -816,28 +821,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   typedef _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)> _Base;
   typedef unsigned long _WordT;
 
-#if _GLIBCXX_HOSTED
-  template
-  _GLIBCXX23_CONSTEXPR
-  void
-  _M_check_initial_position(const std::basic_string<_CharT, _Traits, _Alloc>& __s,
-   size_t __position) const
+  template
+  _GLIBCXX23_CONSTEXPR void
+  _M_check_initial_position(
+   const _Str& __s, typename _Str::size_type __position) const
   {
if (__position > __s.size())
- __throw_out_of_range_fmt(__N("bitset::bitset: __position "
-

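The patch generalizes `_M_check_initial_position` so that one template serves both `basic_string` and `basic_string_view` arguments. The sketch below shows that idea as a free function, plus a hypothetical helper `bitset_from_view` that approximates the new P2697 constructor for pre-C++26 code. The helper follows the rules of the existing `basic_string` constructor (check `pos`, use at most `N` characters, reject any invalid character in the selected range); it is an illustration, not the libstdc++ implementation.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Generic position check in the spirit of the patched member: works with any
// string-like type providing size() and a nested size_type.
template<typename Str>
void check_initial_position(const Str& s, typename Str::size_type pos)
{
  if (pos > s.size())
    throw std::out_of_range("bitset: position exceeds string size");
}

// Hypothetical helper approximating bitset(basic_string_view, pos, n, zero, one).
template<std::size_t N, typename CharT, typename Traits>
std::bitset<N>
bitset_from_view(std::basic_string_view<CharT, Traits> sv,
                 std::size_t pos = 0, std::size_t n = std::size_t(-1),
                 CharT zero = CharT('0'), CharT one = CharT('1'))
{
  check_initial_position(sv, pos);
  sv = sv.substr(pos, n);                               // rlen = min(n, size - pos)
  const std::size_t m = sv.size() < N ? sv.size() : N;  // M = min(N, rlen)
  std::bitset<N> result;
  for (std::size_t i = 0; i < sv.size(); ++i)
    {
      const CharT c = sv[i];
      // Every selected character is validated, even those beyond M.
      if (!Traits::eq(c, zero) && !Traits::eq(c, one))
        throw std::invalid_argument("bitset: invalid character");
      // Character i (i < M) maps to bit M - 1 - i, so the last used one is bit 0.
      if (i < m && Traits::eq(c, one))
        result.set(m - 1 - i);
    }
  return result;
}
```

For example, `bitset_from_view<8>(std::string_view("1010")).to_ulong()` yields 10, the same value as `std::bitset<8>(std::string("1010"))`.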
[PATCH] c++: -fno-delete-null-pointer-checks constexpr addr comparison [PR71962]

2025-07-02 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

-- >8 --

Here the flag -fno-delete-null-pointer-checks causes the trivial address
comparison in

  inline int a, b;
  static_assert(&a != &b);

to be rejected as non-constant because with the flag we can't assume
such weak symbols are non-NULL, which causes symtab/fold-const.cc to
punt on such comparisons.  Note this also affects -fsanitize=undefined
since it implies -fno-delete-null-pointer-checks.

This issue seems conceptually the same as PR96862 which was about
-frounding-math breaking some constexpr floating point arithmetic,
and we fixed that PR by disabling -frounding-math during manifestly
constant evaluation.  This patch proposes to do the same for
-fno-delete-null-pointer-checks, disabling it during manifestly constant
evaluation.  I opted to disable it narrowly around the relevant
fold_binary call, which seems to address all reported constexpr failures,
but we could consider disabling it more broadly as well.

PR c++/71962

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_binary_expression): Set
flag_delete_null_pointer_checks alongside folding_cxx_constexpr
during manifestly constant evaluation.

gcc/testsuite/ChangeLog:

* g++.dg/ext/constexpr-pr71962.C: New test.
* g++.dg/ubsan/pr71962.C: New test.
---
 gcc/cp/constexpr.cc  |  2 ++
 gcc/testsuite/g++.dg/ext/constexpr-pr71962.C | 18 ++
 gcc/testsuite/g++.dg/ubsan/pr71962.C |  5 +
 3 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
 create mode 100644 gcc/testsuite/g++.dg/ubsan/pr71962.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 704d936f2ec3..e8426b40c543 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4068,6 +4068,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t,
  || TREE_CODE (type) != REAL_TYPE))
{
  auto ofcc = make_temp_override (folding_cxx_constexpr, true);
+ auto odnpc = make_temp_override (flag_delete_null_pointer_checks,
+  true);
  r = fold_binary_initializer_loc (loc, code, type, lhs, rhs);
}
   else
diff --git a/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C b/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
new file mode 100644
index ..57cb14ac804e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
@@ -0,0 +1,18 @@
+// PR c++/71962
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fno-delete-null-pointer-checks" }
+
+struct A { void f(); };
+static_assert(&A::f != nullptr, "");
+
+#if __cpp_inline_variables
+inline int a, b;
+static_assert(&a != &b, "");
+static_assert(&a != nullptr, "");
+#endif
+
+int main() {
+  static int x, y;
+  static_assert(&x != &y, "");
+  static_assert(&x != nullptr, "");
+}
diff --git a/gcc/testsuite/g++.dg/ubsan/pr71962.C b/gcc/testsuite/g++.dg/ubsan/pr71962.C
new file mode 100644
index ..f17c825da449
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/pr71962.C
@@ -0,0 +1,5 @@
+// PR c++/71962
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fsanitize=undefined" }
+
+#include "../ext/constexpr-pr71962.C"
-- 
2.50.0.145.g83014dc05f
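The fix relies on `make_temp_override` to change `flag_delete_null_pointer_checks` only for the duration of the `fold_binary_initializer_loc` call. A minimal RAII re-implementation of that idea is sketched below. It is not GCC's definition, and the flag variables and `fold_addr_ne_known` are hypothetical stand-ins for the real folder's behaviour (refusing to assume distinct, non-null addresses while the flag is off).

```cpp
#include <utility>

// Set a variable for the lifetime of a scope; restore the old value on exit.
template<typename T>
class temp_override
{
public:
  temp_override(T& var, T value)
    : var_(var), saved_(std::exchange(var, std::move(value))) {}
  ~temp_override() { var_ = std::move(saved_); }
  temp_override(const temp_override&) = delete;
  temp_override& operator=(const temp_override&) = delete;

private:
  T& var_;
  T saved_;
};

// Deduce T from the variable only, so an int flag can be set from `true`.
template<typename T> struct no_deduce { using type = T; };

template<typename T>
temp_override<T> make_temp_override(T& var, typename no_deduce<T>::type value)
{
  return temp_override<T>(var, std::move(value));  // C++17 guaranteed elision
}

// Hypothetical globals; 0 models -fno-delete-null-pointer-checks.
bool folding_cxx_constexpr = false;
int flag_delete_null_pointer_checks = 0;

// Hypothetical folder: only folds &a != &b when the flag is set.
bool fold_addr_ne_known() { return flag_delete_null_pointer_checks != 0; }

// Shape of the patched code path: both flags are overridden for this call only.
bool eval_manifestly_constant()
{
  auto ofcc = make_temp_override(folding_cxx_constexpr, true);
  auto odnpc = make_temp_override(flag_delete_null_pointer_checks, true);
  return fold_addr_ne_known();
}
```

Scoping the override this narrowly means code outside manifestly constant evaluation still sees the user's `-fno-delete-null-pointer-checks` setting once the guards are destroyed.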