date:20250627

RE: [PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread Li, Pan2

> > +  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_SUB_FUNC(T), sat_add) \

> Shouldn't that be sat_sub here?

Oh, yes, it should be sat_sub; it just happened to work for the test. Let me
update it in v2.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Friday, June 27, 2025 2:37 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken ; Liu, Hongtao 
; Robin Dapp 
Subject: Re: [PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv 
combine case 0 with GR2VR cost 0, 2 and 15

Hi Pan,

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
> index 2932e189186..0af8b969f47 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
> @@ -282,9 +282,24 @@ DEF_SAT_U_ADD(uint16_t)
>  DEF_SAT_U_ADD(uint32_t)
>  DEF_SAT_U_ADD(uint64_t)
>  
> +#define DEF_SAT_U_SUB(T)   \
> +T  \
> +test_##T##_sat_sub (T a, T b)  \
> +{  \
> +  return (a - b) & (-(T)(a >= b)); \
> +}
> +
> +DEF_SAT_U_SUB(uint8_t)
> +DEF_SAT_U_SUB(uint16_t)
> +DEF_SAT_U_SUB(uint32_t)
> +DEF_SAT_U_SUB(uint64_t)
> +
>  #define SAT_U_ADD_FUNC(T) test_##T##_sat_add
>  #define SAT_U_ADD_FUNC_WRAP(T) SAT_U_ADD_FUNC(T)
>  
> +#define SAT_U_SUB_FUNC(T) test_##T##_sat_sub
> +#define SAT_U_SUB_FUNC_WRAP(T) SAT_U_SUB_FUNC(T)
> +
>  #define TEST_BINARY_VX_SIGNED_0(T)  \
>DEF_VX_BINARY_CASE_0_WRAP(T, +, add)  \
>DEF_VX_BINARY_CASE_0_WRAP(T, -, sub)  \
> @@ -313,6 +328,7 @@ DEF_SAT_U_ADD(uint64_t)
>DEF_VX_BINARY_CASE_2_WRAP(T, MAX_FUNC_1_WARP(T), max)\
>DEF_VX_BINARY_CASE_2_WRAP(T, MIN_FUNC_0_WARP(T), min)\
>DEF_VX_BINARY_CASE_2_WRAP(T, MIN_FUNC_1_WARP(T), min)\
> -  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_ADD_FUNC(T), sat_add)
> +  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_ADD_FUNC(T), sat_add) \
> +  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_SUB_FUNC(T), sat_add) \

Shouldn't that be sat_sub here?

-- 
Regards
 Robin
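
For readers following the thread, here is a minimal standalone sketch of the branchless unsigned saturating subtraction that DEF_SAT_U_SUB in the quoted hunk defines. The macro body is taken from the patch; the `static` qualifier and the choice of instantiated types are only for this illustration and are not part of the testsuite header.

```c
#include <stdint.h>

/* Same expression as DEF_SAT_U_SUB in the quoted patch.  When a >= b the
   mask -(T)1 is all ones and a - b passes through; when a < b the mask
   is 0, so the result saturates at 0 instead of wrapping around.  */
#define DEF_SAT_U_SUB(T)                 \
static T                                 \
test_##T##_sat_sub (T a, T b)            \
{                                        \
  return (a - b) & (-(T)(a >= b));       \
}

/* Instantiations chosen for this sketch only.  */
DEF_SAT_U_SUB(uint8_t)
DEF_SAT_U_SUB(uint32_t)
```

With these definitions, test_uint8_t_sat_sub (5, 7) yields 0 rather than the wrapped value 254, which is the behaviour vssubu.vv implements per element.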

Re: [PATCH][RFC] c/96570 - diagnostics for conversions to/from time_t

2025-06-27 Thread Richard Biener

On Thu, 26 Jun 2025, Joseph Myers wrote:

> On Thu, 26 Jun 2025, Richard Biener wrote:
> 
> > The following prototypes diagnostics for conversions to/from time_t
> > where the source/destination does not have sufficient precision for it.
> > I've lumped this into -Wconversion for the moment and didn't bother
> > fixing up the testcase for !ilp32 or the -Wconversion diagnostics that
> > happen.
> > 
> > Would -Wtime-conversion (or -Wtime_t-conversion?) be an appropriate
> > option?  I'd enable it with -Wconversion.
> 
> I think such a warning should be based on an attribute on the time_t type 
> that means "warn for implicit truncation of this type" (I'm less clear on 
> why warnings for implicit widening conversions *to* time_t are supposed to 
> be useful), rather than hardcoding it to be based on the time_t name.  
> It's hardly just time_t for which a warning about such implicit truncation 
> might be useful.

I agree it might be better to have a more general facility.  As for
the widening conversions to time_t, this is what the PR requested.
I guess it might catch Y2038 issues where truncations are not
visible, but I only have the artificial testcase from the PR.

Any suggestion for an attribute name?  no_implicit_truncation?
full_precision (when widenings should be diagnosed)?

> Such an attribute would of course be preserved by e.g. "typedef time_t 
> my_time_t;".  It would need composite type rules defined (probably the 
> composite type has the attribute if either of the two types does), and 
> rules for what happens to the attribute in integer promotions / usual 
> arithmetic conversions (I'm guessing that given "time_t x;", it's desired 
> to warn about truncation of x+1, for example, so the process of applying 
> usual arithmetic conversions to determine the type of x+1 should not have 
> lost the attribute; what's less clear is e.g. x+1LL if time_t is narrower 
> than long long).

Hmm.  Yes, I did think about following typedef chains.  It's of course
unhelpful when people do

 int64_t tem = <.. some time_t expr ..>;
 int time = tem;

so catching Y2038 problems solely by diagnosing conversions is
incomplete.  I was hoping that time_t + 1 gets us a time_t result
and not a 'long', but in the end that's the very same issue as
with an explicit 'long' temporary like above.  So maybe the
complexity with handling the cases you outline above can be
waived for an initial attempt (it's definitely over my C/C++ frontend fu).

Richard.
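
To make the int64_t example above concrete, here is a small sketch of why a diagnostic keyed only on time_t misses the truncation once the value has passed through a same-width temporary. The helper name is hypothetical, and int64_t/int32_t stand in for a 64-bit time_t and a 32-bit int; the out-of-range narrowing is implementation-defined, and GCC wraps modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if narrowing T to 32 bits changes its
   value.  The first copy is same-width, so nothing could be diagnosed
   there; the narrowing then happens on a plain int64_t that no longer
   carries any "this was a time_t" information.  */
static int
narrowing_loses_value (int64_t t)
{
  int64_t tem = t;                   /* int64_t tem = <time_t expr>;  */
  int32_t narrowed = (int32_t) tem;  /* int time = tem;  */
  return (int64_t) narrowed != tem;
}
```

The last second representable in 32 bits, 2147483647, survives the round trip, while 2147483648 (the first second past the Y2038 limit) does not.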

Re: [PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 08:17:46AM +0200, Tobias Burnus wrote:
> Hi Yuao,
> 
> 
> Yuao Ma wrote:
> > >//but the testcases don't seem to be conditionalized on this. Would the
> > >//new tests fail if gcc is built against an insufficiently recent version
> > >//of mpfr,
> …
> > The test case is indeed conditionalized, though in a different manner
> > than you
> > might expect. The condition depends on the version of MPFR we're using,
> > and
> > unfortunately, I haven't found a predefined macro that indicates which
> > MPFR
> > version GCC is linked against.
> 
> I think there is a detour way: The 'print_version' function (toplev.cc)
> prints the MPFR version, but only when not printing to stderr.
> 
> 
> Thus, I get the desired output with:
> 
> 
> gcc -S -fverbose-asm -o - -x c - < /dev/null
> 
> 
> [I think the /dev/null is not quite portable; possibly a pipe ("echo |...")
> or an empty file is more portable.]
> 
> 
> The output contains here "... MPFR version 4.2.2 ..."

Though, parsing MPFR version in tcl and determining what is later than 4.2.2
might be difficult.

I think the __builtin_constant_p(acospi(0.5)) approach is usable, but would
be much better done on the lib/target-supports.exp side.
So, have foldable_pi_based_trigonometry effective target, which would test
if __builtin_constant_p(acospi(0.5)) is 1.

The advantage of doing it that way is that it is visible in the test log
files whether it is supported (then the tests can PASS or perhaps FAIL) or
not (then the test will be UNSUPPORTED).

Jakub
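
As an illustration of the probe such an effective-target check could compile, here is a minimal sketch. It uses __builtin_sqrt (4.0) as a stand-in, because acospi is only available with sufficiently recent compilers and C libraries; the real check would use __builtin_constant_p (acospi (0.5)) as described above. The function name is made up for this example.

```c
/* Sketch only: reports whether the compiler folded the call to a
   constant.  __builtin_constant_p does not evaluate its argument, so
   nothing is called at run time.  The result depends on the compiler
   and on the optimization level.  */
static int
sqrt_is_folded (void)
{
  return __builtin_constant_p (__builtin_sqrt (4.0));
}
```

An effective-target test built on this idea would turn a result of 0 into UNSUPPORTED for the dependent tests, which is what makes the outcome visible in the log files.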

[PATCH][committed][docs]: fix a typo in used attribute documentation

2025-06-27 Thread Tamar Christina

This fixes a small typo in the Label attributes docs.

Committed as obvious.

Thanks,
Tamar

gcc/ChangeLog:

* doc/extend.texi: Fix typo in unused attribute docs.

---
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 69c6512074642ece47f1f9a3d7bdde20ec800d40..6e80ef8a2055c20159738fb0e1b5ca6ad699955a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9177,7 +9177,7 @@ NoError:
 @item unused
 This feature is intended for program-generated code that may contain 
 unused labels, but which is compiled with @option{-Wall}.  It is
-not normally appropriate to use in it human-written code, though it
+not normally appropriate to use it in human-written code, though it
 could be useful in cases where the code that jumps to the label is
 contained within an @code{#ifdef} conditional.
 



Re: [PATCH] vect: Misalign checks for gather/scatter.

2025-06-27 Thread Richard Biener

On Thu, 26 Jun 2025, Robin Dapp wrote:

> > +  bool is_misaligned = scalar_align < inner_vectype_sz;
> > +  bool is_packed = scalar_align > 1 && is_misaligned;
> > +
> > +  *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align;
> > +
> > +  if (targetm.vectorize.support_vector_misalignment
> > + (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))
> >
> > the misalignment argument is meaningless, I think you want to
> > pass DR_MISALIGNMENT_UNKNOWN for this and just pass is_packed
> > if the scalar accesses are not at least size aligned.
> 
> At least aarch64's (and loongarch's) support_vector_misalignment gives up
> right away if misalignment == -1 (before checking for !is_packed)
> and would thus get dr_unaligned_unsupported in case of strict alignment.
> 
> I used the same logic for riscv which made a proper value in *misalignment
> necessary.
> 
> We only have one other invocation of support_vector_misalignment in
> tree-vect-data-refs which only sets packed if DR_MISALIGNMENT_UNKNOWN.
> So ISTM that
> if (!is_packed)
>   return true;
> should always be done before acting on DR_MISALIGNMENT_UNKNOWN?
> 
> Or can there be instances where is_packed == false && DR_MISALIGNMENT_UNKNOWN
> and we don't support the misalignment?  Like if the target requires
> vector-sized alignment?

Yes, powerpc is a case like this.

> So my current plan would be to adjust the riscv hook to always support
> misalignment if !is_packed regardless of DR_MISALIGNMENT_UNKNOWN and do the
> same for aarch64, loongarch?

Maybe we can pass a scalar mode to the hook when we ask for
SCATTER/GATHER?  That might need fixups in other targets of course,
but it would make it clear what we're asking for?

Richard.

> I'll also change the hook docs to something like
> 
> diff --git a/gcc/target.def b/gcc/target.def
> index 38903eb567a..94ccf86233c 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -1926,7 +1926,8 @@ DEFHOOK
> store/load of a specific factor denoted in the @var{misalignment}\n\
> parameter.  The vector store/load should be of machine mode @var{mode} and\n\
> the elements in the vectors should be of type @var{type}.  @var{is_packed}\n\
> -parameter is true if the memory access is defined in a packed struct.",
> +parameter is true if the misalignment is unknown and the memory access is\n\
> +defined in a packed struct."
> 
> 
> > Note the hook really doesn't know whether you ask it for gather/scatter
> > or a contiguous vector load so I wonder whether the above fits
> > constraints on other platforms where scalar accesses might be
> > allowed to be packed but all unaligned vector accesses would need
> > to be element aligned?
> 
> We actually can have all four combinations of scalar and vector misalignment
> support on riscv :/
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
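
For reference, the classification computed by the hunk quoted at the top of this message can be sketched in isolation as follows. The struct and function names are hypothetical; the three expressions are copied from the quoted code, and both arguments are assumed to be in the same unit. This only mirrors the logic under review, it is not the agreed-upon fix.

```c
#include <stdbool.h>

/* Hypothetical container for the three values the quoted hunk derives.  */
struct gs_align
{
  bool is_misaligned;
  bool is_packed;
  int misalignment;
};

/* Mirrors the quoted computation: a scalar access counts as misaligned
   when its alignment is below the element size, and as packed when it
   is misaligned but still aligned to more than 1.  */
static struct gs_align
classify_gs_access (unsigned scalar_align, unsigned inner_vectype_sz)
{
  struct gs_align r;
  r.is_misaligned = scalar_align < inner_vectype_sz;
  r.is_packed = scalar_align > 1 && r.is_misaligned;
  r.misalignment
    = !r.is_misaligned ? 0 : (int) (inner_vectype_sz - scalar_align);
  return r;
}
```

Note that an access with scalar_align == 1 is misaligned but not packed under this definition, which is one of the cases the discussion about DR_MISALIGNMENT_UNKNOWN has to cover.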

Re: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644]

2025-06-27 Thread Nathaniel Shead

On Wed, Jun 25, 2025 at 11:52:14AM -0400, Jason Merrill wrote:
> On 6/25/25 9:02 AM, Nathaniel Shead wrote:
> > On Tue, Jun 24, 2025 at 12:10:09PM -0400, Patrick Palka wrote:
> > > On Tue, 24 Jun 2025, Jason Merrill wrote:
> > > 
> > > > On 6/23/25 5:41 PM, Nathaniel Shead wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > We were erroring because the TEMPLATE_DECL of the existing partial
> > > > > specialisation has an undeduced return type, but the imported
> > > > > declaration did not.
> > > > > 
> > > > > The root cause is similar to what was fixed in 
> > > > > r13-2744-g4fac53d6522189,
> > > > > where modules streaming code assumes that a TEMPLATE_DECL and its
> > > > > DECL_TEMPLATE_RESULT will always have the same TREE_TYPE.  That commit
> > > > > fixed the issue by ensuring that when the type of a variable is 
> > > > > deduced
> > > > > the TEMPLATE_DECL is updated as well, but this missed handling partial
> > > > > specialisations.
> > > > > 
> > > > > However, I don't think we actually care about that, since it seems 
> > > > > that
> > > > > only the type of the inner decl actually matters in practice.  
> > > > > Instead,
> > > > > this patch handles the issue on the modules side when deduping a
> > > > > streamed decl, by only comparing the inner type.
> > > > > 
> > > > >   PR c++/120644
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   * decl.cc (cp_finish_decl): Remove workaround.
> > > > 
> > > > Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL 
> > > > correct,
> > > > maybe we should always set it to NULL_TREE to make sure we only look at 
> > > > the
> > > > inner type.
> > > 
> > > FWIW cp_finish_decl can get at the TEMPLATE_DECL of a VAR_DECL
> > > corresponding to a partial specialization via
> > > 
> > >   TI_TEMPLATE (TI_PARTIAL_INFO (DECL_TEMPLATE_INFO (decl)))
> > > 
> > > if we do want to end up keeping the two TREE_TYPEs in sync.
> > > 
> > 
> > Thanks.  On further reflection, maybe the safest approach is to just
> > ensure that the types are always consistent (including for partial
> > specs); this is what the following patch does.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is
> >   consistent with its template [PR120644]
> > 
> > We were erroring because the TEMPLATE_DECL of the existing partial
> > specialisation has an undeduced return type, but the imported
> > declaration did not.
> > 
> > The root cause is similar to what was fixed in r13-2744-g4fac53d6522189,
> > where modules streaming code assumes that a TEMPLATE_DECL and its
> > DECL_TEMPLATE_RESULT will always have the same TREE_TYPE.  That commit
> > fixed the issue by ensuring that when the type of a variable is deduced
> > the TEMPLATE_DECL is updated as well, but missed handling partial
> > specialisations.  This patch ensures that the same adjustment is made
> > there as well.
> > 
> > PR c++/120644
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl.cc (cp_finish_decl): Also propagate type to partial
> > templates.
> > * module.cc (trees_out::decl_value): Add assertion that the
> > TREE_TYPE of a streamed template decl matches its inner.
> > (trees_in::is_matching_decl): Clarify function return type
> > deduction should only occur for non-TEMPLATE_DECL.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/auto-7.h: New test.
> > * g++.dg/modules/auto-7_a.H: New test.
> > * g++.dg/modules/auto-7_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > Reviewed-by: Jason Merrill 
> > Reviewed-by: Patrick Palka 
> > ---
> >   gcc/cp/decl.cc  | 13 +
> >   gcc/cp/module.cc|  7 ++-
> >   gcc/testsuite/g++.dg/modules/auto-7.h   | 12 
> >   gcc/testsuite/g++.dg/modules/auto-7_a.H |  5 +
> >   gcc/testsuite/g++.dg/modules/auto-7_b.C |  5 +
> >   5 files changed, 37 insertions(+), 5 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h
> >   create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H
> >   create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C
> > 
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 4fe97ffbf8f..59701197e16 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -8923,10 +8923,15 @@ cp_finish_decl (tree decl, tree init, bool 
> > init_const_expr_p,
> > cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
> > /* Update the type of the corresponding TEMPLATE_DECL to match.  */
> > -  if (DECL_LANG_SPECIFIC (decl)
> > - && DECL_TEMPLATE_INFO (decl)
> > - && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl)
> > -   TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type;
> > +  if (DECL_LANG_SPECIFIC (decl) && DECL_TEMPLATE_INFO (decl))
> > +   {
> >

[PATCH v3 3/6] bitint: Allow unused bits when testing extended _BitInt ABIs

2025-06-27 Thread Yang Yujie

In LoongArch psABI, large _BitInt(N) (N > 64) objects are only
extended to fill the highest 8-byte chunk that contains any used bit,
but the size of such a large _BitInt type is a multiple of their
16-byte alignment.  So there may be an entire unused 8-byte
chunk that is not filled by extension, and this chunk shouldn't be
checked when testing if the object is properly extended.

The original bitintext.h assumed that all bits within
sizeof(_BitInt(N)) beyond used bits are filled by extension.
This patch changes that for LoongArch and possibly
any future ports with a similar behavior.

P.S. For encoding this test as well as type-generic programming,
it would be nice to have a builtin function to obtain "N" at
compile time from _BitInt(N)-typed expressions.  But here
we stick to existing ones (__builtin_clrsbg / __builtin_clzg).

gcc/testsuite/ChangeLog:

* gcc.dg/bitintext.h: Generalize BEXTC to only check extension
within PROMOTED_SIZE bits.
---
 gcc/testsuite/gcc.dg/bitintext.h | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/bitintext.h b/gcc/testsuite/gcc.dg/bitintext.h
index 99fedb32a9a..522b96ed715 100644
--- a/gcc/testsuite/gcc.dg/bitintext.h
+++ b/gcc/testsuite/gcc.dg/bitintext.h
@@ -4,6 +4,24 @@ do_copy (void *p, const void *q, __SIZE_TYPE__ r)
   __builtin_memcpy (p, q, r);
 }
 
+/* Obtain the value of N from a _BitInt(N)-typed expression X
+   at compile time.  */
+#define S(x) \
+  ((typeof (x)) -1 < 0   \
+   ? __builtin_clrsbg (__builtin_choose_expr ((typeof (x)) -1 < 0,   \
+ (typeof (x)) -1, -1)) + 1  \
+   : __builtin_popcountg (__builtin_choose_expr ((typeof (x)) -1 < 0,\
+0U, (typeof (x)) -1)))
+ 
+#define CEIL(x,y) (((x) + (y) - 1) / (y))
+
+/* Promote a _BitInt type to include its padding bits.  */
+#if defined (__s390x__) || defined(__arm__)
+#define PROMOTED_SIZE(x) sizeof (x)
+#elif defined(__loongarch__)
+#define PROMOTED_SIZE(x) (sizeof (x) > 8 ? CEIL (S (x), 64) * 8 : sizeof (x))
+#endif
+
 /* Macro to test whether (on targets where psABI requires it) _BitInt
with padding bits have those filled with sign or zero extension.  */
 #if defined(__s390x__) || defined(__arm__) || defined(__loongarch__)
@@ -11,14 +29,14 @@ do_copy (void *p, const void *q, __SIZE_TYPE__ r)
   do { \
 if ((typeof (x)) -1 < 0)   \
   {\
-   _BitInt(sizeof (x) * __CHAR_BIT__) __x; \
+   _BitInt(PROMOTED_SIZE (x) * __CHAR_BIT__) __x;  \
do_copy (&__x, &(x), sizeof (__x)); \
if (__x != (x)) \
  __builtin_abort ();   \
   }\
 else   \
   {\
-   unsigned _BitInt(sizeof (x) * __CHAR_BIT__) __x;\
+   unsigned _BitInt(PROMOTED_SIZE (x) * __CHAR_BIT__) __x; \
do_copy (&__x, &(x), sizeof (__x)); \
if (__x != (x)) \
  __builtin_abort ();   \
-- 
2.46.0
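
To illustrate the LoongArch branch of PROMOTED_SIZE from the patch above without relying on _BitInt support in the compiler, here is a small sketch that takes sizeof (x) and N as plain numbers. The function name is hypothetical; CEIL and the formula are taken from the patch.

```c
#include <stddef.h>

#define CEIL(x,y) (((x) + (y) - 1) / (y))

/* Number of bytes that BEXTC should check for a _BitInt(N) object whose
   sizeof is SIZE, following the __loongarch__ definition of
   PROMOTED_SIZE: only the 8-byte chunks containing used bits count.  */
static size_t
loongarch_promoted_size (size_t size, unsigned n)
{
  return size > 8 ? CEIL (n, 64) * 8 : size;
}
```

For _BitInt(129) the object occupies 32 bytes, but only the first 24 are checked; the top 8-byte chunk is entirely unused and left undefined by the psABI.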

[PATCH v3 5/6] LoongArch: Prioritize target-specific makefile fragments

2025-06-27 Thread Yang Yujie

libgcc/ChangeLog:

* config.host: Remove unused code. Include LoongArch-specific
tmake_files after the OS-specific ones.
---
 libgcc/config.host | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/libgcc/config.host b/libgcc/config.host
index d36f0e34a3b..32e73c93aec 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -141,19 +141,6 @@ lm32*-*-*)
;;
 loongarch*-*)
cpu_type=loongarch
-   tmake_file="loongarch/t-loongarch"
-   if test "${libgcc_cv_loongarch_hard_float}" = yes; then
-   tmake_file="${tmake_file} t-hardfp-sfdf t-hardfp"
-   else
-   tmake_file="${tmake_file} t-softfp-sfdf"
-   fi
-   if test "${ac_cv_sizeof_long_double}" = 16; then
-   tmake_file="${tmake_file} loongarch/t-softfp-tf"
-   fi
-   if test "${host_address}" = 64; then
-   tmake_file="${tmake_file} loongarch/t-loongarch64"
-   fi
-   tmake_file="${tmake_file} t-softfp"
;;
 m32r*-*-*)
 cpu_type=m32r
@@ -1003,16 +990,22 @@ lm32-*-uclinux*)
;;
 loongarch*-linux*)
extra_parts="$extra_parts crtfastmath.o"
-   tmake_file="${tmake_file} t-crtfm loongarch/t-crtstuff"
-   case ${host} in
- *)
-   tmake_file="${tmake_file} t-slibgcc-libgcc"
-   ;;
-   esac
md_unwind_header=loongarch/linux-unwind.h
+   tmake_file="${tmake_file} loongarch/t-loongarch t-softfp-sfdf loongarch/t-softfp-tf"
+   if test "${host_address}" = 64; then
+   tmake_file="${tmake_file} loongarch/t-loongarch64"
+   fi
+   tmake_file="${tmake_file} t-softfp"
+   tmake_file="${tmake_file} t-crtfm loongarch/t-crtstuff"
+   tmake_file="${tmake_file} t-slibgcc-libgcc"
;;
 loongarch*-elf*)
extra_parts="$extra_parts crtfastmath.o"
+   tmake_file="${tmake_file} loongarch/t-loongarch t-softfp-sfdf loongarch/t-softfp-tf"
+   if test "${host_address}" = 64; then
+   tmake_file="${tmake_file} loongarch/t-loongarch64"
+   fi
+   tmake_file="${tmake_file} t-softfp"
tmake_file="${tmake_file} t-crtfm loongarch/t-crtstuff"
tmake_file="${tmake_file} t-slibgcc-libgcc"
;;
-- 
2.46.0

[PATCH v3 1/6] bitint: Allow mode promotion of _BitInt types

2025-06-27 Thread Yang Yujie

For targets that treat small _BitInts like the fundamental
integral types, we should allow their machine modes to be promoted
in the same way.

gcc/ChangeLog:

* explow.cc (promote_function_mode): Add a case for
small/medium _BitInts.
(promote_mode): Same.
---
 gcc/explow.cc | 24 
 1 file changed, 24 insertions(+)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index 7799a98053b..8f8ca7f011e 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -854,6 +854,18 @@ promote_function_mode (const_tree type, machine_mode mode, int *punsignedp,
 
   switch (TREE_CODE (type))
 {
+case BITINT_TYPE:
+  if (TYPE_MODE (type) == BLKmode)
+   return mode;
+
+  struct bitint_info info;
+  bool ok;
+  ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
+  gcc_assert (ok);
+
+  if (!info.extended)
+   return mode;
+  /* FALLTHRU */
 case INTEGER_TYPE:   case ENUMERAL_TYPE:   case BOOLEAN_TYPE:
 case REAL_TYPE:  case OFFSET_TYPE: case FIXED_POINT_TYPE:
 case POINTER_TYPE:   case REFERENCE_TYPE:
@@ -893,6 +905,18 @@ promote_mode (const_tree type ATTRIBUTE_UNUSED, machine_mode mode,
 
   switch (code)
 {
+case BITINT_TYPE:
+  if (TYPE_MODE (type) == BLKmode)
+   return mode;
+
+  struct bitint_info info;
+  bool ok;
+  ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
+  gcc_assert (ok);
+
+  if (!info.extended)
+   return mode;
+  /* FALLTHRU */
 case INTEGER_TYPE:   case ENUMERAL_TYPE:   case BOOLEAN_TYPE:
 case REAL_TYPE:  case OFFSET_TYPE: case FIXED_POINT_TYPE:
   /* Values of these types always have scalar mode.  */
-- 
2.46.0

[PATCH v3 4/6] bitint: Do not optimize away conversion to _BitInt before a VCE

2025-06-27 Thread Yang Yujie

A _BitInt value may rely on a conversion to become properly extended.
So a conversion to _BitInt is not trivially removable even if the
types of the result and the operand have the same precision and size.

This patch fixes gcc.dg/torture/bitint-64.c at -O2 on LoongArch,
which fails because extension of the result is dropped in a
compare-and-swap loop generated for incrementing an _Atomic _BitInt,
causing an ABI violation.

gcc/ChangeLog:

* match.pd: Preserve conversion to _BitInt before a VCE
if the _BitInt is extended.
---
 gcc/match.pd | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index f4416d9172c..1df52155a05 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5420,16 +5420,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(convert @0)))
 
 /* Strip inner integral conversions that do not change precision or size, or
-   zero-extend while keeping the same size (for bool-to-char).  */
+   zero-extend while keeping the same size (for bool-to-char).
+   However, keep this conversion if the result is an extended _BitInt,
+   since it may rely on this conversion to extend properly.  */
+
 (simplify
   (view_convert (convert@0 @1))
+  (with {
+bool extended_bitint = false;
+if (BITINT_TYPE_P (TREE_TYPE (@0)))
+  {
+   struct bitint_info info;
+   extended_bitint
+ = targetm.c.bitint_type_info (TYPE_PRECISION (TREE_TYPE (@0)),
+   &info);
+   extended_bitint = extended_bitint && info.extended;
+  }
+   }
   (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
&& (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
+   && !extended_bitint
&& TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
&& (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
   && TYPE_UNSIGNED (TREE_TYPE (@1)
-   (view_convert @1)))
+   (view_convert @1
 
 /* Simplify a view-converted empty or single-element constructor.  */
 (simplify
-- 
2.46.0

[PATCH v3 6/6] LoongArch: Add support for _BitInt [PR117599]

2025-06-27 Thread Yang Yujie

This patch adds support for C23's _BitInt for LoongArch.

From the LoongArch psABI[1]:

> _BitInt(N) objects are stored in little-endian order in memory
> and are signed by default.
>
> For N ≤ 64, a _BitInt(N) object have the same size and alignment
> of the smallest fundamental integral type that can contain it.
> The unused high-order bits within this containing type are filled
> with sign or zero extension of the N-bit value, depending on whether
> the _BitInt(N) object is signed or unsigned. The _BitInt(N) object
> propagates its signedness to the containing type and is laid out
> in a register or memory as an object of this type.
>
> For N > 64, _BitInt(N) objects are implemented as structs of 64-bit
> integer chunks. The number of chunks is the smallest even integer M
> so that M * 64 ≥ N. These objects are of the same size of the struct
> containing the chunks, but always have 16-byte alignment. If there
> are unused bits in the highest-ordered chunk that contains used
> bits, they are defined as the sign- or zero- extension of the used
> bits depending on whether the _BitInt(N) object is signed or
> unsigned. If an entire chunk is unused, its bits are undefined.

[1] https://github.com/loongson/la-abi-specs
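
As a worked illustration of the size and alignment rules quoted above, here is a minimal sketch. The struct and function names are hypothetical, and the widths of the fundamental integral types (8, 16, 32 and 64 bits) are an LP64 assumption that the quoted text does not spell out.

```c
/* Hypothetical result type: size and alignment in bytes.  */
struct bitint_layout
{
  unsigned size;
  unsigned align;
};

/* Size and alignment of _BitInt(N) following the quoted psABI text.  */
static struct bitint_layout
loongarch_bitint_layout (unsigned n)
{
  struct bitint_layout l;
  if (n <= 64)
    {
      /* Smallest fundamental integral type that can contain N bits
	 (assumed widths: 8, 16, 32, 64).  */
      unsigned bytes = n <= 8 ? 1 : n <= 16 ? 2 : n <= 32 ? 4 : 8;
      l.size = bytes;
      l.align = bytes;
    }
  else
    {
      /* Smallest even number M of 64-bit chunks with M * 64 >= N.  */
      unsigned chunks = (n + 63) / 64;
      chunks += chunks & 1;
      l.size = chunks * 8;
      l.align = 16;
    }
  return l;
}
```

For example, _BitInt(129) needs three chunks for its used bits, is rounded up to four, and so has size 32 with 16-byte alignment.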

PR target/117599

gcc/ChangeLog:

* config/loongarch/loongarch.h: Define a PROMOTE_MODE case for
small _BitInts.
* config/loongarch/loongarch.cc (loongarch_promote_function_mode):
Same.
(loongarch_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Declare.

libgcc/ChangeLog:

* config/loongarch/t-softfp-tf: Enable _BitInt helper functions.
* config/loongarch/t-loongarch: Same.
* config/loongarch/libgcc-loongarch.ver: New file.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bitint-alignments.c: New test.
* gcc.target/loongarch/bitint-args.c: New test.
* gcc.target/loongarch/bitint-sizes.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 35 +++-
 gcc/config/loongarch/loongarch.h  |  4 +-
 .../gcc.target/loongarch/bitint-alignments.c  | 58 +
 .../gcc.target/loongarch/bitint-args.c| 81 +++
 .../gcc.target/loongarch/bitint-sizes.c   | 60 ++
 libgcc/config/loongarch/libgcc-loongarch.ver  | 26 ++
 libgcc/config/loongarch/t-loongarch   |  2 +
 libgcc/config/loongarch/t-softfp-tf   |  1 +
 8 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitint-alignments.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitint-args.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitint-sizes.c
 create mode 100644 libgcc/config/loongarch/libgcc-loongarch.ver

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index f62e4163c71..b1571f98378 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10850,9 +10850,9 @@ loongarch_expand_vec_cmp (rtx operands[])
to a fixed type.  */
 
 static machine_mode
-loongarch_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
+loongarch_promote_function_mode (const_tree type,
 machine_mode mode,
-int *punsignedp ATTRIBUTE_UNUSED,
+int *punsignedp,
 const_tree fntype ATTRIBUTE_UNUSED,
 int for_return ATTRIBUTE_UNUSED)
 {
@@ -11214,6 +11214,34 @@ loongarch_c_mode_for_suffix (char suffix)
   return VOIDmode;
 }
 
+/* Implement TARGET_C_BITINT_TYPE_INFO.
+   Return true if _BitInt(N) is supported and fill its details into *INFO.  */
+bool
+loongarch_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (n <= 8)
+info->limb_mode = QImode;
+  else if (n <= 16)
+info->limb_mode = HImode;
+  else if (n <= 32)
+info->limb_mode = SImode;
+  else if (n <= 64)
+info->limb_mode = DImode;
+  else if (n <= 128)
+info->limb_mode = TImode;
+  else
+info->limb_mode = DImode;
+
+  info->abi_limb_mode = info->limb_mode;
+
+  if (n > 64)
+info->abi_limb_mode = TImode;
+
+  info->big_endian = false;
+  info->extended = true;
+  return true;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -11488,6 +11516,9 @@ loongarch_c_mode_for_suffix (char suffix)
 #undef TARGET_C_MODE_FOR_SUFFIX
 #define TARGET_C_MODE_FOR_SUFFIX loongarch_c_mode_for_suffix
 
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO loongarch_bitint_type_info
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-loongarch.h"
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index d8977634b71..73372df838e 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -270,7 +270,9 @@ along with GCC; see the file COPYING3.

[Patch] Fortran/OpenACC: Permit PARAMETER as 'var' in clauses (+ ignore)

2025-06-27 Thread Tobias Burnus


Background: In real-world code, one can find:
  !$ACC DECLARE COPYIN(c1es, c2es, ...)
as here for the ICON weather model. This clearly implies that other
compilers accept and, potentially, require those. For better
compatibility with real-world use, the just released OpenACC 3.4 now
permits PARAMETER but permits compilers to ignore those (remove them
when doing optimizations).


Thus, this patch now permits named constants (PARAMETER) as 'var'
in OpenACC [with an off-by-default warning in all but one case
(device_resident, no warning)] but then ignores them later.


If you look at the following patch, I think the following is worth pondering:

* Will skipping over PARAMETERs (named constants) in trans-openmp.cc
  clause handling break some unrelated OpenACC or OpenMP code?
  (In principle, resolving an expression should remove the parameter,
  replacing it by its value. And the called trans-openmp.cc functions
  also should only deal with non-expressions.)

* Does this handle all cases for OpenACC (or did I miss one?)
  Does it handle too much for OpenACC (or OpenMP?)

* Do you think the warning handling is fine/consistent?

I think the patch should be fine, but, of course, I might have missed
something.


Comments, remarks, suggestions about this patch?

Tobias
Fortran/OpenACC: Permit PARAMETER as 'var' in clauses (+ ignore)

It turned out that other compilers permit (require?) named constants
to appear in clauses - and programs actually use this. OpenACC 3.4
added therefore the following:
  In this spec, a _var_ (in italics) is one of the following:
  ...
  * a named constant in Fortran.
plus
  If during an optimization phase _var_ is removed by the compiler,
  appearances of var in data clauses are ignored.

Thus, all errors related to PARAMETER are now downgraded, most
to a -Wsurprising warning, but for 'acc declare device_resident'
(which kind of makes sense), no warning is printed.

In trans-openmp.cc, those are ignored, unless I missed some code
path. (If so, I hope the middle end removes them; but before
removing them for the covered cases, the program just compiled &
linked fine.)

Note that 'ignore PARAMETER inside clauses' in trans-openmp.cc
would in principle also apply to expressions ('if (var)') but
those should be evaluated during 'resolve.cc' + 'openmp.cc' to
their (numeric, logical, string) value such that there should
be no issue.

gcc/fortran/ChangeLog:

	* invoke.texi (-Wsurprising): Note about OpenACC warning
	related to PARAMETER.
	* openmp.cc (resolve_omp_clauses, gfc_resolve_oacc_declare):
	Accept PARAMETER for OpenACC but add surprising warning.
	* trans-openmp.cc (gfc_trans_omp_variable_list,
	gfc_trans_omp_clauses): Ignore PARAMETER inside clauses.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/parameter.f95: Add -Wsurprising flag
	and update expected diagnostic.
	* gfortran.dg/goacc/parameter-3.f90: New test.
	* gfortran.dg/goacc/parameter-4.f90: New test.

 gcc/fortran/invoke.texi |  4 
 gcc/fortran/openmp.cc   | 30 -
 gcc/fortran/trans-openmp.cc | 13 ---
 gcc/testsuite/gfortran.dg/goacc/parameter-3.f90 | 16 +
 gcc/testsuite/gfortran.dg/goacc/parameter-4.f90 | 26 +
 gcc/testsuite/gfortran.dg/goacc/parameter.f95   | 27 +++---
 6 files changed, 94 insertions(+), 22 deletions(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index da085d124f9..0b893e876a5 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1170,6 +1170,10 @@ A @code{CHARACTER} variable is declared with negative length.
 With @option{-fopenmp}, for fixed-form source code, when an @code{omx}
 vendor-extension sentinel is encountered. (The equivalent @code{ompx},
 used in free-form source code, is diagnosed by default.)
+
+@item
+With @option{-fopenacc}, when using named constants with clauses that
+take a variable as doing so has no effect.
 @end itemize
 
 @opindex Wtabs
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index fe0a47a6948..f1acc00f561 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8895,15 +8895,21 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	if (list == OMP_LIST_MAP
 	&& n->sym->attr.flavor == FL_PARAMETER)
 	  {
+	/* OpenACC since 3.4 permits for Fortran named constants, but
+	   permits removing then as optimization is not needed and such
+	   ignore them. Likewise below for FIRSTPRIVATE.  */
 	if (openacc)
-	  gfc_error ("Object %qs is not a variable at %L; parameters"
-			 " cannot be and need not be copied", n->sym->name,
-			 &n->where);
+	  gfc_warning (OPT_Wsurprising, "Clause for object %qs at %L is "
+			   "ignored as parameters need not be copied",
+			   n->sym->name, &n->where);
 	else
 	  gfc_error ("Object %qs is not a variable at %L; parameters"
 			 " cannot be and need not be mapped", n->sym->name,

Re: [PATCH] Fix misoptimization of CONSTRUCTOR with reverse SSO

2025-06-27 Thread Richard Biener

On Thu, Jun 26, 2025 at 12:34 PM Eric Botcazou  wrote:
>
> Hi,
>
> fold_ctor_reference already punts on a CONSTRUCTOR whose type has reverse
> storage order, but it can be invoked in a couple of places on a CONSTRUCTOR
> with native storage order that has been wrapped in a VIEW_CONVERT_EXPR to a
> type with reverse storage order; this would require a post adjustment that
> does not currently exist, thus yield wrong code for this admittedly quite
> pathological (but supported) case.
>
> Technically, this is a regression in GCC 10.x and later but, being quite
> pathological, at least in Ada, I don't think that we need to bother about it
> on earlier branches than gcc-13.
>
> Tested on x86-64/Linux, OK for mainline down to the gcc-13 branch?

OK.

Thanks,
Richard.

>
> 2025-06-26  Eric Botcazou  
>
> * gimple-fold.cc (fold_const_aggregate_ref_1) :
> Bail out immediately if the reference has reverse storage order.
> * tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Likewise.
>
>
> 2025-06-26  Eric Botcazou  
>
> * gnat.dg/sso20.adb: New test.
>
> --
> Eric Botcazou

[PATCH v3 2/6] expand: Reduce unneeded _BitInt extensions

2025-06-27 Thread Yang Yujie

For targets that set the "extended" flag in TARGET_C_BITINT_TYPE_INFO,
we assume small _BitInts to be internally extended after arithmetic
operations. In this case, an extra extension during RTL expansion
can be avoided.

gcc/ChangeLog:

* expr.cc (expand_expr_real_1): Do not call
reduce_to_bit_field_precision if the target assumes the _BitInt
results to be already extended.
(EXTEND_BITINT): Same.
* expr.h (bitint_extended): Declare the cache variable.
* function.cc (prepare_function_start): Initialize it.
---
 gcc/expr.cc | 12 
 gcc/expr.h  |  4 
 gcc/function.cc |  4 
 3 files changed, 20 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ac4fdfaa218..97d833a33a6 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -76,6 +76,10 @@ along with GCC; see the file COPYING3.  If not see
the same indirect address eventually.  */
 int cse_not_expected;
 
+/* Cache of the "extended" flag in the target's _BitInt description
+   for use during expand.  */
+int bitint_extended;
+
 static bool block_move_libcall_safe_for_call_parm (void);
 static bool emit_block_move_via_pattern (rtx, rtx, rtx, unsigned, unsigned,
 HOST_WIDE_INT, unsigned HOST_WIDE_INT,
@@ -11280,6 +11284,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
  when reading from SSA_NAMEs of vars.  */
 #define EXTEND_BITINT(expr) \
   ((TREE_CODE (type) == BITINT_TYPE\
+&& !bitint_extended \
 && reduce_bit_field \
 && mode != BLKmode \
 && modifier != EXPAND_MEMORY   \
@@ -11291,6 +11296,13 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
   type = TREE_TYPE (exp);
   mode = TYPE_MODE (type);
   unsignedp = TYPE_UNSIGNED (type);
+  if (bitint_extended == -1 && TREE_CODE (type) == BITINT_TYPE)
+{
+  struct bitint_info info;
+  bool ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
+  gcc_assert (ok);
+  bitint_extended = info.extended;
+}
 
   treeop0 = treeop1 = treeop2 = NULL_TREE;
   if (!VL_EXP_CLASS_P (exp))
diff --git a/gcc/expr.h b/gcc/expr.h
index 53ab625787e..060151df010 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -388,4 +388,8 @@ extern void expand_crc_table_based (rtx, rtx, rtx, rtx, machine_mode);
 extern void expand_reversed_crc_table_based (rtx, rtx, rtx, rtx, machine_mode,
 void (*) (rtx *));
 
+/* Cache of the "extended" flag in the target's _BitInt description
+   for use during expand.  */
+extern int bitint_extended;
+
 #endif /* GCC_EXPR_H */
diff --git a/gcc/function.cc b/gcc/function.cc
index 48167b0c207..502135c6f58 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -4965,6 +4965,10 @@ prepare_function_start (void)
 
   /* Indicate we have no need of a frame pointer yet.  */
   frame_pointer_needed = 0;
+
+  /* Reset the cache of the "extended" flag in the target's
+ _BitInt info struct.  */
+  bitint_extended = -1;
 }
 
 void
-- 
2.46.0
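
The caching scheme in this patch (a tri-state flag that is reset to -1 at the start of each function and filled in on first use) can be illustrated with a small standalone sketch. This is not GCC code: `target_bitint_is_extended`, `hook_calls`, `start_function` and `bitint_extended_p` are hypothetical names that only stand in for the target hook query, the reset in prepare_function_start and the check in expand_expr_real_1.

```cpp
// Hypothetical stand-in for the target hook; it counts how often it is
// queried so the effect of the cache can be observed.
static int hook_calls = 0;

static bool target_bitint_is_extended()
{
  ++hook_calls;
  return true;  // Assumption: this imaginary target extends small _BitInts.
}

// -1: not yet queried for the current function; 0 or 1: cached answer.
static int bitint_extended_cache = -1;

// Models the reset performed when a new function starts.
void start_function()
{
  bitint_extended_cache = -1;
}

// Returns the flag, querying the (hypothetical) hook at most once
// between two calls to start_function.
bool bitint_extended_p()
{
  if (bitint_extended_cache == -1)
    bitint_extended_cache = target_bitint_is_extended() ? 1 : 0;
  return bitint_extended_cache != 0;
}
```

In this sketch, calling `bitint_extended_p()` repeatedly after one `start_function()` queries the hook only once; the next `start_function()` forces a fresh query.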

[PATCH v3 0/1] Implement default_accessor.

2025-06-27 Thread Luc Grosheintz

Changes since v2:

  * Check requirement that ElementType is neither an array type
nor an abstract class type.

Luc Grosheintz (1):
  libstdc++: Implement default_accessor from mdspan.

 libstdc++-v3/include/std/mdspan   | 31 ++
 libstdc++-v3/src/c++23/std.cc.in  |  3 +-
 .../23_containers/mdspan/accessors/default.cc | 59 +++
 .../mdspan/accessors/default_neg.cc   | 23 
 4 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc

-- 
2.49.0
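
To make the two mandates added in v3 concrete, here is a minimal standalone sketch of an accessor with the same shape. It is an illustration under stated assumptions, not the libstdc++ implementation: `sketch_accessor` and `check_access` are made-up names, and the converting constructor is constrained with `std::enable_if_t` (so the sketch also builds as C++17) where the patch uses a requires-clause.

```cpp
#include <cstddef>
#include <type_traits>

// Illustrative only; modelled on the shape of default_accessor in the patch.
template<typename ElementType>
struct sketch_accessor
{
  // The two mandates checked since v3.
  static_assert(!std::is_array_v<ElementType>,
                "ElementType must not be an array type");
  static_assert(!std::is_abstract_v<ElementType>,
                "ElementType must not be an abstract class type");

  using element_type = ElementType;
  using reference = element_type&;
  using data_handle_type = element_type*;

  constexpr sketch_accessor() noexcept = default;

  // Permit only conversions that are valid for pointers to arrays,
  // e.g. int -> const int, but not const int -> int.
  template<typename Other,
           typename = std::enable_if_t<
             std::is_convertible_v<Other(*)[], element_type(*)[]>>>
  constexpr sketch_accessor(sketch_accessor<Other>) noexcept { }

  // Reference to the i-th element behind the handle.
  constexpr reference
  access(data_handle_type p, std::size_t i) const noexcept
  { return p[i]; }

  // Handle advanced by i elements.
  constexpr data_handle_type
  offset(data_handle_type p, std::size_t i) const noexcept
  { return p + i; }
};

// Compile-time usage check on a small local array.
constexpr bool check_access()
{
  int a[] = {10, 11, 12};
  sketch_accessor<int> acc{};
  return acc.access(a, 2) == 12 && acc.offset(a, 1) == a + 1;
}

static_assert(check_access());
static_assert(std::is_convertible_v<sketch_accessor<int>,
                                    sketch_accessor<const int>>);
static_assert(!std::is_constructible_v<sketch_accessor<int>,
                                       sketch_accessor<const int>>);
```

Instantiating `sketch_accessor<int[3]>` or `sketch_accessor<SomeAbstractClass>` would trip the corresponding static_assert, which is the behaviour default_neg.cc tests for the real class.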

[PATCH v3 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Luc Grosheintz

libstdc++-v3/ChangeLog:

* include/std/mdspan (default_accessor): New class.
* src/c++23/std.cc.in: Register default_accessor.
* testsuite/23_containers/mdspan/accessors/default.cc: New test.
* testsuite/23_containers/mdspan/accessors/default_neg.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan   | 31 ++
 libstdc++-v3/src/c++23/std.cc.in  |  3 +-
 .../23_containers/mdspan/accessors/default.cc | 59 +++
 .../mdspan/accessors/default_neg.cc   | 23 
 4 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..c72a64094b7 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[no_unique_address]] _S_strides_t _M_strides;
 };
 
+  template<typename _ElementType>
+struct default_accessor
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+
+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template<typename _OElementType>
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr
+   default_accessor(default_accessor<_OElementType>) noexcept
+   { }
+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
 _GLIBCXX_END_NAMESPACE_VERSION
 }
 #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 9336118f5d9..e692caaa5f9 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1850,7 +1850,8 @@ export namespace std
   using std::layout_left;
   using std::layout_right;
   using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan
+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and mdspan
 }
 #endif
 
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
new file mode 100644
index 000..303833d4857
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,59 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+constexpr void
+test_ctor()
+{
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+}
+
+int
+main()
+{
+  test_accessor_policy>();
+  test_access();
+  static_assert(test_access());
+  test_offset();
+  static_assert(test_offset());
+  test_ctor();
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
new file mode 100644
index 000..f8da2b569ca
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
@@ -0,0 +1,23 @@
+// { dg-do compile { target c++23 } }
+#include
+
+std::default_accessor a; // { dg-error "required from here" }
+
+class AbstractBase
+{
+  virtual void
+  foo() const = 0;
+};
+
+class Derived : public AbstractBase
+{
+  void
+  foo() const override
+  { }
+};
+
+std::default_accessor b_ok;
+std::default_accessor b_err; // { dg-error "required from here"}
+
+//

[PATCH v3 0/6] LoongArch: Add support for _BitInt [PR117599]

2025-06-27 Thread Yang Yujie

Main changes from v2:

1. Cache the result of info.extended during expansion of every function.

2. Do not insert extra conversion for _BitInt extension before
   __atomic_compare_exchange.  Instead, prevent the conversion
   generated with the compare-and-swap loop from being optimized away.

(Note: It seems that atomic fetch/op doesn't work for _BitInts even
if they have the right size; a CAS loop is always used when handling
modification.  This will be fixed later.)

This series has been bootstrapped and regtested on
loongarch64-linux-gnu and x86_64-linux-gnu.


Yang Yujie (6):
  bitint: Allow mode promotion of _BitInt types
  expand: Reduce unneeded _BitInt extensions
  bitint: Allow unused bits when testing extended _BitInt ABIs
  bitint: Do not optimize away conversion to _BitInt before a VCE
  LoongArch: Prioritize target-specific makefile fragments
  LoongArch: Add support for _BitInt [PR117599]

 gcc/config/loongarch/loongarch.cc | 35 +++-
 gcc/config/loongarch/loongarch.h  |  4 +-
 gcc/explow.cc | 24 ++
 gcc/expr.cc   | 12 +++
 gcc/expr.h|  4 +
 gcc/function.cc   |  4 +
 gcc/match.pd  | 19 -
 gcc/testsuite/gcc.dg/bitintext.h  | 22 -
 .../gcc.target/loongarch/bitint-alignments.c  | 58 +
 .../gcc.target/loongarch/bitint-args.c| 81 +++
 .../gcc.target/loongarch/bitint-sizes.c   | 60 ++
 libgcc/config.host| 31 +++
 libgcc/config/loongarch/libgcc-loongarch.ver  | 26 ++
 libgcc/config/loongarch/t-loongarch   |  2 +
 libgcc/config/loongarch/t-softfp-tf   |  1 +
 15 files changed, 357 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitint-alignments.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitint-args.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitint-sizes.c
 create mode 100644 libgcc/config/loongarch/libgcc-loongarch.ver

-- 
2.46.0

Re: [PATCH v2 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Luc Grosheintz





On 6/27/25 08:53, Tomasz Kaminski wrote:

On Thu, Jun 26, 2025 at 3:40 PM Luc Grosheintz wrote:

On 6/13/25 12:40, Luc Grosheintz wrote:

libstdc++-v3/ChangeLog:

   * include/std/mdspan (default_accessor): New class.
   * src/c++23/std.cc.in: Register default_accessor.
   * testsuite/23_containers/mdspan/accessors/default.cc: New test.

Signed-off-by: Luc Grosheintz 
---
   libstdc++-v3/include/std/mdspan   | 26 
   libstdc++-v3/src/c++23/std.cc.in  |  3 +-
   .../23_containers/mdspan/accessors/default.cc | 59 +++
   3 files changed, 87 insertions(+), 1 deletion(-)
   create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..2e85ba8e6cb 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 [[no_unique_address]] _S_strides_t _M_strides;
   };

+  template<typename _ElementType>
+struct default_accessor
+{


It would be easy to check the two mandates: not abstract, not array
here. Would you like a v3, with the change?

https://eel.is/c++draft/views.multidim#mdspan.accessor.default.overview-2


Yes, I think that makes sense. Thanks.


Done: https://gcc.gnu.org/pipermail/libstdc++/2025-June/062197.html







+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template<typename _OElementType>
+ requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+ constexpr
+ default_accessor(default_accessor<_OElementType>) noexcept
+ { }
+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
   _GLIBCXX_END_NAMESPACE_VERSION
   }
   #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in

index 109f590f1d1..e671aff68f8 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1843,7 +1843,8 @@ export namespace std
 using std::layout_left;
 using std::layout_right;
 using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan

+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded and mdspan
   }
   #endif

diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
new file mode 100644
index 000..303833d4857
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,59 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+constexpr void
+test_ctor()
+{
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);
+  static_assert(std::is_convertible_v,
+   std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+  std::default_accessor>);
+}
+
+int
+main()
+{
+  test_accessor_policy>();
+  test_access();
+  static_assert(test_access());
+  test_offset();
+  static_assert(test_offset());
+  test_ctor();
+  return 0;
+}

Re: [PATCH v3 1/6] bitint: Allow mode promotion of _BitInt types

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:33:34PM +0800, Yang Yujie wrote:
> For targets that treat small _BitInts like the fundamental
> integral types, we should allow their machine modes to be promoted
> in the same way.
> 
> gcc/ChangeLog:
> 
>   * explow.cc (promote_function_mode): Add a case for
>   small/medium _BitInts.
>   (promote_mode): Same.

Ok for trunk.

Jakub

Re: [PATCH] vect: Misalign checks for gather/scatter.

2025-06-27 Thread Robin Dapp


> Maybe we can pass a scalar mode to the hook when we ask for
> SCATTER/GATHER?  That might need fixups in other targets of course,
> but it would make it clear what we're asking for?


How about an additional argument bool gather_scatter to make it more explicit?  


Then we could just
if (gather_scatter)
  return true;

for every target but riscv.

I guess we're lucky that powerpc doesn't have gathers, though ;)

--
Regards
Robin
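
A rough sketch of what the extra-argument idea could look like for a target that does not care about gather/scatter alignment. Everything here is hypothetical: `access_info` and `generic_support_misalignment` are made-up names, the real GCC hook has a different signature, and the fallback policy for ordinary accesses is only a placeholder.

```cpp
// Hypothetical, simplified model of a misalignment query; not the GCC hook.
struct access_info
{
  int misalignment;     // Byte misalignment of the access; 0 means aligned.
  bool is_packed;       // Whether the access is to a packed type.
  bool gather_scatter;  // Proposed flag: the query is for a gather/scatter.
};

// What a target other than riscv could do under the proposal: accept every
// gather/scatter query and keep its existing logic for everything else.
bool generic_support_misalignment(const access_info& a)
{
  if (a.gather_scatter)
    return true;
  // Placeholder for the target's existing contiguous-access policy.
  return a.misalignment == 0 && !a.is_packed;
}
```

The point of the explicit flag is that the early return documents the intent at each target, instead of encoding "this is a gather/scatter" in the mode argument.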

Re: [PATCH v3 2/6] expand: Reduce unneeded _BitInt extensions

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:33:35PM +0800, Yang Yujie wrote:
> @@ -11291,6 +11296,13 @@ expand_expr_real_1 (tree exp, rtx target, 
> machine_mode tmode,
>type = TREE_TYPE (exp);
>mode = TYPE_MODE (type);
>unsignedp = TYPE_UNSIGNED (type);
> +  if (bitint_extended == -1 && TREE_CODE (type) == BITINT_TYPE)

Please swap the && operands, so
  if (TREE_CODE (type) == BITINT_TYPE && bitint_extended == -1)
because not being BITINT_TYPE is going to be far more common.

Ok for trunk with that nit fixed.

Jakub

[pushed] libstdc++: Fix Darwin bootstrap by simplifying ver file syntax.

2025-06-27 Thread Iain Sandoe

Tested on x86_64-darwin, powerpc64le-linux, pushed to trunk to fix
bootstrap on Darwin, thanks,
Iain

--- 8< ---

The symbol parsing script does not handle the closing brace of a new
symbol group and the identifier for the inherited group to be on
different lines, which r16-1708-gaf5b72cf9f564 introduced. Fixed by
making the conditional encompass both the brace and the identifier.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Keep the closing brace of the
CXXABI_1.3.17 symbol group together with the identifier
for the inherited group.

Signed-off-by: Iain Sandoe 
---
 libstdc++-v3/config/abi/pre/gnu.ver | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index bba5705509c..73b6f338613 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2902,11 +2902,10 @@ CXXABI_1.3.16 {
 CXXABI_1.3.17 {
 # std::exception_ptr::_M_exception_ptr_cast
 _ZNKSt15__exception_ptr13exception_ptr21_M_exception_ptr_castERKSt9type_info;
-}
 #ifdef __riscv
-CXXABI_1.3.16;
+} CXXABI_1.3.16;
 #else
-CXXABI_1.3.15;
+} CXXABI_1.3.15;
 #endif
 
 # Symbols in the support library (libsupc++) supporting transactional memory.
-- 
2.39.2 (Apple Git-143)

Re: [PATCH v3 4/6] bitint: Do not optimize away conversion to _BitInt before a VCE

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:33:37PM +0800, Yang Yujie wrote:
> A _BitInt value may rely on a conversion to become properly extended.
> So a conversion to _BitInt is not trivially removable even if the
> types of the result and the operand have the same precision and size.
> 
> This patch fixes gcc.dg/torture/bitint-64.c at -O2 on LoongArch,
> which fails because extension of the result is dropped in a
> compare-and-swap loop generated for incrementing an _Atomic _BitInt,
> causing an ABI violation.
> 
> gcc/ChangeLog:
> 
>   * match.pd: Preserve conversion to _BitInt before a VCE
>   if the _BitInt is extended.

Ok for trunk.  Though, a bitintext.h infrastructure test without atomics
that FAILs without this change and passes with it would be really
appreciated.

Jakub

Re: [PATCH v3 5/6] LoongArch: Prioritize target-specific makefile fragments

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:33:38PM +0800, Yang Yujie wrote:
> libgcc/ChangeLog:
> 
>   * config.host: Remove unused code. Include LoongArch-specific
>   tmake_files after the OS-specific ones.

This should be reviewed by LoongArch maintainers.

Jakub

Re: [PATCH V3] x86: Enable separate shrink wrapping

2025-06-27 Thread H.J. Lu

On Tue, Jun 17, 2025 at 10:04 PM Cui, Lili  wrote:
>
> From: Lili Cui 
>
> Hi Uros,
>
> This is patch v3, the main changes are as follows.
>
> 1. Added a pro_epilogue_adjust_stack_add_nocc in i386.md to add memory 
> clobber for lea/mov.
> 2. Adjusted some formatting issues.
> 3. Added scan-rtl-dumps for ia32 in shrink_wrap_separate.C.
>
> Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No performance 
> regression was observed.
> For O2 multi-copy :
> 511.povray_r improved by 2.8% on ZNVER5.
> 511.povray_r improved by 4.2% on EMR
>
> Bootstrapped & regtested on x86-64-pc-linux-gnu.
> Use this patch to build the latest Linux kernel and boot successfully.
>
> Thanks,
> Lili.
>
>
> This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> enable separate shrink wrapping for function prologues/epilogues in
> x86.
>
> When performing separate shrink wrapping, we choose to use mov instead
> of push/pop, because using push/pop is more complicated to handle rsp
> adjustment and may lose performance, so here we choose to use mov, which
> has a small impact on code size, but guarantees performance.
>
> Using mov means we need to use sub/add to maintain the stack frame. In
> some special cases, we need to use lea to prevent affecting EFlags.
>
> Avoid inserting sub between test-je-jle, since it changes EFlags; lea
> should be used here.
>
> foo:
> xorl    %eax, %eax
> testl   %edi, %edi
> je      .L11
> sub     $16, %rsp  --> leaq    -16(%rsp), %rsp
> movq    %r13, 8(%rsp)
> movl    $1, %r13d
> jle     .L4
>
> Tested against SPEC CPU 2017, this change always has a net-positive
> effect on the dynamic instruction count.  See the following table for
> the breakdown on how this reduces the number of dynamic instructions
> per workload on a like-for-like (with/without this commit):
>
> instruction count   base          with commit   (commit-base)/commit
> 502.gcc_r           98666845943   96891561634   -1.80%
> 526.blender_r       6.21226E+11   6.12992E+11   -1.33%
> 520.omnetpp_r       1.1241E+11    1.11093E+11   -1.17%
> 500.perlbench_r     1271558717    1263268350    -0.65%
> 523.xalancbmk_r     2.20103E+11   2.18836E+11   -0.58%
> 531.deepsjeng_r     2.73591E+11   2.72114E+11   -0.54%
> 500.perlbench_r     64195557393   63881512409   -0.49%
> 541.leela_r         2.99097E+11   2.98245E+11   -0.29%
> 548.exchange2_r     1.27976E+11   1.27784E+11   -0.15%
> 527.cam4_r          88981458425   7334679       -0.11%
> 554.roms_r          2.60072E+11   2.59809E+11   -0.10%
>
> Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No performance 
> regression was observed.
>
> For O2 multi-copy :
> 511.povray_r improved by 2.8% on ZNVER5.
> 511.povray_r improved by 4% on EMR
> 511.povray_r improved by 3.3 % ~ 4.6% on ICELAKE.
>
> gcc/ChangeLog:
>
> * config/i386/i386-protos.h (ix86_get_separate_components):
> New function.
> (ix86_components_for_bb): Likewise.
> (ix86_disqualify_components): Likewise.
> (ix86_emit_prologue_components): Likewise.
> (ix86_emit_epilogue_components): Likewise.
> (ix86_set_handled_components): Likewise.
> * config/i386/i386.cc (save_regs_using_push_pop):
> Split from ix86_compute_frame_layout.
> (ix86_compute_frame_layout):
> Use save_regs_using_push_pop.
> (pro_epilogue_adjust_stack):
> Use gen_pro_epilogue_adjust_stack_add_nocc.
> (ix86_expand_prologue): Add some assertions and adjust
> the stack frame at the beginning of the prolog for shrink
> wrapping separate.
> (ix86_emit_save_regs_using_mov):
> Skip registers that are wrapped separately.
> (ix86_emit_restore_regs_using_mov): Likewise.
> (ix86_expand_epilogue): Add some assertions and set
> restore_regs_via_mov to true for shrink wrapping separate.
> (ix86_get_separate_components): New function.
> (ix86_components_for_bb): Likewise.
> (ix86_disqualify_components): Likewise.
> (ix86_emit_prologue_components): Likewise.
> (ix86_emit_epilogue_components): Likewise.
> (ix86_set_handled_components): Likewise.
> (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
> (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
> (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
> (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
> * config/i386/i386.h (struct machine_function): Add
> reg_is_wrapped_separately array for register wrapping
> information.
> * config/i386/i386.md
> (@pro_epilogue_adjust_stack_add_nocc): New.
>
> gcc/testsuite/ChangeLog:
>
> *

[PATCH v2 2/5] libstdc++: Check prerequisite of extents::extents.

2025-06-27 Thread Luc Grosheintz

Previously the prerequisite of the extents ctors that

static_extent(i) == dynamic_extent || extent(i) == other.extent(i).

was not checked. This commit adds the __glibcxx_assert and tests it.

libstdc++-v3/ChangeLog:

* include/std/mdspan (extents): Check prerequisite of the ctor that
static_extent(i) == dynamic_extent || extent(i) == other.extent(i).
* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc:
Test the implemented prerequisite.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan   | 13 
 .../mdspan/extents/class_mandates_neg.cc  | 31 +++
 2 files changed, 44 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 39d02ac08df..e198d65bba3 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -110,10 +110,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __se;
}
 
+   template<size_t _OtherRank, typename _GetOtherExtent>
+ constexpr bool
+ _S_is_compatible_extents(_GetOtherExtent __get_extent) noexcept
+ {
+   if constexpr (_OtherRank == _S_rank)
+ for (size_t __i = 0; __i < _S_rank; ++__i)
+   if (_Extents[__i] != dynamic_extent
+   && !cmp_equal(_Extents[__i], _S_int_cast(__get_extent(__i))))
+ return false;
+   return true;
+ }
+
 template<size_t _OtherRank, typename _GetOtherExtent>
  constexpr void
  _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
  {
+   __glibcxx_assert(_S_is_compatible_extents<_OtherRank>(__get_extent));
for (size_t __i = 0; __i < _S_rank_dynamic; ++__i)
  {
size_t __di = __i;
diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
index f9c1c019666..8179ff39962 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
@@ -3,10 +3,41 @@
 
 #include 
 
+constexpr size_t dyn = std::dynamic_extent;
+
 std::extents e1; // { dg-error "from here" }
 std::extents e2; // { dg-error "from here" }
 std::extents e3; // { dg-error "from here" }
 std::extents e4;   // { dg-error "from here" }
+
+constexpr bool
+test_dyn2sta_extents_mismatch_00()
+{
+  auto e0 = std::extents{1};
+  [[maybe_unused]] auto e1 = std::extents{e0};// { dg-error "expansion of" }
+  return true;
+}
+static_assert(test_dyn2sta_extents_mismatch_00());// { dg-error "expansion of" }
+
+constexpr bool
+test_dyn2sta_extents_mismatch_01()
+{
+  [[maybe_unused]] auto e = std::extents{2, 2}; // { dg-error "expansion of" }
+  return true;
+}
+static_assert(test_dyn2sta_extents_mismatch_01());   // { dg-error "expansion of" }
+
+constexpr bool
+test_dyn2sta_extents_mismatch_02()
+{
+  std::array exts{2, 2};
+  [[maybe_unused]] auto e = std::extents{exts}; // { dg-error "expansion of" }
+  return true;
+}
+static_assert(test_dyn2sta_extents_mismatch_02());   // { dg-error "expansion of" }
+
 // { dg-prune-output "dynamic or representable as IndexType" }
 // { dg-prune-output "signed or unsigned integer" }
 // { dg-prune-output "invalid use of incomplete type" }
+// { dg-prune-output "non-constant condition for static assertion" }
+// { dg-prune-output "__glibcxx_assert" }
-- 
2.49.0
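
The prerequisite being asserted can be shown in isolation with a small sketch: each static extent must equal the corresponding runtime extent, while dynamic extents accept any value. This is only an illustration, not the libstdc++ code. `dyn_extent`, `same_value` and `compatible_extents` are made-up names, and `same_value` is a hand-written sign-safe comparison used so the sketch does not depend on `std::cmp_equal`, which the patch itself uses.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <type_traits>

// Stand-in for std::dynamic_extent so the sketch needs no <span>/<mdspan>.
inline constexpr std::size_t dyn_extent
  = std::numeric_limits<std::size_t>::max();

// Sign-safe equality between an unsigned extent and an integer of any
// signedness: a negative value never equals an extent.
template<typename Int>
constexpr bool same_value(std::size_t lhs, Int rhs)
{
  if constexpr (std::is_signed_v<Int>)
    if (rhs < 0)
      return false;
  return lhs == static_cast<std::size_t>(rhs);
}

// True if every non-dynamic static extent matches the runtime extent
// at the same position.
template<std::size_t N, typename Int>
constexpr bool
compatible_extents(const std::array<std::size_t, N>& static_exts,
                   const std::array<Int, N>& runtime_exts)
{
  for (std::size_t i = 0; i < N; ++i)
    if (static_exts[i] != dyn_extent
        && !same_value(static_exts[i], runtime_exts[i]))
      return false;
  return true;
}

// A dynamic slot accepts 7; the static 3 must be matched exactly.
static_assert(compatible_extents(std::array<std::size_t, 2>{dyn_extent, 3},
                                 std::array<int, 2>{7, 3}));
// Mismatch: static extent 1 versus runtime extent 2.
static_assert(!compatible_extents(std::array<std::size_t, 1>{1},
                                  std::array<int, 1>{2}));
```

In the patch the same kind of check sits behind __glibcxx_assert, so a mismatch is diagnosed during constant evaluation, which is what the new class_mandates_neg.cc cases rely on.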

[PATCH v2 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li

From: Pan Li 

This patch combines vec_duplicate + vssubu.vv into vssubu.vx, as in
the example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if
the GR2VR cost is zero, and rejects it if the GR2VR cost is greater
than zero.

Assume we have the example code below and the GR2VR cost is 0.

  #define DEF_VX_BINARY(T, FUNC)  \
  void\
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {   \
for (unsigned i = 0; i < n; i++)  \
  out[i] = FUNC (in[i], x);   \
  }

  T sat_sub(T a, T b)
  {
return (a - b) & (-(T)(a >= b));
  }

  DEF_VX_BINARY(uint32_t, sat_sub)

Before this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │ beq a3,zero,.L8
  12   │ vsetvli a5,zero,e32,m1,ta,ma
  13   │ vmv.v.x v2,a2
  14   │ slli a3,a3,32
  15   │ srli a3,a3,32
  16   │ .L3:
  17   │ vsetvli a5,a3,e32,m1,ta,ma
  18   │ vle32.v v1,0(a1)
  19   │ slli a4,a5,2
  20   │ sub a3,a3,a5
  21   │ add a1,a1,a4
  22   │ vssubu.vv v1,v1,v2
  23   │ vse32.v v1,0(a0)
  24   │ add a0,a0,a4
  25   │ bne a3,zero,.L3

After this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │ beq a3,zero,.L8
  12   │ slli a3,a3,32
  13   │ srli a3,a3,32
  14   │ .L3:
  15   │ vsetvli a5,a3,e32,m1,ta,ma
  16   │ vle32.v v1,0(a1)
  17   │ slli a4,a5,2
  18   │ sub a3,a3,a5
  19   │ add a1,a1,a4
  20   │ vssubu.vx v1,v1,a2
  21   │ vse32.v v1,0(a0)
  22   │ add a0,a0,a4
  23   │ bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup):
* config/riscv/riscv.cc (riscv_rtx_costs):
* config/riscv/vector-iterators.md:

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc  | 1 +
 gcc/config/riscv/riscv.cc| 1 +
 gcc/config/riscv/vector-iterators.md | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 45dd9256d02..76fb1c36357 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5581,6 +5581,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx op_2,
 case SMIN:
 case UMIN:
 case US_PLUS:
+case US_MINUS:
   icode = code_for_pred_scalar (code, mode);
   break;
 default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bbc7547d385..f5d2b2e74ae 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3996,6 +3996,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
case MOD:
case UMOD:
case US_PLUS:
+   case US_MINUS:
  *total = get_vector_binary_rtx_cost (op, scalar2vr_cost);
  break;
default:
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 0e1318d1447..782544423c4 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,7 +4042,7 @@ (define_code_iterator any_int_binop [plus minus and ior xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_v_vdup [
-  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus
+  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus us_minus
 ])
 
 (define_code_iterator any_int_binop_no_shift_vdup_v [
-- 
2.43.0

Re: [PATCH v4 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Tomasz Kaminski

Also, for a single patch (not a patch series), you do not need a [PATCH
0/N] cover letter; a simple [PATCH] and then [PATCH v2] also works.

On Fri, Jun 27, 2025 at 11:11 AM Tomasz Kaminski 
wrote:

>
>
> On Fri, Jun 27, 2025 at 11:06 AM Luc Grosheintz 
> wrote:
>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/mdspan (default_accessor): New class.
>> * src/c++23/std.cc.in: Register default_accessor.
>> * testsuite/23_containers/mdspan/accessors/default.cc: New test.
>> * testsuite/23_containers/mdspan/accessors/default_neg.cc: New
>> test.
>>
>> Signed-off-by: Luc Grosheintz 
>> ---
>>  libstdc++-v3/include/std/mdspan   | 31 
>>  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
>>  .../23_containers/mdspan/accessors/default.cc | 72 +++
>>  .../mdspan/accessors/default_neg.cc   | 23 ++
>>  4 files changed, 128 insertions(+), 1 deletion(-)
>>  create mode 100644
>> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>>  create mode 100644
>> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
>>
>> diff --git a/libstdc++-v3/include/std/mdspan
>> b/libstdc++-v3/include/std/mdspan
>> index 6dc2441f80b..c72a64094b7 100644
>> --- a/libstdc++-v3/include/std/mdspan
>> +++ b/libstdc++-v3/include/std/mdspan
>> @@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>[[no_unique_address]] _S_strides_t _M_strides;
>>  };
>>
>> +  template<typename _ElementType>
>> +struct default_accessor
>> +{
>> +  static_assert(!is_array_v<_ElementType>,
>> +   "ElementType must not be an array type");
>> +  static_assert(!is_abstract_v<_ElementType>,
>> +   "ElementType must not be an abstract class type");
>> +
>> +  using offset_policy = default_accessor;
>> +  using element_type = _ElementType;
>> +  using reference = element_type&;
>> +  using data_handle_type = element_type*;
>> +
>> +  constexpr
>> +  default_accessor() noexcept = default;
>> +
>> +  template<typename _OElementType>
>> +   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
>> +   constexpr
>> +   default_accessor(default_accessor<_OElementType>) noexcept
>> +   { }
>> +
>> +  constexpr reference
>> +  access(data_handle_type __p, size_t __i) const noexcept
>> +  { return __p[__i]; }
>> +
>> +  constexpr data_handle_type
>> +  offset(data_handle_type __p, size_t __i) const noexcept
>> +  { return __p + __i; }
>> +};
>> +
>>  _GLIBCXX_END_NAMESPACE_VERSION
>>  }
>>  #endif
>> diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/
>> std.cc.in
>> index 9336118f5d9..e692caaa5f9 100644
>> --- a/libstdc++-v3/src/c++23/std.cc.in
>> +++ b/libstdc++-v3/src/c++23/std.cc.in
>> @@ -1850,7 +1850,8 @@ export namespace std
>>using std::layout_left;
>>using std::layout_right;
>>using std::layout_stride;
>> -  // FIXME layout_left_padded, layout_right_padded, default_accessor and
>> mdspan
>> +  using std::default_accessor;
>> +  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and
>> mdspan
>>  }
>>  #endif
>>
>> diff --git
>> a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>> b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>> new file mode 100644
>> index 000..ecccda2b68e
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>> @@ -0,0 +1,72 @@
>> +// { dg-do run { target c++23 } }
>> +#include 
>> +
>> +#include 
>> +
>> +constexpr size_t dyn = std::dynamic_extent;
>> +
>> +template
>> +  constexpr void
>> +  test_accessor_policy()
>> +  {
>> +static_assert(std::copyable);
>> +static_assert(std::is_nothrow_move_constructible_v);
>> +static_assert(std::is_nothrow_move_assignable_v);
>> +static_assert(std::is_nothrow_swappable_v);
>> +  }
>> +
>> +constexpr bool
>> +test_access()
>> +{
>> +  std::default_accessor accessor;
>> +  std::array a{10, 11, 12, 13, 14};
>> +  VERIFY(accessor.access(a.data(), 0) == 10);
>> +  VERIFY(accessor.access(a.data(), 4) == 14);
>> +  return true;
>> +}
>> +
>> +constexpr bool
>> +test_offset()
>> +{
>> +  std::default_accessor accessor;
>> +  std::array a{10, 11, 12, 13, 14};
>> +  VERIFY(accessor.offset(a.data(), 0) == a.data());
>> +  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
>> +  return true;
>> +}
>> +
>> +class Base
>> +{ };
>> +
>> +class Derived : public Base
>> +{ };
>> +
>> +constexpr void
>> +test_ctor()
>> +{
>> +
>> static_assert(std::is_nothrow_constructible_v,
>> +
>>  std::default_accessor>);
>>
> Hi, sorry for being unclear before, and for causing another patch revision.
> I would like to see a positive test case showing that const adjustments
> are allowed, i.e.:
> +  static_assert(std::is_convertible_v<std::default_accessor<double>,
> + std::default_accessor<const double>>);
> And similarly for Derived. This is important, as it allows passing mdspan<T>
> to a function accepting mdspan<const T>.
>
>> +  static_assert(std::is_co
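
The conversion rule discussed in this review can be illustrated outside of
libstdc++. The following is a minimal sketch, not the library code: the
namespace sketch, the flag names, and the use of enable_if (instead of the
requires-clause in the patch, so that it also builds as C++17) are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {

// Simplified stand-in for the default_accessor in the patch.
template <typename ElementType>
struct default_accessor
{
  using element_type = ElementType;
  using reference = ElementType&;
  using data_handle_type = ElementType*;

  constexpr default_accessor() noexcept = default;

  // Converting constructor: only enabled when a pointer to an array of
  // Other converts to a pointer to an array of ElementType, which admits
  // qualification conversions but rejects derived-to-base conversions.
  template <typename Other,
            typename = std::enable_if_t<
              std::is_convertible_v<Other(*)[], ElementType(*)[]>>>
  constexpr default_accessor(default_accessor<Other>) noexcept { }

  constexpr reference
  access(data_handle_type p, std::size_t i) const noexcept
  { return p[i]; }

  constexpr data_handle_type
  offset(data_handle_type p, std::size_t i) const noexcept
  { return p + i; }
};

struct Base { };
struct Derived : Base { };

// Adding const is allowed.
inline constexpr bool adds_const
  = std::is_convertible_v<default_accessor<double>,
                          default_accessor<const double>>;
inline constexpr bool adds_const_derived
  = std::is_convertible_v<default_accessor<Derived>,
                          default_accessor<const Derived>>;
// Dropping const or converting Derived to Base is rejected.
inline constexpr bool drops_const
  = std::is_convertible_v<default_accessor<const double>,
                          default_accessor<double>>;
inline constexpr bool derived_to_base
  = std::is_convertible_v<default_accessor<Derived>,
                          default_accessor<Base>>;

} // namespace sketch
```

The pointer-to-array trick is what keeps Derived-to-Base out: indexing a
Derived array through a Base pointer would use the wrong element stride.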

[PATCH] libstdc++: Use runtime format for internal format calls in chrono [PR110739]

2025-06-27 Thread Tomasz Kamiński

This patch adjusts all internal std::format calls inside __formatter_chrono
to use a runtime format string, and thus avoids compile-time checking of the
validity of the format string. The majority of cases are covered by calling the
newly introduced _S_empty_fs() function, which returns a _Runtime_format_string
containing _S_empty_spec, instead of passing the latter directly.

In the case of _M_j we use the _S_str_d3 function (extracted from _S_str_d2),
eliminating the call to std::format outside of the unlikely scenario in which
the day of year is 1000 or greater (this may happen for year_month_day with a
month greater than 12). As a consequence, outside of handling subseconds, we no
longer delegate to std::format or construct temporary strings when formatting
chrono types with ok() values.
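
The local-buffer approach described above can be sketched as follows. This
is an illustration only: str_d3 is a hypothetical stand-in for _S_str_d3,
it is restricted to plain char, and it computes the digits directly instead
of using the library's digit table.

```cpp
#include <cassert>
#include <string_view>

// Writes n (assumed to be in the range 0..999) into buf as exactly three
// decimal digits, padded with leading zeros, and returns a view of buf.
// No heap allocation and no call into a formatting library is involved.
inline std::string_view str_d3(char (&buf)[3], unsigned n)
{
  buf[0] = static_cast<char>('0' + n / 100);
  buf[1] = static_cast<char>('0' + n / 10 % 10);
  buf[2] = static_cast<char>('0' + n % 10);
  return std::string_view(buf, 3);
}
```

The returned view points into the caller's buffer, so it must be written to
the output before the buffer goes out of scope.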

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_S_empty_fs): Define.
(__formatter_chrono::_S_str_d2): Use _S_str_d3 for 3+ digits.
(__formatter_chrono::_S_str_d3): Extracted from _S_str_d2.
(__formatter_chrono::_M_H_I, __formatter_chrono::_M_R_X): Replace
_S_empty_spec with _S_empty_fs().
(__formatter_chrono::_M_j): Likewise and use _S_str_d3 in common
case.
---
 libstdc++-v3/include/bits/chrono_io.h | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index bcfd51b9866..d6bc6c7cf2a 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -873,6 +873,11 @@ namespace __format
   static constexpr const _CharT* _S_minus_empty_spec = _S_chars + 17;
   static constexpr const _CharT* _S_empty_spec = _S_chars + 18;
 
+  [[__gnu__::__always_inline__]]
+  static _Runtime_format_string<_CharT>
+  _S_empty_fs()
+  { return _Runtime_format_string<_CharT>(_S_empty_spec); }
+
   // Return the formatting locale.
   template
std::locale
@@ -1411,7 +1416,7 @@ namespace __format
__i = 12;
}
  else if (__i >= 100) [[unlikely]]
-   return std::format_to(std::move(__out), _S_empty_spec, __i);
+   return std::format_to(std::move(__out), _S_empty_fs(), __i);
 
  return __format::__write(std::move(__out), _S_two_digits(__i));
}
@@ -1425,11 +1430,15 @@ namespace __format
  {
// Decimal number of days, without padding.
auto __d = chrono::floor(__t._M_hours).count();
-   return std::format_to(std::move(__out), _S_empty_spec, __d);
+   return std::format_to(std::move(__out), _S_empty_fs(), __d);
  }
 
- return std::format_to(std::move(__out), _GLIBCXX_WIDEN("{:03d}"),
-   __t._M_day_of_year.count());
+ auto __d = __t._M_day_of_year.count();
+ if (__d >= 1000) [[unlikely]]
+   return std::format_to(std::move(__out), _S_empty_fs(), __d);
+
+ _CharT __buf[3];
+ return __format::__write(std::move(__out), _S_str_d3(__buf, __d));
}
 
   template
@@ -1534,7 +1543,7 @@ namespace __format
 
  if (__hi >= 100) [[unlikely]]
{
- __out = std::format_to(std::move(__out), _S_empty_spec, __hi);
+ __out = std::format_to(std::move(__out), _S_empty_fs(), __hi);
  __sv.remove_prefix(2);
}
  else
@@ -1772,7 +1781,15 @@ namespace __format
   {
if (__n < 100) [[likely]]
  return _S_two_digits(__n);
+return _S_str_d3(__buf, __n);
+  }
 
+  [[__gnu__::__always_inline__]]
+  // Returns decimal representation of __n, padded to 3 digits.
+  // Returned string_view points to __buf.
+  static basic_string_view<_CharT>
+  _S_str_d3(span<_CharT, 3> __buf, unsigned __n)
+  {
_S_fill_two_digits(__buf.data(), __n / 10);
__buf[2] = _S_chars[__n % 10];
return __string_view(__buf.data(), 3);
-- 
2.49.0

Re: [PATCH] vect: Misalign checks for gather/scatter.

2025-06-27 Thread Richard Biener

On Fri, 27 Jun 2025, Robin Dapp wrote:

> > Maybe we can pass a scalar mode to the hook when we ask for
> > SCATTER/GATHER?  That might need fixups in other targets of course,
> > but it would make it clear what we're asking for?
> 
> How about an additional argument bool gather_scatter to make it more explicit?
> Then we could just
> if (gather_scatter)
>   return true;
> 
> for every target but riscv.

That works for me as well.
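
A rough sketch of the shape being proposed, under stated assumptions: the
Target enum, the simplified parameter list, and the fallback policy are
hypothetical and are not GCC's actual hook signature or any target's real
implementation.

```cpp
#include <cassert>

// Hypothetical target tags for illustration; these are not GCC types.
enum class Target { generic, riscv };

// Sketch of a misalignment hook that takes the proposed gather_scatter flag.
inline bool
support_vector_misalignment(Target target, int misalignment,
                            bool gather_scatter)
{
  // "for every target but riscv": gathers/scatters are accepted as-is.
  if (gather_scatter && target != Target::riscv)
    return true;

  // Placeholder policy (an assumption): accept only known-aligned accesses.
  // A real target would apply its own rules here.
  return misalignment == 0;
}
```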

> I guess we're lucky that powerpc doesn't have gathers, though ;)
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH v2 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-27 Thread pan2 . li

From: Pan Li 

Add asm dump check tests for the vec_duplicate + vssubu.vv combine to
vssubu.vx, with GR2VR costs of 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vssubu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 2 ++
 12 files changed, 24 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
index de10d66a1b2..afb5a8513a9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BODY_X8)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, VX_BINARY_FUNC_BODY_X8)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -30,3 +31,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BOD
 /* { dg-final { scan-assembler {vmaxu.vx} } } */
 /* { dg-final { scan-assembler {vminu.vx} } } */
 /* { dg-final { scan-assembler {vsaddu.vx} } } */
+/* { dg-final { scan-assembler {vssubu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
index 2e59da06c97..a907e9b7222 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BODY_X4)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, VX_BINARY_FUNC_BODY_X4)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -29,3 +30,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BOD
 /* { dg-final { scan-assembler {vremu.vx} } } */
 /* { dg-final { scan-assembler {vmaxu.vx} } } */
 /* { dg-final { scan-assembler {vminu.vx} } } */
+/* { dg-final { scan-assembler {vssubu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
index 064ed1f2e89..efabf9930f0 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BODY)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, VX_BINARY_FUNC_BODY)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -30,3 +31,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_W

Re: [PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread Robin Dapp


This patch would like to introduce the combine of vec_dup + vssubu.vv
into vssubu.vx on the cost value of GR2VR.  The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.  There will be two cases for the combine:


OK.

--
Regards
Robin

Re: [PATCH v2 0/5] Implement mdspan.

2025-06-27 Thread Luc Grosheintz


After this series I think we should have completed the C++23
part of mdspan.

I'm not sure about the last few steps:

  1. Can I create a commit that sets the official FTM to 202207L?
  2. Who and when creates the commit that updates the page with
  the supported features [1]?

[1]: https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html

On 6/27/25 11:07, Luc Grosheintz wrote:

This is the second iteration of this patch series and the first also
sent to gcc-patches.

The individual patches can be restructured as desired.

The final patch is a proposal to strengthen some exception guarantees to
make mdspan nothrow movable. The standard doesn't require this; but I
felt it made sense to at least propose it.

Luc Grosheintz (5):
   libstdc++: Check prerequisites of layout_*::operator().
   libstdc++: Check prerequisite of extents::extents.
   libstdc++: Restructure mdspan tests to reuse IntLike.
   libstdc++: Implement mdspan and tests.
   libstdc++: Make mdspan nothrow movable.

  libstdc++-v3/include/std/mdspan   | 305 +
  libstdc++-v3/src/c++23/std.cc.in  |   3 +-
  .../23_containers/mdspan/class_mandate_neg.cc |  58 ++
  .../mdspan/extents/class_mandates_neg.cc  |  31 +
  .../mdspan/extents/custom_integer.cc  |  27 +-
  .../23_containers/mdspan/extents/int_like.h   |  28 +
  .../23_containers/mdspan/layout_like.h|  63 ++
  .../mdspan/layouts/class_mandate_neg.cc   |  26 +
  .../testsuite/23_containers/mdspan/mdspan.cc  | 591 ++
  9 files changed, 1105 insertions(+), 27 deletions(-)
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/class_mandate_neg.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/int_like.h
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layout_like.h
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc

Re: [PATCH] libstdc++: Use runtime format for internal format calls in chrono [PR110739]

2025-06-27 Thread Jonathan Wakely

On Fri, 27 Jun 2025 at 10:10, Tomasz Kaminski  wrote:
>
>
>
> On Fri, Jun 27, 2025 at 10:31 AM Tomasz Kamiński  wrote:
>>
>> This patch adjusts all internal std::format calls inside __formatter_chrono
>> to use a runtime format string, and thus avoids compile-time checking of the
>> validity of the format string. The majority of cases are covered by calling
>> the newly introduced _S_empty_fs() function, which returns a
>> _Runtime_format_string containing _S_empty_spec, instead of passing the
>> latter directly.
>>
>> In the case of _M_j we use the _S_str_d3 function (extracted from _S_str_d2),
>> eliminating the call to std::format outside of the unlikely scenario in which
>> the day of year is 1000 or greater (this may happen for year_month_day with a
>> month greater than 12). As a consequence, outside of handling subseconds, we
>> no longer delegate to std::format or construct temporary strings when
>> formatting chrono types with ok() values.
>>
>> PR libstdc++/110739
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/bits/chrono_io.h (__formatter_chrono::_S_empty_fs): Define.
>> (__formatter_chrono::_S_str_d2): Use _S_str_d3 for 3+ digits.
>> (__formatter_chrono::_S_str_d3): Extracted from _S_str_d2.
>> (__formatter_chrono::_M_H_I, __formatter_chrono::_M_R_X): Replace
>> _S_empty_spec with _S_empty_fs().
>> (__formatter_chrono::_M_j): Likewise and use _S_str_d3 in common
>> case.
>> ---
>>
>>
>
> I do not think this buys us much, but I think it is worth doing anyway.

I agree, I've never really liked doing recursive std::format calls
inside formatters (obviously sometimes it's unavoidable, e.g. for
custom duration rep types).

> It also finishes my side goal, of getting rid of temporary strings, and using 
> local buffers,
> that I applied to other specifiers in previous commits.
> Tested on x86_64-linux locally. The std/time* test passed with
> -D_GLIBCXX_USE_CXX11_ABI=0 and -D_GLIBCXX_DEBUG.

Just to be sure, it passes without those options too, right? :-)

> OK for trunk?

Please put the always_inline attribute after the comment on _S_str_d3.

OK for trunk with that change, thanks.


>
>>
>>  libstdc++-v3/include/bits/chrono_io.h | 27 ++-
>>  1 file changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/bits/chrono_io.h 
>> b/libstdc++-v3/include/bits/chrono_io.h
>> index bcfd51b9866..d6bc6c7cf2a 100644
>> --- a/libstdc++-v3/include/bits/chrono_io.h
>> +++ b/libstdc++-v3/include/bits/chrono_io.h
>> @@ -873,6 +873,11 @@ namespace __format
>>static constexpr const _CharT* _S_minus_empty_spec = _S_chars + 17;
>>static constexpr const _CharT* _S_empty_spec = _S_chars + 18;
>>
>> +  [[__gnu__::__always_inline__]]
>> +  static _Runtime_format_string<_CharT>
>> +  _S_empty_fs()
>> +  { return _Runtime_format_string<_CharT>(_S_empty_spec); }
>> +
>>// Return the formatting locale.
>>template
>> std::locale
>> @@ -1411,7 +1416,7 @@ namespace __format
>> __i = 12;
>> }
>>   else if (__i >= 100) [[unlikely]]
>> -   return std::format_to(std::move(__out), _S_empty_spec, __i);
>> +   return std::format_to(std::move(__out), _S_empty_fs(), __i);
>>
>>   return __format::__write(std::move(__out), _S_two_digits(__i));
>> }
>> @@ -1425,11 +1430,15 @@ namespace __format
>>   {
>> // Decimal number of days, without padding.
>> auto __d = chrono::floor(__t._M_hours).count();
>> -   return std::format_to(std::move(__out), _S_empty_spec, __d);
>> +   return std::format_to(std::move(__out), _S_empty_fs(), __d);
>>   }
>>
>> - return std::format_to(std::move(__out), _GLIBCXX_WIDEN("{:03d}"),
>> -   __t._M_day_of_year.count());
>> + auto __d = __t._M_day_of_year.count();
>> + if (__d >= 1000) [[unlikely]]
>> +   return std::format_to(std::move(__out), _S_empty_fs(), __d);
>> +
>> + _CharT __buf[3];
>> + return __format::__write(std::move(__out), _S_str_d3(__buf, __d));
>> }
>>
>>template
>> @@ -1534,7 +1543,7 @@ namespace __format
>>
>>   if (__hi >= 100) [[unlikely]]
>> {
>> - __out = std::format_to(std::move(__out), _S_empty_spec, __hi);
>> + __out = std::format_to(std::move(__out), _S_empty_fs(), __hi);
>>   __sv.remove_prefix(2);
>> }
>>   else
>> @@ -1772,7 +1781,15 @@ namespace __format
>>{
>> if (__n < 100) [[likely]]
>>   return _S_two_digits(__n);
>> +return _S_str_d3(__buf, __n);
>> +  }
>>
>> +  [[__gnu__::__always_inline__]]
>> +  // Returns decimal representation of __n, padded to 3 digits.
>> +  // Returned string_view points to __buf.
>> +  static basic_string_view<_CharT>
>> +  _S_str_d3(span<_CharT, 3> __buf, unsigned __n)
>> +

Re: [patch,avr] Turn on LRA per default

2025-06-27 Thread Jeff Law





On 6/27/25 7:08 AM, Georg-Johann Lay wrote:

This turns on -mlra per default on avr.

Ok for trunk?

Yes, definitely.  The more soak time it gets the better IMHO.

jeff

RE: [PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread Li, Pan2

It seems the CI only picked up the last commit to run, judging from the Apply Status in
https://github.com/ewlu/gcc-precommit-ci/issues/3576#issuecomment-3012381157.

Is there any way we can retrigger the test somewhere? If not, I can send a v3
series with the commits reordered and see.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Friday, June 27, 2025 9:08 PM
To: Robin Dapp ; Li, Pan2 ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken ; Liu, Hongtao ; Robin Dapp 

Subject: Re: [PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to 
vssubu.vx on GR2VR cost

> OK.

Hmm, I'm still seeing test failures in the CI.  Could you check if those are 
valid?


-- 
Regards
 Robin

Re: [PATCH 4/8] libstdc++: Directly implement ranges::stable_sort [PR100795]

2025-06-27 Thread Jonathan Wakely


On 26/06/25 22:25 -0400, Patrick Palka wrote:

PR libstdc++/100795

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__move_merge): New,
based on the stl_algo.h implementation.
(__detail::__merge_sort_loop): Likewise.
(__detail::__chunk_insertion_sort): Likewise.
(__detail::__merge_sort_with_buffer): Likewise.
(__detail::__stable_sort_adaptive): Likewise.
(__detail::__stable_sort_adaptive_resize): Likewise.
(__detail::__inplace_stable_sort): Likewise.
(__stable_sort_fn::operator()): Reimplement in terms of the above.
* testsuite/25_algorithms/stable_sort/constrained.cc:
---
libstdc++-v3/include/bits/ranges_algo.h   | 207 +-
.../25_algorithms/stable_sort/constrained.cc  |  30 +++
2 files changed, 233 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index b0357600adbc..7dfd4e7ed64c 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -2388,6 +2388,170 @@ namespace ranges

  inline constexpr __sort_fn sort{};

+  namespace __detail
+  {
+/// This is a helper function for the __merge_sort_loop routines.
+template
+  _Out
+  __move_merge(_Iter __first1, _Iter __last1,
+  _Iter __first2, _Iter __last2,
+  _Out __result, _Comp __comp)
+  {
+   while (__first1 != __last1 && __first2 != __last2)
+ {
+   if (__comp(*__first2, *__first1))
+ {
+   *__result = ranges::iter_move(__first2);
+   ++__first2;
+ }
+   else
+ {
+   *__result = ranges::iter_move(__first1);
+   ++__first1;
+ }
+   ++__result;
+ }
+   return ranges::move(__first2, __last2,
+   ranges::move(__first1, __last1, __result).out).out;
+  }
+
+template
+  void
+  __merge_sort_loop(_Iter __first, _Iter __last, _Out __result,
+   _Distance __step_size, _Comp __comp)
+  {
+   const _Distance __two_step = 2 * __step_size;
+
+   while (__last - __first >= __two_step)
+ {
+   __result = __detail::__move_merge(__first, __first + __step_size,
+ __first + __step_size,
+ __first + __two_step,
+ __result, __comp);
+   __first += __two_step;
+ }
+   __step_size = ranges::min(_Distance(__last - __first), __step_size);
+
+   __detail::__move_merge(__first, __first + __step_size,
+  __first + __step_size, __last, __result, __comp);
+  }
+
+template
+  constexpr void
+  __chunk_insertion_sort(_Iter __first, _Iter __last,
+_Distance __chunk_size, _Compare __comp)
+  {
+   while (__last - __first >= __chunk_size)
+ {
+   __detail::__insertion_sort(__first, __first + __chunk_size, __comp);
+   __first += __chunk_size;
+ }
+   __detail::__insertion_sort(__first, __last, __comp);
+  }
+
+template
+  void
+  __merge_sort_with_buffer(_Iter __first, _Iter __last,
+  _Pointer __buffer, _Comp __comp)
+  {
+   using _Distance = iter_difference_t<_Iter>;
+
+   const _Distance __len = __last - __first;
+   const _Pointer __buffer_last = __buffer + ptrdiff_t(__len);
+
+   constexpr int __chunk_size = 7;
+   _Distance __step_size = __chunk_size;
+   __detail::__chunk_insertion_sort(__first, __last, __step_size, __comp);
+
+   while (__step_size < __len)
+ {
+   __detail::__merge_sort_loop(__first, __last, __buffer,
+   __step_size, __comp);
+   __step_size *= 2;
+   __detail::__merge_sort_loop(__buffer, __buffer_last, __first,
+   ptrdiff_t(__step_size), __comp);
+   __step_size *= 2;
+ }
+  }
+
+template
+  void
+  __merge_adaptive(_Iter __first, _Iter __middle, _Iter __last,
+  iter_difference_t<_Iter> __len1,
+  iter_difference_t<_Iter> __len2,
+  _Pointer __buffer, _Comp __comp); // defined near inplace_merge
+
+template
+  void
+  __merge_adaptive_resize(_Iter __first, _Iter __middle, _Iter __last,
+ _Distance __len1, _Distance __len2,
+ _Pointer __buffer, _Distance __buffer_size,
+ _Comp __comp); // defined near inplace_merge
+
+template
+  constexpr void
+  __merge_without_buffer(_Iter __first, _Iter __middle, _Iter __last,
+_Distance __len1, _Distance __len2,
+
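
The merge step that __move_merge performs in the quoted patch can be
sketched with plain standard-library pieces. This is a simplified
illustration, not the libstdc++ code: move_merge, item, and merge_by_key are
hypothetical names, and std::move stands in for ranges::iter_move and
ranges::move.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Merge two sorted ranges into result by moving elements. An element from
// the second range is taken only when it compares strictly less than the
// current element of the first range, which keeps the merge stable.
template <typename It, typename Out, typename Comp>
Out move_merge(It first1, It last1, It first2, It last2,
               Out result, Comp comp)
{
  while (first1 != last1 && first2 != last2)
    {
      if (comp(*first2, *first1))
        {
          *result = std::move(*first2);
          ++first2;
        }
      else
        {
          *result = std::move(*first1);
          ++first1;
        }
      ++result;
    }
  // At most one of the two tails is non-empty.
  return std::move(first2, last2, std::move(first1, last1, result));
}

// Key plus a tag recording which input the element came from.
using item = std::pair<int, char>;

// Merge two key-sorted vectors, comparing on the key only.
inline std::vector<item>
merge_by_key(std::vector<item> a, std::vector<item> b)
{
  std::vector<item> out(a.size() + b.size());
  move_merge(a.begin(), a.end(), b.begin(), b.end(), out.begin(),
             [](const item& x, const item& y) { return x.first < y.first; });
  return out;
}
```

With equal keys in both inputs, the element from the first input comes out
first, which is the property stable_sort relies on.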

Re: [PATCH 5/8] libstdc++: Directly implement ranges::stable_partition [PR100795]

2025-06-27 Thread Jonathan Wakely


On 26/06/25 22:25 -0400, Patrick Palka wrote:

PR libstdc++/100795

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__find_if_not_n): New,
based on the stl_algo.h implementation.
(__detail::__stable_partition_adaptive): Likewise.
(__stable_partition_fn::operator()): Reimplement in terms of
the above.
* testsuite/25_algorithms/stable_partition/constrained.cc
(test03): New test.
---
libstdc++-v3/include/bits/ranges_algo.h   | 106 +-
.../stable_partition/constrained.cc   |  26 +
2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 7dfd4e7ed64c..a9924cd9c49e 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3133,6 +3133,81 @@ namespace ranges
  inline constexpr __partition_fn partition{};

#if _GLIBCXX_HOSTED
+  namespace __detail
+  {
+/// Like find_if_not(), but uses and updates a count of the
+/// remaining range length instead of comparing against an end
+/// iterator.
+template
+  constexpr _Iter
+  __find_if_not_n(_Iter __first, _Distance& __len, _Pred __pred)
+  {
+   for (; __len; --__len,  (void) ++__first)
+ if (!__pred(*__first))
+   break;
+   return __first;
+  }
+
+template
+  _GLIBCXX26_CONSTEXPR
+  subrange<_Iter>
+  __stable_partition_adaptive(_Iter __first, _Sent __last,
+ _Pred __pred, _Distance __len,
+ _Pointer __buffer,
+ _Distance __buffer_size)
+  {
+   if (__len == 1)
+ return {__first, ranges::next(__first, 1)};
+
+   if (__len <= __buffer_size)
+ {
+   _Iter __result1 = __first;
+   _Pointer __result2 = __buffer;
+
+   // The precondition guarantees that !__pred(__first), so
+   // move that element to the buffer before starting the loop.
+   // This ensures that we only call __pred once per element.
+   *__result2 = ranges::iter_move(__first);
+   ++__result2;
+   ++__first;
+   for (; __first != __last; ++__first)
+ if (__pred(*__first))
+   {
+ *__result1 = ranges::iter_move(__first);
+ ++__result1;
+   }
+ else
+   {
+ *__result2 = ranges::iter_move(__first);
+ ++__result2;
+   }
+
+   ranges::move(__buffer, __result2, __result1);
+   return {__result1, __first};
+ }
+
+   _Iter __middle = __first;
+   ranges::advance(__middle, __len / 2);
+   _Iter __left_split
+ = __detail::__stable_partition_adaptive(__first, __middle, __pred,
+ __len / 2, __buffer,
+ __buffer_size).begin();
+
+   // Advance past true-predicate values to satisfy this
+   // function's preconditions.
+   _Distance __right_len = __len - __len / 2;
+   _Iter __right_split = __detail::__find_if_not_n(__middle, __right_len, __pred);
+
+   if (__right_len)
+ __right_split
+   = __detail::__stable_partition_adaptive(__right_split, __last, __pred,
+   __right_len, __buffer, __buffer_size).begin();
+
+   return ranges::rotate(__left_split, __middle, __right_split);
+  }
+  } // namespace __detail
+
  struct __stable_partition_fn
  {
template _Sent,
@@ -3144,11 +3219,32 @@ namespace ranges
  operator()(_Iter __first, _Sent __last,
 _Pred __pred, _Proj __proj = {}) const
  {
-   auto __lasti = ranges::next(__first, __last);
-   auto __middle
- = std::stable_partition(std::move(__first), __lasti,
- __detail::__make_pred_proj(__pred, __proj));
-   return {std::move(__middle), std::move(__lasti)};
+   auto __pred_proj = __detail::__make_pred_proj(__pred, __proj);
+   __first = ranges::find_if_not(__first, __last, __pred_proj);


Does this end up going through another layer of
invoke(pred, invoke(proj, *i)) inside ranges::find_if_not?
Hopefully with the recent _Pred_proj changes that will get inlined,
but is there any reason to not just use:

__first = ranges::find_if_not(__first, __last, __pred, __proj);

here, and then use __pred_proj for the __stable_partition_adaptive
calls below?
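
The two call styles in question can be compared in a small sketch. It uses
hand-written C++17 stand-ins rather than the ranges algorithms:
find_if_not_proj, make_pred_proj, identity_fn, and rec are hypothetical
names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for std::identity.
struct identity_fn
{
  template <typename T>
  constexpr T&& operator()(T&& t) const noexcept
  { return std::forward<T>(t); }
};

// Algorithm that applies a projection and then a predicate per element.
template <typename It, typename Pred, typename Proj = identity_fn>
It find_if_not_proj(It first, It last, Pred pred, Proj proj = {})
{
  for (; first != last; ++first)
    if (!std::invoke(pred, std::invoke(proj, *first)))
      break;
  return first;
}

// Fuse predicate and projection into one callable.
template <typename Pred, typename Proj>
auto make_pred_proj(Pred pred, Proj proj)
{
  return [pred, proj](auto&& x) -> bool {
    return std::invoke(pred, std::invoke(proj, std::forward<decltype(x)>(x)));
  };
}

struct rec { int key; };

inline bool is_even(int k) { return k % 2 == 0; }

// Style 1: pass predicate and projection separately.
inline std::size_t first_odd_direct(const std::vector<rec>& v)
{
  auto it = find_if_not_proj(v.begin(), v.end(), is_even, &rec::key);
  return static_cast<std::size_t>(it - v.begin());
}

// Style 2: pass the fused callable; the algorithm then wraps it in one
// more invoke layer with the identity projection.
inline std::size_t first_odd_composed(const std::vector<rec>& v)
{
  auto it = find_if_not_proj(v.begin(), v.end(),
                             make_pred_proj(is_even, &rec::key));
  return static_cast<std::size_t>(it - v.begin());
}
```

Both styles find the same element; the difference is only the extra layer
of indirection that the optimizer has to see through in the second one.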


+
+   if (__first == __last)
+ return {__first, __first};
+
+   using _DistanceType = iter_difference_t<_Iter>;
+   const _DistanceType __len = ranges::distance(__first, __last);
+
+#if __glibcxx_constexpr_algorithms >= 202306L // >= C++26
+   if consteval {
+ // Simulate a _Temporary_buffer of length 1:
+ iter_value_t<

Re: [PATCH 5/8] libstdc++: Directly implement ranges::stable_partition [PR100795]

2025-06-27 Thread Jonathan Wakely


On 26/06/25 22:25 -0400, Patrick Palka wrote:

PR libstdc++/100795

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__find_if_not_n): New,
based on the stl_algo.h implementation.
(__detail::__stable_partition_adaptive): Likewise.
(__stable_partition_fn::operator()): Reimplement in terms of
the above.
* testsuite/25_algorithms/stable_partition/constrained.cc
(test03): New test.
---
libstdc++-v3/include/bits/ranges_algo.h   | 106 +-
.../stable_partition/constrained.cc   |  26 +
2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 7dfd4e7ed64c..a9924cd9c49e 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3133,6 +3133,81 @@ namespace ranges
  inline constexpr __partition_fn partition{};

#if _GLIBCXX_HOSTED
+  namespace __detail
+  {
+/// Like find_if_not(), but uses and updates a count of the
+/// remaining range length instead of comparing against an end
+/// iterator.


Just '//' comments here I think. We don't want Doxygen to think it
needs to document this, do we?


+template
+  constexpr _Iter
+  __find_if_not_n(_Iter __first, _Distance& __len, _Pred __pred)
+  {
+   for (; __len; --__len,  (void) ++__first)
+ if (!__pred(*__first))
+   break;
+   return __first;
+  }
+
+template
+  _GLIBCXX26_CONSTEXPR
+  subrange<_Iter>
+  __stable_partition_adaptive(_Iter __first, _Sent __last,
+ _Pred __pred, _Distance __len,
+ _Pointer __buffer,
+ _Distance __buffer_size)
+  {
+   if (__len == 1)
+ return {__first, ranges::next(__first, 1)};
+
+   if (__len <= __buffer_size)
+ {
+   _Iter __result1 = __first;
+   _Pointer __result2 = __buffer;
+
+   // The precondition guarantees that !__pred(__first), so
+   // move that element to the buffer before starting the loop.
+   // This ensures that we only call __pred once per element.
+   *__result2 = ranges::iter_move(__first);
+   ++__result2;
+   ++__first;
+   for (; __first != __last; ++__first)
+ if (__pred(*__first))
+   {
+ *__result1 = ranges::iter_move(__first);
+ ++__result1;
+   }
+ else
+   {
+ *__result2 = ranges::iter_move(__first);
+ ++__result2;
+   }
+
+   ranges::move(__buffer, __result2, __result1);
+   return {__result1, __first};
+ }
+
+   _Iter __middle = __first;
+   ranges::advance(__middle, __len / 2);
+   _Iter __left_split
+ = __detail::__stable_partition_adaptive(__first, __middle, __pred,
+ __len / 2, __buffer,
+ __buffer_size).begin();
+
+   // Advance past true-predicate values to satisfy this
+   // function's preconditions.
+   _Distance __right_len = __len - __len / 2;
+   _Iter __right_split = __detail::__find_if_not_n(__middle, __right_len, 
__pred);
+
+   if (__right_len)
+ __right_split
+   = __detail::__stable_partition_adaptive(__right_split, __last, 
__pred,
+   __right_len, __buffer, 
__buffer_size).begin();
+
+   return ranges::rotate(__left_split, __middle, __right_split);
+  }
+  } // namespace __detail
+
  struct __stable_partition_fn
  {
template _Sent,
@@ -3144,11 +3219,32 @@ namespace ranges
  operator()(_Iter __first, _Sent __last,
 _Pred __pred, _Proj __proj = {}) const
  {
-   auto __lasti = ranges::next(__first, __last);
-   auto __middle
- = std::stable_partition(std::move(__first), __lasti,
- __detail::__make_pred_proj(__pred, __proj));
-   return {std::move(__middle), std::move(__lasti)};
+   auto __pred_proj = __detail::__make_pred_proj(__pred, __proj);
+   __first = ranges::find_if_not(__first, __last, __pred_proj);
+
+   if (__first == __last)
+ return {__first, __first};
+
+   using _DistanceType = iter_difference_t<_Iter>;
+   const _DistanceType __len = ranges::distance(__first, __last);
+
+#if __glibcxx_constexpr_algorithms >= 202306L // >= C++26
+   if consteval {
+ // Simulate a _Temporary_buffer of length 1:
+ iter_value_t<_Iter> __buf = ranges::iter_move(__first);
+ *__first = std::move(__buf);
+ return __detail::__stable_partition_adaptive(__first, __last,
+  __pred_proj,
+

Re: [PATCH 6/8] libstdc++: Directly implement ranges::nth_element [PR100795]

2025-06-27 Thread Jonathan Wakely


On 26/06/25 22:25 -0400, Patrick Palka wrote:

PR libstdc++/100795

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__introselect): New,
based on the stl_algo.h implementation.
(nth_element_fn::operator()): Reimplement in terms of the above.
* testsuite/25_algorithms/nth_element/constrained.cc:


OK for trunk.



---
libstdc++-v3/include/bits/ranges_algo.h   | 47 +--
.../25_algorithms/nth_element/constrained.cc  | 31 
2 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index a9924cd9c49e..b12da2af1263 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -2805,6 +2805,33 @@ namespace ranges

  inline constexpr __is_sorted_fn is_sorted{};

+  namespace __detail
+  {
+template
+  constexpr void
+  __introselect(_Iter __first, _Iter __nth, _Iter __last,
+   iter_difference_t<_Iter> __depth_limit, _Comp __comp)
+  {
+   while (__last - __first > 3)
+ {
+   if (__depth_limit == 0)
+ {
+   __detail::__heap_select(__first, __nth + 1, __last, __comp);
+   // Place the nth largest element in its final position.
+   ranges::iter_swap(__first, __nth);
+   return;
+ }
+   --__depth_limit;
+   _Iter __cut = __detail::__unguarded_partition_pivot(__first, 
__last, __comp);
+   if (__cut <= __nth)
+ __first = __cut;
+   else
+ __last = __cut;
+ }
+   __detail::__insertion_sort(__first, __last, __comp);
+  }
+  } // namespace __detail
+
  struct __nth_element_fn
  {
template _Sent,
@@ -2814,11 +2841,21 @@ namespace ranges
  operator()(_Iter __first, _Iter __nth, _Sent __last,
 _Comp __comp = {}, _Proj __proj = {}) const
  {
-   auto __lasti = ranges::next(__first, __last);
-   _GLIBCXX_STD_A::nth_element(std::move(__first), std::move(__nth),
-   __lasti,
-   __detail::__make_comp_proj(__comp, __proj));
-   return __lasti;
+   if constexpr (!same_as<_Iter, _Sent>)
+ return (*this)(__first, __nth, ranges::next(__first, __last),
+std::move(__comp), std::move(__proj));
+   else
+ {
+   if (__first == __last || __nth == __last)
+ return __last;
+
+   auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
+   auto __n = __detail::__to_unsigned_like(__last - __first);
+   __detail::__introselect(__first, __nth, __last,
+   std::__bit_width(__n) * 2,
+   __comp_proj);
+   return __last;
+ }
  }

template
#include 
+#include 
#include 
#include 

@@ -67,9 +68,39 @@ test02()
  return x[3] == 4;
}

+constexpr bool
+test03()
+{
+  // PR libstdc++/100795 - ranges::sort should not use std::sort directly
+#if __SIZEOF_INT128__
+  auto v = std::views::iota(__int128(0), __int128(20));
+#else
+  auto v = std::views::iota(0ll, 20ll);
+#endif
+
+  int storage[20] = {2,5,4,3,1,6,7,9,10,8,11,14,12,13,15,16,18,0,19,17};
+  auto w = v | std::views::transform([&](auto i) -> int& { return storage[i]; 
});
+  using type = decltype(w);
+  using cat = 
std::iterator_traits>::iterator_category;
+  static_assert( std::same_as );
+  static_assert( std::ranges::random_access_range );
+
+  ranges::nth_element(w, w.begin() + 10);
+  VERIFY( w[10] == 10 );
+
+  ranges::nth_element(w, w.begin() + 5, std::ranges::greater{});
+  VERIFY( w[5] == 19 - 5 );
+
+  ranges::nth_element(w, w.begin() + 15, std::ranges::greater{}, 
std::negate{});
+  VERIFY( w[15] == 15 );
+
+  return true;
+}
+
int
main()
{
  test01();
  static_assert(test02());
+  static_assert(test03());
}
--
2.50.0.131.gcf6f63ea6b

Re: Remove early inlining from afdo pass

2025-06-27 Thread Jan Hubicka

Hi,
> 
> We can look into this. We do compare manually the IR dumps from both and it 
> is not ideal.
> What we should do is an additional (optional) pass that runs after 
> auto-profile to compare the annotations 
> using the profile-use. We will have to filter out any functions/path that 
> runs less than a threshold to reduce noise.
> Functions that are fully inlined are also not having any profile. 

With -fprofile-use there is already logic to compare static profile
prediction heuristics with the real data.  This is implemented by
running the static profile predictor within the profile pass if
-fdump-ipa-profile-details is enabled and dumping statistics which can
then be summarized by the contrib/analyze_brprob.py script.

We could use the same strategy here.
combine_predictions_for_bb can dump branch probabilities with AFDO
quality as a fake predictor, which will give us an idea about the coverage
of AFDO data and its quality.

We probably want to compare the AFDO counts to real counts and quantify
how far they are from precise data.  I would start by remembering info
about basic blocks being hot/cold by AFDO and dumping info when real data
show that a block is hot even if its count is divided by, say, 1000 and
AFDO still marked it cold.
Grepping the dump file can then point us to the most important profile
problems.  Eventually we can get something more fancy...
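
As a rough illustration of the check described above, here is a standalone
sketch. Nothing in it is GCC code: BlockProfile, afdo_mismatches, the
hot_threshold parameter and the default slack of 1000 are all hypothetical
names chosen for the example, and a real implementation would use the
compiler's own profile_count and hotness predicates.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-basic-block record; not a GCC data structure.
struct BlockProfile
{
  int index;                 // basic block index
  std::uint64_t real_count;  // count from instrumented (real) profile
  bool afdo_cold;            // whether AFDO classified the block as cold
};

// Return the indices of blocks that AFDO marked cold although the real
// count is still at or above hot_threshold after dividing it by slack.
inline std::vector<int>
afdo_mismatches(const std::vector<BlockProfile>& blocks,
                std::uint64_t hot_threshold,
                std::uint64_t slack = 1000)
{
  std::vector<int> result;
  for (const BlockProfile& bb : blocks)
    if (bb.afdo_cold && bb.real_count / slack >= hot_threshold)
      result.push_back(bb.index);
  return result;
}
```

Each reported index would correspond to one line in the dump file, so that
grepping for those lines lists the worst disagreements first.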

With -fno-auto-profile-inlining it should be possible to produce two
binaries: one with -g for auto-profile and one with -fprofile-generate,
and then load both profiles into the compiler with -fauto-profile=xxx
-fprofile-use.
I think in this case reading the FDO profile should just overwrite the
auto-profile, giving us a place to dump stats.

Honza
> 
> Thanks,
> Kugan
>  
> 
> > 
> > Honza
> 
>

Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-27 Thread Richard Sandiford

Richard Biener  writes:
> On Fri, 27 Jun 2025, Richard Biener wrote:
>
>> On Thu, 26 Jun 2025, Richard Sandiford wrote:
>> 
>> > Richard Biener  writes:
>> > > The following fixes the computation of supports_partial_vectors which
>> > > is used to prune the set of modes to iterate over for epilog
>> > > vectorization.  The used partial_vectors_supported_p predicate
>> > > only looks for while_ult while also support predication when
>> > > mask modes are integer modes as for AVX512.
>> > >
>> > > I've noticed this isn't very effective on x86_64 anyway since
>> > > if the main loop mode is autodetected we skip re-analyzing
>> > > mode_i == 0, but then mode_i == 1 is usually the very same
>> > > large mode.
>> > >
>> > > Thus I do wonder if we should instead always (or when
>> > > --param vect-partial-vector-usage != 0, or when the target would
>> > > support predication in principle) perform main loop analysis
>> > > with partial vectors in mind (start with can_use_partial_vectors_p =
>> > > true), but only at the end honor the --param when deciding on
>> > > using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
>> > > for each analyzed mode and use that more specific info for the
>> > > pruning?
>> > 
>> > Yeah, sounds like that could work.  In principle, epilogue loops should
>> > be strictly easier to vectorise than main loops.  If you know that the
>> > epilogue "loop" never iterates, there could in principle be cases
>> > where we'd need to clear can_use_partial_vectors_p for the main loop
>> > but not for the epilogue loop.  I can't think of any situation like
>> > that off-hand though.  Likewise for unrolling.
>> 
>> So we already do analyze the main loop for partial vector usage when
>> --param vect-partial-vector-usage != 0, so for the purpose of
>> pruning epilogue analysis we should be able to use
>> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
>> 
>> As you say there might in theory be corner cases, like when
>> applying a suggested unroll factor to the main loop.  I can't
>> think of a reason for when we don't, so we can in principle
>> just remember the analysis result without if required.
>> 
>> But basically it would be like below, I'll post this separately
>> again so the CI can pick it up.
>> 
>> Would that be OK as-is or do you think we should be looking
>> to deal with the unrolled main loop case preventively?
>
> It's easy enough to do, like with the following.  So that's what
> I'm going to test.

Argh!  Sorry, just realised that this won't work for AArch64 after all.

LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P doesn't include information about
whether the loop control is supported (good), but it does still contain
information about whether masking is supported for individual operations,
with that information being specific to the current vector mode,
rather than being a general statement about the target as a whole.

So I think this would have the effect of preventing SVE epilogue loops
for Advanced SIMD main loops, which is something we currently support.
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P would be false for any nontrivial
Advanced SIMD loop due to the lack of masked load/store support.

Richard

>
> Richard.
>
> From b0ae2522e8ddb3381e7e22995c0ce3e700c53755 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Thu, 26 Jun 2025 11:08:04 +0200
> Subject: [PATCH] Fixup partial_vectors_supported_p use
> To: gcc-patches@gcc.gnu.org
>
> The following fixes the computation of supports_partial_vectors which
> is used to prune the set of modes to iterate over for epilog
> vectorization.  The used partial_vectors_supported_p predicate
> only looks for while_ult while also support predication when
> mask modes are integer modes as for AVX512.
>
> I've noticed this isn't very effective on x86_64 anyway since
> if the main loop mode is autodetected we skip re-analyzing
> mode_i == 0, but then mode_i == 1 is usually the very same
> large mode.  This is fixed by the next patch.
>
> The following simplifies the logic by simply re-using the
> already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
> the main loop to decide whether we can possibly use partial
> vectors for the epilogue (for the case of having the same VF).
> We remember the main loop analysis before a suggested unroll
> factor is applied to avoid possible differences from that.
>
>   * tree-vect-loop.cc (vect_analyze_loop_1): New parameter
>   to output whether the not unrolled loop can use partial
>   vectors.
>   (vect_analyze_loop): Use the main loop partial vector
>   analysis result to decide if epilogues with the same VF
>   can use partial vectors.
> ---
>  gcc/tree-vect-loop.cc | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index c824b5abaaf..fa022dfad42 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3474,7 +3474,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info

Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-27 Thread Richard Biener

On Fri, 27 Jun 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 27 Jun 2025, Richard Biener wrote:
> >
> >> On Thu, 26 Jun 2025, Richard Sandiford wrote:
> >> 
> >> > Richard Biener  writes:
> >> > > The following fixes the computation of supports_partial_vectors which
> >> > > is used to prune the set of modes to iterate over for epilog
> >> > > vectorization.  The used partial_vectors_supported_p predicate
> >> > > only looks for while_ult while also support predication when
> >> > > mask modes are integer modes as for AVX512.
> >> > >
> >> > > I've noticed this isn't very effective on x86_64 anyway since
> >> > > if the main loop mode is autodetected we skip re-analyzing
> >> > > mode_i == 0, but then mode_i == 1 is usually the very same
> >> > > large mode.
> >> > >
> >> > > Thus I do wonder if we should instead always (or when
> >> > > --param vect-partial-vector-usage != 0, or when the target would
> >> > > support predication in principle) perform main loop analysis
> >> > > with partial vectors in mind (start with can_use_partial_vectors_p =
> >> > > true), but only at the end honor the --param when deciding on
> >> > > using_partial_vectors_p.  We can then remember 
> >> > > can_use_partial_vectors_p
> >> > > for each analyzed mode and use that more specific info for the
> >> > > pruning?
> >> > 
> >> > Yeah, sounds like that could work.  In principle, epilogue loops should
> >> > be strictly easier to vectorise than main loops.  If you know that the
> >> > epilogue "loop" never iterates, there could in principle be cases
> >> > where we'd need to clear can_use_partial_vectors_p for the main loop
> >> > but not for the epilogue loop.  I can't think of any situation like
> >> > that off-hand though.  Likewise for unrolling.
> >> 
> >> So we already do analyze the main loop for partial vector usage when
> >> --param vect-partial-vector-usage != 0, so for the purpose of
> >> pruning epilogue analysis we should be able to use
> >> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
> >> 
> >> As you say there might in theory be corner cases, like when
> >> applying a suggested unroll factor to the main loop.  I can't
> >> think of a reason for when we don't, so we can in principle
> >> just remember the analysis result without if required.
> >> 
> >> But basically it would be like below, I'll post this separately
> >> again so the CI can pick it up.
> >> 
> >> Would that be OK as-is or do you think we should be looking
> >> to deal with the unrolled main loop case preventively?
> >
> > It's easy enough to do, like with the following.  So that's what
> > I'm going to test.
> 
> Argh!  Sorry, just realised that this won't work for AArch64 after all.
> 
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P doesn't include information about
> whether the loop control is supported (good), but it does still contain
> information about whether masking is supported for individual operations,
> with that information being specific to the current vector mode,
> rather than being a general statement about the target as a whole.
> 
> So I think this would have the effect of preventing SVE epilogue loops
> for Advanced SIMD main loops, which is something we currently support.
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P would be false for any nontrivial
> Advanced SIMD loop due to the lack of masked load/store support.

Oh, I see.  I'll go with the original fix for this then.

Richard.

> Richard
> 
> >
> > Richard.
> >
> > From b0ae2522e8ddb3381e7e22995c0ce3e700c53755 Mon Sep 17 00:00:00 2001
> > From: Richard Biener 
> > Date: Thu, 26 Jun 2025 11:08:04 +0200
> > Subject: [PATCH] Fixup partial_vectors_supported_p use
> > To: gcc-patches@gcc.gnu.org
> >
> > The following fixes the computation of supports_partial_vectors which
> > is used to prune the set of modes to iterate over for epilog
> > vectorization.  The used partial_vectors_supported_p predicate
> > only looks for while_ult while also support predication when
> > mask modes are integer modes as for AVX512.
> >
> > I've noticed this isn't very effective on x86_64 anyway since
> > if the main loop mode is autodetected we skip re-analyzing
> > mode_i == 0, but then mode_i == 1 is usually the very same
> > large mode.  This is fixed by the next patch.
> >
> > The following simplifies the logic by simply re-using the
> > already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
> > the main loop to decide whether we can possibly use partial
> > vectors for the epilogue (for the case of having the same VF).
> > We remember the main loop analysis before a suggested unroll
> > factor is applied to avoid possible differences from that.
> >
> > * tree-vect-loop.cc (vect_analyze_loop_1): New parameter
> > to output whether the not unrolled loop can use partial
> > vectors.
> > (vect_analyze_loop): Use the main loop partial vector
> > analysis result to decide if epilogues with the same VF
> > can use partial vectors.
> > ---
> >  gcc/tree-vect-loop.cc |

Re: [PATCH] RISC-V: Refactor the function bitmap_union_of_preds_with_entry

2025-06-27 Thread Jin Ma

On Tue, 24 Jun 2025 14:05:54 +0200, "Robin Dapp" wrote:
> Hi Ma Jin,
> 
> thanks for looking into this, it has been on my todo list with very low 
> priority since the vsetvl rewrite.

Yes, I've noticed this for quite some time. While the logic itself is sound,
it strikes me as quite odd every time I see it, and I consistently receive a
warning during the Coverity checks. As a result, I decided to refactor it :)

> > +  /* Handle case with no predecessors (including ENTRY block).  */
> > +  if (EDGE_COUNT (b->preds) == 0)
> >  {
> > -  e = EDGE_PRED (b, ix);
> > -  bitmap_copy (dst, src[e->src->index]);
> > -  break;
> > +  bitmap_clear (dst);
> > +  return;
> 
> This is ok.
> 
> > +  /* Initialize with first predecessor's bitmap.  */
> > +  edge first_pred = EDGE_PRED (b, 0);
> > +  bitmap_copy (dst, src[first_pred->src->index]);
> > +
> > +  /* Union remaining predecessors' bitmaps.  */
> > +  for (unsigned ix = 1; ix < EDGE_COUNT (b->preds); ix++)
> > +{
> > +  edge e = EDGE_PRED (b, ix);
> > +  const sbitmap pred_src = src[e->src->index];
> > +
> > +  /* Perform bitmap OR operation element-wise.  */
> > +  for (unsigned i = 0; i < dst->size; i++)
> > +   dst->elms[i] |= pred_src->elms[i];
> > +}
> 
> To my taste this could be simplified further like
> 
>   FOR_EACH_EDGE (e, ei, b->preds)
> {
>   if (ei.index == 0)
>   {
> bitmap_copy (dst, src[e->src->index]);
> continue;
>   }
> 
>   bitmap_ior (dst, dst, src[e->src->index]);
> }
> 
> Does that work as well?

That's an excellent suggestion! I'll make the necessary adjustments and 
resubmit shortly. Thanks.
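
The copy-first-then-OR shape suggested above can be sketched outside of GCC
as follows. This is only an illustration under stated assumptions: Bitmap is
a std::bitset standing in for sbitmap, predecessors are given as plain
indices instead of edges, and union_of_preds is a hypothetical name, not the
function in the RISC-V backend.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

using Bitmap = std::bitset<64>;  // stand-in for GCC's sbitmap

// Union the bitmaps of all predecessor blocks.  preds holds the indices
// of the predecessors; src holds one bitmap per block.
inline Bitmap union_of_preds(const std::vector<std::size_t>& preds,
                             const std::vector<Bitmap>& src)
{
  Bitmap dst;         // all bits clear, which covers the no-predecessor case
  bool first = true;
  for (std::size_t p : preds)
    {
      if (first)
        {
          dst = src[p];   // first predecessor: plain copy
          first = false;
          continue;
        }
      dst |= src[p];      // remaining predecessors: bitwise OR
    }
  return dst;
}
```

With per-block bitmaps 0b0011, 0b0100 and 0b1000, the union over
predecessors 0 and 2 is 0b1011, and an empty predecessor list gives an empty
bitmap.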

Best regards,
Jin Ma

> -- 
> Regards
>  Robin

Re: [PATCH v3 6/6] LoongArch: Add support for _BitInt [PR117599]

2025-06-27 Thread Yang Yujie

On Fri, Jun 27, 2025 at 10:00:00AM GMT, Jakub Jelinek wrote:
> On Fri, Jun 27, 2025 at 03:33:35PM +0800, Yang Yujie wrote:
> > @@ -11291,6 +11296,13 @@ expand_expr_real_1 (tree exp, rtx target, 
> > machine_mode tmode,
> >type = TREE_TYPE (exp);
> >mode = TYPE_MODE (type);
> >unsignedp = TYPE_UNSIGNED (type);
> > +  if (bitint_extended == -1 && TREE_CODE (type) == BITINT_TYPE)
> 
> Please swap the && operands, so
>   if (TREE_CODE (type) == BITINT_TYPE && bitint_extended == -1)
> because not being BITINT_TYPE is going to be far more common.
> 
> Ok for trunk with that nit fixed.
> 
>   Jakub

On Fri, Jun 27, 2025 at 10:02:50AM GMT, Jakub Jelinek wrote:
> Ok for trunk.  Though, a bitintext.h infrastructure test without atomics
> that FAILs without this change and passes with it would be really
> appreciated.
> 
>   Jakub

On Fri, Jun 27, 2025 at 10:09:34AM GMT, Jakub Jelinek wrote:
> Given the tf stuff here, shouldn't
> __fixtfbitint and __floatbitinttf be in the export list next to sf/df?
> 
>   Jakub

Ok, I will get these fixed.  Thank you for the review.

Yujie

Re: [PATCH v4 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Tomasz Kaminski

On Fri, Jun 27, 2025 at 11:06 AM Luc Grosheintz 
wrote:

> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (default_accessor): New class.
> * src/c++23/std.cc.in: Register default_accessor.
> * testsuite/23_containers/mdspan/accessors/default.cc: New test.
> * testsuite/23_containers/mdspan/accessors/default_neg.cc: New
> test.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan   | 31 
>  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
>  .../23_containers/mdspan/accessors/default.cc | 72 +++
>  .../mdspan/accessors/default_neg.cc   | 23 ++
>  4 files changed, 128 insertions(+), 1 deletion(-)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 6dc2441f80b..c72a64094b7 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>[[no_unique_address]] _S_strides_t _M_strides;
>  };
>
> +  template
> +struct default_accessor
> +{
> +  static_assert(!is_array_v<_ElementType>,
> +   "ElementType must not be an array type");
> +  static_assert(!is_abstract_v<_ElementType>,
> +   "ElementType must not be an abstract class type");
> +
> +  using offset_policy = default_accessor;
> +  using element_type = _ElementType;
> +  using reference = element_type&;
> +  using data_handle_type = element_type*;
> +
> +  constexpr
> +  default_accessor() noexcept = default;
> +
> +  template
> +   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
> +   constexpr
> +   default_accessor(default_accessor<_OElementType>) noexcept
> +   { }
> +
> +  constexpr reference
> +  access(data_handle_type __p, size_t __i) const noexcept
> +  { return __p[__i]; }
> +
> +  constexpr data_handle_type
> +  offset(data_handle_type __p, size_t __i) const noexcept
> +  { return __p + __i; }
> +};
> +
>  _GLIBCXX_END_NAMESPACE_VERSION
>  }
>  #endif
> diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/
> std.cc.in
> index 9336118f5d9..e692caaa5f9 100644
> --- a/libstdc++-v3/src/c++23/std.cc.in
> +++ b/libstdc++-v3/src/c++23/std.cc.in
> @@ -1850,7 +1850,8 @@ export namespace std
>using std::layout_left;
>using std::layout_right;
>using std::layout_stride;
> -  // FIXME layout_left_padded, layout_right_padded, default_accessor and
> mdspan
> +  using std::default_accessor;
> +  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and
> mdspan
>  }
>  #endif
>
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> new file mode 100644
> index 000..ecccda2b68e
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> @@ -0,0 +1,72 @@
> +// { dg-do run { target c++23 } }
> +#include 
> +
> +#include 
> +
> +constexpr size_t dyn = std::dynamic_extent;
> +
> +template
> +  constexpr void
> +  test_accessor_policy()
> +  {
> +static_assert(std::copyable);
> +static_assert(std::is_nothrow_move_constructible_v);
> +static_assert(std::is_nothrow_move_assignable_v);
> +static_assert(std::is_nothrow_swappable_v);
> +  }
> +
> +constexpr bool
> +test_access()
> +{
> +  std::default_accessor accessor;
> +  std::array a{10, 11, 12, 13, 14};
> +  VERIFY(accessor.access(a.data(), 0) == 10);
> +  VERIFY(accessor.access(a.data(), 4) == 14);
> +  return true;
> +}
> +
> +constexpr bool
> +test_offset()
> +{
> +  std::default_accessor accessor;
> +  std::array a{10, 11, 12, 13, 14};
> +  VERIFY(accessor.offset(a.data(), 0) == a.data());
> +  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
> +  return true;
> +}
> +
> +class Base
> +{ };
> +
> +class Derived : public Base
> +{ };
> +
> +constexpr void
> +test_ctor()
> +{
> +
> static_assert(std::is_nothrow_constructible_v,
> +
>  std::default_accessor>);
>
Hi, sorry for being unclear before, and resulting in another patch.
I would like to see a positive test case showing that const-adjustments are
allowed, i.e.:
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
And similar for Derived. This is important, as it allows passing mdspan
to function accepting mdspan.

> +  static_assert(std::is_convertible_v,
> + std::default_accessor>);
> +  static_assert(!std::is_constructible_v,
> +std::default_accessor>);
> +  static_assert(!std::is_constructible_v,
> +std::default_accessor int>>);
> +  static_assert(!std::is_constructible_v,
> +

[PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li

From: Pan Li 

This patch series introduces the combine of vec_dup + vssubu.vv into
vssubu.vx, depending on the cost value of GR2VR.  The late-combine pass
performs the combine if the cost of GR2VR is zero, and rejects it if the
cost is non-zero, like 1, 2 or 15 in the tests.  There are two cases for
the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vssubu.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vssubu.vv
 |   J L1
 |   ...

Both are combined into the following if the cost of GR2VR is zero.
 |   ...
 | L1:
 |   vssubu.vx
 |   J L1
 |   ...
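
For reference, the scalar operation behind vssubu is unsigned saturating
subtraction. The sketch below reuses the branchless formula from the
DEF_SAT_U_SUB macro in vx_binary.h quoted earlier in this thread; the
function names sat_u_sub and vec_sat_u_sub_scalar and the exact loop shape
are assumptions made for illustration, not the test macros themselves. The
loop subtracts one loop-invariant scalar from every element, which is the
pattern where a broadcast followed by vssubu.vv could become vssubu.vx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned saturating subtract: a - b, clamped to 0 when a < b.
// The mask is all ones when a >= b and zero otherwise.
template<typename T>
  constexpr T
  sat_u_sub(T a, T b)
  {
    return static_cast<T>((a - b) & (-(T)(a >= b)));
  }

// Subtract the same scalar x from each element with saturation.
template<typename T>
  void
  vec_sat_u_sub_scalar(T* out, const T* in, T x, std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
      out[i] = sat_u_sub(in[i], x);
  }

static_assert(sat_u_sub<std::uint8_t>(5, 3) == 2);
static_assert(sat_u_sub<std::uint8_t>(3, 5) == 0);
```

Whether the compiler actually emits vssubu.vx for such a loop depends on the
target options and on the GR2VR cost discussed above.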

The following test suites pass with this patch series:
* The full rv64gcv regression test.

Pan Li (4):
  RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 
0, 2 and 15
  RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 
0, 1 and 2
  RISC-V: Reconcile the existing test due to cost model change

 gcc/config/riscv/riscv-v.cc   |   1 +
 gcc/config/riscv/riscv.cc |   1 +
 gcc/config/riscv/vector-iterators.md  |   2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u16.c   |   2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u32.c   |   2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u8.c|   2 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  18 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u16.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u32.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u64.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u8.c |  17 ++
 36 files changed, 323 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c

-- 
2.43.0

[PATCH v2 1/7] RISC-V: Add basic XAndes vendor extension support.

2025-06-27 Thread KuanLin Chen

Hi Jeff,

> Just a nit.  In several places you need to replace
> "UPPERCAE_NAME" with "UPPERCASE_NAME".

Fixed. Thanks for your review.

This is a patch series for the Andes vendor extensions of RISC-V.
These patches were tested with the riscv-gnu-toolchain gcc/g++ testsuite,
and the report is the same as without these patches.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
     rv64gc/  lp64d/ medlow |     26 / 13  |     14 /  6  |      -       |
This patch adds basic support for the following XAndes ISA extensions:

XANDESPERF
XANDESBFHCVT
XANDESVBFHCVT
XANDESVSINTLOAD
XANDESVPACKFPH
XANDESVDOT

gcc/ChangeLog:

* config/riscv/riscv-ext.def: Include riscv-ext-andes.def.
* config/riscv/riscv-ext.opt (riscv_xandes_subext): New variable.
(XANDESPERF) : New mask.
(XANDESBFHCVT): Ditto.
(XANDESVBFHCVT): Ditto.
(XANDESVSINTLOAD): Ditto.
(XANDESVPACKFPH): Ditto.
(XANDESVDOT): Ditto.
* config/riscv/t-riscv: Add riscv-ext-andes.def.
* doc/riscv-ext.texi: Regenerated.
* config/riscv/riscv-ext-andes.def: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandes-predef-1.c: New test.
* gcc.target/riscv/xandes-predef-2.c: New test.
* gcc.target/riscv/xandes-predef-3.c: New test.
* gcc.target/riscv/xandes-predef-4.c: New test.
* gcc.target/riscv/xandes-predef-5.c: New test.
* gcc.target/riscv/xandes-predef-6.c: New test.
From 8643c4f10187b7ddd166bde3626330467146ac5c Mon Sep 17 00:00:00 2001
From: Kuan-Lin Chen 
Date: Mon, 31 Mar 2025 15:27:41 +0800
Subject: [PATCH 1/7] RISC-V: Add basic XAndes vendor extension support.

This patch adds basic support for the following XAndes ISA extensions:

XANDESPERF
XANDESBFHCVT
XANDESVBFHCVT
XANDESVSINTLOAD
XANDESVPACKFPH
XANDESVDOT

gcc/ChangeLog:

	* config/riscv/riscv-ext.def: Include riscv-ext-andes.def.
	* config/riscv/riscv-ext.opt (riscv_xandes_subext): New variable.
	(XANDESPERF) : New mask.
	(XANDESBFHCVT): Ditto.
	(XANDESVBFHCVT): Ditto.
	(XANDESVSINTLOAD): Ditto.
	(XANDESVPACKFPH): Ditto.
	(XANDESVDOT): Ditto.
	* config/riscv/t-riscv: Add riscv-ext-andes.def.
	* doc/riscv-ext.texi: Regenerated.
	* config/riscv/riscv-ext-andes.def: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xandes-predef-1.c: New test.
	* gcc.target/riscv/xandes-predef-2.c: New test.
	* gcc.target/riscv/xandes-predef-3.c: New test.
	* gcc.target/riscv/xandes-predef-4.c: New test.
	* gcc.target/riscv/xandes-predef-5.c: New test.
	* gcc.target/riscv/xandes-predef-6.c: New test.

Co-author: Lino Hsing-Yu Peng (linopeng@andestech.com),
	   Kai Kai-Yi Weng (kaiweng@andestech.com).
---
 gcc/config/riscv/riscv-ext-andes.def  | 100 ++
 gcc/config/riscv/riscv-ext.def|   1 +
 gcc/config/riscv/riscv-ext.opt|  15 +++
 gcc/config/riscv/t-riscv  |   3 +-
 gcc/doc/riscv-ext.texi|  24 +
 .../gcc.target/riscv/xandes-predef-1.c|  14 +++
 .../gcc.target/riscv/xandes-predef-2.c|  14 +++
 .../gcc.target/riscv/xandes-predef-3.c|  14 +++
 .../gcc.target/riscv/xandes-predef-4.c|  14 +++
 .../gcc.target/riscv/xandes-predef-5.c|  14 +++
 .../gcc.target/riscv/xandes-predef-6.c|  14 +++
 11 files changed, 226 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-ext-andes.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-6.c

diff --git a/gcc/config/riscv/riscv-ext-andes.def b/gcc/config/riscv/riscv-ext-andes.def
new file mode 100644
index ..4226e3ed86fe
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-andes.def
@@ -0,0 +1,100 @@
+/* Andes extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+Please run `make riscv-regen` in build folder

[PATCH 4/7 v2] RISC-V: Add support for the XAndesvbfhcvt ISA extension.

2025-06-27 Thread KuanLin Chen

Hi,

This patch adds support for the XAndesvbfhcvt ISA extension.
This extension defines instructions to perform vector floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754 32-bit
single-precision floating-point (SP) data in a vector register.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_BF_16 for XAndesvbfhcvt.
* config.gcc: Add extra_objs andes-vector-builtins-bases.o
and extra_headers andes_vector.h.
* config/riscv/riscv-vector-builtins.cc
(f32_to_bf16_nf_w_ops): New operand information.
(f32_to_bf16_nf_w_ops): New operand information.
(DEF_RVV_FUNCTION): New def.
* config/riscv/riscv-vector-builtins.def (bf16_s): Ditto.
(s_bf16): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVBFHCVT_EXT.
(required_extensions_specified): Ditto.
* config/riscv/t-riscv: Add andes-vector-builtins-functions.def,
andes-vector-builtins-bases.h and andes-vector-builtins-bases.o.
* config/riscv/vector-iterators.md (NDS_VWEXTBF): New iterator.
(NDS_V_DOUBLE_TRUNC_BF): New attr.
* config/riscv/andes-vector-builtins-bases.cc: New file.
* config/riscv/andes-vector-builtins-bases.h: New file.
* config/riscv/andes-vector-builtins-functions.def: New file.
* config/riscv/andes_vector.h: New file.
* config/riscv/andes_vector.md: New file.
* config/riscv/vector.md: Include andes_vector.md.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/xandesvector/nds_vfwcvt.c: New test.


0004-RISC-V-Add-support-for-the-XAndesvbfhcvt-ISA-extensi.patch
Description: Binary data

[PATCH v2 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread pan2 . li

From: Pan Li 

Add asm dump check and run test for the vec_duplicate + vssubu.vv
combine to vssubu.vx, with GR2VR costs of 0, 2 and 15.
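
For readers unfamiliar with the operation under test, here is a minimal
sketch of unsigned saturating subtraction with a scalar operand, using the
same branchless formula as the DEF_SAT_U_SUB helper in vx_binary.h.  The
names sat_u_sub and vx_sat_sub are hypothetical, and the loop shape is only
an assumption about what the test macros expand to; it is the kind of loop
where the broadcast of x plus vssubu.vv can be combined into vssubu.vx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Branchless unsigned saturating subtraction: a - b, clamped to 0 if b > a.
template<typename T>
constexpr T
sat_u_sub (T a, T b)
{
  static_assert (std::is_unsigned_v<T>, "unsigned types only");
  // The mask is all ones when a >= b and zero otherwise.
  return static_cast<T> ((a - b) & static_cast<T> (-static_cast<T> (a >= b)));
}

// Hypothetical loop: every element of IN has the same scalar X subtracted,
// so the scalar would be broadcast (vec_duplicate) when vectorized.
template<typename T>
void
vx_sat_sub (T *out, const T *in, T x, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = sat_u_sub (in[i], x);
}
```

Casting the mask back to T matters for uint8_t and uint16_t, where the
operands are promoted to int before the subtraction.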

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  18 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u16.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u32.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u64.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u8.c |  17 ++
 18 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
index 21a207edce7..b064748fc14 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
index d1063adb0d6..e334bb3690b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
index 3d96503fd9a..3e8ca0570cd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
index 339a35c3f42..1f995cd8dc1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /

[PATCH v2 5/5] libstdc++: Make mdspan nothrow movable.

2025-06-27 Thread Luc Grosheintz

If all members of mdspan are nothrow movable, then mdspan can also
be nothrow movable. The standard doesn't specify that mdspan must be
nothrow movable (when possible). Nothrow movable enables containers
to use move operations even if they have a strong exception guarantee.

This commit strengthens the exception guarantees for mdspan, making
it nothrow movable if all its members are nothrow movable.
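
As an illustration of the effect, here is a small sketch that does not use
mdspan itself.  Handle and View are hypothetical stand-ins for a data handle
and for a type that wraps it; the conditional noexcept on View mirrors the
idea of the patch, not its exact code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical handle: copy and move are noexcept only if Nothrow is true.
template<bool Nothrow>
  struct Handle
  {
    explicit Handle(int* p) noexcept : ptr(p) { }
    Handle(const Handle& other) noexcept(Nothrow) : ptr(other.ptr) { }
    Handle(Handle&& other) noexcept(Nothrow) : ptr(other.ptr) { }
    int* ptr;
  };

// Hypothetical wrapper: nothrow movable exactly when its member is.
template<typename H>
  struct View
  {
    explicit View(H h) : handle(std::move(h)) { }
    View(const View&) = default;
    View(View&&) noexcept(std::is_nothrow_move_constructible_v<H>) = default;
    H handle;
  };

static_assert(std::is_nothrow_move_constructible_v<View<Handle<true>>>);
static_assert(!std::is_nothrow_move_constructible_v<View<Handle<false>>>);

// std::move_if_noexcept only yields an rvalue if the move cannot throw;
// otherwise it falls back to a const lvalue, i.e. to copying.
static_assert(std::is_same_v<
  decltype(std::move_if_noexcept(std::declval<View<Handle<true>>&>())),
  View<Handle<true>>&&>);
static_assert(std::is_same_v<
  decltype(std::move_if_noexcept(std::declval<View<Handle<false>>&>())),
  const View<Handle<false>>&>);
```

Containers that offer the strong exception guarantee typically make the same
choice when relocating elements, which is why the nothrow property is useful.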

libstdc++-v3/ChangeLog:

* include/std/mdspan (mdspan): Make nothrow movable, if all
members are nothrow movable.
* testsuite/23_containers/mdspan/mdspan.cc: Test nothrow movable
property.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan   | 10 +++-
 .../testsuite/23_containers/mdspan/mdspan.cc  | 53 ++-
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 852f881971e..563aa312f8a 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1119,7 +1119,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   mdspan(const mdspan& __other) = default;
 
   constexpr
-  mdspan(mdspan&& __other) = default;
+  mdspan(mdspan&& __other)
+  noexcept(is_nothrow_move_constructible_v
+ && is_nothrow_move_constructible_v
+ && is_nothrow_move_constructible_v) = default;
 
   template<__mdspan::__valid_index_type... _OIndexTypes>
requires ((sizeof...(_OIndexTypes) == rank()
@@ -1197,7 +1200,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator=(const mdspan& __other) = default;
 
   constexpr mdspan&
-  operator=(mdspan&& __other) = default;
+  operator=(mdspan&& __other)
+  noexcept(is_nothrow_move_assignable_v
+ && is_nothrow_move_assignable_v
+ && is_nothrow_move_assignable_v) = default;
 
   template<__mdspan::__valid_index_type... _OIndexTypes>
requires (sizeof...(_OIndexTypes) == rank())
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc
index d2672878d69..6d8f32ff103 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc
@@ -310,11 +310,29 @@ test_from_int_like()
   verify(std::mdspan(storage.data(), shape_view));
 }
 
-template
+template
   class OpaqueAccessor
   {
 struct Handle
 {
+  constexpr
+  Handle(T * other)
+  : ptr(other)
+  { }
+
+  constexpr
+  Handle(const Handle&) noexcept(NothrowConstructible) = default;
+
+  constexpr
+  Handle(Handle&&) noexcept(NothrowConstructible) = default;
+
+  constexpr Handle&
+  operator=(const Handle&) noexcept(NothrowAssignable) = default;
+
+  constexpr Handle&
+  operator=(Handle&&) noexcept(NothrowAssignable) = default;
+
   T * ptr;
 };
 
@@ -489,6 +507,37 @@ test_swap()
   return true;
 }
 
+template
+constexpr void
+test_nothrow_movable()
+{
+  using Layout = std::layout_left;
+  using Extents = std::dextents;
+  using Accessor = OpaqueAccessor;
+  using Handle = Accessor::data_handle_type;
+  static_assert(std::is_nothrow_move_assignable_v);
+  static_assert(std::is_nothrow_move_constructible_v);
+  static_assert(std::is_nothrow_move_assignable_v == Assignable);
+  static_assert(std::is_nothrow_move_constructible_v == Constructible);
+
+  using MDSpan = std::mdspan;
+  static_assert(std::is_nothrow_move_assignable_v == Assignable);
+  static_assert(std::is_nothrow_move_constructible_v == Constructible);
+}
+
+constexpr void
+test_nothrow_movable_all()
+{
+  using MDSpan = std::mdspan>;
+  static_assert(std::is_nothrow_move_assignable_v);
+  static_assert(std::is_nothrow_move_constructible_v);
+
+  test_nothrow_movable();
+  test_nothrow_movable();
+  test_nothrow_movable();
+  test_nothrow_movable();
+}
+
 int
 main()
 {
@@ -536,5 +585,7 @@ main()
 
   test_swap();
   static_assert(test_swap());
+
+  test_nothrow_movable_all();
   return 0;
 }
-- 
2.49.0

Re: [PATCH v4 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Jonathan Wakely

On Fri, 27 Jun 2025 at 10:39, Tomasz Kaminski  wrote:
>
> Also, for single patch (not-patch series), you do not need to have [PATCH 
> 0/N], simple [PATCH] and then [PATCH v2] also works.

Yeah, sending a 0/N cover letter is only useful to describe what a
multi-part patch series does.  For a single patch, you should be
describing what it does in that patch itself, and a cover letter just
adds noise.


>
> On Fri, Jun 27, 2025 at 11:11 AM Tomasz Kaminski  wrote:
>>
>>
>>
>> On Fri, Jun 27, 2025 at 11:06 AM Luc Grosheintz  
>> wrote:
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>> * include/std/mdspan (default_accessor): New class.
>>> * src/c++23/std.cc.in: Register default_accessor.
>>> * testsuite/23_containers/mdspan/accessors/default.cc: New test.
>>> * testsuite/23_containers/mdspan/accessors/default_neg.cc: New test.
>>>
>>> Signed-off-by: Luc Grosheintz 
>>> ---
>>>  libstdc++-v3/include/std/mdspan   | 31 
>>>  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
>>>  .../23_containers/mdspan/accessors/default.cc | 72 +++
>>>  .../mdspan/accessors/default_neg.cc   | 23 ++
>>>  4 files changed, 128 insertions(+), 1 deletion(-)
>>>  create mode 100644 
>>> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>>>  create mode 100644 
>>> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
>>>
>>> diff --git a/libstdc++-v3/include/std/mdspan 
>>> b/libstdc++-v3/include/std/mdspan
>>> index 6dc2441f80b..c72a64094b7 100644
>>> --- a/libstdc++-v3/include/std/mdspan
>>> +++ b/libstdc++-v3/include/std/mdspan
>>> @@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>[[no_unique_address]] _S_strides_t _M_strides;
>>>  };
>>>
>>> +  template
>>> +struct default_accessor
>>> +{
>>> +  static_assert(!is_array_v<_ElementType>,
>>> +   "ElementType must not be an array type");
>>> +  static_assert(!is_abstract_v<_ElementType>,
>>> +   "ElementType must not be an abstract class type");
>>> +
>>> +  using offset_policy = default_accessor;
>>> +  using element_type = _ElementType;
>>> +  using reference = element_type&;
>>> +  using data_handle_type = element_type*;
>>> +
>>> +  constexpr
>>> +  default_accessor() noexcept = default;
>>> +
>>> +  template
>>> +   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
>>> +   constexpr
>>> +   default_accessor(default_accessor<_OElementType>) noexcept
>>> +   { }
>>> +
>>> +  constexpr reference
>>> +  access(data_handle_type __p, size_t __i) const noexcept
>>> +  { return __p[__i]; }
>>> +
>>> +  constexpr data_handle_type
>>> +  offset(data_handle_type __p, size_t __i) const noexcept
>>> +  { return __p + __i; }
>>> +};
>>> +
>>>  _GLIBCXX_END_NAMESPACE_VERSION
>>>  }
>>>  #endif
>>> diff --git a/libstdc++-v3/src/c++23/std.cc.in 
>>> b/libstdc++-v3/src/c++23/std.cc.in
>>> index 9336118f5d9..e692caaa5f9 100644
>>> --- a/libstdc++-v3/src/c++23/std.cc.in
>>> +++ b/libstdc++-v3/src/c++23/std.cc.in
>>> @@ -1850,7 +1850,8 @@ export namespace std
>>>using std::layout_left;
>>>using std::layout_right;
>>>using std::layout_stride;
>>> -  // FIXME layout_left_padded, layout_right_padded, default_accessor and 
>>> mdspan
>>> +  using std::default_accessor;
>>> +  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and 
>>> mdspan
>>>  }
>>>  #endif
>>>
>>> diff --git 
>>> a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc 
>>> b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>>> new file mode 100644
>>> index 000..ecccda2b68e
>>> --- /dev/null
>>> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>>> @@ -0,0 +1,72 @@
>>> +// { dg-do run { target c++23 } }
>>> +#include 
>>> +
>>> +#include 
>>> +
>>> +constexpr size_t dyn = std::dynamic_extent;
>>> +
>>> +template
>>> +  constexpr void
>>> +  test_accessor_policy()
>>> +  {
>>> +static_assert(std::copyable);
>>> +static_assert(std::is_nothrow_move_constructible_v);
>>> +static_assert(std::is_nothrow_move_assignable_v);
>>> +static_assert(std::is_nothrow_swappable_v);
>>> +  }
>>> +
>>> +constexpr bool
>>> +test_access()
>>> +{
>>> +  std::default_accessor accessor;
>>> +  std::array a{10, 11, 12, 13, 14};
>>> +  VERIFY(accessor.access(a.data(), 0) == 10);
>>> +  VERIFY(accessor.access(a.data(), 4) == 14);
>>> +  return true;
>>> +}
>>> +
>>> +constexpr bool
>>> +test_offset()
>>> +{
>>> +  std::default_accessor accessor;
>>> +  std::array a{10, 11, 12, 13, 14};
>>> +  VERIFY(accessor.offset(a.data(), 0) == a.data());
>>> +  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
>>> +  return true;
>>> +}
>>> +
>>> +class Base
>>> +{ };
>>> +
>>> +class Derived : public Base
>>> +{ };
>>> +
>>> +constexpr void
>>> +test_ctor()
>>> +{
>>> +  
>>> static_assert(std::is_nothrow_constructible_v,
>>> +

[PATCH v2 4/5] libstdc++: Implement mdspan and tests.

2025-06-27 Thread Luc Grosheintz

Implements the class mdspan as described in N4950, i.e. without P3029.
It also adds tests for mdspan.

libstdc++-v3/ChangeLog:

* include/std/mdspan (mdspan): New class.
* src/c++23/std.cc.in: Add std::mdspan.
* testsuite/23_containers/mdspan/class_mandate_neg.cc: New test.
* testsuite/23_containers/mdspan/mdspan.cc: New test.
* testsuite/23_containers/mdspan/layout_like.h: Add class
LayoutLike which models a user-defined layout.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan   | 282 +
 libstdc++-v3/src/c++23/std.cc.in  |   3 +-
 .../23_containers/mdspan/class_mandate_neg.cc |  58 ++
 .../23_containers/mdspan/layout_like.h|  63 ++
 .../testsuite/23_containers/mdspan/mdspan.cc  | 540 ++
 5 files changed, 945 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layout_like.h
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index e198d65bba3..852f881971e 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1052,6 +1052,288 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __p + __i; }
 };
 
+  namespace __mdspan
+  {
+template
+  constexpr bool
+  __is_multi_index(const _Extents& __exts, span<_IndexType, _Nm> __indices)
+  {
+   static_assert(__exts.rank() == _Nm);
+   for (size_t __i = 0; __i < __exts.rank(); ++__i)
+ if (__indices[__i] >= __exts.extent(__i))
+   return false;
+   return true;
+  }
+  }
+
+  template>
+class mdspan
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+  static_assert(__mdspan::__is_extents<_Extents>,
+   "Extents must be a specialization of std::extents");
+  static_assert(is_same_v<_ElementType,
+ typename _AccessorPolicy::element_type>);
+
+public:
+  using extents_type = _Extents;
+  using layout_type = _LayoutPolicy;
+  using accessor_type = _AccessorPolicy;
+  using mapping_type = typename layout_type::template 
mapping;
+  using element_type = _ElementType;
+  using value_type = remove_cv_t;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using data_handle_type = typename accessor_type::data_handle_type;
+  using reference = typename accessor_type::reference;
+
+  static constexpr rank_type
+  rank() noexcept { return extents_type::rank(); }
+
+  static constexpr rank_type
+  rank_dynamic() noexcept { return extents_type::rank_dynamic(); }
+
+  static constexpr size_t
+  static_extent(rank_type __r) noexcept
+  { return extents_type::static_extent(__r); }
+
+  constexpr index_type
+  extent(rank_type __r) const noexcept { return extents().extent(__r); }
+
+  constexpr
+  mdspan()
+  requires (rank_dynamic() > 0 &&
+ is_default_constructible_v &&
+ is_default_constructible_v &&
+ is_default_constructible_v)
+  : _M_accessor{}, _M_mapping{}, _M_handle{}
+  { }
+
+  constexpr
+  mdspan(const mdspan& __other) = default;
+
+  constexpr
+  mdspan(mdspan&& __other) = default;
+
+  template<__mdspan::__valid_index_type... _OIndexTypes>
+   requires ((sizeof...(_OIndexTypes) == rank()
+  || sizeof...(_OIndexTypes) == rank_dynamic())
+   && is_constructible_v
+   && is_default_constructible_v)
+   constexpr explicit
+   mdspan(data_handle_type __handle, _OIndexTypes... __exts)
+   : _M_accessor{},
+ _M_mapping(_Extents(static_cast(std::move(__exts))...)),
+ _M_handle(std::move(__handle))
+   { }
+
+  template<__mdspan::__valid_index_type _OIndexType,
+  size_t _Nm>
+   requires ((_Nm == rank() || _Nm == rank_dynamic())
+  && is_constructible_v
+  && is_default_constructible_v)
+   constexpr explicit(_Nm != rank_dynamic())
+   mdspan(data_handle_type __handle, span<_OIndexType, _Nm> __exts)
+   : _M_accessor{}, _M_mapping(extents_type(__exts)),
+ _M_handle(std::move(__handle))
+   { }
+
+  template<__mdspan::__valid_index_type _OIndexType,
+  size_t _Nm>
+   requires ((_Nm == rank() || _Nm == rank_dynamic())
+  && is_constructible_v
+  && is_default_constructible_v)
+   constexpr explicit(_Nm != rank_dynamic())
+   mdspan(data_handle_type __handle, const array<_OIndexType, _Nm

Re: [to-be-committed][RISC-V][PR target/119971] Avoid losing shift count masking

2025-06-27 Thread Jeff Law




On 5/5/25 11:56 PM, Bernhard Reutner-Fischer wrote:

On 5 May 2025 20:42:34 CEST, Jeff Law  wrote:


diff --git a/gcc/testsuite/gcc.target/riscv/pr119971.c 
b/gcc/testsuite/gcc.target/riscv/pr119971.c
new file mode 100644
index 000..c3f23b05ec3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr119971.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-options "-march=rv64gcb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-g" "-Oz" "-Os" } } */
+


typo s/"-g"/"-Og"/ maybe?

Yes.  Fixed in the obvious way.

Jeff

commit de6124c9e5ed472f567b51fa76f18335cdddbbaf
Author: Jeff Law 
Date:   Fri Jun 27 07:00:15 2025 -0600

[RISC-V][PR target/119971] Avoid losing shift count masking

Fix typo spotted by Bernhard Reutner-Fischer.

PR target/119971

gcc/testsuite/
* gcc.target/riscv/pr119971.c: Fix typo.

diff --git a/gcc/testsuite/gcc.target/riscv/pr119971.c 
b/gcc/testsuite/gcc.target/riscv/pr119971.c
index c3f23b05ec3..0d73d4ca3f1 100644
--- a/gcc/testsuite/gcc.target/riscv/pr119971.c
+++ b/gcc/testsuite/gcc.target/riscv/pr119971.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target rv64 } } */
 /* { dg-options "-march=rv64gcb -mabi=lp64" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-g" "-Oz" "-Os" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Oz" "-Os" } } */
 
 __attribute__ ((noipa)) unsigned
 foo (unsigned b, unsigned e, unsigned i)

[PATCH v2] vect: Misalign checks for gather/scatter.

2025-06-27 Thread Robin Dapp


Hi,

Changes from v1:
- Add gather_scatter argument to support_vector_misalignment.
- Don't rely on DR_BASE_ALIGNMENT.
- Add IFN helpers and use them.
- Add gather/scatter helper macros.
- Clarify is_packed handling in docs.

This patch adds simple misalignment checks for gather/scatter
operations.  Previously, we assumed that those perform element accesses
internally so alignment does not matter.  The riscv vector spec however
explicitly states that vector operations are allowed to fault on
element-misaligned accesses.  Reasonable uarchs won't, but...

For gather/scatter we have two paths in the vectorizer:

(1) Regular analysis based on datarefs.  Here we can also create
strided loads.
(2) Non-affine access where each gather index is relative to the
initial address.

The assumption this patch works off is that once the alignment for the
first scalar is correct, all others will fall in line, as the index is
always a multiple of the first element's size.

For (1) we have a dataref and can check it for alignment as in other
cases.  For (2) this patch checks the object alignment of BASE and
compares it against the natural alignment of the current vectype's unit.
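
The assumption can be sketched in a few lines.  This is only a model, not
vectorizer code: the function names are hypothetical, addresses are plain
integers, and each access is assumed to sit at base + index * element size
with natural alignment equal to the element size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if BASE is naturally aligned for elements of ELEM_SIZE bytes.
constexpr bool
base_is_element_aligned (std::uintptr_t base, std::size_t elem_size)
{
  return base % elem_size == 0;
}

// Check every access of a gather individually.  The offset of each access
// is a multiple of ELEM_SIZE, so the result always agrees with the check
// on BASE alone.
constexpr bool
all_accesses_aligned (std::uintptr_t base, std::size_t elem_size,
		      const std::size_t *idx, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    if ((base + idx[i] * elem_size) % elem_size != 0)
      return false;
  return true;
}
```

With an aligned base every access passes; with a misaligned base every
access fails, so checking the first scalar is sufficient in this model.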

The patch also adds a pointer argument to the gather/scatter IFNs that
contains the necessary alignment.  Most of the patch is thus mechanical
in that it merely adjusts indices.

I tested the riscv version with a custom qemu version that faults on
element-misaligned vector accesses.  With this patch applied, there is
just a single fault left, which is due to PR120782 and which will be
addressed separately.

Bootstrapped and regtested on x86 and aarch64 and powerpc.
Regtested on rv64gcv_zvl512b with and without unaligned vector support.

Regards
Robin


gcc/ChangeLog:

* config/aarch64/aarch64.cc 
(aarch64_builtin_support_vector_misalignment):
Return true for gather_scatter.
* config/arm/arm.cc (arm_builtin_support_vector_misalignment):
Ditto.
* config/epiphany/epiphany.cc (epiphany_support_vector_misalignment):
Ditto.
* config/gcn/gcn.cc (gcn_vectorize_support_vector_misalignment):
Ditto.
* config/loongarch/loongarch.cc 
(loongarch_builtin_support_vector_misalignment):
Ditto.
* config/riscv/riscv.cc (riscv_support_vector_misalignment):
Always support known aligned types.
* config/rs6000/rs6000.cc (rs6000_builtin_support_vector_misalignment):
Ditto.
* config/s390/s390.cc (s390_support_vector_misalignment):
Ditto.
* internal-fn.cc (expand_scatter_store_optab_fn): Change
argument numbers.
(expand_gather_load_optab_fn): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_else_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_fn_alias_ptr_index): New helper.
(internal_fn_offset_index): Ditto.
(internal_fn_scale_index): Ditto.
(internal_gather_scatter_fn_supported_p): Ditto.
* internal-fn.h (internal_fn_offset_index): Declare.
(internal_fn_scale_index): Ditto.
(internal_fn_alias_ptr_index): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Ditto.
* target.def: Add gather_scatter argument and adjust docs.
* doc/tm.texi: Ditto.
* targhooks.cc (default_builtin_support_vector_misalignment):
Add gather_scatter argument.
* targhooks.h (default_builtin_support_vector_misalignment):
Ditto.
* tree-vect-data-refs.cc (vect_describe_gather_scatter_call):
Handle alias_ptr.
(vect_check_gather_scatter): Compute and set alias_ptr.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern):
Ditto.
* tree-vect-slp.cc (GATHER_SCATTER_OFFSET): Define.
(vect_get_and_check_slp_defs): Use define.
* tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
Set alias_ptr.
(get_group_load_store_type): Do not special-case gather/scatter.
(get_load_store_type): Compute misalignment.
(vectorizable_store): Remove alignment assert for
scatter/gather.
(vectorizable_load): Ditto.
* tree-vectorizer.h (struct gather_scatter_info): Add alias_ptr.
(GATHER_SCATTER_LEGACY_P): Define.
(GATHER_SCATTER_IFN_P): Ditto.
(GATHER_SCATTER_UNSUPPORTED_P): Ditto.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix riscv misalign supported check.
---
gcc/config/aarch64/aarch64.cc |  12 ++-
gcc/config/arm/arm.cc |  11 ++-
gcc/config/epiphany/epiphany.cc   |   8 +-
gcc/config/gcn/gcn.cc |   5 +-
gcc/config/loongarch/loongarch.cc |   8 +-
gcc/config/riscv/riscv.cc |  29 +--
gcc/config/rs6000/rs6000.cc   |   6 +-
gcc/config/s390/s390.cc   |   6 +-
gcc/doc/tm.texi   |   8 +-
gcc/internal-fn.cc

[patch,avr] Turn on LRA per default

2025-06-27 Thread Georg-Johann Lay


This turns on -mlra per default on avr.

Ok for trunk?

Johann

--

AVR: target/113934 - Use LRA per default.

Now that the patches for PR120424 are upstream, the last known bug
associated with avr+lra has been fixed: PR118591.  So we can pull the
switch that turns on LRA per default.

This patch only sets -mlra per default.  It doesn't do any Reload related
cleanup or removal from the avr backend, hence -mno-lra still works.

The only new problem is that gcc.dg/torture/pr64088.c fails with LRA
but not with Reload.  Though that test case is awkward since it is UB
but expects the compiler to behave in a specific way which avr-gcc
doesn't do: PR116780.

This patch also avoids a relatively recent ICE that breaks building libgcc:
R24:DI is allowed per hard_regno_mode_ok, but R26:SI is disallowed
for Reload for old reasons.  Outcome is that a split2 pattern for
R24:DI = zero_extend:DI (R22:SI) runs into an ICE.

AVR-LibC builds fine with this patch.
The AVR-LibC testsuite passes without errors.

gcc/
 PR target/113934
* config/avr/avr.opt (-mlra): Turn on per default.

diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index fcd2bf68f2a..988311927bd 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -19,8 +19,8 @@
 ; .

 mlra
-Target Var(avropt_lra_p) UInteger Init(0) Optimization Undocumented
-Usa LRA for reload instead of the old reload framework.  This option is 
experimental, and it may be removed in future versions of the compiler.

+Target Var(avropt_lra_p) UInteger Init(1) Optimization Undocumented
+Usa LRA for reload instead of the old reload framework.  This option is 
experimental, on per default, and it may be removed in future versions 
of the compiler.


 mcall-prologues
 Target Mask(CALL_PROLOGUES) Optimization

[RFC] libstdc++: Provide meaning to precision for duration in chrono-spec.

2025-06-27 Thread Tomasz Kamiński

The standard does not currently specify how the precision value
is interpreted if specified; it only prohibits it from being used for
formatting any object other than durations with floating-point types.

This patch interprets the user-specified precision value as follows:
 * if the spec is empty for a floating-point duration, the ostringstream
   is configured with the precision value
 * if "%Q" is used for a floating-point duration, the duration units
   are formatted with a format string that includes the precision
 * for "%S" the precision controls the number of decimal digits
   that are printed for subseconds.

With support in "%S", setting precision also makes sense for time-points.
This patch takes the simple approach of always allowing precision to be
specified, and the value is ignored if no specifiers are affected. We could
also limit it to situations when _Subseconds or _EpochUnits for floating-point
rep are requested.

Finally, for integral durations, the precision is trimmed to 18 digits;
this can be adjusted if we decide to go in the direction of this patch.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_parse):
Parse precision value into _M_spec.
(__formatter_chrono::_M_Q): Include precision in format string,
if specified.
(__formatter_duration::_M_format_to_ostream): Configure precision
if provided.
* testsuite/std/time/format/format.cc: Precision is now allowed.
* testsuite/std/time/format/precision.cc: Adjusted tests.
---
This patch is not meant to be landed now; it provides implementation
experience for one of the directions for resolving the meaning of precision.

 libstdc++-v3/include/bits/chrono_io.h |  25 ++--
 .../testsuite/std/time/format/format.cc   |   2 +-
 .../testsuite/std/time/format/precision.cc| 113 +-
 3 files changed, 92 insertions(+), 48 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index d6bc6c7cf2a..a3264932024 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -594,19 +594,9 @@ namespace __format
  if (__finished())
return __first;
 
- if (*__first == '.')
-   {
- if ((__parts & _ChronoParts::_EpochUnits) == 0
- || !__spec._M_floating_point_rep)
-   __throw_format_error("format error: invalid precision for 
duration");
-
- // Precision is allowed, but value is ignored.
- __first = _Spec<_CharT>()._M_parse_precision(__first, __last, 
__pc);
- // Still inditate that there was user supplied precision.
- __spec._M_prec_kind = _WP_value;
-if (__finished())
-  return __first;
-   }
+ __first = __spec._M_parse_precision(__first, __last, __pc);
+ if (__finished())
+   return __first;
 
  __spec._M_localized = false;
  __first = __spec._M_parse_locale(__first, __last);
@@ -1499,7 +1489,12 @@ namespace __format
 _FormatContext& __ctx) const
{
  // %Q The duration's numeric value.
- return std::vformat_to(std::move(__out), _S_empty_spec, __t._M_ereps);
+
+ __string_view __fs = _S_empty_spec;
+ if (_M_spec._M_floating_point_rep 
+   && _M_spec._M_prec_kind != _WP_none)
+   __fs = _GLIBCXX_WIDEN("{0:.{2}}");
+ return std::vformat_to(std::move(__out), __fs, __t._M_ereps);
}
 
   template
@@ -1903,6 +1898,8 @@ namespace __format
 
   if (__is_neg) [[unlikely]]
 __os << this->_S_plus_minus[1];
+  if (_M_spec._M_prec_kind != _WP_none)
+__os.precision(_M_spec._M_get_precision(__fc));
   __os << __d;
 
  auto __str = std::move(__os).str();
diff --git a/libstdc++-v3/testsuite/std/time/format/format.cc 
b/libstdc++-v3/testsuite/std/time/format/format.cc
index d6e35832cb5..474b1b5a938 100644
--- a/libstdc++-v3/testsuite/std/time/format/format.cc
+++ b/libstdc++-v3/testsuite/std/time/format/format.cc
@@ -56,7 +56,7 @@ test_bad_format_strings()
   VERIFY( not is_format_string_for("{:04%T}", t) );
 
   // precision only valid for chrono::duration types with floating-point rep.
-  VERIFY( not is_format_string_for("{:.4}", t) );
+  // VERIFY( not is_format_string_for("{:.4}", t) );
 
   // unfinished format string
   VERIFY( not is_format_string_for("{:", t) );
diff --git a/libstdc++-v3/testsuite/std/time/format/precision.cc 
b/libstdc++-v3/testsuite/std/time/format/precision.cc
index aa266156c1f..46a774f70f4 100644
--- a/libstdc++-v3/testsuite/std/time/format/precision.cc
+++ b/libstdc++-v3/testsuite/std/time/format/precision.cc
@@ -19,26 +19,25 @@ test_empty()
   res = std::format(WIDEN("{:}"), d);
   VERIFY( res == WIDEN("33.1112s") );
   res = std::format(WIDEN("{:.0}"), d);
-  VERIFY( res == WIDEN("33.1112s") );
+  VERIFY( res == WIDEN("3e+01s") );
   res = std::fo

[committed v2] libstdc++: Fix warnings introduced by type-erasing for chrono commits [PR110739]

2025-06-27 Thread Tomasz Kamiński

The r16-1709-g4b3cefed1a08344495fedec4982d85168bd8173f caused `-Woverflow`
in the empty_spec.cc file. This warning is not caused by any issue in shipping
code; it results from taking too much of a shortcut when implementing a
test-only custom representation type Rep, where long was always used to store
the value. In particular, the common type for Rep and long long int was
de facto long. This is addressed by adding an Under template parameter that
controls the type of the stored value, and handling it properly in the
common_type specializations. No changes to shipping code are necessary.
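
A minimal sketch of the idea, assuming a simplified Rep: the real test type in empty_spec.cc is not shown in this message, so the names and members below are illustrative only.

```cpp
#include <type_traits>

// Hypothetical test-only representation type. Under selects the type of
// the stored value instead of hard-coding long.
template<typename Under = long>
struct Rep
{
  Under value;
  constexpr explicit Rep(Under v = Under()) : value(v) { }
};

namespace std
{
  // The common type of Rep<Under> and an arithmetic type keeps the wider
  // underlying type, so Rep<int> combined with long long is no longer
  // narrowed. (Combining two Rep types is not handled in this sketch.)
  template<typename Under, typename Other>
  struct common_type<Rep<Under>, Other>
  { using type = Rep<typename common_type<Under, Other>::type>; };

  template<typename Other, typename Under>
  struct common_type<Other, Rep<Under>>
  { using type = Rep<typename common_type<Other, Under>::type>; };
}
```

Under these assumptions, the common type of Rep<int> and long long is Rep<long long>, so a long long value can be stored without overflow.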

Secondly, extracting the _M_locale_fmt calls in r16-1712-gcaac94 resulted in
the __ctx format parameter no longer being used. This patch removes that
parameter entirely, and replaces the _FormatContext template parameter with
an _OutIter parameter for __out. For consistency, the type of __out is also
decoupled from _FormatContext for functions that still need the context:
 * to extract locale (_M_A_a, _M_B_b, _M_c, _M_p, _M_r, _M_subsecs)
 * to perform formatting for duration/subseconds (_M_Q, _M_T, _M_S, _M_subsecs)
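
The signature change can be sketched as follows. The names are hypothetical stand-ins (Context is not std::format_context, and these are not the __formatter_chrono members); the sketch only shows an output iterator type that is its own template parameter, with the context passed solely where it is needed.

```cpp
#include <iterator>
#include <string>
#include <utility>

// Needs no context at all: only the output iterator is a template
// parameter, mirroring the specifiers that lost _FormatContext.
template<typename OutIter>
OutIter write_two_digits(unsigned n, OutIter out)
{
  *out++ = static_cast<char>('0' + (n / 10) % 10);
  *out++ = static_cast<char>('0' + n % 10);
  return out;
}

// Hypothetical stand-in for a format context.
struct Context { char decimal_point; };

// Still needs the context, but the type of `out` is independent of it.
template<typename OutIter, typename Ctx>
OutIter write_seconds(unsigned secs, unsigned tenths, OutIter out,
                      const Ctx& ctx)
{
  out = write_two_digits(secs, std::move(out));
  *out++ = ctx.decimal_point;
  *out++ = static_cast<char>('0' + tenths % 10);
  return out;
}
```

For example, calling write_seconds(5, 3, std::back_inserter(s), Context{','}) on an empty std::string s appends "05,3".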

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
Rename _Out to _OutIter for consistency, and update calls
to specifier functions.
(__formatter_chrono::_M_wi, __formatter_chrono::_M_C_y_Y)
(__formatter_chrono::_M_D_x, __formatter_chrono::_M_d_e)
(__formatter_chrono::_M_F, __formatter_chrono::_M_g_G)
(__formatter_chrono::_M_H_I, __formatter_chrono::_M_j)
(__formatter_chrono::_M_m, __formatter_chrono::_M_M)
(__formatter_chrono::_M_q, __formatter_chrono::_M_R_X)
(__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W)
(__formatter_chrono::_M_z, __formatter_chrono::_M_z):
Remove _FormatContext parameter, and  introduce _OutIter
for __out type.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_B_b)
(__formatter_chrono::_M_p, __formatter_chrono::_M_Q)
(__formatter_chrono::_M_r, __formatter_chrono::_M_S)
(__formatter_chrono::_M_subsecs, __formatter_chrono::_M_T):
Introduce separate _OutIter template parameter for __out.
(__formatter_chrono::_M_c, __formatter_chrono::_M_T):
Likewise, and adjust calls to specifiers functions.
* testsuite/std/time/format/empty_spec.cc: Make underlying
type for Rep configurable.
---
Tested on x86_64-linux (full suite). I have not found any overflow warnings,
nor any warnings produced from chrono_io.h.

Pushed to trunk.

 libstdc++-v3/include/bits/chrono_io.h | 221 --
 .../testsuite/std/time/format/empty_spec.cc   |  43 ++--
 2 files changed, 129 insertions(+), 135 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index bcfd51b9866..0ffbf06a7ff 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -942,9 +942,9 @@ namespace __format
  return __out;
}
 
-  template
-   _Out
-   _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
+  template
+   _OutIter
+   _M_format_to(const _ChronoData<_CharT>& __t, _OutIter __out,
 _FormatContext& __fc) const
{
  auto __first = _M_spec._M_chrono_specs.begin();
@@ -1005,7 +1005,7 @@ namespace __format
{
// %\0 is extension for handling weekday index
case '\0':
- __out = _M_wi(__t._M_weekday_index, std::move(__out), __fc);
+ __out = _M_wi(__t._M_weekday_index, std::move(__out));
  break;
case 'a':
case 'A':
@@ -1022,41 +1022,41 @@ namespace __format
case 'C':
case 'y':
case 'Y':
- __out = _M_C_y_Y(__t._M_year, std::move(__out), __fc, __c);
+ __out = _M_C_y_Y(__t._M_year, std::move(__out), __c);
  break;
case 'd':
case 'e':
- __out = _M_d_e(__t._M_day, std::move(__out), __fc, __c);
+ __out = _M_d_e(__t._M_day, std::move(__out), __c);
  break;
case 'D':
case 'x':
- __out = _M_D_x(__t, std::move(__out), __fc);
+ __out = _M_D_x(__t, std::move(__out));
  break;
case 'F':
- __out = _M_F(__t, std::move(__out), __fc);
+ __out = _M_F(__t, std::move(__out));
  break;
case 'g':
case 'G':
- __out = _M_g_G(__t, std::move(__out), __fc, __c == 'G');
+ __out = _M_g_G(__t, std::move(__out), __c == 'G');
  break;
case 'H':
case 'I':
- __out = _M_H_I(__t._M_hours, __print_sign(), __fc, __c);
+ __out = _M_

Re: [PATCH 5/8] libstdc++: Directly implement ranges::stable_partition [PR100795]

2025-06-27 Thread Jonathan Wakely

On Fri, 27 Jun 2025 at 15:26, Patrick Palka  wrote:
>
> On Fri, 27 Jun 2025, Jonathan Wakely wrote:
>
> > On 26/06/25 22:25 -0400, Patrick Palka wrote:
> > > PR libstdc++/100795
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/ranges_algo.h (__detail::__find_if_not_n): New,
> > > based on the stl_algo.h implementation.
> > > (__detail::__stable_partition_adaptive): Likewise.
> > > (__stable_partition_fn::operator()): Reimplement in terms of
> > > the above.
> > > * testsuite/25_algorithms/stable_partition/constrained.cc
> > > (test03): New test.
> > > ---
> > > libstdc++-v3/include/bits/ranges_algo.h   | 106 +-
> > > .../stable_partition/constrained.cc   |  26 +
> > > 2 files changed, 127 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > index 7dfd4e7ed64c..a9924cd9c49e 100644
> > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > @@ -3133,6 +3133,81 @@ namespace ranges
> > >   inline constexpr __partition_fn partition{};
> > >
> > > #if _GLIBCXX_HOSTED
> > > +  namespace __detail
> > > +  {
> > > +/// Like find_if_not(), but uses and updates a count of the
> > > +/// remaining range length instead of comparing against an end
> > > +/// iterator.
> > > +template
> > > +  constexpr _Iter
> > > +  __find_if_not_n(_Iter __first, _Distance& __len, _Pred __pred)
> > > +  {
> > > +   for (; __len; --__len,  (void) ++__first)
> > > + if (!__pred(*__first))
> > > +   break;
> > > +   return __first;
> > > +  }
> > > +
> > > +template > > +typename _Pred, typename _Distance>
> > > +  _GLIBCXX26_CONSTEXPR
> > > +  subrange<_Iter>
> > > +  __stable_partition_adaptive(_Iter __first, _Sent __last,
> > > + _Pred __pred, _Distance __len,
> > > + _Pointer __buffer,
> > > + _Distance __buffer_size)
> > > +  {
> > > +   if (__len == 1)
> > > + return {__first, ranges::next(__first, 1)};
> > > +
> > > +   if (__len <= __buffer_size)
> > > + {
> > > +   _Iter __result1 = __first;
> > > +   _Pointer __result2 = __buffer;
> > > +
> > > +   // The precondition guarantees that !__pred(__first), so
> > > +   // move that element to the buffer before starting the loop.
> > > +   // This ensures that we only call __pred once per element.
> > > +   *__result2 = ranges::iter_move(__first);
> > > +   ++__result2;
> > > +   ++__first;
> > > +   for (; __first != __last; ++__first)
> > > + if (__pred(*__first))
> > > +   {
> > > + *__result1 = ranges::iter_move(__first);
> > > + ++__result1;
> > > +   }
> > > + else
> > > +   {
> > > + *__result2 = ranges::iter_move(__first);
> > > + ++__result2;
> > > +   }
> > > +
> > > +   ranges::move(__buffer, __result2, __result1);
> > > +   return {__result1, __first};
> > > + }
> > > +
> > > +   _Iter __middle = __first;
> > > +   ranges::advance(__middle, __len / 2);
> > > +   _Iter __left_split
> > > + = __detail::__stable_partition_adaptive(__first, __middle, __pred,
> > > + __len / 2, __buffer,
> > > + __buffer_size).begin();
> > > +
> > > +   // Advance past true-predicate values to satisfy this
> > > +   // function's preconditions.
> > > +   _Distance __right_len = __len - __len / 2;
> > > +   _Iter __right_split = __detail::__find_if_not_n(__middle, __right_len,
> > > __pred);
> > > +
> > > +   if (__right_len)
> > > + __right_split
> > > +   = __detail::__stable_partition_adaptive(__right_split, __last,
> > > __pred,
> > > +   __right_len, __buffer,
> > > __buffer_size).begin();
> > > +
> > > +   return ranges::rotate(__left_split, __middle, __right_split);
> > > +  }
> > > +  } // namespace __detail
> > > +
> > >   struct __stable_partition_fn
> > >   {
> > > template _Sent,
> > > @@ -3144,11 +3219,32 @@ namespace ranges
> > >   operator()(_Iter __first, _Sent __last,
> > >  _Pred __pred, _Proj __proj = {}) const
> > >   {
> > > -   auto __lasti = ranges::next(__first, __last);
> > > -   auto __middle
> > > - = std::stable_partition(std::move(__first), __lasti,
> > > - __detail::__make_pred_proj(__pred, __proj));
> > > -   return {std::move(__middle), std::move(__lasti)};
> > > +   auto __pred_proj = __detail::__make_pred_proj(__pred, __proj);
> > > +   __first = ranges::find_if_not(__first, __last, __pred_proj);
> >
> > Does this end up going through another layer of
> > invoke(pred, invoke(proj, *i)) inside ranges::find_if_not?
> > Hopefully with the recent _

[PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-27 Thread Yuao Ma

Hi Jakub,

> I think the __builtin_constant_p(acospi(0.5)) approach is usable, but would
> be much better done on the lib/target-supports.exp side.
> So, have foldable_pi_based_trigonometry effective target, which would test
> if __builtin_constant_p(acospi(0.5)) is 1.

Thanks again for your helpful advice.

I've added the foldable_pi_based_trigonometry effective target and removed the
conditional branch in the test case. The test results look good.

Thanks,
Yuao



0001-gcc-middle-end-opt-for-trigonometric-pi-based-functi.patch
Description: 0001-gcc-middle-end-opt-for-trigonometric-pi-based-functi.patch

[PATCH] libstdc++: Lift locale initialization in main chrono format loop [PR110739]

2025-06-27 Thread Tomasz Kamiński

This patch lifts locale initialization from the locale-specific handling methods
into the _M_format_to function, and passes the locale by const reference.
To avoid unnecessary computation of locale::classic(), we use _Optional_locale,
and emplace __fc.locale() into it only for localized formatting
(_M_spec._M_localized), or locale::classic() if the chrono-spec contains
locale-specific specifiers (_M_spec._M_locale_specific).
The latter is imprecise, as locale::classic() is only needed for a subset of
locale-specific specifiers (%a, %A, %b, %B, %c, %p, %r), while _M_locale_specific
is also set for %x, %X and if O/E modifiers are used. However, default outputs
are not impacted (they use %b), or, in the case when month (%b) and weekday (%a)
are used, locale::classic() is constructed only once for non-localized output.
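
The selection logic described above can be sketched with std::optional standing in for the internal _Optional_locale. Spec and pick_locale are hypothetical names used only for this illustration.

```cpp
#include <locale>
#include <optional>

// Hypothetical stand-in for the two flags of the parsed chrono-spec.
struct Spec
{
  bool localized;        // the 'L' option was given
  bool locale_specific;  // spec contains locale-dependent specifiers
};

// Construct a locale only when some specifier can actually use it:
// the context locale for localized formatting, locale::classic() for
// non-localized locale-specific specifiers, and nothing otherwise.
std::optional<std::locale>
pick_locale(const Spec& spec, const std::locale& context_locale)
{
  std::optional<std::locale> loc;
  if (spec.localized)
    loc.emplace(context_locale);
  else if (spec.locale_specific)
    loc.emplace(std::locale::classic());
  return loc;
}
```

The point of the design is that a spec such as "%H:%M" never pays for constructing any locale, while "%a %b" constructs locale::classic() once per formatting call rather than once per specifier.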
 
In _M_S we no longer guard the querying of the numpunct facet with a check
that requires a potentially equally expensive construction of locale::classic.
We also mark the localized path as unlikely.

The _M_locale method is no longer used in __formatter_chrono, and thus was
moved to __formatter_duration.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
Compute locale and pass it to specifiers method.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
(__formatter_chrono::_M_c, __formatter_chrono::_M_p)
(__formatter_chrono::_M_r): Accept locale instead of format context.
(__formatter_chrono::_M_subsecs): Call __ctx.locale() directly,
instead of _M_locale and do not compare with locale::classic().
Add [[unlikely]] attributes.
(__formatter_chrono::_M_locale): Move to __formatter_duration.
(__formatter_duration::_M_locale): Moved from __formatter_chrono.
---
Doing this in a separate patch, as I wanted to fix the overflow warning soon.
I have realized that we have a dedicated _Optional_locale that I can use
to avoid calling locale::classic().
Testing on x86_64-linux. OK for trunk when all tests pass?

 libstdc++-v3/include/bits/chrono_io.h | 71 ---
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index bcf9830fb9e..a25cb9ada01 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -964,10 +964,16 @@ namespace __format
return std::move(__out);
  };
 
+ _Optional_locale __loc;
+ if (_M_spec._M_localized)
+   __loc = __fc.locale();
+ else if (_M_spec._M_locale_specific)
+   __loc = locale::classic();
+
  struct tm __tm{};
  bool __use_locale_fmt = false;
  if (_M_spec._M_localized && _M_spec._M_locale_specific)
-   if (__fc.locale() != locale::classic())
+   if (__loc.value() != locale::classic())
  {
__use_locale_fmt = true;
 
@@ -1004,7 +1010,7 @@ namespace __format
{
  _CharT __c = *__first++;
  if (__use_locale_fmt && _S_localized_spec(__c, __mod)) 
[[unlikely]]
-   __out = _M_locale_fmt(std::move(__out), __fc.locale(),
+   __out = _M_locale_fmt(std::move(__out), __loc.value(),
  __tm, __c, __mod);
  else switch (__c)
{
@@ -1014,15 +1020,17 @@ namespace __format
  break;
case 'a':
case 'A':
- __out = _M_a_A(__t._M_weekday, std::move(__out), __fc, __c == 
'A');
+ __out = _M_a_A(__t._M_weekday, std::move(__out),
+__loc.value(), __c == 'A');
  break;
case 'b':
case 'h':
case 'B':
- __out = _M_b_B(__t._M_month, std::move(__out), __fc, __c == 
'B');
+ __out = _M_b_B(__t._M_month, std::move(__out),
+__loc.value(), __c == 'B');
  break;
case 'c':
- __out = _M_c(__t, std::move(__out), __fc);
+ __out = _M_c(__t, std::move(__out), __loc.value());
  break;
case 'C':
case 'y':
@@ -1058,7 +1066,7 @@ namespace __format
  __out = _M_M(__t._M_minutes, __print_sign());
  break;
case 'p':
- __out = _M_p(__t._M_hours, std::move(__out), __fc);
+ __out = _M_p(__t._M_hours, std::move(__out), __loc.value());
  break;
case 'q':
  __out = _M_q(__t._M_unit_suffix, std::move(__out));
@@ -1067,7 +1075,7 @@ namespace __format
  __out = _M_Q(__t, __print_sign(), __fc);
  break;
case 'r':
- __out = _M_r(__t, __print_sign(), __fc);
+ __out = _M_r(__t, __print_sign(), __loc.value());
  break;

Re: [PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:04:35PM +, Yuao Ma wrote:
> > I think the __builtin_constant_p(acospi(0.5)) approach is usable, but would
> > be much better done on the lib/target-supports.exp side.
> > So, have foldable_pi_based_trigonometry effective target, which would test
> > if __builtin_constant_p(acospi(0.5)) is 1.
> 
> Thanks again for your helpful advice.
> 
> I've added the foldable_pi_based_trigonometry effective target and removed the
> conditional branch in the test case. The test results look good.

--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -14495,3 +14495,15 @@ proc check_effective_target_xtensa_atomic { } {
#endif
 }]
 }
+
+# Return 1 if pi-based trigonometry function is foldable
+# We should remove this after bumping the minimum mpfr version to 4.2.0.
+proc check_effective_target_foldable_pi_based_trigonometry { } {
+return [check_runtime foldable_pi_based_trigonometry {
+   #include 

Please don't include math.h here.

+   int main ()
+   {
+ return !__builtin_constant_p (acospi (0.5));

And instead of this line use __builtin_acospi (0.5).
and, in dejagnu for runtime tests we prefer __builtin_abort on failure, so
  if (!__builtin_constant_p (__builtin_acospi (0.5)))
__builtin_abort ();
  return 0;

or so.

Otherwise LGTM.

Jakub

Re: [PATCH 7/8] libstdc++: Directly implement ranges::sample [PR100795]

2025-06-27 Thread Jonathan Wakely

On Fri, 27 Jun 2025 at 15:37, Patrick Palka  wrote:
>
> On Fri, 27 Jun 2025, Jonathan Wakely wrote:
>
> > On 27/06/25 14:53 +0100, Jonathan Wakely wrote:
> > > On 26/06/25 23:12 -0400, Patrick Palka wrote:
> > > > On Thu, 26 Jun 2025, Patrick Palka wrote:
> > > >
> > > > > PR libstdc++/100795
> > > > >
> > > > > libstdc++-v3/ChangeLog:
> > > > >
> > > > > * include/bits/ranges_algo.h (__sample_fn::operator()):
> > > > > Reimplement the forward_iterator branch directly.
> > > > > * testsuite/25_algorithms/sample/constrained.cc (test02):
> > > > > New test.
> > > > > ---
> > > > > libstdc++-v3/include/bits/ranges_algo.h   | 70 +--
> > > > > .../25_algorithms/sample/constrained.cc   | 28 
> > > > > 2 files changed, 91 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > > > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > > > index b12da2af1263..672a0ebce0de 100644
> > > > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > > > @@ -1839,14 +1839,70 @@ namespace ranges
> > > > >   operator()(_Iter __first, _Sent __last, _Out __out,
> > > > >  iter_difference_t<_Iter> __n, _Gen&& __g) const
> > > > >   {
> > > > > +   // FIXME: Correctly handle integer-class difference types.
> > > >
> > > > On second thought maybe we don't need to teach uniform_int_distribution
> > > > to handle integer-class difference types.  We could just assert that
> > > > __n fits inside a long long and use that as the difference type?  Same
> > > > for shuffle.
> > >
> > > Yeah, if we're being asked to take more than 1<<64 samples something
> > > probably went very wrong somewhere.
> > >
> > > But isn't it valid to pass in an enormous value of n, as long as
> > > last - first is not ridiculous?
> > >
> > > for example:
> > >
> > > auto population = views::iota((__int128)0, (__int128)10);
> > > using D = ranges::difference_t;
> > > ranges::sample(population, out, numeric_limits::max(), gen);
> > >
> > > This n won't fit in long long, but min(last - first, n) will.
>
> Good point, noted.
>
> >
> > Does std::uniform_int_distribution currently support __int128? I think
> > it does, just using the slower "two divisions" path, because we don't
> > have a larger type to use for Lemire's algorithm.
>
> Ah, looks like uniform_int_distribution does support __int128 already, but
> only in non-strict mode.  In strict mode we trip over the is_integral
> static_assert (since __int128 isn't an integral type in strict mode), so
> that assert needs to be relaxed.

I already plan to make is_integral<__int128> unconditionally true (I
have a local branch with most of the changes for that).


>
> >
> > > > > if constexpr (forward_iterator<_Iter>)
> > > > >   {
> > > > > -   // FIXME: Forwarding to std::sample here requires 
> > > > > computing
> > > > > __lasti
> > > > > -   // which may take linear time.
> > > > > -   auto __lasti = ranges::next(__first, __last);
> > > > > -   return _GLIBCXX_STD_A::
> > > > > - sample(std::move(__first), std::move(__lasti), 
> > > > > std::move(__out),
> > > > > -__n, std::forward<_Gen>(__g));
> > > > > +   using _Size = iter_difference_t<_Iter>;
> > > > > +   using __distrib_type = uniform_int_distribution<_Size>;
> > > > > +   using __param_type = typename __distrib_type::param_type;
> > > > > +   using _USize = __detail::__make_unsigned_like_t<_Size>;
> > > > > +   using __uc_type
> > > > > + = common_type_t > > > > remove_reference_t<_Gen>::result_type,
> > > > > _USize>;
> > > > > +
> > > > > +   if (__first == __last)
> > > > > + return __out;
> > > > > +
> > > > > +   __distrib_type __d{};
> > > > > +   _Size __unsampled_sz = ranges::distance(__first, __last);
> > > > > +   __n = std::min(__n, __unsampled_sz);
> > > > > +
> > > > > +   // If possible, we use __gen_two_uniform_ints to 
> > > > > efficiently
> > > > > produce
> > > > > +   // two random numbers using a single distribution 
> > > > > invocation:
> > > > > +
> > > > > +   const __uc_type __urngrange = __g.max() - __g.min();
> > > > > +   if (__urngrange / __uc_type(__unsampled_sz) >=
> > > > > __uc_type(__unsampled_sz))
> > > > > + // I.e. (__urngrange >= __unsampled_sz * 
> > > > > __unsampled_sz) but
> > > > > without
> > > > > + // wrapping issues.
> > > > > + {
> > > > > +   while (__n != 0 && __unsampled_sz >= 2)
> > > > > + {
> > > > > +   const pair<_Size, _Size> __p =
> > > > > + __gen_two_uniform_ints(__unsampled_sz, 
> > > > > __unsampled_sz -
> > > > > 1, __g);
> > > > > +
> > > > > +   --__unsampled_sz;
> > > > >

[committed v2] libstdc++: Use runtime format for internal format calls in chrono [PR110739]

2025-06-27 Thread Tomasz Kamiński

This patch adjust all internal std::format call inside of __formatter_chrono,
to use runtime format string and thus avoid compile time checking of validity
of the format string. Majority of cases are covered by calling newly introduced
_S_empty_fs() function that returns __Runtime_format_string containing
_S_empty_spec, instead of passing later directly.

In case of _M_j we use _S_str_d3 function (extracted from _S_str_d2), 
eliminating
call to std::format outside of unlikely scenario in which day of year is greater
than 1000 (this may happen for year_month_day with month greater than 12). In
consequence, outside of handling subseconds, we no longer delegate to 
std::format
or construct temporary strings, when formatting chrono types with ok() values.
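
As a rough illustration of the _M_j fast path, a three-digit, zero-padded value can be written into a caller-provided buffer without going through std::format. str_d3 below is a hypothetical stand-in written for this note, not the libstdc++ _S_str_d3.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: fills buf with the zero-padded three-digit decimal
// representation of n and returns a view into buf. Values of 1000 or more
// need the slow path and are rejected by the precondition.
std::string_view str_d3(char (&buf)[3], unsigned n)
{
  assert(n < 1000);
  buf[0] = static_cast<char>('0' + n / 100);
  buf[1] = static_cast<char>('0' + (n / 10) % 10);
  buf[2] = static_cast<char>('0' + n % 10);
  return std::string_view(buf, 3);
}
```

This produces the same text as the "{:03d}" format string it replaces for values below 1000, for instance "007" for 7 and "366" for 366.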

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_S_empty_fs): Define.
(__formatter_chrono::_S_str_d2): Use _S_str_d3 for 3+ digits and
place always_inline attribute after comment.
(__formatter_chrono::_S_str_d3): Extracted from _S_str_d2.
(__formatter_chrono::_M_H_I, __formatter_chrono::_M_R_X): Replace
_S_empty_spec with _S_empty_fs().
(__formatter_chrono::_M_j): Likewise and use _S_str_d3 in common
case.
(__format::operator-(_ChronoParts, _ChronoParts))
(__format::operator-=(_ChronoParts, _ChronoParts))
(__formatter_chrono::_S_fill_two_digits)
(__formatter_chrono::_S_str_d1): Place always_inline attribute
after comment.
---
v2 changes the placement of always_inline attributes.
Pushed to trunk.

 libstdc++-v3/include/bits/chrono_io.h | 37 +++
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 0ffbf06a7ff..bcf9830fb9e 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -250,14 +250,14 @@ namespace __format
   operator|=(_ChronoParts& __x, _ChronoParts __y) noexcept
   { return __x = __x | __y; }
 
-  [[__gnu__::__always_inline__]]
   // returns copy of x with all bits from y unset.
+  [[__gnu__::__always_inline__]]
   constexpr _ChronoParts
   operator-(_ChronoParts __x, _ChronoParts __y) noexcept
   { return static_cast<_ChronoParts>((unsigned short)__x & ~(unsigned 
short)__y); }
 
-  [[__gnu__::__always_inline__]]
   // unsets all bits of x that are set in y
+  [[__gnu__::__always_inline__]]
   constexpr _ChronoParts&
   operator-=(_ChronoParts& __x, _ChronoParts __y) noexcept
   { return __x = __x - __y; }
@@ -873,6 +873,11 @@ namespace __format
   static constexpr const _CharT* _S_minus_empty_spec = _S_chars + 17;
   static constexpr const _CharT* _S_empty_spec = _S_chars + 18;
 
+  [[__gnu__::__always_inline__]]
+  static _Runtime_format_string<_CharT>
+  _S_empty_fs()
+  { return _Runtime_format_string<_CharT>(_S_empty_spec); }
+
   // Return the formatting locale.
   template
std::locale
@@ -1405,7 +1410,7 @@ namespace __format
__i = 12;
}
  else if (__i >= 100) [[unlikely]]
-   return std::format_to(std::move(__out), _S_empty_spec, __i);
+   return std::format_to(std::move(__out), _S_empty_fs(), __i);
 
  return __format::__write(std::move(__out), _S_two_digits(__i));
}
@@ -1418,11 +1423,15 @@ namespace __format
  {
// Decimal number of days, without padding.
auto __d = chrono::floor(__t._M_hours).count();
-   return std::format_to(std::move(__out), _S_empty_spec, __d);
+   return std::format_to(std::move(__out), _S_empty_fs(), __d);
  }
 
- return std::format_to(std::move(__out), _GLIBCXX_WIDEN("{:03d}"),
-   __t._M_day_of_year.count());
+ auto __d = __t._M_day_of_year.count();
+ if (__d >= 1000) [[unlikely]]
+   return std::format_to(std::move(__out), _S_empty_fs(), __d);
+
+ _CharT __buf[3];
+ return __format::__write(std::move(__out), _S_str_d3(__buf, __d));
}
 
   template
@@ -1523,7 +1532,7 @@ namespace __format
 
  if (__hi >= 100) [[unlikely]]
{
- __out = std::format_to(std::move(__out), _S_empty_spec, __hi);
+ __out = std::format_to(std::move(__out), _S_empty_fs(), __hi);
  __sv.remove_prefix(2);
}
  else
@@ -1728,8 +1737,8 @@ namespace __format
};
   }
 
-  [[__gnu__::__always_inline__]]
   // Fills __buf[0] and __buf[1] with 2 digit value of __n.
+  [[__gnu__::__always_inline__]]
   static void
   _S_fill_two_digits(_CharT* __buf, unsigned __n)
   {
@@ -1738,9 +1747,9 @@ namespace __format
__buf[1] = __sv[1];
   }
 
-  [[__gnu__::__always_inline__]]
   // Returns decimal representation of __n.
   // Returned string_view may point to __buf.
+

[PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-27 Thread Yuao Ma

Hi Jakub,

> Please don't include math.h here.

Done.

> And instead of this line use __builtin_acospi (0.5).
> and, in dejagnu for runtime tests we prefer __builtin_abort on failure, so

Done.

Yuao




0001-gcc-middle-end-opt-for-trigonometric-pi-based-functi.patch
Description: 0001-gcc-middle-end-opt-for-trigonometric-pi-based-functi.patch

Re: [PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:38:32PM +, Yuao Ma wrote:
> Hi Jakub,
> 
> > Please don't include math.h here.
> 
> Done.
> 
> > And instead of this line use __builtin_acospi (0.5).
> > and, in dejagnu for runtime tests we prefer __builtin_abort on failure, so
> 
> Done.

Oh, one more thing.
signbit is documented to be a macro, so please don't declare
int signbit (double);
function in the testcase and instead of signbit use __builtin_signbit.
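
A small sketch of the suggested style, assuming GCC or a compatible compiler: is_negative_zero is a hypothetical helper written for this note and is not taken from the patch's testcase.

```cpp
// Uses the builtin directly, so no <math.h> include and no hand-written
// declaration of signbit (which is documented as a macro) is needed.
bool is_negative_zero(double x)
{
  // x == 0.0 holds for both +0.0 and -0.0; the sign bit tells them apart.
  return x == 0.0 && __builtin_signbit(x) != 0;
}
```

In a DejaGnu runtime test the check would typically be followed by __builtin_abort () on failure, in line with the earlier review comment.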

Ok for trunk with that fixed.

Jakub

Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

2025-06-27 Thread Richard Biener

On Wed, Jun 25, 2025 at 11:39 AM Richard Biener
 wrote:
>
> On Tue, Jun 24, 2025 at 5:25 PM Alexander Monakov  wrote:
> >
> > > I'd say we want to fix these kind of things before switching the default.
> > > Can you file bugreports for the distinct issues you noticed when adjusting
> > > the testcases?
> >
> > Sure, filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120808 for the most
> > frequently hit issue on x86 for now.
>
> Thanks.  So almost all issues arise because the FMAs are then introduced early
> (and possible folding with negates is done late).  At some point we've
> arranged FMAs to be produced after vectorization only (there might be targets
> with scalar FMA but no vector FMA for example).
>
> It shouldn't be too hard to handle FMAs during vectorization but having a mix
> will certainly complicate things.  Likewise undoing FMA creation when there's
> no vector FMA would rely on detecting whether the FMA was introduced by
> the compiler or the middle-end (I suppose builtin vs. IFN might do the
> job here).
>
> > > I suppose they are reproducible as well when using the C fma() function
> > > directly?
> >
> > No, unfortunately there are multiple issues with fma builtin:
> >
> > 1) __builtin_fma does not accept generic vector types
>
> indeed, you'd have to declare an OMP SIMD fma variant but that will not
> be recognized as fma or .FMA then I think.
>
> > 2) we have FMS FNMA FNMS FMADDSUB FMSUBADD internal functions, but
> > no corresponding builtins
>
> These are direct optab internal functions.  I'm not sure we want builtins
> for all of those, fma () with negated arguments should do fine.
>
> > 3) __builtin_fma and .FMA internal function are not the same in the
> > middle-end, I reported one instance arising from that in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892
>
> The builtin and the internal function should behave the same, in this case
> it's again late vs. early exposal of FMA.
>
> I am testing partial fixes for these issues.

Can you check again after r16-1731-g08bdb6b4a32f1f?

Thanks,
Richard.

>
> Richard.
>
> >
> > Alexander

[PATCH] libstdc++: Fix warnings introduced by type-erasing for chrono commits [PR110739]

2025-06-27 Thread Tomasz Kamiński

The r16-1709-g4b3cefed1a08344495fedec4982d85168bd8173f caused `-Woverflow`
in the empty_spec.cc file. This warning is not caused by any issue in shipping
code; it results from taking too much of a shortcut when implementing a
test-only custom representation type Rep, where long was always used to store
the value. In particular, the common type for Rep and long long int was
de facto long. This is addressed by adding an Under template parameter that
controls the type of the stored value, and handling it properly in the
common_type specializations. No changes to shipping code are necessary.

Secondly, extracting the _M_locale_fmt calls in r16-1712-gcaac94 resulted
in the __ctx format parameter no longer being used. This patch removes
that parameter entirely, and replaces the _FormatContext template parameter
with an _OutIter parameter for __out. For consistency, the type of __out
is also decoupled from _FormatContext for functions that still need the context:
 * to extract locale (_M_A_a, _M_B_b, _M_c, _M_p, _M_r, _M_subsecs)
 * to perform formatting for duration/subseconds (_M_Q, _M_T, _M_f)

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
Rename _Out to _OutIter for consistency, and update calls
to specifier functions.
(__formatter_chrono::_M_wi, __formatter_chrono::_M_C_y_Y)
(__formatter_chrono::_M_D_x, __formatter_chrono::_M_d_e)
(__formatter_chrono::_M_F, __formatter_chrono::_M_g_G)
(__formatter_chrono::_M_H_I, __formatter_chrono::_M_j)
(__formatter_chrono::_M_m, __formatter_chrono::_M_M)
(__formatter_chrono::_M_q, __formatter_chrono::_M_R_X)
(__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W)
(__formatter_chrono::_M_z, __formatter_chrono::_M_z):
Remove _FormatContext parameter, and introduce _OutIter
for __out type.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_B_b)
(__formatter_chrono::_M_p, __formatter_chrono::_M_Q)
(__formatter_chrono::_M_r, __formatter_chrono::_M_S)
(__formatter_chrono::_M_subsecs, __formatter_chrono::_M_T):
Introduce separate _OutIter template parameter for __out.
(__formatter_chrono::_M_c, __formatter_chrono::_M_T):
Likewise, and adjust calls to specifier functions.
* testsuite/std/time/format/empty_spec.cc:
---
Still testing, but the warning in empty_spec.cc file disappeared when
using -m32.

 libstdc++-v3/include/bits/chrono_io.h | 219 --
 .../testsuite/std/time/format/empty_spec.cc   |  43 ++--
 2 files changed, 128 insertions(+), 134 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index bcfd51b9866..03f58c8a264 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -942,9 +942,9 @@ namespace __format
  return __out;
}
 
-  template
-   _Out
-   _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
+  template
+   _OutIter
+   _M_format_to(const _ChronoData<_CharT>& __t, _OutIter __out,
 _FormatContext& __fc) const
{
  auto __first = _M_spec._M_chrono_specs.begin();
@@ -1005,7 +1005,7 @@ namespace __format
{
// %\0 is extension for handling weekday index
case '\0':
- __out = _M_wi(__t._M_weekday_index, std::move(__out), __fc);
+ __out = _M_wi(__t._M_weekday_index, std::move(__out));
  break;
case 'a':
case 'A':
@@ -1022,41 +1022,41 @@ namespace __format
case 'C':
case 'y':
case 'Y':
- __out = _M_C_y_Y(__t._M_year, std::move(__out), __fc, __c);
+ __out = _M_C_y_Y(__t._M_year, std::move(__out), __c);
  break;
case 'd':
case 'e':
- __out = _M_d_e(__t._M_day, std::move(__out), __fc, __c);
+ __out = _M_d_e(__t._M_day, std::move(__out), __c);
  break;
case 'D':
case 'x':
- __out = _M_D_x(__t, std::move(__out), __fc);
+ __out = _M_D_x(__t, std::move(__out));
  break;
case 'F':
- __out = _M_F(__t, std::move(__out), __fc);
+ __out = _M_F(__t, std::move(__out));
  break;
case 'g':
case 'G':
- __out = _M_g_G(__t, std::move(__out), __fc, __c == 'G');
+ __out = _M_g_G(__t, std::move(__out), __c == 'G');
  break;
case 'H':
case 'I':
- __out = _M_H_I(__t._M_hours, __print_sign(), __fc, __c);
+ __out = _M_H_I(__t._M_hours, __print_sign(), __c);
  break;
case 'j':
- __out = _M_j(__t, __print_sign(), __fc)

Re: [PATCH] libstdc++: Fix warnings introduced by type-erasing for chrono commits [PR110739]

2025-06-27 Thread Tomasz Kaminski

On Fri, Jun 27, 2025 at 1:03 PM Tomasz Kamiński  wrote:

> The r16-1709-g4b3cefed1a08344495fedec4982d85168bd8173f caused `-Woverflow`
> in empty_spec.cc file. This warning is not caused by any issue in shipping
> code, and results from taking too much of a shortcut when implementing a
> test-only custom representation type Rep, where long was always used to
> store a value.
> In particular, the common type for Rep and long long int was de facto long.
> This is addressed by adding an Under template parameter that controls the
> type of the stored value, and handling it properly in common_type
> specializations.
> No changes to shipping code are necessary.
>
> Secondly, extracting _M_locale_fmt calls in r16-1712-gcaac94 resulted
> in the __ctx format parameter no longer being used. This patch removes
> that parameter entirely, and replaces the _FormatContext template parameter
> with an _OutIter parameter for __out. For consistency, the type of __out
> is also decoupled from _FormatContext for functions that still need the context:
>  * to extract locale (_M_A_a, _M_B_b, _M_c, _M_p, _M_r, _M_subsecs)
>  * perform formatting for duration/subseconds (_M_Q, _M_T, _M_f)
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
> Rename _Out to _OutIter for consistency, and update calls
> to specifier functions.
> (__formatter_chrono::_M_wi, __formatter_chrono::_M_C_y_Y)
> (__formatter_chrono::_M_D_x, __formatter_chrono::_M_d_e)
> (__formatter_chrono::_M_F, __formatter_chrono::_M_g_G)
> (__formatter_chrono::_M_H_I, __formatter_chrono::_M_j)
> (__formatter_chrono::_M_m, __formatter_chrono::_M_M)
> (__formatter_chrono::_M_q, __formatter_chrono::_M_R_X)
> (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W)
> (__formatter_chrono::_M_z, __formatter_chrono::_M_z):
> Remove _FormatContext parameter, and  introduce _OutIter
> for __out type.
> (__formatter_chrono::_M_a_A, __formatter_chrono::_M_B_b)
> (__formatter_chrono::_M_p, __formatter_chrono::_M_Q)
> (__formatter_chrono::_M_r, __formatter_chrono::_M_S)
> (__formatter_chrono::_M_subsecs, __formatter_chrono::_M_T):
> Introduce separate _OutIter template parameter for __out.
> (__formatter_chrono::_M_c, __formatter_chrono::_M_T):
> Likewise, and adjust calls to specifiers functions.
> * testsuite/std/time/format/empty_spec.cc:
>
This now says:
* testsuite/std/time/format/empty_spec.cc: Make underlying
type for Rep configurable.


> ---
> Still testing, but the warning in empty_spec.cc file disappeared when
> using -m32.
>
>  libstdc++-v3/include/bits/chrono_io.h | 219 --
>  .../testsuite/std/time/format/empty_spec.cc   |  43 ++--
>  2 files changed, 128 insertions(+), 134 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> index bcfd51b9866..03f58c8a264 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -942,9 +942,9 @@ namespace __format
>   return __out;
> }
>
> -  template
> -   _Out
> -   _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
> +  template
> +   _OutIter
> +   _M_format_to(const _ChronoData<_CharT>& __t, _OutIter __out,
>  _FormatContext& __fc) const
> {
>   auto __first = _M_spec._M_chrono_specs.begin();
> @@ -1005,7 +1005,7 @@ namespace __format
> {
> // %\0 is extension for handling weekday index
> case '\0':
> - __out = _M_wi(__t._M_weekday_index, std::move(__out),
> __fc);
> + __out = _M_wi(__t._M_weekday_index, std::move(__out));
>   break;
> case 'a':
> case 'A':
> @@ -1022,41 +1022,41 @@ namespace __format
> case 'C':
> case 'y':
> case 'Y':
> - __out = _M_C_y_Y(__t._M_year, std::move(__out), __fc,
> __c);
> + __out = _M_C_y_Y(__t._M_year, std::move(__out), __c);
>   break;
> case 'd':
> case 'e':
> - __out = _M_d_e(__t._M_day, std::move(__out), __fc, __c);
> + __out = _M_d_e(__t._M_day, std::move(__out), __c);
>   break;
> case 'D':
> case 'x':
> - __out = _M_D_x(__t, std::move(__out), __fc);
> + __out = _M_D_x(__t, std::move(__out));
>   break;
> case 'F':
> - __out = _M_F(__t, std::move(__out), __fc);
> + __out = _M_F(__t, std::move(__out));
>   break;
> case 'g':
> case 'G':
> - __out = _M_g_G(__t, std::move(__out), __fc, __c == 'G');
> + __

Re: [PATCH v3 6/6] LoongArch: Add support for _BitInt [PR117599]

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:33:39PM +0800, Yang Yujie wrote:
> [1] https://github.com/loongson/la-abi-specs
> 
>   PR target/117599
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.h: Define a PROMOTE_MODE case for
>   small _BitInts.
>   * config/loongarch/loongarch.cc (loongarch_promote_function_mode):
>   Same.
>   (loongarch_bitint_type_info): New function.
>   (TARGET_C_BITINT_TYPE_INFO): Declare.
> 
> libgcc/ChangeLog:
> 
>   * config/loongarch/t-softfp-tf: Enable _BitInt helper functions.
>   * config/loongarch/t-loongarch: Same.
>   * config/loongarch/libgcc-loongarch.ver: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/bitint-alignments.c: New test.
>   * gcc.target/loongarch/bitint-args.c: New test.
>   * gcc.target/loongarch/bitint-sizes.c: New test.

This needs to be reviewed by LoongArch maintainers.

> @@ -11214,6 +11214,34 @@ loongarch_c_mode_for_suffix (char suffix)
>return VOIDmode;
>  }
>  
> +/* Implement TARGET_C_BITINT_TYPE_INFO.
> +   Return true if _BitInt(N) is supported and fill its details into *INFO.  
> */
> +bool
> +loongarch_bitint_type_info (int n, struct bitint_info *info)
> +{
> +  if (n <= 8)
> +info->limb_mode = QImode;
> +  else if (n <= 16)
> +info->limb_mode = HImode;
> +  else if (n <= 32)
> +info->limb_mode = SImode;
> +  else if (n <= 64)
> +info->limb_mode = DImode;
> +  else if (n <= 128)
> +info->limb_mode = TImode;
> +  else
> +info->limb_mode = DImode;
> +
> +  info->abi_limb_mode = info->limb_mode;
> +
> +  if (n > 64)
> +info->abi_limb_mode = TImode;
> +
> +  info->big_endian = false;
> +  info->extended = true;
> +  return true;
> +}

From my POV this is ok.

> +GCC_16.0.0 {
> +__mulbitint3
> +__divmodbitint4
> +__fixsfbitint
> +__fixdfbitint
> +__floatbitintsf
> +__floatbitintdf
> +}
> --- a/libgcc/config/loongarch/t-softfp-tf
> +++ b/libgcc/config/loongarch/t-softfp-tf
> @@ -1,3 +1,4 @@
>  softfp_float_modes += tf
>  softfp_extensions += sftf dftf
>  softfp_truncations += tfsf tfdf
> +softfp_extras += floatbitinttf fixtfbitint

Given the tf stuff here, shouldn't
__fixtfbitint and __floatbitinttf be in the export list next to sf/df?

Jakub

Re: [PATCH v3 3/6] bitint: Allow unused bits when testing extended _BitInt ABIs

2025-06-27 Thread Jakub Jelinek

On Fri, Jun 27, 2025 at 03:33:36PM +0800, Yang Yujie wrote:
> In LoongArch psABI, large _BitInt(N) (N > 64) objects are only
> extended to fill the highest 8-byte chunk that contains any used bit,
> but the size of such a large _BitInt type is a multiple of their
> 16-byte alignment.  So there may be an entire unused 8-byte
> chunk that is not filled by extension, and this chunk shouldn't be
> checked when testing if the object is properly extended.
> 
> The original bitintext.h assumed that all bits within
> sizeof(_BitInt(N)) beyond used bits are filled by extension.
> This patch changes that for LoongArch and possibly
> any future ports with a similar behavior.
> 
> P.S. For encoding this test as well as type-generic programming,
> it would be nice to have a builtin function to obtain "N" at
> compile time from _BitInt(N)-typed expressions.  But here
> we stick to existing ones (__builtin_clrsbg / __builtin_clzg).
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/bitintext.h: Generalize BEXTC to only check extension
>   within PROMOTED_SIZE bits.

Ok for trunk.

Jakub
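
The size arithmetic described in the quoted commit message can be sketched as follows. This is only an illustration under the stated assumptions (N > 64, extension up to the end of the highest 8-byte chunk that holds a used bit, storage size a multiple of 16 bytes); the function names are hypothetical and are not taken from bitintext.h.

```cpp
#include <cstddef>

// Round x up to the next multiple of m.
constexpr std::size_t round_up(std::size_t x, std::size_t m)
{ return (x + m - 1) / m * m; }

// Bits covered by value plus extension: up to the end of the highest
// 8-byte chunk containing a used bit.  Assumes n > 64.
constexpr std::size_t extended_bits(std::size_t n) { return round_up(n, 64); }

// Storage size in bits: a multiple of the 16-byte alignment.  Assumes n > 64.
constexpr std::size_t storage_bits(std::size_t n) { return round_up(n, 128); }

// Bits that are neither value bits nor extension bits; a test for proper
// extension should not inspect them.
constexpr std::size_t unchecked_bits(std::size_t n)
{ return storage_bits(n) - extended_bits(n); }

static_assert(unchecked_bits(65) == 0,   "128-bit storage, fully extended");
static_assert(unchecked_bits(129) == 64, "256-bit storage, top chunk unused");
static_assert(unchecked_bits(193) == 0,  "256-bit storage, fully extended");
```

So under these assumptions a _BitInt(129) would occupy 32 bytes, of which only the low 24 are defined by extension.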

Re: [PATCH] libstdc++: Use runtime format for internal format calls in chrono [PR110739]

2025-06-27 Thread Tomasz Kaminski

On Fri, Jun 27, 2025 at 10:31 AM Tomasz Kamiński 
wrote:

> This patch adjusts all internal std::format calls inside of
> __formatter_chrono to use a runtime format string and thus avoid compile-time
> checking of the validity of the format string. The majority of cases are
> covered by calling the newly introduced _S_empty_fs() function, which returns
> a _Runtime_format_string containing _S_empty_spec, instead of passing the
> latter directly.
>
> In the case of _M_j we use the _S_str_d3 function (extracted from _S_str_d2),
> eliminating the call to std::format outside of the unlikely scenario in which
> the day of year is 1000 or greater (this may happen for year_month_day with
> month greater than 12). In consequence, outside of handling subseconds, we no
> longer delegate to std::format or construct temporary strings when formatting
> chrono types with ok() values.
>
> PR libstdc++/110739
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (__formatter_chrono::_S_empty_fs):
> Define.
> (__formatter_chrono::_S_str_d2): Use _S_str_d3 for 3+ digits.
> (__formatter_chrono::_S_str_d3): Extracted from _S_str_d2.
> (__formatter_chrono::_M_H_I, __formatter_chrono::_M_R_X): Replace
> _S_empty_spec with _S_empty_fs().
> (__formatter_chrono::_M_j): Likewise and use _S_str_d3 in common
> case.
> ---



I do not think this buys us much, but I think it is worth doing anyway.
It also finishes my side goal of getting rid of temporary strings and
using local buffers, which I applied to other specifiers in previous commits.
Tested on x86_64-linux locally. The std/time* tests passed with
-D_GLIBCXX_USE_CXX11_ABI=0 and -D_GLIBCXX_DEBUG.
OK for trunk?


>  libstdc++-v3/include/bits/chrono_io.h | 27 ++-
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> index bcfd51b9866..d6bc6c7cf2a 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -873,6 +873,11 @@ namespace __format
>static constexpr const _CharT* _S_minus_empty_spec = _S_chars + 17;
>static constexpr const _CharT* _S_empty_spec = _S_chars + 18;
>
> +  [[__gnu__::__always_inline__]]
> +  static _Runtime_format_string<_CharT>
> +  _S_empty_fs()
> +  { return _Runtime_format_string<_CharT>(_S_empty_spec); }
> +
>// Return the formatting locale.
>template
> std::locale
> @@ -1411,7 +1416,7 @@ namespace __format
> __i = 12;
> }
>   else if (__i >= 100) [[unlikely]]
> -   return std::format_to(std::move(__out), _S_empty_spec, __i);
> +   return std::format_to(std::move(__out), _S_empty_fs(), __i);
>
>   return __format::__write(std::move(__out), _S_two_digits(__i));
> }
> @@ -1425,11 +1430,15 @@ namespace __format
>   {
> // Decimal number of days, without padding.
> auto __d = chrono::floor(__t._M_hours).count();
> -   return std::format_to(std::move(__out), _S_empty_spec, __d);
> +   return std::format_to(std::move(__out), _S_empty_fs(), __d);
>   }
>
> - return std::format_to(std::move(__out), _GLIBCXX_WIDEN("{:03d}"),
> -   __t._M_day_of_year.count());
> + auto __d = __t._M_day_of_year.count();
> + if (__d >= 1000) [[unlikely]]
> +   return std::format_to(std::move(__out), _S_empty_fs(), __d);
> +
> + _CharT __buf[3];
> + return __format::__write(std::move(__out), _S_str_d3(__buf,
> __d));
> }
>
>template
> @@ -1534,7 +1543,7 @@ namespace __format
>
>   if (__hi >= 100) [[unlikely]]
> {
> - __out = std::format_to(std::move(__out), _S_empty_spec,
> __hi);
> + __out = std::format_to(std::move(__out), _S_empty_fs(),
> __hi);
>   __sv.remove_prefix(2);
> }
>   else
> @@ -1772,7 +1781,15 @@ namespace __format
>{
> if (__n < 100) [[likely]]
>   return _S_two_digits(__n);
> +return _S_str_d3(__buf, __n);
> +  }
>
> +  [[__gnu__::__always_inline__]]
> +  // Returns decimal representation of __n, padded to 3 digits.
> +  // Returned string_view points to __buf.
> +  static basic_string_view<_CharT>
> +  _S_str_d3(span<_CharT, 3> __buf, unsigned __n)
> +  {
> _S_fill_two_digits(__buf.data(), __n / 10);
> __buf[2] = _S_chars[__n % 10];
> return __string_view(__buf.data(), 3);
> --
> 2.49.0
>
>
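
For readers who want the idea behind _S_str_d3 in isolation, here is a minimal sketch of formatting a number below 1000 into a three-character local buffer without going through std::format. It is a stand-in, not the libstdc++ code: the name str_d3 is hypothetical, it handles char only, and it takes a plain array reference where the patch uses span<_CharT, 3>.

```cpp
#include <string_view>

// Hypothetical stand-in for the helper in the patch.
// Writes n, zero-padded to 3 digits, into buf.  Assumes n < 1000.
// The returned view points into buf, so buf must outlive it.
inline std::string_view str_d3(char (&buf)[3], unsigned n)
{
  buf[0] = static_cast<char>('0' + n / 100);
  buf[1] = static_cast<char>('0' + n / 10 % 10);
  buf[2] = static_cast<char>('0' + n % 10);
  return std::string_view(buf, 3);
}
```

A caller would declare char buf[3] on the stack and write the returned view straight to the output iterator, which is what avoids the temporary string.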

[PATCH 2/7 v2] RISC-V: Add support for the XAndesperf ISA extension.

2025-06-27 Thread KuanLin Chen

Hi Kito,

>> +(define_predicate "extract_loc_imm_si"

> Rename it to unsigned_5_bit_integer_operand

>> +  (and (match_code "const_int")
>> +    (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
>> +

>> +(define_predicate "extract_loc_imm_di"

> Rename it to unsigned_6_bit_integer_operand

>> + (and (match_code "const_int")
>> + (match_test "IN_RANGE (INTVAL (op), 0, 63)")))


All fixed as you mentioned. However, I use "const_int6_operand", which is
already defined, in place of "extract_loc_imm_di",
and define "const_int5_operand" in place of "extract_loc_imm_si".

Hi Jeff,

> It doesn't look like the conditional branch patterns support out of
> range targets.  Or is there something I'm missing?

> Long branch handling isn't terribly hard.  For most cases it'll end up
> generating assembly like this (from the "branch" pattern in riscv.md:

> {
>   if (get_attr_length (insn) == 12)
>     return "b%r1\t%2,%z3,1f; jump\t%l0,ra; 1:";
>
>   return "b%C1\t%2,%z3,%l0";
> }

> Closely related, I think your "length" attribute is wrong and needs
> updating.  Length is usually computed by generic code, is that not
> working in your case?

Added long branch handling and removed "length" from
"*nds_branch_imms7" and "*nds_branch_on_bit".

> One of your cost cases sets *total = 0.  That seems quite unexpected.
> Is that actually correct?  I would have expected COSTS_N_INSNS (1).

This is for "*nds_branch_on_bit": we want to lower the
"ZERO_EXTRACT" cost so that the combine phase generates
the pattern more easily.

> For the riscv.md define_insn_and_splits that you changed, changing the
> condition seems correct.  But I don't think you need to change the split
> conditional.  When the split condition starts with "&&" it'll use the
> main condition && the split condition.

Fixed. Thanks for the reminder. I was not aware of this subtle difference before.

Thanks to both of you.

This patch adds support for the XAndesperf ISA extension.
The 32-bit AndeStar V5 extension includes branch instructions,
load effective address instructions, and string processing
instructions for performance improvement.
New INSN patterns are added into the new file andes.md
as a separate vendor extension.

gcc/ChangeLog:

* config/riscv/constraints.md (Ou07): New constraint.
(ads_Bext): New constraint.
* config/riscv/iterators.md (ANYLE32): New iterator.
(sizen): New iterator.
(sh_limit): New iterator.
(sh_bit): New iterator.
* config/riscv/predicates.md (ads_branch_bbcs_operand): New predicate.
(ads_branch_bimm_operand): New predicate.
(ads_imm_extract_operand): New predicate.
(ads_extract_size_imm_si): New predicate.
(ads_extract_size_imm_di): New predicate.
(const_int5_operand): New predicate.
* config/riscv/riscv-builtins.cc:
Add new AVAIL andesperf32 and andesperf64.
Add new define RISCV_ATYPE_ULONG and RISCV_ATYPE_LONG.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
* config/riscv/riscv.cc
(riscv_extend_cost): Cost for pattern 'bfo'.
(riscv_rtx_costs): Cost for XAndesperf extension.
* config/riscv/riscv.md: Add support for XAndesperf to patterns
zero_extendsidi2_internal, zero_extendhi2, extendsidi2_internal,
extend2, 3
and branch_on_bit.
* config/riscv/vector-iterators.md
 (sz): Add sign_extract and zero_extract.
* config/riscv/andes.def: New file for vendor Andes.
* config/riscv/andes.md: New file for vendor Andes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandesperf-1.c: New test.
* gcc.target/riscv/xandesperf-10.c: New test.
* gcc.target/riscv/xandesperf-2.c: New test.
* gcc.target/riscv/xandesperf-3.c: New test.
* gcc.target/riscv/xandesperf-4.c: New test.
* gcc.target/riscv/xandesperf-5.c: New test.
* gcc.target/riscv/xandesperf-6.c: New test.
* gcc.target/riscv/xandesperf-7.c: New test.
* gcc.target/riscv/xandesperf-8.c: New test.
* gcc.target/riscv/xandesperf-9.c: New test.


0002-RISC-V-Add-support-for-the-XAndesperf-ISA-extension.patch
Description: Binary data

[PATCH 3/7 v2] RISC-V: Add support for the XAndesbfhcvt ISA extension.

2025-06-27 Thread KuanLin Chen

Hi,

This extension defines instructions to perform scalar floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754
32-bit single-precision floating-point (SP) data in a scalar
floating point register.

gcc/ChangeLog:

* config/riscv/andes.def: Add nds_fcvt_s_bf16 and nds_fcvt_bf16_s.
* config/riscv/andes.md (riscv_nds_fcvt_bf16_s): New pattern.
(riscv_nds_fcvt_s_bf16): New pattern.
* config/riscv/riscv-builtins.cc: New AVAIL andesbfhcvt.
Add new define RISCV_ATYPE_BF and RISCV_ATYPE_SF.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandesbfhcvt-1.c: New test.
* gcc.target/riscv/xandesbfhcvt-2.c: New test.


0003-RISC-V-Add-support-for-the-XAndesbfhcvt-ISA-extensio.patch
Description: Binary data

RE: [PATCH V3] x86: Enable separate shrink wrapping

2025-06-27 Thread Cui, Lili

> -Original Message-
> From: H.J. Lu 
> Sent: Friday, June 27, 2025 4:48 PM
> To: Cui, Lili 
> Cc: ubiz...@gmail.com; gcc-patches@gcc.gnu.org; Liu, Hongtao
> ; richard.guent...@gmail.com; Michael Matz
> ; Sam James ; kenjin4...@gmail.com
> Subject: Re: [PATCH V3] x86: Enable separate shrink wrapping
> 
> On Tue, Jun 17, 2025 at 10:04 PM Cui, Lili  wrote:
> >
> > From: Lili Cui 
> >
> > Hi Uros,
> >
> > This is patch v3, the main changes are as follows.
> >
> > 1. Added a pro_epilogue_adjust_stack_add_nocc in i386.md to add memory
> clobber for lea/mov.
> > 2. Adjusted some formatting issues.
> > 3. Added scan-rtl-dumps for ia32 in shrink_wrap_separate.C.
> >
> > Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No
> performance regression was observed.
> > For O2 multi-copy :
> > 511.povray_r improved by 2.8% on ZNVER5.
> > 511.povray_r improved by 4.2% on EMR
> >
> > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > Use this patch to build the latest Linux kernel and boot successfully.
> >
> > Thanks,
> > Lili.
> >
> >
> > This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> > enable separate shrink wrapping for function prologues/epilogues in
> > x86.
> >
> > When performing separate shrink wrapping, we use mov instead of
> > push/pop, because push/pop makes the rsp adjustment more complicated
> > to handle and may lose performance.  Using mov has a small impact on
> > code size, but guarantees performance.
> >
> > Using mov means we need to use sub/add to maintain the stack frame. In
> > some special cases, we need to use lea to prevent affecting EFlags.
> >
> > Avoid inserting sub between test-je-jle, since it would change EFlags;
> > lea should be used here.
> >
> > foo:
> >         xorl    %eax, %eax
> >         testl   %edi, %edi
> >         je      .L11
> >         sub     $16, %rsp  --> leaq    -16(%rsp), %rsp
> >         movq    %r13, 8(%rsp)
> >         movl    $1, %r13d
> >         jle     .L4
> >
> > Tested against SPEC CPU 2017, this change always has a net-positive
> > effect on the dynamic instruction count.  See the following table for
> > the breakdown on how this reduces the number of dynamic instructions
> > per workload on a like-for-like (with/without this commit):
> >
> > instruction count   base        with commit (commit-base)/commit
> > 502.gcc_r   98666845943 96891561634 -1.80%
> > 526.blender_r   6.21226E+11 6.12992E+11 -1.33%
> > 520.omnetpp_r   1.1241E+11  1.11093E+11 -1.17%
> > 500.perlbench_r 1271558717  1263268350  -0.65%
> > 523.xalancbmk_r 2.20103E+11 2.18836E+11 -0.58%
> > 531.deepsjeng_r 2.73591E+11 2.72114E+11 -0.54%
> > 500.perlbench_r 64195557393 63881512409 -0.49%
> > 541.leela_r 2.99097E+11 2.98245E+11 -0.29%
> > 548.exchange2_r 1.27976E+11 1.27784E+11 -0.15%
> > 527.cam4_r  88981458425 7334679 -0.11%
> > 554.roms_r  2.60072E+11 2.59809E+11 -0.10%
> >
> > Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No
> performance regression was observed.
> >
> > For O2 multi-copy :
> > 511.povray_r improved by 2.8% on ZNVER5.
> > 511.povray_r improved by 4% on EMR
> > 511.povray_r improved by 3.3 % ~ 4.6% on ICELAKE.
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-protos.h (ix86_get_separate_components):
> > New function.
> > (ix86_components_for_bb): Likewise.
> > (ix86_disqualify_components): Likewise.
> > (ix86_emit_prologue_components): Likewise.
> > (ix86_emit_epilogue_components): Likewise.
> > (ix86_set_handled_components): Likewise.
> > * config/i386/i386.cc (save_regs_using_push_pop):
> > Split from ix86_compute_frame_layout.
> > (ix86_compute_frame_layout):
> > Use save_regs_using_push_pop.
> > (pro_epilogue_adjust_stack):
> > Use gen_pro_epilogue_adjust_stack_add_nocc.
> > (ix86_expand_prologue): Add some assertions and adjust
> > the stack frame at the beginning of the prolog for shrink
> > wrapping separate.
> > (ix86_emit_save_regs_using_mov):
> > Skip registers that are wrapped separately.
> > (ix86_emit_restore_regs_using_mov): Likewise.
> > (ix86_expand_epilogue): Add some assertions and set
> > restore_regs_via_mov to true for shrink wrapping separate.
> > (ix86_get_separate_components): New function.
> > (ix86_components_for_bb): Likewise.
> > (ix86_disqualify_components): Likewise.
> > (ix86_emit_prologue_components): Likewise.
> > (ix86_emit_epilogue_components): Likewise.
> > (ix86_set_handled_components): Likewise.
> > (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> > (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> > (TARGET_SH

Re: [PATCH 4/8] libstdc++: Directly implement ranges::stable_sort [PR100795]

2025-06-27 Thread Jonathan Wakely

On Fri, 27 Jun 2025 at 15:16, Jonathan Wakely  wrote:
>
> On Fri, 27 Jun 2025 at 15:15, Patrick Palka  wrote:
> >
> > On Fri, 27 Jun 2025, Jonathan Wakely wrote:
> >
> > > On 26/06/25 22:25 -0400, Patrick Palka wrote:
> > > > PR libstdc++/100795
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > > * include/bits/ranges_algo.h (__detail::__move_merge): New,
> > > > based on the stl_algo.h implementation.
> > > > (__detail::__merge_sort_loop): Likewise.
> > > > (__detail::__chunk_insertion_sort): Likewise.
> > > > (__detail::__merge_sort_with_buffer): Likewise.
> > > > (__detail::__stable_sort_adaptive): Likewise.
> > > > (__detail::__stable_sort_adaptive_resize): Likewise.
> > > > (__detail::__inplace_stable_sort): Likewise.
> > > > (__stable_sort_fn::operator()): Reimplement in terms of the above.
> > > > * testsuite/25_algorithms/stable_sort/constrained.cc:
> > > > ---
> > > > libstdc++-v3/include/bits/ranges_algo.h   | 207 +-
> > > > .../25_algorithms/stable_sort/constrained.cc  |  30 +++
> > > > 2 files changed, 233 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > > index b0357600adbc..7dfd4e7ed64c 100644
> > > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > > @@ -2388,6 +2388,170 @@ namespace ranges
> > > >
> > > >   inline constexpr __sort_fn sort{};
> > > >
> > > > +  namespace __detail
> > > > +  {
> > > > +/// This is a helper function for the __merge_sort_loop routines.
> > > > +template
> > > > +  _Out
> > > > +  __move_merge(_Iter __first1, _Iter __last1,
> > > > +  _Iter __first2, _Iter __last2,
> > > > +  _Out __result, _Comp __comp)
> > > > +  {
> > > > +   while (__first1 != __last1 && __first2 != __last2)
> > > > + {
> > > > +   if (__comp(*__first2, *__first1))
> > > > + {
> > > > +   *__result = ranges::iter_move(__first2);
> > > > +   ++__first2;
> > > > + }
> > > > +   else
> > > > + {
> > > > +   *__result = ranges::iter_move(__first1);
> > > > +   ++__first1;
> > > > + }
> > > > +   ++__result;
> > > > + }
> > > > +   return ranges::move(__first2, __last2,
> > > > +   ranges::move(__first1, __last1,
> > > > __result).out).out;
> > > > +  }
> > > > +
> > > > +template > > > typename
> > > > _Comp>
> > > > +  void
> > > > +  __merge_sort_loop(_Iter __first, _Iter __last, _Out __result,
> > > > +   _Distance __step_size, _Comp __comp)
> > > > +  {
> > > > +   const _Distance __two_step = 2 * __step_size;
> > > > +
> > > > +   while (__last - __first >= __two_step)
> > > > + {
> > > > +   __result = __detail::__move_merge(__first, __first + 
> > > > __step_size,
> > > > + __first + __step_size,
> > > > + __first + __two_step,
> > > > + __result, __comp);
> > > > +   __first += __two_step;
> > > > + }
> > > > +   __step_size = ranges::min(_Distance(__last - __first), __step_size);
> > > > +
> > > > +   __detail::__move_merge(__first, __first + __step_size,
> > > > +  __first + __step_size, __last, __result,
> > > > __comp);
> > > > +  }
> > > > +
> > > > +template
> > > > +  constexpr void
> > > > +  __chunk_insertion_sort(_Iter __first, _Iter __last,
> > > > +_Distance __chunk_size, _Compare __comp)
> > > > +  {
> > > > +   while (__last - __first >= __chunk_size)
> > > > + {
> > > > +   __detail::__insertion_sort(__first, __first + __chunk_size,
> > > > __comp);
> > > > +   __first += __chunk_size;
> > > > + }
> > > > +   __detail::__insertion_sort(__first, __last, __comp);
> > > > +  }
> > > > +
> > > > +template
> > > > +  void
> > > > +  __merge_sort_with_buffer(_Iter __first, _Iter __last,
> > > > +  _Pointer __buffer, _Comp __comp)
> > > > +  {
> > > > +   using _Distance = iter_difference_t<_Iter>;
> > > > +
> > > > +   const _Distance __len = __last - __first;
> > > > +   const _Pointer __buffer_last = __buffer + ptrdiff_t(__len);
> > > > +
> > > > +   constexpr int __chunk_size = 7;
> > > > +   _Distance __step_size = __chunk_size;
> > > > +   __detail::__chunk_insertion_sort(__first, __last, __step_size,
> > > > __comp);
> > > > +
> > > > +   while (__step_size < __len)
> > > > + {
> > > > +   __detail::__merge_sort_loop(__first, __last, __buffer,
> > > > +   __step_size, __comp);
> > > > +   __step_size *= 2;
> > > > +   __detail::__merge_sort_loop(__buffer, __buffer_last, __first,
> > > > +   ptrdiff_t(__step_size), __comp);
> > >

Re: [PATCH] sh: Recognize >> 31 in treg_set_expr_not_const01

2025-06-27 Thread Jeff Law





On 6/27/25 7:59 AM, Oleg Endo wrote:


On Fri, 2025-06-27 at 10:51 -0300, Raphael Moreira Zinsly wrote:

A right shift by 31 yields 0 or 1; this can be checked in
treg_set_expr_not_const01 to avoid matching addc_t_r, which
can expand to a 3 insn sequence instead.
This improves tests 023 to 026 from gcc.target/sh/pr54236-2.c, e.g.:
test_023:
        shll    r5
        mov     #0,r1
        mov     r4,r0
        rts
        addc    r1,r0

With this change:
test_023:
        shll    r5
        movt    r0
        rts
        add     r4,r0

We noticed this while evaluating a patch to improve how we handle
selecting between two constants based on the output of a LT/GE 0
test.

gcc/ChangeLog:
* config/sh/predicates.md
(treg_set_expr_not_const01): Call sh_recog_treg_set_expr_not_01.
* config/sh/sh-protos.h
(sh_recog_treg_set_expr_not_01): New function.
* config/sh/sh.cc (sh_recog_treg_set_expr_not_01): Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/sh/pr54236-2.c: Fix comments and expected output


Assuming that this passes the usual regression tests, it looks OK to me.
Please apply.

I ran it in my tester for Raphael.  So sh3/sh3eb linux crosses.  The
sh4/sh4eb bootstraps won't fire off until Sun/Mon if I remember the
scheduling correctly.


jeff
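
The observation behind the patch, that a right shift by 31 of a 32-bit value can only produce 0 or 1, can be illustrated with a small sketch. The functions below are hypothetical and are not copied from gcc.target/sh/pr54236-2.c; they only show the shape of expression that the shll/movt/add sequence above computes.

```cpp
#include <cstdint>

// Logical right shift by 31 isolates the top bit, so the result is 0 or 1.
inline std::uint32_t top_bit(std::uint32_t x) { return x >> 31; }

// Hypothetical shape of the tested expression: add the top bit of b to a.
// On SH this corresponds to shll (top bit into T), movt, add.
inline std::uint32_t add_top_bit(std::uint32_t a, std::uint32_t b)
{ return a + top_bit(b); }
```

Because the shifted value is already known to be 0 or 1, it can be taken directly from the T bit instead of going through an add-with-carry against a zeroed register.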

Re: [PATCH 5/8] libstdc++: Directly implement ranges::stable_partition [PR100795]

2025-06-27 Thread Patrick Palka

On Fri, 27 Jun 2025, Jonathan Wakely wrote:

> On 26/06/25 22:25 -0400, Patrick Palka wrote:
> > PR libstdc++/100795
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/bits/ranges_algo.h (__detail::__find_if_not_n): New,
> > based on the stl_algo.h implementation.
> > (__detail::__stable_partition_adaptive): Likewise.
> > (__stable_partition_fn::operator()): Reimplement in terms of
> > the above.
> > * testsuite/25_algorithms/stable_partition/constrained.cc
> > (test03): New test.
> > ---
> > libstdc++-v3/include/bits/ranges_algo.h   | 106 +-
> > .../stable_partition/constrained.cc   |  26 +
> > 2 files changed, 127 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > b/libstdc++-v3/include/bits/ranges_algo.h
> > index 7dfd4e7ed64c..a9924cd9c49e 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -3133,6 +3133,81 @@ namespace ranges
> >   inline constexpr __partition_fn partition{};
> > 
> > #if _GLIBCXX_HOSTED
> > +  namespace __detail
> > +  {
> > +/// Like find_if_not(), but uses and updates a count of the
> > +/// remaining range length instead of comparing against an end
> > +/// iterator.
> > +template
> > +  constexpr _Iter
> > +  __find_if_not_n(_Iter __first, _Distance& __len, _Pred __pred)
> > +  {
> > +   for (; __len; --__len,  (void) ++__first)
> > + if (!__pred(*__first))
> > +   break;
> > +   return __first;
> > +  }
> > +
> > +template > +typename _Pred, typename _Distance>
> > +  _GLIBCXX26_CONSTEXPR
> > +  subrange<_Iter>
> > +  __stable_partition_adaptive(_Iter __first, _Sent __last,
> > + _Pred __pred, _Distance __len,
> > + _Pointer __buffer,
> > + _Distance __buffer_size)
> > +  {
> > +   if (__len == 1)
> > + return {__first, ranges::next(__first, 1)};
> > +
> > +   if (__len <= __buffer_size)
> > + {
> > +   _Iter __result1 = __first;
> > +   _Pointer __result2 = __buffer;
> > +
> > +   // The precondition guarantees that !__pred(__first), so
> > +   // move that element to the buffer before starting the loop.
> > +   // This ensures that we only call __pred once per element.
> > +   *__result2 = ranges::iter_move(__first);
> > +   ++__result2;
> > +   ++__first;
> > +   for (; __first != __last; ++__first)
> > + if (__pred(*__first))
> > +   {
> > + *__result1 = ranges::iter_move(__first);
> > + ++__result1;
> > +   }
> > + else
> > +   {
> > + *__result2 = ranges::iter_move(__first);
> > + ++__result2;
> > +   }
> > +
> > +   ranges::move(__buffer, __result2, __result1);
> > +   return {__result1, __first};
> > + }
> > +
> > +   _Iter __middle = __first;
> > +   ranges::advance(__middle, __len / 2);
> > +   _Iter __left_split
> > + = __detail::__stable_partition_adaptive(__first, __middle, __pred,
> > + __len / 2, __buffer,
> > + __buffer_size).begin();
> > +
> > +   // Advance past true-predicate values to satisfy this
> > +   // function's preconditions.
> > +   _Distance __right_len = __len - __len / 2;
> > +   _Iter __right_split = __detail::__find_if_not_n(__middle, __right_len,
> > __pred);
> > +
> > +   if (__right_len)
> > + __right_split
> > +   = __detail::__stable_partition_adaptive(__right_split, __last,
> > __pred,
> > +   __right_len, __buffer,
> > __buffer_size).begin();
> > +
> > +   return ranges::rotate(__left_split, __middle, __right_split);
> > +  }
> > +  } // namespace __detail
> > +
> >   struct __stable_partition_fn
> >   {
> > template _Sent,
> > @@ -3144,11 +3219,32 @@ namespace ranges
> >   operator()(_Iter __first, _Sent __last,
> >  _Pred __pred, _Proj __proj = {}) const
> >   {
> > -   auto __lasti = ranges::next(__first, __last);
> > -   auto __middle
> > - = std::stable_partition(std::move(__first), __lasti,
> > - __detail::__make_pred_proj(__pred, __proj));
> > -   return {std::move(__middle), std::move(__lasti)};
> > +   auto __pred_proj = __detail::__make_pred_proj(__pred, __proj);
> > +   __first = ranges::find_if_not(__first, __last, __pred_proj);
> 
> Does this end up going through another layer of
> invoke(pred, invoke(proj, *i)) inside ranges::find_if_not?
> Hopefully with the recent _Pred_proj changes that will get inlined,
> but is there any reason to not just use:
> 
>   __first = ranges::find_if_not(__first, __last, __pred, __proj);
> 
> here, and then use __pred_proj for the __stable_partition_adaptive
> calls below?
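
To make the question concrete, here is a minimal sketch of the two call shapes being compared. It assumes simplified iterator-pair algorithms rather than the real ranges machinery: `Item`, `compose` (a stand-in for what `__detail::__make_pred_proj` is assumed to do) and `find_if_not_proj` are hypothetical names used only for illustration, not libstdc++ internals. Both shapes find the same element; the difference is only whether the projection is folded into the predicate before the call or applied by the algorithm itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Item { int key; };

// Hypothetical helper: fold a projection into a predicate up front,
// so later calls invoke a single callable per element.
template<typename Pred, typename Proj>
auto compose(Pred pred, Proj proj)
{
  return [pred, proj](const auto& x) -> bool {
    return std::invoke(pred, std::invoke(proj, x));
  };
}

// Simplified iterator-pair model of an algorithm that takes the
// predicate and the projection as separate arguments.
template<typename It, typename Pred, typename Proj>
It find_if_not_proj(It first, It last, Pred pred, Proj proj)
{
  for (; first != last; ++first)
    if (!std::invoke(pred, std::invoke(proj, *first)))
      break;
  return first;
}

// Index of the first item with an odd key, using the composed predicate.
std::size_t first_odd_composed(const std::vector<Item>& v)
{
  auto is_even = [](int k) { return k % 2 == 0; };
  auto it = std::find_if_not(v.begin(), v.end(),
                             compose(is_even, &Item::key));
  return static_cast<std::size_t>(it - v.begin());
}

// Same search, passing the predicate and projection separately.
std::size_t first_odd_separate(const std::vector<Item>& v)
{
  auto is_even = [](int k) { return k % 2 == 0; };
  auto it = find_if_not_proj(v.begin(), v.end(), is_even, &Item::key);
  return static_cast<std::size_t>(it - v.begin());
}
```

In the real ranges algorithm, the composed form would additionally pass through the default identity projection, which is the extra layer the question above is about.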

Good catch, that works nicely here (and I think t

[patch,wwwdocs,applied] AVR uses LRA per default

2025-06-27 Thread Georg-Johann Lay


Adjusted backends.html for avr.
Applied as obvious.

Johann

--

commit 3ba9c3647d9dee45c5afe90ccc21c3b5753ca8aa

backends.html (avr): Uses LRA per default.

https://gcc.gnu.org/r16-1733 enabled LRA per default on AVR.

diff --git a/htdocs/backends.html b/htdocs/backends.html
index 2a63f91b..52f897b3 100644
--- a/htdocs/backends.html
+++ b/htdocs/backends.html
@@ -72,7 +72,7 @@ aarch64| Qq  b  gia  s
 alpha  |  ?  Q   Cqmgi  e
 arc|  B  b  gia
 arm| b   ia  s
-avr|L  FIl  p   g
+avr|L  FIl  p   g a
 bfin   |   Fgi
 c6x|   S CB gi
 cr16   |L  F C  gs

Re: [PATCH 7/8] libstdc++: Directly implement ranges::sample [PR100795]

2025-06-27 Thread Jonathan Wakely


On 26/06/25 22:25 -0400, Patrick Palka wrote:

PR libstdc++/100795


OK for trunk (the FIXME can stay for now and be dealt with later).



libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__sample_fn::operator()):
Reimplement the forward_iterator branch directly.
* testsuite/25_algorithms/sample/constrained.cc (test02):
New test.
---
libstdc++-v3/include/bits/ranges_algo.h   | 70 +--
.../25_algorithms/sample/constrained.cc   | 28 
2 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index b12da2af1263..672a0ebce0de 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -1839,14 +1839,70 @@ namespace ranges
  operator()(_Iter __first, _Sent __last, _Out __out,
 iter_difference_t<_Iter> __n, _Gen&& __g) const
  {
+   // FIXME: Correctly handle integer-class difference types.
if constexpr (forward_iterator<_Iter>)
  {
-   // FIXME: Forwarding to std::sample here requires computing __lasti
-   // which may take linear time.
-   auto __lasti = ranges::next(__first, __last);
-   return _GLIBCXX_STD_A::
- sample(std::move(__first), std::move(__lasti), std::move(__out),
-__n, std::forward<_Gen>(__g));
+   using _Size = iter_difference_t<_Iter>;
+   using __distrib_type = uniform_int_distribution<_Size>;
+   using __param_type = typename __distrib_type::param_type;
+   using _USize = __detail::__make_unsigned_like_t<_Size>;
+   using __uc_type
+ = common_type_t::result_type, 
_USize>;
+
+   if (__first == __last)
+ return __out;
+
+   __distrib_type __d{};
+   _Size __unsampled_sz = ranges::distance(__first, __last);
+   __n = std::min(__n, __unsampled_sz);
+
+   // If possible, we use __gen_two_uniform_ints to efficiently produce
+   // two random numbers using a single distribution invocation:
+
+   const __uc_type __urngrange = __g.max() - __g.min();
+   if (__urngrange / __uc_type(__unsampled_sz) >= 
__uc_type(__unsampled_sz))
+ // I.e. (__urngrange >= __unsampled_sz * __unsampled_sz) but 
without
+ // wrapping issues.
+ {
+   while (__n != 0 && __unsampled_sz >= 2)
+ {
+   const pair<_Size, _Size> __p =
+ __gen_two_uniform_ints(__unsampled_sz, __unsampled_sz - 
1, __g);
+
+   --__unsampled_sz;
+   if (__p.first < __n)
+ {
+   *__out = *__first;
+   ++__out;
+   --__n;
+ }
+
+   ++__first;
+
+   if (__n == 0) break;
+
+   --__unsampled_sz;
+   if (__p.second < __n)
+ {
+   *__out = *__first;
+   ++__out;
+   --__n;
+ }
+
+   ++__first;
+ }
+ }
+
+   // The loop above is otherwise equivalent to this one-at-a-time 
version:
+
+   for (; __n != 0; ++__first)
+ if (__d(__g, __param_type{0, --__unsampled_sz}) < __n)
+   {
+ *__out = *__first;
+ ++__out;
+ --__n;
+   }
+   return __out;
  }
else
  {
@@ -1867,7 +1923,7 @@ namespace ranges
if (__k < __n)
  __out[__k] = *__first;
  }
-   return __out + __sample_sz;
+   return __out + iter_difference_t<_Out>(__sample_sz);
  }
  }
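
For readers following the one-at-a-time loop in the hunk above, here is a minimal standalone sketch of the same selection-sampling idea. It assumes plain iterator pairs and `std::ptrdiff_t` sizes, and `selection_sample` and `sample_ints` are hypothetical names for illustration only, not the libstdc++ implementation. Each element is taken with probability (samples still needed) / (elements not yet visited), which preserves the input order and yields exactly min(n, size) elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Selection sampling over a forward range: visit each element once and
// keep it when a draw from [0, unsampled) falls below the number of
// samples still needed.
template<typename It, typename Out, typename Gen>
Out selection_sample(It first, It last, Out out, std::ptrdiff_t n, Gen& g)
{
  using distrib_t = std::uniform_int_distribution<std::ptrdiff_t>;
  using param_t = distrib_t::param_type;

  std::ptrdiff_t unsampled = std::distance(first, last);
  n = std::min(n, unsampled);

  distrib_t d;
  for (; n != 0; ++first)
    if (d(g, param_t(0, --unsampled)) < n)
      {
        *out = *first;
        ++out;
        --n;
      }
  return out;
}

// Convenience wrapper for experimenting with a fixed seed.
std::vector<int> sample_ints(const std::vector<int>& population,
                             std::ptrdiff_t n, unsigned seed)
{
  std::mt19937 g(seed);
  std::vector<int> out;
  selection_sample(population.begin(), population.end(),
                   std::back_inserter(out), n, g);
  return out;
}
```

Because n is clamped to the population size first, the loop always stops before running off the end: once the samples still needed equal the elements left, every draw selects.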

diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc 
b/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
index b9945b164903..150e2d2036e0 100644
--- a/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
@@ -20,6 +20,7 @@

#include 
#include 
+#include 
#include 
#include 

@@ -59,9 +60,36 @@ test01()
}
}

+void
+test02()
+{
+  // PR libstdc++/100795 - ranges::sample should not use std::sample
+#if 0 // FIXME: ranges::sample rejects integer-class difference types.
+#if __SIZEOF_INT128__
+  auto v = std::views::iota(__int128(0), __int128(20));
+#else
+  auto v = std::views::iota(0ll, 20ll);
+#endif
+#else
+  auto v = std::views::iota(0, 20);
+#endif
+
+  int storage[20] = {2,5,4,3,1,6,7,9,10,8,11,14,12,13,15,16,18,0,19,17};
+  auto w = v | std::views::transform([&](auto i) -> int& { return storage[i]; 
});
+  using type = decltype(w);
+  using cat = 
std::iterator_traits>::iterator_category;
+  static_assert( std::same_as );
+  static_assert( std::ranges::random_ac

[PATCH v3 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li

From: Pan Li 

This patch combines vec_duplicate + vssubu.vv into vssubu.vx, as shown
in the example code below.  The related pattern depends on the cost of
the vec_duplicate from GR2VR.  Late-combine performs the combination
if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater
than zero.

Assume we have the example code below, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, FUNC)  \
  void\
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {   \
for (unsigned i = 0; i < n; i++)  \
  out[i] = FUNC (in[i], x);   \
  }

  T sat_sub(T a, T b)
  {
return (a - b) & (-(T)(a >= b));
  }

  DEF_VX_BINARY(uint32_t, sat_sub)
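
As a side note on the sat_sub form above, the following is a minimal standalone sketch (written as C++ templates, with the hypothetical reference function `sat_sub_ref` added purely for comparison) showing why the mask works: `-(T)(a >= b)` is all ones when a >= b and zero otherwise, so a wrapped-around difference is cleared to 0.

```cpp
#include <cassert>
#include <cstdint>

// Branchless form from the example: the mask is all ones when a >= b
// and zero otherwise, so an underflowing difference is cleared.
template<typename T>
T sat_sub(T a, T b)
{
  return (a - b) & (-(T)(a >= b));
}

// Reference form with an explicit branch, for comparison only.
template<typename T>
T sat_sub_ref(T a, T b)
{
  return a >= b ? T(a - b) : T(0);
}
```

Checking every uint8_t pair against the branching form is a cheap way to convince oneself the two agree.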

Before this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vssubu.vv v1,v1,v2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vssubu.vx v1,v1,a2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3
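
The two listings differ only in whether the scalar is first broadcast into a vector register (vmv.v.x) or consumed directly by the .vx form. A minimal scalar model of that equivalence, assuming plain std::vector storage and the hypothetical names `sub_via_broadcast` and `sub_via_scalar` (this is an illustration, not generated code or anything in the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unsigned saturating subtraction on one element.
std::uint32_t sat_sub_u32(std::uint32_t a, std::uint32_t b)
{
  return a >= b ? a - b : 0u;
}

// Models vmv.v.x followed by vssubu.vv: broadcast x into a vector,
// then subtract element-wise.
std::vector<std::uint32_t>
sub_via_broadcast(const std::vector<std::uint32_t>& in, std::uint32_t x)
{
  std::vector<std::uint32_t> dup(in.size(), x);
  std::vector<std::uint32_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = sat_sub_u32(in[i], dup[i]);
  return out;
}

// Models vssubu.vx: the scalar operand is used directly.
std::vector<std::uint32_t>
sub_via_scalar(const std::vector<std::uint32_t>& in, std::uint32_t x)
{
  std::vector<std::uint32_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = sat_sub_u32(in[i], x);
  return out;
}
```

Both functions return the same result for any input, which is what allows the combine to drop the broadcast when the GR2VR cost makes it worthwhile.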

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup):
* config/riscv/riscv.cc (riscv_rtx_costs):
* config/riscv/vector-iterators.md:

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc  | 1 +
 gcc/config/riscv/riscv.cc| 1 +
 gcc/config/riscv/vector-iterators.md | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 45dd9256d02..76fb1c36357 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5581,6 +5581,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx 
op_2,
 case SMIN:
 case UMIN:
 case US_PLUS:
+case US_MINUS:
   icode = code_for_pred_scalar (code, mode);
   break;
 default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bbc7547d385..f5d2b2e74ae 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3996,6 +3996,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
case MOD:
case UMOD:
case US_PLUS:
+   case US_MINUS:
  *total = get_vector_binary_rtx_cost (op, scalar2vr_cost);
  break;
default:
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 0e1318d1447..782544423c4 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,7 +4042,7 @@ (define_code_iterator any_int_binop [plus minus and ior 
xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_v_vdup [
-  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus
+  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus 
us_minus
 ])
 
 (define_code_iterator any_int_binop_no_shift_vdup_v [
-- 
2.43.0

Re: [PATCH 4/8] libstdc++: Directly implement ranges::stable_sort [PR100795]

2025-06-27 Thread Patrick Palka

On Fri, 27 Jun 2025, Jonathan Wakely wrote:

> On 26/06/25 22:25 -0400, Patrick Palka wrote:
> > PR libstdc++/100795
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/bits/ranges_algo.h (__detail::__move_merge): New,
> > based on the stl_algo.h implementation.
> > (__detail::__merge_sort_loop): Likewise.
> > (__detail::__chunk_insertion_sort): Likewise.
> > (__detail::__merge_sort_with_buffer): Likewise.
> > (__detail::__stable_sort_adaptive): Likewise.
> > (__detail::__stable_sort_adaptive_resize): Likewise.
> > (__detail::__inplace_stable_sort): Likewise.
> > (__stable_sort_fn::operator()): Reimplement in terms of the above.
> > * testsuite/25_algorithms/stable_sort/constrained.cc:
> > ---
> > libstdc++-v3/include/bits/ranges_algo.h   | 207 +-
> > .../25_algorithms/stable_sort/constrained.cc  |  30 +++
> > 2 files changed, 233 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > b/libstdc++-v3/include/bits/ranges_algo.h
> > index b0357600adbc..7dfd4e7ed64c 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -2388,6 +2388,170 @@ namespace ranges
> > 
> >   inline constexpr __sort_fn sort{};
> > 
> > +  namespace __detail
> > +  {
> > +/// This is a helper function for the __merge_sort_loop routines.
> > +template
> > +  _Out
> > +  __move_merge(_Iter __first1, _Iter __last1,
> > +  _Iter __first2, _Iter __last2,
> > +  _Out __result, _Comp __comp)
> > +  {
> > +   while (__first1 != __last1 && __first2 != __last2)
> > + {
> > +   if (__comp(*__first2, *__first1))
> > + {
> > +   *__result = ranges::iter_move(__first2);
> > +   ++__first2;
> > + }
> > +   else
> > + {
> > +   *__result = ranges::iter_move(__first1);
> > +   ++__first1;
> > + }
> > +   ++__result;
> > + }
> > +   return ranges::move(__first2, __last2,
> > +   ranges::move(__first1, __last1,
> > __result).out).out;
> > +  }
> > +
> > +template > _Comp>
> > +  void
> > +  __merge_sort_loop(_Iter __first, _Iter __last, _Out __result,
> > +   _Distance __step_size, _Comp __comp)
> > +  {
> > +   const _Distance __two_step = 2 * __step_size;
> > +
> > +   while (__last - __first >= __two_step)
> > + {
> > +   __result = __detail::__move_merge(__first, __first + __step_size,
> > + __first + __step_size,
> > + __first + __two_step,
> > + __result, __comp);
> > +   __first += __two_step;
> > + }
> > +   __step_size = ranges::min(_Distance(__last - __first), __step_size);
> > +
> > +   __detail::__move_merge(__first, __first + __step_size,
> > +  __first + __step_size, __last, __result,
> > __comp);
> > +  }
> > +
> > +template
> > +  constexpr void
> > +  __chunk_insertion_sort(_Iter __first, _Iter __last,
> > +_Distance __chunk_size, _Compare __comp)
> > +  {
> > +   while (__last - __first >= __chunk_size)
> > + {
> > +   __detail::__insertion_sort(__first, __first + __chunk_size,
> > __comp);
> > +   __first += __chunk_size;
> > + }
> > +   __detail::__insertion_sort(__first, __last, __comp);
> > +  }
> > +
> > +template
> > +  void
> > +  __merge_sort_with_buffer(_Iter __first, _Iter __last,
> > +  _Pointer __buffer, _Comp __comp)
> > +  {
> > +   using _Distance = iter_difference_t<_Iter>;
> > +
> > +   const _Distance __len = __last - __first;
> > +   const _Pointer __buffer_last = __buffer + ptrdiff_t(__len);
> > +
> > +   constexpr int __chunk_size = 7;
> > +   _Distance __step_size = __chunk_size;
> > +   __detail::__chunk_insertion_sort(__first, __last, __step_size,
> > __comp);
> > +
> > +   while (__step_size < __len)
> > + {
> > +   __detail::__merge_sort_loop(__first, __last, __buffer,
> > +   __step_size, __comp);
> > +   __step_size *= 2;
> > +   __detail::__merge_sort_loop(__buffer, __buffer_last, __first,
> > +   ptrdiff_t(__step_size), __comp);
> > +   __step_size *= 2;
> > + }
> > +  }
> > +
> > +template
> > +  void
> > +  __merge_adaptive(_Iter __first, _Iter __middle, _Iter __last,
> > +  iter_difference_t<_Iter> __len1,
> > +  iter_difference_t<_Iter> __len2,
> > +  _Pointer __buffer, _Comp __comp); // defined near
> > inplace_merge
> > +
> > +template > typename _Comp>
> > +  void
> > +  __merge_adaptive_resize(_Iter __first, _Iter __middle, _Iter __last,
> > + _Distance __len1, _Distance __len2,
> > + _Point

Re: [PATCH 4/8] libstdc++: Directly implement ranges::stable_sort [PR100795]

2025-06-27 Thread Jonathan Wakely

On Fri, 27 Jun 2025 at 15:15, Patrick Palka  wrote:
>
> On Fri, 27 Jun 2025, Jonathan Wakely wrote:
>
> > On 26/06/25 22:25 -0400, Patrick Palka wrote:
> > > PR libstdc++/100795
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/ranges_algo.h (__detail::__move_merge): New,
> > > based on the stl_algo.h implementation.
> > > (__detail::__merge_sort_loop): Likewise.
> > > (__detail::__chunk_insertion_sort): Likewise.
> > > (__detail::__merge_sort_with_buffer): Likewise.
> > > (__detail::__stable_sort_adaptive): Likewise.
> > > (__detail::__stable_sort_adaptive_resize): Likewise.
> > > (__detail::__inplace_stable_sort): Likewise.
> > > (__stable_sort_fn::operator()): Reimplement in terms of the above.
> > > * testsuite/25_algorithms/stable_sort/constrained.cc:
> > > ---
> > > libstdc++-v3/include/bits/ranges_algo.h   | 207 +-
> > > .../25_algorithms/stable_sort/constrained.cc  |  30 +++
> > > 2 files changed, 233 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > index b0357600adbc..7dfd4e7ed64c 100644
> > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > @@ -2388,6 +2388,170 @@ namespace ranges
> > >
> > >   inline constexpr __sort_fn sort{};
> > >
> > > +  namespace __detail
> > > +  {
> > > +/// This is a helper function for the __merge_sort_loop routines.
> > > +template
> > > +  _Out
> > > +  __move_merge(_Iter __first1, _Iter __last1,
> > > +  _Iter __first2, _Iter __last2,
> > > +  _Out __result, _Comp __comp)
> > > +  {
> > > +   while (__first1 != __last1 && __first2 != __last2)
> > > + {
> > > +   if (__comp(*__first2, *__first1))
> > > + {
> > > +   *__result = ranges::iter_move(__first2);
> > > +   ++__first2;
> > > + }
> > > +   else
> > > + {
> > > +   *__result = ranges::iter_move(__first1);
> > > +   ++__first1;
> > > + }
> > > +   ++__result;
> > > + }
> > > +   return ranges::move(__first2, __last2,
> > > +   ranges::move(__first1, __last1,
> > > __result).out).out;
> > > +  }
> > > +
> > > +template > > _Comp>
> > > +  void
> > > +  __merge_sort_loop(_Iter __first, _Iter __last, _Out __result,
> > > +   _Distance __step_size, _Comp __comp)
> > > +  {
> > > +   const _Distance __two_step = 2 * __step_size;
> > > +
> > > +   while (__last - __first >= __two_step)
> > > + {
> > > +   __result = __detail::__move_merge(__first, __first + __step_size,
> > > + __first + __step_size,
> > > + __first + __two_step,
> > > + __result, __comp);
> > > +   __first += __two_step;
> > > + }
> > > +   __step_size = ranges::min(_Distance(__last - __first), __step_size);
> > > +
> > > +   __detail::__move_merge(__first, __first + __step_size,
> > > +  __first + __step_size, __last, __result,
> > > __comp);
> > > +  }
> > > +
> > > +template
> > > +  constexpr void
> > > +  __chunk_insertion_sort(_Iter __first, _Iter __last,
> > > +_Distance __chunk_size, _Compare __comp)
> > > +  {
> > > +   while (__last - __first >= __chunk_size)
> > > + {
> > > +   __detail::__insertion_sort(__first, __first + __chunk_size,
> > > __comp);
> > > +   __first += __chunk_size;
> > > + }
> > > +   __detail::__insertion_sort(__first, __last, __comp);
> > > +  }
> > > +
> > > +template
> > > +  void
> > > +  __merge_sort_with_buffer(_Iter __first, _Iter __last,
> > > +  _Pointer __buffer, _Comp __comp)
> > > +  {
> > > +   using _Distance = iter_difference_t<_Iter>;
> > > +
> > > +   const _Distance __len = __last - __first;
> > > +   const _Pointer __buffer_last = __buffer + ptrdiff_t(__len);
> > > +
> > > +   constexpr int __chunk_size = 7;
> > > +   _Distance __step_size = __chunk_size;
> > > +   __detail::__chunk_insertion_sort(__first, __last, __step_size,
> > > __comp);
> > > +
> > > +   while (__step_size < __len)
> > > + {
> > > +   __detail::__merge_sort_loop(__first, __last, __buffer,
> > > +   __step_size, __comp);
> > > +   __step_size *= 2;
> > > +   __detail::__merge_sort_loop(__buffer, __buffer_last, __first,
> > > +   ptrdiff_t(__step_size), __comp);
> > > +   __step_size *= 2;
> > > + }
> > > +  }
> > > +
> > > +template
> > > +  void
> > > +  __merge_adaptive(_Iter __first, _Iter __middle, _Iter __last,
> > > +  iter_difference_t<_Iter> __len1,
> > > +  iter_difference_t<_Iter> __len2,
> > > +  _Poi

[PATCH v3 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread pan2 . li

From: Pan Li 

Add asm dump checks and run tests for combining vec_duplicate +
vssubu.vv into vssubu.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  18 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u16.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u32.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u64.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u8.c |  17 ++
 18 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
index 21a207edce7..b064748fc14 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
index d1063adb0d6..e334bb3690b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
index 3d96503fd9a..3e8ca0570cd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
index 339a35c3f42..1f995cd8dc1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /

Re: [PATCH 7/8] libstdc++: Directly implement ranges::sample [PR100795]

2025-06-27 Thread Jonathan Wakely


On 27/06/25 14:53 +0100, Jonathan Wakely wrote:

On 26/06/25 23:12 -0400, Patrick Palka wrote:

On Thu, 26 Jun 2025, Patrick Palka wrote:


PR libstdc++/100795

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__sample_fn::operator()):
Reimplement the forward_iterator branch directly.
* testsuite/25_algorithms/sample/constrained.cc (test02):
New test.
---
libstdc++-v3/include/bits/ranges_algo.h   | 70 +--
.../25_algorithms/sample/constrained.cc   | 28 
2 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index b12da2af1263..672a0ebce0de 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -1839,14 +1839,70 @@ namespace ranges
  operator()(_Iter __first, _Sent __last, _Out __out,
 iter_difference_t<_Iter> __n, _Gen&& __g) const
  {
+   // FIXME: Correctly handle integer-class difference types.


On second thought maybe we don't need to teach uniform_int_distribution
to handle integer-class difference types.  We could just assert that
__n fits inside a long long and use that as the difference type?  Same
for shuffle.


Yeah, if we're being asked to take more than 1<<64 samples something
probably went very wrong somewhere.

But isn't it valid to pass in an enormous value of n, as long as
last - first is not ridiculous?

for example:

auto population = views::iota((__int128)0, (__int128)10);
using D = ranges::range_difference_t<decltype(population)>;
ranges::sample(population, out, numeric_limits<D>::max(), gen);

This n won't fit in long long, but min(last - first, n) will.
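
A minimal sketch of that ordering, assuming `long long` as a stand-in for the wide difference type and `int` for the narrow one so that it builds without __int128; `clamped_sample_count` is a hypothetical helper, not part of the patch. The point is only that the min is taken in the wide type before narrowing, so an enormous n is harmless as long as the population size itself fits.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper: clamp the requested sample count to the
// population size in the wide type first, then narrow the result.
template<typename Narrow, typename Wide>
Narrow clamped_sample_count(Wide n, Wide population_size)
{
  const Wide m = std::min(n, population_size);
  // Only the clamped value has to fit in the narrow type.
  assert(m <= Wide(std::numeric_limits<Narrow>::max()));
  return static_cast<Narrow>(m);
}
```

Asserting on n itself before the clamp would reject the call shown above even though the work to be done is tiny.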


Does std::uniform_int_distribution currently support __int128? I think
it does, just using the slower "two divisions" path, because we don't
have a larger type to use for Lemire's algorithm.
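
As background on why a larger type matters: Lemire's method multiplies a random word by the range and keeps the high half of the product, so it needs an integer twice as wide as the result. A minimal 32-bit sketch, assuming a generator that yields uniform 32-bit values (std::mt19937 here); `bounded_u32` is a hypothetical name and this is not the libstdc++ code.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns a uniform value in [0, range), for range > 0, using a
// 64-bit product of a 32-bit random word and the range.
std::uint32_t bounded_u32(std::mt19937& g, std::uint32_t range)
{
  std::uint64_t product = std::uint64_t(std::uint32_t(g())) * range;
  std::uint32_t low = std::uint32_t(product);
  if (low < range)
    {
      // 2^32 mod range, computed in 32 bits; draws whose low half
      // falls below this threshold would bias the result.
      const std::uint32_t threshold = (0u - range) % range;
      while (low < threshold)
        {
          product = std::uint64_t(std::uint32_t(g())) * range;
          low = std::uint32_t(product);
        }
    }
  // The high half of the product is the bounded value.
  return std::uint32_t(product >> 32);
}
```

For a 128-bit result the same trick would need a 256-bit product, which is why a division-based fallback is the natural choice there.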


if constexpr (forward_iterator<_Iter>)
  {
-   // FIXME: Forwarding to std::sample here requires computing __lasti
-   // which may take linear time.
-   auto __lasti = ranges::next(__first, __last);
-   return _GLIBCXX_STD_A::
- sample(std::move(__first), std::move(__lasti), std::move(__out),
-__n, std::forward<_Gen>(__g));
+   using _Size = iter_difference_t<_Iter>;
+   using __distrib_type = uniform_int_distribution<_Size>;
+   using __param_type = typename __distrib_type::param_type;
+   using _USize = __detail::__make_unsigned_like_t<_Size>;
+   using __uc_type
+ = common_type_t::result_type, 
_USize>;
+
+   if (__first == __last)
+ return __out;
+
+   __distrib_type __d{};
+   _Size __unsampled_sz = ranges::distance(__first, __last);
+   __n = std::min(__n, __unsampled_sz);
+
+   // If possible, we use __gen_two_uniform_ints to efficiently produce
+   // two random numbers using a single distribution invocation:
+
+   const __uc_type __urngrange = __g.max() - __g.min();
+   if (__urngrange / __uc_type(__unsampled_sz) >= 
__uc_type(__unsampled_sz))
+ // I.e. (__urngrange >= __unsampled_sz * __unsampled_sz) but 
without
+ // wrapping issues.
+ {
+   while (__n != 0 && __unsampled_sz >= 2)
+ {
+   const pair<_Size, _Size> __p =
+ __gen_two_uniform_ints(__unsampled_sz, __unsampled_sz - 
1, __g);
+
+   --__unsampled_sz;
+   if (__p.first < __n)
+ {
+   *__out = *__first;
+   ++__out;
+   --__n;
+ }
+
+   ++__first;
+
+   if (__n == 0) break;
+
+   --__unsampled_sz;
+   if (__p.second < __n)
+ {
+   *__out = *__first;
+   ++__out;
+   --__n;
+ }
+
+   ++__first;
+ }
+ }
+
+   // The loop above is otherwise equivalent to this one-at-a-time 
version:
+
+   for (; __n != 0; ++__first)
+ if (__d(__g, __param_type{0, --__unsampled_sz}) < __n)
+   {
+ *__out = *__first;
+ ++__out;
+ --__n;
+   }
+   return __out;
  }
else
  {
@@ -1867,7 +1923,7 @@ namespace ranges
if (__k < __n)
  __out[__k] = *__first;
  }
-   return __out + __sample_sz;
+   return __out + iter_difference_t<_Out>(__sample_sz);
  }
  }

diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc 
b/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
index b9945b164903..150e2d2036e0 100644
--- a/libstdc++-v3/testsuite/2

RE: [PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread Li, Pan2

It is better to leave a record in the CI system, so I will send a v3 to trigger
another CI run, and will commit it if the CI is OK.

Pan

-Original Message-
From: Robin Dapp  
Sent: Friday, June 27, 2025 9:53 PM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken ; Liu, Hongtao ; Robin Dapp 

Subject: Re: [PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to 
vssubu.vx on GR2VR cost

> Is there any way we can retrigger the test somewhere? If not, I can send a v3
> series with the commits reordered and see.

I don't think there's a way other than re-submitting.   But if you're sure you 
tested properly and the CI is mistaken we can go ahead.  I just wanted to make 
sure as with the sub/add confusion in v1 your test seemed to show no errors.

-- 
Regards
 Robin

[PATCH v3 2/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-27 Thread pan2 . li

From: Pan Li 

The cost model change sets the default cost of vx to 2, so reconcile
the asm checks with this change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c:
Update the asm check due to cost model change.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c:
Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c:
Ditto.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c   | 2 +-
 .../riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
index 2261872e3de..b32907afcbb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint16_t, uint32_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
index 4250567686a..344080cb93a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
index 656aad70165..492c3168216 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
-- 
2.43.0

Re: [PATCH 7/8] libstdc++: Directly implement ranges::sample [PR100795]

2025-06-27 Thread Patrick Palka

On Fri, 27 Jun 2025, Jonathan Wakely wrote:

> On 27/06/25 14:53 +0100, Jonathan Wakely wrote:
> > On 26/06/25 23:12 -0400, Patrick Palka wrote:
> > > On Thu, 26 Jun 2025, Patrick Palka wrote:
> > > 
> > > > PR libstdc++/100795
> > > > 
> > > > libstdc++-v3/ChangeLog:
> > > > 
> > > > * include/bits/ranges_algo.h (__sample_fn::operator()):
> > > > Reimplement the forward_iterator branch directly.
> > > > * testsuite/25_algorithms/sample/constrained.cc (test02):
> > > > New test.
> > > > ---
> > > > libstdc++-v3/include/bits/ranges_algo.h   | 70 +--
> > > > .../25_algorithms/sample/constrained.cc   | 28 
> > > > 2 files changed, 91 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > > index b12da2af1263..672a0ebce0de 100644
> > > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > > @@ -1839,14 +1839,70 @@ namespace ranges
> > > >   operator()(_Iter __first, _Sent __last, _Out __out,
> > > >  iter_difference_t<_Iter> __n, _Gen&& __g) const
> > > >   {
> > > > +   // FIXME: Correctly handle integer-class difference types.
> > > 
> > > On second thought maybe we don't need to teach uniform_int_distribution
> > > to handle integer-class difference types.  We could just assert that
> > > __n fits inside a long long and use that as the difference type?  Same
> > > for shuffle.
> > 
> > Yeah, if we're being asked to take more than 1<<64 samples something
> > probably went very wrong somewhere.
> > 
> > But isn't it valid to pass in an enormous value of n, as long as
> > last - first is not ridiculous?
> > 
> > for example:
> > 
> > auto population = views::iota((__int128)0, (__int128)10);
> > using D = ranges::difference_t;
> > ranges::sample(population, out, numeric_limits::max(), gen);
> > 
> > This n won't fit in long long, but min(last - first, n) will.

Good point, noted.
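
As an illustration of the point above, here is a minimal sketch (not the libstdc++ implementation) of clamping the requested sample count before narrowing it. `Wide` and `Narrow` are hypothetical stand-ins for an integer-class difference type and `long long`, and both helper functions are assumptions made only for this example.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins: Wide plays the role of an integer-class
// difference type, Narrow the role of long long.
using Wide = std::uint64_t;
using Narrow = std::int32_t;

// True if v can be represented in Narrow without loss.
constexpr bool
fits_in_narrow(Wide v)
{ return v <= static_cast<Wide>(std::numeric_limits<Narrow>::max()); }

// Compute min(population_size, n) in the wide type first, then narrow.
// Assumes population_size itself fits in Narrow.
constexpr Narrow
clamp_sample_count(Wide population_size, Wide n)
{ return static_cast<Narrow>(n < population_size ? n : population_size); }
```

With a population of 10 and n set to the largest `Wide` value, n alone does not fit in `Narrow`, but the clamped count does, so asserting on n before clamping would reject a valid call.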

> 
> Does std::uniform_int_distribution currently support __int128? I think
> it does, just using the slower "two divisions" path, because we don't
> have a larger type to use for Lemire's algorithm.

Ah, looks like uniform_int_distribution does support __int128 already, but
only in non-strict mode.  In strict mode we trip over the is_integral
static_assert (since __int128 isn't an integral type in strict mode), so
that assert needs to be relaxed.

> 
> > > > if constexpr (forward_iterator<_Iter>)
> > > >   {
> > > > -   // FIXME: Forwarding to std::sample here requires computing
> > > > __lasti
> > > > -   // which may take linear time.
> > > > -   auto __lasti = ranges::next(__first, __last);
> > > > -   return _GLIBCXX_STD_A::
> > > > - sample(std::move(__first), std::move(__lasti), 
> > > > std::move(__out),
> > > > -__n, std::forward<_Gen>(__g));
> > > > +   using _Size = iter_difference_t<_Iter>;
> > > > +   using __distrib_type = uniform_int_distribution<_Size>;
> > > > +   using __param_type = typename __distrib_type::param_type;
> > > > +   using _USize = __detail::__make_unsigned_like_t<_Size>;
> > > > +   using __uc_type
> > > > + = common_type_t<typename remove_reference_t<_Gen>::result_type,
> > > > _USize>;
> > > > +
> > > > +   if (__first == __last)
> > > > + return __out;
> > > > +
> > > > +   __distrib_type __d{};
> > > > +   _Size __unsampled_sz = ranges::distance(__first, __last);
> > > > +   __n = std::min(__n, __unsampled_sz);
> > > > +
> > > > +   // If possible, we use __gen_two_uniform_ints to efficiently
> > > > produce
> > > > +   // two random numbers using a single distribution 
> > > > invocation:
> > > > +
> > > > +   const __uc_type __urngrange = __g.max() - __g.min();
> > > > +   if (__urngrange / __uc_type(__unsampled_sz) >=
> > > > __uc_type(__unsampled_sz))
> > > > + // I.e. (__urngrange >= __unsampled_sz * __unsampled_sz) 
> > > > but
> > > > without
> > > > + // wrapping issues.
> > > > + {
> > > > +   while (__n != 0 && __unsampled_sz >= 2)
> > > > + {
> > > > +   const pair<_Size, _Size> __p =
> > > > + __gen_two_uniform_ints(__unsampled_sz, 
> > > > __unsampled_sz -
> > > > 1, __g);
> > > > +
> > > > +   --__unsampled_sz;
> > > > +   if (__p.first < __n)
> > > > + {
> > > > +   *__out = *__first;
> > > > +   ++__out;
> > > > +   --__n;
> > > > + }
> > > > +
> > > > +   ++__first;
> > > > +
> > > > +   if (__n == 0) break;
> > > > +
> > > > +   --__unsampled_sz;
> > > >

[PATCH v3 4/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-27 Thread pan2 . li

From: Pan Li 

Add asm dump check tests for the vec_duplicate + vssubu.vv combine to
vssubu.vx, with GR2VR cost 0, 1 and 2.
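
For readers unfamiliar with the pattern, here is a rough sketch of the kind of scalar loop involved. The branchless formula follows the DEF_SAT_U_SUB macro from earlier in the series; the function names `sat_u_sub` and `sat_u_sub_vx` are hypothetical, and the real tests use the DEF_VX_BINARY_CASE_* macros instead. Whether a compiler emits vssubu.vx or vssubu.vv for such a loop depends on the flags and the GR2VR cost in effect.

```cpp
#include <cstddef>
#include <cstdint>

// Branchless unsigned saturating subtraction: the mask is all ones when
// a >= b and zero otherwise, so an underflowing result collapses to 0.
template<typename T>
constexpr T
sat_u_sub(T a, T b)
{ return static_cast<T>((a - b) & (-static_cast<T>(a >= b))); }

// The second operand x is loop-invariant, so the vectorizer has to
// broadcast it (vec_duplicate) unless the vector-scalar form is used.
template<typename T>
void
sat_u_sub_vx(T* out, const T* in, T x, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = sat_u_sub(in[i], x);
}
```

The scalar x is the operand that a vx instruction takes directly from a general-purpose register, which is why the GR2VR cost decides which form wins.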

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vssubu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 2 ++
 12 files changed, 24 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
index de10d66a1b2..afb5a8513a9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, 
VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BODY_X8)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, 
VX_BINARY_FUNC_BODY_X8)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -30,3 +31,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BOD
 /* { dg-final { scan-assembler {vmaxu.vx} } } */
 /* { dg-final { scan-assembler {vminu.vx} } } */
 /* { dg-final { scan-assembler {vsaddu.vx} } } */
+/* { dg-final { scan-assembler {vssubu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
index 2e59da06c97..a907e9b7222 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, 
VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BODY_X4)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, 
VX_BINARY_FUNC_BODY_X4)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -29,3 +30,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BOD
 /* { dg-final { scan-assembler {vremu.vx} } } */
 /* { dg-final { scan-assembler {vmaxu.vx} } } */
 /* { dg-final { scan-assembler {vminu.vx} } } */
+/* { dg-final { scan-assembler {vssubu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
index 064ed1f2e89..efabf9930f0 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, 
VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BODY)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, 
VX_BINARY_FUNC_BODY)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -30,3 +31,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_W

Re: [PATCH v3 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Tomasz Kaminski

On Fri, Jun 27, 2025 at 9:52 AM Luc Grosheintz 
wrote:

> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (default_accessor): New class.
> * src/c++23/std.cc.in: Register default_accessor.
> * testsuite/23_containers/mdspan/accessors/default.cc: New test.
> * testsuite/23_containers/mdspan/accessors/default_neg.cc: New
> test.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan   | 31 ++
>  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
>  .../23_containers/mdspan/accessors/default.cc | 59 +++
>  .../mdspan/accessors/default_neg.cc   | 23 
>  4 files changed, 115 insertions(+), 1 deletion(-)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 6dc2441f80b..c72a64094b7 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>[[no_unique_address]] _S_strides_t _M_strides;
>  };
>
> +  template
> +struct default_accessor
> +{
> +  static_assert(!is_array_v<_ElementType>,
> +   "ElementType must not be an array type");
> +  static_assert(!is_abstract_v<_ElementType>,
> +   "ElementType must not be an abstract class type");
> +
> +  using offset_policy = default_accessor;
> +  using element_type = _ElementType;
> +  using reference = element_type&;
> +  using data_handle_type = element_type*;
> +
> +  constexpr
> +  default_accessor() noexcept = default;
> +
> +  template
> +   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
> +   constexpr
> +   default_accessor(default_accessor<_OElementType>) noexcept
> +   { }
>
I would add a test checking the constraint on this constructor. It
essentially means
that default_accessor cannot be converted to default_accessor
(pointer arithmetic gives different behavior), but default_accessor can
be converted
to default_accessor. Simple check on is_convertible should suffice.
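
A minimal sketch of what the pointer-to-array constraint accepts and rejects, independent of default_accessor itself. The helper `array_convertible` and the types `Base` and `Derived` are hypothetical names chosen for this illustration.

```cpp
#include <type_traits>

struct Base { int x; };
struct Derived : Base { int y; };

// Mirrors the shape of the constraint: From(*)[] must convert to To(*)[].
template<typename From, typename To>
constexpr bool
array_convertible()
{ return std::is_convertible<From(*)[], To(*)[]>::value; }

// Adding const is a qualification conversion and is allowed.
static_assert(array_convertible<int, const int>(), "");
// Dropping const is rejected.
static_assert(!array_convertible<const int, int>(), "");
// Derived-to-base is rejected for arrays, because indexing through a
// Base pointer would use the wrong element size.
static_assert(!array_convertible<Derived, Base>(), "");
// Plain object pointers still convert, which is why the constraint is
// phrased in terms of pointers to arrays.
static_assert(std::is_convertible<Derived*, Base*>::value, "");
```

Checking the array-pointer form rather than the plain pointer form is what rules out the derived-to-base case while keeping cv-qualification adjustments.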

> +
> +  constexpr reference
> +  access(data_handle_type __p, size_t __i) const noexcept
> +  { return __p[__i]; }
> +
> +  constexpr data_handle_type
> +  offset(data_handle_type __p, size_t __i) const noexcept
> +  { return __p + __i; }
> +};
> +
>  _GLIBCXX_END_NAMESPACE_VERSION
>  }
>  #endif
> diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/
> std.cc.in
> index 9336118f5d9..e692caaa5f9 100644
> --- a/libstdc++-v3/src/c++23/std.cc.in
> +++ b/libstdc++-v3/src/c++23/std.cc.in
> @@ -1850,7 +1850,8 @@ export namespace std
>using std::layout_left;
>using std::layout_right;
>using std::layout_stride;
> -  // FIXME layout_left_padded, layout_right_padded, default_accessor and
> mdspan
> +  using std::default_accessor;
> +  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and
> mdspan
>  }
>  #endif
>
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> new file mode 100644
> index 000..303833d4857
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> @@ -0,0 +1,59 @@
> +// { dg-do run { target c++23 } }
> +#include 
> +
> +#include 
> +
> +constexpr size_t dyn = std::dynamic_extent;
> +
> +template
> +  constexpr void
> +  test_accessor_policy()
> +  {
> +static_assert(std::copyable);
> +static_assert(std::is_nothrow_move_constructible_v);
> +static_assert(std::is_nothrow_move_assignable_v);
> +static_assert(std::is_nothrow_swappable_v);
> +  }
> +
> +constexpr bool
> +test_access()
> +{
> +  std::default_accessor accessor;
> +  std::array a{10, 11, 12, 13, 14};
> +  VERIFY(accessor.access(a.data(), 0) == 10);
> +  VERIFY(accessor.access(a.data(), 4) == 14);
> +  return true;
> +}
> +
> +constexpr bool
> +test_offset()
> +{
> +  std::default_accessor accessor;
> +  std::array a{10, 11, 12, 13, 14};
> +  VERIFY(accessor.offset(a.data(), 0) == a.data());
> +  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
> +  return true;
> +}
> +
> +constexpr void
> +test_ctor()
> +{
> +
> static_assert(std::is_nothrow_constructible_v,
> +
>  std::default_accessor>);
> +  static_assert(std::is_convertible_v,
> + std::default_accessor>);
> +  static_assert(!std::is_constructible_v,
> +std::default_accessor>);
> +}
> +
> +int
> +main()
> +{
> +  test_accessor_policy>();
> +  test_access();
> +  static_assert(test_access());
> +  test_offset();
> +  static_assert(test_offset());
> +  test_ctor();
> +  return 0;
> +}
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_n

[PATCH 7/7 v2] RISC-V: Add support for the XAndesvdot ISA extension.

2025-06-27 Thread KuanLin Chen

Hi,

This extension defines vector instructions that calculate the
signed/unsigned dot product of four SEW/4-bit data and accumulate the
result into a SEW-bit element, for all elements in a vector register.
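
As a rough scalar model of the per-element operation described above, assuming SEW = 32 (so each element packs four 8-bit lanes): the helper names below are hypothetical, and details such as overflow behaviour are assumptions rather than statements about the actual instructions.

```cpp
#include <cstdint>

// Extract 8-bit lane `lane` (0..3) of v as an unsigned value.
constexpr std::uint32_t
lane_u8(std::uint32_t v, int lane)
{ return (v >> (8 * lane)) & 0xffu; }

// Extract 8-bit lane `lane` of v and sign-extend it.
constexpr std::int32_t
lane_s8(std::uint32_t v, int lane)
{
  return static_cast<std::int32_t>(lane_u8(v, lane))
	 - ((lane_u8(v, lane) & 0x80u) != 0 ? 256 : 0);
}

// Signed x signed: acc plus the sum of the four lane products.
// Assumes the result does not overflow a 32-bit signed integer.
constexpr std::int32_t
dot4_s8(std::int32_t acc, std::uint32_t a, std::uint32_t b)
{
  return acc + lane_s8(a, 0) * lane_s8(b, 0) + lane_s8(a, 1) * lane_s8(b, 1)
	 + lane_s8(a, 2) * lane_s8(b, 2) + lane_s8(a, 3) * lane_s8(b, 3);
}

// Unsigned x unsigned variant, with modular 32-bit accumulation.
constexpr std::uint32_t
dot4_u8(std::uint32_t acc, std::uint32_t a, std::uint32_t b)
{
  return acc + lane_u8(a, 0) * lane_u8(b, 0) + lane_u8(a, 1) * lane_u8(b, 1)
	 + lane_u8(a, 2) * lane_u8(b, 2) + lane_u8(a, 3) * lane_u8(b, 3);
}
```

The same 32-bit inputs give different results in the signed and unsigned variants whenever a lane has its top bit set, which is the reason separate signed, unsigned and mixed forms exist.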

gcc/ChangeLog:

* config/riscv/andes-vector-builtins-bases.cc (nds_vd4dot): New
class.
(class nds_vd4dotsu): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vd4dots):
Ditto.
(nds_vd4dotsu): Ditto.
(nds_vd4dotu): Ditto.
* config/riscv/andes-vector.md
(@pred_nds_vd4dot): New pattern.
(@pred_nds_vd4dotsu): New pattern.
* config/riscv/genrvv-type-indexer.cc (main): Modify sew of QUAD_FIX,
QUAD_FIX_SIGNED and QUAD_FIX_UNSIGNED.
* config/riscv/riscv-vector-builtins.cc
(qexti__ops): New operand information.
(qexti_su__ops): New operand information.
(qextu__ops): New operand information.
* config/riscv/riscv-vector-builtins.h (XANDESVDOT_EXT): New def.
(required_ext_to_isa_name): Add case XANDESVDOT_EXT.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/vector-iterators.md (NDS_QUAD_FIX): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c:
New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c:
New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c:
New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c:
New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c:
New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c:
New test.


0007-RISC-V-Add-support-for-the-XAndesvdot-ISA-extension.patch
Description: Binary data

[PATCH v2 4/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-27 Thread pan2 . li

From: Pan Li 

The cost model change sets the default cost of vx to 2, so
reconcile the asm checks with this change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c:
Update the asm check due to cost model change.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c:
Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c:
Ditto.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c   | 2 +-
 .../riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
index 2261872e3de..b32907afcbb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint16_t, uint32_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
index 4250567686a..344080cb93a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
index 656aad70165..492c3168216 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
-- 
2.43.0

Re: [PATCH v4 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Luc Grosheintz





On 6/27/25 11:44, Jonathan Wakely wrote:

On Fri, 27 Jun 2025 at 10:39, Tomasz Kaminski  wrote:


Also, for a single patch (not a patch series), you do not need [PATCH 0/N]; 
a simple [PATCH] and then [PATCH v2] also works.


Yeah, sending a 0/N cover letter is only useful to describe what a
multi-part patch series does.  For a single patch, you should be
describing what it does in that patch itself, and a cover letter just
adds noise.



Makes sense & I've found the place I can add comments, so there's
no need for the cover-letter anymore.





On Fri, Jun 27, 2025 at 11:11 AM Tomasz Kaminski  wrote:




On Fri, Jun 27, 2025 at 11:06 AM Luc Grosheintz  
wrote:


libstdc++-v3/ChangeLog:

 * include/std/mdspan (default_accessor): New class.
 * src/c++23/std.cc.in: Register default_accessor.
 * testsuite/23_containers/mdspan/accessors/default.cc: New test.
 * testsuite/23_containers/mdspan/accessors/default_neg.cc: New test.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan   | 31 
  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
  .../23_containers/mdspan/accessors/default.cc | 72 +++
  .../mdspan/accessors/default_neg.cc   | 23 ++
  4 files changed, 128 insertions(+), 1 deletion(-)
  create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
  create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..c72a64094b7 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _S_strides_t _M_strides;
  };

+  template
+struct default_accessor
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+
+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr
+   default_accessor(default_accessor<_OElementType>) noexcept
+   { }
+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
  _GLIBCXX_END_NAMESPACE_VERSION
  }
  #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 9336118f5d9..e692caaa5f9 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1850,7 +1850,8 @@ export namespace std
using std::layout_left;
using std::layout_right;
using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan
+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and mdspan
  }
  #endif

diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
new file mode 100644
index 000..ecccda2b68e
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,72 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+class Base
+{ };
+
+class Derived : public Base
+{ };
+
+constexpr void
+test_ctor()
+{
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);


Hi, sorry for being unclear before, and resulting in another patch.
I would like to see a positive test case showing that const-adjustments are allowed, i.e.:
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
And similar for Derived. This is important, as it allows passing mdspan

Re: [PATCH v5] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Luc Grosheintz


Sorry, I'll continue working another day. This commit is
broken; please ignore.

On 6/27/25 13:15, Luc Grosheintz wrote:

libstdc++-v3/ChangeLog:

* include/std/mdspan (default_accessor): New class.
* src/c++23/std.cc.in: Register default_accessor.
* testsuite/23_containers/mdspan/accessors/default.cc: New test.
* testsuite/23_containers/mdspan/accessors/default_neg.cc: New test.

Signed-off-by: Luc Grosheintz 
---

Changes since v4:

   * Test types with different cv-qualifiers.

  libstdc++-v3/include/std/mdspan   | 31 ++
  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
  .../23_containers/mdspan/accessors/default.cc | 99 +++
  .../mdspan/accessors/default_neg.cc   | 23 +
  4 files changed, 155 insertions(+), 1 deletion(-)
  create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
  create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..c72a64094b7 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _S_strides_t _M_strides;
  };
  
+  template

+struct default_accessor
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+
+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr
+   default_accessor(default_accessor<_OElementType>) noexcept
+   { }
+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
  _GLIBCXX_END_NAMESPACE_VERSION
  }
  #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 9336118f5d9..e692caaa5f9 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1850,7 +1850,8 @@ export namespace std
using std::layout_left;
using std::layout_right;
using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan
+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and mdspan
  }
  #endif
  
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc

new file mode 100644
index 000..62ac791f0aa
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,99 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+class Base
+{ };
+
+class Derived : public Base
+{ };
+
+constexpr void
+test_ctor()
+{
+  // T -> T
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+
+  // T -> const T
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+
+  // const T -> T
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+
+  // T <-> volatile T
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor

[PATCH v5] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Luc Grosheintz

libstdc++-v3/ChangeLog:

* include/std/mdspan (default_accessor): New class.
* src/c++23/std.cc.in: Register default_accessor.
* testsuite/23_containers/mdspan/accessors/default.cc: New test.
* testsuite/23_containers/mdspan/accessors/default_neg.cc: New test.

Signed-off-by: Luc Grosheintz 
---

Changes since v4:

  * Test types with different cv-qualifiers.

 libstdc++-v3/include/std/mdspan   | 31 ++
 libstdc++-v3/src/c++23/std.cc.in  |  3 +-
 .../23_containers/mdspan/accessors/default.cc | 99 +++
 .../mdspan/accessors/default_neg.cc   | 23 +
 4 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..c72a64094b7 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[no_unique_address]] _S_strides_t _M_strides;
 };
 
+  template
+struct default_accessor
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+
+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr
+   default_accessor(default_accessor<_OElementType>) noexcept
+   { }
+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
 _GLIBCXX_END_NAMESPACE_VERSION
 }
 #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 9336118f5d9..e692caaa5f9 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1850,7 +1850,8 @@ export namespace std
   using std::layout_left;
   using std::layout_right;
   using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan
+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and mdspan
 }
 #endif
 
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
new file mode 100644
index 000..62ac791f0aa
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,99 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+class Base
+{ };
+
+class Derived : public Base
+{ };
+
+constexpr void
+test_ctor()
+{
+  // T -> T
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+
+  // T -> const T
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+
+  // const T -> T
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+
+  // T <-> volatile T
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+
+  // size difference
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+
+  //

[PATCH] tree-optimization/120780: Support object size for containing objects

2025-06-27 Thread Siddhesh Poyarekar

MEM_REF cast of a subobject to its containing object has negative
offsets, which objsz sees as an invalid access.  Support this use case
by peeking into the structure to validate that the containing object
indeed contains a type of the subobject at that offset and if present,
adjust the wholesize for the object to allow the negative offset.
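
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of code this is about, written in C++ purely for illustration. The helper names container_of_mesh and mesh_size are hypothetical and not part of the patch; the struct layout mirrors the test case below. What the builtin returns depends on the compiler version and optimization level, so no particular value is claimed here.

```cpp
#include <cassert>
#include <cstddef>

struct inner { int dummy[4]; };
struct container { int mcast_rate[6]; inner mesh; };

// Hypothetical helper: step back from a member to its enclosing object.
// The subtraction is the negative offset that shows up in the MEM_REF.
inline container *container_of_mesh(inner *p)
{
  assert(p != nullptr);
  return reinterpret_cast<container *>(
      reinterpret_cast<char *>(p) - offsetof(container, mesh));
}

// Hypothetical helper: ask objsz for the size of the mesh subobject.
// Falls back to (size_t)-1 ("unknown") when the builtin is unavailable.
inline std::size_t mesh_size(inner *p)
{
  container *c = container_of_mesh(p);
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
  return __builtin_dynamic_object_size(&c->mesh, 1);
# endif
#endif
  (void) c;
  return static_cast<std::size_t>(-1);
}
```

The interesting case for objsz is the one where mesh_size is inlined into a caller that allocated the container, because only then is the whole-object size visible at the point of the query.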

gcc/ChangeLog:

PR tree-optimization/120780
* tree-object-size.cc (inner_at_offset,
get_wholesize_for_memref): New functions.
(addr_object_size): Call GET_WHOLESIZE_FOR_MEMREF.

gcc/testsuite/ChangeLog:

PR tree-optimization/120780
* gcc.dg/builtin-dynamic-object-size-pr120780.c: New test case.

Signed-off-by: Siddhesh Poyarekar 
---
 .../builtin-dynamic-object-size-pr120780.c| 233 ++
 gcc/tree-object-size.cc   |  87 ++-
 2 files changed, 319 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
new file mode 100644
index 000..0d6593ec828
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
@@ -0,0 +1,233 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+typedef __SIZE_TYPE__ size_t;
+#define NUM_MCAST_RATE 6
+
+#define MIN(a,b) ((a) < (b) ? (a) : (b))
+#define MAX(a,b) ((a) > (b) ? (a) : (b))
+
+struct inner
+{
+  int dummy[4];
+};
+
+struct container
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  struct inner mesh;
+};
+
+static void
+test1_child (struct inner *ifmsh, size_t expected)
+{ 
+  struct container *sdata =
+(struct container *) ((void *) ifmsh
+ - __builtin_offsetof (struct container, mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (&sdata->mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test1 (size_t sz)
+{
+  struct container *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->mesh;
+
+  test1_child (ifmsh,
+  (sz > sizeof (sdata->mcast_rate)
+   ? sz - sizeof (sdata->mcast_rate) : 0));
+
+  __builtin_free (sdata);
+}
+
+struct container2
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  union
+{
+  int dummy;
+  double dbl;
+  struct inner mesh;
+} u;
+};
+
+static void
+test2_child (struct inner *ifmsh, size_t sz)
+{ 
+  struct container2 *sdata =
+(struct container2 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container2, u.mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  size_t diff = sizeof (*sdata) - sz;
+  size_t expected = MIN(sizeof (double), MAX (sizeof (sdata->u), diff) - diff);
+
+  if (__builtin_dynamic_object_size (&sdata->u.dbl, 1) != expected)
+FAIL ();
+
+  expected = MAX (sizeof (sdata->u.mesh), diff) - diff;
+  if (__builtin_dynamic_object_size (&sdata->u.mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test2 (size_t sz)
+{
+  struct container2 *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->u.mesh;
+
+  test2_child (ifmsh, sz);
+
+  __builtin_free (sdata);
+}
+
+struct container3
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  char mesh[8];
+};
+
+static void
+test3_child (char ifmsh[], size_t expected)
+{ 
+  struct container3 *sdata =
+(struct container3 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container3, mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (sdata->mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test3 (size_t sz)
+{
+  struct container3 *sdata = __builtin_malloc (sz);
+  char *ifmsh = sdata->mesh;
+  size_t diff = sizeof (*sdata) - sz;
+
+  test3_child (ifmsh, MAX(sizeof (sdata->mesh), diff) - diff);
+
+  __builtin_free (sdata);
+}
+
+
+struct container4
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  struct
+{
+  int dummy;
+  struct inner mesh;
+} s;
+};
+
+static void
+test4_child (struct inner *ifmsh, size_t expected)
+{ 
+  struct container4 *sdata =
+(struct container4 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container4, s.mesh));
+
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (&sdata->s.mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test4 (size_t sz)
+{
+  struct container4 *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->s.mesh;
+  size_t diff = sizeof (*sdata) - sz;
+
+  test4_child (ifmsh, MAX(sizeof (sdata->s.mesh), diff) - d

Re: [PATCH v3 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-27 Thread Luc Grosheintz





On 6/27/25 10:28, Tomasz Kaminski wrote:

On Fri, Jun 27, 2025 at 9:52 AM Luc Grosheintz 
wrote:


libstdc++-v3/ChangeLog:

 * include/std/mdspan (default_accessor): New class.
 * src/c++23/std.cc.in: Register default_accessor.
 * testsuite/23_containers/mdspan/accessors/default.cc: New test.
 * testsuite/23_containers/mdspan/accessors/default_neg.cc: New
test.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan   | 31 ++
  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
  .../23_containers/mdspan/accessors/default.cc | 59 +++
  .../mdspan/accessors/default_neg.cc   | 23 
  4 files changed, 115 insertions(+), 1 deletion(-)
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..c72a64094b7 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _S_strides_t _M_strides;
  };

+  template<typename _ElementType>
+struct default_accessor
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+
+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template<typename _OElementType>
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr
+   default_accessor(default_accessor<_OElementType>) noexcept
+   { }


I would add a test checking the constraint on this constructor. It
essentially means that default_accessor<Derived> cannot be converted to
default_accessor<Base> (pointer arithmetic gives different behavior), but
default_accessor<T> can be converted to default_accessor<const T>. A simple
check on is_convertible should suffice.


Okay, I checked this by using char and int (guaranteed to have different
sizes and hence not pointer compatible). I'll add this variation too.
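
To make the constraint under discussion concrete, here is a small standalone sketch. It does not use std::default_accessor itself; the variable template accessor_convertible_v is a hypothetical helper that only mirrors the requires-clause from the patch, is_convertible_v<_OElementType(*)[], element_type(*)[]>. Base and Derived are stand-in types.

```cpp
#include <type_traits>

struct Base { };
struct Derived : Base { };

// Hypothetical helper mirroring the constructor constraint in the patch:
// pointer-to-array convertibility instead of plain pointer convertibility.
template<typename From, typename To>
  inline constexpr bool accessor_convertible_v
    = std::is_convertible_v<From (*)[], To (*)[]>;

// Same type and added cv-qualification are accepted.
static_assert(accessor_convertible_v<int, int>);
static_assert(accessor_convertible_v<int, const int>);
static_assert(accessor_convertible_v<int, volatile int>);

// Dropping const is rejected.
static_assert(!accessor_convertible_v<const int, int>);

// Derived* converts to Base*, but Derived(*)[] does not convert to
// Base(*)[], so indexing through the wrong element size is ruled out.
static_assert(std::is_convertible_v<Derived *, Base *>);
static_assert(!accessor_convertible_v<Derived, Base>);

// Unrelated element types are rejected as well.
static_assert(!accessor_convertible_v<char, int>);
```

Using the array-pointer form is what separates the Derived/Base case from the const-qualification case; a check on plain element pointers would accept both.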




+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
  _GLIBCXX_END_NAMESPACE_VERSION
  }
  #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 9336118f5d9..e692caaa5f9 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1850,7 +1850,8 @@ export namespace std
using std::layout_left;
using std::layout_right;
using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan
+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded, aligned_accessor and mdspan
  }
  #endif

diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
new file mode 100644
index 000..303833d4857
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,59 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+constexpr void
+test_ctor()
+{
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);


Here's the variation of the test I implemented.


+}
+
+int
+main()
+{
+  test_accessor_policy>();
+  test_access();
+  static_assert(test_access());
+  test_offset();
+  static_assert(test_offset());
+  test_ctor();
+  return 0;
+}
diff --git
a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default_neg.cc
b/libstdc++-v3/testsui

Re: [PATCH][RFC] c/96570 - diagnostics for conversions to/from time_t

2025-06-27 Thread Florian Weimer

* Joseph Myers:

> On Thu, 26 Jun 2025, Richard Biener wrote:
>
>> The following prototypes diagnostics for conversions to/from time_t
>> where the source/destination does not have sufficient precision for it.
>> I've lumped this into -Wconversion for the moment and didn't bother
>> fixing up the testcase for !ilp32 or the -Wconversion diagnostics that
>> happen.
>> 
>> Would -Wtime-conversion (or -Wtime_t-conversion?) be an appropriate
>> option?  I'd enable it with -Wconversion.
>
> I think such a warning should be based on an attribute on the time_t type 
> that means "warn for implicit truncation of this type" (I'm less clear on 
> why warnings for implicit widening conversions *to* time_t are supposed to 
> be useful), rather than hardcoding it to be based on the time_t name.  
> It's hardly just time_t for which a warning about such implicit truncation 
> might be useful.

Ideally, we'd also warn about conversion to long, when long isn't
actually uint64_t.  This way, we can diagnose potential truncation
during 64-bit builds.
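
As a rough illustration of the truncation being discussed, the sketch below uses std::int64_t as a stand-in for a 64-bit time_t and std::int32_t for the narrower destination. The function name fits_in_int32 is hypothetical; it only shows which values would survive the narrowing that such a warning is meant to flag.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a 64-bit timestamp survives narrowing
// to a 32-bit signed integer without changing value.
constexpr bool fits_in_int32(std::int64_t v)
{
  return v >= std::numeric_limits<std::int32_t>::min()
	 && v <= std::numeric_limits<std::int32_t>::max();
}

// 2038-01-19T03:14:07Z is the last second representable in 32 bits.
static_assert(fits_in_int32(2147483647));
// One second later no longer fits.
static_assert(!fits_in_int32(2147483648));
```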

Thanks,
Florian

Re: [PATCH 1/4] c++: Add flag to detect underlying representative of bitfield decls

2025-06-27 Thread Nathaniel Shead

On Wed, Jun 25, 2025 at 01:05:39PM -0400, Jason Merrill wrote:
> On 5/21/25 10:14 PM, Nathaniel Shead wrote:
> > This patch isn't currently necessary with how I've currently done the
> > follow-up patches, but is needed for avoiding any potential issues in
> > the future with DECL_CONTEXT'ful types getting created in the compiler
> > with no names on the fields.  (For instance, this change would make much
> > of r15-7342-gd3627c78be116e unnecessary.)
> > 
> > It does take up another flag in the frontend though.  Another
> > possible approach would be to instead do a walk through all the fields
> > first to see if this is the target of a DECL_BIT_FIELD_REPRESENTATIVE;
> > thoughts?  Or would you prefer to skip this patch entirely?
> 
> It seems like the only way to reach such a FIELD_DECL is through
> DECL_BIT_FIELD_REPRESENTATIVE, so we ought to be able to use that without
> adding another walk?

Fair enough, how does this look instead?  Bootstrapped and tested (so
far just modules.exp) on x86_64-pc-linux-gnu, OK for trunk if full
regtest passes?

-- >8 --

Subject: [PATCH] c++/modules: Make bitfield storage unit detection more robust

Modules streaming needs to handle these differently from other unnamed
FIELD_DECLs that are streamed for internal RECORD_DECLs, and there
doesn't seem to be a good way to detect this case otherwise.

This matters only to allow for compiler-generated type definitions that
build FIELD_DECLs with no name, as otherwise they get confused.
Currently the only such types left that I had not already fixed by naming
their fields are contextless, and an early check marks their fields as
MK_unique anyway, but other cases may appear in the future.
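
The scoped-flag idea can be sketched outside of GCC as follows. This is only an illustration of the technique: scoped_override and toy_writer are hypothetical names, and the real patch uses make_temp_override on the trees_out member instead.

```cpp
#include <cassert>

// Hypothetical RAII guard: set a variable for the lifetime of the guard
// and restore the previous value on scope exit.
template<typename T>
  class scoped_override
  {
    T &ref_;
    T saved_;

  public:
    scoped_override(T &ref, T value) : ref_(ref), saved_(ref)
    { ref_ = value; }

    ~scoped_override()
    { ref_ = saved_; }

    scoped_override(const scoped_override &) = delete;
    scoped_override &operator=(const scoped_override &) = delete;
  };

// Hypothetical stand-in for the streamer: the flag is only true while
// the nested walk of the storage unit is in progress.
struct toy_writer
{
  bool walking_bit_field_unit = false;
  bool seen_inside = false;

  void visit_unit()
  { seen_inside = walking_bit_field_unit; }

  void write_field()
  {
    {
      scoped_override<bool> ovr(walking_bit_field_unit, true);
      visit_unit();
    }
    // The flag is back to its previous value here.
    assert(!walking_bit_field_unit);
  }
};
```

The point of the guard is that the callee can tell it was reached through the bit-field unit slot without inspecting the decl, and the flag cannot leak into the walks of the following fields.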

gcc/cp/ChangeLog:

* module.cc (trees_out::walking_bit_field_unit): New flag.
(trees_out::trees_out): Initialize it.
(trees_out::core_vals): Set it.
(trees_out::get_merge_kind): Use it, move previous ad-hoc check
into assertion.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 42a1b83e164..7bc3e576293 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3097,6 +3097,9 @@ private:
   unsigned section;
   bool writing_local_entities; /* Whether we might walk into a TU-local
   entity we need to emit placeholders for.  */
+  bool walking_bit_field_unit;  /* Whether we're walking the underlying
+  storage for a bit field.  There's no other
+  great way to detect this.  */
 #if CHECKING_P
   int importedness;/* Checker that imports not occurring
   inappropriately.  +ve imports ok,
@@ -3263,7 +3266,7 @@ trees_out::trees_out (allocator *mem, module_state *state, depset::hash &deps,
  unsigned section)
   :parent (mem), state (state), tree_map (500),
dep_hash (&deps), ref_num (0), section (section),
-   writing_local_entities (false)
+   writing_local_entities (false), walking_bit_field_unit (false)
 {
 #if CHECKING_P
   importedness = 0;
@@ -6512,7 +6515,10 @@ trees_out::core_vals (tree t)
 case FIELD_DECL:
   WT (t->field_decl.offset);
   WT (t->field_decl.bit_field_type);
-  WT (t->field_decl.qualifier); /* bitfield unit.  */
+  {
+   auto ovr = make_temp_override (walking_bit_field_unit, true);
+   WT (t->field_decl.qualifier); /* bitfield unit.  */
+  }
   WT (t->field_decl.bit_offset);
   WT (t->field_decl.fcontext);
   WT (t->decl_common.initial);
@@ -11268,15 +11274,16 @@ trees_out::get_merge_kind (tree decl, depset *dep)
  return MK_named;
}
 
- if (!DECL_NAME (decl)
- && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
- && !DECL_BIT_FIELD_REPRESENTATIVE (decl))
+ if (walking_bit_field_unit)
{
  /* The underlying storage unit for a bitfield.  We do not
 need to dedup it, because it's only reachable through
 the bitfields it represents.  And those are deduped.  */
  // FIXME: Is that assertion correct -- do we ever fish it
  // out and put it in an expr?
+ gcc_checking_assert (!DECL_NAME (decl)
+  && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
+  && !DECL_BIT_FIELD_REPRESENTATIVE (decl));
  gcc_checking_assert ((TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
? TREE_CODE (TREE_TYPE (TREE_TYPE (decl)))
: TREE_CODE (TREE_TYPE (decl)))
-- 
2.47.0

Re: [PATCH][RFC] c/96570 - diagnostics for conversions to/from time_t

2025-06-27 Thread Richard Biener

On Fri, 27 Jun 2025, Florian Weimer wrote:

> * Joseph Myers:
> 
> > On Thu, 26 Jun 2025, Richard Biener wrote:
> >
> >> The following prototypes diagnostics for conversions to/from time_t
> >> where the source/destination does not have sufficient precision for it.
> >> I've lumped this into -Wconversion for the moment and didn't bother
> >> fixing up the testcase for !ilp32 or the -Wconversion diagnostics that
> >> happen.
> >> 
> >> Would -Wtime-conversion (or -Wtime_t-conversion?) be an appropriate
> >> option?  I'd enable it with -Wconversion.
> >
> > I think such a warning should be based on an attribute on the time_t type 
> > that means "warn for implicit truncation of this type" (I'm less clear on 
> > why warnings for implicit widening conversions *to* time_t are supposed to 
> > be useful), rather than hardcoding it to be based on the time_t name.  
> > It's hardly just time_t for which a warning about such implicit truncation 
> > might be useful.
> 
> Ideally, we'd also warn about conversion to long, when long isn't
> actually uint64_t.  This way, we can diagnose potential truncation
> during 64-bit builds.

Yeah, though fun fact - time_t is 'long' on most 64bit systems ...
so you'd diagnose a conversion from long to long.  I'm not sure
how to make this work generally with the attribute idea when
not keying explicitly on 'time_t'.
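
A tiny sketch of the language-level issue, assuming a target where the time type is a typedef of long; toy_time_t is a hypothetical stand-in so that the example does not depend on the platform's actual time_t.

```cpp
#include <type_traits>

// Hypothetical stand-in for time_t on a typical 64-bit target.
using toy_time_t = long;

// A typedef does not create a new type, so at the language level a
// conversion between the two is a conversion from long to long.
static_assert(std::is_same_v<toy_time_t, long>);
static_assert(sizeof(toy_time_t) == sizeof(long));
```

Any check that wants to treat the two differently therefore has to look at how the type was spelled (the typedef or an attribute attached to it) rather than at the type itself.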

Richard.
