date:20250519

[PATCH] [testsuite] [x86] strlenopt-80 needs -msse2 on ia32

2025-05-19 Thread Alexandre Oliva



The string length optimizations on 8-byte blocks require -msse2;
-msse is not enough.  Bump it.
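
As a rough illustration of the kind of code this affects (a
hypothetical sketch, not an excerpt from strlenopt-80.c; buf and
copy8_len are made-up names), consider an 8-byte block copy followed
by a string-length query:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer; it is zero-initialized, so it remains
   NUL-terminated after an 8-byte copy into its start.  */
static char buf[32];

/* Copy one 8-byte block from SRC, then query the string length.
   SRC must point to at least 8 readable bytes.  */
static size_t
copy8_len (const void *src)
{
  memcpy (buf, src, 8);
  return strlen (buf);
}
```

Whether the compiler handles the copy as a single 8-byte access, and
can then reason about the resulting length, depends on the target
options; per the description above, on ia32 that takes -msse2.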

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/strlenopt-80.c: Bump to -msse2.
---
 gcc/testsuite/gcc.dg/strlenopt-80.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/strlenopt-80.c b/gcc/testsuite/gcc.dg/strlenopt-80.c
index 63d4eb17e4c3f..0b16a41423661 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-80.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-80.c
@@ -6,7 +6,7 @@
    { dg-do compile { target { { aarch64*-*-* i?86-*-* x86_64-*-* } || { { powerpc*-*-* } && lp64 } } } }
 
    { dg-options "-O2 -Wall -fdump-tree-optimized" }
-   { dg-additional-options "-msse" { target i?86-*-* x86_64-*-* } } */
+   { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 /* On powerpc configurations that have -mstrict-align by default,
    the memcpy calls for ncpylog >= 3 are not turned into MEM_REFs.

-- 
Alexandre Oliva, happy hacker            https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving "normal" is *not* inclusive!

[PATCH] [testsuite] [x86] memcpy-6 needs -msse2

2025-05-19 Thread Alexandre Oliva



The 8-byte memory operations will only be inlined on ia32 with
-msse2.  Bump it.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/memcpy-6.c: Bump to -msse2.
---
 gcc/testsuite/gcc.dg/memcpy-6.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/memcpy-6.c b/gcc/testsuite/gcc.dg/memcpy-6.c
index d4df03903c353..49aec338d2f2e 100644
--- a/gcc/testsuite/gcc.dg/memcpy-6.c
+++ b/gcc/testsuite/gcc.dg/memcpy-6.c
@@ -7,7 +7,7 @@
    { dg-do compile }
    { dg-options "-O0 -Wrestrict -fdump-tree-optimized" }
    { dg-skip-if "skip non-x86 targets" { ! { i?86-*-* x86_64-*-* } } }
-   { dg-additional-options "-msse" { target i?86-*-* x86_64-*-* } } */
+   { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 char a[32];
 


[PATCH] [testsuite] add missing require vect_early_break_hw for vect-tsvc

2025-05-19 Thread Alexandre Oliva



Some tsvc tests add vect_early_break options without requiring the
feature to be available.  Add the requirements.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Require vect_early_break_hw.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c |1 +
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c |1 +
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c |1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 0d55d0dd67c3b..21a9c5a6b2b6a 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-add-options vect_early_break } */
 
 #include "tsvc.h"
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index 5539f0f08411f..e4433385d6686 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-add-options vect_early_break } */
 
 #include "tsvc.h"
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index 73bed5d4c57a2..146df409ecc64 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-add-options vect_early_break } */
 
 #include "tsvc.h"



Re: [PATCH v3 2/3] sbitmap: Add bitmap_is_range_set_p function

2025-05-19 Thread Konstantinos Eleftheriou

Hi Richard, thanks for your response.

On Tue, May 20, 2025 at 8:05 AM Richard Biener
 wrote:
>
> On Mon, May 19, 2025 at 4:14 PM Konstantinos Eleftheriou
>  wrote:
> >
> > This patch adds the `bitmap_is_range_set_p` function in sbitmap,
> > which checks if all the bits in a range are set. This function
> > calls `bitmap_bit_in_range_p_1`, which has been updated to use
> > the `any_inverted` parameter. When `any_inverted` is true, the helper
> > function checks if any of the bits in the range is unset, instead of
> > checking the opposite.
> >
> > Function `bitmap_bit_in_range_p` has been updated to call
> > `bitmap_bit_in_range_p_1` with the `any_inverted` parameter
> > set to false, retaining its previous functionality.
> >
> > Function `bitmap_is_range_set_p` calls `bitmap_bit_in_range_p_1`
> > with `any_inverted` set to true and returns the negation of the
> > result, i.e. true if all the bits in the range are set.
> >
> > gcc/ChangeLog:
> >
> > * sbitmap.cc (bitmap_bit_in_range_p_1): Added the `any_inverted`
> > parameter and changed the logic to check if any of the bits in
> > the range is unset, when the value of the parameter is "true".
> > (bitmap_is_range_set_p): New function.
> > (bitmap_bit_in_range_p): Call and return the result of
> > `bitmap_bit_in_range_p_1` with the `any_inverted` parameter set
> > to false.
> > * sbitmap.h (bitmap_is_range_set_p): New function.
> >
> > Signed-off-by: Konstantinos Eleftheriou 
> > ---
> >
> > (no changes since v1)
> >
> >  gcc/sbitmap.cc | 27 ---
> >  gcc/sbitmap.h  |  1 +
> >  2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/sbitmap.cc b/gcc/sbitmap.cc
> > index 94f2bbd6c8fd..99f1db540ab6 100644
> > --- a/gcc/sbitmap.cc
> > +++ b/gcc/sbitmap.cc
> > @@ -326,12 +326,13 @@ bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
> >bmap->elms[start_word] |= mask;
> >  }
> >
> > -/* Return TRUE if any bit between START and END inclusive is set within
> > -   the simple bitmap BMAP.  Return FALSE otherwise.  */
> > +/* Helper function for bitmap_bit_in_range_p and bitmap_is_range_set_p.
> > +   If ANY_INVERTED is true, the function checks if any bit in the range
> > +   is unset.  */
> >
> >  bool
> >  bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> > -unsigned int end)
> > +unsigned int end, bool any_inverted)
> >  {
> >gcc_checking_assert (start <= end);
> >bitmap_check_index (bmap, end);
> > @@ -351,7 +352,8 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> >
> >SBITMAP_ELT_TYPE low_mask = ((SBITMAP_ELT_TYPE)1 << start_bitno) - 1;
> >SBITMAP_ELT_TYPE mask = high_mask - low_mask;
> > -  if (bmap->elms[start_word] & mask)
> > +  const SBITMAP_ELT_TYPE expected_partial = any_inverted ? mask : 0;
> > +  if ((bmap->elms[start_word] & mask) != expected_partial)
> > return true;
> >start_word++;
> >  }
> > @@ -361,9 +363,10 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> >
> >/* Now test words at a time until we hit a partial word.  */
> >unsigned int nwords = (end_word - start_word);
> > +  const SBITMAP_ELT_TYPE expected = any_inverted ? ~(SBITMAP_ELT_TYPE)0 : 0;
> >while (nwords)
> >  {
> > -  if (bmap->elms[start_word])
> > +  if (bmap->elms[start_word] != expected)
> > return true;
> >start_word++;
> >nwords--;
> > @@ -373,7 +376,17 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> >SBITMAP_ELT_TYPE mask = ~(SBITMAP_ELT_TYPE)0;
> >if (end_bitno + 1 < SBITMAP_ELT_BITS)
> >  mask = ((SBITMAP_ELT_TYPE)1 << (end_bitno + 1)) - 1;
> > -  return (bmap->elms[start_word] & mask) != 0;
> > +  const SBITMAP_ELT_TYPE expected_partial = any_inverted ? mask : 0;
> > +  return (bmap->elms[start_word] & mask) != expected_partial;
> > +}
> > +
> > +/* Return TRUE if all bits between START and END inclusive are set within
> > +   the simple bitmap BMAP.  Return FALSE otherwise.  */
> > +
> > +bool
> > +bitmap_is_range_set_p (const_sbitmap bmap, unsigned int start, unsigned int end)
> > +{
> > +  return !bitmap_bit_in_range_p_1 (bmap, start, end, true);
> >  }
> >
> >  /* Return TRUE if any bit between START and END inclusive is set within
> > @@ -382,7 +395,7 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> >  bool
> >  bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int end)
> >  {
> > -  return bitmap_bit_in_range_p_1 (bmap, start, end);
> > +  return bitmap_bit_in_range_p_1 (bmap, start, end, false);
> >  }
> >
> >  #if GCC_VERSION < 3400
> > diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
> > index 66f9e138503c..4ff93e7a98f9 100644
> > --- a/gcc/sbitmap.h
> > +++ b/gcc/sbitmap.h
> > @@ -288,6 +288,7 @@ extern bool bitmap_io
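
For reference, the any_inverted logic described in the quoted patch
can be sketched on a plain array of 64-bit words.  This is a
simplified, hypothetical model: range_test_1, any_bit_set and
all_bits_set are made-up names, and the loop visits every word in the
range the same way instead of following the structure of sbitmap.cc.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 64u

/* Hypothetical helper.  With ANY_INVERTED false, return true if any
   bit in [START, END] (inclusive) is set.  With ANY_INVERTED true,
   return true if any bit in that range is unset.  */
static bool
range_test_1 (const uint64_t *words, unsigned start, unsigned end,
              bool any_inverted)
{
  unsigned first = start / WORD_BITS;
  unsigned last = end / WORD_BITS;

  for (unsigned w = first; w <= last; w++)
    {
      unsigned lo = (w == first) ? start % WORD_BITS : 0;
      unsigned hi = (w == last) ? end % WORD_BITS : WORD_BITS - 1;

      /* Build a mask covering bits LO..HI of this word.  */
      uint64_t mask = (hi == WORD_BITS - 1)
                      ? ~(uint64_t) 0
                      : (((uint64_t) 1 << (hi + 1)) - 1);
      mask &= ~(((uint64_t) 1 << lo) - 1);

      /* Looking for an unset bit: every masked bit is expected to be
         set.  Looking for a set bit: every masked bit is expected to
         be clear.  Any deviation answers the question with "yes".  */
      uint64_t expected = any_inverted ? mask : 0;
      if ((words[w] & mask) != expected)
        return true;
    }
  return false;
}

/* True if at least one bit in [START, END] is set.  */
static bool
any_bit_set (const uint64_t *words, unsigned start, unsigned end)
{
  return range_test_1 (words, start, end, false);
}

/* True if every bit in [START, END] is set: the negation of
   "some bit in the range is unset".  */
static bool
all_bits_set (const uint64_t *words, unsigned start, unsigned end)
{
  return !range_test_1 (words, start, end, true);
}
```

Sharing one helper keeps the word and mask arithmetic in a single
place; only the value compared against changes between the two
queries.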

[PATCH] [testsuite] [x86] double copysign requires -msse2

2025-05-19 Thread Alexandre Oliva



SSE_FLOAT_MODE_P only holds for DFmode with SSE2, and that's a
condition for copysign3 to be available under TARGET_SSE_MATH.

Various copysign testcases use -msse -mfpmath=sse on ia32 to enable
the copysign builtins and patterns, but that would only be enough if
the tests were limited to floats.  Since they test doubles as well, we
need -msse2 instead of -msse.
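
To illustrate the float/double distinction (a hypothetical sketch, not
taken from any of the tests; the function names are made up), these
two functions differ only in the floating-point mode they operate on:

```c
/* Operates on float (SFmode): per the explanation above, -msse with
   -mfpmath=sse is enough for this one on ia32.  */
float
neg_abs_float (float x)
{
  return __builtin_copysignf (x, -1.0f);
}

/* Operates on double (DFmode): per the explanation above, this is the
   case that needs -msse2 on ia32.  */
double
neg_abs_double (double x)
{
  return __builtin_copysign (x, -1.0);
}
```

Both functions return the same values regardless of the options used;
what changes is whether the copysign patterns are available to the
compiler, which is what the dump scans in these tests depend on.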

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?

(This patch, as posted, applies on top of this:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683066.html
plus the missing bit in the followup I've just posted)


for  gcc/testsuite/ChangeLog

* gcc.dg/fold-copysign-1.c: Bump to sse2 on ia32.
* gcc.dg/pr55152-2.c: Likewise.
* gcc.dg/tree-ssa/abs-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
---
 gcc/testsuite/gcc.dg/fold-copysign-1.c |2 +-
 gcc/testsuite/gcc.dg/pr55152-2.c   |2 +-
 gcc/testsuite/gcc.dg/tree-ssa/abs-4.c  |2 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c b/gcc/testsuite/gcc.dg/fold-copysign-1.c
index 7b67b11af7dba..f9b3ba19f99b5 100644
--- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
+++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-cddce1" } */
-/* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
 /* { dg-additional-options "-mcmpb" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-mdouble=64" { target { avr-*-* } } } */
 
diff --git a/gcc/testsuite/gcc.dg/pr55152-2.c b/gcc/testsuite/gcc.dg/pr55152-2.c
index ed293c0cae3eb..40933f7f9dc42 100644
--- a/gcc/testsuite/gcc.dg/pr55152-2.c
+++ b/gcc/testsuite/gcc.dg/pr55152-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -ffinite-math-only -fno-signed-zeros -fstrict-overflow -fdump-tree-optimized" } */
-/* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
 /* { dg-additional-options "-mpowerpc-gfxopt" { target { powerpc*-*-* } } } */
 
 double g (double a)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
index e86af846449b5..c58d24834ab4d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O1 -fdump-tree-optimized" } */
-/* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
 /* { dg-additional-options "-mcmpb" { target { powerpc*-*-* } } } */
 /* PR tree-optimization/109829 */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index af58a7d5b3332..2aa8980afe7a6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-backprop-details" }  */
-/* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
 /* { dg-additional-options "-mcmpb" { target { powerpc*-*-* } } } */
 
 void start (void *);


[PATCH] [testsuite] [x86] vect-simd-clone-1[678]e.c adjust

2025-05-19 Thread Alexandre Oliva



Since r13-6296, we haven't got 4 simdclone calls for these tests on
ia32 without avx_runtime.  With avx_runtime, we get 3 such calls even
on ia32, but we didn't test for anything on ia32 with avx_runtime.
Adjust and simplify the expectations and comments.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/vect/vect-simd-clone-16e.c: Expect fewer calls on ia32.
* gcc.dg/vect/vect-simd-clone-17e.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18e.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c |8 +++-
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c |8 +++-
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c |8 +++-
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
index f80b0e0581e35..2f7cdfb22119e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
@@ -6,11 +6,9 @@
 #include "vect-simd-clone-16.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use another call for the epilogue loops.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { { ! avx_runtime } && { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime && { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { ! avx_runtime } && { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! avx_runtime } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target avx_runtime } } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
index c7c510b8a6abd..8f10aff3b897e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
@@ -6,11 +6,9 @@
 #include "vect-simd-clone-17.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use another call for the epilogue loops.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { { ! avx_runtime } && { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime && { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { ! avx_runtime } && { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! avx_runtime } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target avx_runtime } } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
index e00c3d78038bf..142fcc8b0b55d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
@@ -6,11 +6,9 @@
 #include "vect-simd-clone-18.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use another call for the epilogue loops.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { { ! avx_runtime } && { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime && { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { ! avx_runtime } && { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! avx_runtime } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.si

[PATCH] [testsuite] [x86] no-callee-saved-16.c needs -fomit-frame-pointer

2025-05-19 Thread Alexandre Oliva



If the toolchain is built with --enable-frame-pointer,
gcc.target/i386/no-callee-saved-16.c does not get the expected
optimization unless -fomit-frame-pointer is given explicitly; without
that configure flag, -O2 would enable it.  Add it.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/no-callee-saved-16.c: Add -fomit-frame-pointer.
---
 gcc/testsuite/gcc.target/i386/no-callee-saved-16.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
index 112d1764f3e10..a5589e21ab3b9 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-options "-O2 -fomit-frame-pointer -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
 
 typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
 



[PATCH] [testsuite] [arm] adjust fp16-aapcs for gcc-14

2025-05-19 Thread Alexandre Oliva



(The backport I've only just posted is not enough for the tests to pass;
there's another problem)

r14-10824 is a backport of r15-4549, which rewrote the save/restore
expectations introduced in r15-2160 and extended them into
check-function-bodies.  Alas, r15-2160 mentions an insn_propagation
patch that enables those specific save/restore insns to be generated,
presumably r15-1945, and that change is not present in gcc-14.  So we
get different save/restore insns, and the test fails even after
backporting r15-1035, which allows single-character function names
in check-function-bodies.

Drop the save/restore checks that don't belong in gcc-14.

Tested with gcc-14 on arm-vxworks7r2.  Ok to install in gcc-14?


for  gcc/testsuite/ChangeLog

* gcc.target/arm/fp16-aapcs-1.c: Drop save/restore checks.
* gcc.target/arm/fp16-aapcs-2.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c |7 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c |8 
 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c |7 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c |8 
 4 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
index b18d7cda65c8d..450c52fcd5c6c 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
@@ -29,10 +29,8 @@ Below block is for non-armv8.1
 ** ...
 ** vmov\.f32   s0, \2
 ** )
-** vstr\.32s2, \[sp, #4\]  @ int
+** ...
 ** bl  swap
-** vldr\.32s2, \[sp, #4\]  @ int
-** vmov\.f32   s0, s2
 
 ** |
 
@@ -50,9 +48,8 @@ Below block is for armv8.1
 ** ...
 ** vmovs0, \4  @ __fp16
 ** )
-** vstr\.32s2, \[sp, #4\]  @ int
+** ...
 ** bl  swap
-** vldr\.16s0, \[sp, #4\]
 
 ** )
 ** ...
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
index 48510e895368d..c15f29dd3e44b 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
@@ -28,14 +28,6 @@ swap (__fp16, __fp16);
 ** )
 ** ...
 */
-/*
-** F: { target arm_little_endian }
-** ...
-** str r2, \[sp, #4\]
-** bl  swap
-** ldrhr0, \[sp, #4\]  @ __fp16
-** ...
-*/
 __fp16
 F (__fp16 a, __fp16 b, __fp16 c)
 {
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
index 7238ef3a02e03..1102dc7344919 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
@@ -29,10 +29,8 @@ Below block is for non-armv8.1
 ** ...
 ** vmov\.f32   s0, \2
 ** )
-** vstr\.32s2, \[sp, #4\]  @ int
+** ...
 ** bl  swap
-** vldr\.32s2, \[sp, #4\]  @ int
-** vmov\.f32   s0, s2
 
 ** |
 
@@ -50,9 +48,8 @@ Below block is for armv8.1
 ** ...
 ** vmovs0, \4
 ** )
-** vstr\.32s2, \[sp, #4\]  @ int
+** ...
 ** bl  swap
-** vldr\.16s0, \[sp, #4\]
 
 ** )
 ** ...
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
index 13f08d8afa32d..00a44d15129a8 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
@@ -28,14 +28,6 @@ swap (__fp16, __fp16);
 ** )
 ** ...
 */
-/*
-** F: { target arm_little_endian }
-** ...
-** str r2, \[sp, #4\]
-** bl  swap
-** ldrhr0, \[sp, #4\]  @ __fp16
-** ...
-*/
 __fp16
 F (__fp16 a, __fp16 b, __fp16 c)
 {


[PATCH] [testsuite] [x86] pr108938-3.c needs -msse2 for bswap in foo2 with -m32

2025-05-19 Thread Alexandre Oliva



Without SSE2, we don't combine the separate loads in foo2 and get
separate rotates, instead of a bswap.
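
For readers unfamiliar with the idiom, a byte swap is typically
written in C as separate byte loads that are shifted and ORed
together.  The sketch below is hypothetical (load_be32 is a made-up
name and this is not the test's foo2); it only shows the general shape
of code that a compiler may turn into a single load plus bswap when it
manages to combine the loads:

```c
#include <stdint.h>

/* Read four bytes one at a time and assemble them most-significant
   byte first, i.e. a big-endian 32-bit load.  */
uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}
```

Whether a given compiler and option set actually emits bswap for such
code is not guaranteed; the test's scan-assembler counts are what pin
that down for pr108938-3.c.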

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr108938-3.c: Add -msse2.
---
 gcc/testsuite/gcc.target/i386/pr108938-3.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr108938-3.c b/gcc/testsuite/gcc.target/i386/pr108938-3.c
index 757a0c456bc45..47293d49bb9ea 100644
--- a/gcc/testsuite/gcc.target/i386/pr108938-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr108938-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -mno-movbe -mno-avx" } */
+/* { dg-options "-O2 -ftree-vectorize -mno-movbe -msse2 -mno-avx" } */
 /* { dg-final { scan-assembler-times "bswap\[\t ]+" 2 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "bswap\[\t ]+" 3 { target ia32 } } } */
 


[PATCH] [testsuite] [x86] pr31985.c needs -fomit-frame-pointer to match movl count

2025-05-19 Thread Alexandre Oliva



On an --enable-frame-pointer toolchain, pr31985.c gets an extra movl
and fails.  Enable -fomit-frame-pointer explicitly.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr31985.c: Add -fomit-frame-pointer.
---
 gcc/testsuite/gcc.target/i386/pr31985.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr31985.c b/gcc/testsuite/gcc.target/i386/pr31985.c
index a6de1b5b14318..a0a91116242c1 100644
--- a/gcc/testsuite/gcc.target/i386/pr31985.c
+++ b/gcc/testsuite/gcc.target/i386/pr31985.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target ia32 } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fomit-frame-pointer" } */
 
 void test_c (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
 {



Re: [PATCH] [testsuite] add missing require vect_early_break_hw for vect-tsvc

2025-05-19 Thread Richard Biener

OK

On Tue, May 20, 2025 at 6:20 AM Alexandre Oliva  wrote:
>
>
> Some tsvc tests add vect_early_break options without requiring the
> feature to be available.  Add the requirements.
>
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
> arm-, x86-, and x86_64-vxworks7r2.  Ok to install?
>
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/vect/tsvc/vect-tsvc-s332.c: Require vect_early_break_hw.
> * gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c |1 +
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c |1 +
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c |1 +
>  3 files changed, 3 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> index 0d55d0dd67c3b..21a9c5a6b2b6a 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
>  #include "tsvc.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> index 5539f0f08411f..e4433385d6686 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
>  #include "tsvc.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> index 73bed5d4c57a2..146df409ecc64 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
>  #include "tsvc.h"
>
>

Re: [PATCH] [testsuite] [ppc] add -mpowerpc-gfxopt or -mcmpb to copysign tests

2025-05-19 Thread Alexandre Oliva

On May  8, 2025, Alexandre Oliva  wrote:

>   * gcc.dg/fold-copysign-1.c: Add -mcmpb on ppc.

Uhh, sorry, I've accidentally dropped this bit from the patch posted a
couple of weeks ago.  Here it is.

BTW, Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683066.html


diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c b/gcc/testsuite/gcc.dg/fold-copysign-1.c
index 1f5141b1c5d64..7b67b11af7dba 100644
--- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
+++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-cddce1" } */
 /* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-additional-options "-mcmpb" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-mdouble=64" { target { avr-*-* } } } */
 
 double foo (double x)





[PATCH] [testsuite] [x86] forwprop-41 needs -msse

2025-05-19 Thread Alexandre Oliva



The vector operations are only turned into BIT_INSERT_EXPR with -msse
on ia32.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
arm-, x86-, and x86_64-vxworks7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/tree-ssa/forwprop-41.c: Add -msse on x86.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
index a1f08289dd696..1c5b500deb158 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized -Wno-psabi -w" } */
+/* { dg-additional-options "-msse" { target i?86-*-* x86_64-*-* } } */
 
 #define vector __attribute__((__vector_size__(16) ))
 


Re: [PATCH v3 2/3] sbitmap: Add bitmap_is_range_set_p function

2025-05-19 Thread Richard Biener

On Mon, May 19, 2025 at 4:14 PM Konstantinos Eleftheriou
 wrote:
>
> This patch adds the `bitmap_is_range_set_p` function in sbitmap,
> which checks if all the bits in a range are set. This function
> calls `bitmap_bit_in_range_p_1`, which has been updated to use
> the `any_inverted` parameter. When `any_inverted` is true, the helper
> function checks if any of the bits in the range is unset, instead of
> checking the opposite.
>
> Function `bitmap_bit_in_range_p` has been updated to call
> `bitmap_bit_in_range_p_1` with the `any_inverted` parameter
> set to false, retaining its previous functionality.
>
> Function `bitmap_is_range_set_p` calls `bitmap_bit_in_range_p_1`
> with `any_inverted` set to true and returns the negation of the
> result, i.e. true if all the bits in the range are set.
>
> gcc/ChangeLog:
>
> * sbitmap.cc (bitmap_bit_in_range_p_1): Added the `any_inverted`
> parameter and changed the logic to check if any of the bits in
> the range is unset, when the value of the parameter is "true".
> (bitmap_is_range_set_p): New function.
> (bitmap_bit_in_range_p): Call and return the result of
> `bitmap_bit_in_range_p_1` with the `any_inverted` parameter set
> to false.
> * sbitmap.h (bitmap_is_range_set_p): New function.
>
> Signed-off-by: Konstantinos Eleftheriou 
> ---
>
> (no changes since v1)
>
>  gcc/sbitmap.cc | 27 ---
>  gcc/sbitmap.h  |  1 +
>  2 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/sbitmap.cc b/gcc/sbitmap.cc
> index 94f2bbd6c8fd..99f1db540ab6 100644
> --- a/gcc/sbitmap.cc
> +++ b/gcc/sbitmap.cc
> @@ -326,12 +326,13 @@ bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
>bmap->elms[start_word] |= mask;
>  }
>
> -/* Return TRUE if any bit between START and END inclusive is set within
> -   the simple bitmap BMAP.  Return FALSE otherwise.  */
> +/* Helper function for bitmap_bit_in_range_p and bitmap_is_range_set_p.
> +   If ANY_INVERTED is true, the function checks if any bit in the range
> +   is unset.  */
>
>  bool
>  bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> -unsigned int end)
> +unsigned int end, bool any_inverted)
>  {
>gcc_checking_assert (start <= end);
>bitmap_check_index (bmap, end);
> @@ -351,7 +352,8 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
>
>SBITMAP_ELT_TYPE low_mask = ((SBITMAP_ELT_TYPE)1 << start_bitno) - 1;
>SBITMAP_ELT_TYPE mask = high_mask - low_mask;
> -  if (bmap->elms[start_word] & mask)
> +  const SBITMAP_ELT_TYPE expected_partial = any_inverted ? mask : 0;
> +  if ((bmap->elms[start_word] & mask) != expected_partial)
> return true;
>start_word++;
>  }
> @@ -361,9 +363,10 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
>
>/* Now test words at a time until we hit a partial word.  */
>unsigned int nwords = (end_word - start_word);
> +  const SBITMAP_ELT_TYPE expected = any_inverted ? ~(SBITMAP_ELT_TYPE)0 : 0;
>while (nwords)
>  {
> -  if (bmap->elms[start_word])
> +  if (bmap->elms[start_word] != expected)
> return true;
>start_word++;
>nwords--;
> @@ -373,7 +376,17 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
>SBITMAP_ELT_TYPE mask = ~(SBITMAP_ELT_TYPE)0;
>if (end_bitno + 1 < SBITMAP_ELT_BITS)
>  mask = ((SBITMAP_ELT_TYPE)1 << (end_bitno + 1)) - 1;
> -  return (bmap->elms[start_word] & mask) != 0;
> +  const SBITMAP_ELT_TYPE expected_partial = any_inverted ? mask : 0;
> +  return (bmap->elms[start_word] & mask) != expected_partial;
> +}
> +
> +/* Return TRUE if all bits between START and END inclusive are set within
> +   the simple bitmap BMAP.  Return FALSE otherwise.  */
> +
> +bool
> +bitmap_is_range_set_p (const_sbitmap bmap, unsigned int start, unsigned int end)
> +{
> +  return !bitmap_bit_in_range_p_1 (bmap, start, end, true);
>  }
>
>  /* Return TRUE if any bit between START and END inclusive is set within
> @@ -382,7 +395,7 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
>  bool
>  bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int end)
>  {
> -  return bitmap_bit_in_range_p_1 (bmap, start, end);
> +  return bitmap_bit_in_range_p_1 (bmap, start, end, false);
>  }
>
>  #if GCC_VERSION < 3400
> diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
> index 66f9e138503c..4ff93e7a98f9 100644
> --- a/gcc/sbitmap.h
> +++ b/gcc/sbitmap.h
> @@ -288,6 +288,7 @@ extern bool bitmap_ior (sbitmap, const_sbitmap, const_sbitmap);
>  extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap);
>  extern bool bitmap_subset_p (const_sbitmap, const_sbitmap);
>  extern bool bitmap_bit_in_range_p (const_sbitmap, unsigned int, unsigned int);
> +extern bool bitmap_is_range_set_p (const_sbitmap, unsigned int, unsigned 
> int
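
The `any_inverted` logic described in the quoted patch can be sketched on a single word. This is only an illustration under stated assumptions: `word_t` stands in for `SBITMAP_ELT_TYPE` as one 64-bit word, and `range_mask`, `bit_in_range_p_1`, `any_bit_set_p` and `all_bits_set_p` are hypothetical names, not the functions in sbitmap.cc.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for SBITMAP_ELT_TYPE (assumption: one 64-bit word).
using word_t = std::uint64_t;

// Mask with bits START..END (inclusive) set; requires start <= end < 64.
static word_t range_mask (unsigned start, unsigned end)
{
  word_t high = (end == 63) ? ~word_t (0) : ((word_t (1) << (end + 1)) - 1);
  word_t low = (word_t (1) << start) - 1;
  return high - low;
}

// With ANY_INVERTED false: true if any bit in the range is set.
// With ANY_INVERTED true: true if any bit in the range is unset.
static bool bit_in_range_p_1 (word_t w, unsigned start, unsigned end,
                              bool any_inverted)
{
  const word_t mask = range_mask (start, end);
  const word_t expected = any_inverted ? mask : 0;
  return (w & mask) != expected;
}

// "Is any bit in the range set?": the helper with any_inverted == false.
static bool any_bit_set_p (word_t w, unsigned start, unsigned end)
{
  return bit_in_range_p_1 (w, start, end, false);
}

// "Are all bits in the range set?": negate "is any bit unset?".
static bool all_bits_set_p (word_t w, unsigned start, unsigned end)
{
  return !bit_in_range_p_1 (w, start, end, true);
}
```

The point of the single comparison against `expected` is that both queries share one code path: only the value the masked word is compared with changes.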

[PATCH] [gcc-14] testsuite: Improve check-function-bodies

2025-05-19 Thread Alexandre Oliva

The backport of commit 205515da82a2914d765e74ba73fd2765e1254112 to
gcc-14 as 8b1146fe46e62f8b03bd9ddee48995794e192e82, rewriting
gcc.target/arm/fp16-aapcs-[1234].c into check-function-bodies, requires
the following patch for the one-character function names used in those
tests.  Tested with gcc-14 on arm-vxworks7r2.  Ok to install?

From: Wilco Dijkstra 

Improve check-function-bodies by allowing single-character function names.

gcc/testsuite:
* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
function names.

(cherry pick from commit acdc9df371fbe99e814a3f35a439531e08af79e7)
---
 gcc/testsuite/lib/scanasm.exp |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index d1c8e3b50794a..737eefc655e90 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -869,15 +869,15 @@ proc configure_check-function-bodies { config } {
 # Regexp for the start of a function definition (name in \1).
 if { [istarget nvptx*-*-*] } {
set up_config(start) {
-   {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
+   {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S*)$}
}
 } elseif { [istarget *-*-darwin*] } {
set up_config(start) {
-   {^_([a-zA-Z_]\S+):$}
+   {^_([a-zA-Z_]\S*):$}
{^LFB[0-9]+:}
}
 } else {
-   set up_config(start) {{^([a-zA-Z_]\S+):$}}
+   set up_config(start) {{^([a-zA-Z_]\S*):$}}
 }
 
 # Regexp for the end of a function definition.
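
The effect of changing `\S+` to `\S*` can be checked outside the testsuite. The sketch below uses C++ `std::regex` (ECMAScript syntax) rather than the Tcl regexp engine that scanasm.exp runs on, so it only illustrates the default (non-nvptx, non-darwin) pattern; `starts_function`, `old_start` and `new_start` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Old pattern: a leading letter or underscore plus at least one more
// non-space character, so the name must be two characters or longer.
static const std::regex old_start (R"(^([a-zA-Z_]\S+):$)");

// New pattern: a leading letter or underscore plus zero or more
// non-space characters, so a one-character name is accepted.
static const std::regex new_start (R"(^([a-zA-Z_]\S*):$)");

// Returns true if LINE would be treated as the start of a function
// definition by PATTERN.
static bool starts_function (const std::string &line,
                             const std::regex &pattern)
{
  return std::regex_match (line, pattern);
}
```

A label such as `f:` is rejected by the old pattern because `\S+` has to consume the colon and then nothing is left for the literal `:`.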


RE: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-19 Thread Richard Biener

On Mon, 19 May 2025, Tamar Christina wrote:

-Original Message-
From: Richard Biener 
Sent: Friday, May 16, 2025 11:35 AM
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford ; Tamar Christina

Subject: [PATCH][RFC] Allow the target to request a masked vector epilogue

Targets recently got the ability to request the vector mode to be
used for a vector epilogue (or the epilogue of a vector epilogue).  The
following adds the ability for it to indicate the epilogue should use
loop masking, irrespective of the --param vect-partial-vector-usage
setting.

The simple prototype below uses a separate flag from the epilogue
mode, but I wonder how we more generally want to handle
whether to use masking or not when iterating over modes.  Currently
we mostly rely on --param vect-partial-vector-usage.  aarch64
and riscv have both variable-length modes but also fixed-size modes
where for the latter, like on x86, the target couldn't request
a mode specifically with or without masking.  It seems both
aarch64 and riscv fully rely on cost comparison and fully
exploiting the mode iteration space (but not masked vs. non-masked?!)
here?

I was thinking of adding a vectorization_mode class that would
encapsulate the mode and whether to allow masking or alternatively
to make the vector_modes array (and the m_suggested_epilogue_mode)
a std::pair of mode and mask flag?

I personally like the class approach as it seems more easily extensible in
the future.  I was recently wondering about what would be useful for
epilogues, and with the change to unroll in the vectorizer it would be
useful to be able to require an epilogue of a particular unroll factor.

Or some other way to convey your requested VF?

Yes, though we currently handle this by selecting the mode instead of
a VF - those are not really complementary.  I'd like to see us move towards
selecting/requesting a VF and choosing modes as they seem to fit that
constraint (thereby relaxing the single vector size restriction as well).
I guess for VLA vs non-VLA we still need some "related mode" here as
that's how the target hook works.

But one thing at a time ... the single-vector-size thing is high
on my list to fix.

Richard.

Thanks,
Tamar

For the x86 case going the prototype way would be sufficient, we
wouldn't want to, say, use a masked AVX epilogue for an AVX512 loop,
so any further iteration on epilogue modes if the requested mode
would fail to vectorize is OK to be unmasked.

Any comments on this?  You are not yet using m_suggested_epilogue_mode
to get more than one vector epilogue, this might be a way to add
heuristics when to use a masked epilogue.

Thanks,
Richard.

* tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
Add masked output parameter and return m_masked_epilogue.
(vector_costs::m_masked_epilogue): New tristate flag.
(vector_costs::vector_costs): Initialize m_masked_epilogue.
* tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
flag to optionally initialize can_use_partial_vectors_p.
(vect_analyze_loop): For epilogues also get whether to use
a masked epilogue for this loop from the target and use
that for the first epilogue mode we try.
---
 gcc/tree-vect-loop.cc | 29 +
 gcc/tree-vectorizer.h | 12 +---
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2d1a6883e6b..4af510ff20c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
 const vect_loop_form_info *loop_form_info,
 loop_vec_info orig_loop_vinfo,
 const vector_modes &vector_modes, unsigned &mode_i,
+int masked_p,
 machine_mode &autodetected_vector_mode,
 bool &fatal)
 {
@@ -3415,6 +3416,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,

   machine_mode vector_mode = vector_modes[mode_i];
   loop_vinfo->vector_mode = vector_mode;
+  if (masked_p != -1)
+loop_vinfo->can_use_partial_vectors_p = masked_p;
   unsigned int suggested_unroll_factor = 1;
   unsigned slp_done_for_suggested_uf = 0;

@@ -3580,7 +3583,7 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
   cached_vf_per_mode[last_mode_i] = -1;
   opt_loop_vec_info loop_vinfo
= vect_analyze_loop_1 (loop, shared, &loop_form_info,
-  NULL, vector_modes, mode_i,
+  NULL, vector_modes, mode_i, -1,
   autodetected_vector_mode, fatal);
   if (fatal)
break;
@@ -3665,19 +3668,24 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
  array may contain length-agnostic and length-specific modes.  Their
  ordering is not guaranteed, so we could end up picking a mode for the mai
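
The tristate convention used by `masked_p` in the hunks above (-1 leaves the existing setting alone, any other value initializes `can_use_partial_vectors_p`) can be sketched in isolation. This is a minimal illustration, not the vectorizer code: `loop_info_sketch` and `apply_masked_request` are hypothetical names, and the `true` default is an assumption made only so the example has something to override.

```cpp
#include <cassert>

// Hypothetical stand-in for the single loop_vec_info field the patch sets.
struct loop_info_sketch
{
  bool can_use_partial_vectors_p = true;   // assumed default for illustration
};

// MASKED_P follows the convention in the quoted hunk:
//   -1  no request, keep whatever the field already holds
//    0  do not allow partial (masked) vectors
//    1  allow partial (masked) vectors
static void apply_masked_request (loop_info_sketch &info, int masked_p)
{
  if (masked_p != -1)
    info.can_use_partial_vectors_p = (masked_p != 0);
}
```

Passing -1 from the main-loop analysis, as the `vect_analyze_loop` hunk does, keeps that path's behavior unchanged while letting the epilogue path supply a target preference.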

Re: [PATCH] [testsuite] add missing require vect_early_break_hw for vect-tsvc

2025-05-19 Thread Uros Bizjak

LGTM for the whole series.

Thanks,
Uros.

On Tue, May 20, 2025 at 6:17 AM Alexandre Oliva  wrote:
>
>
> Some tsvc tests add vect_early_break options without requiring the
> feature to be available.  Add the requirements.
>
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-14 on aarch64-,
> arm-, x86-, and x86_64-vxworks7r2.  Ok to install?
>
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/vect/tsvc/vect-tsvc-s332.c: Require vect_early_break_hw.
> * gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c |1 +
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c |1 +
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c |1 +
>  3 files changed, 3 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> index 0d55d0dd67c3b..21a9c5a6b2b6a 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
>  #include "tsvc.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> index 5539f0f08411f..e4433385d6686 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
>  #include "tsvc.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> index 73bed5d4c57a2..146df409ecc64 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
>  #include "tsvc.h"
>
>

Re: [PATCH] tree-chrec: Use signed_type_for in convert_affine_scev

2025-05-19 Thread Richard Biener


On Mon, 19 May 2025, Jakub Jelinek wrote:


Hi!

On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear, when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.

The patch fixes it similarly to what has been done for GCC 14 in various
other spots.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.

Richard.


2025-05-19  Jakub Jelinek  

* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.

--- gcc/tree-chrec.cc.jj   2025-04-08 14:09:24.743815607 +0200
+++ gcc/tree-chrec.cc   2025-05-15 16:06:05.383354229 +0200
@@ -1490,7 +1490,7 @@ convert_affine_scev (class loop *loop, t
  new_step = *step;
  if (TYPE_PRECISION (step_type) > TYPE_PRECISION (ct) && TYPE_UNSIGNED (ct))
{
-  tree signed_ct = build_nonstandard_integer_type (TYPE_PRECISION (ct), 0);
+  tree signed_ct = signed_type_for (ct);
  new_step = chrec_convert (signed_ct, new_step, at_stmt,
use_overflow_semantics);
}

Jakub

[PATCH] libgcc: Move bitint support exports to x86/aarch64 specific map files

2025-05-19 Thread Jakub Jelinek

Hi!

When adding _BitInt support I was hoping all or most arches would
implement it already for GCC 14.  That didn't happen and with
new hosts adding support for _BitInt for GCC 16 (s390x-linux and as was
posted today loongarch-linux too), we need the _BitInt support functions
exported on those arches at GCC_16.0.0 rather than GCC_14.0.0 which
shouldn't be changed anymore.

The following patch does that.  Both arches were already exporting
some of the _BitInt related symbols in their specific map files, this
just moves the remaining ones there as well.

Tested on x86_64-linux (-m32/-m64, with all bitint related tests, both
normally and --target_board=unix/-shared-libgcc, additionally compared
abilists of libgcc before/after), ok for trunk?

2025-05-20  Jakub Jelinek  

* libgcc-std.ver.in (GCC_14.0.0): Remove bitint related exports
from here.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Add them here.
* config/i386/libgcc-darwin.ver (GCC_14.0.0): Likewise.
* config/i386/libgcc-sol2.ver (GCC_14.0.0): Likewise.
* config/aarch64/libgcc-softfp.ver (GCC_14.0.0): Likewise.

--- libgcc/libgcc-std.ver.in.jj 2025-04-08 14:09:53.631413461 +0200
+++ libgcc/libgcc-std.ver.in    2025-05-20 08:23:24.323741294 +0200
@@ -1947,12 +1947,6 @@ GCC_7.0.0 {
 
 %inherit GCC_14.0.0 GCC_7.0.0
 GCC_14.0.0 {
-  __PFX__mulbitint3
-  __PFX__divmodbitint4
-  __PFX__fixsfbitint
-  __PFX__fixdfbitint
-  __PFX__floatbitintsf
-  __PFX__floatbitintdf
   __PFX__hardcfr_check
   __PFX__strub_enter
   __PFX__strub_update
--- libgcc/config/i386/libgcc-glibc.ver.jj  2025-04-08 14:09:53.518415033 +0200
+++ libgcc/config/i386/libgcc-glibc.ver 2025-05-20 08:25:59.310613264 +0200
@@ -229,10 +229,16 @@ GCC_13.0.0 {
 
 %inherit GCC_14.0.0 GCC_13.0.0
 GCC_14.0.0 {
+  __mulbitint3
+  __divmodbitint4
+  __fixsfbitint
+  __fixdfbitint
   __fixxfbitint
   __fixtfbitint
   __floatbitintbf
   __floatbitinthf
+  __floatbitintsf
+  __floatbitintdf
   __floatbitintxf
   __floatbitinttf
 }
--- libgcc/config/i386/libgcc-darwin.ver.jj 2024-02-09 11:59:11.907051978 +0100
+++ libgcc/config/i386/libgcc-darwin.ver    2025-05-20 08:27:08.877659307 +0200
@@ -37,10 +37,16 @@ GCC_14.0.0 {
   __truncxfbf2
   __trunchfbf2
   # Added to GCC_14.0.0 in i386/libgcc-glibc.ver.
+  __mulbitint3
+  __divmodbitint4
+  __fixsfbitint
+  __fixdfbitint
   __fixxfbitint
   __fixtfbitint
   __floatbitintbf
   __floatbitinthf
+  __floatbitintsf
+  __floatbitintdf
   __floatbitintxf
   __floatbitinttf
 }
--- libgcc/config/i386/libgcc-sol2.ver.jj   2025-04-08 14:09:53.518415033 +0200
+++ libgcc/config/i386/libgcc-sol2.ver  2025-05-20 08:26:44.751990139 +0200
@@ -144,10 +144,16 @@ GCC_14.0.0 {
   __truncxfbf2
   __trunchfbf2
   # Added to GCC_14.0.0 in i386/libgcc-glibc.ver.
+  __mulbitint3
+  __divmodbitint4
+  __fixsfbitint
+  __fixdfbitint
   __fixxfbitint
   __fixtfbitint
   __floatbitintbf
   __floatbitinthf
+  __floatbitintsf
+  __floatbitintdf
   __floatbitintxf
   __floatbitinttf
 }
--- libgcc/config/aarch64/libgcc-softfp.ver.jj  2025-04-08 14:09:53.174419821 +0200
+++ libgcc/config/aarch64/libgcc-softfp.ver 2025-05-20 08:28:03.638908388 +0200
@@ -42,8 +42,14 @@ GCC_13.0.0 {
 
 %inherit GCC_14.0.0 GCC_13.0.0
 GCC_14.0.0 {
+  __mulbitint3
+  __divmodbitint4
+  __fixsfbitint
+  __fixdfbitint
   __fixtfbitint
   __floatbitintbf
   __floatbitinthf
+  __floatbitintsf
+  __floatbitintdf
   __floatbitinttf
 }

Jakub

[PATCH v2 0/1] RISC-V: The following changes enable P8700 MIPS processor for RISC-V.

2025-05-19 Thread Umesh Kalappa

>> Every type listed in that attribute must have a mapping to a function unit in
>> your scheduler model
Thank you, Jeff.  I have added a dummy reservation for the left-out attribute
types and tested with the DejaGnu riscv.exp suite.

Thank you again for the reference
~U

[PATCH v2 2/2] MIPS p8700 doesn't have vector extension and added the dummies reservation for the same.

2025-05-19 Thread Umesh Kalappa

---
 gcc/config/riscv/mips-p8700.md | 28 
 1 file changed, 28 insertions(+)

diff --git a/gcc/config/riscv/mips-p8700.md b/gcc/config/riscv/mips-p8700.md
index 11d0b1ca793..ae0ea8dc896 100644
--- a/gcc/config/riscv/mips-p8700.md
+++ b/gcc/config/riscv/mips-p8700.md
@@ -35,6 +35,11 @@
 ;; Long FPU pipeline.
 (define_cpu_unit "mips_p8700_fpu_apu" "mips_p8700_fpu_pipe")
 
+;; P8700 unsupported insns are mapped to dummies reservations
+(define_reservation "mips_p8700_dummies"
+ "mips_p8700_agq |  mips_p8700_al2 |  mips_p8700_ctistd |  mips_p8700_lsu |
+ mips_p8700_fpu_short |  mips_p8700_fpu_long")
+
 (define_reservation "mips_p8700_agq_al2" "mips_p8700_agq, mips_p8700_al2")
 (define_reservation "mips_p8700_agq_ctistd" "mips_p8700_agq, mips_p8700_ctistd")
 (define_reservation "mips_p8700_agq_lsu" "mips_p8700_agq, mips_p8700_lsu")
@@ -137,3 +142,26 @@
   (and (eq_attr "tune" "mips_p8700")
(eq_attr "type" "call,jalr"))
   "mips_p8700_agq_ctistd")
+
+;; mips-p8700 dummies insn and placeholder that had no mapping to p8700 hardware.
+(define_insn_reservation "mips_p8700_unknown" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "rdvlenb,rdvl,wrvxrm,wrfrm,
+   rdfrm,vsetvl,vsetvl_pre,vlde,vste,vldm,vstm,vlds,vsts,
+   vldux,vldox,vstux,vstox,vldff,vldr,vstr,
+   vlsegde,vssegte,vlsegds,vssegts,vlsegdux,vlsegdox,vssegtux,vssegtox,vlsegdff,
+   vialu,viwalu,vext,vicalu,vshift,vnshift,vicmp,viminmax,
+   vimul,vidiv,viwmul,vimuladd,sf_vqmacc,viwmuladd,vimerge,vimov,
+   vsalu,vaalu,vsmul,vsshift,vnclip,sf_vfnrclip,
+   vfalu,vfwalu,vfmul,vfdiv,vfwmul,vfmuladd,vfwmuladd,vfsqrt,vfrecp,
+   vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,
+   vfcvtitof,vfcvtftoi,vfwcvtitof,vfwcvtftoi,
+   vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,
+   vired,viwred,vfredu,vfredo,vfwredu,vfwredo,
+   vmalu,vmpop,vmffs,vmsfs,vmiota,vmidx,vimovvx,vimovxv,vfmovvf,vfmovfv,
+   vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,
+   vgather,vcompress,vmov,vector,vandn,vbrev,vbrev8,vrev8,vclz,vctz,vcpop,vrol,vror,vwsll,
+   vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,vaeskf1,vaeskf2,vaesz,
+   vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c,vfncvtbf16,vfwcvtbf16,vfwmaccbf16,
+   sf_vc,sf_vc_se"))
+  "mips_p8700_dummies")
-- 
2.43.0

[PATCH v2] RISC-V: Fix the warning of temporary object dangling references.

2025-05-19 Thread Dongyan Chen

While building GCC, some warnings about dangling references to temporary
objects appeared.  They were reported for these lines in riscv-common.cc:
const riscv_ext_info_t &implied_ext_info, const riscv_ext_info_t &ext_info =
get_riscv_ext_info (ext) and auto &ext_info = get_riscv_ext_info (search_ext).
The warnings come from how the argument types are used at these call sites:
get_riscv_ext_info took a `const std::string &` and returns a reference, which
the compiler treats as possibly dangling.
To fix this, the patch changes the argument type of get_riscv_ext_info to
`const char *`, thereby eliminating the warnings.

Changes for v2:
- Change the argument type of get_riscv_ext_info to `const char *` to eliminate 
the warnings.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (get_riscv_ext_info): Fix 
argument type.
(riscv_subset_list::check_implied_ext): Type conversion.

---
 gcc/common/config/riscv/riscv-common.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 53ca03910b38..c843393998cb 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -215,7 +215,7 @@ static const std::unordered_map riscv_ext_infos
 };

 static const riscv_ext_info_t &
-get_riscv_ext_info (const std::string &ext)
+get_riscv_ext_info (const char * ext)
 {
   auto itr = riscv_ext_infos.find (ext);
   if (itr == riscv_ext_infos.end ())
@@ -1112,7 +1112,7 @@ riscv_subset_list::check_implied_ext ()
   for (itr = m_head; itr != NULL; itr = itr->next)
 {
   auto &ext = *itr;
-  auto &ext_info = get_riscv_ext_info (ext.name);
+  auto &ext_info = get_riscv_ext_info (ext.name.c_str ());
   for (auto &implied_ext : ext_info.implied_exts ())
{
  if (!implied_ext.match (this))
--
2.43.0
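
The shape of the fix can be reproduced in a small standalone program. The sketch below is an assumption-laden illustration, not the riscv-common.cc code: `ext_info_sketch`, `ext_infos` and `get_ext_info` are hypothetical stand-ins for `riscv_ext_info_t`, `riscv_ext_infos` and `get_riscv_ext_info`. As far as I understand GCC's `-Wdangling-reference` heuristic, it looks at functions that take a const reference and return a reference; a `const char *` parameter takes the lookup function out of that pattern.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for riscv_ext_info_t.
struct ext_info_sketch
{
  int implied_count;
};

// Hypothetical table standing in for riscv_ext_infos.
static const std::unordered_map<std::string, ext_info_sketch> ext_infos = {
  {"zba", {0}},
  {"zvl64b", {1}},
};

// Taking `const char *` means no call site has to pass a std::string by
// const reference; the returned reference always points into the static
// table above, so it cannot outlive its referent.
static const ext_info_sketch &get_ext_info (const char *ext)
{
  auto itr = ext_infos.find (ext);   // the std::string key is built here
  assert (itr != ext_infos.end ());
  return itr->second;
}
```

A caller that holds a `std::string` passes `name.c_str ()`, which matches the one-line change made in `check_implied_ext` in the patch above.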

[PATCH v2 1/2] The following changes enable P8700 processor for RISCV and P8700 is a high-performance processor from MIPS by extending RISCV with custom instructions.

2025-05-19 Thread Umesh Kalappa

---
 gcc/config/riscv/mips-p8700.md   | 139 +++
 gcc/config/riscv/riscv-cores.def |   5 ++
 gcc/config/riscv/riscv-opts.h|   3 +-
 gcc/config/riscv/riscv.cc|  22 +
 gcc/config/riscv/riscv.md|   3 +-
 5 files changed, 170 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/mips-p8700.md

diff --git a/gcc/config/riscv/mips-p8700.md b/gcc/config/riscv/mips-p8700.md
new file mode 100644
index 000..11d0b1ca793
--- /dev/null
+++ b/gcc/config/riscv/mips-p8700.md
@@ -0,0 +1,139 @@
+;; DFA-based pipeline description for MIPS P8700.
+;;
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "mips_p8700_agen_alq_pipe, mips_p8700_mdu_pipe, mips_p8700_fpu_pipe")
+
+;; The address generation queue (AGQ) has AL2, CTISTD and LDSTA pipes
+(define_cpu_unit "mips_p8700_agq, mips_p8700_al2, mips_p8700_ctistd, mips_p8700_lsu"
+"mips_p8700_agen_alq_pipe")
+
+(define_cpu_unit "mips_p8700_gpmul, mips_p8700_gpdiv" "mips_p8700_mdu_pipe")
+
+;; The arithmetic-logic-unit queue (ALQ) has ALU pipe
+(define_cpu_unit "mips_p8700_alq, mips_p8700_alu" "mips_p8700_agen_alq_pipe")
+
+;; The floating-point-unit queue (FPQ) has short and long pipes
+(define_cpu_unit "mips_p8700_fpu_short, mips_p8700_fpu_long" "mips_p8700_fpu_pipe")
+
+;; Long FPU pipeline.
+(define_cpu_unit "mips_p8700_fpu_apu" "mips_p8700_fpu_pipe")
+
+(define_reservation "mips_p8700_agq_al2" "mips_p8700_agq, mips_p8700_al2")
+(define_reservation "mips_p8700_agq_ctistd" "mips_p8700_agq, mips_p8700_ctistd")
+(define_reservation "mips_p8700_agq_lsu" "mips_p8700_agq, mips_p8700_lsu")
+(define_reservation "mips_p8700_alq_alu" "mips_p8700_alq, mips_p8700_alu")
+
+;;
+;; FPU pipe
+;;
+
+(define_insn_reservation "mips_p8700_fpu_fadd" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fabs" 2
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcmp,fmove"))
+  "mips_p8700_fpu_short, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fload" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpload"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fstore" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpstore"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fmadd" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmul" 5
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmul"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_div" 17
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fdiv,fsqrt"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu*17")
+
+(define_insn_reservation "mips_p8700_fpu_fcvt" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcvt,fcvt_i2f,fcvt_f2i"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmtc" 7
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "mtc"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fmfc" 7
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "mfc"))
+  "mips_p8700_agq_lsu")
+
+;;
+;; Integer pipe
+;;
+
+(define_insn_reservation "mips_p8700_int_load" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "load"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_int_store" 3
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "store"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_int_arith_1" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,logical,move,bitmanip,min,max,minu,maxu,clz,ctz,rotate,atomic,condmove,crypto,mvpair,zicond"))
+  "mips_p8700_alq_alu | mips_p8700_agq_al2")
+
+(define_insn_reservation "mips_p8700_int_nop" 0
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "nop"))
+  "mips_p8700_alq_alu | mips_p8700_agq_al2")
+
+(define_insn_reservation "mips_p8700_dsp_mult" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "imul,cpop,clmul

Re: [PATCH] RISC-V: Fix the warning of temporary object dangling references.

2025-05-19 Thread Dongyan Chen

I fixed the code by changing the argument type of get_riscv_ext_info to
`const char *`; the updated patch is at:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684057.html


On 2025/5/16 10:35, Kito Cheng wrote:

Hm, it really doesn't make too much sense to get that warning, but
I can reproduce that when I compile with gcc 13 (and newer)...and
seems like a known issue [1][2]...

However I don't really like that approach, could you change the
argument type of get_riscv_ext_info to `const char *` to suppress that
warning instead?

```diff
diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 53ca03910b3..a3105c851d6 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -214,8 +214,8 @@ static const std::unordered_map riscv_ext_infos
#undef DEFINE_RISCV_EXT
};

-static const riscv_ext_info_t &
-get_riscv_ext_info (const std::string &ext)
+static inline const riscv_ext_info_t &
+get_riscv_ext_info (const char *ext)
{
   auto itr = riscv_ext_infos.find (ext);
   if (itr == riscv_ext_infos.end ())
```

[1] 
https://stackoverflow.com/questions/78759847/gcc-14-possibly-dangling-reference-to-a-temporary-warning-or-not-depending-on
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107532

[PATCH] cobol: fix minor grammar in comments

2025-05-19 Thread pulk66-s

---
 gcc/cobol/lexio.cc | 2 +-
 gcc/cobol/parse.y  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cobol/lexio.cc b/gcc/cobol/lexio.cc
index 2db1af273e9..4f68bf65887 100644
--- a/gcc/cobol/lexio.cc
+++ b/gcc/cobol/lexio.cc
@@ -1455,7 +1455,7 @@ cdftext::lex_open( const char filename[] ) {
 
   int output = open_output();
 
-  // Process any files supplied by the -include comamnd-line option.
+  // Process any files supplied by the -include command-line option.
   for( auto name : included_files ) {
 int input;
 if( -1 == (input = open(name, O_RDONLY)) ) {
diff --git a/gcc/cobol/parse.y b/gcc/cobol/parse.y
index cb96c907361..1df9a9a63e9 100644
--- a/gcc/cobol/parse.y
+++ b/gcc/cobol/parse.y
@@ -5038,7 +5038,7 @@ accept: accept_body end_accept {
  switch( $accept_body.func ) {
  case accept_done_e:
error_msg(@ec, "ON EXCEPTION valid only "
-   "with ENVIRONMENT or COMAMND-LINE(n)");
+   "with ENVIRONMENT or COMMAND-LINE(n)");
break;
  case accept_command_line_e:
if( $1.from->field == NULL ) { // take next command-line arg
@@ -5050,7 +5050,7 @@ accept: accept_body end_accept {
  parser_move(*$1.into, *$1.from);
  if( $ec.on_error || $ec.not_error ) {
error_msg(@ec, "ON EXCEPTION valid only "
-   "with ENVIRONMENT or COMAMND-LINE(n)");
+   "with ENVIRONMENT or COMMAND-LINE(n)");
  }
} else {
  parser_accept_command_line(*$1.into, *$1.from,
-- 
2.45.2

Re: [PATCH 3/5] ipa: Dump cgraph_node UID instead of order into ipa-clones dump file

2025-05-19 Thread Martin Jambor

Hello,

On Thu, May 15 2025, Jan Hubicka wrote:
>> Hi,
>> 
>> starting with GCC 15 the order is not unique for any symtab_nodes but
>> m_uid is, I believe we ought to dump the latter in the ipa-clones dump,
>> if only so that people can reliably match entries about new clones to
>> those about removed nodes (if any).
>> 
>> gcc/ChangeLog:
>> 
>> 2025-04-23  Martin Jambor  
>> 
>>  * cgraph.h (symtab_node): Make member function get_uid const.
>>  * cgraphclones.cc (dump_callgraph_transformation): Dump m_uid of the
>>  call graph nodes instead of order.
>>  * cgraph.cc (cgraph_node::remove): Likewise.
> OK,

Thank you.  I have incorporated the other cases spotted by Michal and
committed the following patch (which I plan to backport to gcc-15 next
week).

Thanks again to both of you,

Martin



Starting with GCC 15, the order is not unique for any symtab_nodes but
m_uid is, so I believe we ought to dump the latter in the ipa-clones
dump, if only so that people can reliably match entries about new
clones to those about removed nodes (if any).

This patch also contains fixes for a few other places, identified by
Michal Jires, where we have so far dumped order to our ordinary dumps.
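
To illustrate why get_uid has to become const: the dump function takes
pointers to const nodes, so only const member functions can be called
through them.  Below is a minimal sketch; `node` and
`format_clone_record` are hypothetical stand-ins, not the real
symtab_node or dump_callgraph_transformation.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for symtab_node; only the parts needed here.
struct node
{
  int order;   // per the message, no longer unique since GCC 15
  int m_uid;   // unique per node

  // Without the trailing const, this could not be called through a
  // pointer-to-const such as the parameters of the function below.
  int get_uid () const { return m_uid; }
};

// Format one clone record keyed by UID rather than by order.
std::string
format_clone_record (const node *original, const node *clone)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, "Callgraph clone;%d;%d",
                 original->get_uid (), clone->get_uid ());
  return buf;
}
```

Two nodes that share an order value still produce distinct identifiers
in the record, which is the property the dump consumers rely on.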

gcc/ChangeLog:

2025-05-16  Martin Jambor  

* cgraph.h (symtab_node): Make member function get_uid const.
* cgraphclones.cc (dump_callgraph_transformation): Dump m_uid of the
call graph nodes instead of order.
* cgraph.cc (cgraph_node::remove): Likewise.
* ipa-cp.cc (ipcp_lattice::print): Likewise.
* ipa-sra.cc (ipa_sra_summarize_function): Likewise.
* symtab.cc (symtab_node::dump_base): Likewise.

Co-Authored-By: Michal Jires 
---
 gcc/cgraph.cc   | 2 +-
 gcc/cgraph.h| 2 +-
 gcc/cgraphclones.cc | 4 ++--
 gcc/ipa-cp.cc   | 2 +-
 gcc/ipa-sra.cc  | 2 +-
 gcc/symtab.cc   | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 1a2ec38374a..ac0f2519361 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1879,7 +1879,7 @@ cgraph_node::remove (void)
   clone_info *info, saved_info;
   if (symtab->ipa_clones_dump_file && symtab->cloned_nodes.contains (this))
 fprintf (symtab->ipa_clones_dump_file,
-"Callgraph removal;%s;%d;%s;%d;%d\n", asm_name (), order,
+"Callgraph removal;%s;%d;%s;%d;%d\n", asm_name (), get_uid (),
 DECL_SOURCE_FILE (decl), DECL_SOURCE_LINE (decl),
 DECL_SOURCE_COLUMN (decl));
 
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index f4ee29e998c..8dbe36eac09 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -493,7 +493,7 @@ public:
   static inline void checking_verify_symtab_nodes (void);
 
   /* Get unique identifier of the node.  */
-  inline int get_uid ()
+  inline int get_uid () const
   {
 return m_uid;
   }
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index b45ac497733..c160e8b6985 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -324,11 +324,11 @@ dump_callgraph_transformation (const cgraph_node 
*original,
 {
   fprintf (symtab->ipa_clones_dump_file,
   "Callgraph clone;%s;%d;%s;%d;%d;%s;%d;%s;%d;%d;%s\n",
-  original->asm_name (), original->order,
+  original->asm_name (), original->get_uid (),
   DECL_SOURCE_FILE (original->decl),
   DECL_SOURCE_LINE (original->decl),
   DECL_SOURCE_COLUMN (original->decl), clone->asm_name (),
-  clone->order, DECL_SOURCE_FILE (clone->decl),
+  clone->get_uid (), DECL_SOURCE_FILE (clone->decl),
   DECL_SOURCE_LINE (clone->decl), DECL_SOURCE_COLUMN (clone->decl),
   suffix);
 
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index b41148c74de..f06ac46dfff 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -288,7 +288,7 @@ ipcp_lattice::print (FILE * f, bool dump_sources, 
bool dump_benefits)
  else
fprintf (f, " [scc: %i, from:", val->scc_no);
  for (s = val->sources; s; s = s->next)
-   fprintf (f, " %i(%f)", s->cs->caller->order,
+   fprintf (f, " %i(%f)", s->cs->caller->get_uid (),
 s->cs->sreal_frequency ().to_double ());
  fprintf (f, "]");
}
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 1331ba49b50..88bfae9502c 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4644,7 +4644,7 @@ ipa_sra_summarize_function (cgraph_node *node)
 {
   if (dump_file)
 fprintf (dump_file, "Creating summary for %s/%i:\n", node->name (),
-node->order);
+node->get_uid ());
   gcc_obstack_init (&gensum_obstack);
   loaded_decls = new hash_set;
 
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index fe9c031247f..fc1155f4696 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -989,10 +989,10 @@ symtab_node::dump_base (FILE *f)
 same_comdat_group->dump_asm_name ());
   if (next_sharing_asm_name)
 fprintf (f, "  next sharing as

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 6:47 AM Patrick Palka  wrote:
I would appreciate a short explanation of the approach taken here in the
commit message, such as passing -1 to mean that the size is not known.

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
Among the non-stylistic changes, I have noticed that we need some explicit
conversions between different types.
I think we should add a test that checks with the size being __max_diff_t;
maybe we should add such a range to
testutils_iterators.
The rest of the comments are mostly stylistic.
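
For readers following along, here is a simplified sketch of the "-1 means
size not known" convention.  It is not the libstdc++ implementation:
`starts_with_impl` is a hypothetical name, projections and predicates are
left out, and both sizes are plain std::ptrdiff_t.

```cpp
#include <cstddef>

// Simplified sketch.  n1/n2 are the range sizes when known, or -1 to
// mean "size not known".
template<typename It1, typename It2>
bool
starts_with_impl (It1 first1, It1 last1, std::ptrdiff_t n1,
                  It2 first2, It2 last2, std::ptrdiff_t n2)
{
  // With both sizes known, a too-short first range is rejected at once.
  if (n1 != -1 && n2 != -1 && n1 < n2)
    return false;

  // Otherwise compare element by element until the prefix is exhausted.
  for (; first2 != last2; ++first1, ++first2)
    if (first1 == last1 || !(*first1 == *first2))
      return false;
  return true;
}
```

The parameters here are grouped as first1, last1, n1, first2, last2, n2.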


> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
> Define.
> (__ends_with_fn, ends_with): Define.
> * include/bits/version.def (ranges_starts_ends_with): Define.
> * include/bits/version.h: Regenerate.
> * include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
> * src/c++23/std.cc.in (ranges::starts_with): Export.
> (ranges::ends_with): Export.
> * testsuite/25_algorithms/ends_with/1.cc: New test.
> * testsuite/25_algorithms/starts_with/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 232 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/algorithm|   1 +
>  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>  7 files changed, 512 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> b/libstdc++-v3/include/bits/ranges_algo.h
> index f36e7dd59911..c59a555f528a 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -438,6 +438,238 @@ namespace ranges
>
>inline constexpr __search_n_fn search_n{};
>
> +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> +  struct __starts_with_fn
> +  {
> +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +typename _Pred = ranges::equal_to,
> +typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1,
> _Proj2>
> +  constexpr bool
> +  operator()(_Iter1 __first1, _Sent1 __last1,
> +_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> +   iter_difference_t<_Iter1> __n1 = -1;
> +   iter_difference_t<_Iter2> __n2 = -1;
> +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> + __n1 = __last1 - __first1;
> +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> + __n2 = __last2 - __first2;
> +   return _S_impl(std::move(__first1), __last1,
> +  std::move(__first2), __last2,
> +  std::move(__pred),
> +  std::move(__proj1), std::move(__proj2),
> +  __n1, __n2);
> +  }
> +
> +template<input_range _Range1, input_range _Range2,
> +typename _Pred = ranges::equal_to,
> +typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,
> +_Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> +   range_difference_t<_Range1> __n1 = -1;
> +   range_difference_t<_Range1> __n2 = -1;
> +   if constexpr (sized_range<_Range1>)
> + __n1 = ranges::size(__r1);
> +   if constexpr (sized_range<_Range2>)
> + __n2 = ranges::size(__r2);
> +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
> +  ranges::begin(__r2), ranges::end(__r2),
> +  std::move(__pred),
> +  std::move(__proj1), std::move(__proj2),
> +  __n1, __n2);
> +  }
> +
> +template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
> +typename _Pred,
> +typename _Proj1, typename _Proj2>
> +  static constexpr bool
> +  _S_impl(_Iter1 __first1, _Sent1 __last1,
>
I think I would make this function private.  Users should not look into the
wrappers anyway, but it does not need to be public.
I have a slight preference for ordering the arguments in the following manner:
first1, last1, n1,
first2, last2, n2,

> + _Iter2 __first2, _Sent2 __last2,
> + _Pred __pred,
> + _Proj1 __proj1, _Proj2 __proj2,
> + iter_difference_t<_Iter1> __n1,
> + iter_difference_t<_Iter2> __n2)
> +  {
>
> +   if (__n1 != -1 && __n2 != -1)
>
Very subjective, but I would other i

Re: [PATCH] libstdc++: Fix some Clang -Wsystem-headers warnings in

2025-05-19 Thread Tomasz Kaminski

On Fri, May 16, 2025 at 7:35 PM Jonathan Wakely  wrote:

> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (_ZipTransform::operator()): Remove name of
> unused parameter.
> (chunk_view::_Iterator, stride_view::_Iterator): Likewise.
> (join_with_view): Declare _Iterator and _Sentinel as class
> instead of struct.
> (repeat_view): Declare _Iterator as class instead of struct.
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
>  libstdc++-v3/include/std/ranges | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges
> b/libstdc++-v3/include/std/ranges
> index 9300c364a165..210ac8274fc1 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -5336,7 +5336,7 @@ namespace views::__adaptor
> requires move_constructible> &&
> regular_invocable&>
>   && is_object_v&>>>
> constexpr auto
> -   operator() [[nodiscard]] (_Fp&& __f) const
> +   operator() [[nodiscard]] (_Fp&&) const
> {
>   return views::empty&>>>;
> }
> @@ -6598,7 +6598,7 @@ namespace views::__adaptor
>  }
>
>  friend constexpr difference_type
> -operator-(default_sentinel_t __y, const _Iterator& __x)
> +operator-(default_sentinel_t, const _Iterator& __x)
>requires sized_sentinel_for, iterator_t<_Base>>
>  { return __detail::__div_ceil(__x._M_end - __x._M_current, __x._M_n);
> }
>
> @@ -7287,8 +7287,8 @@ namespace views::__adaptor
> using iterator_category = decltype(_S_iter_cat());
>  };
>
> -template struct _Iterator;
> -template struct _Sentinel;
> +template class _Iterator;
> +template class _Sentinel;
>
>public:
>  join_with_view() requires (default_initializable<_Vp>
> @@ -7743,7 +7743,7 @@ namespace views::__adaptor
>  __detail::__box<_Tp> _M_value;
>  [[no_unique_address]] _Bound _M_bound = _Bound();
>
> -struct _Iterator;
> +class _Iterator;
>
>  template
>  friend constexpr auto
> @@ -8303,7 +8303,7 @@ namespace views::__adaptor
>  }
>
>  friend constexpr difference_type
> -operator-(default_sentinel_t __y, const _Iterator& __x)
> +operator-(default_sentinel_t, const _Iterator& __x)
>requires sized_sentinel_for, iterator_t<_Base>>
>  { return __detail::__div_ceil(__x._M_end - __x._M_current,
> __x._M_stride); }
>
> --
> 2.49.0
>
>

Re: [PATCH] libstdc++: Fix std::format of chrono::local_days with {} [PR120293]

2025-05-19 Thread Tomasz Kaminski

On Fri, May 16, 2025 at 7:33 PM Jonathan Wakely  wrote:

> Formatting of chrono::local_days with an empty chrono-specs should be
> equivalent to inserting it into an ostream, which should use the
> overload for inserting chrono::sys_days into an ostream. The
> implementation of empty chrono-specs in _M_format_to_ostream takes some
> short cuts, and that wasn't being done correctly for chrono::local_days.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/120293
> * include/bits/chrono_io.h (_M_format_to_ostream): Add special
> case for local_time convertible to local_days.
> * testsuite/std/time/clock/local/io.cc: Check formatting of
> chrono::local_days.
> ---
>
> Tested x86_64-linux.
>
LGTM, thanks.

>
>  libstdc++-v3/include/bits/chrono_io.h | 3 +++
>  libstdc++-v3/testsuite/std/time/clock/local/io.cc | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> index ace8b9f26292..92a3098e808c 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -766,6 +766,9 @@ namespace __format
>   // sys_time with period greater or equal to days:
>   if constexpr (is_convertible_v<_Tp, chrono::sys_days>)
> __os << _S_date(__t);
> + // Or a local_time with period greater or equal to days:
> + else if constexpr (is_convertible_v<_Tp,
> chrono::local_days>)
> +   __os << _S_date(__t);
>   else // Or it's formatted as "{:L%F %T}":
> {
>   auto __days = chrono::floor(__t);
> diff --git a/libstdc++-v3/testsuite/std/time/clock/local/io.cc
> b/libstdc++-v3/testsuite/std/time/clock/local/io.cc
> index b4d562f36d12..67818e876497 100644
> --- a/libstdc++-v3/testsuite/std/time/clock/local/io.cc
> +++ b/libstdc++-v3/testsuite/std/time/clock/local/io.cc
> @@ -89,6 +89,9 @@ test_format()
>
>s = std::format("{}", local_seconds{});
>VERIFY( s == "1970-01-01 00:00:00" );
> +
> +  s = std::format("{}", local_days{}); // PR libstdc++/120293
> +  VERIFY( s == "1970-01-01" );
>  }
>
>  void
> --
> 2.49.0
>
>

Re: [PATCH v2] driver: Fix multilib_os_dir and multiarch_dir for those target use TARGET_COMPUTE_MULTILIB

2025-05-19 Thread Jin Ma

On Sun, 16 Mar 2025 11:23:07 -0600, Jeff Law wrote:
> 
> 
> On 3/10/25 2:26 AM, Kito Cheng wrote:
> > This patch fixes the multilib_os_dir and multiarch_dir for those targets
> > that use TARGET_COMPUTE_MULTILIB.  The TARGET_COMPUTE_MULTILIB hook only
> > updates/fixes the multilib_dir, not the multilib_os_dir or multiarch_dir,
> > so the latter two are not set correctly for those targets.
> Thankfully only RISC-V defines TARGET_COMPUTE_MULTILIB.  Though that may 
> be an argument we should look to avoid whatever magic we're doing in there.
> 
> 
> > 
> > Use RISC-V linux target (riscv64-unknown-linux-gnu) as an example:
> > 
> > ```
> > $ riscv64-unknown-linux-gnu-gcc -print-multi-lib
> > .;
> > lib32/ilp32;@march=rv32imac@mabi=ilp32
> > lib32/ilp32d;@march=rv32imafdc@mabi=ilp32d
> > lib64/lp64;@march=rv64imac@mabi=lp64
> > lib64/lp64d;@march=rv64imafdc@mabi=lp64d
> > ```
> > 
> > If we use exactly the same -march and -mabi options to compile a source
> > file, the multilib_os_dir and multiarch_dir are set correctly:
> > 
> > ```
> > $ riscv64-unknown-linux-gnu-gcc -print-multi-os-directory -march=rv64imafdc -mabi=lp64d
> > ../lib64/lp64d
> > $ riscv64-unknown-linux-gnu-gcc -print-multi-directory -march=rv64imafdc -mabi=lp64d
> > lib64/lp64d
> > ```
> > 
> > However, if we use the -march=rv64imafdcv -mabi=lp64d options to compile a
> > source file, the multilib_os_dir and multiarch_dir are not set correctly:
> > ```
> > $ riscv64-unknown-linux-gnu-gcc -print-multi-os-directory -march=rv64imafdcv -mabi=lp64d
> > lib64/lp64d
> > $ riscv64-unknown-linux-gnu-gcc -print-multi-directory -march=rv64imafdcv -mabi=lp64d
> > lib64/lp64d
> > ```
> > 
> > That's because the TARGET_COMPUTE_MULTILIB hook only updates/fixes the
> > multilib_dir but not the multilib_os_dir, so the multilib_os_dir is blank
> > and falls back to the same value as multilib_dir, which is not correct.
> > 
> > So we introduce a second chance to fix the multilib_os_dir if it's not set.
> > We also try to fix the multiarch_dir, because it may not be set correctly
> > either if multilib_os_dir is not set.
> > 
> > Changes since v1:
> > - Fix non-multilib build.
> > - Fix indentation.
> > 
> > gcc/ChangeLog:
> > 
> > * gcc.c (find_multilib_os_dir_by_multilib_dir): New.
> > (set_multilib_dir): Fix multilib_os_dir and multiarch_dir
> > if multilib_os_dir is not set.
> Given the fact this code is shared and I don't have a good handle on its 
> behavior and how the change potentially affects other targets, I'm 
> inclined to ask for this to wait for gcc-16 development to open and 
> backport into gcc-15.2 after soak time on the trunk.
> 
> Jeff

Hi, I think this patch is essential. Can we proceed to push it to the trunk now?

Best regards,
Jin Ma

Re: [PATCH v2] RISC-V: Fix the warning of temporary object dangling references.

2025-05-19 Thread Kito Cheng

Pushed to trunk :)

On Mon, May 19, 2025 at 3:18 PM Dongyan Chen
 wrote:
>
> While building GCC, some warnings about dangling references to temporary
> objects emerged. They appeared on these lines in riscv-common.cc:
> const riscv_ext_info_t &implied_ext_info, const riscv_ext_info_t &ext_info =
> get_riscv_ext_info (ext) and auto &ext_info = get_riscv_ext_info (search_ext).
> The issue arose because the local variable types were not used in a
> standardized way, causing their references to dangle once the function ended.
> To fix this, the patch changes the argument type of get_riscv_ext_info to
> `const char *`, thereby eliminating the warnings.
>
> Changes for v2:
> - Change the argument type of get_riscv_ext_info to `const char *` to 
> eliminate the warnings.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (get_riscv_ext_info): Fix 
> argument type.
> (riscv_subset_list::check_implied_ext): Type conversion.
>
> ---
>  gcc/common/config/riscv/riscv-common.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 53ca03910b38..c843393998cb 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -215,7 +215,7 @@ static const std::unordered_map<std::string, riscv_ext_info_t> riscv_ext_infos
>  };
>
>  static const riscv_ext_info_t &
> -get_riscv_ext_info (const std::string &ext)
> +get_riscv_ext_info (const char * ext)
>  {
>auto itr = riscv_ext_infos.find (ext);
>if (itr == riscv_ext_infos.end ())
> @@ -1112,7 +1112,7 @@ riscv_subset_list::check_implied_ext ()
>for (itr = m_head; itr != NULL; itr = itr->next)
>  {
>auto &ext = *itr;
> -  auto &ext_info = get_riscv_ext_info (ext.name);
> +  auto &ext_info = get_riscv_ext_info (ext.name.c_str ());
>for (auto &implied_ext : ext_info.implied_exts ())
> {
>   if (!implied_ext.match (this))
> --
> 2.43.0
>
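
A minimal sketch of the pattern behind the warning, under the assumption
(my reading of -Wdangling-reference, not something stated in the thread)
that the heuristic fires when a temporary is bound to a const reference
parameter of a function that itself returns a reference.  `ext_info`,
`ext_infos`, `lookup_by_string` and `lookup_by_cstr` are hypothetical
stand-ins for the real riscv-common.cc names.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for riscv_ext_info_t and riscv_ext_infos.
struct ext_info { int implied_count; };

static const std::unordered_map<std::string, ext_info> ext_infos
  = { { "d", { 1 } }, { "v", { 3 } } };

// Before: a call with a C string binds the const std::string & parameter
// to a temporary, which the warning may flag even though the returned
// reference points into the map and not at that temporary.
const ext_info &
lookup_by_string (const std::string &ext)
{
  return ext_infos.at (ext);
}

// After: the parameter is not a reference, so no temporary is bound at
// the call site; the std::string key is built inside the function.
const ext_info &
lookup_by_cstr (const char *ext)
{
  return ext_infos.at (ext);
}
```

Both functions return a reference to the same map element, so the change
affects only what the caller passes, not what it gets back.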

[PATCH 2/2] RISC-V:Add testcases for signed .SAT_ADD IMM form 1 with IMM = -1.

2025-05-19 Thread Li Xu

From: xuli 

This patch adds test cases for form 1, as shown below:

T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}

Passed the rv64gcv regression test.
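
As a reading aid, the same form 1 logic can be written as a function
template instead of a macro.  This is a sketch, not part of the patch:
`sat_s_add_imm_fmt_1` as a template is a hypothetical name, and the
narrowing cast back to T assumes the usual two's-complement wrap-around.

```cpp
#include <cstdint>
#include <limits>

// Form 1 of signed saturating add with an immediate.  T is the signed
// type, UT its unsigned counterpart, IMM the immediate.
template<typename T, typename UT, T IMM>
T
sat_s_add_imm_fmt_1 (T x)
{
  const T min = std::numeric_limits<T>::min ();
  const T max = std::numeric_limits<T>::max ();

  // Add in the unsigned type so that wrap-around is well defined.
  T sum = (T) ((UT) x + (UT) IMM);

  // Operands of different sign can never overflow.
  if ((x ^ IMM) < 0)
    return sum;
  // Same sign in, same sign out: no overflow either.
  if ((sum ^ x) >= 0)
    return sum;
  // Overflow: saturate towards the sign of x.
  return x < 0 ? min : max;
}
```

With IMM = -1, only the minimum value saturates; every other input is
simply decremented.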

Signed-off-by: Li Xu 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_s_add_imm-2.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-3.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-4.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i64.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i8.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-2.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-3.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-4.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i64.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i8.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-2-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-3-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-1-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i8.c: ...here.
---
 ...at_s_add_imm-2.c => sat_s_add_imm-1-i16.c} | 27 ++-
 ...at_s_add_imm-3.c => sat_s_add_imm-1-i32.c} | 26 +-
 ...at_s_add_imm-4.c => sat_s_add_imm-1-i64.c} | 22 ++-
 ...sat_s_add_imm-1.c => sat_s_add_imm-1-i8.c} | 22 ++-
 ..._imm-run-2.c => sat_s_add_imm-run-1-i16.c} |  6 +
 ..._imm-run-3.c => sat_s_add_imm-run-1-i32.c} |  6 +
 ..._imm-run-4.c => sat_s_add_imm-run-1-i64.c} |  6 +
 ...d_imm-run-1.c => sat_s_add_imm-run-1-i8.c} |  6 +
 ...2-1.c => sat_s_add_imm_type_check-1-i16.c} |  0
 ...3-1.c => sat_s_add_imm_type_check-1-i32.c} |  0
 ...-1-1.c => sat_s_add_imm_type_check-1-i8.c} |  0
 11 files changed, 117 insertions(+), 4 deletions(-)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-2.c => 
sat_s_add_imm-1-i16.c} (53%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-3.c => 
sat_s_add_imm-1-i32.c} (53%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-4.c => 
sat_s_add_imm-1-i64.c} (55%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-1.c => 
sat_s_add_imm-1-i8.c} (57%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-2.c => 
sat_s_add_imm-run-1-i16.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-3.c => 
sat_s_add_imm-run-1-i32.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-4.c => 
sat_s_add_imm-run-1-i64.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-1.c => 
sat_s_add_imm-run-1-i8.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-2-1.c => 
sat_s_add_imm_type_check-1-i16.c} (100%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-3-1.c => 
sat_s_add_imm_type_check-1-i32.c} (100%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-1-1.c => 
sat_s_add_imm_type_check-1-i8.c} (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c 
b/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
similarity index 53%
rename from gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c
rename to gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
index 3878286d207..2e23af5d86b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
@@ -29,4 +29,29 @@
 */
 DEF_SAT_S_ADD_IMM_FMT_1(0, int16_t, uint16_t, -7, INT16_MIN, INT16_MAX)
 
-/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
+/*
+** sat_s_add_imm_int16_t_fmt_1_1:
+** addi\s+[atx][0-9]+,\s*a0,\s*-1
+** not\s+[atx][0-9]+,\s*a0
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+a0,\s*a0,\s*63
+** li\s+[atx][0-9]+,\s*32768
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** xor\

[PATCH 1/2] Match:Support signed vector SAT_ADD IMM form 1

2025-05-19 Thread Li Xu

From: xuli 

This patch adds support for vector SAT_ADD when one of the operands
is a signed immediate (IMM).

void __attribute__((noinline))   \
vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
{\
  unsigned i;\
  for (i = 0; i < limit; i++)\
{\
  T x = op_1[i]; \
  T sum = (UT)x + (UT)IMM;   \
  out[i] = (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}\
}

Take the form 1 instance below as an example:
DEF_VEC_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
void vec_sat_s_add_imm_int8_t_fmt_1_0 (int8_t * restrict out, int8_t * restrict 
op_1, unsigned int limit)
{
  vector([16,16]) signed char * vectp_out.28;
  vector([16,16]) signed char vect_iftmp.27;
  vector([16,16])  mask__28.26;
  vector([16,16])  mask__29.25;
  vector([16,16])  mask__19.19;
  vector([16,16])  mask__31.18;
  vector([16,16]) signed char vect__6.17;
  vector([16,16]) signed char vect__5.16;
  vector([16,16]) signed char vect_sum_15.15;
  vector([16,16]) unsigned char vect__4.14;
  vector([16,16]) unsigned char vect_x.13;
  vector([16,16]) signed char vect_x_14.12;
  vector([16,16]) signed char * vectp_op_1.10;
  vector([16,16])  _78;
  vector([16,16]) unsigned char _79;
  vector([16,16]) unsigned char _80;
  unsigned long _92;
  unsigned long ivtmp_93;
  unsigned long ivtmp_94;
  unsigned long _95;

   [local count: 118111598]:
  if (limit_12(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119322]:
  _92 = (unsigned long) limit_12(D);

   [local count: 955630226]:
  # vectp_op_1.10_62 = PHI 
  # vectp_out.28_89 = PHI 
  # ivtmp_93 = PHI 
  _95 = .SELECT_VL (ivtmp_93, POLY_INT_CST [16, 16]);
  vect_x_14.12_64 = .MASK_LEN_LOAD (vectp_op_1.10_62, 8B, { -1, ... }, _95, 0);
  vect_x.13_65 = VIEW_CONVERT_EXPR(vect_x_14.12_64);
  vect__4.14_67 = vect_x.13_65 + { 9, ... };
  vect_sum_15.15_68 = VIEW_CONVERT_EXPR(vect__4.14_67);
  vect__5.16_70 = vect_x_14.12_64 ^ { 9, ... };
  vect__6.17_71 = vect_x_14.12_64 ^ vect_sum_15.15_68;
  mask__31.18_73 = vect__5.16_70 >= { 0, ... };
  mask__19.19_75 = vect_x_14.12_64 < { 0, ... };
  mask__29.25_85 = vect__6.17_71 < { 0, ... };
  mask__28.26_86 = mask__31.18_73 & mask__29.25_85;
  _78 = ~mask__28.26_86;
  _79 = .VCOND_MASK (mask__19.19_75, { 128, ... }, { 127, ... });
  _80 = .COND_ADD (_78, vect_x.13_65, { 9, ... }, _79);
  vect_iftmp.27_87 = VIEW_CONVERT_EXPR(_80);
  .MASK_LEN_STORE (vectp_out.28_89, 8B, { -1, ... }, _95, 0, vect_iftmp.27_87);
  vectp_op_1.10_63 = vectp_op_1.10_62 + _95;
  vectp_out.28_90 = vectp_out.28_89 + _95;
  ivtmp_94 = ivtmp_93 - _95;
  if (ivtmp_94 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 118111600]:
  return;

}

After this patch:
__attribute__((noinline))
void vec_sat_s_add_imm_int8_t_fmt_1_0 (int8_t * restrict out, int8_t * restrict 
op_1, unsigned int limit)
{
  vector([16,16]) signed char * vectp_out.12;
  vector([16,16]) signed char vect_patt_10.11;
  vector([16,16]) signed char vect_x_14.10;
  vector([16,16]) signed char D.2852;
  vector([16,16]) signed char * vectp_op_1.8;
  vector([16,16]) signed char _73(D);
  unsigned long _80;
  unsigned long ivtmp_81;
  unsigned long ivtmp_82;
  unsigned long _83;

   [local count: 118111598]:
  if (limit_12(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119322]:
  _80 = (unsigned long) limit_12(D);

   [local count: 955630226]:
  # vectp_op_1.8_71 = PHI 
  # vectp_out.12_77 = PHI 
  # ivtmp_81 = PHI 
  _83 = .SELECT_VL (ivtmp_81, POLY_INT_CST [16, 16]);
  vect_x_14.10_74 = .MASK_LEN_LOAD (vectp_op_1.8_71, 8B, { -1, ... }, _73(D), 
_83, 0);
  vect_patt_10.11_75 = .SAT_ADD (vect_x_14.10_74, { 9, ... });
  .MASK_LEN_STORE (vectp_out.12_77, 8B, { -1, ... }, _83, 0, 
vect_patt_10.11_75);
  vectp_op_1.8_72 = vectp_op_1.8_71 + _83;
  vectp_out.12_78 = vectp_out.12_77 + _83;
  ivtmp_82 = ivtmp_81 - _83;
  if (ivtmp_82 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 118111600]:
  return;

}

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu 
gcc/ChangeLog:

* match.pd: Add signed vector SAT_ADD IMM form 1 matching.

---
 gcc/match.

[PATCH 2/2] RISC-V: Add testcases for signed vector SAT_ADD IMM form 1

2025-05-19 Thread Li Xu

From: xuli 

This patch adds test cases for form 1, as shown below:

void __attribute__((noinline))   \
vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
{\
  unsigned i;\
  for (i = 0; i < limit; i++)\
{\
  T x = op_1[i]; \
  T sum = (UT)x + (UT)IMM;   \
  out[i] = (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}\
}

Passed the rv64gcv regression test.
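
A sketch of how such a run test can be checked on any host, assuming
IMM = 9 on int8_t as in the patch 1/2 example.  The function names
`vec_sat_s_add_imm9_i8`, `sat_add_reference` and `check_all_int8_inputs`
are hypothetical and do not come from the test suite headers.

```cpp
#include <algorithm>
#include <cstdint>

// Loop form of signed saturating add with an immediate of 9 on int8_t,
// the shape the vectorizer is expected to recognize as .SAT_ADD.
void
vec_sat_s_add_imm9_i8 (int8_t *out, const int8_t *op_1, unsigned limit)
{
  const int8_t imm = 9;
  for (unsigned i = 0; i < limit; i++)
    {
      int8_t x = op_1[i];
      int8_t sum = (int8_t) ((uint8_t) x + (uint8_t) imm);
      out[i] = (x ^ imm) < 0 ? sum
               : (sum ^ x) >= 0 ? sum
               : x < 0 ? INT8_MIN : INT8_MAX;
    }
}

// Reference: do the addition in a wider type and clamp.
int8_t
sat_add_reference (int8_t x, int8_t imm)
{
  int wide = (int) x + (int) imm;
  return (int8_t) std::clamp (wide, (int) INT8_MIN, (int) INT8_MAX);
}

// Check all 256 possible inputs against the reference.
bool
check_all_int8_inputs ()
{
  int8_t in[256], out[256];
  for (int i = 0; i < 256; i++)
    in[i] = (int8_t) (i - 128);
  vec_sat_s_add_imm9_i8 (out, in, 256);
  for (int i = 0; i < 256; i++)
    if (out[i] != sat_add_reference (in[i], 9))
      return false;
  return true;
}
```

For 8-bit elements the input space is small enough to test exhaustively;
wider types would need selected boundary values instead.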

Signed-off-by: Li Xu 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add signed vec
SAT_ADD IMM form 1.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add sat_s_add_imm
data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i8.c: New test.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h |  25 ++
 .../riscv/rvv/autovec/sat/vec_sat_data.h  | 240 ++
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c  |  10 +
 .../autovec/sat/vec_sat_s_add_imm-run-1-i16.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i32.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i64.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i8.c  |  28 ++
 .../sat/vec_sat_s_add_imm_type_check-1-i16.c  |   9 +
 .../sat/vec_sat_s_add_imm_type_check-1-i32.c  |   9 +
 .../sat/vec_sat_s_add_imm_type_check-1-i8.c   |  10 +
 13 files changed, 445 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h
index 7db892cc2e9..ffdccd79b7a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h
@@ -314,6 +314,31 @@ vec_sat_s_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_S_ADD_FMT_4_WRAP(T, out, op_1, op_2, N) \
   RUN_VEC_SAT_S_ADD_FMT_4(T, out, op_1, op_2, N)
 
+#define DEF_VEC_SAT_S_ADD_IMM_FMT_1(INDEX, T, UT, IMM, MIN, MAX) \
+void __attribute__((noinline))   \
+vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
+{\
+  unsigned i;

Re: [PATCH] RISC-V: Rename conflicting variables in gen-riscv-ext-texi.cc

2025-05-19 Thread Kito Cheng

Committed to trunk, thanks :)

On Mon, May 19, 2025 at 10:44 AM Songhe Zhu
 wrote:
>
> From: zhusonghe 
>
> The variables `major` and `minor` in `gen-riscv-ext-texi.cc`
> conflict with the macros of the same name defined in ``,
> which are exposed when building with newer versions of GCC on older
> Linux distributions (e.g., Ubuntu 18.04). To resolve this, we rename them
> to `major_version` and `minor_version` respectively. This aligns with the
> GCC community's recommended practice [1] and improves code clarity.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683881.html
>
> gcc/ChangeLog:
>
> * config/riscv/gen-riscv-ext-texi.cc (struct version_t): Rename
> major/minor to major_version/minor_version.
>
> Signed-off-by: Songhe Zhu 
> ---
>  gcc/config/riscv/gen-riscv-ext-texi.cc | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/riscv/gen-riscv-ext-texi.cc 
> b/gcc/config/riscv/gen-riscv-ext-texi.cc
> index e15fdbf36f6..c29a375d56c 100644
> --- a/gcc/config/riscv/gen-riscv-ext-texi.cc
> +++ b/gcc/config/riscv/gen-riscv-ext-texi.cc
> @@ -6,22 +6,22 @@
>
>  struct version_t
>  {
> -  int major;
> -  int minor;
> +  int major_version;
> +  int minor_version;
>version_t (int major, int minor,
>  enum riscv_isa_spec_class spec = ISA_SPEC_CLASS_NONE)
> -: major (major), minor (minor)
> +: major_version (major), minor_version (minor)
>{}
>bool operator<(const version_t &other) const
>{
> -if (major != other.major)
> -  return major < other.major;
> -return minor < other.minor;
> +if (major_version != other.major_version)
> +  return major_version < other.major_version;
> +return minor_version < other.minor_version;
>}
>
>bool operator== (const version_t &other) const
>{
> -return major == other.major && minor == other.minor;
> +return major_version == other.major_version && minor_version == 
> other.minor_version;
>}
>  };
>
> @@ -39,7 +39,7 @@ print_ext_doc_entry (const std::string &ext_name, const 
> std::string &full_name,
>printf ("@tab");
>for (const auto &version : unique_versions)
>  {
> -  printf (" %d.%d", version.major, version.minor);
> +  printf (" %d.%d", version.major_version, version.minor_version);
>  }
>printf ("\n");
>printf ("@tab %s", full_name.c_str ());
> --
> 2.17.1
>
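
For readers who have not hit this clash before, the following is a minimal
sketch of why the rename helps.  The two macros are stand-ins defined
locally so that the example is self-contained; they are an assumption for
illustration and are not copied from any system header.  The struct mirrors
the shape of version_t in the patch, without the spec argument.

```cpp
// Stand-in function-like macros, defined here only for illustration.
// They play the role of the macros that some older system headers expose.
#define major(dev) ((dev) >> 8)
#define minor(dev) ((dev) & 0xff)

struct version_t
{
  // Renamed members: these identifiers no longer match the macro names.
  int major_version;
  int minor_version;

  // The parameter names are safe because a function-like macro is only
  // expanded when its name is followed by '('.  The old mem-initializers
  // `major (major), minor (minor)` would have been expanded and rejected.
  version_t (int major, int minor)
    : major_version (major), minor_version (minor)
  {}

  bool operator< (const version_t &other) const
  {
    if (major_version != other.major_version)
      return major_version < other.major_version;
    return minor_version < other.minor_version;
  }

  bool operator== (const version_t &other) const
  {
    return major_version == other.major_version
	   && minor_version == other.minor_version;
  }
};
```

With the old member names, an initializer such as `major (major)` is the
point where the macro would fire, which is presumably the build failure
the patch avoids.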

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 11:29 AM Jonathan Wakely  wrote:

> On Mon, 19 May 2025 at 05:46, Patrick Palka  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > -- >8 --
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
> > Define.
> > (__ends_with_fn, ends_with): Define.
> > * include/bits/version.def (ranges_starts_ends_with): Define.
> > * include/bits/version.h: Regenerate.
> > * include/std/algorithm: Provide
> __cpp_lib_ranges_starts_ends_with.
> > * src/c++23/std.cc.in (ranges::starts_with): Export.
> > (ranges::ends_with): Export.
> > * testsuite/25_algorithms/ends_with/1.cc: New test.
> > * testsuite/25_algorithms/starts_with/1.cc: New test.
> > ---
> >  libstdc++-v3/include/bits/ranges_algo.h   | 232 ++
> >  libstdc++-v3/include/bits/version.def |   8 +
> >  libstdc++-v3/include/bits/version.h   |  10 +
> >  libstdc++-v3/include/std/algorithm|   1 +
> >  libstdc++-v3/src/c++23/std.cc.in  |   4 +
> >  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
> >  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
> >  7 files changed, 512 insertions(+)
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> >
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> b/libstdc++-v3/include/bits/ranges_algo.h
> > index f36e7dd59911..c59a555f528a 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -438,6 +438,238 @@ namespace ranges
> >
> >inline constexpr __search_n_fn search_n{};
> >
> > +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> > +  struct __starts_with_fn
> > +  {
> > +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> > +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> > +typename _Pred = ranges::equal_to,
> > +typename _Proj1 = identity, typename _Proj2 = identity>
> > +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1,
> _Proj2>
> > +  constexpr bool
> > +  operator()(_Iter1 __first1, _Sent1 __last1,
> > +_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
> > +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> > +  {
> > +   iter_difference_t<_Iter1> __n1 = -1;
> > +   iter_difference_t<_Iter2> __n2 = -1;
> > +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> > + __n1 = __last1 - __first1;
> > +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> > + __n2 = __last2 - __first2;
> > +   return _S_impl(std::move(__first1), __last1,
> > +  std::move(__first2), __last2,
> > +  std::move(__pred),
> > +  std::move(__proj1), std::move(__proj2),
> > +  __n1, __n2);
> > +  }
> > +
> > +template<input_range _Range1, input_range _Range2,
> > +typename _Pred = ranges::equal_to,
> > +typename _Proj1 = identity, typename _Proj2 = identity>
> > +  requires indirectly_comparable<iterator_t<_Range1>,
> iterator_t<_Range2>,
> > +_Pred, _Proj1, _Proj2>
> > +  constexpr bool
> > +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> > +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> > +  {
> > +   range_difference_t<_Range1> __n1 = -1;
> > +   range_difference_t<_Range1> __n2 = -1;
> > +   if constexpr (sized_range<_Range1>)
> > + __n1 = ranges::size(__r1);
> > +   if constexpr (sized_range<_Range2>)
> > + __n2 = ranges::size(__r2);
> > +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
> > +  ranges::begin(__r2), ranges::end(__r2),
> > +  std::move(__pred),
> > +  std::move(__proj1), std::move(__proj2),
> > +  __n1, __n2);
> > +  }
> > +
> > +template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
> > +typename _Pred,
> > +typename _Proj1, typename _Proj2>
> > +  static constexpr bool
> > +  _S_impl(_Iter1 __first1, _Sent1 __last1,
> > + _Iter2 __first2, _Sent2 __last2,
> > + _Pred __pred,
> > + _Proj1 __proj1, _Proj2 __proj2,
> > + iter_difference_t<_Iter1> __n1,
> > + iter_difference_t<_Iter2> __n2)
> > +  {
> > +   if (__n1 != -1 && __n2 != -1)
> > + {
> > +   if (__n1 < __n2)
> > + return false;
>
> For random access iterators we end up doing this comparison twice,
> because ranges::equal will also do if ((last1 - first1) < (last2 -
> first2).



> So in theory, we could check if both ranges are random access
> as the first thing we do, and then just call ranges::equal and let
> that check the sizes.

We need to trim the first range t

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Jonathan Wakely

On Mon, 19 May 2025 at 05:46, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
> Define.
> (__ends_with_fn, ends_with): Define.
> * include/bits/version.def (ranges_starts_ends_with): Define.
> * include/bits/version.h: Regenerate.
> * include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
> * src/c++23/std.cc.in (ranges::starts_with): Export.
> (ranges::ends_with): Export.
> * testsuite/25_algorithms/ends_with/1.cc: New test.
> * testsuite/25_algorithms/starts_with/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 232 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/algorithm|   1 +
>  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>  7 files changed, 512 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index f36e7dd59911..c59a555f528a 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -438,6 +438,238 @@ namespace ranges
>
>inline constexpr __search_n_fn search_n{};
>
> +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> +  struct __starts_with_fn
> +  {
> +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +typename _Pred = ranges::equal_to,
> +typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Iter1 __first1, _Sent1 __last1,
> +_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> +   iter_difference_t<_Iter1> __n1 = -1;
> +   iter_difference_t<_Iter2> __n2 = -1;
> +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> + __n1 = __last1 - __first1;
> +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> + __n2 = __last2 - __first2;
> +   return _S_impl(std::move(__first1), __last1,
> +  std::move(__first2), __last2,
> +  std::move(__pred),
> +  std::move(__proj1), std::move(__proj2),
> +  __n1, __n2);
> +  }
> +
> +template<input_range _Range1, input_range _Range2,
> +typename _Pred = ranges::equal_to,
> +typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<iterator_t<_Range1>,
> iterator_t<_Range2>,
> +_Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> +   range_difference_t<_Range1> __n1 = -1;
> +   range_difference_t<_Range1> __n2 = -1;
> +   if constexpr (sized_range<_Range1>)
> + __n1 = ranges::size(__r1);
> +   if constexpr (sized_range<_Range2>)
> + __n2 = ranges::size(__r2);
> +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
> +  ranges::begin(__r2), ranges::end(__r2),
> +  std::move(__pred),
> +  std::move(__proj1), std::move(__proj2),
> +  __n1, __n2);
> +  }
> +
> +template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
> +typename _Pred,
> +typename _Proj1, typename _Proj2>
> +  static constexpr bool
> +  _S_impl(_Iter1 __first1, _Sent1 __last1,
> + _Iter2 __first2, _Sent2 __last2,
> + _Pred __pred,
> + _Proj1 __proj1, _Proj2 __proj2,
> + iter_difference_t<_Iter1> __n1,
> + iter_difference_t<_Iter2> __n2)
> +  {
> +   if (__n1 != -1 && __n2 != -1)
> + {
> +   if (__n1 < __n2)
> + return false;

For random access iterators we end up doing this comparison twice,
because ranges::equal will also do if ((last1 - first1) < (last2 -
first2). So in theory, we could check if both ranges are random access
as the first thing we do, and then just call ranges::equal and let
that check the sizes. That would need to be the very first thing we do
in both operator() overloads, and we'd still need all the same logic
in _S_impl to handle the case where only one of the ranges is random
access. So I don't think it is worth doing - with optimization enabled
I hope that the redundant distance calculations and n1 < n2
com

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-19 Thread Richard Sandiford

Richard Biener  writes:
> On Fri, 16 May 2025, Richard Sandiford wrote:
>> > The simple prototype below uses a separate flag from the epilogue
>> > mode, but I wonder how we more generally want to handle
>> > whether to use masking or not when iterating over modes.  Currently
>> > we mostly rely on --param vect-partial-vector-usage.  aarch64
>> > and riscv have both variable-length modes but also fixed-size modes
>> > where for the latter, like on x86, the target couldn't request
>> > a mode specifically with or without masking.  It seems both
>> > aarch64 and riscv fully rely on cost comparison and fully
>> > exploiting the mode iteration space (but not masked vs. non-masked?!)
>> > here?
>> >
>> > I was thinking of adding a vectorization_mode class that would
>> > encapsulate the mode and whether to allow masking or alternatively
>> > to make the vector_modes array (and the m_suggested_epilogue_mode)
>> > a std::pair of mode and mask flag?
>> 
>> Predicated vs. non-predicated SVE is interesting for the main loop.
>> The class sounds like it would be useful for that.
>> 
>> I suppose predicated vs. non-predicated SVE is also potentially
>> interesting for an unrolled epilogue, although there, it would in
>> theory be better to predicate only the last vector iteration
>> (i.e. part predicated, part unpredicated).
>
> Yes, the latter is what we want for AVX512, keep the main loop
> not predicated but have the epilog predicated (using the same VF).

Reading it back, what I said was very ambiguous (as usual, unfortunately).
What I actually meant was that if we had, say, a 4x unrolled main loop
and a 2x unrolled first epilogue loop, we'd in theory want the 2x
unrolled epilogue loop to use unpredicated operations for the first
VF/2 elements and predicated operations for the second VF/2 elements.

That way, we get the benefit of the 2x unrolling for residues of >VF
elements, but skip to a second epilogue if there are VF or fewer
remaining elements.

That example assumes that the last quarter of each iteration of the
main loop is predicated in a similar way, with the rest of the iteration
being unpredicated.

Alternatively, we could have a fully-unpredicated 2x unrolled main
loop followed by the same kind of semi-predicated 2x unrolled
epilogue loop.

So if U == unpredicated and P == predicated:

  main loop: U U U P
  1st epilogue loop: U P
  2nd epilogue loop: P

  1st and 2nd epilogues might both be used

or:

  main loop: U U
  1st epilogue loop: U P
  2nd epilogue loop: P

  1st and 2nd epilogues are mutually exclusive

although the epilogues don't need to loop in either case.
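
As an illustration of the second scheme (main loop U U, first epilogue
U P, second epilogue P), here is a scalar model.  Everything in it is an
assumption made for the example: VF is fixed at 4, the "vector" operations
are plain loops over VF lanes, and the names vec_u, vec_p, double_all and
run_sketch are hypothetical.  It models control flow only, not what the
vectorizer emits.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;	// assumed number of lanes per vector

struct trace
{
  std::size_t main_iters = 0;
  bool epilogue1 = false;
  bool epilogue2 = false;
  bool all_doubled = false;
};

// Unpredicated operation: always touches exactly VF elements.
static void vec_u (int *a)
{
  for (std::size_t lane = 0; lane < VF; ++lane)
    a[lane] *= 2;
}

// Predicated operation: only the first ACTIVE lanes are touched.
static void vec_p (int *a, std::size_t active)
{
  for (std::size_t lane = 0; lane < VF; ++lane)
    if (lane < active)
      a[lane] *= 2;
}

trace double_all (int *a, std::size_t n)
{
  trace t;
  std::size_t i = 0;
  // Main loop: U U, two full vectors per iteration.
  for (; n - i >= 2 * VF; i += 2 * VF)
    {
      vec_u (a + i);
      vec_u (a + i + VF);
      ++t.main_iters;
    }
  std::size_t rem = n - i;
  if (rem > VF)
    {
      // 1st epilogue: U P, used for residues of more than VF elements.
      vec_u (a + i);
      vec_p (a + i + VF, rem - VF);
      t.epilogue1 = true;
    }
  else if (rem > 0)
    {
      // 2nd epilogue: P, used for residues of VF or fewer elements.
      vec_p (a + i, rem);
      t.epilogue2 = true;
    }
  return t;
}

// Runs the model on N ones and records whether every element was doubled.
trace run_sketch (std::size_t n)
{
  std::vector<int> data (n, 1);
  trace t = double_all (data.data (), n);
  t.all_doubled = true;
  for (int v : data)
    if (v != 2)
      t.all_doubled = false;
  return t;
}
```

Neither epilogue contains a loop, and at most one of them runs for any
given trip count, which matches the "mutually exclusive" case above.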

>> So I suppose unpredicated SVE epilogue loops might be interesting
>> until that partial predication is implemented, but I'm not sure how
>> useful unpredicated SVE epilogue loops would be "once" the partial
>> predication is supported.
>> 
>> I don't imagine we'll often know a priori for AArch64 which type
>> of vector epilogue is best.  Since switching between SVE and
>> Advanced SIMD is assumed to be essentially free, I think we'll
>> still rely on the current approach of costing both and seeing
>> which is cheaper.
>
> So the other case we might run into on x86 is if you have a
> known loop tripcount but fully vectorizing the epilogue is
> still not possible because while we have half-SSE, like V8QImode,
> we don't have V4QI or V2QI, so even with multiple epilogues
> we'd still end up with an iterating scalar epilog.  Those
> cases might be good candidates for a predicated epilog as well.
> So in the end we'd prefer branchless epilogues.

Yeah, branchless is also the aim of the schemes above.

> Predication on x86 is quite a bit more expensive so I don't see
> us using a predicated main vector loop anytime soon, and I'd
> expect that to be the case for all archs when using a fixed-size
> mode?  Is that the case for -msve-vector-bits=X as well?  Is
> there an advantage for not using a predicated main vector loop?

I think it depends on the size of the loop.  I've seen large HPC
loops for which the overhead of predication and loop control is
subsumed by the inherent complexity of the work, and duplicating
the whole thing would probably be counterproductive.

But yeah, for tighter loops, SVE should benefit from unpredicated
main loops too.

Thanks,
Richard

Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-19 Thread Richard Sandiford

Richard Sandiford  writes:
> Jeff Law  writes:
>> So two questions.  Is there any meaningful performance impact expected 
>> here using the array form rather than locals?   And does this impact how 
>> folks write their C/C++ fragments in the expanders and such?
>
> I don't think there should be any compile-time impact, and I can't
> measure one when compiling fold-const.ii -O0 (my go-to test for this).
>
> The md interface remains the same, in that all interaction is via the
> operands[] array.  Any writes to the individual operandN variables
> (where present) are ignored both before and after the patch.
>
> However, I suppose this does make it possible to turn the operandN
> arguments into constants, to prevent accidents.  I'll try that.

The only problem case seemed to be sparc.md, which uses operandN
variables in code that always invokes DONE:

(define_expand "zero_extendhisi2"
  [(set (match_operand:SI 0 "register_operand" "")
(zero_extend:SI (match_operand:HI 1 "register_operand" "")))]
  ""
{
  rtx temp = gen_reg_rtx (SImode);
  rtx shift_16 = GEN_INT (16);
  int op1_subbyte = 0;

  if (GET_CODE (operand1) == SUBREG)
{
  op1_subbyte = SUBREG_BYTE (operand1);
  op1_subbyte /= GET_MODE_SIZE (SImode);
  op1_subbyte *= GET_MODE_SIZE (SImode);
  operand1 = XEXP (operand1, 0);
}

  emit_insn (gen_ashlsi3 (temp, gen_rtx_SUBREG (SImode, operand1, op1_subbyte),
  shift_16));
  emit_insn (gen_lshrsi3 (operand0, temp, shift_16));
  DONE;
})

So I suppose the question is whether we want to continue to allow that,
or whether it would be better to flag accidental writes to operandN
instead of operands[N] for code that doesn't invoke DONE.

Richard
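
A small model of the hazard, for readers less familiar with genemit
output.  This is not generated code: fake_rtx is an int standing in for
rtx, and both function names are invented for the example.  It assumes,
as described above, that operandN is a copy of operands[N] and that code
following the C++ fragment reads the array.

```cpp
#include <array>

using fake_rtx = int;	// stand-in for rtx, for illustration only

// Models a fragment that writes to the operandN variable.  The write
// changes only the local copy, so the array keeps the old value.
fake_rtx expand_with_local_write (std::array<fake_rtx, 2> operands)
{
  fake_rtx operand1 = operands[1];
  operand1 = 42;		// accidental write, silently lost
  (void) operand1;
  return operands[1];		// later code reads the array
}

// Models a fragment that writes through the array, as the md interface
// expects.  Declaring operand1 as `const fake_rtx` in the function above
// would turn the accidental write into a compile-time error.
fake_rtx expand_with_array_write (std::array<fake_rtx, 2> operands)
{
  operands[1] = 42;
  return operands[1];
}
```

The sparc.md case differs in that it reads the modified operand1 itself
and then invokes DONE, so the lost write never matters there; making the
variables constant would still reject it.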

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-19 Thread Kugan Vivekanandarajah



> On 16 May 2025, at 12:10 am, Andi Kleen  wrote:
>
>
> On Wed, May 14, 2025 at 02:46:15AM +, Kugan Vivekanandarajah wrote:
>> Adding Eugene and Andi to CC as Sam suggested.
>>
>>> On 13 May 2025, at 12:57 am, Richard Sandiford 
>> wrote:
>>>
>>>
>>> Kugan Vivekanandarajah  writes:
 diff --git a/configure.ac b/configure.ac
 index 730db3c1402..701284e38f2 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -621,6 +621,14 @@ case "${target}" in
;;
 esac

 +autofdo_target="i386"
 +case "${target}" in
 +  aarch64-*-*)
 +autofdo_target="aarch64"
 +;;
 +esac
 +AC_SUBST(autofdo_target)
 +
 # Disable libssp for some systems.
 case "${target}" in
  avr-*-*)
>>>
>>> Couldn't we use the existing $cpu_type, rather than adding a new variable?
>>> I don't think the two would ever need to diverge.
>>
>> I tried doing this but looks to me that $cpu_type is available only in 
>> libgcc.
>> Am I missing something  or do you want me to replicate that here?
>
> I guess replicating is fine. btw while you are looking at this
> profiledbootstrap is currently disabled for various languages like
> fortran. It would be good to enable it everywhere.
>

Thanks. I've updated the patch to utilize cpu_type. While adding support for 
other languages would be great, I'm currently encountering issues with profile 
annotation in the autoprofile pass (my other patches and some pending patches). 
I'll prioritise resolving these annotation problems before adding other 
language support. However, if anyone else is interested in exploring the 
additional language support now, please feel free to do so.

Is this OK?

Thanks,
Kugan



> Andi




0004-AUTOFDOi_v2-AARCH64-Add-support-for-profilebootstrap.patch
Description: 0004-AUTOFDOi_v2-AARCH64-Add-support-for-profilebootstrap.patch

[r16-334 Regression] FAIL: gcc.dg/Wincompatible-pointer-types-1.c (test for excess errors) on Linux/x86_64

2025-05-19 Thread haochen.jiang

On Linux/x86_64,

b6d37ec1dd2a228d94e7b5b438f3aa53684316bc is the first bad commit
commit b6d37ec1dd2a228d94e7b5b438f3aa53684316bc
Author: Florian Weimer 
Date:   Thu May 1 19:06:45 2025 +0200

c: Suppress -Wdeprecated-non-prototype warnings for builtins

caused

FAIL: gcc.c-torture/compile/callind.c   -O0  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/compile/callind.c   -O1  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/compile/callind.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/compile/callind.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.c-torture/compile/callind.c   -O2  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/compile/callind.c   -O3 -g  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/compile/callind.c   -Os  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/callind.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -O0  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -O1  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -O2  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -O3 -g  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/compile/pr51694.c   -Os  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/compile/pr51694.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O0  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O1  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O2  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -O3 -g  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/pr67037.c   -Os  (internal compiler error: 
Segmentation fault)
FAIL: gcc.c-torture/execute/pr67037.c   -Os  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93379.c   -O0  (internal compiler error: 
Segmentation fault)
FAIL: gcc.dg/analyzer/torture/pr93379.c   -O0  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93379.c   -O1  (internal compiler error: 
Segmentation fault)
FAIL: gcc.dg/analyzer/torture/pr93379.c   -O1  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93379.c   -O2 -flto -fno-use-linker-plugi

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Jonathan Wakely

On Mon, 19 May 2025 at 10:35, Tomasz Kaminski  wrote:
>
>
>
> On Mon, May 19, 2025 at 11:29 AM Jonathan Wakely  wrote:
>>
>> On Mon, 19 May 2025 at 05:46, Patrick Palka  wrote:
>> >
>> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>> >
>> > -- >8 --
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> > * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>> > Define.
>> > (__ends_with_fn, ends_with): Define.
>> > * include/bits/version.def (ranges_starts_ends_with): Define.
>> > * include/bits/version.h: Regenerate.
>> > * include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
>> > * src/c++23/std.cc.in (ranges::starts_with): Export.
>> > (ranges::ends_with): Export.
>> > * testsuite/25_algorithms/ends_with/1.cc: New test.
>> > * testsuite/25_algorithms/starts_with/1.cc: New test.
>> > ---
>> >  libstdc++-v3/include/bits/ranges_algo.h   | 232 ++
>> >  libstdc++-v3/include/bits/version.def |   8 +
>> >  libstdc++-v3/include/bits/version.h   |  10 +
>> >  libstdc++-v3/include/std/algorithm|   1 +
>> >  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>> >  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>> >  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>> >  7 files changed, 512 insertions(+)
>> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
>> >
>> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
>> > b/libstdc++-v3/include/bits/ranges_algo.h
>> > index f36e7dd59911..c59a555f528a 100644
>> > --- a/libstdc++-v3/include/bits/ranges_algo.h
>> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
>> > @@ -438,6 +438,238 @@ namespace ranges
>> >
>> >inline constexpr __search_n_fn search_n{};
>> >
>> > +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
>> > +  struct __starts_with_fn
>> > +  {
>> > +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
>> > +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>> > +typename _Pred = ranges::equal_to,
>> > +typename _Proj1 = identity, typename _Proj2 = identity>
>> > +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, 
>> > _Proj2>
>> > +  constexpr bool
>> > +  operator()(_Iter1 __first1, _Sent1 __last1,
>> > +_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
>> > +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>> > +  {
>> > +   iter_difference_t<_Iter1> __n1 = -1;
>> > +   iter_difference_t<_Iter2> __n2 = -1;
>> > +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
>> > + __n1 = __last1 - __first1;
>> > +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
>> > + __n2 = __last2 - __first2;
>> > +   return _S_impl(std::move(__first1), __last1,
>> > +  std::move(__first2), __last2,
>> > +  std::move(__pred),
>> > +  std::move(__proj1), std::move(__proj2),
>> > +  __n1, __n2);
>> > +  }
>> > +
>> > +template<input_range _Range1, input_range _Range2,
>> > +typename _Pred = ranges::equal_to,
>> > +typename _Proj1 = identity, typename _Proj2 = identity>
>> > +  requires indirectly_comparable<iterator_t<_Range1>,
>> > iterator_t<_Range2>,
>> > +_Pred, _Proj1, _Proj2>
>> > +  constexpr bool
>> > +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
>> > +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>> > +  {
>> > +   range_difference_t<_Range1> __n1 = -1;
>> > +   range_difference_t<_Range1> __n2 = -1;
>> > +   if constexpr (sized_range<_Range1>)
>> > + __n1 = ranges::size(__r1);
>> > +   if constexpr (sized_range<_Range2>)
>> > + __n2 = ranges::size(__r2);
>> > +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
>> > +  ranges::begin(__r2), ranges::end(__r2),
>> > +  std::move(__pred),
>> > +  std::move(__proj1), std::move(__proj2),
>> > +  __n1, __n2);
>> > +  }
>> > +
>> > +template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
>> > +typename _Pred,
>> > +typename _Proj1, typename _Proj2>
>> > +  static constexpr bool
>> > +  _S_impl(_Iter1 __first1, _Sent1 __last1,
>> > + _Iter2 __first2, _Sent2 __last2,
>> > + _Pred __pred,
>> > + _Proj1 __proj1, _Proj2 __proj2,
>> > + iter_difference_t<_Iter1> __n1,
>> > + iter_difference_t<_Iter2> __n2)
>> > +  {
>> > +   if (__n1 != -1 && __n2 != -1)
>> > + {
>> > +   if (__n1 < __n2)
>> > + return false;
>>
>> For random access iterators we end up doing this comparison twice,
>> because ranges::equal will also do if ((last1 - first1) < (last2 -
>> first2).
>
>
>>
>> So in theor

Re: [PATCH v1 0/8] RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR cost

2025-05-19 Thread Robin Dapp


The series LGTM.  I didn't check all the tests in detail to be honest :)

--
Regards
Robin

RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-05-19 Thread Tamar Christina

> >/* Complete the target-specific cost calculations.  */
> >loop_vinfo->vector_costs->finish_cost (loop_vinfo->scalar_costs);
> >vec_prologue_cost = loop_vinfo->vector_costs->prologue_cost ();
> > @@ -12373,6 +12394,13 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> gimple *loop_vectorized_call)
> > dump_printf_loc (MSG_NOTE, vect_location, "Disabling unrolling due to"
> >  " variable-length vectorization factor\n");
> >  }
> > +
> > +  /* When we have unrolled the loop due to a user requested value we should
> > + leave it up to the RTL unroll heuristics to determine if it's still 
> > worth
> > + while to unroll more.  */
> > +  if (LOOP_VINFO_USER_UNROLL (loop_vinfo))
> 
> What I meant with copying of LOOP_VINFO_USER_UNROLL is that I think
> you'll never get to this being true as you set the suggested unroll
> factor for the costing attempt of the not extra unrolled loop but
> the transform where you want to reset it is when the unrolling
> was actually applied?

It was being set on every analysis of the main loop body.  Since it wasn't
actually cleared until we've picked a mode and did codegen the condition would
be true.

However..

> 
> That said, it would be clearer if LOOP_VINFO_USER_UNROLL would be
> set in vect_analyze_loop_1 where we have
> 

I agree this is much nicer.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/extend.texi: Document pragma unroll interaction with vectorizer.
* tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_analyze_loop_1 ): Set
suggested_unroll_factor and retry.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was
used.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/unroll-vect.c: New test.

-- inline copy of patch --

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 
e87a3c271f8420d8fd175823b5bb655f76c89afe..f8261d13903afc90d3341c09ab3fdbd0ab96ea49
 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10398,6 +10398,11 @@ unrolled @var{n} times regardless of any commandline 
arguments.
 When the option is @var{preferred} then the user is allowed to override the
 unroll amount through commandline options.
 
+If the loop was vectorized the unroll factor specified will be used to seed the
+vectorizer unroll factor.  Whether the loop is unrolled or not will be
+determined by target costing.  The resulting vectorized loop may still be
+unrolled more in later passes depending on the target costing.
+
 @end table
 
 @node Thread-Local
diff --git a/gcc/testsuite/gcc.target/aarch64/unroll-vect.c 
b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c
new file mode 100644
index 
..3cb774ba95787ebee488fbe7306299ef28e6bb35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -march=armv8-a --param 
aarch64-autovec-preference=asimd-only -std=gnu99" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f1:
+** ...
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** ...
+*/
+void f1 (int *restrict a, int n)
+{
+#pragma GCC unroll 16
+  for (int i = 0; i < n; i++)
+a[i] *= 2;
+}
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..f215b6bc7881e7e659272cefbe3d5c8892ef768c
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1073,6 +1073,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 peeling_for_gaps (false),
 peeling_for_niter (false),
 early_breaks (false),
+user_unroll (false),
 no_data_dependencies (false),
 has_mask_store (false),
 scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -3428,27 +3429,51 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
 res ? "succeeded" : "failed",
 GET_MODE_NAME (loop_vinfo->vector_mode));
 
-  if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) && suggested_unroll_factor > 
1)
+  auto user_unroll = LOOP_VINFO_LOOP (loop_vinfo)->unroll;
+  if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+  /* Check to see if the user wants to unroll or if the target wants to.  
*/
+  && (suggested_unroll_factor > 1 || user_unroll > 1))
 {
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
+  if (suggested_unroll_factor == 1)
+   {
+ int assumed_vf = vect_vf_for_cost (loop_vinfo);
+ int unrol

Re: Add effective-target 'offload_device_usm', 'libgomp.c-c++-common/target-usm-1.c'

2025-05-19 Thread Tobias Burnus


Hi Thomas,

coming back to the patch itself – having sent comments in previous email.

As mentioned, I think you either want to have:

* Check whether the runtime knows that USM is supported for the default
device (that is, a non-host device), i.e. a non-host device is available
with 'requires unified_shared_memory/self_maps'.


* Check the above and check in addition that self-mapping happens with
'requires unified_shared_memory'.


I am not sure whether we need the second one, given that using 'requires 
self_maps' or a runtime check in the program works. (IMHO, it does not 
harm but is not really needed.) But the first one is surely useful.


In any case, you add only the second one, while
'map-alloc-comp-9-usm.f90' requires the first one.


* * *

Thomas Schwinge wrote:

 From 46fc59b5cdaa42c4dc9edaee7d52194c1f45b6b3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Fri, 9 May 2025 15:05:57 +0200
Subject: [PATCH] Add effective-target 'offload_device_usm',
  'libgomp.c-c++-common/target-usm-1.c'

Also use the new effective-target 'offload_device_usm' for restricting
'libgomp.fortran/map-alloc-comp-9-usm.f90' testing; the latter being a USM
variant of 'libgomp.fortran/map-alloc-comp-9.f90'.


The wording for effective-target should be expanded as it is not clear 
what is tested for. (Cf. comments above).



diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index a620f8c2a09..cd32be1ca68 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -449,6 +449,8 @@ proc check_effective_target_offload_device_nonshared_as { } 
{
  }

  # Return 1 if offload device is available and it has shared address space.

+# This doesn't consider whether '#pragma omp requires unified_shared_memory'
+# may be used to switch into shared-memory mode.


This one actually does not check whether USM works but
whether self-mapping happens by default.

For instance, when the following code does not fail,

int a = 6;
int *p = &a;
#pragma omp target firstprivate(p)
  if (*p != 6) __builtin_abort ();

... there is support for USM (for at least some host memory).

However, this does not imply that 'libgomp' knows about this
(i.e. 'requires unified_shared_memory' might exclude that device)
and even if libgomp knows it, it might still default to copy
mapping.

Example for the former:
* Two AMD GPUs, only one supporting USM. The current check
  is not per device but for all AMD GPUs.

* Example for the second: All GPUs currently. The plan is
  to default to self mapping for APUs (MI300a, Grace-Hopper,
  ...) and to permit to switch to it by an env var, but
  that still has to be implemented.
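
For illustration, the firstprivate-pointer probe quoted earlier in this mail can be wrapped into a small function. This is only a sketch: the function name and the 'seen' variable are made up here, and it is not the check from the patch. Built without OpenMP support, or when the region falls back to the host, the pragma has no offloading effect and the host value is read directly; on a real offload device without host-memory access the dereference would be expected to fail.

```cpp
// Hypothetical probe: read a host variable on the target through a raw
// host pointer that is passed by value (firstprivate), i.e. not mapped.
inline int read_through_host_pointer()
{
  int a = 6;        // lives in host memory only
  int *p = &a;      // raw host address
  int seen = -1;    // mapped both ways so the result comes back

#pragma omp target firstprivate(p) map(tofrom: seen)
  {
    // Only valid if the device can access this host address.
    seen = *p;
  }
  return seen;
}
```

Note that a successful read only shows that this particular host memory is device-accessible; as said above, it does not show what libgomp reports for 'requires unified_shared_memory'.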


BTW: In principle, we could also add such a check. Although, it
might have false positive in case some but not all host memory
is accessible.

That's actually what I want to have for map-alloc-comp-9.f90. Like:
  { dg-additional-options "-DUSM_SUPPORTED" { target ... } }
while the requires USM (map-alloc-comp-9.f90) is a partial workaround.

* * *


+# Return 1 if, with '#pragma omp requires unified_shared_memory' in effect, an
+# USM-capable offload device is available (not considering host-fallback
+# execution).
+proc check_effective_target_offload_device_usm { } {


Probably simpler to use 'self_maps' and state so in the name?


+return [check_runtime_nocache offload_device_usm {
+  #pragma omp requires unified_shared_memory
+  #include 
+  int main ()
+   {
+ int a;

This variable is not initialized! You surely want to use "= 0" here!

+ #pragma omp target map(from: a)


This checks currently whether self-maps is the default with USM.

With 'requires self_maps' or with 'map(tofrom:' it would check whether 
USM is (known to libgomp to be) supported by the device.


At least for map-alloc-comp-9-usm.f90, self map is not required (and 
actually not wanted) – but host-memory access is required.


 * * *


+++ b/libgomp/testsuite/libgomp.c-c++-common/target-usm-1.c
@@ -0,0 +1,46 @@
+/* If we have an offload device that is capable of USM...
+   { dg-do run { target offload_device_usm } } */
+
+/* ..., and we request USM...  */
+#pragma omp requires unified_shared_memory
+/* (..., which in the GCC implementation equals 'self_maps'...)  */


Can you please avoid mixing USM with self_maps? If I had had more time
during GCC 15 development, that assumption would no longer be true.


Thus, if you just care about USM, use 'requires unified_shared_memory'. 
But if you assume self mapping, use 'requires self_maps'.


[Adding APU detection and an env var is planned for GCC 16]

* * *

However, this is a rather safe assumption (at least in GCC): USM 
supported → self_maps supported – and vice versa, even though OpenMP 
does not guarantee it and some system might not imply it. self_maps 
supported → USM supported is surely always the case.


At least as I intend to get this working in GCC, both will work.


+++ b/libgomp/testsuite/libgomp.fortran/map-alloc-comp-9-

Re: [PATCH v1 1/6] libstdc++: Implement layout_left from mdspan.

2025-05-19 Thread Tomasz Kaminski

On Sun, May 18, 2025 at 10:11 PM Luc Grosheintz 
wrote:

> Implements the parts of layout_left that don't depend on any of the
> other layouts.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_left): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 240 
>  1 file changed, 240 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 47cfa405e44..3c1c33d9e9a 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -144,6 +144,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   { return __exts[__i]; });
>   }
>
> +   static constexpr size_t
> +   _M_static_extents_prod(size_t __begin, size_t __end) noexcept
> +   {
> + size_t __ret = 1;
> + if constexpr (_S_rank > 0)
> +   for(size_t __i = __begin; __i < __end; ++__i)
> + __ret *= (_Extents[__i] == dynamic_extent ? 1 :
> _Extents[__i]);
> + return __ret;
> +   }
> +
> +   constexpr _IndexType
> +   _M_dynamic_extents_prod(size_t __begin, size_t __end) const
> noexcept
> +   {
> + _IndexType __ret = 1;
> + if constexpr (_S_rank_dynamic > 0)
> +   {
> + size_t __dyn_begin = _S_dynamic_index[__begin];
> + size_t __dyn_end = _S_dynamic_index[__end];
> +
> + for(size_t __i = __dyn_begin; __i < __dyn_end; ++__i)
> +   __ret *= _M_dynamic_extents[__i];
> +   }
> + return __ret;
> +   }
> +
> +   constexpr _IndexType
> +   _M_extents_prod(size_t __begin, size_t __end) const noexcept
> +   {
> + return _IndexType(_M_static_extents_prod(__begin, __end))
> +* _M_dynamic_extents_prod(__begin, __end);
> +   }
> +
>private:
> using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> [[no_unique_address]] _S_storage _M_dynamic_extents;
> @@ -190,6 +222,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   return _S_storage::_S_static_extent(__r);
>}
>
> +  constexpr index_type
> +  _M_fwd_prod(rank_type __r) const noexcept
> +  { return _M_dynamic_extents._M_extents_prod(0, __r); }
> +
> +  constexpr index_type
> +  _M_rev_prod(rank_type __r) const noexcept
> +  { return _M_dynamic_extents._M_extents_prod(__r + 1, rank()); }
>
I would prefer us to avoid exposing member functions in the public
interface, even with reserved names. What I would suggest instead is to
expose two private member functions in extents, dynamic extents:
static span _M_static_exts(size_t __begin, size_t __end);
span _M_dyn_exts(size_t __begin, size_t __end) const;

And then have two accessors in __mdspan:
template
static span __static_exts(size_t __begin, size_t __end) {
  return _Exts::__static_exts(__begin, __end);
}
template
span __dyn_exts(_Exts const&, size_t __begin, size_t
__end) const
{   return _Exts::_M_dyn_exts(__begin, __end); }

This would allow us to implement all of the helpers in __mdspan namespace
directly, without
extending class interface.
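
To make the idea of free helper functions concrete, here is a minimal sketch. It is not the libstdc++ code: it uses std::vector in place of the span accessors suggested above, all extents are plain runtime values, and the function names are chosen for this example only. fwd_prod(r) is the product of the extents before dimension r (the layout_left stride of r), rev_prod(r) the product of the extents after it.

```cpp
#include <cstddef>
#include <vector>

// Product of exts[0 .. r): the layout_left stride of dimension r.
inline std::size_t
fwd_prod(const std::vector<std::size_t>& exts, std::size_t r)
{
  std::size_t ret = 1;
  for (std::size_t i = 0; i < r; ++i)
    ret *= exts[i];
  return ret;
}

// Product of exts[r + 1 .. rank): the layout_right stride of dimension r.
inline std::size_t
rev_prod(const std::vector<std::size_t>& exts, std::size_t r)
{
  std::size_t ret = 1;
  for (std::size_t i = r + 1; i < exts.size(); ++i)
    ret *= exts[i];
  return ret;
}

// layout_left linear index: sum over r of idx[r] * fwd_prod(exts, r),
// accumulated with a running multiplier as in the quoted patch.
inline std::size_t
linear_index_left(const std::vector<std::size_t>& exts,
                  const std::vector<std::size_t>& idx)
{
  std::size_t res = 0;
  std::size_t mult = 1;
  for (std::size_t pos = 0; pos < idx.size(); ++pos)
    {
      res += idx[pos] * mult;
      mult *= exts[pos];
    }
  return res;
}
```

For extents {3, 4, 5}, fwd_prod at r = 2 is 3 * 4 and rev_prod at r = 0 is 4 * 5; the helpers need only read access to the extents, which is the point of the accessor suggestion.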



+
>constexpr index_type
>extent(rank_type __r) const noexcept
>{
> @@ -286,6 +326,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
> +  constexpr typename _Extents::index_type
> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts._M_fwd_prod(__r); }
>
Because you made _M_fwd_prod, this is not necessary.

> +
> +template
> +  constexpr typename _Extents::index_type
> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts._M_rev_prod(__r); }
> +
>  template
>auto __build_dextents_type(integer_sequence)
> -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
> @@ -304,6 +354,196 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  explicit extents(_Integrals...) ->
>extents()...>;
>
> +  struct layout_left
> +  {
> +template
> +  class mapping;
> +  };
> +
> +  namespace __mdspan
> +  {
> +template
> +  constexpr bool __is_extents = false;
> +
> +template
> +  constexpr bool __is_extents> =
> true;
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __linear_index_left(const _Extents& __exts, _Indices... __indices)
> +  {
> +   using _IndexType = typename _Extents::index_type;
> +   _IndexType __res = 0;
> +   if constexpr (sizeof...(__indices) > 0)
> + {
> +   _IndexType __mult = 1;
> +   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
> + {
> +   __res += __idx * __mult;
> +   __mult *= __exts.extent(__pos);
> +   ++__pos;
> + };
> +   (__update(__indices), ...);
> + }
> +   return __res;
> +  }
> +
> +template
> +  constexpr boo

RE: [PATCH 1/2]middle-end: Add new parameter to scale scalar loop costing in vectorizer

2025-05-19 Thread Tamar Christina

> > +-param=vect-scalar-cost-multiplier=
> > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1)
> IntegerRange(0, 10) Param Optimization
> > +The scaling multiplier to add to all scalar loop costing when performing
> vectorization profitability analysis.  The default value is 1.
> > +
> 
> Note this only allows whole number scaling.  May I suggest to instead
> use percentage as unit, thus the multiplier is --param
> param_vect_scalar_cost_multiplier / 100?
> 

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu
(-m32 and -m64) with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* params.opt (vect-scalar-cost-multiplier): New.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
* doc/invoke.texi (vect-scalar-cost-multiplier): Document it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cost_model_16.c: New test.

-- inline copy of patch --

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48bfad8f9c58bcc5f
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17273,6 +17273,10 @@ this parameter.  The default value of this parameter 
is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vect-scalar-cost-multiplier
+Apply the given multiplier % to scalar loop costing during vectorization.
+Increasing the cost multiplier will make vector loops more profitable.
+
 @item vrp-block-limit
 Maximum number of basic blocks before VRP switches to a lower memory algorithm.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 
1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367
 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies to 
the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 
1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-scalar-cost-multiplier=
+Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100) 
IntegerRange(0, 1) Param Optimization
+The scaling multiplier as a percentage to apply to all scalar loop costing 
when performing vectorization profitability analysis.  The default value is 100.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(15) Optimization 
Param
 Maximum number of basic blocks before VRP switches to a fast model with less 
memory requirements.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c 
b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
new file mode 100644
index 
..c405591a101d50b4734bc6d65a6d6c01888bea48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization 
-fdump-tree-vect-details" } */
+
+void
+foo (char *restrict a, int *restrict b, int *restrict c,
+ int *restrict d, int stride)
+{
+if (stride <= 1)
+return;
+
+for (int i = 0; i < 3; i++)
+{
+int res = c[i];
+int t = b[i * stride];
+if (a[i] != 0)
+res = t * d[i];
+c[i] = res;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..c18e75794046f506c473b36639e6ae6658a5516b
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
  TODO: Consider assigning different costs to different scalar
  statements.  */
 
-  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
+  scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
+* param_vect_scalar_cost_multiplier) / 100;
 
   /* Add additional cost for the peeled instructions in prologue and epilogue
  loop.  (For fully-masked loops there will be no peeling.)


rb19441.patch
Description: rb19441.patch
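
As a rough illustration of the percentage-based scaling in the tree-vect-loop.cc hunk above, the arithmetic can be sketched as follows. The helper name is hypothetical and the sketch is independent of the GCC sources; it only mirrors the "(cost * multiplier) / 100" expression, with 100 meaning "unchanged".

```cpp
// Hypothetical stand-in for the scaled scalar cost:
// (total_cost * multiplier_percent) / 100, in unsigned integer arithmetic.
constexpr unsigned
scale_scalar_cost(unsigned total_cost, unsigned multiplier_percent)
{
  return (total_cost * multiplier_percent) / 100;
}

// 100% leaves the cost unchanged.
static_assert(scale_scalar_cost(40, 100) == 40, "identity at 100%");
// 150% makes the scalar loop look more expensive, favouring vectorization.
static_assert(scale_scalar_cost(40, 150) == 60, "scaled up");
// Integer division truncates, so small costs lose precision.
static_assert(scale_scalar_cost(3, 50) == 1, "truncation");
```

Using a percentage keeps the parameter an integer while still allowing fractional factors, which is what the review comment asked for; the price is the truncation shown in the last assertion.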

[PATCH 1/2] Match:Support IMM=-1 for signed scalar SAT_ADD IMM form1

2025-05-19 Thread Li Xu

From: xuli 

This patch would like to support .SAT_ADD when IMM=-1.

Form1:
T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}

Take below form1 as example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -1, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  unsigned char x.0_1;
  unsigned char _2;
  unsigned char _3;
  int8_t iftmp.1_4;
  signed char _8;
  unsigned char _9;
  signed char _10;

   [local count: 1073741824]:
  x.0_1 = (unsigned char) x_5(D);
  _3 = -x.0_1;
  _10 = (signed char) _3;
  _8 = x_5(D) & _10;
  if (_8 < 0)
goto ; [1.40%]
  else
goto ; [98.60%]

   [local count: 434070867]:
  _2 = x.0_1 + 255;

   [local count: 1073741824]:
  # _9 = PHI <_2(3), 128(2)>
  iftmp.1_4 = (int8_t) _9;
  return iftmp.1_4;

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t _4;

   [local count: 1073741824]:
  gimple_call <.SAT_ADD, _4, x_5(D), 255> [tail call]
  gimple_return <_4>

}
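
For readers who want to check the form1 expression by hand, here is a small sketch of it for int8_t, next to a reference that saturates via wider arithmetic. The function names are made up for this example and nothing here is GCC code; the conversion from uint8_t back to int8_t is assumed to wrap, which is what GCC does (and what C++20 guarantees).

```cpp
#include <cstdint>
#include <limits>

// Form 1 from the description above, written out for T = int8_t, UT = uint8_t.
inline std::int8_t
sat_s_add_imm_fmt_1(std::int8_t x, std::int8_t imm)
{
  const std::int8_t min = std::numeric_limits<std::int8_t>::min();
  const std::int8_t max = std::numeric_limits<std::int8_t>::max();
  // Wrapping add in the unsigned type, then convert back to signed.
  const std::uint8_t usum = static_cast<std::uint8_t>(
    static_cast<std::uint8_t>(x) + static_cast<std::uint8_t>(imm));
  const std::int8_t sum = static_cast<std::int8_t>(usum);

  if ((x ^ imm) < 0)    // operands have different signs: cannot overflow
    return sum;
  if ((sum ^ x) >= 0)   // result kept the sign of x: no overflow
    return sum;
  return x < 0 ? min : max;
}

// Reference: add in int, then clamp to the int8_t range.
inline std::int8_t
sat_add_reference(std::int8_t x, std::int8_t imm)
{
  const int wide = static_cast<int>(x) + static_cast<int>(imm);
  if (wide < std::numeric_limits<std::int8_t>::min())
    return std::numeric_limits<std::int8_t>::min();
  if (wide > std::numeric_limits<std::int8_t>::max())
    return std::numeric_limits<std::int8_t>::max();
  return static_cast<std::int8_t>(wide);
}

// Compare both versions for every int8_t value of x.
inline bool
form1_matches_reference(std::int8_t imm)
{
  for (int v = -128; v <= 127; ++v)
    {
      const std::int8_t x = static_cast<std::int8_t>(v);
      if (sat_s_add_imm_fmt_1(x, imm) != sat_add_reference(x, imm))
        return false;
    }
  return true;
}
```

With IMM = -1 the only saturating input is x = INT8_MIN, which matches the single constant 128 in the PHI node of the "before" dump.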

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu 

gcc/ChangeLog:

* match.pd: Add signed scalar SAT_ADD IMM form1 with IMM=-1 matching.
* tree-ssa-math-opts.cc (match_unsigned_saturation_add): Adapt function 
name.
(match_saturation_add_with_assign): Match signed and unsigned SAT_ADD 
with assign.
(math_opts_dom_walker::after_dom_children): Match imm=-1 signed SAT_ADD 
with NOP_EXPR case.

---
 gcc/match.pd  | 19 ++-
 gcc/tree-ssa-math-opts.cc | 30 +-
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 98411af3940..a07dbb808d2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3403,7 +3403,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_xor:c @0 INTEGER_CST@3)) integer_zerop)
 (signed_integer_sat_val @0)
 @2)
-  (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0
+  (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0)))
+
+(match (signed_integer_sat_add @0 @1)
+  /* T SUM = (T)((UT)X + (UT)-1);
+ SAT_S_ADD = (X ^ -1) < 0 ? SUM : (X ^ SUM) >= 0 ? SUM
+ : (x < 0) ? MIN : MAX  */
+  (convert (cond^ (lt (bit_and:c @0 (nop_convert (negate (nop_convert @0
+ integer_zerop)
+INTEGER_CST@2
+(plus (nop_convert @0) integer_all_onesp@1)))
+   (with
+{
+ unsigned precision = TYPE_PRECISION (type);
+ wide_int c1 = wi::to_wide (@1);
+ wide_int c2 = wi::to_wide (@2);
+ wide_int sum = wi::add (c1, c2);
+}
+(if (wi::eq_p (sum, wi::max_value (precision, SIGNED)))
 
 /* Saturation sub for signed integer.  */
 (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 292eb852f2d..f6a1bea2002 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4064,15 +4064,34 @@ build_saturation_binary_arith_call_and_insert 
(gimple_stmt_iterator *gsi,
  *   _10 = -_9;
  *   _12 = _7 | _10;
  *   =>
- *   _12 = .SAT_ADD (_4, _6);  */
+ *   _12 = .SAT_ADD (_4, _6);
+ *
+ * Try to match IMM=-1 saturation signed add with assign.
+ *  [local count: 1073741824]:
+ * x.0_1 = (unsigned char) x_5(D);
+ * _3 = -x.0_1;
+ * _10 = (signed char) _3;
+ * _8 = x_5(D) & _10;
+ * if (_8 < 0)
+ *   goto ; [1.40%]
+ * else
+ *   goto ; [98.60%]
+ *  [local count: 434070867]:
+ * _2 = x.0_1 + 255;
+ *  [local count: 1073741824]:
+ * # _9 = PHI <_2(3), 128(2)>
+ * _4 = (int8_t) _9;
+ *   =>
+ * _4 = .SAT_ADD (x_5, -1); */
 
 static void
-match_unsigned_saturation_add (gimple_stmt_iterator *gsi, gassign *stmt)
+match_saturation_add_with_assign (gimple_stmt_iterator *gsi, gassign *stmt)
 {
   tree ops[2];
   tree lhs = gimple_assign_lhs (stmt);
 
-  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
+  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
+  || gimple_signed_integer_sat_add (lhs, ops, NULL))
 build_saturation_binary_arith_call_and_replace (gsi, IFN_SAT_ADD, lhs,
ops[0], ops[1]);
 }
@@ -6363,7 +6382,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
  break;
 
case PLUS_EXPR:
- match_unsigned_saturation_add (&gsi, as_a (stmt));
+ match_saturation_add_with_assign (&gsi, as_a (stmt));
  match_unsigned_saturation_sub (&g

[PATCH] c++/modules: Ensure vtables are emitted when needed [PR120349]

2025-05-19 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Alternatively could go back to always marking vtables as DECL_EXTERNAL
as well but that doesn't seem to be necessary that I can tell.

-- >8 --

I missed a testcase in r16-688-gc875748cdc468e for whether a GM vtable
should be emitted in an importer when it has no non-inline key function.
Before that patch the code worked because we always marked all vtables
as DECL_EXTERNAL, which then meant that reading the definition marked
them as DECL_NOT_REALLY_EXTERN.

But it seems to me that really, all vtables should just be considered
DECL_NOT_REALLY_EXTERN until processed by maybe_emit_vtables (this is
how the frontend seems to behave in general); this patch makes that
adjustment.

PR c++/120349

gcc/cp/ChangeLog:

* module.cc (trees_in::read_var_def): Always mark vtables as
DECL_NOT_REALLY_EXTERN.

gcc/testsuite/ChangeLog:

* g++.dg/modules/vtt-3_a.C: New test.
* g++.dg/modules/vtt-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   |  2 +-
 gcc/testsuite/g++.dg/modules/vtt-3_a.C | 29 ++
 gcc/testsuite/g++.dg/modules/vtt-3_b.C | 14 +
 3 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/vtt-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/vtt-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 200e1c2deb3..d860940caa4 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12781,7 +12781,7 @@ trees_in::read_var_def (tree decl, tree maybe_template)
   if (installing)
 {
   DECL_INITIAL (decl) = init;
-  if (DECL_EXTERNAL (decl))
+  if (DECL_EXTERNAL (decl) || vtable)
DECL_NOT_REALLY_EXTERN (decl) = true;
   if (VAR_P (decl))
{
diff --git a/gcc/testsuite/g++.dg/modules/vtt-3_a.C 
b/gcc/testsuite/g++.dg/modules/vtt-3_a.C
new file mode 100644
index 000..f38f024ba1f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/vtt-3_a.C
@@ -0,0 +1,29 @@
+// PR c++/120349
+// { dg-additional-options "-fmodules -Wno-global-module" }
+// { dg-module-cmi M }
+
+module;
+
+// GMF types; should have vtables emitted in importers
+struct BGG {
+  virtual inline ~BGG() {}
+};
+struct BGM {
+  virtual inline ~BGM() {}
+};
+struct DGG : BGG {};
+
+export module M;
+
+export using ::DGG;
+
+// Module-local types; should have vtables emitted here
+struct BM {
+  virtual inline ~BM() {}
+};
+export struct DGM : BGM {};  // note: this emits BGM's vtable here too
+export struct DM : BM {};
+
+// { dg-final { scan-assembler-not "_ZTV3BGG:" } }
+// { dg-final { scan-assembler "_ZTV3BGM:" } }
+// { dg-final { scan-assembler "_ZTVW1M2BM:" } }
diff --git a/gcc/testsuite/g++.dg/modules/vtt-3_b.C 
b/gcc/testsuite/g++.dg/modules/vtt-3_b.C
new file mode 100644
index 000..ef7ae6ca4e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/vtt-3_b.C
@@ -0,0 +1,14 @@
+// PR c++/120349
+// { dg-additional-options "-fmodules -Wno-global-module" }
+
+import M;
+
+int main() {
+  DGG dgg;
+  DGM dgm;
+  DM dm;
+}
+
+// { dg-final { scan-assembler "_ZTV3BGG:" } }
+// { dg-final { scan-assembler "_ZTV3BGM:" } }
+// { dg-final { scan-assembler-not "_ZTVW1M2BM:" } }
-- 
2.47.0

Re: [PATCH v1 2/6] libstdc++: Add tests for layout_left.

2025-05-19 Thread Tomasz Kaminski

On Sun, May 18, 2025 at 10:14 PM Luc Grosheintz 
wrote:

> Implements a suite of tests for the currently implemented parts of
> layout_left. The individual tests are templated over the layout type, to
> allow reuse as more layouts are added.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: New
> test.
> * testsuite/23_containers/mdspan/layouts/ctors.cc: New test.
> * testsuite/23_containers/mdspan/layouts/mapping.cc: New test.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  .../mdspan/layouts/class_mandate_neg.cc   |  22 +
>  .../23_containers/mdspan/layouts/ctors.cc | 258 ++
>  .../23_containers/mdspan/layouts/mapping.cc   | 445 ++
>  3 files changed, 725 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
>
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> new file mode 100644
> index 000..f122541b3e8
> --- /dev/null
> +++
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> @@ -0,0 +1,22 @@
> +// { dg-do compile { target c++23 } }
> +#include
> +
> +#include 
> +
> +constexpr size_t dyn = std::dynamic_extent;
> +static constexpr size_t n = (size_t(1) << 7) - 1;
>
I would use numeric_limits_max here.

> +
> +template
> +  struct A
> +  {
> +typename Layout::mapping> m0;
> +typename Layout::mapping> m1;
> +typename Layout::mapping> m2;
> +
> +using extents_type = std::extents;
> +typename Layout::mapping m3; // { dg-error "required
> from" }
> +  };
> +
> +A a_left; // { dg-error "required
> from" }
> +
> +// { dg-prune-output "must be representable as index_type" }
> diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> new file mode 100644
> index 000..4592a05dec8
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> @@ -0,0 +1,258 @@
> +// { dg-do run { target c++23 } }
> +#include 
> +
> +#include 
> +
> +constexpr size_t dyn = std::dynamic_extent;
> +
> +template
> +  constexpr void
> +  verify_from_exts(OExtents exts)
> +  {
> +auto m = Mapping(exts);
> +VERIFY(m.extents() == exts);
> +  }
> +
> +
> +template
> +  constexpr void
> +  verify_from_mapping(OMapping other)
> +  {
> +auto m = SMapping(other);
> +VERIFY(m.extents() == other.extents());
> +  }
> +
> +template
> +  requires (std::__mdspan::__is_extents)
> +  constexpr void
> +  verify(OExtents oexts)
> +  {

In general, when possible we prefer not to use internal details in tests.
I would use if constexpr with requires { typename Other::layout_type; }, i.e.
template
constexpr void
verify(Source const& src)
{
  if constexpr (requires { typename Other::layout_type; })
     verify_from_mapping(src);
  else
     verify_from_extents(src);
}
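
The dispatch above keys on whether the source type has a nested layout_type. A self-contained sketch of the same idea follows; note that it swaps the C++20 requires-expression for a std::void_t detector so that it also builds as C++17, and that FakeExtents, FakeMapping and verify_path are hypothetical stand-ins rather than anything from the test suite.

```cpp
#include <type_traits>

// Detects a nested 'layout_type'; a C++17 stand-in for
// 'requires { typename T::layout_type; }'.
template<typename T, typename = void>
  struct has_layout_type : std::false_type { };

template<typename T>
  struct has_layout_type<T, std::void_t<typename T::layout_type>>
  : std::true_type { };

// Hypothetical stand-ins for an extents type and a mapping type.
struct FakeLayout { };
struct FakeExtents { };
struct FakeMapping { using layout_type = FakeLayout; };

enum class Path { from_extents, from_mapping };

// Picks the verification path from the source type alone, without
// referring to any library-internal trait.
template<typename Source>
  constexpr Path
  verify_path(const Source&)
  {
    if constexpr (has_layout_type<Source>::value)
      return Path::from_mapping;
    else
      return Path::from_extents;
  }

static_assert(verify_path(FakeExtents{}) == Path::from_extents, "extents");
static_assert(verify_path(FakeMapping{}) == Path::from_mapping, "mapping");
```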

+auto m = Mapping(oexts);
> +VERIFY(m.extents() == oexts);
> +  }
> +
> +template
> +  requires (std::__mdspan::__standardized_mapping)
> +  constexpr void
> +  verify(OMapping other)
> +  {
> +constexpr auto rank = Mapping::extents_type::rank();
> +auto m = Mapping(other);
> +VERIFY(m.extents() == other.extents());
> +if constexpr (rank > 0)
> +  for(size_t i = 0; i < rank; ++i)
> +   VERIFY(std::cmp_equal(m.stride(i), other.stride(i)));
>
Why is this not checked in verify_from_mapping?

> +  }
> +
> +
> +template
> +  constexpr void
> +  verify_nothrow_convertible(From from)
> +  {
> +static_assert(std::is_nothrow_constructible_v);
>
I would call  `verify_convertible` here, instead of these two lines.

> +static_assert(std::is_convertible_v);
> +verify(from);
> +  }
> +
> +template
> +  constexpr void
> +  verify_convertible(From from)
> +  {
> +static_assert(std::is_convertible_v);
> +verify(from);
> +  }
> +
> +template
> +  constexpr void
> +  verify_constructible(From from)
> +  {
> +static_assert(!std::is_convertible_v);
> +static_assert(!std::is_nothrow_constructible_v);
>
Implementations are allowed to add noexcept on the functions, so I would
not perform these checks.
See: https://eel.is/c++draft/res.on.exception.handling#5

> +static_assert(std::is_constructible_v);
> +verify(from);
> +  }
> +
> +template
> +  constexpr void
> +  verify_nothrow_constructible(From from)
> +  {
> +static_assert(!std::is_convertible_v);
> +static_assert(std::is_nothrow_constructible_v);
>
With the change above, I would call verify_constructible here.

> +verify(from);
> +  }
> +
> +template
> +  constexpr void
> +  assert_not_constructible()
> +  {
> +static_assert(!std::is_construc

Re: [PATCH v1 0/6] Implement layouts from mdspan.

2025-05-19 Thread Tomasz Kaminski

Thank you for the patches.
I have reviewed the layout_left and test, and posted some suggestions there.
I will follow up with a review for other mapping later.

On Sun, May 18, 2025 at 10:07 PM Luc Grosheintz 
wrote:

> Technically, this is the second iteration of these patches. Previous
> discussion can be found here:
>
> https://gcc.gnu.org/pipermail/libstdc++/2025-May/061350.html
> 
>
> The implementation of `layout_stride::mapping::is_exhaustive` needs
> to be discussed, because for empty extents, the standard seems to
> require implementing a formula that doesn't require returning true
> in all cases.
>
> Luc Grosheintz (6):
>   libstdc++: Implement layout_left from mdspan.
>   libstdc++: Add tests for layout_left.
>   libstdc++: Implement layout_right from mdspan.
>   libstdc++: Add tests for layout_right.
>   libstdc++: Implement layout_stride from mdspan.
>   libstdc++: Add tests for layout_stride.
>
>  libstdc++-v3/include/std/mdspan   | 604 ++
>  .../mdspan/layouts/class_mandate_neg.cc   |  42 ++
>  .../23_containers/mdspan/layouts/ctors.cc | 421 
>  .../23_containers/mdspan/layouts/mapping.cc   | 573 +
>  .../23_containers/mdspan/layouts/stride.cc| 494 ++
>  5 files changed, 2134 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc
>
> --
> 2.49.0
>
>

Re: [PATCH 1/9] nds32: Avoid accessing beyond the operands[] array

2025-05-19 Thread Jeff Law





On 5/18/25 2:19 PM, Richard Sandiford wrote:

Jeff Law  writes:

On 5/16/25 11:32 AM, Jeff Law wrote:



On 5/16/25 11:21 AM, Richard Sandiford wrote:

This pattern used operands[2] to hold the shift amount, even though
the pattern doesn't have an operand 2 (not even as a match_dup).
This caused a build failure with -Werror:

    array subscript 2 is above array bounds of ‘rtx_def* [2]’

gcc/
 * config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use
 a local variable instead of operands[2].

Obviously OK.  IMHO you should just commit this kind of fix.

You might consider looking at pr100837 which looks like it'd be fixed by
this change.


Ah yeah, good spot.  I'll add it to the commit message.
Just happened to be grubbing around looking for good bugs for another 
intern and stumbled over it...


Jeff

[GCC15][committed] OpenMP - backporting testcases + 2 mapping fixes

2025-05-19 Thread Tobias Burnus

The next two commits fix two mapping issues, the rest are test cases
that should have been in GCC 15 pre-branch. Namely:

r15-9707-g57f73c3956572f   OpenMP/Fortran: Fix allocatable-component mapping of derived-type array comps
r15-9706-gab9ca3a8b1af41   OpenMP: Fix mapping of zero-sized arrays with non-literal size: map(var[:n]), n = 0
r15-9705-g6f607c9174ea8c   libgomp.{c,fortran}/interop-{hip,cuda}: Fix dg-run target selection
r15-9704-g24edffe147a7e5   libgomp.fortran/map-alloc-comp-9{,-usm}.f90: Add unified_shared_memory variant
r15-9703-ge71170dc97caf1   'libgomp.c/interop-hsa.c': GCN offloading only
r15-9702-g6ae29e2c1cab22   OpenMP: Restore lost Fortran testcase for 'omp allocate'
r15-9701-gc37fa5f8e5d2f9   OpenMP, GCN: Add interop-hsa testcase
r15-9700-ge8b69eeb1c3d21   libgomp/testsuite: Fix hip_header_nvidia check, add workaround to test
r15-9699-g951d02dde2b86c   libgomp: Add additional OpenMP interop runtime tests
r15-9698-gf251e2748a9c6d   OpenMP: Add libgomp.fortran/target-enter-data-8.f90

Tobias

Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-19 Thread Jeff Law





On 5/18/25 2:24 PM, Richard Sandiford wrote:



gcc/
* genemit.cc (gen_rtx_scratch, gen_exp): Use operands[%d] rather than
operand%d.
(start_gen_insn): Store the incoming arguments to an operands array.
(gen_expand, gen_split): Remove copies into and out of the operands
array.
* config/stormy16/stormy16.md (negsi): Remove redundant assignment.

So two questions.  Is there any meaningful performance impact expected
here using the array form rather than locals?   And does this impact how
folks write their C/C++ fragments in the expanders and such?


I don't think there should be any compile-time impact, and I can't
measure one when compiling fold-const.ii -O0 (my go-to test for this).

Sounds good.



The md interface remains the same, in that all interaction is via the
operands[] array.  Any writes to the individual operandN variables
(where present) are ignored both before and after the patch.
I must have mis-read a bit then.  My mental model was we were losing the 
operandN interface.




However, I suppose this does make it possible to turn the operandN
arguments into constants, to prevent accidents.  I'll try that.
If it works, it's a good idea as I've had to chase down bugs in this 
space a few times through the years...

jeff

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 9:59 AM Tomasz Kaminski  wrote:

>
>
> On Mon, May 19, 2025 at 6:47 AM Patrick Palka  wrote:
> I would appreciate a short explanation of the approach in the commit
> message, e.g. that passing -1 means the size is not known.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>>
> As for the non-stylistic changes, I noticed that we need some explicit
> conversions between different types.
> I think we should add a test that checks the case where the size type is
> __max_diff_t; maybe we should add such a range to
> testutils_iterators.
> The rest of the comments are mostly stylistic.
>
>
>> -- >8 --
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>> Define.
>> (__ends_with_fn, ends_with): Define.
>> * include/bits/version.def (ranges_starts_ends_with): Define.
>> * include/bits/version.h: Regenerate.
>> * include/std/algorithm: Provide
>> __cpp_lib_ranges_starts_ends_with.
>> * src/c++23/std.cc.in (ranges::starts_with): Export.
>> (ranges::ends_with): Export.
>> * testsuite/25_algorithms/ends_with/1.cc: New test.
>> * testsuite/25_algorithms/starts_with/1.cc: New test.
>> ---
>>  libstdc++-v3/include/bits/ranges_algo.h   | 232 ++
>>  libstdc++-v3/include/bits/version.def |   8 +
>>  libstdc++-v3/include/bits/version.h   |  10 +
>>  libstdc++-v3/include/std/algorithm|   1 +
>>  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>>  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>>  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>>  7 files changed, 512 insertions(+)
>>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
>>
>> diff --git a/libstdc++-v3/include/bits/ranges_algo.h
>> b/libstdc++-v3/include/bits/ranges_algo.h
>> index f36e7dd59911..c59a555f528a 100644
>> --- a/libstdc++-v3/include/bits/ranges_algo.h
>> +++ b/libstdc++-v3/include/bits/ranges_algo.h
>> @@ -438,6 +438,238 @@ namespace ranges
>>
>>inline constexpr __search_n_fn search_n{};
>>
>> +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
>> +  struct __starts_with_fn
>> +  {
>> +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
>> +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>> +typename _Pred = ranges::equal_to,
>> +typename _Proj1 = identity, typename _Proj2 = identity>
>> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1,
>> _Proj2>
>> +  constexpr bool
>> +  operator()(_Iter1 __first1, _Sent1 __last1,
>> +_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
>> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>> +  {
>>
> We could check __first2 == __last2 early here, before even computing size.

> +   iter_difference_t<_Iter1> __n1 = -1;
>> +   iter_difference_t<_Iter2> __n2 = -1;
>> +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
>> + __n1 = __last1 - __first1;
>> +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
>> + __n2 = __last2 - __first2;
>> +   return _S_impl(std::move(__first1), __last1,
>> +  std::move(__first2), __last2,
>> +  std::move(__pred),
>> +  std::move(__proj1), std::move(__proj2),
>> +  __n1, __n2);
>> +  }
>> +
>> +template> +typename _Pred = ranges::equal_to,
>> +typename _Proj1 = identity, typename _Proj2 = identity>
>> +  requires indirectly_comparable,
>> iterator_t<_Range2>,
>> +_Pred, _Proj1, _Proj2>
>> +  constexpr bool
>> +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
>> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>> +  {
>>
> Similar here:
auto __first2 = ranges::begin(__r2);
auto __last2 = ranges::end(__r2);
if (__first2 == __last2)
  return true;

And then move __first2 into the _S_impl call.


> +   range_difference_t<_Range1> __n1 = -1;
>> +   range_difference_t<_Range1> __n2 = -1;
>> +   if constexpr (sized_range<_Range1>)
>> + __n1 = ranges::size(__r1);
>> +   if constexpr (sized_range<_Range2>)
>> + __n2 = ranges::size(__r2);
>> +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
>> +  ranges::begin(__r2), ranges::end(__r2),
>> +  std::move(__pred),
>> +  std::move(__proj1), std::move(__proj2),
>> +  __n1, __n2);
>> +  }
>> +
>> +template> _Sent2,
>> +typename _Pred,
>> +typename _Proj1, typename _Proj2>
>> +  static constexpr bool
>> +  _S_impl(_Iter1 __first1, _Sent1 __last1,
>>
> I think I would make this function private.  The user should not look into
> wrappers,
> but it do

Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-19 Thread Jeff Law





On 5/19/25 3:16 AM, Richard Sandiford wrote:

Richard Sandiford  writes:

Jeff Law  writes:

So two questions.  Is there any meaningful performance impact expected
here using the array form rather than locals?   And does this impact how
folks write their C/C++ fragments in the expanders and such?


I don't think there should be any compile-time impact, and I can't
measure one when compiling fold-const.ii -O0 (my go-to test for this).

The md interface remains the same, in that all interaction is via the
operands[] array.  Any writes to the individual operandN variables
(where present) are ignored both before and after the patch.

However, I suppose this does make it possible to turn the operandN
arguments into constants, to prevent accidents.  I'll try that.


The only problem case seemed to be sparc.md, which uses operandN
variables in code that always invokes DONE:

(define_expand "zero_extendhisi2"
   [(set (match_operand:SI 0 "register_operand" "")
(zero_extend:SI (match_operand:HI 1 "register_operand" "")))]
   ""
{
   rtx temp = gen_reg_rtx (SImode);
   rtx shift_16 = GEN_INT (16);
   int op1_subbyte = 0;

   if (GET_CODE (operand1) == SUBREG)
 {
   op1_subbyte = SUBREG_BYTE (operand1);
   op1_subbyte /= GET_MODE_SIZE (SImode);
   op1_subbyte *= GET_MODE_SIZE (SImode);
   operand1 = XEXP (operand1, 0);
 }

   emit_insn (gen_ashlsi3 (temp, gen_rtx_SUBREG (SImode, operand1, op1_subbyte),
  shift_16));
   emit_insn (gen_lshrsi3 (operand0, temp, shift_16));
   DONE;
})

So I suppose the question is whether we want to continue to allow that,
or whether it would be better to flag accidental writes to operandN
instead of operands[N] for code that doesn't invoke DONE.
Given how long it's taken when I've had to track this kind of thing down 
(out of tree port) I'd say let's not allow it ;-)  If the above is the 
only known active instance in tree, let's fix and never have to think 
about that problem again.
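
To make the hazard concrete, here is a minimal sketch of the pattern under discussion. It is not the code genemit actually emits: `rtx_stub`, `gen_example_old` and `gen_example_const` are hypothetical names, and a plain `int` stands in for an rtx. It only assumes what the thread states, namely that the expander body is expected to communicate through operands[] and that writes to operandN are ignored.

```cpp
// Hypothetical stand-in for GCC's rtx; only the value identity matters here.
using rtx_stub = int;

// Shape of a generated expander whose body must communicate through
// operands[]: a write to the operandN argument is silently lost.
rtx_stub gen_example_old (rtx_stub operand0, rtx_stub operand1)
{
  rtx_stub operands[2] = { operand0, operand1 };
  operand1 = 42;        // accidental write to the argument, not operands[1]
  (void) operand1;      // silence "set but not used" warnings
  return operands[1];   // still the incoming value
}

// Declaring the arguments const turns the same accident into a compile error.
rtx_stub gen_example_const (const rtx_stub operand0, const rtx_stub operand1)
{
  rtx_stub operands[2] = { operand0, operand1 };
  // operand1 = 42;     // would be rejected: the parameter is read-only
  operands[1] = 42;     // the supported way to replace an operand
  return operands[1];
}
```

In this sketch `gen_example_old (1, 7)` returns 7 even though the body assigned 42, which is the kind of silent loss that is hard to track down in a port.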


Jeff


Richard

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

On Mon, 19 May 2025, Tomasz Kaminski wrote:

> 
> 
> On Mon, May 19, 2025 at 6:47 AM Patrick Palka  wrote:
> I would appreciate a short explanation of the approach in the commit
> message, e.g. that passing -1 means the size is not known.
> 
>   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> As for the non-stylistic changes, I noticed that we need some explicit
> conversions between different types.
> I think we should add a test that checks the case where the size type is
> __max_diff_t; maybe we should add such a range to
> testutils_iterators.
> The rest of the comments are mostly stylistic.
> 
> 
>   -- >8 --
> 
>   libstdc++-v3/ChangeLog:
> 
>           * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>           Define.
>           (__ends_with_fn, ends_with): Define.
>           * include/bits/version.def (ranges_starts_ends_with): Define.
>           * include/bits/version.h: Regenerate.
>           * include/std/algorithm: Provide 
> __cpp_lib_ranges_starts_ends_with.
>           * src/c++23/std.cc.in (ranges::starts_with): Export.
>           (ranges::ends_with): Export.
>           * testsuite/25_algorithms/ends_with/1.cc: New test.
>           * testsuite/25_algorithms/starts_with/1.cc: New test.
>   ---
>    libstdc++-v3/include/bits/ranges_algo.h       | 232 ++
>    libstdc++-v3/include/bits/version.def         |   8 +
>    libstdc++-v3/include/bits/version.h           |  10 +
>    libstdc++-v3/include/std/algorithm            |   1 +
>    libstdc++-v3/src/c++23/std.cc.in              |   4 +
>    .../testsuite/25_algorithms/ends_with/1.cc    | 129 ++
>    .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>    7 files changed, 512 insertions(+)
>    create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>    create mode 100644 
> libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> 
>   diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
>   index f36e7dd59911..c59a555f528a 100644
>   --- a/libstdc++-v3/include/bits/ranges_algo.h
>   +++ b/libstdc++-v3/include/bits/ranges_algo.h
>   @@ -438,6 +438,238 @@ namespace ranges
> 
>      inline constexpr __search_n_fn search_n{};
> 
>   +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
>   +  struct __starts_with_fn
>   +  {
>   +    template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
>   +            input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>   +            typename _Pred = ranges::equal_to,
>   +            typename _Proj1 = identity, typename _Proj2 = identity>
>   +      requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, 
> _Proj2>
>   +      constexpr bool
>   +      operator()(_Iter1 __first1, _Sent1 __last1,
>   +                _Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
>   +                _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>   +      {
>   +       iter_difference_t<_Iter1> __n1 = -1;
>   +       iter_difference_t<_Iter2> __n2 = -1;
>   +       if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
>   +         __n1 = __last1 - __first1;
>   +       if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
>   +         __n2 = __last2 - __first2;
>   +       return _S_impl(std::move(__first1), __last1,
>   +                      std::move(__first2), __last2,
>   +                      std::move(__pred),
>   +                      std::move(__proj1), std::move(__proj2),
>   +                      __n1, __n2);
>   +      }
>   +
>   +    template   +            typename _Pred = ranges::equal_to,
>   +            typename _Proj1 = identity, typename _Proj2 = identity>
>   +      requires indirectly_comparable, 
> iterator_t<_Range2>,
>   +                                    _Pred, _Proj1, _Proj2>
>   +      constexpr bool
>   +      operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
>   +                _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>   +      {
>   +       range_difference_t<_Range1> __n1 = -1;
>   +       range_difference_t<_Range1> __n2 = -1;
>   +       if constexpr (sized_range<_Range1>)
>   +         __n1 = ranges::size(__r1);
>   +       if constexpr (sized_range<_Range2>)
>   +         __n2 = ranges::size(__r2);
>   +       return _S_impl(ranges::begin(__r1), ranges::end(__r1),
>   +                      ranges::begin(__r2), ranges::end(__r2),
>   +                      std::move(__pred),
>   +                      std::move(__proj1), std::move(__proj2),
>   +                      __n1, __n2);
>   +      }
>   +
>   +    template typename _Sent2,
>   +            typename _Pred,
>   +            typename _Proj1, typename _Proj2>
>   +      static constexp

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 4:02 PM Patrick Palka  wrote:

> On Mon, 19 May 2025, Tomasz Kaminski wrote:
>
> >
> >
> > On Mon, May 19, 2025 at 6:47 AM Patrick Palka  wrote:
> > I would appreciate a short explanation of the approach in the commit
> > message, e.g. that passing -1 means the size is not known.
> >
> >   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > As for the non-stylistic changes, I noticed that we need some
> > explicit conversions between different types.
> > I think we should add a test that checks the case where the size type
> > is __max_diff_t; maybe we should add such a range to
> > testutils_iterators.
> > The rest of the comments are mostly stylistic.
> >
> >
> >   -- >8 --
> >
> >   libstdc++-v3/ChangeLog:
> >
> >   * include/bits/ranges_algo.h (__starts_with_fn,
> starts_with):
> >   Define.
> >   (__ends_with_fn, ends_with): Define.
> >   * include/bits/version.def (ranges_starts_ends_with):
> Define.
> >   * include/bits/version.h: Regenerate.
> >   * include/std/algorithm: Provide
> __cpp_lib_ranges_starts_ends_with.
> >   * src/c++23/std.cc.in (ranges::starts_with): Export.
> >   (ranges::ends_with): Export.
> >   * testsuite/25_algorithms/ends_with/1.cc: New test.
> >   * testsuite/25_algorithms/starts_with/1.cc: New test.
> >   ---
> >libstdc++-v3/include/bits/ranges_algo.h   | 232
> ++
> >libstdc++-v3/include/bits/version.def |   8 +
> >libstdc++-v3/include/bits/version.h   |  10 +
> >libstdc++-v3/include/std/algorithm|   1 +
> >libstdc++-v3/src/c++23/std.cc.in  |   4 +
> >.../testsuite/25_algorithms/ends_with/1.cc| 129 ++
> >.../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
> >7 files changed, 512 insertions(+)
> >create mode 100644
> libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
> >create mode 100644
> libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> >
> >   diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> b/libstdc++-v3/include/bits/ranges_algo.h
> >   index f36e7dd59911..c59a555f528a 100644
> >   --- a/libstdc++-v3/include/bits/ranges_algo.h
> >   +++ b/libstdc++-v3/include/bits/ranges_algo.h
> >   @@ -438,6 +438,238 @@ namespace ranges
> >
> >  inline constexpr __search_n_fn search_n{};
> >
> >   +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> >   +  struct __starts_with_fn
> >   +  {
> >   +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> >   +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> >   +typename _Pred = ranges::equal_to,
> >   +typename _Proj1 = identity, typename _Proj2 =
> identity>
> >   +  requires indirectly_comparable<_Iter1, _Iter2, _Pred,
> _Proj1, _Proj2>
> >   +  constexpr bool
> >   +  operator()(_Iter1 __first1, _Sent1 __last1,
> >   +_Iter2 __first2, _Sent2 __last2, _Pred __pred =
> {},
> >   +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> >   +  {
> >   +   iter_difference_t<_Iter1> __n1 = -1;
> >   +   iter_difference_t<_Iter2> __n2 = -1;
> >   +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> >   + __n1 = __last1 - __first1;
> >   +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> >   + __n2 = __last2 - __first2;
> >   +   return _S_impl(std::move(__first1), __last1,
> >   +  std::move(__first2), __last2,
> >   +  std::move(__pred),
> >   +  std::move(__proj1), std::move(__proj2),
> >   +  __n1, __n2);
> >   +  }
> >   +
> >   +template >   +typename _Pred = ranges::equal_to,
> >   +typename _Proj1 = identity, typename _Proj2 =
> identity>
> >   +  requires indirectly_comparable,
> iterator_t<_Range2>,
> >   +_Pred, _Proj1, _Proj2>
> >   +  constexpr bool
> >   +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred =
> {},
> >   +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> >   +  {
> >   +   range_difference_t<_Range1> __n1 = -1;
> >   +   range_difference_t<_Range1> __n2 = -1;
> >   +   if constexpr (sized_range<_Range1>)
> >   + __n1 = ranges::size(__r1);
> >   +   if constexpr (sized_range<_Range2>)
> >   + __n2 = ranges::size(__r2);
> >   +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
> >   +  ranges::begin(__r2), ranges::end(__r2),
> >   +  std::move(__pred),
> >   +  std::move(__proj1), std::move(_

[PATCH v3 2/3] sbitmap: Add bitmap_is_range_set_p function

2025-05-19 Thread Konstantinos Eleftheriou

This patch adds the `bitmap_is_range_set_p` function in sbitmap,
which checks if all the bits in a range are set. This function
calls `bitmap_bit_in_range_p_1`, which has been updated to use
the `any_inverted` parameter. When `any_inverted` is true, the helper
function checks if any of the bits in the range is unset, instead of
checking the opposite.

Function `bitmap_bit_in_range_p` has been updated to call
`bitmap_bit_in_range_p_1` with the `any_inverted` parameter
set to false, retaining its previous functionality.

Function `bitmap_is_range_set_p` calls `bitmap_bit_in_range_p_1`
with `any_inverted` set to true and returns the negation of the
result, i.e. true if all the bits in the range are set.
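
The relationship between the two queries can be sketched outside of GCC as follows. This is a simplified illustration, not the sbitmap implementation: it uses a std::vector of 64-bit words, and `word_t`, `range_mask`, `bit_in_range_1`, `any_bit_set` and `all_bits_set` are hypothetical names. The helper compares each masked word against an expected value that depends on `any_inverted`, and "all bits set" is the negation of "some bit unset".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using word_t = std::uint64_t;
constexpr unsigned word_bits = 64;

// Mask with bits LO..HI (inclusive, both below word_bits) set.
word_t range_mask (unsigned lo, unsigned hi)
{
  word_t upto_hi = ~(word_t) 0;
  if (hi + 1 < word_bits)
    upto_hi = ((word_t) 1 << (hi + 1)) - 1;
  return upto_hi - (((word_t) 1 << lo) - 1);
}

// With ANY_INVERTED false: true if any bit in START..END is set.
// With ANY_INVERTED true: true if any bit in START..END is clear.
bool bit_in_range_1 (const std::vector<word_t> &words, unsigned start,
                     unsigned end, bool any_inverted)
{
  assert (start <= end && end / word_bits < words.size ());
  const unsigned first = start / word_bits, last = end / word_bits;
  for (unsigned w = first; w <= last; ++w)
    {
      unsigned lo = (w == first) ? start % word_bits : 0;
      unsigned hi = (w == last) ? end % word_bits : word_bits - 1;
      word_t mask = range_mask (lo, hi);
      word_t expected = any_inverted ? mask : 0;
      if ((words[w] & mask) != expected)
        return true;
    }
  return false;
}

bool any_bit_set (const std::vector<word_t> &words, unsigned start,
                  unsigned end)
{
  return bit_in_range_1 (words, start, end, false);
}

bool all_bits_set (const std::vector<word_t> &words, unsigned start,
                   unsigned end)
{
  return !bit_in_range_1 (words, start, end, true);
}
```

Sharing one helper keeps the word and mask arithmetic in a single place; only the expected value per word differs between the two queries.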

gcc/ChangeLog:

* sbitmap.cc (bitmap_bit_in_range_p_1): Added the `any_inverted`
parameter and changed the logic to check if any of the bits in
the range is unset, when the value of the parameter is "true".
(bitmap_is_range_set_p): New function.
(bitmap_bit_in_range_p): Call and return the result of
`bitmap_bit_in_range_p_1` with the `any_inverted` parameter set
to false.
* sbitmap.h (bitmap_is_range_set_p): New function.

Signed-off-by: Konstantinos Eleftheriou 
---

(no changes since v1)

 gcc/sbitmap.cc | 27 ---
 gcc/sbitmap.h  |  1 +
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/gcc/sbitmap.cc b/gcc/sbitmap.cc
index 94f2bbd6c8fd..99f1db540ab6 100644
--- a/gcc/sbitmap.cc
+++ b/gcc/sbitmap.cc
@@ -326,12 +326,13 @@ bitmap_set_range (sbitmap bmap, unsigned int start, 
unsigned int count)
   bmap->elms[start_word] |= mask;
 }
 
-/* Return TRUE if any bit between START and END inclusive is set within
-   the simple bitmap BMAP.  Return FALSE otherwise.  */
+/* Helper function for bitmap_bit_in_range_p and bitmap_is_range_set_p.
+   If ANY_INVERTED is true, the function checks if any bit in the range
+   is unset.  */
 
 bool
 bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
-unsigned int end)
+unsigned int end, bool any_inverted)
 {
   gcc_checking_assert (start <= end);
   bitmap_check_index (bmap, end);
@@ -351,7 +352,8 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int 
start,
 
   SBITMAP_ELT_TYPE low_mask = ((SBITMAP_ELT_TYPE)1 << start_bitno) - 1;
   SBITMAP_ELT_TYPE mask = high_mask - low_mask;
-  if (bmap->elms[start_word] & mask)
+  const SBITMAP_ELT_TYPE expected_partial = any_inverted ? mask : 0;
+  if ((bmap->elms[start_word] & mask) != expected_partial)
return true;
   start_word++;
 }
@@ -361,9 +363,10 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int 
start,
 
   /* Now test words at a time until we hit a partial word.  */
   unsigned int nwords = (end_word - start_word);
+  const SBITMAP_ELT_TYPE expected = any_inverted ? ~(SBITMAP_ELT_TYPE)0 : 0;
   while (nwords)
 {
-  if (bmap->elms[start_word])
+  if (bmap->elms[start_word] != expected)
return true;
   start_word++;
   nwords--;
@@ -373,7 +376,17 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int 
start,
   SBITMAP_ELT_TYPE mask = ~(SBITMAP_ELT_TYPE)0;
   if (end_bitno + 1 < SBITMAP_ELT_BITS)
 mask = ((SBITMAP_ELT_TYPE)1 << (end_bitno + 1)) - 1;
-  return (bmap->elms[start_word] & mask) != 0;
+  const SBITMAP_ELT_TYPE expected_partial = any_inverted ? mask : 0;
+  return (bmap->elms[start_word] & mask) != expected_partial;
+}
+
+/* Return TRUE if all bits between START and END inclusive are set within
+   the simple bitmap BMAP.  Return FALSE otherwise.  */
+
+bool
+bitmap_is_range_set_p (const_sbitmap bmap, unsigned int start, unsigned int 
end)
+{
+  return !bitmap_bit_in_range_p_1 (bmap, start, end, true);
 }
 
 /* Return TRUE if any bit between START and END inclusive is set within
@@ -382,7 +395,7 @@ bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int 
start,
 bool
 bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int 
end)
 {
-  return bitmap_bit_in_range_p_1 (bmap, start, end);
+  return bitmap_bit_in_range_p_1 (bmap, start, end, false);
 }
 
 #if GCC_VERSION < 3400
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index 66f9e138503c..4ff93e7a98f9 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -288,6 +288,7 @@ extern bool bitmap_ior (sbitmap, const_sbitmap, 
const_sbitmap);
 extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_subset_p (const_sbitmap, const_sbitmap);
 extern bool bitmap_bit_in_range_p (const_sbitmap, unsigned int, unsigned int);
+extern bool bitmap_is_range_set_p (const_sbitmap, unsigned int, unsigned int);
 
 extern int bitmap_first_set_bit (const_sbitmap);
 extern int bitmap_last_set_bit (const_sbitmap);
-- 
2.49.0

[PATCH v3 0/3] asf: Fix ICE on emit_move_insn [PR119884]

2025-05-19 Thread Konstantinos Eleftheriou

During the base register initialization, when we are eliminating the load
instruction, we were calling `emit_move_insn` on registers of the same
size but of different mode in some cases, causing an ICE.

We update the base register initialization to use `lowpart_subreg`
instead of zero-extending the register's value, as the zero-extension
was wrong in the first place. There was an underlying issue, caused
by having multiple stores in the sequence with the same offset as the
load.

We fix this by removing stores that contribute nothing and are followed
by later ones that overwrite the values that they have written. We use
a bitmap to keep track of the bytes that each store writes to.

In order to simplify this process we are updating sbitmap with a function
that checks if all the bits in a range of the bitmap are set (similar
to `bitmap_bit_in_range_p`, which checks if any bit in a range is set) and
use this to check if all the bytes that a store writes to have
already been written (the store sequence is reversed at that stage,
starting from the stores that are closer to the load instruction).
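
As a rough illustration of the bookkeeping described above (a sketch only, not the code in avoid-store-forwarding.cc): stores are visited starting from the one closest to the load, each store marks the bytes it writes, and a store whose bytes are all already marked is redundant. `store_stub`, `forwarding_plan` and `plan_forwarding` are hypothetical names, and offsets are assumed to be relative to the start of the load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a store: byte offset within the load, and size.
struct store_stub
{
  unsigned offset;
  unsigned size;
};

struct forwarding_plan
{
  std::vector<std::size_t> redundant;  // stores fully overwritten by later ones
  bool covers_load;                    // every byte of the load is written
};

// STORES is ordered from the store closest to the load to the furthest one.
forwarding_plan plan_forwarding (const std::vector<store_stub> &stores,
                                 unsigned load_size)
{
  std::vector<bool> written (load_size, false);
  forwarding_plan plan{ {}, false };
  for (std::size_t i = 0; i < stores.size (); ++i)
    {
      const store_stub &s = stores[i];
      assert (s.size > 0 && s.offset + s.size <= load_size);
      bool all_written = true;
      for (unsigned b = s.offset; b < s.offset + s.size; ++b)
        all_written = all_written && written[b];
      if (all_written)
        {
          // A store nearer the load already provides every byte of this one.
          plan.redundant.push_back (i);
          continue;
        }
      for (unsigned b = s.offset; b < s.offset + s.size; ++b)
        written[b] = true;
    }
  plan.covers_load = std::all_of (written.begin (), written.end (),
                                  [] (bool b) { return b; });
  return plan;
}
```

Note that partially overlapping stores are kept in this sketch; only stores whose whole byte range is already covered are dropped.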

Changes in v3:
- Remove redundant stores, instead of generating a register move for
the first store that has the same offset as the load only.

Changes in v2:
- Use `lowpart_subreg` for the base register initialization, but
only for the first store that has the same offset as the load.

Changes in v1:
- Add a check for the register modes to match before calling `emit_move_insn`.

Konstantinos Eleftheriou (3):
  sbitmap: Add bitmap_bit_in_range_p_1 helper function
  sbitmap: Add bitmap_is_range_set_p function
  asf: Fix calling of emit_move_insn on registers of different modes
[PR119884]

 gcc/avoid-store-forwarding.cc| 45 ++--
 gcc/sbitmap.cc   | 35 ++
 gcc/sbitmap.h|  1 +
 gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++
 4 files changed, 78 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c

-- 
2.49.0

[PATCH v3 1/3] sbitmap: Add bitmap_bit_in_range_p_1 helper function

2025-05-19 Thread Konstantinos Eleftheriou

This patch adds the `bitmap_bit_in_range_p_1` helper function,
in order to be used by `bitmap_bit_in_range_p`. The helper function
contains the previous implementation of `bitmap_bit_in_range_p` and
`bitmap_bit_in_range_p` has been updated to call the helper function.

gcc/ChangeLog:

* sbitmap.cc (bitmap_bit_in_range_p): Call `bitmap_bit_in_range_p_1`.
(bitmap_bit_in_range_p_1): New function.

Signed-off-by: Konstantinos Eleftheriou 
---

(no changes since v1)

 gcc/sbitmap.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/sbitmap.cc b/gcc/sbitmap.cc
index df2e1aa49358..94f2bbd6c8fd 100644
--- a/gcc/sbitmap.cc
+++ b/gcc/sbitmap.cc
@@ -330,7 +330,8 @@ bitmap_set_range (sbitmap bmap, unsigned int start, 
unsigned int count)
the simple bitmap BMAP.  Return FALSE otherwise.  */
 
 bool
-bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int 
end)
+bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
+unsigned int end)
 {
   gcc_checking_assert (start <= end);
   bitmap_check_index (bmap, end);
@@ -375,6 +376,15 @@ bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int 
start, unsigned int end)
   return (bmap->elms[start_word] & mask) != 0;
 }
 
+/* Return TRUE if any bit between START and END inclusive is set within
+   the simple bitmap BMAP.  Return FALSE otherwise.  */
+
+bool
+bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int 
end)
+{
+  return bitmap_bit_in_range_p_1 (bmap, start, end);
+}
+
 #if GCC_VERSION < 3400
 /* Table of number of set bits in a character, indexed by value of char.  */
 static const unsigned char popcount_table[] =
-- 
2.49.0

[PATCH v2] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

Changes in v2:
  Addressed Tomasz's review comments, namely:
  * Added explicit iter_difference_t casts
  * Made _S_impl member private
  * Optimized sized bidirectional case of ends_with
  * Rearranged control flow of starts_with::_S_impl

Still left to do:
  * Add tests for integer-class types
  * Still working on a better commit description ;)

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__starts_with_fn, starts_with):
Define.
(__ends_with_fn, ends_with): Define.
* include/bits/version.def (ranges_starts_ends_with): Define.
* include/bits/version.h: Regenerate.
* include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
* src/c++23/std.cc.in (ranges::starts_with): Export.
(ranges::ends_with): Export.
* testsuite/25_algorithms/ends_with/1.cc: New test.
* testsuite/25_algorithms/starts_with/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 239 ++
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/algorithm|   1 +
 libstdc++-v3/src/c++23/std.cc.in  |   4 +
 .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
 .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
 7 files changed, 519 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index f36e7dd59911..f889257a8ee2 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -438,6 +438,245 @@ namespace ranges
 
   inline constexpr __search_n_fn search_n{};
 
+#if __glibcxx_ranges_starts_ends_with // C++ >= 23
+  struct __starts_with_fn
+  {
+template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
+input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
+typename _Pred = ranges::equal_to,
+typename _Proj1 = identity, typename _Proj2 = identity>
+  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Iter1 __first1, _Sent1 __last1,
+_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   iter_difference_t<_Iter1> __n1 = -1;
+   iter_difference_t<_Iter2> __n2 = -1;
+   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
+ __n1 = __last1 - __first1;
+   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
+ __n2 = __last2 - __first2;
+   return _S_impl(std::move(__first1), __last1, __n1,
+  std::move(__first2), __last2, __n2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+  }
+
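+template<input_range _Range1, input_range _Range2,
+typename _Pred = ranges::equal_to,
+typename _Proj1 = identity, typename _Proj2 = identity>
+  requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,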
+_Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   range_difference_t<_Range1> __n1 = -1;
+   range_difference_t<_Range1> __n2 = -1;
+   if constexpr (sized_range<_Range1>)
+ __n1 = ranges::size(__r1);
+   if constexpr (sized_range<_Range2>)
+ __n2 = ranges::size(__r2);
+   return _S_impl(ranges::begin(__r1), ranges::end(__r1), __n1,
+  ranges::begin(__r2), ranges::end(__r2), __n2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+  }
+
+  private:
+template
+  static constexpr bool
+  _S_impl(_Iter1 __first1, _Sent1 __last1, iter_difference_t<_Iter1> __n1,
+ _Iter2 __first2, _Sent2 __last2, iter_difference_t<_Iter2> __n2,
+ _Pred __pred,
+ _Proj1 __proj1, _Proj2 __proj2)
+  {
+   if (__first2 == __last2)
+ return true;
+   else if (__n1 == -1 || __n2 == -1)
+ return ranges::mismatch(std::move(__first1), __last1,
+ std::move(__first2), __last2,
+ std::move(__pred),
+ std::move(__proj1), std::move(__proj2)).in2 
== __last2;
+   else if (__n1 < __n2)
+ return false;
+   else
+ {
+   if constexpr (random_access_iterator<_Iter1>)
+ return ranges::equal(__first1, __first1 + 
iter_difference_t<_Iter1>(__n2),
+  std::move(__first2), __last2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+   else
+ return ranges::equal(counted_iterator(std::move(__first1),
+   
iter_dif

[PATCH v3 3/3] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-19 Thread Konstantinos Eleftheriou

This patch uses `lowpart_subreg` for the base register initialization,
instead of zero-extending it. We had tried this solution before, but
we were leaving undefined bytes in the upper part of the register.
This shouldn't be happening as we are supposed to write the whole
register when the load is eliminated. This was occurring when having
multiple stores with the same offset as the load, generating a
register move for all of them, overwriting the bit inserts that
were inserted before them.

In order to overcome this, we are removing redundant stores from the sequence,
i.e. stores that write to addresses that will be overwritten by stores that
come after them in the sequence. We are using the same bitmap that is used
for the load elimination check, to keep track of the bytes that are written
by each store.

Also, we now allow the load to be eliminated even when there are
overlaps between the stores, as there is no obvious reason why we shouldn't;
we just want the stores to cover all of the load's bytes.

Bootstrapped/regtested on AArch64 and x86_64.

PR rtl-optimization/119884

gcc/ChangeLog:

* avoid-store-forwarding.cc (process_store_forwarding):
Use `lowpart_subreg` for the base register initialization,
and remove redundant stores from the store/load sequence.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr119884.c: New test.

Signed-off-by: Konstantinos Eleftheriou 
---

Changes in v3:
- Remove redundant stores, instead of generating a register move for
the first store that has the same offset as the load only.

Changes in v2:
- Use `lowpart_subreg` for the base register initialization, but
only for the first store that has the same offset as the load.

Changes in v1:
- Add a check for the register modes to match before calling `emit_move_insn`.

 gcc/avoid-store-forwarding.cc| 45 ++--
 gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++
 2 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c

diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
index 5d960adec359..f88a001e5717 100644
--- a/gcc/avoid-store-forwarding.cc
+++ b/gcc/avoid-store-forwarding.cc
@@ -176,20 +176,28 @@ process_store_forwarding (vec<store_fwd_info> &stores, 
rtx_insn *load_insn,
   /* Memory sizes should be constants at this stage.  */
   HOST_WIDE_INT load_size = MEM_SIZE (load_mem).to_constant ();
 
-  /* If the stores cover all the bytes of the load without overlap then we can
- eliminate the load entirely and use the computed value instead.  */
+  /* If the stores cover all the bytes of the load, then we can eliminate
+ the load entirely and use the computed value instead.
+ We can also eliminate stores on addresses that are overwritten
+ by later stores.  */
 
   sbitmap forwarded_bytes = sbitmap_alloc (load_size);
   bitmap_clear (forwarded_bytes);
 
   unsigned int i;
   store_fwd_info* it;
+  auto_vec<store_fwd_info> redundant_stores;
+  auto_vec<int> store_ind_to_remove;
   FOR_EACH_VEC_ELT (stores, i, it)
 {
   HOST_WIDE_INT store_size = MEM_SIZE (it->store_mem).to_constant ();
-  if (bitmap_bit_in_range_p (forwarded_bytes, it->offset,
+  if (bitmap_is_range_set_p (forwarded_bytes, it->offset,
 it->offset + store_size - 1))
-   break;
+   {
+ redundant_stores.safe_push (*it);
+ store_ind_to_remove.safe_push (i);
+ continue;
+   }
   bitmap_set_range (forwarded_bytes, it->offset, store_size);
 }
 
@@ -215,6 +223,11 @@ process_store_forwarding (vec &stores, 
rtx_insn *load_insn,
fprintf (dump_file, "(Load elimination candidate)\n");
 }
 
+  /* Remove redundant stores from the vector.  */
+  store_ind_to_remove.reverse ();
+  for (int i : store_ind_to_remove)
+stores.ordered_remove (i);
+
   rtx load = single_set (load_insn);
   rtx dest;
 
@@ -231,18 +244,16 @@ process_store_forwarding (vec &stores, 
rtx_insn *load_insn,
 {
   it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
   rtx_insn *insns = NULL;
+  const bool has_zero_offset = it->offset == 0;
 
   /* If we're eliminating the load then find the store with zero offset
 and use it as the base register to avoid a bit insert if possible.  */
-  if (load_elim && it->offset == 0)
+  if (load_elim && has_zero_offset)
{
  start_sequence ();
 
- machine_mode dest_mode = GET_MODE (dest);
- rtx base_reg = it->mov_reg;
- if (known_gt (GET_MODE_BITSIZE (dest_mode),
-   GET_MODE_BITSIZE (GET_MODE (it->mov_reg))))
-   base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
+ rtx base_reg = lowpart_subreg (GET_MODE (dest), it->mov_reg,
+GET_MODE (it->mov_reg));
 
  if (base_reg)
{
@@ -380,6 +391,16 @@ process_store_forwarding (vec &stores, 
rtx_insn *load_i

[PATCH] c++/modules: Always mark tinfo vars as TREE_ADDRESSABLE [PR120350]

2025-05-19 Thread Nathaniel Shead

Regtested on x86_64-pc-linux-gnu (so far just modules.exp), OK for
trunk if full bootstrap+regtest succeeds?  And maybe 15?

-- >8 --

We need to mark type info decls as addressable if we take them by
reference; this is done by walking the declaration during parsing and
marking the decl as needed.

However, with modules we don't stream tinfo decls directly; rather we
stream just their name and type and reconstruct them in the importer
directly.  This means that any addressable flags are not propagated, and
we error because TREE_ADDRESSABLE is not set despite taking its address.

But tinfo decls should always have TREE_ADDRESSABLE set, as any attempt
to use the tinfo decl will go through build_address anyway.  So this
patch fixes the issue by eagerly marking the constructed decl as
TREE_ADDRESSABLE so that modules gets this flag correctly set as well.

PR c++/120350

gcc/cp/ChangeLog:

* rtti.cc (get_tinfo_decl_direct): Mark TREE_ADDRESSABLE.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tinfo-3_a.H: New test.
* g++.dg/modules/tinfo-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/rtti.cc   | 1 +
 gcc/testsuite/g++.dg/modules/tinfo-3_a.H | 7 +++
 gcc/testsuite/g++.dg/modules/tinfo-3_b.C | 8 
 3 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/tinfo-3_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/tinfo-3_b.C

diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc
index 18bc479dc50..c06a18b3ff1 100644
--- a/gcc/cp/rtti.cc
+++ b/gcc/cp/rtti.cc
@@ -468,6 +468,7 @@ get_tinfo_decl_direct (tree type, tree name, int pseudo_ix)
   DECL_IGNORED_P (d) = 1;
   TREE_READONLY (d) = 1;
   TREE_STATIC (d) = 1;
+  TREE_ADDRESSABLE (d) = 1;
   /* Tell equal_address_to that different tinfo decls never
 overlap.  */
   if (vec_safe_is_empty (unemitted_tinfo_decls))
diff --git a/gcc/testsuite/g++.dg/modules/tinfo-3_a.H 
b/gcc/testsuite/g++.dg/modules/tinfo-3_a.H
new file mode 100644
index 000..8b53e9848b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tinfo-3_a.H
@@ -0,0 +1,7 @@
+// PR c++/120350
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+#include <typeinfo>
+struct S {};
+inline const std::type_info& tinfo = typeid(S);
diff --git a/gcc/testsuite/g++.dg/modules/tinfo-3_b.C 
b/gcc/testsuite/g++.dg/modules/tinfo-3_b.C
new file mode 100644
index 000..95e02ab5c81
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tinfo-3_b.C
@@ -0,0 +1,8 @@
+// PR c++/120350
+// { dg-additional-options "-fmodules" }
+
+import "tinfo-3_a.H";
+
+int main() {
+  return tinfo == typeid(int);
+}
-- 
2.47.0

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 4:05 PM Patrick Palka  wrote:

> On Mon, 19 May 2025, Tomasz Kaminski wrote:
>
> >
> >
> > On Mon, May 19, 2025 at 9:59 AM Tomasz Kaminski 
> wrote:
> >
> >
> > On Mon, May 19, 2025 at 6:47 AM Patrick Palka  wrote:
> > I would appreciate a short explanation on the approach being put here,
> > in the message. Like passing -1 as a means of saying the size is not known.
> >
> >   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > From the non-stylistic changes, I have noticed that we need some
> explicit conversion between different types.
> > I think we should add test that would check with size being
> __max_diff_t, maybe we should add such a range to
> > testutils_iterators.
> > The rest of the comments are mostly stylistic.
> >
> >
> >   -- >8 --
> >
> >   libstdc++-v3/ChangeLog:
> >
> >   * include/bits/ranges_algo.h (__starts_with_fn,
> starts_with):
> >   Define.
> >   (__ends_with_fn, ends_with): Define.
> >   * include/bits/version.def (ranges_starts_ends_with):
> Define.
> >   * include/bits/version.h: Regenerate.
> >   * include/std/algorithm: Provide
> __cpp_lib_ranges_starts_ends_with.
> >   * src/c++23/std.cc.in (ranges::starts_with): Export.
> >   (ranges::ends_with): Export.
> >   * testsuite/25_algorithms/ends_with/1.cc: New test.
> >   * testsuite/25_algorithms/starts_with/1.cc: New test.
> >   ---
> >libstdc++-v3/include/bits/ranges_algo.h   | 232
> ++
> >libstdc++-v3/include/bits/version.def |   8 +
> >libstdc++-v3/include/bits/version.h   |  10 +
> >libstdc++-v3/include/std/algorithm|   1 +
> >libstdc++-v3/src/c++23/std.cc.in  |   4 +
> >.../testsuite/25_algorithms/ends_with/1.cc| 129 ++
> >.../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
> >7 files changed, 512 insertions(+)
> >create mode 100644
> libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
> >create mode 100644
> libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> >
> >   diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> b/libstdc++-v3/include/bits/ranges_algo.h
> >   index f36e7dd59911..c59a555f528a 100644
> >   --- a/libstdc++-v3/include/bits/ranges_algo.h
> >   +++ b/libstdc++-v3/include/bits/ranges_algo.h
> >   @@ -438,6 +438,238 @@ namespace ranges
> >
> >  inline constexpr __search_n_fn search_n{};
> >
> >   +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> >   +  struct __starts_with_fn
> >   +  {
> >   +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> >   +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> >   +typename _Pred = ranges::equal_to,
> >   +typename _Proj1 = identity, typename _Proj2 =
> identity>
> >   +  requires indirectly_comparable<_Iter1, _Iter2, _Pred,
> _Proj1, _Proj2>
> >   +  constexpr bool
> >   +  operator()(_Iter1 __first1, _Sent1 __last1,
> >   +_Iter2 __first2, _Sent2 __last2, _Pred __pred =
> {},
> >   +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> >   +  {
> >
> > We could check __first2 == __last2 early here, before even computing size.
>
> Hmm, computing the size should be very cheap though?

It's a bit complex for cartesian_view of sized views.

> I'm not sure this
> first would be worthwhile since an empty needle is a very uncommon case.
>
Then I would mark it with [[unlikely]] to indicate so.
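
A rough sketch of the control flow under discussion, for illustration only: `starts_with_sketch` is a hypothetical free function, not the `_S_impl` from the patch. It drops the predicate and projections, takes sizes as `std::ptrdiff_t` with -1 meaning "size not known", and uses the classic `std::mismatch` instead of `ranges::mismatch`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of the starts_with logic.  N1 and N2 are
// the sizes of the two ranges when known, or -1 when not known.
template<typename It1, typename It2>
bool
starts_with_sketch (It1 first1, It1 last1, std::ptrdiff_t n1,
                    It2 first2, It2 last2, std::ptrdiff_t n2)
{
  // An empty needle is a prefix of anything; hinted as the rare case.
  if (first2 == last2) [[unlikely]]
    return true;

  // With both sizes known, a needle longer than the haystack cannot match.
  if (n1 != -1 && n2 != -1 && n1 < n2)
    return false;

  // Otherwise compare element-wise; the needle is a prefix exactly when
  // the mismatch point reaches its end.
  return std::mismatch (first1, last1, first2, last2).second == last2;
}
```

The empty-needle test comes first here, so it runs before either size is inspected.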

>
> > +   iter_difference_t<_Iter1> __n1 = -1;
> > +   iter_difference_t<_Iter2> __n2 = -1;
> > +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> > + __n1 = __last1 - __first1;
> > +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> > + __n2 = __last2 - __first2;
> > +   return _S_impl(std::move(__first1), __last1,
> > +  std::move(__first2), __last2,
> > +  std::move(__pred),
> > +  std::move(__proj1),
> std::move(__proj2),
> > +  __n1, __n2);
> > +  }
> > +
> > +template<input_range _Range1, input_range _Range2,
> > +typename _Pred = ranges::equal_to,
> > +typename _Proj1 = identity, typename _Proj2 =
> identity>
> > +  requires indirectly_comparable<iterator_t<_Range1>,
> iterator_t<_Range2>,
> > +_Pred, _Proj1, _Proj2>
> > +  constexpr bool
> > +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred
> __pred = {},
> > +_Proj1 __proj1 = {}, _Proj2 __proj2 = {})
> const
> > +  {
> >
> > Similar here:

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 4:09 PM Tomasz Kaminski  wrote:

>
>
> On Mon, May 19, 2025 at 4:02 PM Patrick Palka  wrote:
>
>> On Mon, 19 May 2025, Tomasz Kaminski wrote:
>>
>> >
>> >
>> > On Mon, May 19, 2025 at 6:47 AM Patrick Palka 
>> wrote:
>> > I would appreciate a short explanation on the approach being put here,
>> > in the message. Like passing -1 as a means of saying the size is not known.
>> >
>> >   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>> >
>> > From the non-stylistic changes, I have noticed that we need some
>> explicit conversion between different types.
>> > I think we should add test that would check with size being
>> __max_diff_t, maybe we should add such a range to
>> > testutils_iterators.
>> > The rest of the comments are mostly stylistic.
>> >
>> >
>> >   -- >8 --
>> >
>> >   libstdc++-v3/ChangeLog:
>> >
>> >   * include/bits/ranges_algo.h (__starts_with_fn,
>> starts_with):
>> >   Define.
>> >   (__ends_with_fn, ends_with): Define.
>> >   * include/bits/version.def (ranges_starts_ends_with):
>> Define.
>> >   * include/bits/version.h: Regenerate.
>> >   * include/std/algorithm: Provide
>> __cpp_lib_ranges_starts_ends_with.
>> >   * src/c++23/std.cc.in (ranges::starts_with): Export.
>> >   (ranges::ends_with): Export.
>> >   * testsuite/25_algorithms/ends_with/1.cc: New test.
>> >   * testsuite/25_algorithms/starts_with/1.cc: New test.
>> >   ---
>> >libstdc++-v3/include/bits/ranges_algo.h   | 232
>> ++
>> >libstdc++-v3/include/bits/version.def |   8 +
>> >libstdc++-v3/include/bits/version.h   |  10 +
>> >libstdc++-v3/include/std/algorithm|   1 +
>> >libstdc++-v3/src/c++23/std.cc.in  |   4 +
>> >.../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>> >.../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>> >7 files changed, 512 insertions(+)
>> >create mode 100644
>> libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>> >create mode 100644
>> libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
>> >
>> >   diff --git a/libstdc++-v3/include/bits/ranges_algo.h
>> b/libstdc++-v3/include/bits/ranges_algo.h
>> >   index f36e7dd59911..c59a555f528a 100644
>> >   --- a/libstdc++-v3/include/bits/ranges_algo.h
>> >   +++ b/libstdc++-v3/include/bits/ranges_algo.h
>> >   @@ -438,6 +438,238 @@ namespace ranges
>> >
>> >  inline constexpr __search_n_fn search_n{};
>> >
>> >   +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
>> >   +  struct __starts_with_fn
>> >   +  {
>> >   +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
>> >   +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>> >   +typename _Pred = ranges::equal_to,
>> >   +typename _Proj1 = identity, typename _Proj2 =
>> identity>
>> >   +  requires indirectly_comparable<_Iter1, _Iter2, _Pred,
>> _Proj1, _Proj2>
>> >   +  constexpr bool
>> >   +  operator()(_Iter1 __first1, _Sent1 __last1,
>> >   +_Iter2 __first2, _Sent2 __last2, _Pred __pred =
>> {},
>> >   +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>> >   +  {
>> >   +   iter_difference_t<_Iter1> __n1 = -1;
>> >   +   iter_difference_t<_Iter2> __n2 = -1;
>> >   +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
>> >   + __n1 = __last1 - __first1;
>> >   +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
>> >   + __n2 = __last2 - __first2;
>> >   +   return _S_impl(std::move(__first1), __last1,
>> >   +  std::move(__first2), __last2,
>> >   +  std::move(__pred),
>> >   +  std::move(__proj1), std::move(__proj2),
>> >   +  __n1, __n2);
>> >   +  }
>> >   +
>> >   +template<input_range _Range1, input_range _Range2,
>> >   +typename _Pred = ranges::equal_to,
>> >   +typename _Proj1 = identity, typename _Proj2 =
>> identity>
>> >   +  requires indirectly_comparable<iterator_t<_Range1>,
>> iterator_t<_Range2>,
>> >   +_Pred, _Proj1, _Proj2>
>> >   +  constexpr bool
>> >   +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred =
>> {},
>> >   +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>> >   +  {
>> >   +   range_difference_t<_Range1> __n1 = -1;
>> >   +   range_difference_t<_Range1> __n2 = -1;
>> >   +   if constexpr (sized_range<_Range1>)
>> >   + __n1 = ranges::size(__r1);
>> >   +   if constexpr (sized_range<_Range2>)
>> >   + __n2 = ranges::size(__r2);
>> >   +   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
>> >   +

Re: [PATCH v2] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-19 Thread Konstantinos Eleftheriou

Hi Richard, thanks again for the feedback.

We have sent an updated version, which removes the redundant store
operations (https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684096.html).
This one is a series as we needed to update sbitmap with a function
that checks if a range in the bitmap is set. It could be done in other
ways, but this seems cleaner.
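
To make the distinction concrete, here is a toy sketch of the two range queries. It assumes the new function answers "is every bit in the range set", while the existing `bitmap_bit_in_range_p` answers "is any bit in the range set". `bitmap_sketch` and both function names are hypothetical and far simpler than GCC's sbitmap, which works on whole words with masks rather than bit by bit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature bitmap, not GCC's sbitmap.
struct bitmap_sketch
{
  std::vector<std::uint64_t> words;

  explicit bitmap_sketch (unsigned nbits) : words ((nbits + 63) / 64, 0) {}

  void set (unsigned bit)
  { words[bit / 64] |= std::uint64_t (1) << (bit % 64); }

  bool test (unsigned bit) const
  { return (words[bit / 64] >> (bit % 64)) & 1; }
};

// True if at least one bit in [START, END] (inclusive) is set.
bool
any_bit_in_range (const bitmap_sketch &b, unsigned start, unsigned end)
{
  assert (start <= end);
  for (unsigned i = start; i <= end; ++i)
    if (b.test (i))
      return true;
  return false;
}

// True only if every bit in [START, END] (inclusive) is set.
bool
all_bits_in_range (const bitmap_sketch &b, unsigned start, unsigned end)
{
  assert (start <= end);
  for (unsigned i = start; i <= end; ++i)
    if (!b.test (i))
      return false;
  return true;
}
```

For redundant-store detection the "all bits" form is the relevant one: a store can be dropped only if every byte it writes is already covered.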

On Thu, May 8, 2025 at 5:55 PM Richard Sandiford
 wrote:
>
> Konstantinos Eleftheriou  writes:
> > During the base register initialization, when we are eliminating the load
> > instruction, we were calling `emit_move_insn` on registers of the same
> > size but of different mode in some cases, causing an ICE.
> >
> > This patch uses `lowpart_subreg` for the base register initialization,
> > instead of zero-extending it. We had tried this solution before, but
> > we were leaving undefined bytes in the upper part of the register.
> > This shouldn't be happening as we are supposed to write the whole
> > register when the load is eliminated. This was occurring when having
> > multiple stores with the same offset as the load, generating a
> > register move for all of them, overwriting the bit inserts that
> > were inserted before them. With this patch we are generating a register
> > move only for the first store of this kind, using a bit insert for the
> > rest of them.
>
> That feels wrong though.  If there are multiple stores to the same offset
> then it becomes a question of which bytes of which stores survive until
> the load.  E.g. for a QI store followed by an HI store followed by an SI
> store, the final SI store wins and the previous ones should be ignored.
> If it's QI, SI, HI, then for little endian, the low two bytes come from
> the HI and the next two bytes come from the SI.  The QI store should
> again be ignored.
>
> So I would expect this to depend on which store is widest, with ties
> broken by picking later stores (i.e. those earlier in the list).
>
> I'm also not sure why this is only a problem with using lowparts.
> Wouldn't the same issue apply when using zero_extend?  The bytes are
> fully-defined for zero_extend, but not necessarily to the right values.
>
This is an issue with zero_extend too. Zero-extending was a wrong fix
in the first place, it just masked the issue that we had in an older
PR.
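
The QI/HI/SI example quoted above can be checked with a small simulation. This is only a sketch: `byte_owners` is a hypothetical helper, all stores are assumed to start at the same address as the load, and sizes are given in bytes in program order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For a load of LOAD_SIZE bytes, return for each byte address the index
// (in program order) of the store that last wrote it, or -1 if no store
// wrote it.  Assumption: every store starts at the load's address.
std::vector<int>
byte_owners (const std::vector<std::size_t> &store_sizes,
             std::size_t load_size)
{
  std::vector<int> owner (load_size, -1);
  for (std::size_t i = 0; i < store_sizes.size (); ++i)
    {
      assert (store_sizes[i] > 0);
      // A later store overwrites whatever earlier stores left behind.
      for (std::size_t b = 0; b < store_sizes[i] && b < load_size; ++b)
        owner[b] = static_cast<int> (i);
    }
  return owner;
}
```

For sizes {1, 2, 4} (QI, HI, SI) every byte belongs to store 2, the SI store. For {1, 4, 2} (QI, SI, HI) bytes 0-1 belong to store 2 (HI) and bytes 2-3 to store 1 (SI). In both cases the QI store owns nothing. On a little-endian target the lowest addresses hold the low-order bytes, which matches the description above.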

Thanks,
Konstantinos

Re: [PATCH 5/5 v2] c++, coroutines: Clean up the ramp cleanups.

2025-05-19 Thread Jason Merrill


On 5/16/25 10:15 AM, Iain Sandoe wrote:

Hi Jason,


+   = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
+  do_fr_cleanup = build2_loc (loc, TRUTH_AND_EXPR, boolean_type_node,
+ do_fr_cleanup, coro_before_return);



This also needs reversing (and similarly below).


Fixed.


+  tree fr_cleanup_if = begin_if_stmt ();
+  finish_if_stmt_cond (do_fr_cleanup, fr_cleanup_if);
+  finish_expr_stmt (delete_frame_call);
+  finish_then_clause (fr_cleanup_if);
+  finish_if_stmt (fr_cleanup_if);



You could build a COND_EXPR instead of taking several statements to build an 
IF_STMT?  i.e.

frame_cleanup = build3 (COND_EXPR, void_type_node, fr_cleanup_if,
   delete_frame_call, void_node);


done.

OK for trunk now?
thanks
Iain

+  /* deref the frame pointer, to use in member access code.  */


Let's capitalize this comment while we're moving it.  OK with that tweak.

Jason

Re: [PATCH v2] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

On Mon, 19 May 2025, Patrick Palka wrote:

> Changes in v2:
>   Addressed Tomasz's review comments, namely:
>   * Added explicit iter_difference_t casts
>   * Made _S_impl member private
>   * Optimized sized bidirectional case of ends_with
>   * Rearranged control flow of starts_with::_S_impl
> 
> Still left to do:
>   * Add tests for integer-class types
>   * Still working on a better commit description ;)
> 
> -- >8 --
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>   Define.
>   (__ends_with_fn, ends_with): Define.
>   * include/bits/version.def (ranges_starts_ends_with): Define.
>   * include/bits/version.h: Regenerate.
>   * include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
>   * src/c++23/std.cc.in (ranges::starts_with): Export.
>   (ranges::ends_with): Export.
>   * testsuite/25_algorithms/ends_with/1.cc: New test.
>   * testsuite/25_algorithms/starts_with/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 239 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/algorithm|   1 +
>  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>  7 files changed, 519 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> 
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index f36e7dd59911..f889257a8ee2 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -438,6 +438,245 @@ namespace ranges
>  
>inline constexpr __search_n_fn search_n{};
>  
> +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> +  struct __starts_with_fn
> +  {
> +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> +  input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +  typename _Pred = ranges::equal_to,
> +  typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Iter1 __first1, _Sent1 __last1,
> +  _Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
> +  _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> + iter_difference_t<_Iter1> __n1 = -1;
> + iter_difference_t<_Iter2> __n2 = -1;
> + if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> +   __n1 = __last1 - __first1;
> + if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> +   __n2 = __last2 - __first2;
> + return _S_impl(std::move(__first1), __last1, __n1,
> +std::move(__first2), __last2, __n2,
> +std::move(__pred),
> +std::move(__proj1), std::move(__proj2));
> +  }
> +
> +template<input_range _Range1, input_range _Range2,
> +  typename _Pred = ranges::equal_to,
> +  typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<iterator_t<_Range1>,
> iterator_t<_Range2>,
> +  _Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> +  _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> + range_difference_t<_Range1> __n1 = -1;
> + range_difference_t<_Range1> __n2 = -1;
> + if constexpr (sized_range<_Range1>)
> +   __n1 = ranges::size(__r1);
> + if constexpr (sized_range<_Range2>)
> +   __n2 = ranges::size(__r2);
> + return _S_impl(ranges::begin(__r1), ranges::end(__r1), __n1,
> +ranges::begin(__r2), ranges::end(__r2), __n2,
> +std::move(__pred),
> +std::move(__proj1), std::move(__proj2));
> +  }
> +
> +  private:
> +template _Sent2,
> +  typename _Pred,
> +  typename _Proj1, typename _Proj2>
> +  static constexpr bool
> +  _S_impl(_Iter1 __first1, _Sent1 __last1, iter_difference_t<_Iter1> 
> __n1,
> +   _Iter2 __first2, _Sent2 __last2, iter_difference_t<_Iter2> __n2,
> +   _Pred __pred,
> +   _Proj1 __proj1, _Proj2 __proj2)
> +  {
> + if (__first2 == __last2)
> +   return true;
> + else if (__n1 == -1 || __n2 == -1)
> +   return ranges::mismatch(std::move(__first1), __last1,
> +   std::move(__first2), __last2,
> +   std::move(__pred),
> +   std::move(__proj1), std::move(__proj2)).in2 
> == __last2;
> + else if (__n1 < __n2)
> +   return false;
> + else
> +   {
> + if constexpr (random_access_iterator<_Iter1>)

Oops, I forgot to also flatten this nested if. Consider that fixed

>

Re: [PATCH v3 1/3] sbitmap: Add bitmap_bit_in_range_p_1 helper function

2025-05-19 Thread Philipp Tomsich

On Mon, 19 May 2025 at 16:10, Konstantinos Eleftheriou
 wrote:
>
> This patch adds the `bitmap_bit_in_range_p_1` helper function,
> in order to be used by `bitmap_bit_in_range_p`. The helper function
> contains the previous implementation of `bitmap_bit_in_range_p` and
> `bitmap_bit_in_range_p` has been updated to call the helper function.
>
> gcc/ChangeLog:
>
> * sbitmap.cc (bitmap_bit_in_range_p): Call `bitmap_bit_in_range_p_1`.
> (bitmap_bit_in_range_p_1): New function.
>
> Signed-off-by: Konstantinos Eleftheriou 
> ---
>
> (no changes since v1)
>
>  gcc/sbitmap.cc | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sbitmap.cc b/gcc/sbitmap.cc
> index df2e1aa49358..94f2bbd6c8fd 100644
> --- a/gcc/sbitmap.cc
> +++ b/gcc/sbitmap.cc
> @@ -330,7 +330,8 @@ bitmap_set_range (sbitmap bmap, unsigned int start, 
> unsigned int count)
> the simple bitmap BMAP.  Return FALSE otherwise.  */
>
>  bool

Make this "static" (but no need to send a new version just for this).
I missed this in the internal reviews.

> -bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int 
> end)
> +bitmap_bit_in_range_p_1 (const_sbitmap bmap, unsigned int start,
> +unsigned int end)
>  {
>gcc_checking_assert (start <= end);
>bitmap_check_index (bmap, end);
> @@ -375,6 +376,15 @@ bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int 
> start, unsigned int end)
>return (bmap->elms[start_word] & mask) != 0;
>  }
>
> +/* Return TRUE if any bit between START and END inclusive is set within
> +   the simple bitmap BMAP.  Return FALSE otherwise.  */
> +
> +bool
> +bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int 
> end)
> +{
> +  return bitmap_bit_in_range_p_1 (bmap, start, end);
> +}
> +
>  #if GCC_VERSION < 3400
>  /* Table of number of set bits in a character, indexed by value of char.  */
>  static const unsigned char popcount_table[] =
> --
> 2.49.0
>

Re: [PATCH v2] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Tomasz Kaminski

On Mon, May 19, 2025 at 4:11 PM Patrick Palka  wrote:

> Changes in v2:
>   Addressed Tomasz's review comments, namely:
>   * Added explicit iter_difference_t casts
>   * Made _S_impl member private
>   * Optimized sized bidirectional case of ends_with
>   * Rearranged control flow of starts_with::_S_impl
>
> Still left to do:
>   * Add tests for integer-class types
>   * Still working on a better commit description ;)
>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
> Define.
> (__ends_with_fn, ends_with): Define.
> * include/bits/version.def (ranges_starts_ends_with): Define.
> * include/bits/version.h: Regenerate.
> * include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
> * src/c++23/std.cc.in (ranges::starts_with): Export.
> (ranges::ends_with): Export.
> * testsuite/25_algorithms/ends_with/1.cc: New test.
> * testsuite/25_algorithms/starts_with/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 239 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/algorithm|   1 +
>  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>  7 files changed, 519 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> b/libstdc++-v3/include/bits/ranges_algo.h
> index f36e7dd59911..f889257a8ee2 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -438,6 +438,245 @@ namespace ranges
>
>inline constexpr __search_n_fn search_n{};
>
> +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> +  struct __starts_with_fn
> +  {
> +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> +input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +typename _Pred = ranges::equal_to,
> +typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1,
> _Proj2>
> +  constexpr bool
> +  operator()(_Iter1 __first1, _Sent1 __last1,
> +_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> +   iter_difference_t<_Iter1> __n1 = -1;
> +   iter_difference_t<_Iter2> __n2 = -1;
> +   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> + __n1 = __last1 - __first1;
> +   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> + __n2 = __last2 - __first2;
> +   return _S_impl(std::move(__first1), __last1, __n1,
> +  std::move(__first2), __last2, __n2,
> +  std::move(__pred),
> +  std::move(__proj1), std::move(__proj2));
> +  }
> +
> +template<input_range _Range1, input_range _Range2,
> +typename _Pred = ranges::equal_to,
> +typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<iterator_t<_Range1>,
> iterator_t<_Range2>,
> +_Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> +_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> +   range_difference_t<_Range1> __n1 = -1;
> +   range_difference_t<_Range1> __n2 = -1;
> +   if constexpr (sized_range<_Range1>)
> + __n1 = ranges::size(__r1);
> +   if constexpr (sized_range<_Range2>)
> + __n2 = ranges::size(__r2);
> +   return _S_impl(ranges::begin(__r1), ranges::end(__r1), __n1,
> +  ranges::begin(__r2), ranges::end(__r2), __n2,
> +  std::move(__pred),
> +  std::move(__proj1), std::move(__proj2));
> +  }
> +
> +  private:
> +template _Sent2,
> +typename _Pred,
> +typename _Proj1, typename _Proj2>
> +  static constexpr bool
> +  _S_impl(_Iter1 __first1, _Sent1 __last1, iter_difference_t<_Iter1>
> __n1,
> + _Iter2 __first2, _Sent2 __last2, iter_difference_t<_Iter2>
> __n2,
> + _Pred __pred,
> + _Proj1 __proj1, _Proj2 __proj2)
> +  {
> +   if (__first2 == __last2)
> + return true;
> +   else if (__n1 == -1 || __n2 == -1)
> + return ranges::mismatch(std::move(__first1), __last1,
> + std::move(__first2), __last2,
> + std::move(__pred),
> + std::move(__proj1),
> std::move(__proj2)).in2 == __last2;
> +   else if (__n1 < __n2)
> + return false;
> +   else
> + {
> +   i

[PING][PATCH v3] reassoc: Optimize CMP/XOR expressions [PR116860]

2025-05-19 Thread Konstantinos Eleftheriou

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677788.html .

Thanks,
Konstantinos

Re: [PATCH] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-19 Thread Konstantinos Eleftheriou

On Wed, May 7, 2025 at 11:29 AM Richard Sandiford
 wrote:
>
> Konstantinos Eleftheriou  writes:
> > Hi Richard,
> >
> > Thanks for the feedback! We have sent a new version that uses
> > lowpart_subreg 
> > (https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682835.html).
> > We had tried that before, but we were mishandling the case where there
> > are multiple stores with the same offset as the load.
>
> Thanks, I'll have a look.
>
> > As for `it->offset`, that's actually the offset difference between the
> > store and the load (we're trying to find the store with the same
> > offset as the load), so the endianness should be irrelevant in that
> > case.
>
> But I thought the code was allowing multiple stores to be forwarded to
> a single (wider) load.  E.g. 4 individual byte stores at address X, X+1,
> X+2 and X+3 could be forwarded to a 4-byte load at address X.  And the code
> I mentioned is handling the least significant byte by zero-extending it.
>
> For big-endian targets, the least significant byte should come from
> address X+3 rather than address X.  The byte at address X (i.e. the
> byte with the equal offset) should instead go in the most significant
> byte, typically using a shift left.

You are right about that. We will submit a fix for this in a separate patch.
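
For reference, the endianness point can be shown with plain integers. This is a sketch, not pass code: `assemble_load` is a hypothetical helper that builds the value a wide load would see from individual byte stores at addresses X, X+1, and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// BYTES[i] is the byte stored at address X+i.  Return the value a load
// of BYTES.size () bytes at address X would produce.
std::uint32_t
assemble_load (const std::vector<std::uint8_t> &bytes, bool big_endian)
{
  assert (bytes.size () <= 4);
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < bytes.size (); ++i)
    {
      // Little endian: address X holds the least significant byte.
      // Big endian: address X holds the most significant byte, so the
      // byte at X needs the largest left shift.
      std::size_t shift = big_endian ? 8 * (bytes.size () - 1 - i) : 8 * i;
      value |= std::uint32_t (bytes[i]) << shift;
    }
  return value;
}
```

With bytes 0x11, 0x22, 0x33, 0x44 at X..X+3 the load yields 0x44332211 on little endian and 0x11223344 on big endian. The store with the same offset as the load therefore supplies the low byte only in the little-endian case.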

Thanks,
Konstantinos

Re: [PATCH v6 1/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-*warning messages due to code movements from compiler transformation (Part 1) [PR109071, PR85788, PR88771, PR106762,

2025-05-19 Thread Richard Biener

On Fri, May 16, 2025 at 3:34 PM Qing Zhao  wrote:
>
> Control this with a new option -fdiagnostics-details.
>
> $ cat t.c
> extern void warn(void);
> static inline void assign(int val, int *regs, int *index)
> {
>   if (*index >= 4)
> warn();
>   *regs = val;
> }
> struct nums {int vals[4];};
>
> void sparx5_set (int *ptr, struct nums *sg, int index)
> {
>   int *val = &sg->vals[index];
>
>   assign(0,ptr, &index);
>   assign(*val, ptr, &index);
> }
>
> $ gcc -Wall -O2  -c -o t.o t.c
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
> [-Warray-bounds=]
>12 |   int *val = &sg->vals[index];
>   |   ^~~
> t.c:8:18: note: while referencing ‘vals’
> 8 | struct nums {int vals[4];};
>   |  ^~~~
>
> In the above, although the warning is correct in theory, the warning message
> itself is confusing to the end-user since there is information that cannot
> be connected to the source code directly.
>
> It will be a nice improvement to add more information in the warning message
> to report where such an index value comes from.
>
> In order to achieve this, we add a new data structure "move_history" to record
> 1. the "condition" that triggers the code movement;
> 2. whether the code movement is on the true path of the "condition";
> 3. the "compiler transformation" that triggers the code movement.
>
> Whenever there is a code movement along control flow graph due to some
> specific transformations, such as jump threading, path isolation, tree
> sinking, etc., a move_history structure is created and attached to the
> moved gimple statement.
>
> During array out-of-bound checking or -Wstringop-* warning checking, the
> "move_history" that was attached to the gimple statement is used to form
> a sequence of diagnostic events that are added to the corresponding rich
> location to be used to report the warning message.
>
> This behavior is controlled by the new option -fdiagnostics-details
> which is off by default.
>
> With this change, by adding -fdiagnostics-details,
> the warning message for the above testing case is now:
>
> $ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
> [-Warray-bounds=]
>12 |   int *val = &sg->vals[index];
>   |   ^~~
>   ‘sparx5_set’: events 1-2
> 4 |   if (*index >= 4)
>   |  ^
>   |  |
>   |  (1) when the condition is evaluated to true
> ..
>12 |   int *val = &sg->vals[index];
>   |   ~~~
>   |   |
>   |   (2) out of array bounds here
> t.c:8:18: note: while referencing ‘vals’
> 8 | struct nums {int vals[4];};
>   |  ^~~~
>
> The change was divided into 3 parts:
>
> Part 1: Add new data structure move_history, record move_history during
> transformation;
> Part 2: In warning analysis, use the new move_history to form a rich
> location with a sequence of events, to report more context info
> of the warnings.
> Part 3: Add debugging mechanism for move_history.

Thanks for working on this.  I'm pasting my review notes on [1/3] below.


Move histories are allocated from a global obstack - it seems they are
never released, just the association between stmt and history is
eventually broken by remove_move_history?  Do you have any statistics
on the memory usage?  Might be an interesting bit to dump with
-fmem-report (the size of the obstack).  Likewise the statistics
on the hash-map are interesting.

static bool
+is_move_history_existed (location_t cond_location, bool is_true_path,
+enum move_reason

that's a bit of an odd name - 'move_history_exists_p'?  What are
we supposed to do on a duplicate?  This is a linear search - how
many do we accumulate in practice?  Does it make sense to put a
limit on the number of transforms we record?
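
To make the questions about duplicates and a limit concrete, here is a minimal
sketch of a per-statement history with a linear duplicate check and a cap.
Every name in it (location_sketch_t, move_reason_sketch, move_record_sketch,
move_history_sketch, the limit parameter) is hypothetical and not taken from
the patch, and it uses a std::vector rather than the obstack-backed structure
under review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these names come from the patch.
using location_sketch_t = unsigned;
enum class move_reason_sketch { thread_jump, path_isolation, sink };

struct move_record_sketch
{
  location_sketch_t cond_location;  // location of the controlling condition
  bool is_true_path;                // was the move along the true edge?
  move_reason_sketch reason;        // transformation that moved the stmt
};

class move_history_sketch
{
public:
  explicit move_history_sketch (std::size_t limit) : limit_ (limit) {}

  // Linear search over the recorded entries.
  bool exists_p (location_sketch_t loc, bool is_true_path,
                 move_reason_sketch reason) const
  {
    for (const move_record_sketch &r : records_)
      if (r.cond_location == loc && r.is_true_path == is_true_path
          && r.reason == reason)
        return true;
    return false;
  }

  // Returns true if the entry was stored.  Duplicates and entries beyond
  // the (assumed) limit are silently dropped.
  bool record (location_sketch_t loc, bool is_true_path,
               move_reason_sketch reason)
  {
    if (records_.size () >= limit_ || exists_p (loc, is_true_path, reason))
      return false;
    records_.push_back ({loc, is_true_path, reason});
    return true;
  }

  std::size_t size () const { return records_.size (); }

private:
  std::size_t limit_;
  std::vector<move_record_sketch> records_;
};

// Small self-check of the behaviour described above.
inline void check_move_history_sketch ()
{
  move_history_sketch h (2);
  assert (h.record (4, true, move_reason_sketch::thread_jump));
  assert (!h.record (4, true, move_reason_sketch::thread_jump));
  assert (h.size () == 1);
}
```

With a small cap the linear search stays cheap, which is one way the limit
question and the duplicate question could be answered together.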

static gimple *
+get_cond_stmt (edge entry, bool is_destination, bool *is_true_path)
+{

I don't quite understand the is_destination parameter, likewise
for the two APIs using this function?  Usually the control edge
and the BB the stmt is in can be different from edge->dest, and
I'd expect the callers to know, so I wonder why we would want to
search here?  In particular the use in path isolation for PHI
arguments passes in the edge of the problematic PHI arg but
the first reached gcond * might not be the full controlling
condition when there are more than two PHI arguments.  Not to
mention a switch statement might also be the control stmt.
While this can be extended in future I'd like the caller
to compute the control dependence that's relevant - using
an edge (or in future a vector of edges) is fine, it should
be the edge outgoing from the control stmt.

@opindex fdiagnostics-details
+@item -fdiagnostics-details

Not entirely happy w

[PATCH v3] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

Changes in v3:
  * Use the forward_range code path for a (non-sized) bidirectional
haystack, since it's slightly fewer increments/decrements
overall.
  * Fix wrong iter_difference_t cast in starts_with.

Changes in v2:
  Addressed Tomasz's review comments, namely:
  * Added explicit iter_difference_t casts
  * Made _S_impl member private
  * Optimized sized bidirectional case of ends_with
  * Rearranged control flow of starts_with::_S_impl

Still left to do:
  * Add tests for integer-class types
  * Still working on a better commit description ;)

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__starts_with_fn, starts_with):
Define.
(__ends_with_fn, ends_with): Define.
* include/bits/version.def (ranges_starts_ends_with): Define.
* include/bits/version.h: Regenerate.
* include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
* src/c++23/std.cc.in (ranges::starts_with): Export.
(ranges::ends_with): Export.
* testsuite/25_algorithms/ends_with/1.cc: New test.
* testsuite/25_algorithms/starts_with/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 236 ++
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/algorithm|   1 +
 libstdc++-v3/src/c++23/std.cc.in  |   4 +
 .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
 .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
 7 files changed, 516 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
index f36e7dd59911..54646ae62f7e 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -438,6 +438,242 @@ namespace ranges
 
   inline constexpr __search_n_fn search_n{};
 
+#if __glibcxx_ranges_starts_ends_with // C++ >= 23
+  struct __starts_with_fn
+  {
+template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
+input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
+typename _Pred = ranges::equal_to,
+typename _Proj1 = identity, typename _Proj2 = identity>
+  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Iter1 __first1, _Sent1 __last1,
+_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   iter_difference_t<_Iter1> __n1 = -1;
+   iter_difference_t<_Iter2> __n2 = -1;
+   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
+ __n1 = __last1 - __first1;
+   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
+ __n2 = __last2 - __first2;
+   return _S_impl(std::move(__first1), __last1, __n1,
+  std::move(__first2), __last2, __n2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+  }
+
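+template<input_range _Range1, input_range _Range2,
+typename _Pred = ranges::equal_to,
+typename _Proj1 = identity, typename _Proj2 = identity>
+  requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,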
+_Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   range_difference_t<_Range1> __n1 = -1;
+   range_difference_t<_Range1> __n2 = -1;
+   if constexpr (sized_range<_Range1>)
+ __n1 = ranges::size(__r1);
+   if constexpr (sized_range<_Range2>)
+ __n2 = ranges::size(__r2);
+   return _S_impl(ranges::begin(__r1), ranges::end(__r1), __n1,
+  ranges::begin(__r2), ranges::end(__r2), __n2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+  }
+
+  private:
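+template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
+typename _Pred,
+typename _Proj1, typename _Proj2>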
+  static constexpr bool
+  _S_impl(_Iter1 __first1, _Sent1 __last1, iter_difference_t<_Iter1> __n1,
+ _Iter2 __first2, _Sent2 __last2, iter_difference_t<_Iter2> __n2,
+ _Pred __pred,
+ _Proj1 __proj1, _Proj2 __proj2)
+  {
+   if (__first2 == __last2) [[unlikely]]
+ return true;
+   else if (__n1 == -1 || __n2 == -1)
+ return ranges::mismatch(std::move(__first1), __last1,
+ std::move(__first2), __last2,
+ std::move(__pred),
+ std::move(__proj1), std::move(__proj2)).in2 
== __last2;
+   else if (__n1 < __n2)
+ return false;
+   else if constexpr (random_access_iterator<_Iter1>)
+ return ranges::equal(__first1, __first1 + 
iter_difference_t<_Iter1>(__n2),
+  std::move(__first2), __last2,
+  std::move(__pred),
+  std::mo

Re: [PATCH v2] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

On Mon, 19 May 2025, Tomasz Kaminski wrote:

> 
> 
> On Mon, May 19, 2025 at 4:11 PM Patrick Palka  wrote:
>   Changes in v2:
>     Addressed Tomasz's review comments, namely:
>     * Added explicit iter_difference_t casts
>     * Made _S_impl member private
>     * Optimized sized bidirectional case of ends_with
>     * Rearranged control flow of starts_with::_S_impl
> 
>   Still left to do:
>     * Add tests for integer-class types
>     * Still working on a better commit description ;)
> 
>   -- >8 --
> 
>   libstdc++-v3/ChangeLog:
> 
>           * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>           Define.
>           (__ends_with_fn, ends_with): Define.
>           * include/bits/version.def (ranges_starts_ends_with): Define.
>           * include/bits/version.h: Regenerate.
>           * include/std/algorithm: Provide 
> __cpp_lib_ranges_starts_ends_with.
>           * src/c++23/std.cc.in (ranges::starts_with): Export.
>           (ranges::ends_with): Export.
>           * testsuite/25_algorithms/ends_with/1.cc: New test.
>           * testsuite/25_algorithms/starts_with/1.cc: New test.
>   ---
>    libstdc++-v3/include/bits/ranges_algo.h       | 239 ++
>    libstdc++-v3/include/bits/version.def         |   8 +
>    libstdc++-v3/include/bits/version.h           |  10 +
>    libstdc++-v3/include/std/algorithm            |   1 +
>    libstdc++-v3/src/c++23/std.cc.in              |   4 +
>    .../testsuite/25_algorithms/ends_with/1.cc    | 129 ++
>    .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>    7 files changed, 519 insertions(+)
>    create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>    create mode 100644 
> libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> 
>   diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
>   index f36e7dd59911..f889257a8ee2 100644
>   --- a/libstdc++-v3/include/bits/ranges_algo.h
>   +++ b/libstdc++-v3/include/bits/ranges_algo.h
>   @@ -438,6 +438,245 @@ namespace ranges
> 
>      inline constexpr __search_n_fn search_n{};
> 
>   +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
>   +  struct __starts_with_fn
>   +  {
>   +    template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
>   +            input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>   +            typename _Pred = ranges::equal_to,
>   +            typename _Proj1 = identity, typename _Proj2 = identity>
>   +      requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, 
> _Proj2>
>   +      constexpr bool
>   +      operator()(_Iter1 __first1, _Sent1 __last1,
>   +                _Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
>   +                _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>   +      {
>   +       iter_difference_t<_Iter1> __n1 = -1;
>   +       iter_difference_t<_Iter2> __n2 = -1;
>   +       if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
>   +         __n1 = __last1 - __first1;
>   +       if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
>   +         __n2 = __last2 - __first2;
>   +       return _S_impl(std::move(__first1), __last1, __n1,
>   +                      std::move(__first2), __last2, __n2,
>   +                      std::move(__pred),
>   +                      std::move(__proj1), std::move(__proj2));
>   +      }
>   +
>   +    template<input_range _Range1, input_range _Range2,
>   +            typename _Pred = ranges::equal_to,
>   +            typename _Proj1 = identity, typename _Proj2 = identity>
>   +      requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,
>   +                                    _Pred, _Proj1, _Proj2>
>   +      constexpr bool
>   +      operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
>   +                _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>   +      {
>   +       range_difference_t<_Range1> __n1 = -1;
>   +       range_difference_t<_Range1> __n2 = -1;
>   +       if constexpr (sized_range<_Range1>)
>   +         __n1 = ranges::size(__r1);
>   +       if constexpr (sized_range<_Range2>)
>   +         __n2 = ranges::size(__r2);
>   +       return _S_impl(ranges::begin(__r1), ranges::end(__r1), __n1,
>   +                      ranges::begin(__r2), ranges::end(__r2), __n2,
>   +                      std::move(__pred),
>   +                      std::move(__proj1), std::move(__proj2));
>   +      }
>   +
>   +  private:
>   +    template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
>   +            typename _Pred,
>   +            typename _Proj1, typename _Proj2>
>   +      static constexpr bool
>   +      _S_impl(_Iter1 __first1, _Sent1 __last1, 
> iter_difference_t<_Iter1> __n1,
>   +             _Iter2 __first2, _Se

Re: [PATCH v2] c++, coroutines: Use decltype(auto) for the g_r_o.

2025-05-19 Thread Jason Merrill


On 5/16/25 10:11 AM, Iain Sandoe wrote:

Hi Jason,


+ returned reference or prvalue result object ...
+ When we use a local to hold this, it is decltype(auto).  */
+  tree gro_type
+= finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/true,



This should be false, not true; a call is not an id-expr or member access.
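
As a source-level illustration of why the flag matters, the sketch below shows
how decltype treats a call expression versus an id-expression, and what
decltype(auto) does with a reference-returning call.  The promise_sketch type
and its member are hypothetical stand-ins, not the testcase from the patch.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical promise whose get_return_object returns a reference.
struct promise_sketch
{
  int storage = 0;
  int &get_return_object () { return storage; }
};

promise_sketch p_sketch;
int x_sketch = 0;

// A call is an ordinary expression: decltype keeps the reference.
static_assert (std::is_same_v<decltype (p_sketch.get_return_object ()), int &>);
// An unparenthesized id-expression yields the declared type of the entity.
static_assert (std::is_same_v<decltype (x_sketch), int>);
// Parenthesized, the same name is treated as an lvalue expression.
static_assert (std::is_same_v<decltype ((x_sketch)), int &>);

// decltype(auto) preserves the reference, plain auto makes a copy.
inline bool gro_is_reference_sketch ()
{
  promise_sketch q;
  decltype (auto) gro = q.get_return_object ();  // int &
  auto copy = q.get_return_object ();            // int
  gro = 7;                                       // writes through to q.storage
  return q.storage == 7 && copy == 0
         && std::is_reference_v<decltype (gro)>;
}

inline void check_gro_sketch ()
{
  assert (gro_is_reference_sketch ());
}
```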


fixed.


 }
-  /* Initialize the resume_idx_var to 0, meaning "not started".  */
-  coro_build_and_push_artificial_var_with_dve
-(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
- build_zero_cst (short_unsigned_type_node), deref_fp);



Moving this initialization doesn't seem connected to the type of gro, or 
mentioned above?


A fly-by tidy up.. removed.


I still see it in the new patch?


-promise_live = true;
+promise_life++;
 }



Please add a Promise copy constructor, whether deleted or defined so that 
promise_life is accurate if it's called.
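
A minimal sketch of the kind of counting meant here, outside of any coroutine:
the type name counted_promise_sketch is hypothetical, and this version defines
the copy constructor (deleting it would be the other option) so that every
constructed object is matched by exactly one destructor decrement.

```cpp
#include <cassert>

// Hypothetical stand-in for the test's Promise type.
struct counted_promise_sketch
{
  static inline int promise_life = 0;

  counted_promise_sketch () { ++promise_life; }
  // Without this, a copy would be destroyed without ever being counted,
  // and promise_life could go negative.
  counted_promise_sketch (const counted_promise_sketch &) { ++promise_life; }
  ~counted_promise_sketch () { --promise_life; }
};

// Number of live objects while an original and one copy both exist.
inline int live_after_copy_sketch ()
{
  counted_promise_sketch a;
  counted_promise_sketch b (a);
  return counted_promise_sketch::promise_life;
}

inline void check_promise_count_sketch ()
{
  const int before = counted_promise_sketch::promise_life;
  assert (live_after_copy_sketch () == before + 2);
  assert (counted_promise_sketch::promise_life == before);
}
```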


Done.

OK for trunk now?
thanks
Iain

--- 8<--

The revised wording for coroutines uses decltype(auto) for the
type of the get return object, which preserves references. The
test is expected to fail, since it attempts to initialize the
return object from an object that has already been destroyed.


This message doesn't quote my comment


Yes, though this is terrible, as noted in my email to core today.  Why doesn't 
this also break folly::Optional/Expected?


In DM you suggested that folly works just because nobody has noticed the 
UB yet.  This makes sense to me, but is more evidence that this is a 
serious defect; the promise (et al) must not be destroyed before the 
ramp returns.


So please xfail the test rather than test for the defect.

An approach to fixing this (in a later patch) might be to replace the 
new before_return with a flag in the frame such that, if set, the actor 
avoids destroying the state and instead clears the flag to indicate to 
the ramp that it still needs to destroy the state?  Then the ramp 
cleanup would no longer be EH-only or controlled by IARC, just 
controlled by this flag.  Thoughts?



gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Use
decltype(auto) to determine the type of the temporary
get_return_object.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115908.C: Count promise construction
and destruction.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc   | 22 ---
  gcc/testsuite/g++.dg/coroutines/pr115908.C | 74 +++---
  2 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 6169a81cea5..bd51c31e615 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -5120,8 +5120,11 @@ cp_coroutine_transform::build_ramp_function ()
/* Check for a bad get return object type.
   [dcl.fct.def.coroutine] / 7 requires:
   The expression promise.get_return_object() is used to initialize the
- returned reference or prvalue result object ... */
-  tree gro_type = TREE_TYPE (get_ro);
+ returned reference or prvalue result object ...
+ When we use a local to hold this, it is decltype(auto).  */
+  tree gro_type
+= finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/false,
+   tf_warning_or_error); // TREE_TYPE (get_ro);


Let's drop this comment, it looks vestigial.


if (VOID_TYPE_P (gro_type) && !void_ramp_p)
  {
error_at (fn_start, "no viable conversion from %<void%> provided by"
@@ -5129,11 +5132,6 @@ cp_coroutine_transform::build_ramp_function ()
return false;
  }
  
-  /* Initialize the resume_idx_var to 0, meaning "not started".  */

-  coro_build_and_push_artificial_var_with_dve
-(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
- build_zero_cst (short_unsigned_type_node), deref_fp);
-
/* We must manage the cleanups ourselves, with the exception of the g_r_o,
   because the responsibility for them changes after the initial suspend.
   However, any use of cxx_maybe_build_cleanup () in preceding code can
@@ -5159,7 +5157,7 @@ cp_coroutine_transform::build_ramp_function ()
= coro_build_and_push_artificial_var (loc, "_Coro_gro", gro_type,
  orig_fn_decl, NULL_TREE);
  
-  r = cp_build_init_expr (coro_gro, get_ro);

+  r = cp_build_init_expr (coro_gro, STRIP_REFERENCE_REF (get_ro));
finish_expr_stmt (r);
tree coro_gro_cleanup
= cxx_maybe_build_cleanup (coro_gro, tf_warning_or_error);
@@ -5167,6 +5165,11 @@ cp_coroutine_transform::build_ramp_function ()
push_cleanup (coro_gro, coro_gro_cleanup, /*eh_only*/false);
  }
  
+  /* Initialize the resume_idx_var to 0, meaning "not started".  */

+  coro_build_and_push_artificial_var_with_dve
+(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
+ build_zero_cst (short_unsi

Re: [PATCH v3] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

On Mon, 19 May 2025, Patrick Palka wrote:

> Changes in v3:
>   * Use the forward_range code path for a (non-sized) bidirectional
> haystack, since it's slightly fewer increments/decrements
> overall.
>   * Fix wrong iter_difference_t cast in starts_with.
> 
> Changes in v2:
>   Addressed Tomasz's review comments, namely:
>   * Added explicit iter_difference_t casts
>   * Made _S_impl member private
>   * Optimized sized bidirectional case of ends_with
>   * Rearranged control flow of starts_with::_S_impl
> 
> Still left to do:
>   * Add tests for integer-class types
>   * Still working on a better commit description ;)
> 
> -- >8 --
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>   Define.
>   (__ends_with_fn, ends_with): Define.
>   * include/bits/version.def (ranges_starts_ends_with): Define.
>   * include/bits/version.h: Regenerate.
>   * include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
>   * src/c++23/std.cc.in (ranges::starts_with): Export.
>   (ranges::ends_with): Export.
>   * testsuite/25_algorithms/ends_with/1.cc: New test.
>   * testsuite/25_algorithms/starts_with/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 236 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/algorithm|   1 +
>  libstdc++-v3/src/c++23/std.cc.in  |   4 +
>  .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
>  .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>  7 files changed, 516 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> 
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index f36e7dd59911..54646ae62f7e 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -438,6 +438,242 @@ namespace ranges
>  
>inline constexpr __search_n_fn search_n{};
>  
> +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
> +  struct __starts_with_fn
> +  {
> +template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> +  input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +  typename _Pred = ranges::equal_to,
> +  typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Iter1 __first1, _Sent1 __last1,
> +  _Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
> +  _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> + iter_difference_t<_Iter1> __n1 = -1;
> + iter_difference_t<_Iter2> __n2 = -1;
> + if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> +   __n1 = __last1 - __first1;
> + if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> +   __n2 = __last2 - __first2;
> + return _S_impl(std::move(__first1), __last1, __n1,
> +std::move(__first2), __last2, __n2,
> +std::move(__pred),
> +std::move(__proj1), std::move(__proj2));
> +  }
> +
> +template<input_range _Range1, input_range _Range2,
> +  typename _Pred = ranges::equal_to,
> +  typename _Proj1 = identity, typename _Proj2 = identity>
> +  requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,
> +  _Pred, _Proj1, _Proj2>
> +  constexpr bool
> +  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> +  _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +  {
> + range_difference_t<_Range1> __n1 = -1;
> + range_difference_t<_Range1> __n2 = -1;
> + if constexpr (sized_range<_Range1>)
> +   __n1 = ranges::size(__r1);
> + if constexpr (sized_range<_Range2>)
> +   __n2 = ranges::size(__r2);
> + return _S_impl(ranges::begin(__r1), ranges::end(__r1), __n1,
> +ranges::begin(__r2), ranges::end(__r2), __n2,
> +std::move(__pred),
> +std::move(__proj1), std::move(__proj2));
> +  }
> +
> +  private:
> +template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
> +  typename _Pred,
> +  typename _Proj1, typename _Proj2>
> +  static constexpr bool
> +  _S_impl(_Iter1 __first1, _Sent1 __last1, iter_difference_t<_Iter1> 
> __n1,
> +   _Iter2 __first2, _Sent2 __last2, iter_difference_t<_Iter2> __n2,
> +   _Pred __pred,
> +   _Proj1 __proj1, _Proj2 __proj2)
> +  {
> + if (__first2 == __last2) [[unlikely]]
> +   return true;
> + else if (__n1 == -1 || __n2 == -1)
> +   return ranges::mismatch(std::move(__first1), __last1,
> +   std::move(__first2), __last2,
> +   std::move(__pred),
> +   std::move(__proj1), std::move

Re: [PATCH v2] c++, coroutines: Address CWG2563 return value init [PR119916].

2025-05-19 Thread Jason Merrill


On 5/16/25 10:04 AM, Iain Sandoe wrote:

+  /* We must manage the cleanups ourselves, because the responsibility for
+ them changes after the initial suspend.  However, any use of
+ cxx_maybe_build_cleanup () can set the throwing_cleanup flag.  */
+  cp_function_chain->throwing_cleanup = false;



Hmm...what if the gro cleanup throws after initializing the (different type) 
return value?  That seems like a case that we need throwing_cleanup set for.


So I moved this to the position before the g_r_o is initialized
(since we only manage cleanups of the entities that come before that, although
  that's a bit hard to see from the patch).


This will probably need reevaluation if you take my suggestion from the 
decltype patch for addressing 115908, but this is fine for now.



@@ -5245,8 +5195,11 @@ cp_coroutine_transform::build_ramp_function ()
 tree not_iarc
= build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
+  tree do_cleanup = build2_loc (loc, TRUTH_AND_EXPR, boolean_type_node,
+   not_iarc, coro_before_return);



As with the 14 patch, this should be reversed.


Yes, the same goof was C&P to several places.

if (flag_exceptions)
  {
+  coro_before_return
+   = coro_build_and_push_artificial_var (loc, "_Coro_before_return",
+ boolean_type_node, orig_fn_decl,
+ boolean_true_node);

...

+  if (flag_exceptions)
+{
+  r = cp_build_init_expr (coro_before_return, boolean_false_node);


This should be MODIFY_EXPR, not INIT_EXPR; it got an initial value 
already in the DECL_EXPR.


Jason

Re: [PATCH] c++/modules: Always mark tinfo vars as TREE_ADDRESSABLE [PR120350]

2025-05-19 Thread Jason Merrill


On 5/19/25 9:37 AM, Nathaniel Shead wrote:

Regtested on x86_64-pc-linux-gnu (so far just modules.exp), OK for
trunk if full bootstrap+regtest succeeds?  And maybe 15?


OK for both.


-- >8 --

We need to mark type info decls as addressable if we take them by
reference; this is done by walking the declaration during parsing and
marking the decl as needed.

However, with modules we don't stream tinfo decls directly; rather we
stream just their name and type and reconstruct them in the importer
directly.  This means that any addressable flags are not propagated, and
we error because TREE_ADDRESSABLE is not set despite taking its address.

But tinfo decls should always have TREE_ADDRESSABLE set, as any attempt
to use the tinfo decl will go through build_address anyway.  So this
patch fixes the issue by eagerly marking the constructed decl as
TREE_ADDRESSABLE so that modules gets this flag correctly set as well.

PR c++/120350

gcc/cp/ChangeLog:

* rtti.cc (get_tinfo_decl_direct): Mark TREE_ADDRESSABLE.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tinfo-3_a.H: New test.
* g++.dg/modules/tinfo-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/rtti.cc   | 1 +
  gcc/testsuite/g++.dg/modules/tinfo-3_a.H | 7 +++
  gcc/testsuite/g++.dg/modules/tinfo-3_b.C | 8 
  3 files changed, 16 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/tinfo-3_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/tinfo-3_b.C

diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc
index 18bc479dc50..c06a18b3ff1 100644
--- a/gcc/cp/rtti.cc
+++ b/gcc/cp/rtti.cc
@@ -468,6 +468,7 @@ get_tinfo_decl_direct (tree type, tree name, int pseudo_ix)
DECL_IGNORED_P (d) = 1;
TREE_READONLY (d) = 1;
TREE_STATIC (d) = 1;
+  TREE_ADDRESSABLE (d) = 1;
/* Tell equal_address_to that different tinfo decls never
 overlap.  */
if (vec_safe_is_empty (unemitted_tinfo_decls))
diff --git a/gcc/testsuite/g++.dg/modules/tinfo-3_a.H 
b/gcc/testsuite/g++.dg/modules/tinfo-3_a.H
new file mode 100644
index 000..8b53e9848b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tinfo-3_a.H
@@ -0,0 +1,7 @@
+// PR c++/120350
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+#include <typeinfo>
+struct S {};
+inline const std::type_info& tinfo = typeid(S);
diff --git a/gcc/testsuite/g++.dg/modules/tinfo-3_b.C 
b/gcc/testsuite/g++.dg/modules/tinfo-3_b.C
new file mode 100644
index 000..95e02ab5c81
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tinfo-3_b.C
@@ -0,0 +1,8 @@
+// PR c++/120350
+// { dg-additional-options "-fmodules" }
+
+import "tinfo-3_a.H";
+
+int main() {
+  return tinfo == typeid(int);
+}

Re: [PATCH v2] c++, coroutines: Use decltype(auto) for the g_r_o.

2025-05-19 Thread Jason Merrill


On 5/19/25 11:54 AM, Jason Merrill wrote:

On 5/16/25 10:11 AM, Iain Sandoe wrote:

Hi Jason,


+ returned reference or prvalue result object ...
+ When we use a local to hold this, it is decltype(auto).  */
+  tree gro_type
+    = finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/true,


This should be false, not true; a call is not an id-expr or member 
access.


fixed.


 }
-  /* Initialize the resume_idx_var to 0, meaning "not started".  */
-  coro_build_and_push_artificial_var_with_dve
-    (loc, coro_resume_index_id, short_unsigned_type_node,  
orig_fn_decl,

- build_zero_cst (short_unsigned_type_node), deref_fp);


Moving this initialization doesn't seem connected to the type of gro, 
or mentioned above?


A fly-by tidy up.. removed.


I still see it in the new patch?


-    promise_live = true;
+    promise_life++;
 }


Please add a Promise copy constructor, whether deleted or defined so 
that promise_life is accurate if it's called.


Done.

OK for trunk now?
thanks
Iain

--- 8<--

The revised wording for coroutines uses decltype(auto) for the
type of the get return object, which preserves references. The
test is expected to fail, since it attempts to initialize the
return object from an object that has already been destroyed.


This message doesn't quote my comment

Yes, though this is terrible, as noted in my email to core today.  
Why doesn't this also break folly::Optional/Expected?


In DM you suggested that folly works just because nobody has noticed the 
UB yet.  This makes sense to me, but is more evidence that this is a 
serious defect; the promise (et al) must not be destroyed before the 
ramp returns.


Ah, now I see the difference between the folly pattern (at least as 
reduced in the 119916 patch) and 115908 or my "eager" testcase 
(https://compiler-explorer.com/z/nb1b3od61): folly puts a pointer to the 
proxy into the promise, so return_value can store the value in the 
proxy.  So the lifetime of the promise doesn't matter, it doesn't hold 
the data.



+  template <typename U> void return_value(U u) { *value_ = u; }

...

+  OptionalPromiseReturn(OptionalPromise &p) : pointer_{p.value_} {
+pointer_ = &storage_;
+  }


I still think we should make the "eager" pattern work, but it looks like 
folly isn't actually broken.
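
To spell out that difference, here is a coroutine-free sketch of the pattern
as described above: the proxy owns the storage and the promise only holds a
pointer into it, so the value survives the promise.  The type names
(optional_promise_sketch, optional_return_sketch) are hypothetical
simplifications and not folly's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical promise: holds no data, only a pointer into the proxy.
struct optional_promise_sketch
{
  std::optional<int> *value_ = nullptr;
  void return_value (int u) { *value_ = u; }
};

// Hypothetical proxy returned from get_return_object: owns the storage and
// registers its address with the promise on construction.
struct optional_return_sketch
{
  std::optional<int> storage_;

  explicit optional_return_sketch (optional_promise_sketch &p)
  { p.value_ = &storage_; }

  // The promise keeps our address, so the proxy must not be copied.
  optional_return_sketch (const optional_return_sketch &) = delete;
  optional_return_sketch &operator= (const optional_return_sketch &) = delete;
};

// The promise is destroyed before the value is read from the proxy.
inline int value_after_promise_destroyed_sketch ()
{
  auto promise = std::make_unique<optional_promise_sketch> ();
  optional_return_sketch proxy (*promise);
  promise->return_value (42);
  promise.reset ();  // promise lifetime ends here
  return proxy.storage_.value_or (-1);
}

inline void check_proxy_sketch ()
{
  assert (value_after_promise_destroyed_sketch () == 42);
}
```

In the "eager" pattern the data would live in the promise itself, so the same
read after promise.reset () would touch a destroyed object.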



So please xfail the test rather than test for the defect.

An approach to fixing this (in a later patch) might be to replace the 
new before_return with a flag in the frame such that, if set, the actor 
avoids destroying the state and instead clears the flag to indicate to 
the ramp that it still needs to destroy the state?  Then the ramp 
cleanup would no longer be EH-only or controlled by IARC, just 
controlled by this flag.  Thoughts?



gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Use
decltype(auto) to determine the type of the temporary
get_return_object.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115908.C: Count promise construction
and destruction.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc   | 22 ---
  gcc/testsuite/g++.dg/coroutines/pr115908.C | 74 +++---
  2 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 6169a81cea5..bd51c31e615 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -5120,8 +5120,11 @@ cp_coroutine_transform::build_ramp_function ()
    /* Check for a bad get return object type.
   [dcl.fct.def.coroutine] / 7 requires:
   The expression promise.get_return_object() is used to 
initialize the

- returned reference or prvalue result object ... */
-  tree gro_type = TREE_TYPE (get_ro);
+ returned reference or prvalue result object ...
+ When we use a local to hold this, it is decltype(auto).  */
+  tree gro_type
+    = finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/false,

+    tf_warning_or_error); // TREE_TYPE (get_ro);


Let's drop this comment, it looks vestigial.


    if (VOID_TYPE_P (gro_type) && !void_ramp_p)
  {
    error_at (fn_start, "no viable conversion from % 
provided by"

@@ -5129,11 +5132,6 @@ cp_coroutine_transform::build_ramp_function ()
    return false;
  }
-  /* Initialize the resume_idx_var to 0, meaning "not started".  */
-  coro_build_and_push_artificial_var_with_dve
-    (loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
- build_zero_cst (short_unsigned_type_node), deref_fp);
-
    /* We must manage the cleanups ourselves, with the exception of 
the g_r_o,
   because the responsibility for them changes after the initial 
suspend.
   However, any use of cxx_maybe_build_cleanup () in preceding 
code can

@@ -5159,7 +5157,7 @@ cp_coroutine_transform::build_ramp_function ()
  = coro_build_and_push_artificial_var (loc, "_Coro_gro", gro_type,
    orig_fn_decl, NULL_TREE);
-

Re: [PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-19 Thread Patrick Palka

On Mon, 19 May 2025, Tomasz Kaminski wrote:

> 
> 
> On Mon, May 19, 2025 at 9:59 AM Tomasz Kaminski  wrote:
> 
> 
> On Mon, May 19, 2025 at 6:47 AM Patrick Palka  wrote:
> I would appreciate a short explanation of the approach taken here in the 
> commit message, such as passing -1 to indicate that the size is not known.
> 
>   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> Among the non-stylistic changes, I noticed that we need some explicit 
> conversions between different types.
> I think we should add a test that checks a size of type __max_diff_t; 
> maybe we should add such a range to
> testutils_iterators.
> The rest of the comments are mostly stylistic.
> 
> 
>   -- >8 --
> 
>   libstdc++-v3/ChangeLog:
> 
>           * include/bits/ranges_algo.h (__starts_with_fn, starts_with):
>           Define.
>           (__ends_with_fn, ends_with): Define.
>           * include/bits/version.def (ranges_starts_ends_with): Define.
>           * include/bits/version.h: Regenerate.
>           * include/std/algorithm: Provide 
> __cpp_lib_ranges_starts_ends_with.
>           * src/c++23/std.cc.in (ranges::starts_with): Export.
>           (ranges::ends_with): Export.
>           * testsuite/25_algorithms/ends_with/1.cc: New test.
>           * testsuite/25_algorithms/starts_with/1.cc: New test.
>   ---
>    libstdc++-v3/include/bits/ranges_algo.h       | 232 ++
>    libstdc++-v3/include/bits/version.def         |   8 +
>    libstdc++-v3/include/bits/version.h           |  10 +
>    libstdc++-v3/include/std/algorithm            |   1 +
>    libstdc++-v3/src/c++23/std.cc.in              |   4 +
>    .../testsuite/25_algorithms/ends_with/1.cc    | 129 ++
>    .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
>    7 files changed, 512 insertions(+)
>    create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
>    create mode 100644 
> libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc
> 
>   diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
>   index f36e7dd59911..c59a555f528a 100644
>   --- a/libstdc++-v3/include/bits/ranges_algo.h
>   +++ b/libstdc++-v3/include/bits/ranges_algo.h
>   @@ -438,6 +438,238 @@ namespace ranges
> 
>      inline constexpr __search_n_fn search_n{};
> 
>   +#if __glibcxx_ranges_starts_ends_with // C++ >= 23
>   +  struct __starts_with_fn
>   +  {
>   +    template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
>   +            input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>   +            typename _Pred = ranges::equal_to,
>   +            typename _Proj1 = identity, typename _Proj2 = identity>
>   +      requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, 
> _Proj2>
>   +      constexpr bool
>   +      operator()(_Iter1 __first1, _Sent1 __last1,
>   +                _Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
>   +                _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
>   +      {
> 
> We could check __first2 == __last2 early here, before even computing size.

Hmm, computing the size should be very cheap though?  I'm not sure checking
this first would be worthwhile, since an empty needle is a very uncommon case.

> +       iter_difference_t<_Iter1> __n1 = -1;
> +       iter_difference_t<_Iter2> __n2 = -1;
> +       if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
> +         __n1 = __last1 - __first1;
> +       if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
> +         __n2 = __last2 - __first2;
> +       return _S_impl(std::move(__first1), __last1,
> +                      std::move(__first2), __last2,
> +                      std::move(__pred),
> +                      std::move(__proj1), std::move(__proj2),
> +                      __n1, __n2);
> +      }
> +
> +    template<input_range _Range1, input_range _Range2,
> +            typename _Pred = ranges::equal_to,
> +            typename _Proj1 = identity, typename _Proj2 = identity>
> +      requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,
> +                                    _Pred, _Proj1, _Proj2>
> +      constexpr bool
> +      operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = 
> {},
> +                _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +      {
> 
> Similar here:
> auto __first2 = ranges::begin(__r2);
> auto __last2 = ranges::end(__r2);
> if (__first2 == __last2)
>   return true;
> 
> And then move __first2 into the _S_impl call.
>  
> +       range_difference_t<_Range1> __n1 = -1;
> +       range_difference_t<_Range1> __n2 = -1;
> +       if constexpr (sized_range<_Range1>

[PATCH] arm: fully validate mem_noofs_operand [PR120351]

2025-05-19 Thread Richard Earnshaw

It's not enough to just check that a memory operand is of the form
mem(reg); after RA we also need to validate the register being used.
The safest way to do this is to call memory_operand.

PR target/120351

gcc/ChangeLog:

* config/arm/predicates.md (mem_noofs_operand): Also check the op
is a valid memory_operand.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr120351.c: New test.
---
 gcc/config/arm/predicates.md|  3 +-
 gcc/testsuite/gcc.target/arm/pr120351.c | 47 +
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr120351.c

diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 57d4ec66088..c683ec2c607 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -901,7 +901,8 @@ (define_special_predicate "add_operator"
 
 (define_predicate "mem_noofs_operand"
   (and (match_code "mem")
-   (match_code "reg" "0")))
+   (match_code "reg" "0")
+   (match_operand 0 "memory_operand")))
 
 (define_predicate "call_insn_operand"
   (ior (and (match_code "symbol_ref")
diff --git a/gcc/testsuite/gcc.target/arm/pr120351.c 
b/gcc/testsuite/gcc.target/arm/pr120351.c
new file mode 100644
index 000..d8e9d73275c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr120351.c
@@ -0,0 +1,47 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O2" } */
+
+
+typedef struct A
+{
+  int f1;
+} A;
+
+__inline void ref (A* x)
+{
+  __atomic_fetch_add(&x->f1, 1, 0);
+}
+
+typedef struct B
+{
+  A *d;
+  int *ptr;
+} B;
+
+void insertOne (B*, B*);
+
+void init (B *);
+__inline void copy (B *p, B *q)
+{
+  p->d  = q->d;
+  p->ptr = q->ptr;
+  ref (p->d);
+}
+
+__inline void emplace(B* x)
+{
+  B dummy;
+  B _tmp;
+  init (&dummy);
+  copy (&_tmp, &dummy);
+  insertOne(x, &_tmp);
+}
+
+void testing ()
+{
+  B test;
+  init (&test);
+  emplace(&test);
+}
-- 
2.43.0

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-19 Thread Richard Sandiford

Kugan Vivekanandarajah  writes:
> diff --git a/Makefile.in b/Makefile.in
> index b1ed67d3d4f..b5e3e520791 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -4271,7 +4271,7 @@ all-stageautoprofile-bfd: configure-stageautoprofile-bfd
>   $(HOST_EXPORTS) \
>   $(POSTSTAGE1_HOST_EXPORTS)  \
>   cd $(HOST_SUBDIR)/bfd && \
> - $$s/gcc/config/i386/$(AUTO_PROFILE) \
> + $$s/gcc/config/@cpu_type@/$(AUTO_PROFILE) \
>   $(MAKE) $(BASE_FLAGS_TO_PASS) \
>   CFLAGS="$(STAGEautoprofile_CFLAGS)" \
>   GENERATOR_CFLAGS="$(STAGEautoprofile_GENERATOR_CFLAGS)" \

The usual style seems to be to assign @foo@ to a makefile variable
called foo or FOO, rather than to use @foo@ directly in rules.  Otherwise
the makefile stuff looks good.

I don't feel qualified to review the script, but some general shell stuff:

> diff --git a/gcc/config/aarch64/gcc-auto-profile 
> b/gcc/config/aarch64/gcc-auto-profile
> new file mode 100755
> index 000..0ceec035e69
> --- /dev/null
> +++ b/gcc/config/aarch64/gcc-auto-profile
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +# Profile workload for gcc profile feedback (autofdo) using Linux perf.
> +# Copyright The GNU Toolchain Authors.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .  */
> +
> +# Run perf record with branch stack sampling and check for
> +# specific error message to see if it is supported.
> +use_brbe=true
> +output=$(perf record -j any,u ls 2>&1)

How about using /bin/true rather than ls for the test program?

> +if [[ "$output" = *"Error::P: PMU Hardware or event type doesn't support 
> branch stack sampling."* ]]; then

[[ isn't POSIX, or at least dash doesn't accept it.  Since this script
is effectively linux-specific, we can probably assume that /bin/bash
exists and use that in the #! line.

If we use bash, then the test could use =~ rather than an exact match.
This could be useful if perf prints other diagnostics besides the
one being tested for, or if future versions of perf alter the wording
slightly.

> +  use_brbe=false
> +fi
> +
> +FLAGS=u
> +if [ "$1" = "--kernel" ] ; then
> +  FLAGS=k
> +  shift
> +fi
> +if [ "$1" = "--all" ] ; then

How about making this an elif, so that we don't accept --kernel --all?

> +  FLAGS=u,k
> +  shift
> +fi
> +
> +if [ "$use_brbe" = true ] ; then
> +  if grep -q hypervisor /proc/cpuinfo ; then
> +echo >&2 "Warning: branch profiling may not be functional in VMs"
> +  fi
> +  set -x
> +  perf record -j any,$FLAGS "$@"
> +  set +x
> +else
> +  set -x
> +  echo >&2 "Warning: branch profiling may not be functional without BRBE"
> +  perf record "$@"
> +  set +x

Putting the set -x after the echo seems better, as for the "then" branch.

Thanks,
Richard

> +fi

Re: [PATCH] Fortran: fix FAIL of gfortran.dg/specifics_1.f90 after r16-372 [PR120099]

2025-05-19 Thread Harald Anlauf


Pushed as r16-734-gbf98b735ae01c6 after an off-list OK by Jerry,
and no responses to the contrary.

Harald

On 5/18/25 22:53, Harald Anlauf wrote:

Dear all,

the attached proposed patch fixes PR120099 by modifying
gfc_return_by_reference so that it returns true with -ff2c
also for intrinsics returning complex numbers, as these are
not pure in the GCC IR sense, and wrapper functions for the
intrinsics were optimized out by DCE.

The change only affects compilation with -ff2c, so I guess
there will be only a few people able to test any performance
impact on real-world code...

Regtested on x86_64-pc-linux-gnu with testcases only that
use the -ff2c flag explicitly.

OK for mainline?

Thanks,
Harald

[PUSHED] aarch64: Fix an oversight in aarch64_evpc_reencode

2025-05-19 Thread Pengxuan Zheng

Some fields (e.g., zero_op0_p and zero_op1_p) of the struct "newd" may be left
uninitialized in aarch64_evpc_reencode. This can cause reading of uninitialized
data. I found this oversight when testing my patches on and/fmov
optimizations. This patch fixes the bug by zero initializing the struct.
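
As an illustration of why "= {}" is sufficient, here is a minimal sketch
using a made-up aggregate.  perm_sketch and reencode_sketch are hypothetical
stand-ins, not the real expand_vec_perm_d or aarch64_evpc_reencode; only the
zero_op0_p and zero_op1_p field names come from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for a plain aggregate with several flag fields.
struct perm_sketch
{
  unsigned nelt;
  bool zero_op0_p;
  bool zero_op1_p;
};

// Mirrors a function that fills in only a subset of the members.
perm_sketch
reencode_sketch(unsigned nelt)
{
  // Empty-brace initialization zero-initializes every scalar member of an
  // aggregate; a plain "perm_sketch newd;" would leave them indeterminate.
  perm_sketch newd = {};

  // Only nelt is assigned; the two flags keep their zero (false) value.
  newd.nelt = nelt / 2;
  return newd;
}
```

With the initializer in place, any field that the function does not assign
reads back as zero or false instead of an indeterminate value.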

Pushed as obvious after bootstrap/test on aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_evpc_reencode): Zero initialize
newd.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2371541ef1b..c067e099d83 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26246,7 +26246,7 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
 static bool
 aarch64_evpc_reencode (struct expand_vec_perm_d *d)
 {
-  expand_vec_perm_d newd;
+  expand_vec_perm_d newd = {};
 
   /* The subregs that we'd create are not supported for big-endian SVE;
  see aarch64_modes_compatible_p for details.  */
-- 
2.17.1

[PATCH] libgcc: Small bitint_reduce_prec big-endian fixes

2025-05-19 Thread Jakub Jelinek

Hi!

The big-endian _BitInt support in libgcc was written without any
testing, so I hadn't discovered that I'd made one mistake in it
(in multiple places).
The bitint_reduce_prec function attempts to optimize inputs
which have some larger precision but are found at runtime
to need a smaller number of limbs.
For little-endian that is handled just by returning a smaller
precision (or a negative precision for signed), but for
big-endian we need to adjust the passed-in limb pointer so that
when it returns a smaller precision the argument still points to
the least significant limbs for the returned precision.
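
The difference between the two spellings can be sketched in isolation.  The
names below (limb_t, skip_ms_limb_buggy, skip_ms_limb_fixed) are hypothetical
and only model the pointer-to-pointer parameter; they are not the libgcc
code.

```cpp
#include <cassert>

using limb_t = unsigned long;  // hypothetical stand-in for UBILtype

// Buggy variant: increments the local parameter only, so the limb pointer
// owned by the caller is left unchanged.
inline void
skip_ms_limb_buggy(const limb_t **p)
{
  ++p;
  (void) p;  // the new value is never observed by the caller
}

// Fixed variant: increments the pointer that the caller passed by address,
// so the caller now starts one limb further on.
inline void
skip_ms_limb_fixed(const limb_t **p)
{
  ++*p;
}
```

With big-endian limb order the most significant limb sits at the lowest
address, so dropping it means advancing the caller's pointer, which only the
second variant does.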

Bootstrapped/regtested on x86_64-linux and i686-linux (where it
doesn't do anything) and tested with all the _BitInt related
tests on s390x-linux, ok for trunk?

2025-05-19  Jakub Jelinek  

* libgcc2.c (bitint_reduce_prec): For big endian
__LIBGCC_BITINT_ORDER__ use ++*p and --*p instead of
++p and --p.
* soft-fp/bitint.h (bitint_reduce_prec): Likewise.

--- libgcc/libgcc2.c.jj 2025-04-08 14:09:53.632413447 +0200
+++ libgcc/libgcc2.c	2025-05-14 17:16:48.642879943 +0200
@@ -1333,7 +1333,7 @@ bitint_reduce_prec (const UBILtype **p,
  if (prec >= -1)
return -2;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- ++p;
+ ++*p;
 #else
  --i;
 #endif
@@ -1347,7 +1347,7 @@ bitint_reduce_prec (const UBILtype **p,
  if (prec >= -1)
return -2;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- ++p;
+ ++*p;
 #else
  --i;
 #endif
@@ -1358,7 +1358,7 @@ bitint_reduce_prec (const UBILtype **p,
  if ((Wtype) mslimb >= 0)
{
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- --p;
+ --*p;
 #endif
  return prec - 1;
}
@@ -1387,7 +1387,7 @@ bitint_reduce_prec (const UBILtype **p,
  if (prec == 0)
return 1;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- ++p;
+ ++*p;
 #else
  --i;
 #endif
@@ -1400,7 +1400,7 @@ bitint_reduce_prec (const UBILtype **p,
   if (prec == 0)
return 1;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
-  ++p;
+  ++*p;
 #else
   --i;
 #endif
--- libgcc/soft-fp/bitint.h.jj  2024-02-13 10:32:57.730666010 +0100
+++ libgcc/soft-fp/bitint.h 2025-05-14 17:17:00.418723808 +0200
@@ -76,7 +76,7 @@ bitint_reduce_prec (const UBILtype **p,
  if (prec >= -1)
return -2;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- ++p;
+ ++*p;
 #else
  --i;
 #endif
@@ -90,7 +90,7 @@ bitint_reduce_prec (const UBILtype **p,
  if (prec >= -1)
return -2;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- ++p;
+ ++*p;
 #else
  --i;
 #endif
@@ -101,7 +101,7 @@ bitint_reduce_prec (const UBILtype **p,
  if ((BILtype) mslimb >= 0)
{
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- --p;
+ --*p;
 #endif
  return prec - 1;
}
@@ -130,7 +130,7 @@ bitint_reduce_prec (const UBILtype **p,
  if (prec == 0)
return 1;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
- ++p;
+ ++*p;
 #else
  --i;
 #endif
@@ -143,7 +143,7 @@ bitint_reduce_prec (const UBILtype **p,
   if (prec == 0)
return 1;
 #if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
-  ++p;
+  ++*p;
 #else
   --i;
 #endif

Jakub

[PATCH] tree-chrec: Use signed_type_for in convert_affine_scev

2025-05-19 Thread Jakub Jelinek

Hi!

On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear, when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.

The patch fixes it similarly to what has been done for GCC 14 in various
other spots.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-05-19  Jakub Jelinek  

* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.

--- gcc/tree-chrec.cc.jj	2025-04-08 14:09:24.743815607 +0200
+++ gcc/tree-chrec.cc   2025-05-15 16:06:05.383354229 +0200
@@ -1490,7 +1490,7 @@ convert_affine_scev (class loop *loop, t
   new_step = *step;
   if (TYPE_PRECISION (step_type) > TYPE_PRECISION (ct) && TYPE_UNSIGNED (ct))
 {
-  tree signed_ct = build_nonstandard_integer_type (TYPE_PRECISION (ct), 0);
+  tree signed_ct = signed_type_for (ct);
   new_step = chrec_convert (signed_ct, new_step, at_stmt,
 use_overflow_semantics);
 }

Jakub

[committed][RISC-V] Fix false positive from Wuninitialized

2025-05-19 Thread Jeff Law

As Mark and I independently tripped, there's a Wuninitialized issue in 
the RISC-V backend.  While *I* know the value would always be properly 
initialized, it'd be somewhat painful to either eliminate the infeasible 
paths or do deep enough analysis to suppress the false positive.


So this initializes OUTPUT and verifies it's got a reasonable value 
before using it for the final copy into operands[0].
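
The general pattern can be sketched outside the backend as follows.
pick_output_sketch and its bit masks are invented for illustration and do not
correspond to the real synthesize_ior_xor logic; the assumption is simply
that callers never pass zero, which the compiler cannot see.

```cpp
#include <cassert>

// Hypothetical sketch: `output` is assigned on every path that can really
// occur (callers never pass ival == 0), but that is not provable locally.
const char *
pick_output_sketch(unsigned ival)
{
  // An explicit initial value removes the maybe-uninitialized diagnostic.
  const char *output = nullptr;

  if (ival & 0x7ffu)
    output = "low-bits insn";
  if (ival & ~0x7ffu)
    output = "high-bits insn";

  // Document and enforce the invariant before the value is consumed.
  assert(output);
  return output;
}
```

The assert turns a silent use of a bogus value into an immediate failure if
the "infeasible" path ever becomes reachable.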


Bootstrapped on the BPI (regression testing still has ~12hrs to go).

Pushing to the trunk.

Jeff
commit cbc258cd318756db8b5f0e4055dd8f1c1d618d22
Author: Jeff Law 
Date:   Mon May 19 12:00:56 2025 -0600

[RISC-V] Fix false positive from Wuninitialized

As Mark and I independently tripped, there's a Wuninitialized issue in the
RISC-V backend.  While *I* know the value would always be properly 
initialized,
it'd be somewhat painful to either eliminate the infeasible paths or do deep
enough analysis to suppress the false positive.

So this initializes OUTPUT and verifies it's got a reasonable value before
using it for the final copy into operands[0].

Bootstrapped on the BPI (regression testing still has ~12hrs to go).

gcc/
* config/riscv/riscv.cc (synthesize_ior_xor): Initialize OUTPUT and
verify it's non-null before emitting the final copy insn.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 54395b8d3a7..0b10842d176 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -14429,7 +14429,7 @@ synthesize_ior_xor (rtx_code code, rtx operands[3])
   /* Synthesis is better than loading the constant.  */
   ival = INTVAL (operands[2]);
   rtx input = operands[1];
-  rtx output;
+  rtx output = NULL_RTX;
 
   /* Emit the [x]ori insn that sets the low 11 bits into
  the proper state.  */
@@ -14458,6 +14458,8 @@ synthesize_ior_xor (rtx_code code, rtx operands[3])
   input = output;
   ival &= ~tmpval;
 }
+
+  gcc_assert (output);
   emit_move_insn (operands[0], output);
   return true;
 }

Re: [to-be-committed][RISC-V] Avoid setting output object more than once in IOR/XOR synthesis

2025-05-19 Thread Jeff Law





On 5/18/25 8:53 AM, Mark Wielaard wrote:



Should output here be initialized to NULL_RTX?
Yea, plus an assert to verify it's sensible after the two blobs of code 
that emit the [x]ori + bclr/binv sequences.  I've pushed a fix to the trunk.


jeff

[PUSHED] Fix libgomp.oacc-fortran/lib-13.f90 async bug

2025-05-19 Thread Thomas Schwinge

From: Julian Brown 

libgomp/
* testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
wait API calls.
---
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
index deb2c288604..f6bd27a4701 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
@@ -19,11 +19,10 @@ program main
 end do
   !$acc end parallel
 end do
-  !$acc end data
 
   call acc_wait_all_async (nprocs + 1)
-
   call acc_wait (nprocs + 1)
+  !$acc end data
 
   if (acc_async_test (1) .neqv. .TRUE.) stop 1
   if (acc_async_test (2) .neqv. .TRUE.) stop 2
-- 
2.34.1

[PUSHED] Add 'libgomp.c-c++-common/target-abi-struct-1-O0.c', 'libgomp.oacc-c-c++-common/abi-struct-1.c'

2025-05-19 Thread Thomas Schwinge

libgomp/
* testsuite/libgomp.c-c++-common/target-abi-struct-1-O0.c: New.
* testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c: Likewise.
---
 .../target-abi-struct-1-O0.c  |  3 +
 .../libgomp.oacc-c-c++-common/abi-struct-1.c  | 96 +++
 2 files changed, 99 insertions(+)
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-abi-struct-1-O0.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-abi-struct-1-O0.c 
b/libgomp/testsuite/libgomp.c-c++-common/target-abi-struct-1-O0.c
new file mode 100644
index 000..35ec75d648d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-abi-struct-1-O0.c
@@ -0,0 +1,3 @@
+/* { dg-additional-options -O0 } */
+
+#include "../libgomp.oacc-c-c++-common/abi-struct-1.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c
new file mode 100644
index 000..379e9fd3a97
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c
@@ -0,0 +1,96 @@
+/* Inspired by 'gcc.target/nvptx/abi-struct-arg.c', 
'gcc.target/nvptx/abi-struct-ret.c'.  */
+
+/* See also '../libgomp.c-c++-common/target-abi-struct-1-O0.c'.  */
+
+typedef struct {char a;} schar;
+typedef struct {short a;} sshort;
+typedef struct {int a;} sint;
+typedef struct {long long a;} slonglong;
+typedef struct {int a, b[12];} sint_13;
+
+#pragma omp declare target
+
+#define M(T) ({T t; t.a = sizeof t; t;})
+
+#pragma acc routine
+static schar rschar(void)
+{
+  return M(schar);
+}
+
+#pragma acc routine
+static sshort rsshort(void)
+{
+  return M(sshort);
+}
+
+#pragma acc routine
+static sint rsint(void)
+{
+  return M(sint);
+}
+
+#pragma acc routine
+static slonglong rslonglong(void)
+{
+  return M(slonglong);
+}
+
+#pragma acc routine
+static sint_13 rsint_13(void)
+{
+  return M(sint_13);
+}
+
+#pragma acc routine
+static void aschar(schar schar)
+{
+  if (schar.a != sizeof (char))
+__builtin_abort();
+}
+
+#pragma acc routine
+static void asshort(sshort sshort)
+{
+  if (sshort.a != sizeof (short))
+__builtin_abort();
+}
+
+#pragma acc routine
+static void asint(sint sint)
+{
+  if (sint.a != sizeof (int))
+__builtin_abort();
+}
+
+#pragma acc routine
+static void aslonglong(slonglong slonglong)
+{
+  if (slonglong.a != sizeof (long long))
+__builtin_abort();
+}
+
+#pragma acc routine
+static void asint_13(sint_13 sint_13)
+{
+  if (sint_13.a != (sizeof (int) * 13))
+__builtin_abort();
+}
+
+#pragma omp end declare target
+
+int main()
+{
+#pragma omp target
+#pragma acc serial
+  /* { dg-bogus {using 'vector_length \(32\)', ignoring 1} {} { target 
openacc_nvidia_accel_selected xfail *-*-* } .-1 } */
+  {
+aschar(rschar());
+asshort(rsshort());
+asint(rsint());
+aslonglong(rslonglong());
+asint_13(rsint_13());
+  }
+
+  return 0;
+}
-- 
2.34.1

Re: [PATCH v6 0/3][Middle-end]Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2025-05-19 Thread Kees Cook

On Fri, May 16, 2025 at 01:34:14PM +, Qing Zhao wrote:
> Adding -fdiagnostics-details to GCC to give end users more hints about
> where the warnings come from, in order to help them locate the exact
> source location behind specific warnings that arise from compiler
> optimizations.

I just needed to examine an unexpected -Wrestrict warning, and
discovered that this patch didn't help with it, but in looking at the
implementation details, it turned out to be trivial to expand coverage
to include -Wrestrict, which worked for me, and got me the
diagnostics I needed[1].

Could you include this patch in the next version of the series too? I'll
put it to use! :)

-Kees

[1] https://lore.kernel.org/all/202505191117.C094A90F88@keescook/


diff --git a/gcc/gimple-ssa-warn-restrict.cc b/gcc/gimple-ssa-warn-restrict.cc
index a52307866cc4..0f6eddf01e4e 100644
--- a/gcc/gimple-ssa-warn-restrict.cc
+++ b/gcc/gimple-ssa-warn-restrict.cc
@@ -1448,6 +1448,8 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
 
   tree func = gimple_call_fndecl (call);
 
+  rich_location_with_details richloc (loc, call);
+
   /* To avoid a combinatorial explosion of diagnostics format the offsets
  or their ranges as strings and use them in the warning calls below.  */
   char offstr[3][64];
@@ -1493,7 +1495,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
   if (sizrange[0] == sizrange[1])
{
  if (ovlsiz[0] == ovlsiz[1])
-   warning_at (loc, OPT_Wrestrict,
+   warning_at (&richloc, OPT_Wrestrict,
sizrange[0] == 1
? (ovlsiz[0] == 1
   ? G_("%qD accessing %wu byte at offsets %s "
@@ -1510,7 +1512,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
func, sizrange[0],
offstr[0], offstr[1], ovlsiz[0], offstr[2]);
  else if (ovlsiz[1] >= 0 && ovlsiz[1] < maxobjsize.to_shwi ())
-   warning_n (loc, OPT_Wrestrict, sizrange[0],
+   warning_n (&richloc, OPT_Wrestrict, sizrange[0],
   "%qD accessing %wu byte at offsets %s "
   "and %s overlaps between %wu and %wu bytes "
   "at offset %s",
@@ -1520,7 +1522,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
   func, sizrange[0], offstr[0], offstr[1],
   ovlsiz[0], ovlsiz[1], offstr[2]);
  else
-   warning_n (loc, OPT_Wrestrict, sizrange[0],
+   warning_n (&richloc, OPT_Wrestrict, sizrange[0],
   "%qD accessing %wu byte at offsets %s and "
   "%s overlaps %wu or more bytes at offset %s",
   "%qD accessing %wu bytes at offsets %s and "
@@ -1533,7 +1535,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
   if (sizrange[1] >= 0 && sizrange[1] < maxobjsize.to_shwi ())
{
  if (ovlsiz[0] == ovlsiz[1])
-   warning_n (loc, OPT_Wrestrict, ovlsiz[0],
+   warning_n (&richloc, OPT_Wrestrict, ovlsiz[0],
   "%qD accessing between %wu and %wu bytes "
   "at offsets %s and %s overlaps %wu byte at "
   "offset %s",
@@ -1543,7 +1545,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
   func, sizrange[0], sizrange[1],
   offstr[0], offstr[1], ovlsiz[0], offstr[2]);
  else if (ovlsiz[1] >= 0 && ovlsiz[1] < maxobjsize.to_shwi ())
-   warning_at (loc, OPT_Wrestrict,
+   warning_at (&richloc, OPT_Wrestrict,
"%qD accessing between %wu and %wu bytes at "
"offsets %s and %s overlaps between %wu and %wu "
"bytes at offset %s",
@@ -1551,7 +1553,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
offstr[0], offstr[1], ovlsiz[0], ovlsiz[1],
offstr[2]);
  else
-   warning_at (loc, OPT_Wrestrict,
+   warning_at (&richloc, OPT_Wrestrict,
"%qD accessing between %wu and %wu bytes at "
"offsets %s and %s overlaps %wu or more bytes "
"at offset %s",
@@ -1564,7 +1566,7 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)
ovlsiz[1] = maxobjsize.to_shwi ();
 
   if (ovlsiz[0] == ovlsiz[1])
-   warning_n (loc, OPT_Wrestrict, ovlsiz[0],
+   warning_n (&richloc, OPT_Wrestrict, ovlsiz[0],
   "%qD accessing %wu or more bytes at offsets "
   "%s and %s overlaps %wu byte at offset %s",
   "%qD accessing %wu or more bytes at offsets "
@@ -1572,14 +1574,14 @@ maybe_diag_overlap (location_t loc, gimple *call, 
builtin_access &acs)

[PUSHED] 'TYPE_EMPTY_P' vs. code offloading [PR120308]

2025-05-19 Thread Thomas Schwinge

We've got 'gcc/stor-layout.cc:finalize_type_size':

/* Handle empty records as per the x86-64 psABI.  */
TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);

(Indeed x86_64 is still the only target to define 'TARGET_EMPTY_RECORD_P',
calling 'gcc/tree.cc:default_is_empty_record'.)

And so it happens that for an empty struct used in code offloaded from x86_64
host (but not powerpc64le host, for example), we get to see 'TYPE_EMPTY_P' in
offloading compilation (where the offload targets (currently?) don't use it
themselves, and therefore aren't prepared to handle it).

For nvptx offloading compilation, this causes wrong code generation:
'ptxas [...] error : Call has wrong number of parameters', as nvptx code
generation for function definition doesn't pay attention to this flag (say, in
'gcc/config/nvptx/nvptx.cc:pass_in_memory', or wherever else would be
appropriate to handle that), but the generic code 'gcc/calls.cc:expand_call'
via 'gcc/function.cc:aggregate_value_p' does pay attention to it, and we thus
get mismatching function definition vs. function call.

This issue apparently isn't a problem for GCN offloading, but I don't know if
that's by design or by accident.

Richard Biener:
> It looks like TYPE_EMPTY_P is only used during RTL expansion for ABI
> purposes, so computing it during layout_type is premature as shown here.
>
> I would suggest to simply re-compute it at offload stream-in time.

(For avoidance of doubt, the additions to 'gcc.target/nvptx/abi-struct-arg.c',
'gcc.target/nvptx/abi-struct-ret.c' are not dependent on the offload streaming
code changes, but are just to mirror the changes to
'libgomp.oacc-c-c++-common/abi-struct-1.c'.)

PR lto/120308
gcc/
* lto-streamer-out.cc (hash_tree): Don't handle 'TYPE_EMPTY_P' for
'lto_stream_offload_p'.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c: Add empty
structure testing.
gcc/testsuite/
* gcc.target/nvptx/abi-struct-arg.c: Add empty structure testing.
* gcc.target/nvptx/abi-struct-ret.c: Likewise.
---
 gcc/lto-streamer-out.cc   |  3 ++-
 .../gcc.target/nvptx/abi-struct-arg.c | 10 
 .../gcc.target/nvptx/abi-struct-ret.c | 11 
 gcc/tree-streamer-in.cc   | 12 -
 gcc/tree-streamer-out.cc  |  3 ++-
 .../libgomp.oacc-c-c++-common/abi-struct-1.c  | 25 +++
 6 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 86d338461c0..308ab3416b3 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -1398,7 +1398,8 @@ hash_tree (struct streamer_tree_cache_d *cache, 
hash_map *map,
   hstate.commit_flag ();
   hstate.add_int (TYPE_PRECISION_RAW (t));
   hstate.add_int (TYPE_ALIGN (t));
-  hstate.add_int (TYPE_EMPTY_P (t));
+  if (!lto_stream_offload_p)
+   hstate.add_int (TYPE_EMPTY_P (t));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_TRANSLATION_UNIT_DECL))
diff --git a/gcc/testsuite/gcc.target/nvptx/abi-struct-arg.c 
b/gcc/testsuite/gcc.target/nvptx/abi-struct-arg.c
index 54ae651dcca..c2cc4de115e 100644
--- a/gcc/testsuite/gcc.target/nvptx/abi-struct-arg.c
+++ b/gcc/testsuite/gcc.target/nvptx/abi-struct-arg.c
@@ -3,12 +3,16 @@
 
 /* Struct arg.  Passed via pointer.  */
 
+typedef struct {} empty;  /* See 'gcc/doc/extend.texi', "Empty Structures".  */
 typedef struct {char a;} one;
 typedef struct {short a;} two;
 typedef struct {int a;} four;
 typedef struct {long long a;} eight;
 typedef struct {int a, b[12];} big;
 
+/* { dg-final { scan-assembler-times ".extern .func dcl_aempty \\(.param.u64 
%\[_a-z0-9\]*\\);" 1 } } */
+void dcl_aempty (empty);
+
 /* { dg-final { scan-assembler-times ".extern .func dcl_aone \\(.param.u64 
%\[_a-z0-9\]*\\);" 1 } } */
 void dcl_aone (one);
 
@@ -28,6 +32,7 @@ void dcl_abig (big);
 
 void test_1 (void)
 {
+  dcl_aempty (({empty t; t;}));
   dcl_aone (M (one, 1));
   dcl_atwo (M (two, 2));
   dcl_afour (M (four, 3));
@@ -35,6 +40,11 @@ void test_1 (void)
   dcl_abig (M (big, 5));
 }
 
+/* { dg-final { scan-assembler-times ".visible .func dfn_aempty \\(.param.u64 
%\[_a-z0-9\]*\\)(?:;|\[\r\n\]+\{)" 2 } } */
+void dfn_aempty (empty empty)
+{
+}
+
 /* { dg-final { scan-assembler-times ".visible .func dfn_aone \\(.param.u64 
%\[_a-z0-9\]*\\)(?:;|\[\r\n\]+\{)" 2 } } */
 void dfn_aone (one one)
 {
diff --git a/gcc/testsuite/gcc.target/nvptx/abi-struct-ret.c 
b/gcc/testsuite/gcc.target/nvptx/abi-struct-ret.c
index d48a82d26ce..13e50212dc3 100644
--- a/gcc/testsuite/gcc.target/nvptx/abi-struct-ret.c
+++ b/gcc/testsuite/gcc.target/nvptx/abi-struct-ret.c
@@ -3,12 +3,16 @@
 
 /* Struct return.  Returned via pointer.  */
 
+typedef struct {} empty;

Re: [patch, fortran] PR120049 - ICE when using IS_C_ASSOCIATED ()

2025-05-19 Thread Harald Anlauf


Hi Jerry,

so contrary to what the name of the patch claims (pr120049-final.diff),
it fixes only the case of direct use of iso_c_binding, but not the
indirect one through the other module, which is the reason for the
original ICE and the PR.

So if you want to push the incremental patch now, go ahead.

Cheers,
Harald


Am 18.05.25 um 23:46 schrieb Jerry D:

On 5/18/25 2:34 PM, Jerry D wrote:

On 5/18/25 2:10 PM, Harald Anlauf wrote:

Hi Jerry,

I found 2 invalid corner cases which are silently accepted with
your patch when iso_c_binding is used indirectly:

   print *, c_associated(c_loc(val), C_NULL_FUNPTR)
   print *, c_associated(C_NULL_FUNPTR, c_loc(val))

These should get rejected, too.  Can you see how to catch these, too?

Thanks,
Harald


Yes, will do! I try to think of cases to run through. This helps.

Thanks,

Jerry
--- snip ---


Well, this was easy.  I added those two lines to my current test2.f90 and 
they are rejected.  I will update the testcase in the ready-to-commit copy.


OK to push then?

$ gfc test2.f90
test2.f90:46:36:

    46 |   print *, c_associated(c_loc(val), C_NULL_FUNPTR)
   |    1
Error: Argument C_PTR_2 at (1) to C_ASSOCIATED shall have the same type 
as C_PTR_1: TYPE(c_ptr) instead of TYPE(c_funptr)

test2.f90:47:39:

    47 |   print *, c_associated(C_NULL_FUNPTR, c_loc(val))
   |   1
Error: Argument C_PTR_2 at (1) to C_ASSOCIATED shall have the same type 
as C_PTR_1: TYPE(c_funptr) instead of TYPE(c_ptr)

Re: [PATCH] c++/modules: Ensure vtables are emitted when needed [PR120349]

2025-05-19 Thread Jason Merrill


On 5/19/25 8:27 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Alternatively could go back to always marking vtables as DECL_EXTERNAL
as well but that doesn't seem to be necessary that I can tell.


DECL_NOT_REALLY_EXTERN doesn't make sense without DECL_EXTERNAL.  The 
antiquated linkage handling in the front end marks everything that might 
or might not get emitted with those two flags (and DECL_DEFER_OUTPUT for 
functions).  I'd like to simplify that, it's been excessive since the 
introduction of cgraph, but for now we should be marking vtables as 
DECL_EXTERNAL.



-- >8 --

I missed a testcase in r16-688-gc875748cdc468e for whether a GM vtable
should be emitted in an importer when it has no non-inline key function.
Before that patch the code worked because we always marked all vtables
as DECL_EXTERNAL, which then meant that reading the definition marked
them as DECL_NOT_REALLY_EXTERN.

But it seems to me that really, all vtables should just be considered
DECL_NOT_REALLY_EXTERN until processed by maybe_emit_vtables (this is
how the frontend seems to behave in general); this patch makes that
adjustment.
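
For readers unfamiliar with the term, the "no non-inline key function"
case can be illustrated with a small standalone example.  This is only a
sketch in terms of the Itanium C++ ABI rule (the key function is the first
non-pure virtual function that is not inline at the point of the class
definition); the class names are hypothetical and not taken from the
testcase.

```cpp
// Every virtual function here is inline, so the class has no key function.
// Under the Itanium C++ ABI its vtable is then emitted (as a vague-linkage
// symbol) in each translation unit that needs it.
struct NoKey {
  virtual ~NoKey() {}
  virtual int id() const { return 1; }
};

// Here the destructor is declared but not defined in the class body, so it
// is the key function: the vtable is emitted only in the translation unit
// that defines WithKey::~WithKey.
struct WithKey {
  virtual ~WithKey();
  virtual int id() const { return 2; }
};

// Out-of-line definition of the key function; this translation unit is the
// one that owns WithKey's vtable.
WithKey::~WithKey() {}

// Virtual calls through references, so both vtables are really used.
int call_id(const NoKey& obj) { return obj.id(); }
int call_id(const WithKey& obj) { return obj.id(); }
```

The types in the new test all resemble NoKey, which is why no single
translation unit can be assumed to own their vtables.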

PR c++/120349

gcc/cp/ChangeLog:

* module.cc (trees_in::read_var_def): Always mark vtables as
DECL_NOT_REALLY_EXTERN.

gcc/testsuite/ChangeLog:

* g++.dg/modules/vtt-3_a.C: New test.
* g++.dg/modules/vtt-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   |  2 +-
  gcc/testsuite/g++.dg/modules/vtt-3_a.C | 29 ++
  gcc/testsuite/g++.dg/modules/vtt-3_b.C | 14 +
  3 files changed, 44 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/vtt-3_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/vtt-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 200e1c2deb3..d860940caa4 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12781,7 +12781,7 @@ trees_in::read_var_def (tree decl, tree maybe_template)
if (installing)
  {
DECL_INITIAL (decl) = init;
-  if (DECL_EXTERNAL (decl))
+  if (DECL_EXTERNAL (decl) || vtable)
DECL_NOT_REALLY_EXTERN (decl) = true;
if (VAR_P (decl))
{
diff --git a/gcc/testsuite/g++.dg/modules/vtt-3_a.C 
b/gcc/testsuite/g++.dg/modules/vtt-3_a.C
new file mode 100644
index 000..f38f024ba1f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/vtt-3_a.C
@@ -0,0 +1,29 @@
+// PR c++/120349
+// { dg-additional-options "-fmodules -Wno-global-module" }
+// { dg-module-cmi M }
+
+module;
+
+// GMF types; should have vtables emitted in importers
+struct BGG {
+  virtual inline ~BGG() {}
+};
+struct BGM {
+  virtual inline ~BGM() {}
+};
+struct DGG : BGG {};
+
+export module M;
+
+export using ::DGG;
+
+// Module-local types; should have vtables emitted here
+struct BM {
+  virtual inline ~BM() {}
+};
+export struct DGM : BGM {};  // note: this emits BGM's vtable here too
+export struct DM : BM {};
+
+// { dg-final { scan-assembler-not "_ZTV3BGG:" } }
+// { dg-final { scan-assembler "_ZTV3BGM:" } }
+// { dg-final { scan-assembler "_ZTVW1M2BM:" } }
diff --git a/gcc/testsuite/g++.dg/modules/vtt-3_b.C 
b/gcc/testsuite/g++.dg/modules/vtt-3_b.C
new file mode 100644
index 000..ef7ae6ca4e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/vtt-3_b.C
@@ -0,0 +1,14 @@
+// PR c++/120349
+// { dg-additional-options "-fmodules -Wno-global-module" }
+
+import M;
+
+int main() {
+  DGG dgg;
+  DGM dgm;
+  DM dm;
+}
+
+// { dg-final { scan-assembler "_ZTV3BGG:" } }
+// { dg-final { scan-assembler "_ZTV3BGM:" } }
+// { dg-final { scan-assembler-not "_ZTVW1M2BM:" } }

Re: [PATCH] c++/modules: Fix ICE on merge of instantiation with partial spec [PR120013]

2025-05-19 Thread Jason Merrill


On 5/17/25 10:38 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?


OK.


-- >8 --

When we import a pending instantiation that matches an existing partial
specialisation, we don't find the slot in the entity map because for
partial specialisations we register the TEMPLATE_DECL but for normal
implicit instantiations we instead register the inner TYPE_DECL.

Because the DECL_MODULE_ENTITY_P flag is set we correctly realise that
it is in the entity map, but ICE when attempting to use that slot in
partition handling.

This patch fixes the issue by detecting this case and instead looking
for the slot for the TEMPLATE_DECL.  It doesn't matter that we never add
a slot for the inner decl because we're about to discard it anyway.
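
The fallback lookup described above can be modelled outside the compiler.
The following is a toy sketch, not GCC code: ToyDecl, ToyEntityMap and
find_slot are hypothetical names, and a std::unordered_map keyed by a
numeric UID stands in for the real entity map.

```cpp
#include <optional>
#include <unordered_map>

// Toy stand-in for a declaration: its own UID plus, for the inner decl of a
// template specialisation, the UID of the owning template declaration.
struct ToyDecl {
  unsigned uid;
  std::optional<unsigned> template_uid;
};

// Toy stand-in for the entity map: declaration UID -> entity index.
using ToyEntityMap = std::unordered_map<unsigned, unsigned>;

// Look up the slot for DECL.  If the decl itself was never registered, fall
// back to the slot registered for its owning template, mirroring the idea
// of the patch.  Returns nullptr when neither is present.
inline unsigned* find_slot(ToyEntityMap& map, const ToyDecl& decl)
{
  auto it = map.find(decl.uid);
  if (it == map.end() && decl.template_uid)
    it = map.find(*decl.template_uid);
  return it == map.end() ? nullptr : &it->second;
}
```

In this model the inner decl never gets a slot of its own, which matches
the remark that it is about to be discarded anyway.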

PR c++/120013

gcc/cp/ChangeLog:

* module.cc (trees_in::install_entity): Handle re-registering
the inner TYPE_DECL of a partial specialisation.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-8.h: New test.
* g++.dg/modules/partial-8_a.C: New test.
* g++.dg/modules/partial-8_b.C: New test.
* g++.dg/modules/partial-8_c.C: New test.
* g++.dg/modules/partial-8_d.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   | 41 --
  gcc/testsuite/g++.dg/modules/partial-8.h   |  8 +
  gcc/testsuite/g++.dg/modules/partial-8_a.C | 10 ++
  gcc/testsuite/g++.dg/modules/partial-8_b.C |  8 +
  gcc/testsuite/g++.dg/modules/partial-8_c.C |  7 
  gcc/testsuite/g++.dg/modules/partial-8_d.C |  9 +
  6 files changed, 72 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-8.h
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-8_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-8_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-8_c.C
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-8_d.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 200e1c2deb3..f728275612e 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8083,18 +8083,37 @@ trees_in::install_entity (tree decl)
gcc_checking_assert (!existed);
slot = ident;
  }
-  else if (state->is_partition ())
-{
-  /* The decl is already in the entity map, but we see it again now from a
-partition: we want to overwrite if the original decl wasn't also from
-a (possibly different) partition.  Otherwise, for things like template
-instantiations, make_dependency might not realise that this is also
-provided from a partition and should be considered part of this module
-(and thus always emitted into the primary interface's CMI).  */
+  else
+{
unsigned *slot = entity_map->get (DECL_UID (decl));
-  module_state *imp = import_entity_module (*slot);
-  if (!imp->is_partition ())
-   *slot = ident;
+
+  /* The entity must be in the entity map already.  However, DECL may
+be the DECL_TEMPLATE_RESULT of an existing partial specialisation
+if we matched it while streaming another instantiation; in this
+case we already registered that TEMPLATE_DECL.  */
+  if (!slot)
+   {
+ tree type = TREE_TYPE (decl);
+ gcc_checking_assert (TREE_CODE (decl) == TYPE_DECL
+  && CLASS_TYPE_P (type)
+  && CLASSTYPE_TEMPLATE_SPECIALIZATION (type));
+ slot = entity_map->get (DECL_UID (CLASSTYPE_TI_TEMPLATE (type)));
+   }
+  gcc_checking_assert (slot);
+
+  if (state->is_partition ())
+   {
+ /* The decl is already in the entity map, but we see it again now
+from a partition: we want to overwrite if the original decl
+wasn't also from a (possibly different) partition.  Otherwise,
+for things like template instantiations, make_dependency might
+not realise that this is also provided from a partition and
+should be considered part of this module (and thus always
+emitted into the primary interface's CMI).  */
+ module_state *imp = import_entity_module (*slot);
+ if (!imp->is_partition ())
+   *slot = ident;
+   }
  }
  
return true;

diff --git a/gcc/testsuite/g++.dg/modules/partial-8.h 
b/gcc/testsuite/g++.dg/modules/partial-8.h
new file mode 100644
index 000..d9a83a83e54
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/partial-8.h
@@ -0,0 +1,8 @@
+// PR c++/120013
+
+template  struct tuple_element;
+template  tuple_element get(T);
+
+// This case wasn't an issue for the PR, but worth double-checking
+template  constexpr int var = 123;
+template  void foo(T, int = var);
diff --git a/gcc/testsuite/g++.dg/modules/partial-8_a.C 
b/gcc/testsuite/g++.dg/modules/partial-8_a.C
new file mode 100644
index 000..d6848c78360
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/partial-8

[PATCH] libstdc++: Implement stringstream from string_view [P2495R3]

2025-05-19 Thread Nathan Myers

Add constructors to stringbuf, stringstream, istringstream,
and ostringstream, and a matching overload of str(sv) in each,
that take anything convertible to a string_view where the
existing functions take a string.
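
As a rough illustration of what this enables in user code, here is a
minimal sketch.  The helper names first_word and reset_contents are
hypothetical; the only thing assumed from the patch is the feature-test
macro __cpp_lib_sstream_from_string_view, and the fallback branch keeps
the example usable with libraries that do not define it.

```cpp
#include <sstream>
#include <string>
#include <string_view>

// Return the first whitespace-delimited word of TEXT.
inline std::string first_word(std::string_view text)
{
#if defined(__cpp_lib_sstream_from_string_view)
  // P2495R3 path: construct the stream directly from the string_view.
  std::istringstream in(text);
#else
  // Fallback: older libraries need an explicit std::string.
  std::istringstream in{std::string(text)};
#endif
  std::string word;
  in >> word;
  return word;
}

// Replace the contents of a stream via str() and return what it now holds.
inline std::string reset_contents(std::string_view text)
{
  std::ostringstream out;
  out << "discarded";
#if defined(__cpp_lib_sstream_from_string_view)
  out.str(text);               // new str() overload taking a string_view
#else
  out.str(std::string(text));  // pre-P2495R3 spelling
#endif
  return out.str();
}
```

Guarding on the macro rather than on __cplusplus lets the same source
build against standard libraries with and without the feature.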

libstdc++-v3/ChangeLog:

P2495R3 stringstream to init from string_view-ish
* include/std/sstream: full implementation, really just
decls, requires clause and plumbing.
* include/std/bits/version.def, .h: new preprocessor symbol
__cpp_lib_sstream_from_string_view.
* testsuite/27_io/basic_stringbuf/cons/char/3.cc: New tests.
* testsuite/27_io/basic_istringstream/cons/char/2.cc: New tests.
* testsuite/27_io/basic_ostringstream/cons/char/4.cc: New tests.
* testsuite/27_io/basic_stringstream/cons/char/2.cc: New tests.
---
 libstdc++-v3/ChangeLog|  11 +
 libstdc++-v3/include/bits/version.def |  11 +-
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/sstream  | 181 +--
 .../27_io/basic_istringstream/cons/char/2.cc  | 193 
 .../27_io/basic_ostringstream/cons/char/4.cc  | 193 
 .../27_io/basic_stringbuf/cons/char/3.cc  | 216 ++
 .../27_io/basic_stringstream/cons/char/2.cc   | 194 
 8 files changed, 990 insertions(+), 19 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/2.cc
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_ostringstream/cons/char/4.cc
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_stringbuf/cons/char/3.cc
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/2.cc

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index b45f8c2c7a5..ac0ff4a386f 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -41,6 +41,17 @@
PR libstdc++/119246
* include/std/format: Updated check for _GLIBCXX_FORMAT_F128.
 
+2025-05-14  Nathan Myers  
+   P2495R3 stringstream to init from string_view-ish
+   * include/std/sstream: full implementation, really just
+   decls, requires clause and plumbing.
+   * include/std/bits/version.def, .h: new preprocessor symbol
+   __cpp_lib_sstream_from_string_view.
+   * testsuite/27_io/basic_stringbuf/cons/char/3.cc: New tests.
+   * testsuite/27_io/basic_istringstream/cons/char/2.cc: New tests.
+   * testsuite/27_io/basic_ostringstream/cons/char/4.cc: New tests.
+   * testsuite/27_io/basic_stringstream/cons/char/2.cc: New tests.
+
 2025-05-14  Tomasz Kamiński  
 
PR libstdc++/119125
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 6ca148f0488..567c56b4117 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -649,7 +649,7 @@ ftms = {
   };
   values = {
 v = 1;
-/* For when there's no gthread.  */
+// For when there is no gthread.
 cxxmin = 17;
 hosted = yes;
 gthread = no;
@@ -1961,6 +1961,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = sstream_from_string_view;
+  values = {
+v = 202302;
+cxxmin = 26;
+hosted = yes;
+  };
+};
+
 // Standard test specifications.
 stds[97] = ">= 199711L";
 stds[03] = ">= 199711L";
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 48a090c14a3..5d1beb83a25 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2193,4 +2193,14 @@
 #endif /* !defined(__cpp_lib_modules) && defined(__glibcxx_want_modules) */
 #undef __glibcxx_want_modules
 
+#if !defined(__cpp_lib_sstream_from_string_view)
+# if (__cplusplus >  202302L) && _GLIBCXX_HOSTED
+#  define __glibcxx_sstream_from_string_view 202302L
+#  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_sstream_from_string_view)
+#   define __cpp_lib_sstream_from_string_view 202302L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_sstream_from_string_view) && 
defined(__glibcxx_want_sstream_from_string_view) */
+#undef __glibcxx_want_sstream_from_string_view
+
 #undef __glibcxx_want_all
diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index ad0c16a91e8..9b2b0eb53fc 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -38,9 +38,14 @@
 #endif
 
 #include  // iostream
+#include 
 
 #include 
 #include 
+#ifdef __cpp_lib_sstream_from_string_view
+# include   // is_convertible_v
+#endif
+
 #include  // allocator_traits, __allocator_like
 
 #if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
@@ -52,8 +57,6 @@
 # define _GLIBCXX_SSTREAM_ALWAYS_INLINE [[__gnu__::__always_inline__]]
 #endif
 
-
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -159,6 +162,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { __rhs._M_sync(const_cast(__rhs._M_string.data()), 0, 0); }
 
 #if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI

[PATCH] c++: substituting fn parm redeclared with dep alias tmpl [PR120224]

2025-05-19 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/15/14?

-- >8 --

Here we declare f twice, first ordinarily and then using a dependent
alias template.  Due to alias template transparency these logically
declare the same overload.  But now the function type of f, which was
produced from the first declaration, diverges from the type of its
formal parameter, which is produced from the subsequent redefinition,
in that substituting T=int succeeds for the function type but not for
the formal parameter type.  This eventually causes us to produce an
undiagnosed error_mark_node in the AST of the function call, leading to
a sanity check failure that was added in r14-6343-g0c018a74eb1aff.

Before r14-6343, we would reject the testcase from
regenerate_decl_from_template when instantiating the definition of f,
making this a regression.

To fix this, it seems we just need to check for errors when substituting
the type of a PARM_DECL, since that could still fail despite substitution
into the function type succeeding.
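
The underlying language effect, that an alias which always yields void can
still fail substitution through its arguments, can be seen in a standalone
example.  This sketch is not the testcase: it assumes the parameter type
names a nested member such as typename T::type, and has_nested_type and
WithType are hypothetical names.

```cpp
#include <type_traits>

// Primary template: chosen when substitution into the partial
// specialisation below fails.
template <class T, class = void>
struct has_nested_type : std::false_type {};

// std::void_t<...> is always void, but its argument is still substituted.
// For T = int, "typename T::type" is ill-formed, so this specialisation is
// dropped even though the alias itself would have produced plain void.
template <class T>
struct has_nested_type<T, std::void_t<typename T::type>> : std::true_type {};

// A type that does provide the nested member.
struct WithType { using type = int; };

static_assert(!has_nested_type<int>::value,
              "substituting T=int into T::type must fail");
static_assert(has_nested_type<WithType>::value,
              "substituting T=WithType must succeed");
```

Here the failure is a harmless SFINAE outcome; in the PR the same kind of
failure surfaced only when substituting the PARM_DECL type, after
substitution into the function type had already succeeded.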

PR c++/120224

gcc/cp/ChangeLog:

* pt.cc (tsubst_function_decl): Return error_mark_node if any
of the substituted function parameters are erroneous.
(tsubst_decl) : Return error_mark_node if
the substituted function parameter type is erroneous.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-80.C: New test.
---
 gcc/cp/pt.cc   |  9 -
 gcc/testsuite/g++.dg/cpp0x/alias-decl-80.C | 14 ++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-80.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1973d25b61a0..df6d7bb136ea 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14903,7 +14903,11 @@ tsubst_function_decl (tree t, tree args, 
tsubst_flags_t complain,
 parms = DECL_CHAIN (parms);
   parms = tsubst (parms, args, complain, t);
   for (tree parm = parms; parm; parm = DECL_CHAIN (parm))
-DECL_CONTEXT (parm) = r;
+{
+  if (parm == error_mark_node)
+   return error_mark_node;
+  DECL_CONTEXT (parm) = r;
+}
   if (closure && DECL_IOBJ_MEMBER_FUNCTION_P (t))
 {
   tree tparm = build_this_parm (r, closure, type_memfn_quals (type));
@@ -15474,6 +15478,9 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain,
   /* We're dealing with a normal parameter.  */
   type = tsubst (TREE_TYPE (t), args, complain, in_decl);
 
+   if (type == error_mark_node)
+ RETURN (error_mark_node);
+
 type = type_decays_to (type);
 TREE_TYPE (r) = type;
 cp_apply_type_quals_to_decl (cp_type_quals (type), r);
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-80.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-80.C
new file mode 100644
index ..e2ff663843de
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-80.C
@@ -0,0 +1,14 @@
+// PR c++/120224
+// { dg-do compile { target c++11 } }
+
+template using void_t = void;
+
+template
+void f(void*); // #1
+
+template
+void f(void_t*) { } // { dg-error "not a class" } #2
+
+int main() {
+  f(0); // { dg-error "no match" }
+}
-- 
2.49.0.608.gcb96e1697a
