RE: [RFC] PR81358: Enable automatic linking of libatomic

2024-12-20 Thread Prathamesh Kulkarni


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 18 December 2024 21:09
> To: Tobias Burnus ; Joseph Myers
> 
> Cc: Xi Ruoyao ; Matthew Malcomson
> ; gcc-patches@gcc.gnu.org
> Subject: RE: [RFC] PR81358: Enable automatic linking of libatomic
> 
> External email: Use caution opening links or attachments
> 
> 
> > -Original Message-
> > From: Tobias Burnus 
> > Sent: 18 December 2024 17:46
> > To: Prathamesh Kulkarni ; Joseph Myers
> > 
> > Cc: Xi Ruoyao ; Matthew Malcomson
> > ; gcc-patches@gcc.gnu.org
> > Subject: Re: [RFC] PR81358: Enable automatic linking of libatomic
> >
> > External email: Use caution opening links or attachments
> >
> >
> > A x86_64-gnu-linux bootstrap which also builds -m32 now fails.
> >
> > While I have local patches, they should not affect this, hence, I
> fear
> > that it has been caused by this patch.
> >
> > Namely, if I do a bootstrap into an empty directory with:
> >
> >$ /configure --prefix=...
> > --enable-languages=c,c++,fortran,lto,objc
> > --enable-offload-targets=nvptx-none,amdgcn-amdhsa
> >
> > I get during stage1 bootstrap:
> >
> > checking whether -lc should be explicitly linked in... no checking
> > dynamic linker characteristics... configure: error: Link tests are
> not
> > allowed after GCC_NO_EXECUTABLES.
> > make[2]: *** [Makefile:17281: configure-stage1-target-libstdc++-v3]
> > Error 1
> >
> > And while ./x86_64-pc-linux-gnu/libstdc++-v3/config.log is okay,
> > ./x86_64-pc-linux-gnu/32/libstdc++-v3/config.log fails with:
> >
> > configure:10962: $? = 0
> > configure:10976: result: no
> > configure:11141: checking dynamic linker characteristics
> > configure:11577: error: Link tests are not allowed after
> > GCC_NO_EXECUTABLES.
> >
> > Can you check?
> Hi Tobias,
> Sorry for the breakage. IIUC, the issue seems to be happening with
> libatomic/Makefile.am all: rule added in the patch, that incorrectly
> copies libatomic.a in $(gcc_objdir).
> I have a local patch that instead copies it over to
> $(gcc_objdir)$(MULTISUBDIR)/, which seems to fix the issue with stage-
> 1 multilib build (with -m32) on x86_64-linux-gnu.
> I will post it after running it thru bootstrap+test.
Hi,
The previous patch (now reverted) had two different issues both stemming from 
the rule added in libatomic/Makefile.am:
(1) As mentioned above, it broke multilib builds because it incorrectly copies 
libatomic.a in $(gcc_objdir). The attached patch fixes it by
instead copying libatomic.a  over to $(gcc_objdir)$(MULTISUBDIR)/, and can 
confirm that 64-bit libatomic.a is copied to $build/gcc/ and 32-bit
libatomic.a is copied to $build/gcc/32/.

(2) libatomic_convenience.la was not getting generated for some reason, which 
resulted in build failure while building libdruntime.
The patch adds libatomic_convenience.la as a dependency, and I can see it now 
getting generated, which seems to fix the build issue with libdruntime.

Patch passes bootstrap+test with multilib enabled for --enable-languages=all on 
x86_64-linux-gnu, and for --enable-languages=c,c++,fortran on aarch64-linux-gnu.
Does this version look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
> 
> Thanks,
> Prathamesh
> >
> > Tobias
PR81358: Enable automatic linking of libatomic.

ChangeLog:
PR driver/81358
* Makefile.def: Add dependencies so libatomic is built before target
libraries are configured.
* Makefile.tpl: Export TARGET_CONFIGDIRS.
* configure.ac: Add libatomic to bootstrap_target_libs.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/ChangeLog:
PR driver/81358
* common.opt: New option -flink-libatomic.
* gcc.cc (LINK_LIBATOMIC_SPEC): New macro.
* config/gnu-user.h (GNU_USER_TARGET_LINK_GCC_C_SEQUENCE_SPEC): Use
LINK_LIBATOMIC_SPEC.
* doc/invoke.texi: Document -flink-libatomic.
* configure.ac: Define TARGET_PROVIDES_LIBATOMIC.
* configure: Regenerate.
* config.in: Regenerate.

libatomic/ChangeLog:
PR driver/81358
* Makefile.am: Pass -fno-link-libatomic.
New rule all.
* configure.ac: Assert that CFLAGS is set and pass -fno-link-libatomic. 
* Makefile.in: Regenerate.
* configure: Regenerate.

Signed-off-by: Prathamesh Kulkarni 
Co-authored-by: Matthew Malcolmson 

diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..90899fa28cf 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -656,6 +656,26 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; 
no_c=true; };
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
 
+dependencies = { module=configure-target-libbacktrace; 
on=all-target-libatomic; };
+dependencies = { module=configure-target-libgloss; on=all-target-libatomic; };
+dependencies = { module=configure-target-newlib; on=all-target-libatomic; };
+dependencies = { module=configure-target-libgomp; on=all-target-libatomic; }

Re: [PATCH 1/2] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
On Fri, Dec 20, 2024 at 4:27 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 20, 2024 at 04:22:19PM +0100, Christoph Müllner wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> > @@ -0,0 +1,37 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
> > +/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
>
> This line should be dg-additional-options, otherwise the last one overrides
> the previous one.

Oh, sorry for that.
I wonder why this did not trigger FAILs with
  --target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse}

>
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } */
> > +/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
>
> Ditto.
>
> Jakub
>


[PATCH v3 2/2] testsuite: tree-ssa: Fix i686/-m32 fails for vector-*.c tests

2024-12-20 Thread Christoph Müllner
FAILs have been reported for several tree-ssa vector-*.c tests
on i686-linux or on x86_64-linux with -m32.
This patch addresses these fails by setting the necessary -msse2 flags.

This patch also streamlines all tests to use dg-options instead
of dg-additional-options.  This is in line with most other tests
in gcc.dg/tree-ssa.

Tested with the following board config in RUNTESTFLAGS:
  --target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse}

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/satd-hadamard.c: Rename dg-additional-options
to dg-options.
* gcc.dg/tree-ssa/vector-10.c: Rename dg-additional-options
to dg-options and add -msse2 to it.
* gcc.dg/tree-ssa/vector-11.c: Likewise.
* gcc.dg/tree-ssa/vector-8.c: Rename dg-additional-options
to dg-options.
* gcc.dg/tree-ssa/vector-9.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vector-10.c | 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/vector-11.c | 6 ++
 gcc/testsuite/gcc.dg/tree-ssa/vector-8.c  | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vector-9.c  | 2 +-
 5 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c 
b/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
index 6042378f165..b0d4f4c4a83 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop4-details" } */
+/* { dg-options "-O3 -fdump-tree-forwprop4-details" } */
 /* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c
index bb1ed92dc90..4a095a1b54a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
 
@@ -105,7 +106,7 @@ void f4 (vec *p_v_in_1, vec *p_v_out_1, vec *p_v_out_2)
   v_y = v_1 - v_2;
   v_out_1 = __builtin_shuffle (v_x, v_y, sel);
 
-  /* Won't merge because of dependency.  */
+  /* Won't blend because of dependency.  */
   v_in_2 = foo (v_out_1);
 
   /* Second vec perm sequence.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
index e4102d318d2..b1f0c74152a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
 
@@ -27,9 +28,6 @@ void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
   v_y = v_1 + v_2;
   v_out_2 = __builtin_shuffle (v_y, v_x, sel);
 
-  /* Won't blend because the narrowed sequence
- utilizes three of the four lanes.  */
-
   *p_v_out_1 = v_out_1;
   *p_v_out_2 = v_out_2;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
index ba9a0187c10..b7e0a529c28 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details" } */
 /* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
index 1aa2ef99c3c..2fb2be6e2a5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details" } */
 /* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
-- 
2.47.1



Re: [PATCH 1/2] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
On Fri, Dec 20, 2024 at 4:51 PM Christoph Müllner
 wrote:
>
> On Fri, Dec 20, 2024 at 4:27 PM Jakub Jelinek  wrote:
> >
> > On Fri, Dec 20, 2024 at 04:22:19PM +0100, Christoph Müllner wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> > > @@ -0,0 +1,37 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
> > > +/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
> >
> > This line should be dg-additional-options, otherwise the last one overrides
> > the previous one.
>
> Oh, sorry for that.
> I wonder why this did not trigger FAILs with
>   --target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse}

I've identified the reason for this:
The overwrite removed fdump-tree-forwprop1-details, so no dump file
was generated leading to an UNRESOLVED and not a FAIL.


>
> >
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } */
> > > +/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
> >
> > Ditto.
> >
> > Jakub
> >


[PATCH v3 1/2] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
This patch adds two test cases for PR118149.
These tests are different than other tests in that one of the
vec-perm selectors contains indices in descending order (1, 1, 0, 0),
which is the root cause for the ICE observed in PR118149.

PR 118149

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr118149-2.c: New test.
* gcc.dg/tree-ssa/pr118149.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c | 37 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c   | 20 
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
new file mode 100644
index 000..31f3d7e0dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+
+typedef int vec __attribute__((vector_size (4 * sizeof (float;
+
+void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
+{
+  vec sel00 = { 1, 1, 3, 3 };
+  vec sel01 = { 0, 0, 2, 2 };
+  vec sel10 = { 3, 3, 2, 2 };
+  vec sel11 = { 1, 1, 0, 0 };
+  vec sel = { 0, 1, 6, 7 };
+  vec v_1, v_2, v_x, v_y, v_out_1, v_out_2;
+  vec v_in = *p_v_in;
+
+  /* First vec perm sequence.  */
+  v_1 = __builtin_shuffle (v_in, v_in, sel00);
+  v_2 = __builtin_shuffle (v_in, v_in, sel01);
+  v_x = v_2 - v_1;
+  v_y = v_1 + v_2;
+  v_out_1 = __builtin_shuffle (v_y, v_x, sel);
+
+  /* Second vec perm sequence.  */
+  v_1 = __builtin_shuffle (v_in, v_in, sel10);
+  v_2 = __builtin_shuffle (v_in, v_in, sel11);
+  v_x = v_2 - v_1;
+  v_y = v_1 + v_2;
+  v_out_2 = __builtin_shuffle (v_y, v_x, sel);
+
+  *p_v_out_1 = v_out_1;
+  *p_v_out_2 = v_out_2;
+}
+
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
new file mode 100644
index 000..f471877f661
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+
+float *fastconv_parse_dst;
+
+void fastconv_parse ()
+{
+  float r3k = fastconv_parse_dst[1] - fastconv_parse_dst[3],
+i0k = fastconv_parse_dst[4] + fastconv_parse_dst[6],
+i1k = fastconv_parse_dst[4] - fastconv_parse_dst[6],
+i2k = fastconv_parse_dst[5] + fastconv_parse_dst[7];
+  fastconv_parse_dst[1] = fastconv_parse_dst[0];
+  fastconv_parse_dst[4] = fastconv_parse_dst[5] = i0k - i2k;
+  fastconv_parse_dst[6] = fastconv_parse_dst[7] = i1k + r3k;
+}
+
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop4" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop4" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop4" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
-- 
2.47.1



Re: [PATCH] Add RISC-V/rv64gc as a secondary platform

2024-12-20 Thread Palmer Dabbelt

On Thu, 19 Dec 2024 15:01:26 PST (-0800), jeffreya...@gmail.com wrote:



On 12/19/24 3:08 PM, Palmer Dabbelt wrote:


I agree lacking B and V makes us very clearly uncompetitive in the space
where these sort of things matter (ie, binary compatible distros and
long term stability type things) -- the gap is just too big to close by
doing clever things in the hardware.  Maybe just B and V isn't enough,
it's hard to tell, but lacking them seems pretty clearly uncompetitive.

I'm not sure B is so scary on the SW side of things, it's been mostly
performance issues we've been fixing.  V is huge, though, and we've
generally found a bunch of V-related functional codegen bugs.  Without
reliable hardware to test against (and do distro builds and such) it
just seems premature to declare that being as stable as the other ports
on the list.

It wouldn't take much to push me into agreeing to B -- it's not scary in
any way.  There's just notable systems out there that don't implement B,
but I wouldn't mind leaving them behind for this change.

V has real performance concerns.  I haven't tested it performance-wise
on the BPI recently, but when I last did the general rule of thumb was
the more vector you did, the worse it performed *especially* for FP.


Yep.  TBH that's actually my biggest worry here: it might be that just 
V isn't enough, which means we have another few generations of HW before 
all the V add-on extensions coalesce into something that's widely 
implemented.  I'm not really sure there, though -- the HW guys can be 
pretty clever so maybe V gets us close enough to be interesting.




jeff


Re: [2nd PING, Fortran, Patch, PR107635, Part 1] Rework handling of allocatable components in derived type coarrays.

2024-12-20 Thread Jerry Delisle
I got as far as applying and regression testing version 2. I have queried
around for others to look over the patch. I will scan this one.

Since you are the expert in this area, we may just have you push and let
the pieces fall out.

I will get back to you later today.

Jerry

On Fri, Dec 20, 2024, 4:37 AM Andre Vehreschild  wrote:

> Hi all,
>
> pinging again.
>
> Please note, that attached patch is a slightly updated version.
>
> Regtests ok on x86_64-pc-linux-gnu. Ok for mainline?
>
> Regards,
> Andre
>
> On Mon, 16 Dec 2024 10:58:22 +0100
> Andre Vehreschild  wrote:
>
> > PING!
> >
> > On Fri, 6 Dec 2024 19:10:08 +0100
> > Andre Vehreschild  wrote:
> >
> > > Hi all,
> > >
> > > I had to dive deeply into the issue with handling allocatable
> components in
> > > derived types and to find a future proof solution. I hope do have
> found a
> > > universal and flexible one now:
> > >
> > > For each allocatable (or pointer) component in a derived type a coarray
> > > token is required. While this is quite easy for the current compilation
> > > unit, it is very difficult to do for arbitrary compilation units or
> when
> > > one gets libraries that are not aware of coarray's needs. The approach
> this
> > > patch now implements is to delegate the evaluation of a reference into
> a
> > > coarray into a separate access routine. This routine is present on
> every
> > > image, because it is generated by the compiler at the time it knows
> that a
> > > coarray access is done. With external compilation units or libraries
> this
> > > solves the access issue, because each image knows how to access its own
> > > objects, but does not need a coarray token for allocatable (or pointer)
> > > components anymore. The access on the remote image's object is done by
> the
> > > remote image itself (for the MPI implementation in a separate thread).
> > > Therefore it knows about the bounds of arrays, allocation and
> association
> > > state of components and can handle those.
> > >
> > > Furthermore is this approach faster, because it has O(1) complexity
> > > regarding the communication. The old approach was O(N) where N is the
> > > number of allocatable/pointer components + array descriptors on the
> path of
> > > the access. The new approach sends a set of parameters to the remote
> image
> > > and gets the desired data in return.
> > >
> > > At the moment the patch handles only getting of data from a remote
> image. It
> > > is split into two patchsets. The first one does some preparatory clean
> up,
> > > like stopping to add caf_get calls into the expression tree and
> removing
> > > them afterwards again, where they are in the way.
> > >
> > > The second patch is then doing the access routine creation.
> Unfortunately is
> > > this the longer patch. I have also updated the documentation of the caf
> > > API. I hope to not have overlooked something.
> > >
> > > This is the first part of a series to rework all coarray access
> routines to
> > > use the new approach and then remove the deprecated calls. This makes
> things
> > > clearer and easier to maintain, although the tree-dump now presents
> some
> > > more generated routines, which might look odd.
> > >
> > > Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and
> 41. Ok
> > > for mainline?
> > >
> > > I will continue working on the coarray stuff and fix upcoming bugs in
> the
> > > future.
> > >
> > > Regards,
> > > Andre
> > > --
> > > Andre Vehreschild * Email: vehre ad gmx dot de
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [ping][patch] Allow target to chose address-space for artificial rodata lookup tables.

2024-12-20 Thread Jeff Law




On 12/19/24 11:36 PM, Georg-Johann Lay wrote:

This is a ping for

https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671216.html
I'd prefer this to wait for gcc-16 as we're well into stage3.  While 
using a hook makes it fairly safe, I think everyone is quite busy with 
stage3 activities.


jeff



Re: [PATCH v2] libstdc++: add initializer_list constructor to std::span (P2447)

2024-12-20 Thread Jonathan Wakely
On Fri, 20 Dec 2024 at 14:21, Giuseppe D'Angelo
 wrote:
>
> Hi,
>
> On 20/12/2024 13:37, Jonathan Wakely wrote:
> >
> > With this, I get seven PASS results for this test.
>
> Thank you; I was just thinking, would it be simpler to split the test in
> two instead rather than making it more convoluted than necessary? Let me
> know and I'll amend it, in either direction.


I did think about that, but I think it's OK in a single test. That
will also run a tiny bit quicker, as we won't have two tests which are
skipped for most targets, we'll just have one which is only skipped
for C++17 down.

And I don't expect this test to need to change in future, so it
shouldn't get any more convoluted than it is now, and won't be a
maintenance headache.

So I think I'll just finish testing it like that, as one test, and push that.



Re: [PATCH] Add RISC-V/rv64gc as a secondary platform

2024-12-20 Thread Richard Biener



> Am 20.12.2024 um 17:49 schrieb Palmer Dabbelt :
> 
> On Thu, 19 Dec 2024 15:01:26 PST (-0800), jeffreya...@gmail.com wrote:
>> 
>> 
>>> On 12/19/24 3:08 PM, Palmer Dabbelt wrote:
>>> 
>>> I agree lacking B and V makes us very clearly uncompetitive in the space
>>> where these sort of things matter (ie, binary compatible distros and
>>> long term stability type things) -- the gap is just too big to close by
>>> doing clever things in the hardware.  Maybe just B and V isn't enough,
>>> it's hard to tell, but lacking them seems pretty clearly uncompetitive.
>>> 
>>> I'm not sure B is so scary on the SW side of things, it's been mostly
>>> performance issues we've been fixing.  V is huge, though, and we've
>>> generally found a bunch of V-related functional codegen bugs.  Without
>>> reliable hardware to test against (and do distro builds and such) it
>>> just seems premature to declare that being as stable as the other ports
>>> on the list.
>> It wouldn't take much to push me into agreeing to B -- it's not scary in
>> any way.  There's just notable systems out there that don't implement B,
>> but I wouldn't mind leaving them behind for this change.
>> 
>> V has real performance concerns.  I haven't tested it performance-wise
>> on the BPI recently, but when I last did the general rule of thumb was
>> the more vector you did, the worse it performed *especially* for FP.
> 
> Yep.  TBH that's actually my biggest worry here: it might be that just V 
> isn't enough, which means we have another few generations of HW before all 
> the V add-on extensions coalesce into something that's widely implemented.  
> I'm not really sure there, though -- the HW guys can be pretty clever so 
> maybe V gets us close enough to be interesting.

Note we usually do Not restrict ports to a subset of the ISA, iff extensions 
are not ready for prime time Id suggest to instead never enable those by 
default.

Note we might want to start documenting primary/secondary _hosts_ vs targets as 
well.  For a native compiler that might imply a select default ISA.

Richard 

>> 
>> jeff


Re: [2nd PING, Fortran, Patch, PR107635, Part 1] Rework handling of allocatable components in derived type coarrays.

2024-12-20 Thread Andre Vehreschild

Thank you very much Jerry.

The delta between the two patches is really minor. In resolve.c I have
removed some attr.pointer setting and in caf/single.c the handling for
those as well as treating non-descriptor arrays differently.

Most changes in the diff of v2 to v3 are about lines having moved due to
other patches.

And I will be around for fixing occurring issues. Btw, in the OpenCoarrays
GitHub repo the branch for 774 implements this new way of accessing data
for mpi.

Thanks again for looking and regards,
Andre

Andre Vehreschild


[PATCH] libstdc++: fix a dangling reference crash in ranges::is_permutation

2024-12-20 Thread Giuseppe D'Angelo

Hi,

The implementation of ranges::is_permutation may create a dangling 
reference, which then results (sometimes) in a crash. A minimal example 
that shows the problem under ASAN is https://gcc.godbolt.org/z/7bP9nE8fK


The attached patch fixes it. I've tested on Linux x86-64. Adding a 
negative test for this is somehow challenging (how do you test you're 
not using a dangling reference?), but running the modified test under 
ASAN shows the fix in place.


Do you need me to create a report on bugzilla and cross-reference it 
from the patch?


Thanks,
--
Giuseppe D'Angelo
From 0433b4f2678d00c3699c8375c7d277ca6d7e45cb Mon Sep 17 00:00:00 2001
From: Giuseppe D'Angelo 
Date: Fri, 20 Dec 2024 12:55:01 +0100
Subject: [PATCH] libstdc++: fix a dangling reference crash in
 ranges::is_permutation

The code was caching the result of `invoke(proj, *it)` in a local
`auto &&` variable. The problem is that this may create dangling
references, for instance in case `proj` is `std::identity` (the common
case) and `*it` produces a prvalue.

Rather than finding a "creative" solution (e.g. use the value category
of `*it`, and the return type of calling the projection on that, to
determine whether we can keep a reference or we need a value), get rid
of the caching and call `invoke` as needed. In the common case the
projection is cheap, and we are allowed to dereference the same iterator
more than once (they're forward). This also sounds more correct because
we pass the correct value category (obtained from applying the
projection to the iterator) to the comparator.

libstdc++-v3/ChangeLog:

	* include/bits/ranges_algo.h: Do not cache the projection result
	in a local variable, as that may create dangling references.
	* testsuite/25_algorithms/is_permutation/constrained.cc: Add a
	test with a range returning prvalues.

Signed-off-by: Giuseppe D'Angelo 
---
 libstdc++-v3/include/bits/ranges_algo.h|  3 +--
 .../25_algorithms/is_permutation/constrained.cc| 10 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
index 772bf4dd997..4d3c4325e2c 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -567,9 +567,8 @@ namespace ranges
 
 	for (auto __scan = __first1; __scan != __last1; ++__scan)
 	  {
-	auto&& __proj_scan = std::__invoke(__proj1, *__scan);
 	auto __comp_scan = [&]  (_Tp&& __arg) -> bool {
-	  return std::__invoke(__pred, __proj_scan,
+	  return std::__invoke(__pred, std::__invoke(__proj1, *__scan),
    std::forward<_Tp>(__arg));
 	};
 	if (__scan != ranges::find_if(__first1, __scan,
diff --git a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
index 2fbebe37609..c24ff4312e2 100644
--- a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
@@ -19,6 +19,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -76,6 +77,15 @@ test03()
   while (std::next_permutation(std::begin(cx), std::end(cx)));
 }
 
+void
+test04()
+{
+  int x[] = { 4, 3, 2, 1 };
+  auto y = std::views::iota(1, 5);
+  VERIFY( ranges::is_permutation(x, y) );
+  VERIFY( ranges::is_permutation(y, x) );
+}
+
 int
 main()
 {
-- 
2.34.1



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [2nd PING, Fortran, Patch, PR107635, Part 1] Rework handling of allocatable components in derived type coarrays.

2024-12-20 Thread Andre Vehreschild
Hi all,

pinging again.

Please note, that attached patch is a slightly updated version.

Regtests ok on x86_64-pc-linux-gnu. Ok for mainline?

Regards,
Andre

On Mon, 16 Dec 2024 10:58:22 +0100
Andre Vehreschild  wrote:

> PING!
>
> On Fri, 6 Dec 2024 19:10:08 +0100
> Andre Vehreschild  wrote:
>
> > Hi all,
> >
> > I had to dive deeply into the issue with handling allocatable components in
> > derived types and to find a future proof solution. I hope do have found a
> > universal and flexible one now:
> >
> > For each allocatable (or pointer) component in a derived type a coarray
> > token is required. While this is quite easy for the current compilation
> > unit, it is very difficult to do for arbitrary compilation units or when
> > one gets libraries that are not aware of coarray's needs. The approach this
> > patch now implements is to delegate the evaluation of a reference into a
> > coarray into a separate access routine. This routine is present on every
> > image, because it is generated by the compiler at the time it knows that a
> > coarray access is done. With external compilation units or libraries this
> > solves the access issue, because each image knows how to access its own
> > objects, but does not need a coarray token for allocatable (or pointer)
> > components anymore. The access on the remote image's object is done by the
> > remote image itself (for the MPI implementation in a separate thread).
> > Therefore it knows about the bounds of arrays, allocation and association
> > state of components and can handle those.
> >
> > Furthermore is this approach faster, because it has O(1) complexity
> > regarding the communication. The old approach was O(N) where N is the
> > number of allocatable/pointer components + array descriptors on the path of
> > the access. The new approach sends a set of parameters to the remote image
> > and gets the desired data in return.
> >
> > At the moment the patch handles only getting of data from a remote image. It
> > is split into two patchsets. The first one does some preparatory clean up,
> > like stopping to add caf_get calls into the expression tree and removing
> > them afterwards again, where they are in the way.
> >
> > The second patch is then doing the access routine creation. Unfortunately is
> > this the longer patch. I have also updated the documentation of the caf
> > API. I hope to not have overlooked something.
> >
> > This is the first part of a series to rework all coarray access routines to
> > use the new approach and then remove the deprecated calls. This makes things
> > clearer and easier to maintain, although the tree-dump now presents some
> > more generated routines, which might look odd.
> >
> > Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41. Ok
> > for mainline?
> >
> > I will continue working on the coarray stuff and fix upcoming bugs in the
> > future.
> >
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
From d8a31b2ebc078a86ac8dee78e7865e98e003b9b7 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 6 Dec 2024 08:57:34 +0100
Subject: [PATCH 2/2] Fortran: Replace getting of coarray data with
 accessor-based version. [PR107635]

Getting coarray data from remote images was slow, inefficient and did
not work for object files that where not compiled with coarray support
for derived types with allocatable/pointer components.  The old approach
emulated accessing data through a whole structure ref, which was error
prone for corner cases.  Furthermore was did it have a runtime
complexity of O(N), where N is the number of allocatable/pointer
components and descriptors involved.  Each of those needed communication
twice.  The new approach creates a routine for each access into a
coarray object putting all required operations there.  Looking a
tree-dump one will see those small routines.  But this time it is just
compiled fortran with all the knowledge of the compiler of bounds and so
on.  New paradigms will be available out of the box.  Furthermore is the
complexity of the communication reduced to be O(1).  E.g. the mpi
implementation sends one message for the parameters of the access and
one message back with the results without caring about the number of
allocatable/pointer/descriptor components in the access.

Identification of access routines is done be adding them to a hash map,
where the hash is the same on all images.  Translating the hash to an
index, which is the same on all images again, allows for fast calls of
the access routines.  Resolving the hash to an index is cached at
runtime, preventing additional hash map lookups.  A hashmap was use
because not all processor OS combinations may use the same address for
the access routine.

gcc/fortran/ChangeLog:

	PR fortran/107635

	* gfortran.h (gfc_add_caf_accessor): New function.
	* gfortran.texi

Re: [PATCH v2] libstdc++: add initializer_list constructor to std::span (P2447)

2024-12-20 Thread Jonathan Wakely

On 19/12/24 14:49 +0100, Giuseppe D'Angelo wrote:

diff --git a/libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc 
b/libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc
new file mode 100644
index 000..ef43541b769
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc
@@ -0,0 +1,31 @@
+// { dg-do run { target c++20 } }
+
+#include 
+#include 
+#include 
+
+#include 
+
+// Examples from P2447R4
+void one(std::pair) {}
+void one(std::span) {}
+void two(std::span) {}
+constexpr std::size_t three(std::span v) { return v.size(); }
+constexpr std::size_t four(std::span v) { return v.size(); }
+
+int main()
+{
+  one({1, 2}); // { dg-error "call of overloaded" "should be ambiguous with the 
one(std::pair) overload" { target c++26 } }
+  two({{1, 2}}); // { dg-error "would use explicit constructor" "should prefer the 
initializer_list constructor, which is explicit" { target c++26 } }
+
+  void *array3[10];
+  std::any array4[10];
+
+#if __cpp_lib_span_initializer_list
+  static_assert(three({array3, 0}) == 2);
+  VERIFY(four({array4, array4 + 10}) == 2);


This runtime VERIFY assertion doesn't make sense because the file
doesn't compile at all in C++26 mode, it fails due to the errors
above.


+#else
+  static_assert(three({array3, 0}) == 0);
+  VERIFY(four({array4, array4 + 10}) == 10);
+#endif
+}
--
2.34.1



I'm seeing failures for this new test when compiled as C++26, e.g.
when using

GLIBCXX_TESTSUITE_STDS=20,23,26 make check 
RUNTESTFLAGS="conformance.exp=23_containers/span/init_list_cons_neg.cc -a"

The results are:

PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++20 (test for excess 
errors)
PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++20 execution test
PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++23 (test for excess 
errors)
PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++23 execution test
PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++26 should be 
ambiguous with the one(std::pair) overload (test for errors, line 18)
PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++26 should prefer the 
initializer_list constructor, which is explicit (test for errors, line 19)
PASS: 23_containers/span/init_list_cons_neg.cc  -std=gnu++26 (test for excess 
errors)
UNRESOLVED: 23_containers/span/init_list_cons_neg.cc  -std=gnu++26 compilation 
failed to produce executable

The problem is that { dg-do run ... } means that dejagnu expects the
file to compile and run, but it fails to compile for C++26 so cannot
be run.

I think what's needed is:

// { dg-do run { target { c++20 && c++23_down } } }
// { dg-do compile { target c++26 } }

This makes it a compile+link+run test for C++20/23 but only a compile
test for C++26, so it won't complain about it being UNRESOLVED for the
run step.


To solve the problem of the VERIFY not being checked for C++26, let's
repalce std::any with a constexpr-compatible Any type and move the
arrays to namespace scope, like so:

// { dg-do run { target { c++20 && c++23_down } } }
// { dg-do compile { target c++26 } }

#include 
#include 

#include 

struct Any {
  constexpr Any() { }
  template constexpr Any(T) { }
};

// Examples from P2447R4
void one(std::pair) {}
void one(std::span) {}
void two(std::span) {}
constexpr std::size_t three(std::span v) { return v.size(); }
constexpr std::size_t four(std::span v) { return v.size(); }

void *array3[10];
Any array4[10];

int main()
{
  one({1, 2}); // { dg-error "call of overloaded" "should be ambiguous with the 
one(std::pair) overload" { target c++26 } }
  two({{1, 2}}); // { dg-error "would use explicit constructor" "should prefer the 
initializer_list constructor, which is explicit" { target c++26 } }

#if __cpp_lib_span_initializer_list
  static_assert(three({array3, 0}) == 2);
  static_assert(four({array4, array4 + 10}) == 2);
#else
  static_assert(three({array3, 0}) == 0);
  static_assert(four({array4, array4 + 10}) == 10);
#endif
}

With this, I get seven PASS results for this test.




Re: [PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-12-20 Thread Robin Dapp
> "Robin Dapp"  writes:
>> But here I intended to just change the encoded value in memory and to
>> prevent it from aliasing with other encoded values.
>> In an actual instruction the values we "padded" (or that are beyond the
>> vector length) are ignored.  For the purpose of hashing a mask of
>> "1010" and "10101010" should be distinct which they are not right now
>> because we don't stop at the length so to say.
>
> But what modes do "1010" and "10101010" represent here?  And which code
> treats them as equal?  (It should be ok if the hashes are equal, as long
> as the equality function distinguishes them.)

I realize I didn't fully detail that in the original post, sorry :/

In my test case we have two functions one with the first mask ("1010" for
V4BI) and another one performing a similar operation just for
V8BI ("10101010").

In optimize_constant_pool we use native_encode_rtx to encode the value
into a buffer (of size GET_MODE_SIZE which is one byte here).
IIRC the whole function works at byte granularity.

htab->find_slot_with_hash
then finds the same hash for the V4BI and V8BI masks, and we wrongly unify it.

-- 
Regards
 Robin



[PUSHED] c++: regenerate opt urls

2024-12-20 Thread Nathaniel Shead
Pushed as obvious.

-- >8 --

This should have been part of r15-6379-g0c2ae384326108.

gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

Signed-off-by: Nathaniel Shead 
---
 gcc/c-family/c.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index 421cc08e2c7..fd7ffd38d53 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -870,6 +870,9 @@ 
UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wno-template-body)
 Wtemplate-id-cdtor
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wno-template-id-cdtor)
 
+Wtemplate-names-tu-local
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wno-template-names-tu-local)
+
 Wterminate
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wno-terminate)
 
-- 
2.47.0



Re: [PATCH v2] libstdc++: add initializer_list constructor to std::span (P2447)

2024-12-20 Thread Jonathan Wakely
On Thu, 19 Dec 2024 at 13:49, Giuseppe D'Angelo
 wrote:
>
> Hello,
>
> On 19/12/2024 13:27, Jonathan Wakely wrote:
> > I was about to push this and realised it's missing a Signed-off-by
> > tag. I assume you meant to contribute this under the DCO terms, as
> > with your previous patches?
>
> Yes, of course; sorry for forgetting the line. Here's a signed patch.

Great, thanks for the prompt reply! I'll test and get it pushed.



[PATCH] c++: handle decltype in nested-name-spec printing [PR118139]

2024-12-20 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Compiling this test, we emit:

  error: 'static void CW::operator=(int) requires 
requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type 
x) {x;}' must be a non-static member function

where the DECLTYPE_TYPE isn't printed properly.  This patch fixes that
to print:

error: 'static void CW::operator=(int) requires requires(typename 
decltype(T())::type x) {x;}' must be a non-static member function

PR c++/118139

gcc/cp/ChangeLog:

* cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
a computed-type-specifier.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/decltype1.C: New test.
---
 gcc/cp/cxx-pretty-print.cc  | 14 +++---
 gcc/testsuite/g++.dg/diagnostic/decltype1.C |  8 
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/decltype1.C

diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc
index 4104c6863c7..caa74e3b86f 100644
--- a/gcc/cp/cxx-pretty-print.cc
+++ b/gcc/cp/cxx-pretty-print.cc
@@ -234,8 +234,12 @@ pp_cxx_template_keyword_if_needed (cxx_pretty_printer *pp, 
tree scope, tree t)
 }
 
 /* nested-name-specifier:
-  class-or-namespace-name :: nested-name-specifier(opt)
-  class-or-namespace-name :: template nested-name-specifier   */
+  ::
+  type-name ::
+  namespace-name ::
+  computed-type-specifier ::
+  nested-name-specifier identifier ::
+  nested-name-specifier templateopt simple-template-id ::  */
 
 static void
 pp_cxx_nested_name_specifier (cxx_pretty_printer *pp, tree t)
@@ -252,7 +256,11 @@ pp_cxx_nested_name_specifier (cxx_pretty_printer *pp, tree 
t)
   tree scope = get_containing_scope (t);
   pp_cxx_nested_name_specifier (pp, scope);
   pp_cxx_template_keyword_if_needed (pp, scope, t);
-  pp_cxx_unqualified_id (pp, t);
+  /* This is a computed-type-specifier.  */
+  if (TREE_CODE (t) == PACK_INDEX_TYPE || TREE_CODE (t) == DECLTYPE_TYPE)
+   pp->type_id (t);
+  else
+   pp_cxx_unqualified_id (pp, t);
   pp_cxx_colon_colon (pp);
 }
 }
diff --git a/gcc/testsuite/g++.dg/diagnostic/decltype1.C 
b/gcc/testsuite/g++.dg/diagnostic/decltype1.C
new file mode 100644
index 000..8b57b165603
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/decltype1.C
@@ -0,0 +1,8 @@
+// PR c++/118139
+// { dg-do compile { target c++20 } }
+
+template
+struct CW {
+  using V = typename decltype(T())::type;
+  static void operator=(int) requires requires(V x) { x; } {} // { dg-error 
{requires requires\(typename decltype\(T\(\)\)::type x\)} }
+};

base-commit: 219ddae16f9d724baeff86934f8981aa5ef7b95f
-- 
2.47.1



Re: [PATCH] Fortran: potential aliasing of complex pointer inquiry references [PR118120]

2024-12-20 Thread Harald Anlauf

Am 20.12.24 um 00:46 schrieb Jerry D:

On 12/19/24 1:34 PM, Harald Anlauf wrote:

Dear all,

the check for potential aliasing of lhs and rhs currently shortcuts
if the types differ.  This is a problem if one is of type complex
and the other is of type real (and of the same kind parameter value),
as this ignores that F2008 inquiry references (%RE, %IM) could be
involved.  The attached patch just addresses this shortcut.

This may not be a complete solution, see discussion in the PR,
but is a lightweight solution (for the time being).

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald


I agree with Steve, OK. The inquiry references can be dealt with later.


Steve, Jerry,

I've slightly extended the testcase to ensure that PR113928 is
covered, and pushed as r15-6395.

Thanks for the review!

Harald


Regards,

Jerry





Re: [PATCH] Add RISC-V/rv64gc as a secondary platform

2024-12-20 Thread Palmer Dabbelt

On Fri, 20 Dec 2024 09:09:02 PST (-0800), richard.guent...@gmail.com wrote:




Am 20.12.2024 um 17:49 schrieb Palmer Dabbelt :

On Thu, 19 Dec 2024 15:01:26 PST (-0800), jeffreya...@gmail.com wrote:




On 12/19/24 3:08 PM, Palmer Dabbelt wrote:

I agree lacking B and V makes us very clearly uncompetitive in the space
where these sort of things matter (ie, binary compatible distros and
long term stability type things) -- the gap is just too big to close by
doing clever things in the hardware.  Maybe just B and V isn't enough,
it's hard to tell, but lacking them seems pretty clearly uncompetitive.

I'm not sure B is so scary on the SW side of things, it's been mostly
performance issues we've been fixing.  V is huge, though, and we've
generally found a bunch of V-related functional codegen bugs.  Without
reliable hardware to test against (and do distro builds and such) it
just seems premature to declare that being as stable as the other ports
on the list.

It wouldn't take much to push me into agreeing to B -- it's not scary in
any way.  There's just notable systems out there that don't implement B,
but I wouldn't mind leaving them behind for this change.

V has real performance concerns.  I haven't tested it performance-wise
on the BPI recently, but when I last did the general rule of thumb was
the more vector you did, the worse it performed *especially* for FP.


Yep.  TBH that's actually my biggest worry here: it might be that just V isn't 
enough, which means we have another few generations of HW before all the V 
add-on extensions coalesce into something that's widely implemented.  I'm not 
really sure there, though -- the HW guys can be pretty clever so maybe V gets 
us close enough to be interesting.


Note we usually do Not restrict ports to a subset of the ISA, iff extensions 
are not ready for prime time Id suggest to instead never enable those by 
default.


Ya, I'd noticed that when writing it ;)

Right now we don't have any sort of autoconf-time 
"--enable-experimental-extensions" type thing, so they're just all in 
the mix and controlled by -march.  We don't generate code for them by 
default or anything, but users can enable them.


So as a result we have a few tiers of how good stuff is in the compiler.  
I don't think we've sat down and though up anything formal, as right now 
they're just all in there together, but I'd line them up as:


* rv64gc.  This is pretty good: full distro builds, hardware, etc.  
 There's still bugs, but they're not earth-shattering.
* Things like rv32, soft-float, B.  These all pass the smell test, but 
 we don't have the big breadth of examples where they work because 
 there's no hardware and thus no distro builds.  So maybe they're fine, 
 or maybe we're just being optimistic.
* Things like V, which don't pass the smell test (like we've been 
 talking about in the thread).
* The vendor extensions.  I think most of these will remain low quality 
 forever.


This sort of thing has come up a few times in different contexts, and 
I've generally come to the conclusion that trying to support all this 
stuff that's not actually implemented in hardware really just results in 
us shooting ourselves in the feet.  We've had this acceptance creep over 
the years (first we required hardware, then just a spec, and now vendor 
stuff) and it's just slowly built up hard to support code.


The problem is that doing anything else basically involves telling 
everyone to go fork the software world, which I think is worse.



Note we might want to start documenting primary/secondary _hosts_ vs targets as 
well.  For a native compiler that might imply a select default ISA.

Richard 



jeff


[PING] [PATCH, part4] Fortran: fix passing of NULL() to assumed-rank, derived type dummy [PR104819]

2024-12-20 Thread Harald Anlauf

Ping!

Am 14.12.24 um 20:56 schrieb Harald Anlauf:

Dear all,

this patch is the 4th part of a series on the passing of NULL() to
assumed-rank dummies.  This one handles the case of a derived type
dummy and is mostly straightforward.

There was one particular problem I encountered: passing NULL() to
an allocatable dummy with no intent given.  This lead to an ICE
I could not resolve other than treating this the same as if an
intent(in) were given.  If someone has a better idea, I'd love
to learn about it...

Testcase cross-checked with Intel's ifx.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

P.S.: if someone would like to assist with the case of class
dummies, please let me know.





Re: [PATCH] c++: handle decltype in nested-name-spec printing [PR118139]

2024-12-20 Thread Marek Polacek
On Fri, Dec 20, 2024 at 12:51:02PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> Compiling this test, we emit:
> 
>   error: 'static void CW::operator=(int) requires 
> requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type 
> x) {x;}' must be a non-static member function
> 
> where the DECLTYPE_TYPE isn't printed properly.  This patch fixes that
> to print:
> 
> error: 'static void CW::operator=(int) requires requires(typename 
> decltype(T())::type x) {x;}' must be a non-static member function
> 
>   PR c++/118139
> 
> gcc/cp/ChangeLog:
> 
>   * cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
>   a computed-type-specifier.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/diagnostic/decltype1.C: New test.
> ---
>  gcc/cp/cxx-pretty-print.cc  | 14 +++---
>  gcc/testsuite/g++.dg/diagnostic/decltype1.C |  8 
>  2 files changed, 19 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/diagnostic/decltype1.C
> 
> diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc
> index 4104c6863c7..caa74e3b86f 100644
> --- a/gcc/cp/cxx-pretty-print.cc
> +++ b/gcc/cp/cxx-pretty-print.cc
> @@ -234,8 +234,12 @@ pp_cxx_template_keyword_if_needed (cxx_pretty_printer 
> *pp, tree scope, tree t)
>  }
>  
>  /* nested-name-specifier:
> -  class-or-namespace-name :: nested-name-specifier(opt)
> -  class-or-namespace-name :: template nested-name-specifier   */
> +  ::
> +  type-name ::
> +  namespace-name ::
> +  computed-type-specifier ::
> +  nested-name-specifier identifier ::
> +  nested-name-specifier templateopt simple-template-id ::  */

Consider this line fixed to say:

  nested-name-specifier template(opt) simple-template-id ::

(missing parens).

Marek



[PATCH] c++: ICE with pack indexing and partial inst [PR117937]

2024-12-20 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we ICE in expand_expr_real_1:

  if (exp)
{
  tree context = decl_function_context (exp);
  gcc_assert (SCOPE_FILE_SCOPE_P (context)
  || context == current_function_decl

on something like this test:

  void
  f (auto... args)
  {
[&](seq) {
g(args...[i]...);
}(seq<0>());
  }

because while current_function_decl is:

  f(int)::)> [with long unsigned int ...i = {0}]

(correct), context is:

  f(int)::)>

which is only the partial instantiation.

I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it.  The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator().  So we should let retrieve_local_specialization
find the right args#0.

PR c++/117937

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
PACK_EXPANSION_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.
---
 gcc/cp/pt.cc |  8 +++
 gcc/testsuite/g++.dg/cpp26/pack-indexing13.C | 23 
 gcc/testsuite/g++.dg/cpp26/pack-indexing14.C | 18 +++
 3 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp26/pack-indexing13.C
 create mode 100644 gcc/testsuite/g++.dg/cpp26/pack-indexing14.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 7fa286698ef..c40b147e837 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14063,6 +14063,14 @@ tsubst_pack_index (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   tree pack = PACK_INDEX_PACK (t);
   if (PACK_EXPANSION_P (pack))
 pack = tsubst_pack_expansion (pack, args, complain, in_decl);
+  else
+{
+  /* PACK can be {*args#0} whose args#0's value-expr refers to
+a partially instantiated closure.  Let tsubst find the
+fully-instantiated one.  */
+  gcc_assert (TREE_CODE (pack) == TREE_VEC);
+  pack = tsubst (pack, args, complain, in_decl);
+}
   if (TREE_CODE (pack) == TREE_VEC && TREE_VEC_LENGTH (pack) == 0)
 {
   if (complain & tf_error)
diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing13.C 
b/gcc/testsuite/g++.dg/cpp26/pack-indexing13.C
new file mode 100644
index 000..e0dd9c21c67
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing13.C
@@ -0,0 +1,23 @@
+// PR c++/117937
+// { dg-do compile { target c++26 } }
+
+using size_t = decltype(sizeof(0));
+
+template
+struct seq {};
+
+void g(auto...) {}
+
+void
+f (auto... args)
+{
+  [&](seq) {
+  g(args...[i]...);
+  }(seq<0>());
+}
+
+int
+main ()
+{
+  f(0);
+}
diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing14.C 
b/gcc/testsuite/g++.dg/cpp26/pack-indexing14.C
new file mode 100644
index 000..c8a67ee16ed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing14.C
@@ -0,0 +1,18 @@
+// PR c++/117937
+// { dg-do compile { target c++26 } }
+
+void operate_one(const int) {}
+
+template
+void operate_multi(T... args)
+{
+[&]()
+{
+   ::operate_one(args...[idx]);
+}.template operator()<0>();
+}
+
+int main()
+{
+::operate_multi(0);
+}

base-commit: 219ddae16f9d724baeff86934f8981aa5ef7b95f
-- 
2.47.1



Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-12-20 Thread Dmitry Ilvokhin
On Wed, Oct 02, 2024 at 08:15:38PM +0100, Jonathan Wakely wrote:
> On Wed, 2 Oct 2024 at 19:25, Jonathan Wakely  wrote:
> >
> > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> > >
> > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > > >
> > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > >
> > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > >   loop using switch statement.
> > > >
> > > > Signed-off-by: Dmitry Ilvokhin 
> > > > ---
> > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > index 3665375096a..294a7323dd0 100644
> > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > @@ -50,10 +50,29 @@ namespace
> > > >load_bytes(const char* p, int n)
> > > >{
> > > >  std::size_t result = 0;
> > > > ---n;
> > > > -do
> > > > -  result = (result << 8) + static_cast(p[n]);
> > > > -while (--n >= 0);
> > >
> > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > only hash the first 8 bytes.
> >
> > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> 
> It seems to be slower for short strings, but a win overall:
> https://quick-bench.com/q/xhh5m1akZzwUAXRiYJ17z9FASc8
> This measures different lengths, and tries to ensure that the string
> contents aren't treated as constant.
>

I had a little bit time to play with this benchmark.

First off, there is no need to read first byte in the switch, because we
always know n > 0, this makes generated code look much better: GCC was
able to completely eliminate code for cases we don't actually have. See
difference in generated code for load_bytes_switch_baseline and
load_bytes_switch_load in [1].

Next, I suspected code alignment has a notable impact on the benchmark.
To confirm this, I added additional branch for case n == 1.  It doesn't
change generated code much, but adds two additional instructions: cmp
and je. Conveniently, this is enough to align benchmark to 64 byte
boundary. See benchmark code in [2]: sub (last instruction of the loop)
shifted from 0x407e08 (0x407e08 % 64 == 8) for LoadBytesSwitchLoad to
0x407f40 (0x407f40 % 64 == 0) for LoadBytesSwitchIf.

This makes LoadBytesSwitchIf faster than LoadBytesLoop (current trunk)
for every case. Honestly, I doubt it worth to optimize code alignment
for this particular case, it looks too fragile to me.

Version with load of the first byte out of the switch seems much more
robust, but it still slightly slower than a loop for case n = 1, see
benchmark in [3].

I am happy to submit modified version of the patch if you think
regression for n = 1 case is acceptable.

In anyway, now I have one less unexplained performance mystery on my
TODO list.

[1]: https://godbolt.org/z/vcWYs6o9x
[2]: https://quick-bench.com/q/QocQhela6YQzbsZWQukBRFix8ms
[3]: https://quick-bench.com/q/To93I-w6G1w5GwFz1Hi-XmPN57Y

> >
> >
> > >
> > > > +switch(n & 7)
> > > > +  {
> > > > +  case 7:
> > > > +   result |= std::size_t(p[6]) << 48;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 6:
> > > > +   result |= std::size_t(p[5]) << 40;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 5:
> > > > +   result |= std::size_t(p[4]) << 32;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 4:
> > > > +   result |= std::size_t(p[3]) << 24;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 3:
> > > > +   result |= std::size_t(p[2]) << 16;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 2:
> > > > +   result |= std::size_t(p[1]) << 8;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 1:
> > > > +   result |= std::size_t(p[0]);
> > > > +  };
> > > >  return result;
> > > >}
> > > >
> > > > --
> > > > 2.43.5
> > > >
> 


Re: [PATCH v1] RISC-V: Fix the the operand alignment for strided load/store pattern [NFC]

2024-12-20 Thread 钟居哲
ok



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-12-20 14:49
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Fix the the operand alignment for strided 
load/store pattern [NFC]
From: Pan Li 
 
Just notice the unalignment operand for strided load/store pattern when
bugfix the strided load/store memory alias, would like to make it align.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: Align the operand for strided
load/store pattern.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2529dc77f22..88c0f00e0ea 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2903,7 +2903,7 @@ (define_expand "v3"
;; == Strided Load/Store
;; =
(define_expand "mask_len_strided_load_"
-  [(match_operand:V_VLS 0 "register_operand")
+  [(match_operand:V_VLS 0 "register_operand")
(match_operand   1 "pmode_reg_or_0_operand")
(match_operand   2 "pmode_reg_or_0_operand")
(match_operand:  3 "vector_mask_operand")
@@ -2919,7 +2919,7 @@ (define_expand "mask_len_strided_load_"
(define_expand "mask_len_strided_store_"
   [(match_operand   0 "pmode_reg_or_0_operand")
(match_operand   1 "pmode_reg_or_0_operand")
-   (match_operand:V_VLS 2 "register_operand")
+   (match_operand:V_VLS 2 "register_operand")
(match_operand:  3 "vector_mask_operand")
(match_operand   4 "autovec_length_operand")
(match_operand   5 "const_0_operand")]
-- 
2.43.0
 
 


Re: [PATCH] forwprop: Fix lane handling for VEC_PERM seqence blending

2024-12-20 Thread Richard Biener



> Am 19.12.2024 um 21:47 schrieb Christoph Müllner 
> :
> 
> In PR117830 a miscompilation of 464.h264ref was reported.
> An analysis showed that wrong code was generated because of
> unsatisfied assuptions in the code.  This patch addresses
> these issues.
> 
> The first assuption was that we can independently analyse the two
> vec-perms at the start of a vec-perm-simplify sequence and use the
> information  later for calculating a final vec-perm selector that
> utilizes less lanes.  However, this information does not help much,
> because for changing the selector entry, we need to ensure that both
> elements of the operand vectors v_1 and v_2 remain equal.
> This is addressed by removing the function get_vect_selector_index_map
> and checking for this equality in the loop where we create the new
> selector.
> 
> The calculation of the selector vector for the blended sequence
> assumed that the indices of the selector vector of the narrowed
> sequences are increasing.  This assuption does not hold in general.
> This was fixed by allowing a wrap-around when searching for an empty
> lane.
> 
> Further, there was an issue in the calculation of the selector vector
> entries for the second sequence.  The code did not consider that the
> lanes of the second sequence could have been moved.
> 
> A relevant property of this patch is, that it introduces a
> couple of nested loops, where the out loop iterates from
> i=0..nelts and the inner loop iterates from j=0..i.
> To avoid performance concerns, a check is introduced that
> ensures nelts won't exceed 4 lanes.
> 
> The added test case is derived from h264ref (the other cases from the
> benchmark have the same structure and don't provide additional coverage).
> 
> Bootstrapped and regression-tested on x86-64 and aarch64.
> Further, tested on CPU 2006 h264ref and CPU 2017 x264.

Ok

Thanks,
Richard 

>PR117830
> 
> gcc/ChangeLog:
> 
>* tree-ssa-forwprop.cc (get_vect_selector_index_map):
>(recognise_vec_perm_simplify_seq):
>(calc_perm_vec_perm_simplify_seqs):
>(process_vec_perm_simplify_seq_list):
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/tree-ssa/vector-11.c: New test.
> 
> Signed-off-by: Christoph Müllner 
> ---
> gcc/testsuite/gcc.dg/tree-ssa/vector-11.c |  38 
> gcc/tree-ssa-forwprop.cc  | 203 +-
> 2 files changed, 162 insertions(+), 79 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
> new file mode 100644
> index 000..e4102d318d2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } 
> */
> +
> +typedef int vec __attribute__((vector_size (4 * sizeof (int;
> +
> +void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
> +{
> +  vec sel00 = { 2, 3, 2, 2 };
> +  vec sel01 = { 1, 0, 1, 1 };
> +  vec sel10 = { 3, 2, 3, 3 };
> +  vec sel11 = { 0, 1, 0, 0 };
> +  vec sel = { 0, 5, 2, 7 };
> +  vec v_1, v_2, v_x, v_y, v_out_1, v_out_2;
> +  vec v_in = *p_v_in;
> +
> +  /* First vec perm sequence.  */
> +  v_1 = __builtin_shuffle (v_in, v_in, sel00);
> +  v_2 = __builtin_shuffle (v_in, v_in, sel01);
> +  v_x = v_2 - v_1;
> +  v_y = v_1 + v_2;
> +  v_out_1 = __builtin_shuffle (v_y, v_x, sel);
> +
> +  /* Second vec perm sequence.  */
> +  v_1 = __builtin_shuffle (v_in, v_in, sel10);
> +  v_2 = __builtin_shuffle (v_in, v_in, sel11);
> +  v_x = v_2 - v_1;
> +  v_y = v_1 + v_2;
> +  v_out_2 = __builtin_shuffle (v_y, v_x, sel);
> +
> +  /* Won't blend because the narrowed sequence
> + utilizes three of the four lanes.  */
> +
> +  *p_v_out_1 = v_out_1;
> +  *p_v_out_2 = v_out_2;
> +}
> +
> +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 2, 6 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index 7cae08f0d79..dae8c2f435b 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -3479,41 +3479,6 @@ fwprop_ssa_val (tree name)
>   return name;
> }
> 
> -/* Get an index map from the provided vector permute selector
> -   and return the number of unique indices.
> -   E.g.: { 1, 3, 1, 3 } -> <0, 1, 0, 1>, 2
> - { 0, 2, 0, 2 } -> <0, 1, 0, 1>, 2
> - { 3, 2, 1, 0 } -> <0, 1, 2, 3>, 4.  */
> -
> -static unsigned int
> -get_vect_selector_index_map (tree sel, vec *index_map)
> -{
> -  gcc_assert (VECTOR_CST_NELTS (sel).is_constant ());
> -  unsigned int nelts = VECTOR_CST_NELTS (sel).to_constant ();
> -  unsigned int n = 0;
> -
> -  for (unsigned int i = 0; i < nelts; i++)
> -{
> -  /* Extract the i-th value from the selector.  */
> -  tree sel_cst_tr

Re: [PATCH] ifcombine field merge: handle masks with sign extensions

2024-12-20 Thread Jakub Jelinek
On Wed, Dec 18, 2024 at 12:59:11AM -0300, Alexandre Oliva wrote:
>   * gcc.dg/field-merge-16.c: New.

Note the test FAILs on i686-linux or on x86_64-linux with -m32.  The IL
is identical before ifcombine, but -m64 to -m32 ifcombine dump shows
-optimizing two comparisons to (_10 & 18446462598732873728) == 0
-optimizing two comparisons to (_10 & 18446462600880357376) == 0
-Merging blocks 2 and 3
-Merging blocks 2 and 4
+optimizing bits or bits test to 576460752303423488 & T != 0
+with temporary T = (unsigned long long) _6 | (unsigned long long) _2
+Merging blocks 4 and 5
and similarly for 2 other functions.

Jakub



Re: [PATCH] Fix compilation error in vmsdbgout_begin_block on VMS targets

2024-12-20 Thread Richard Biener



> Am 20.12.2024 um 04:04 schrieb Mark Harmstone :
> 
> Commit 4ed189854eae ("Add block parameter to begin_block debug hook") changed
> the definition of the begin_block function pointer to add another parameter,
> but I missed a call in vmsdbgout_begin_block.
> 
> Fixes bug #118123.

Ok

> gcc/
>* vmsdbgout.cc (vmsdbgout_begin_block): Fix compilation error.
> ---
> gcc/vmsdbgout.cc | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/vmsdbgout.cc b/gcc/vmsdbgout.cc
> index d9e6a8b7b74..204e5695d39 100644
> --- a/gcc/vmsdbgout.cc
> +++ b/gcc/vmsdbgout.cc
> @@ -1231,10 +1231,10 @@ vmsdbgout_end_epilogue (unsigned int line, const char 
> *file)
> 
> static void
> vmsdbgout_begin_block (unsigned line, unsigned blocknum,
> -   tree block ATTRIBUTE_UNUSED)
> +   tree block)
> {
>   if (write_symbols == VMS_AND_DWARF2_DEBUG)
> -(*dwarf2_debug_hooks.begin_block) (line, blocknum);
> +(*dwarf2_debug_hooks.begin_block) (line, blocknum, block);
> 
>   if (debug_info_level > DINFO_LEVEL_TERSE)
> targetm.asm_out.internal_label (asm_out_file, BLOCK_BEGIN_LABEL, 
> blocknum);
> --
> 2.45.2
> 


Re: [PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-12-20 Thread Robin Dapp
>> How does encoding work for SVE's small mask modes?  I suppose
>>
>>   unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
>> GET_MODE_NUNITS (mode));
>>
>> is != 1 but rather adjusted so a byte is filled?
>
> It's 1 for VNx16BI, 2 for VNx8BI, and 4 for VNx4BI.  There are then
> respectively 8, 4, and 2 elements per byte.

For us elt_bits = 1 because e.g.
GET_MODE_PRECISION (V4BI) == GET_MODE_NUNITS (V4BI).
And, to be sure, I believe the problem I'm trying to address only concerns
V1BI, V2BI, and V4BI modes because they are smaller than a byte.  Not the
actually VLA modes.
>
>> For our mask modes anything else but zero padding makes no sense, so how
>> could we clarify this?
>
> Generally, we don't try to maintain the value of padding bits.  E.g.
> if we have a single-byte V4BI mode with 4 bits of padding, doing:
>
>   (set (reg:QI R1) (subreg:QI (reg:V4BI R2) 0))
>
> would not necessarily leave the upper 4 bits of R2 as zero, and so
> it would not be valid to optimise:
>
>   (set (reg:QI R1) (and:QI (subreg:QI (reg:V4BI R2) 0) (const_int 15)))
>
> to the move above.  Similarly, on SVE:
>
>   (set (reg:VNx8BI R1) (subreg:VNx8BI (reg:VNx4BI R2) 0))
>
> sets Nx4 of the R1 elements to undefined values.  The even elements are
> defined, the odd elements are not.
>
> That's why it seems like forcing the padding is masking a bug elsewhere.
> In general, things should work whatever value the padding bits have.

For RVV hardware we also don't care about the padding bytes.  As they are
packed at the beginning of a register and the vector length must be
correct we only access the bits corresponding to the vector length.
Everything beyond that is subject to a tail-policy and can be "anything".

But here I intended to just change the encoded value in memory and to
prevent it from aliasing with other encoded values.
In an actual instruction the values we "padded" (or that are beyond the
vector length) are ignored.  For the purpose of hashing a mask of
"1010" and "10101010" should be distinct which they are not right now
because we don't stop at the length so to say.

-- 
Regards
 Robin



Re: [PATCH 2/7]AArch64: Add SVE support for simd clones [PR96342]

2024-12-20 Thread Richard Sandiford
Tamar Christina  writes:
>> > +}
>> > +  cgraph_simd_clone *sc = node->simdclone;
>> > +
>> > +  for (unsigned i = 0; i < sc->nargs; ++i)
>> > +{
>> > +  bool is_mask = false;
>> > +  tree type;
>> > +  switch (sc->args[i].arg_type)
>> > +  {
>> > +  case SIMD_CLONE_ARG_TYPE_MASK:
>> > +is_mask = true;
>> > +gcc_fallthrough ();
>> > +  case SIMD_CLONE_ARG_TYPE_VECTOR:
>> > +  case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
>> > +  case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
>> > +type = sc->args[i].vector_type;
>> > +gcc_assert (VECTOR_TYPE_P (type));
>> > +if (node->simdclone->vecsize_mangle == 's')
>> > +  type = simd_clone_adjust_sve_vector_type (type, is_mask,
>> > +sc->simdlen);
>> > +else if (is_mask)
>> > +  type = truth_type_for (type);
>> 
>> Sorry, I have a horrible feeling I knew this once and have forgotten,
>> but: why do we need to this for non-SVE, when we didn't before?
>> 
>
> I don't think we do either.  For Adv. SIMD the truth type is the same as
> the vector type anyway so this is a no-op.  Removed.
>
>> 
>> I should have noticed this last time, sorry, but we don't seem to have
>> any coverage for the linear cases above.  Maybe that comes in a later
>> patch though.
>> 
>
> No, Though I tried to make some examples of linear cases.
> On C the vectorizer just ignores the pragma.

Like we discussed off-list, the above cases only trigger for references
that have a val modifier (or no modifier), such as, to steal a variant
of your example:

#define TYPE int
#pragma omp declare simd linear(n)
extern TYPE __attribute__ ((const)) fn0 (TYPE x, TYPE &n);
void test_fn0 (TYPE *__restrict a, TYPE *__restrict b, TYPE n)
{
  TYPE i;
#pragma omp simd linear(i)
  for (i = 0; i < n; ++i)
a[i] += fn0 (b[i], i);
}

And as you pointed out, the vectoriser doesn't handle those yet:

  case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
  case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
  case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP:
  case SIMD_CLONE_ARG_TYPE_LINEAR_REF_VARIABLE_STEP:
  case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
  case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_VARIABLE_STEP:
/* FORNOW */
i = -1;
break;

Since this series is only concerned with using externally-provided
simd implementations for vectorisation, the aarch64 cases above:

case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:

will be nugatory until the FORNOW is fixed.  But I went through some
#pragma omp declare simd examples locally, dumping the types that
aarch64_simd_clone_adjust produces, and I agree that they seem to be
correct.  In particular, the elements always seem to be uintptr_ts, as
expected.

So while it seems unfortunate to be committing code that has no
test coverage, let's go with it.

> In C++ with a linear reference we fail to vectorize because we hit the
> Safe_len being an int and VF being a poly thing again and so we bail out.
>
> I did manage to create a testcase that generates an ICE, but that's due to
> an existing bug in the vectorizer with how it registers masks.
>
> Since that's an existing bug I'm hoping that's not a blocker for this series.

The two linear cases that the vectoriser can handle -- non-reference
parameters, and reference parameters with ref qualifiers -- wouldn't be
affected by the code above.  The following ref case works for me locally:

#define TYPE long
#pragma omp declare simd linear(ref(n))
extern TYPE __attribute__ ((const)) fn0 (TYPE x, TYPE &n);
void test_fn0 (TYPE *__restrict a, TYPE *__restrict b, TYPE n)
{
  TYPE i;
#pragma omp simd linear(i)
  for (i = 0; i < n; ++i)
a[i] += fn0 (b[i], i);
}

as long as I disable:

  /* If the function isn't const, only allow it in simd loops where user
 has asserted that at least nunits consecutive iterations can be
 performed using SIMD instructions.  */
  if ((loop == NULL || maybe_lt ((unsigned) loop->safelen, nunits))
  && gimple_vuse (stmt))
return false;

although the generated code is awful :)  (Part of the problem seems
to be that .GOMP_SIMD_LANE as vectorised as ints rather than longs,
meaning that we need a doubled VF.  Advanced SIMD behaves similarly,
although the code quality is better.)

If I change:

#define TYPE long

to:

#define TYPE int

then I get an ICE in vect_get_loop_mask, which might be what you hit,
or might be different.  I think this case might be a backend bug.
lane_size includes:

  /* For non map-to-vector types that are pointers we use the element type it
 points to.  */
  if (POINTER_TYPE_P (type))
switch (clone_arg_type)
  {
  default:
break;
  case SIMD_CLONE_ARG_TYPE_UNIFORM:
  case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
  case SIMD_CLONE_ARG_TYPE_LINEAR_

Re: [PATCH 3/4] arm, testsuite: fix arm_v8_3a_fp16_complex_neon_ok

2024-12-20 Thread Christophe Lyon
On Thu, 19 Dec 2024 at 13:17, Christophe Lyon
 wrote:
>
> Without this patch, testcases using arm_v8_3a_fp16_complex_neon fail
> to compile on arm-linux-gnueabihf with
> fatal error: gnu/stubs-soft.h: No such file or directory
> because they are actually compiled with
> -mfloat-abi=softfp -mfpu=auto -mcpu=unset -march=armv8.3-a+fp16
>
> Fix this by including stdint.h in the sample code for the effective-target.
>
> This makes these tests PASS instead of being UNRESOLVED:
> fast-math-bb-slp-complex-add-half-float.c
> fast-math-bb-slp-complex-mla-half-float.c
> fast-math-bb-slp-complex-mls-half-float.c
> fast-math-bb-slp-complex-mul-half-float.c
> fast-math-complex-add-half-float.c
> fast-math-complex-mla-half-float.c
> fast-math-complex-mls-half-float.c
> fast-math-complex-mul-half-float.c
>
> except for two new
> FAIL: gcc.dg/vect/complex/fast-math-complex-mls-half-float.c scan-tree-dump 
> vect "Found COMPLEX_ADD_ROT270"
> (and the same with -flto -ffat-lto-objects)
>

Actually, as Linaro CI shows, this is not quite true: the CI configures GCC
--with-float=hard --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
and the test passes.

In my manual testing I'm using
--with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb
--with-tune=cortex-a9 --with-arch=armv7-a

and fast-math-complex-mls-half-float.c is compiled with:
-mfpu=neon -mfloat-abi=hard -mfpu=auto -mcpu=unset -march=armv8.3-a+simd

I'm not sure where the difference comes from...

Christophe

> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp
> (check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache):
> Include stdint.h.
> ---
>  gcc/testsuite/lib/target-supports.exp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index a16e9534ccd..9f4e2700dd2 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -13298,6 +13298,7 @@ proc 
> check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache { } {
> #if !defined (__ARM_FEATURE_COMPLEX)
> #error "__ARM_FEATURE_COMPLEX not defined"
> #endif
> +   #include 
> } "$flags -mcpu=unset -march=armv8.3-a+fp16"] } {
> set et_arm_v8_3a_fp16_complex_neon_flags \
> "$flags -mcpu=unset -march=armv8.3-a+fp16"
> --
> 2.34.1
>


Re: [PATCH] RISC-V: List valid -mtune options only once

2024-12-20 Thread Kito Cheng
LGTM

Christoph Müllner  於 2024年12月20日 週五 04:13 寫道:

> This patch ensures that the list of valid -mtune options
> does not contain entries more than once.
> The -mtune option accepts CPU identifiers as well as
> tuning identifiers and there are cases where a CPU and
> its tuning have the same identifier.
>
> PR116347
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_get_valid_option_values):
> Skip adding mtune entries that are already in the list.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/common/config/riscv/riscv-common.cc | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc
> b/gcc/common/config/riscv/riscv-common.cc
> index 4c9a72d1180..2f85bb21a4c 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -2437,7 +2437,19 @@ riscv_get_valid_option_values (int option_code,
>
> const riscv_cpu_info *cpu_info = &riscv_cpu_tables[0];
> for (;cpu_info->name; ++cpu_info)
> - v.safe_push (cpu_info->name);
> + {
> +   /* Skip duplicates.  */
> +   bool skip = false;
> +   int i;
> +   const char *str;
> +   FOR_EACH_VEC_ELT (v, i, str)
> + {
> +   if (!strcmp (str, cpu_info->name))
> + skip = true;
> + }
> +   if (!skip)
> + v.safe_push (cpu_info->name);
> + }
>}
>break;
>  case OPT_mcpu_:
> --
> 2.47.1
>
>


[PATCH] c++, v3: Fix ICEs with large initializer lists or ones including #embed [PR118124]

2024-12-20 Thread Jakub Jelinek
On Thu, Dec 19, 2024 at 07:01:39PM +0100, Jakub Jelinek wrote:
> So far lightly tested, ok for trunk this way if it passes bootstrap & testing?

Bootstrap/regtest found an issue, warning about
  if ()
for ()
  if ()
  else if ()
  else
so I've added {}s around it (no other changes from the previous patch).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-12-20  Jakub Jelinek  

PR c++/118124
* call.cc (convert_like_internal): Handle RAW_DATA_CST in
ck_list handling.  Formatting fixes.

* g++.dg/cpp/embed-15.C: New test.
* g++.dg/cpp/embed-16.C: New test.
* g++.dg/cpp0x/initlist-opt3.C: New test.
* g++.dg/cpp0x/initlist-opt4.C: New test.

--- gcc/cp/call.cc.jj   2024-12-11 17:27:52.481221310 +0100
+++ gcc/cp/call.cc  2024-12-19 18:50:52.478315892 +0100
@@ -8766,8 +8766,8 @@ convert_like_internal (conversion *convs
 
if (tree init = maybe_init_list_as_array (elttype, expr))
  {
-   elttype = cp_build_qualified_type
- (elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST);
+   elttype = cp_build_qualified_type (elttype, cp_type_quals (elttype)
+   | TYPE_QUAL_CONST);
array = build_array_of_n_type (elttype, len);
array = build_vec_init_expr (array, init, complain);
array = get_target_expr (array);
@@ -8775,13 +8775,85 @@ convert_like_internal (conversion *convs
  }
else if (len)
  {
-   tree val; unsigned ix;
-
+   tree val;
+   unsigned ix;
tree new_ctor = build_constructor (init_list_type_node, NULL);
 
/* Convert all the elements.  */
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (expr), ix, val)
  {
+   if (TREE_CODE (val) == RAW_DATA_CST)
+ {
+   tree elt_type;
+   conversion *next;
+   /* For conversion to initializer_list or
+  initializer_list or initializer_list
+  we can optimize and keep RAW_DATA_CST with adjusted
+  type if we report narrowing errors if needed, for
+  others this converts each element separately.  */
+   if (convs->u.list[ix]->kind == ck_std
+   && (elt_type = convs->u.list[ix]->type)
+   && (TREE_CODE (elt_type) == INTEGER_TYPE
+   || is_byte_access_type (elt_type))
+   && TYPE_PRECISION (elt_type) == CHAR_BIT
+   && (next = next_conversion (convs->u.list[ix]))
+   && next->kind == ck_identity)
+ {
+   if (!TYPE_UNSIGNED (elt_type)
+   && (TYPE_UNSIGNED (TREE_TYPE (val))
+   || (TYPE_PRECISION (TREE_TYPE (val))
+   > CHAR_BIT)))
+ for (int i = 0; i < RAW_DATA_LENGTH (val); ++i)
+   {
+ if (RAW_DATA_SCHAR_ELT (val, i) >= 0)
+   continue;
+ else if (complain & tf_error)
+   {
+ location_t loc
+   = cp_expr_loc_or_input_loc (val);
+ int savederrorcount = errorcount;
+ permerror_opt (loc, OPT_Wnarrowing,
+"narrowing conversion of "
+"%qd from %qH to %qI",
+RAW_DATA_UCHAR_ELT (val, i),
+TREE_TYPE (val), elt_type);
+ if (errorcount != savederrorcount)
+   return error_mark_node;
+   }
+ else
+   return error_mark_node;
+   }
+   tree sub = copy_node (val);
+   TREE_TYPE (sub) = elt_type;
+   CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_ctor),
+   NULL_TREE, sub);
+ }
+   else
+ {
+   for (int i = 0; i < RAW_DATA_LENGTH (val); ++i)
+ {
+   tree elt
+ = build_int_cst (TREE_TYPE (val),
+  RAW_DATA_UCHAR_ELT (val, i));
+   tree sub
+ = convert_like (convs->u.list[ix], elt,
+ fn, argnum, false, false,
+   

Re: [PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-12-20 Thread Richard Sandiford
"Robin Dapp"  writes:
> But here I intended to just change the encoded value in memory and to
> prevent it from aliasing with other encoded values.
> In an actual instruction the values we "padded" (or that are beyond the
> vector length) are ignored.  For the purpose of hashing a mask of
> "1010" and "10101010" should be distinct which they are not right now
> because we don't stop at the length so to say.

But what modes do "1010" and "10101010" represent here?  And which code
treats them as equal?  (It should be ok if the hashes are equal, as long
as the equality function distinguishes them.)

Richard


Re: [PATCH] arm: [MVE intrinsics] Fix moves of tuples (PR target/118131)

2024-12-20 Thread Richard Earnshaw (lists)
On 19/12/2024 16:45, Christophe Lyon wrote:
> Commit r15-6245-g4f4e13dd235b introduced new modes for MVE tuples, but
> missed adding support for them in a few places.
> 
> Adding them to the list in arm_attr_length_move_neon is not sufficient
> since we later face another ICE where the compiler does not know how
> to split move of such data.
> 
> The patch therefore enhances the define_splits for OI and XI moves in
> neon.md, via the introduction of new iterators.
> 
> In addition, it seems consistent to update output_move_neon such that
> VALID_NEON_*_MODE are used only when TARGET_NEON.
> 
> gcc/ChangeLog:
> 
>   PR target/118131
>   * config/arm/arm.cc (output_move_neon): Check TARGET_NEON as
>   needed.
>   (arm_attr_length_move_neon): Add support for V2x and V4x MVE tuple
>   modes.
>   * config/arm/iterators.md (VSTRUCT2, VSTRUCT4): New.
>   * config/arm/neon.md: Use VSTRUCT2 instead of OI and VSTRUCT4
>   instead of XI in define_split.

OK.

R.



Re: [PATCH] RISC-V: vector absolute difference expander [PR117722]

2024-12-20 Thread Andrew Pinski
On Fri, Dec 20, 2024 at 2:14 PM Vineet Gupta  wrote:
>
> This improves codegen for x264 sum of absolute difference routines.
> The insn count is same, but we avoid double widening ops and ensuing
> whole register moves.
>
> Also for more general applicability, we chose to implement abs diff
> vs. the sum of abs diff variant.
>
> Suggested-by: Robin Dapp 
> Co-developed-by: Pan Li 

I think we use Co-Authored-By rather than Co-developed-by.

Thanks,
Andrew Pinski


> Signed-off-by: Vineet Gupta 
>
> PR target/117722
>
> gcc/ChangeLog:
> * config/riscv/autovec.md: Add uabd expander.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr117722.c: New test.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/autovec.md   | 26 +++
>  .../gcc.target/riscv/rvv/autovec/pr117722.c   | 23 
>  2 files changed, 49 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 2529dc77f221..4678906fb918 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -2928,3 +2928,29 @@
>  riscv_vector::expand_strided_store (mode, operands);
>  DONE;
>})
> +
> +; 
> +; == Absolute difference (not including sum)
> +; 
> +(define_expand "uabd3"
> +  [(match_operand:V_VLSI 0 "register_operand")
> +   (match_operand:V_VLSI 1 "register_operand")
> +   (match_operand:V_VLSI 2 "register_operand")]
> +  "TARGET_VECTOR"
> +  {
> +rtx max = gen_reg_rtx (mode);
> +insn_code icode = code_for_pred (UMAX, mode);
> +rtx ops1[] = {max, operands[1], operands[2]};
> +riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
> +
> +rtx min = gen_reg_rtx (mode);
> +icode = code_for_pred (UMIN, mode);
> +rtx ops2[] = {min, operands[1], operands[2]};
> +riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
> +
> +icode = code_for_pred (MINUS, mode);
> +rtx ops3[] = {operands[0], max, min};
> +riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
> +
> +DONE;
> +  });
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
> new file mode 100644
> index ..c633f31ef25b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O2" } */
> +
> +/* Generate sum of absolute difference as sub (max, min).
> +   This helps with x264 sad routines.  */
> +
> +inline int abs(int i)
> +{
> +  return (i < 0 ? -i : i);
> +}
> +
> +int pixel_sad_n(unsigned char *pix1, unsigned char *pix2, int n)
> +{
> +  int sum = 0;
> +  for( int i = 0; i < n; i++ )
> +   sum += abs(pix1[x] - pix2[x]);
> +
> +  return sum;
> +}
> +
> +/* { dg-final { scan-assembler {vmin\.v} } } */
> +/* { dg-final { scan-assembler {vmax\.v} } } */
> +/* { dg-final { scan-assembler {vsub\.v} } } */
> --
> 2.43.0
>


Re: [PATCH] RISC-V: vector absolute difference expander [PR117722]

2024-12-20 Thread Vineet Gupta
On 12/20/24 17:16, Andrew Pinski wrote:
> On Fri, Dec 20, 2024 at 2:14 PM Vineet Gupta  wrote:
>> This improves codegen for x264 sum of absolute difference routines.
>> The insn count is same, but we avoid double widening ops and ensuing
>> whole register moves.
>>
>> Also for more general applicability, we chose to implement abs diff
>> vs. the sum of abs diff variant.
>>
>> Suggested-by: Robin Dapp 
>> Co-developed-by: Pan Li 
> I think we use Co-Authored-By rather than Co-developed-by.

Fixed locally.

Thx,
-Vineet


Re: [PATCH] add options to control ifcombine

2024-12-20 Thread Alexandre Oliva
On Dec 19, 2024, Richard Biener  wrote:

> Please don't use -1 initializers, instead populate
> opts.cc:default_options_table.

ACK.  This trick (that I used) will be hard to unlearn ;-)

> IMO options where it's not clear how they interact are bad.

*nod*.  I struggled a bit with that.

In the end, I couldn't get any evidence that ifcombine was the culprit
by the slowdowns, so I'm inclined to withdraw the patch and leave it at
that.  We have an option to disable individual passes already.

If we find evidence that limiting the attempts to combine noncontiguous
blocks is needed, I'll be happy to introduce a param for that.

Thanks,

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] c++: Avoid infinite recursion when deducing template arguments for invalid code [PR118078]

2024-12-20 Thread Simon Martin
Hi,

On 20 Dec 2024, at 20:37, Simon Martin wrote:

> We currently fail due to "infinite recursion" on the following invalid
> code with -std=c++20 and above
>
> === cut here ===
> template  struct S { struct U { const S s; } u; };
> S t{2};
> === cut here ===
>
> The problem is that reshape_init_class for S calls reshape_init_r for
> its field S::u, that calls reshape_init_class for S::U, that calls
> reshape_init_r for field S::U::s that calls reshape_init_class for its
> type S, etc.
>
> This patch fixes the issue by erroring out in reshape_init_class if we
> detect that we're about to call reshape_init_r for a type that's part 
> of
> our "TYPE_CONTEXT chain".
>
> An alternative was to change the check in grokdeclarator that rejects
> fields with an incomplete type to check for the field type's
> TYPE_BEING_DECLARED (which sounds like the right thing to do), however
> erroring for the definition of U::s breaks a bunch of existing test
> cases, and it would also make us diverge from clang and MSVC (but 
> behave
> like EDG) - see https://godbolt.org/z/hT8d1GWa3. It feels to me like 
> we
> don't want that (I'm happy to revisit if people think it's OK).
>
> Successfully tested on x86_64-pc-linux-gnu.
>
>   PR c++/118078
>
> gcc/cp/ChangeLog:
>
>   * decl.cc (reshape_init_class): Don't trigger infinite recursion
>   for invalid "recursive types".
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/cpp1z/class-deduction118.C: New test.
>   * g++.dg/cpp2a/class-deduction-aggr16.C: New test.
>
> ---
>  gcc/cp/decl.cc| 21 
> ---
>  .../g++.dg/cpp1z/class-deduction118.C |  8 +++
>  .../g++.dg/cpp2a/class-deduction-aggr16.C |  7 +++
>  3 files changed, 33 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
>  create mode 100644 
> gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
>
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 42e83f880f9..ea55ea4c0e5 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -7315,9 +7315,24 @@ reshape_init_class (tree type, reshape_iter *d, 
> bool first_initializer_p,
> d->cur++;
>   }
>else
> - field_init = reshape_init_r (TREE_TYPE (field), d,
> -  /*first_initializer_p=*/NULL_TREE,
> -  complain);
> + {
> +   /* Make sure that we won't be calling ourselves recursively, which
> +  could happen with "recursive template types" (PR c++/118078).  
> */
> +   tree ctx = TYPE_CONTEXT (type);
> +   while (ctx && CLASS_TYPE_P (ctx))
> + {
> +   if (ctx == TYPE_MAIN_VARIANT (TREE_TYPE (field))) {
Of course this should be same_type_p (cxx, TYPE_MAIN_VARIANT (TREE_TYPE 
(field)))...

I’m running an extra round of testing with this. Ok assuming the 
testing comes back green?

> +   if (complain & tf_error)
> + error ("field %qD has incomplete type %qT", field,
> +TREE_TYPE (field));
> +   return error_mark_node;
> +   }
> +   ctx = TYPE_CONTEXT (ctx);
> + }
> +   field_init = reshape_init_r (TREE_TYPE (field), d,
> +/*first_initializer_p=*/NULL_TREE,
> +complain);
> + }
>
>if (field_init == error_mark_node)
>   return error_mark_node;
> diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C 
> b/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
> new file mode 100644
> index 000..b14d62df793
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
> @@ -0,0 +1,8 @@
> +// PR c++/118078
> +// { dg-do "compile" { target c++11 } }
> +
> +template 
> +struct S { struct U { const S s; } u; };
> +S t{2};   // { dg-error "invalid use of template-name 'S' without an 
> argument list" "" { target c++14_down } }
> +   // { dg-error "class template argument deduction failed" "" { 
> target c++17 } .-1 }
> +   // { dg-error "no matching function for call to 'S\\\(int\\\)'" "" 
> { target c++17 } .-2 }
> diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C 
> b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
> new file mode 100644
> index 000..feab11927c1
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
> @@ -0,0 +1,7 @@
> +// PR c++/118078
> +// { dg-do "compile" { target c++20 } }
> +
> +template 
> +struct S { const struct U { struct V { volatile S s; } v; } u; };
> +S t{2};   // { dg-error "class template argument deduction failed" }
> +   // { dg-error "no matching function for call to 'S\\\(int\\\)'" "" 
> { target *-*-* } .-1 }
> -- 
> 2.44.0



Re: [PATCH] Add RISC-V/rv64gc as a secondary platform

2024-12-20 Thread Jeff Law




On 12/20/24 9:48 AM, Palmer Dabbelt wrote:

On Thu, 19 Dec 2024 15:01:26 PST (-0800), jeffreya...@gmail.com wrote:



On 12/19/24 3:08 PM, Palmer Dabbelt wrote:


I agree lacking B and V makes us very clearly uncompetitive in the space
where these sort of things matter (ie, binary compatible distros and
long term stability type things) -- the gap is just too big to close by
doing clever things in the hardware.  Maybe just B and V isn't enough,
it's hard to tell, but lacking them seems pretty clearly uncompetitive.

I'm not sure B is so scary on the SW side of things, it's been mostly
performance issues we've been fixing.  V is huge, though, and we've
generally found a bunch of V-related functional codegen bugs.  Without
reliable hardware to test against (and do distro builds and such) it
just seems premature to declare that being as stable as the other ports
on the list.

It wouldn't take much to push me into agreeing to B -- it's not scary in
any way.  There's just notable systems out there that don't implement B,
but I wouldn't mind leaving them behind for this change.

V has real performance concerns.  I haven't tested it performance-wise
on the BPI recently, but when I last did the general rule of thumb was
the more vector you did, the worse it performed *especially* for FP.


Yep.  TBH that's actually my biggest worry here: it might be that just V 
isn't enough, which means we have another few generations of HW before 
all the V add-on extensions coalesce into something that's widely 
implemented.  I'm not really sure there, though -- the HW guys can be 
pretty clever so maybe V gets us close enough to be interesting.
Well, I think Robin and I would argue that V can be made to perform.  We 
see that with our design.  But that's because Robin & others have been 
focused on the performance for months.  The gains that have been made 
have been significant and have brought the design & codegen to a point 
where it performs largely at expectation.  But it's also the case that 
our design is not generally available



I don't doubt significant gains could be made that would make V behave 
better on widely available designs.  They don't have the option of uarch 
adjustments, so the upside is limited by that.  But with a good 
understanding of the uarch and good PMU data significant gains almost 
certainly could be made.


Point being I do think V is viable and will continue to improve as the 
feedback loops become better established.


jeff


[COMMITTED 2/3] Fortran: Use the present tense for the manual.

2024-12-20 Thread Sandra Loosemore
The present tense is preferred for expressing facts or enduring
behavior.  Thus we should say "option X does Y" instead of "option X
will do Y", reserving the future tense for things that happen at some
later time (such as in a future release of GCC, or at run time as
explicitly contrasted with compile time).

This set of edits is largely mechanical substitution of phrasing
involving "will".  I also fixed a few more markup problems noted while
editing nearby text and fixed a few instances of awkward wording.

gcc/fortran/ChangeLog
* gfortran.texi: Use the present tense throughout; fix some
markup issues and awkward wording.
* invoke.texi: Likewise.
---
 gcc/fortran/gfortran.texi | 141 +++--
 gcc/fortran/invoke.texi   | 143 --
 2 files changed, 147 insertions(+), 137 deletions(-)

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 77c5e5235f9..5b27a6dbe30 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -634,7 +634,7 @@ The default value is 0.
 
 This environment variable controls whether all I/O is unbuffered.  If
 the first letter is @samp{y}, @samp{Y} or @samp{1}, all I/O is
-unbuffered.  This will slow down small sequential reads and writes.  If
+unbuffered.  This slows down small sequential reads and writes.  If
 the first letter is @samp{n}, @samp{N} or @samp{0}, I/O is buffered.
 This is the default.
 
@@ -644,7 +644,7 @@ This is the default.
 The environment variable named @env{GFORTRAN_UNBUFFERED_PRECONNECTED} controls
 whether I/O on a preconnected unit (i.e.@: STDOUT or STDERR) is unbuffered.  If
 the first letter is @samp{y}, @samp{Y} or @samp{1}, I/O is unbuffered.  This
-will slow down small sequential reads and writes.  If the first letter
+slows down small sequential reads and writes.  If the first letter
 is @samp{n}, @samp{N} or @samp{0}, I/O is buffered.  This is the default.
 
 @node GFORTRAN_SHOW_LOCUS
@@ -674,6 +674,7 @@ be sure to quote spaces, as in
 @smallexample
 $ GFORTRAN_LIST_SEPARATOR='  ,  ' ./a.out
 @end smallexample
+@noindent
 when @command{a.out} is the compiled Fortran program that you want to run.
 Default is a single space.
 
@@ -753,7 +754,7 @@ setting a default data representation for the whole 
program.  The
 @code{CONVERT} specifier overrides the @option{-fconvert} compile options.
 
 Note that the values specified via the @env{GFORTRAN_CONVERT_UNIT}
-environment variable will override the @code{CONVERT} specifier in the
+environment variable override the @code{CONVERT} specifier in the
 @code{OPEN} statement.  This is to give control over data formats to
 users who do not have the source code of their program available.
 
@@ -889,7 +890,7 @@ See also @ref{Argument passing conventions} and 
@ref{Interoperability with C}.
 The Fortran standard does not require the compiler to evaluate all parts of an
 expression, if they do not contribute to the final result.  For logical
 expressions with @code{.AND.} or @code{.OR.} operators, in particular, GNU
-Fortran will optimize out function calls (even to impure functions) if the
+Fortran optimizes out function calls (even to impure functions) if the
 result of the expression can be established without them.  However, since not
 all compilers do that, and such an optimization can potentially modify the
 program flow and subsequent results, GNU Fortran throws warnings for such
@@ -1018,17 +1019,17 @@ where file metadata is written to the directory lazily. 
This means
 that, for instance, the @code{dir} command can show a stale size for a
 file. One can force a directory metadata update by closing the unit,
 or by calling @code{_commit} on the file descriptor. Note, though,
-that @code{_commit} will force all dirty data to stable storage, which
+that @code{_commit} forces all dirty data to stable storage, which
 is often a very slow operation.
 
 The Network File System (NFS) implements a relaxed consistency model
 called open-to-close consistency. Closing a file forces dirty data and
 metadata to be flushed to the server, and opening a file forces the
 client to contact the server in order to revalidate cached
-data. @code{fsync} will also force a flush of dirty data and metadata
+data. @code{fsync} also forces a flush of dirty data and metadata
 to the server. Similar to @code{open} and @code{close}, acquiring and
-releasing @code{fcntl} file locks, if the server supports them, will
-also force cache validation and flushing dirty data and metadata.
+releasing @code{fcntl} file locks, if the server supports them,
+also forces cache validation and flushing dirty data and metadata.
 
 
 @node Files opened without an explicit ACTION= specifier
@@ -1058,16 +1059,16 @@ symbolic links, on systems that support them.
 
 @item Results of @code{INQUIRE} statements of the ``inquire by file'' form
 relate to the target of the symbolic link. For example,
-@code{INQUIRE(FILE="foo",EXIST=ex)} will set @va

[COMMITTED] Fortran: Clean up -funderscoring and -fsecond-underscore docs [PR51820]

2024-12-20 Thread Sandra Loosemore
This is a long-standing documentation bug in the Fortran manual,
initially reported in 2012 as PR51820, with a quick fix applied later
for PR109216.  The patch here incorporates more of the discussion from
the original issue.

gcc/fortran/ChangeLog
PR fortran/51820
PR fortran/89632
PR fortran/109216
* invoke.texi (Code Gen Options): Further cleanups of the discussion
of what -funderscoring and -fsecond-underscore do.
---
 gcc/fortran/invoke.texi | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index ff4040732d8..f88a9b8252f 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1664,6 +1664,8 @@ source file by appending underscores to them.
 With @option{-funderscoring} in effect, GNU Fortran appends one
 underscore to external names.  This is done to ensure
 compatibility with code produced by many UNIX Fortran compilers.
+Note this does not apply to names declared with C binding, or within
+a module.
 
 @emph{Caution}: The default behavior of GNU Fortran is
 incompatible with @command{f2c} and @command{g77}, please use the
@@ -1678,12 +1680,12 @@ and so on).
 
 For example, with @option{-funderscoring}, and assuming that @code{j()} and
 @code{max_count()} are external functions while @code{my_var} and
-@code{lvar} are local variables, a statement like
+@code{lvar} are local variables, a Fortran statement like
 @smallexample
 I = J() + MAX_COUNT (MY_VAR, LVAR)
 @end smallexample
 @noindent
-is implemented as something akin to:
+is implemented as something akin to the C code:
 @smallexample
 i = j_() + max_count_(&my_var, &lvar);
 @end smallexample
@@ -1715,11 +1717,10 @@ could make finding unresolved-reference bugs quite 
difficult in some
 cases---they might occur at program run time, and show up only as
 buggy behavior at run time.
 
-In future versions of GNU Fortran we hope to improve naming and linking
-issues so that debugging always involves using the names as they appear
-in the source, even if the names as seen by the linker are mangled to
-prevent accidental linking between procedures with incompatible
-interfaces.
+@xref{Naming and argument-passing conventions}, for more information.
+Also note that declaring symbols as @code{bind(C)} is a more robust way to
+interface with code written in other languages or compiled with different
+Fortran compilers than the command-line options documented in this section.
 
 @opindex fsecond-underscore
 @cindex underscore
@@ -1731,21 +1732,19 @@ interfaces.
 @cindex libf2c calling convention
 @item -fsecond-underscore
 By default, GNU Fortran appends an underscore to external
-names.  If this option is used GNU Fortran appends two
-underscores to names with underscores and one underscore to external names
-with no underscores.  GNU Fortran also appends two underscores to
-internal names with underscores to avoid naming collisions with external
-names.
+names.  If this option is used, GNU Fortran appends two
+underscores to names with underscores and one underscore to names
+with no underscores.
 
-This option has no effect if @option{-fno-underscoring} is
-in effect.  It is implied by the @option{-ff2c} option.
-
-Otherwise, with this option, an external name such as @code{MAX_COUNT}
+For example, an external name such as @code{MAX_COUNT}
 is implemented as a reference to the link-time external symbol
 @code{max_count__}, instead of @code{max_count_}.  This is required
 for compatibility with @command{g77} and @command{f2c}, and is implied
 by use of the @option{-ff2c} option.
 
+This option has no effect if @option{-fno-underscoring} is
+in effect.  It is implied by the @option{-ff2c} option.
+
 @opindex fcoarray
 @cindex coarrays
 @item -fcoarray=@var{}
-- 
2.25.1



[COMMITTED 0/3] Incremental cleanups to Fortran manual

2024-12-20 Thread Sandra Loosemore
I've done some cleanups on the Fortran manual.  None of the patches in
this series are intended to have any semantic effect -- just clean up
markup, grammar, spelling, punctuation, etc.  I didn't make it through
through all the coarray and intrinsics documentation yet, but this should
be at least an incremental improvement.

-Sandra


Sandra Loosemore (3):
  Fortran: Fixes for markup, typos, and indexing in manual
  Fortran: Use the present tense for the manual.
  Fortran: Fix hyphenation errors in the manual

 gcc/fortran/gfortran.texi | 431 --
 gcc/fortran/invoke.texi   | 257 ---
 2 files changed, 368 insertions(+), 320 deletions(-)

-- 
2.25.1



[COMMITTED 1/3] Fortran: Fixes for markup, typos, and indexing in manual

2024-12-20 Thread Sandra Loosemore
While working on something else I noticed there were numerous places
in the GNU Fortran manual with incorrect/missing Texinfo markup.  I
made a pass through about the first third of the manual (not yet the
coarray API or intrinsics documentation) to fix at least some of those
issues, plus some typos and missing @cindex entries.

There shouldn't be any semantic changes to the documentation in this patch.

gcc/fortran/ChangeLog
* gfortran.texi: Fix markup, typos, and indexing throughout the
file.
* invoke.texi: Likewise.
---
 gcc/fortran/gfortran.texi | 236 +-
 gcc/fortran/invoke.texi   |  92 ---
 2 files changed, 183 insertions(+), 145 deletions(-)

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 8aedf46680d..77c5e5235f9 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -237,7 +237,7 @@ The GNU Fortran compiler is the successor to @command{g77}, 
the
 Fortran 77 front end included in GCC prior to version 4 (released in
 2005).  While it is backward-compatible with most @command{g77}
 extensions and command-line options, @command{gfortran} is a completely new
-implemention designed to support more modern dialects of Fortran.
+implementation designed to support more modern dialects of Fortran.
 GNU Fortran implements the Fortran 77, 90 and 95 standards
 completely, most of the Fortran 2003 and 2008 standards, and some
 features from the 2018 standard.  It also implements several extensions
@@ -752,9 +752,9 @@ data representation for unformatted files.  @xref{Runtime 
Options}, for
 setting a default data representation for the whole program.  The
 @code{CONVERT} specifier overrides the @option{-fconvert} compile options.
 
-@emph{Note that the values specified via the GFORTRAN_CONVERT_UNIT
-environment variable will override the CONVERT specifier in the
-open statement}.  This is to give control over data formats to
+Note that the values specified via the @env{GFORTRAN_CONVERT_UNIT}
+environment variable will override the @code{CONVERT} specifier in the
+@code{OPEN} statement.  This is to give control over data formats to
 users who do not have the source code of their program available.
 
 @node GFORTRAN_ERROR_BACKTRACE
@@ -818,7 +818,7 @@ might in some way or another become visible to the 
programmer.
 
 
 @node KIND Type Parameters
-@section KIND Type Parameters
+@section @code{KIND} Type Parameters
 @cindex kind
 
 The @code{KIND} type parameters supported by GNU Fortran for the primitive
@@ -866,7 +866,7 @@ the kind parameters of the @ref{ISO_C_BINDING} module 
should be used.
 
 
 @node Internal representation of LOGICAL variables
-@section Internal representation of LOGICAL variables
+@section Internal representation of @code{LOGICAL} variables
 @cindex logical, variable representation
 
 The Fortran standard does not specify how variables of @code{LOGICAL}
@@ -897,7 +897,7 @@ situations with the @option{-Wfunction-elimination} flag.
 
 
 @node MAX and MIN intrinsics with REAL NaN arguments
-@section MAX and MIN intrinsics with REAL NaN arguments
+@section @code{MAX} and @code{MIN} intrinsics with @code{REAL} NaN arguments
 @cindex MAX, MIN, NaN
 
 The Fortran standard does not specify what the result of the
@@ -970,7 +970,7 @@ Fortran programmer can use the intrinsic @code{FNUM} to 
retrieve the
 low level file descriptor corresponding to an open Fortran unit. Then,
 using e.g. the @code{ISO_C_BINDING} feature, one can call the
 underlying system call to flush dirty data to stable storage, such as
-@code{fsync} on POSIX, @code{_commit} on MingW, or @code{fcntl(fd,
+@code{fsync} on POSIX, @code{_commit} on MinGW, or @code{fcntl(fd,
 F_FULLSYNC, 0)} on macOS. The following example shows how to call
 fsync:
 
@@ -1032,7 +1032,7 @@ also force cache validation and flushing dirty data and 
metadata.
 
 
 @node Files opened without an explicit ACTION= specifier
-@section Files opened without an explicit ACTION= specifier
+@section Files opened without an explicit @code{ACTION=} specifier
 @cindex open, action
 
 The Fortran standard says that if an @code{OPEN} statement is executed
@@ -1056,7 +1056,7 @@ symbolic links, on systems that support them.
 
 @itemize
 
-@item Results of INQUIRE statements of the ``inquire by file'' form will
+@item Results of @code{INQUIRE} statements of the ``inquire by file'' form
 relate to the target of the symbolic link. For example,
 @code{INQUIRE(FILE="foo",EXIST=ex)} will set @var{ex} to @var{.true.} if
 @var{foo} is a symbolic link pointing to an existing file, and @var{.false.}
@@ -1088,11 +1088,11 @@ Each subrecord consists of a leading record marker, the 
data written
 by the user program, and a trailing record marker.  The record markers
 are four-byte integers by default, and eight-byte integers if the
 @option{-fmax-subrecord-length=8} option (which exists for backwards
-compability only) is in effect.
+compatibility only) is in effect.
 
 The repr

[COMMITTED 3/3] Fortran: Fix hyphenation errors in the manual

2024-12-20 Thread Sandra Loosemore
When looking through the gfortran manual, I noted some problems with
hyphens being used where they're not correct or necessary,
e.g. "non-standard" vs "nonstandard", "null-pointer" vs "null pointer"
(as a noun), etc.  I've made a pass through the documentation to
correct at least some of those uses.

gcc/fortran/ChangeLog
* gfortran.texi: Get rid of some unnecessary hyphens throughout
the file.
* invoke.texi: Likewise.
---
 gcc/fortran/gfortran.texi | 58 +++
 gcc/fortran/invoke.texi   | 24 
 2 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 5b27a6dbe30..2838702b64b 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -163,7 +163,7 @@ the GNU Fortran compiler.  You can find in this manual how 
to invoke
 
 @ifset DEVELOPMENT
 @emph{Warning:} This document, and the compiler it describes, are still
-under development.  While efforts are made to keep it up-to-date, it might
+under development.  While efforts are made to keep it up to date, it might
 not accurately reflect the status of the most recent GNU Fortran compiler.
 @end ifset
 
@@ -1163,7 +1163,7 @@ sytems, such as Linux, it is necessary to specify 
@option{-pthread},
 Integer overflow is prohibited by the Fortran standard.  The behavior
 of gfortran on integer overflow is undefined by default.  Traditional
 code, like linear congruential pseudo-random number generators in old
-programs that rely on specific, non-standard behavior may generate
+programs that rely on specific, nonstandard behavior may generate
 unexpected results.  The @option{-fsanitize=undefined} option can be
 used to detect such code at runtime.
 
@@ -1606,12 +1606,12 @@ and @code{CHARACTER}.
 @cindex conversion, to character
 
 Allowing character literals to be used in a similar way to Hollerith constants
-is a non-standard extension.  This feature is enabled using
+is a nonstandard extension.  This feature is enabled using
 -fdec-char-conversions and only applies to character literals of @code{kind=1}.
 
 Character literals can be used in @code{DATA} statements and assignments with
 numeric (@code{INTEGER}, @code{REAL}, or @code{COMPLEX}) or @code{LOGICAL}
-variables. Like Hollerith constants they are copied byte-wise fashion. The
+variables. Like Hollerith constants they are copied bytewise fashion. The
 constant is padded with spaces or truncated to fit the size of the
 variable in which it is stored.
 
@@ -1629,7 +1629,7 @@ Examples:
 @subsection Cray pointers
 @cindex pointer, Cray
 
-Cray pointers are part of a non-standard extension that provides a
+Cray pointers are part of a nonstandard extension that provides a
 C-like pointer in Fortran.  This is accomplished through a pair of
 variables: an integer ``pointer'' that holds a memory address, and a
 ``pointee'' that is used to dereference the pointer.
@@ -2126,7 +2126,7 @@ the initializer list.
 @cindex @code{MAP}
 
 Unions are an old vendor extension which were commonly used with the
-non-standard @ref{STRUCTURE and RECORD} extensions. Use of @code{UNION} and
+nonstandard @ref{STRUCTURE and RECORD} extensions. Use of @code{UNION} and
 @code{MAP} is automatically enabled with @option{-fdec-structure}.
 
 A @code{UNION} declaration occurs within a structure; within the definition of
@@ -2667,7 +2667,7 @@ c
 
 Some Fortran compilers, including @command{g77}, let the user declare
 complex functions with the syntax @code{COMPLEX FUNCTION name*16()}, as
-well as @code{COMPLEX*16 FUNCTION name()}.  Both are non-standard, legacy
+well as @code{COMPLEX*16 FUNCTION name()}.  Both are nonstandard legacy
 extensions.  @command{gfortran} accepts the latter form, which is more
 common, but not the former.
 
@@ -3838,7 +3838,7 @@ dollar sign (@code{$}) is additionally permitted with the 
option
 @option{-fdollar-ok}, but not as first character and only if the
 target system supports it.
 
-By default, the procedure name is the lower-cased Fortran name with an
+By default, the procedure name is the lowercased Fortran name with an
 appended underscore (@code{_}); using @option{-fno-underscoring} no
 underscore is appended while @code{-fsecond-underscore} appends two
 underscores.  Depending on the target system and the calling convention,
@@ -3848,12 +3848,12 @@ number is appended.  For the changing the calling 
convention, see
 @pxref{GNU Fortran Compiler Directives}.
 
 For common blocks, the same convention is used, i.e. by default an
-underscore is appended to the lower-cased Fortran name.  Blank commons
+underscore is appended to the lowercased Fortran name.  Blank commons
 have the name @code{__BLNK__}.
 
 For procedures and variables declared in the specification space of a
-module, the name is formed by @code{__}, followed by the lower-cased
-module name, @code{_MOD_}, and the lower-cased Fortran name.  Note that
+module, the name is formed by @code{__}, followe

Re: [PATCH v2] libstdc++: add initializer_list constructor to std::span (P2447)

2024-12-20 Thread Giuseppe D'Angelo

Hi,

On 20/12/2024 13:37, Jonathan Wakely wrote:


With this, I get seven PASS results for this test.


Thank you; I was just thinking, would it be simpler to split the test in 
two instead rather than making it more convoluted than necessary? Let me 
know and I'll amend it, in either direction.


--
Giuseppe D'Angelo



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] testsuite: Add tests for PR118149

2024-12-20 Thread Jakub Jelinek
On Fri, Dec 20, 2024 at 02:55:51PM +0100, Christoph Müllner wrote:
> A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
> This patch adds two test cases for PR118149.
> These tests are different than other tests in that one of the
> vec-perm selectors contains indices in descending order (1, 1, 0, 0),
> which is the root cause for the ICE observed in PR118149.
> 
>   PR 118149
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/pr118149-2.c: New test.
>   * gcc.dg/tree-ssa/pr118149.c: New test.
> 
> Signed-off-by: Christoph Müllner 

Have you actually tested it with say
make check-gcc
RUNTESTFLAGS="--target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse} 
tree-ssa.exp='pr118149* vector-11.c'"
on x86_64?  While the -2.c possibly could work, I don't see how the last
testcase could on i686-linux (i.e. that -m32 -mno-mmx -mno-sse).
So I think it needs
/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
Or so.

Also, why are you using dg-additional-options in tree-ssa/ ?  I think the
default there is just -pedantic-errors which you don't really need for these
tests, so just dg-options?

>  gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c | 36 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr118149.c   | 19 
>  2 files changed, 55 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> new file mode 100644
> index 000..b6ceea8fdb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } 
> */
> +
> +typedef int vec __attribute__((vector_size (4 * sizeof (float;
> +
> +void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
> +{
> +  vec sel00 = { 1, 1, 3, 3 };
> +  vec sel01 = { 0, 0, 2, 2 };
> +  vec sel10 = { 3, 3, 2, 2 };
> +  vec sel11 = { 1, 1, 0, 0 };
> +  vec sel = { 0, 1, 6, 7 };
> +  vec v_1, v_2, v_x, v_y, v_out_1, v_out_2;
> +  vec v_in = *p_v_in;
> +
> +  /* First vec perm sequence.  */
> +  v_1 = __builtin_shuffle (v_in, v_in, sel00);
> +  v_2 = __builtin_shuffle (v_in, v_in, sel01);
> +  v_x = v_2 - v_1;
> +  v_y = v_1 + v_2;
> +  v_out_1 = __builtin_shuffle (v_y, v_x, sel);
> +
> +  /* Second vec perm sequence.  */
> +  v_1 = __builtin_shuffle (v_in, v_in, sel10);
> +  v_2 = __builtin_shuffle (v_in, v_in, sel11);
> +  v_x = v_2 - v_1;
> +  v_y = v_1 + v_2;
> +  v_out_2 = __builtin_shuffle (v_y, v_x, sel);
> +
> +  *p_v_out_1 = v_out_1;
> +  *p_v_out_2 = v_out_2;
> +}
> +
> +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
> new file mode 100644
> index 000..a87463dfa07
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } 
> */
> +
> +float *fastconv_parse_dst;
> +
> +void fastconv_parse ()
> +{
> +  float r3k = fastconv_parse_dst[1] - fastconv_parse_dst[3],
> +i0k = fastconv_parse_dst[4] + fastconv_parse_dst[6],
> +i1k = fastconv_parse_dst[4] - fastconv_parse_dst[6],
> +i2k = fastconv_parse_dst[5] + fastconv_parse_dst[7];
> +  fastconv_parse_dst[1] = fastconv_parse_dst[0];
> +  fastconv_parse_dst[4] = fastconv_parse_dst[5] = i0k - i2k;
> +  fastconv_parse_dst[6] = fastconv_parse_dst[7] = i1k + r3k;
> +}
> +
> +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */

Jakub



[PATCH] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
This patch adds two test cases for PR118149.
These tests are different than other tests in that one of the
vec-perm selectors contains indices in descending order (1, 1, 0, 0),
which is the root cause for the ICE observed in PR118149.

PR 118149

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr118149-2.c: New test.
* gcc.dg/tree-ssa/pr118149.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c | 36 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c   | 19 
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
new file mode 100644
index 000..b6ceea8fdb7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+
+typedef int vec __attribute__((vector_size (4 * sizeof (float;
+
+void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
+{
+  vec sel00 = { 1, 1, 3, 3 };
+  vec sel01 = { 0, 0, 2, 2 };
+  vec sel10 = { 3, 3, 2, 2 };
+  vec sel11 = { 1, 1, 0, 0 };
+  vec sel = { 0, 1, 6, 7 };
+  vec v_1, v_2, v_x, v_y, v_out_1, v_out_2;
+  vec v_in = *p_v_in;
+
+  /* First vec perm sequence.  */
+  v_1 = __builtin_shuffle (v_in, v_in, sel00);
+  v_2 = __builtin_shuffle (v_in, v_in, sel01);
+  v_x = v_2 - v_1;
+  v_y = v_1 + v_2;
+  v_out_1 = __builtin_shuffle (v_y, v_x, sel);
+
+  /* Second vec perm sequence.  */
+  v_1 = __builtin_shuffle (v_in, v_in, sel10);
+  v_2 = __builtin_shuffle (v_in, v_in, sel11);
+  v_x = v_2 - v_1;
+  v_y = v_1 + v_2;
+  v_out_2 = __builtin_shuffle (v_y, v_x, sel);
+
+  *p_v_out_1 = v_out_1;
+  *p_v_out_2 = v_out_2;
+}
+
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
new file mode 100644
index 000..a87463dfa07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } */
+
+float *fastconv_parse_dst;
+
+void fastconv_parse ()
+{
+  float r3k = fastconv_parse_dst[1] - fastconv_parse_dst[3],
+i0k = fastconv_parse_dst[4] + fastconv_parse_dst[6],
+i1k = fastconv_parse_dst[4] - fastconv_parse_dst[6],
+i2k = fastconv_parse_dst[5] + fastconv_parse_dst[7];
+  fastconv_parse_dst[1] = fastconv_parse_dst[0];
+  fastconv_parse_dst[4] = fastconv_parse_dst[5] = i0k - i2k;
+  fastconv_parse_dst[6] = fastconv_parse_dst[7] = i1k + r3k;
+}
+
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
-- 
2.47.1



Re: [PATCH] c++: ICE with nested anonymous union [PR117153]

2024-12-20 Thread Marek Polacek
Ping.

On Mon, Nov 25, 2024 at 04:46:56PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
> 
> -- >8 --
> In a template, for
> 
>   union {
> union {
>   T d;
> };
>   };
> 
> build_anon_union_vars crates a malformed COMPONENT_REF: we have no
> DECL_NAME for the nested anon union so we create something like "object.".
> Most of the front end doesn't seem to care, but if such a tree gets to
> potential_constant_expression, it can cause a crash.
> 
> I suppose we can use any name for the COMPONENT_REF's member.  tsubst_stmt
> should build up a proper one in:
> 
> if (VAR_P (decl) && !DECL_NAME (decl)
>   && ANON_AGGR_TYPE_P (TREE_TYPE (decl)))
>   /* Anonymous aggregates are a special case.  */
>   finish_anon_union (decl);
> 
>   PR c++/117153
> 
> gcc/cp/ChangeLog:
> 
>   * decl2.cc (build_anon_union_vars): If DECL_NAME is empty, use
>   an anonymous name for the second operand of a COMPONENT_REF.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/other/anon-union6.C: New test.
>   * g++.dg/other/anon-union7.C: New test.
> ---
>  gcc/cp/decl2.cc  |  5 -
>  gcc/testsuite/g++.dg/other/anon-union6.C | 13 +
>  gcc/testsuite/g++.dg/other/anon-union7.C | 16 
>  3 files changed, 33 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/other/anon-union6.C
>  create mode 100644 gcc/testsuite/g++.dg/other/anon-union7.C
> 
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index bc3a9d0e922..1383cbb660c 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -1959,7 +1959,10 @@ build_anon_union_vars (tree type, tree object)
>  
>if (processing_template_decl)
>   ref = build_min_nt_loc (UNKNOWN_LOCATION, COMPONENT_REF, object,
> - DECL_NAME (field), NULL_TREE);
> + (DECL_NAME (field)
> +  ? DECL_NAME (field)
> +  : make_anon_name ()),
> + NULL_TREE);
>else
>   ref = build_class_member_access_expr (object, field, NULL_TREE,
> false, tf_warning_or_error);
> diff --git a/gcc/testsuite/g++.dg/other/anon-union6.C 
> b/gcc/testsuite/g++.dg/other/anon-union6.C
> new file mode 100644
> index 000..d1cde30f790
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/other/anon-union6.C
> @@ -0,0 +1,13 @@
> +// PR c++/117153
> +// { dg-do compile }
> +
> +template
> +void f() {
> +  union {
> +union {
> +  T d;
> +};
> +  };
> +  (void) (d + 0);
> +}
> +template void f();
> diff --git a/gcc/testsuite/g++.dg/other/anon-union7.C 
> b/gcc/testsuite/g++.dg/other/anon-union7.C
> new file mode 100644
> index 000..e89334a5d0e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/other/anon-union7.C
> @@ -0,0 +1,16 @@
> +// PR c++/117153
> +// { dg-do compile { target c++11 } }
> +
> +using U = union { union { int a; }; };
> +
> +U
> +foo ()
> +{
> +  return {};
> +}
> +
> +void
> +g ()
> +{
> +  [[maybe_unused]] auto u = foo ();
> +}
> 
> base-commit: 4b09e2c67ef593db171b0755b46378964421782b
> -- 
> 2.47.0
> 

Marek



Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]

2024-12-20 Thread Marek Polacek
Ping.

On Tue, Nov 26, 2024 at 05:35:50PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> As the manual states, using "-fhardened -fstack-protector" will produce
> a warning because -fhardened wants to enable -fstack-protector-strong,
> but it can't since it's been overriden by the weaker -fstack-protector.
> 
> -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> produce the same warning.  But we don't detect this combination, so
> this patch fixes it.  I also renamed a variable to better reflect its
> purpose.
> 
> Also don't check warn_hardened in process_command, since it's always
> true there.
> 
> Also tweak wording in the manual as Jon Wakely suggested on IRC.
> 
>   PR driver/117739
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi: Tweak wording for -Whardened.
>   * gcc.cc (driver_handle_option): If -z lazy or -z norelro was
>   specified, don't enable linker hardening.
>   (process_command): Don't check warn_hardened.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/fhardened-16.c: New test.
>   * c-c++-common/fhardened-17.c: New test.
>   * c-c++-common/fhardened-18.c: New test.
>   * c-c++-common/fhardened-19.c: New test.
>   * c-c++-common/fhardened-20.c: New test.
>   * c-c++-common/fhardened-21.c: New test.
> ---
>  gcc/doc/invoke.texi   |  4 ++--
>  gcc/gcc.cc| 20 ++--
>  gcc/testsuite/c-c++-common/fhardened-16.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-17.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-18.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-19.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-20.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-21.c |  5 +
>  8 files changed, 46 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-16.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-17.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-18.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-19.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-20.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-21.c
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 346ac1369b8..371f723539c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7012,8 +7012,8 @@ This warning is enabled by @option{-Wall}.
>  Warn when @option{-fhardened} did not enable an option from its set (for
>  which see @option{-fhardened}).  For instance, using @option{-fhardened}
>  and @option{-fstack-protector} at the same time on the command line causes
> -@option{-Whardened} to warn because @option{-fstack-protector-strong} is
> -not enabled by @option{-fhardened}.
> +@option{-Whardened} to warn because @option{-fstack-protector-strong} will
> +not be enabled by @option{-fhardened}.
>  
>  This warning is enabled by default and has effect only when 
> @option{-fhardened}
>  is enabled.
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 92c92996401..d2718d263bb 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -305,9 +305,10 @@ static size_t dumpdir_length = 0;
> driver added to dumpdir after dumpbase or linker output name.  */
>  static bool dumpdir_trailing_dash_added = false;
>  
> -/* True if -r, -shared, -pie, or -no-pie were specified on the command
> -   line.  */
> -static bool any_link_options_p;
> +/* True if -r, -shared, -pie, -no-pie, -z lazy, or -z norelro were
> +   specified on the command line, and therefore -fhardened should not
> +   add -z now/relro.  */
> +static bool avoid_linker_hardening_p;
>  
>  /* True if -static was specified on the command line.  */
>  static bool static_p;
> @@ -4434,10 +4435,17 @@ driver_handle_option (struct gcc_options *opts,
>   }
>   /* Record the part after the last comma.  */
>   add_infile (arg + prev, "*");
> + if (strcmp (arg, "-z,lazy") == 0 || strcmp (arg, "-z,norelro") == 0)
> +   avoid_linker_hardening_p = true;
>}
>do_save = false;
>break;
>  
> +case OPT_z:
> +  if (strcmp (arg, "lazy") == 0 || strcmp (arg, "norelro") == 0)
> + avoid_linker_hardening_p = true;
> +  break;
> +
>  case OPT_Xlinker:
>add_infile (arg, "*");
>do_save = false;
> @@ -4642,7 +4650,7 @@ driver_handle_option (struct gcc_options *opts,
>  case OPT_r:
>  case OPT_shared:
>  case OPT_no_pie:
> -  any_link_options_p = true;
> +  avoid_linker_hardening_p = true;
>break;
>  
>  case OPT_static:
> @@ -5026,7 +5034,7 @@ process_command (unsigned int decoded_options_count,
>/* TODO: check if -static -pie works and maybe use it.  */
>if (flag_hardened)
>  {
> -  if (!any_link_options_p && !static_p)
> +  if (!avoid_linker_hardening_p && !static_p)
>   {
>

Re: [PATCH] Introduce -flto-partition=locality

2024-12-20 Thread Kyrylo Tkachov
Ping.
Thanks,
Kyrill

> On 13 Dec 2024, at 16:47, Kyrylo Tkachov  wrote:
> 
> Ping.
> Thanks,
> Kyrill
> 
>> On 28 Nov 2024, at 11:22, Kyrylo Tkachov  wrote:
>> 
>> Ping.
>> 
>>> On 15 Nov 2024, at 17:04, Kyrylo Tkachov  wrote:
>>> 
>>> Hi all,
>>> 
>>> This is a patch submission following-up from the RFC at:
>>> https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
>>> The patch is rebased and retested against current trunk, some debugging code
>>> removed, comments improved and some fixes added as I've we've done more
>>> testing.
>>> 
>>> >8-
>>> Implement partitioning and cloning in the callgraph to help locality.
>>> A new -flto-partition=locality flag is used to enable this.
>>> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
>>> The optimization has two components:
>>> * Partitioning the callgraph so as to group callers and callees that 
>>> frequently
>>> call each other in the same partition
>>> * Cloning functions that straddle multiple callchains and allowing each 
>>> clone
>>> to be local to the partition of its callchain.
>>> 
>>> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
>>> It creates a partitioning plan and does the prerequisite cloning.
>>> The partitioning is then implemented during the existing LTO partitioning 
>>> pass.
>>> 
>>> To guide these locality heuristics we use PGO data.
>>> In the absence of PGO data we use a static heuristic that uses the 
>>> accumulated
>>> estimated edge frequencies of the callees for each function to guide the
>>> reordering.
>>> We are investigating some more elaborate static heuristics, in particular 
>>> using
>>> the demangled C++ names to group template instantiatios together.
>>> This is promising but we are working out some kinks in the implementation
>>> currently and want to send that out as a follow-up once we're more confident
>>> in it.
>>> 
>>> A new bootstrap-lto-locality bootstrap config is added that allows us to 
>>> test
>>> this on GCC itself with either static or PGO heuristics.
>>> GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
>>> 
>>> With this optimization we are seeing good performance gains on some large
>>> internal workloads that stress the parts of the processor that is sensitive
>>> to code locality, but we'd appreciate wider performance evaluation.
>>> 
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> Ok for mainline?
>>> Thanks,
>>> Kyrill
>>> 
>>> Signed-off-by: Prachi Godbole 
>>> Co-authored-by: Kyrylo Tkachov 
>>> 
>>>  config/ChangeLog:
>>>   * bootstrap-lto-locality.mk: New file.
>>> 
>>>   gcc/ChangeLog:
>>>  * Makefile.in (OBJS): Add ipa-locality-cloning.o
>>>  (GTFILES): Add ipa-locality-cloning.cc dependency.
>>>  * common.opt (lto_partition_model): Add locality value.
>>>  * flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY 
>>> value.
>>>  (enum lto_locality_cloning_model): Define.
>>>  * lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping 
>>> of node
>>>  and index.
>>>  * params.opt (lto_locality_cloning_model): New enum.
>>>  (lto-partition-locality-cloning): New param.
>>>  (lto-partition-locality-frequency-cutoff): Likewise.
>>>  (lto-partition-locality-size-cutoff): Likewise.
>>>  (lto-max-locality-partition): Likewise.
>>>  * passes.def: Add pass_ipa_locality_cloning.
>>>  * timevar.def (TV_IPA_LC): New timevar.
>>>  * tree-pass.h (make_pass_ipa_locality_cloning): Declare.
>>>  * ipa-locality-cloning.cc: New file.
>>>  * ipa-locality-cloning.h: New file.
>>> 
>>>gcc/lto/ChangeLog:
>>>   * lto-partition.cc: Include ipa-locality-cloning.h
>>>  (add_node_references_to_partition): Define.
>>>  (create_partition): Likewise.
>>>  (lto_locality_map): Likewise.
>>>  (lto_promote_cross_file_statics): Add extra dumping.
>>>  * lto-partition.h (lto_locality_map): Declare.
>>>  * lto.cc (do_whole_program_analysis): Handle 
>>> LTO_PARTITION_LOCALITY.
>>> 
>>> <0001-Introduce-flto-partition-locality.patch>
>> 
> 



Re: [PATCH] c++: bogus error with nested lambdas [PR117602]

2024-12-20 Thread Marek Polacek
Ping.

On Fri, Nov 15, 2024 at 09:08:18AM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> The error here should also check that we aren't nested in another
> lambda; in it, at_function_scope_p() will be false.
> 
>   PR c++/117602
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_lambda_introducer): Check if we're in a lambda
>   before emitting the error about a non-local lambda with
>   a capture-default.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/lambda-uneval19.C: New test.
> ---
>  gcc/cp/parser.cc |  5 -
>  gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C | 14 ++
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 07b12224615..dc79ff42a3b 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -11611,7 +11611,10 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
> lambda_expr)
>cp_lexer_consume_token (parser->lexer);
>first = false;
>  
> -  if (!(at_function_scope_p () || parsing_nsdmi ()))
> +  if (!(at_function_scope_p ()
> + || parsing_nsdmi ()
> + || (current_class_type
> + && LAMBDA_TYPE_P (current_class_type
>   error ("non-local lambda expression cannot have a capture-default");
>  }
>  
> diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C 
> b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C
> new file mode 100644
> index 000..79390180811
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C
> @@ -0,0 +1,14 @@
> +// PR c++/117602
> +// { dg-do compile { target c++20 } }
> +
> +void
> +g ()
> +{
> +  [&](decltype([&](decltype([=](decltype([&](decltype([&]() {})) {})) {})) 
> {})){};
> +  [&](decltype([&](decltype([&](decltype([&](decltype([&]() {})) {})) {})) 
> {})){};
> +  [=](decltype([=](decltype([=](decltype([=](decltype([&]() {})) {})) {})) 
> {})){};
> +
> +  [=]() {
> +[&]() { };
> +  };
> +}
> 
> base-commit: 96a468842ef8b5d9b971428c7ba4e14fdab5ea94
> -- 
> 2.47.0
> 

Marek



Re: [PATCH] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
On Fri, Dec 20, 2024 at 3:38 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 20, 2024 at 02:55:51PM +0100, Christoph Müllner wrote:
> > A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
> > This patch adds two test cases for PR118149.
> > These tests are different than other tests in that one of the
> > vec-perm selectors contains indices in descending order (1, 1, 0, 0),
> > which is the root cause for the ICE observed in PR118149.
> >
> >   PR 118149
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/pr118149-2.c: New test.
> >   * gcc.dg/tree-ssa/pr118149.c: New test.
> >
> > Signed-off-by: Christoph Müllner 
>
> Have you actually tested it with say
> make check-gcc
> RUNTESTFLAGS="--target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse} 
> tree-ssa.exp='pr118149* vector-11.c'"
> on x86_64?  While the -2.c possibly could work, I don't see how the last
> testcase could on i686-linux (i.e. that -m32 -mno-mmx -mno-sse).
> So I think it needs
> /* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
> Or so.

No, I haven't tested like this (only with defaults on x86-64 and aarch64).
I've seen that you added these flags to vector-7, vector-8 and
vector-9 recently.
I copied the DG lines from vector-10 which does not blend and therefore does not
need this line. Btw, vector-11 also needs this DG line.

I will do the following
* Add these options to vector-11.c, pr118149.c and pr118149-2.c
* Retest with the mentioned RUNTESTFLAGS
* Send a patch to fix existing vector-11 patch
* Send a v2 of this patch

> Also, why are you using dg-additional-options in tree-ssa/ ?  I think the
> default there is just -pedantic-errors which you don't really need for these
> tests, so just dg-options?

I will change accordingly for all vector-* tests and satd-hadamard.c
and include that in the patch that fixes vector-11.c.

Thanks!

>
> >  gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c | 36 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr118149.c   | 19 
> >  2 files changed, 55 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> > new file mode 100644
> > index 000..b6ceea8fdb7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" 
> > } */
> > +
> > +typedef int vec __attribute__((vector_size (4 * sizeof (float;
> > +
> > +void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
> > +{
> > +  vec sel00 = { 1, 1, 3, 3 };
> > +  vec sel01 = { 0, 0, 2, 2 };
> > +  vec sel10 = { 3, 3, 2, 2 };
> > +  vec sel11 = { 1, 1, 0, 0 };
> > +  vec sel = { 0, 1, 6, 7 };
> > +  vec v_1, v_2, v_x, v_y, v_out_1, v_out_2;
> > +  vec v_in = *p_v_in;
> > +
> > +  /* First vec perm sequence.  */
> > +  v_1 = __builtin_shuffle (v_in, v_in, sel00);
> > +  v_2 = __builtin_shuffle (v_in, v_in, sel01);
> > +  v_x = v_2 - v_1;
> > +  v_y = v_1 + v_2;
> > +  v_out_1 = __builtin_shuffle (v_y, v_x, sel);
> > +
> > +  /* Second vec perm sequence.  */
> > +  v_1 = __builtin_shuffle (v_in, v_in, sel10);
> > +  v_2 = __builtin_shuffle (v_in, v_in, sel11);
> > +  v_x = v_2 - v_1;
> > +  v_y = v_1 + v_2;
> > +  v_out_2 = __builtin_shuffle (v_y, v_x, sel);
> > +
> > +  *p_v_out_1 = v_out_1;
> > +  *p_v_out_2 = v_out_2;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> > blended" "forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> > +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" 
> > { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> > +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" 
> > { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
> > new file mode 100644
> > index 000..a87463dfa07
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" 
> > } */
> > +
> > +float *fastconv_parse_dst;
> > +
> > +void fastconv_parse ()
> > +{
> > +  float r3k = fastconv_parse_dst[1] - fastconv_parse_dst[3],
> > +i0k = fastconv_parse_dst[4] + fastconv_parse_dst[6],
> > +i1k = fastconv_parse_dst[4] - fastconv_parse_dst[6],
> > +i2k = fastconv_parse_dst[5] + fastconv_parse_dst[7];
> > +  fastconv_parse_dst[1] = fastconv_parse_dst[0];
> > +  fastconv_parse_dst[4] = fastconv_parse_dst[5] = i0k - i2k;
> > +  fastconv_parse_dst[6] = fastconv_parse_dst[7] = i1k + r3k;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> > ble

[PATCH] c++: Allow pragmas in NSDMIs [PR118147]

2024-12-20 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk and
maybe release branches?

-- >8 --

This patch removes the (unnecessary) CPP_PRAGMA_EOL case from
cp_parser_cache_defarg, which currently has the result that any pragmas
in the NSDMI cause an error.

PR c++/118147

gcc/cp/ChangeLog:

* parser.cc (cp_parser_cache_defarg): Don't error when
CPP_PRAGMA_EOL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-defer7.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/parser.cc  |  1 -
 gcc/testsuite/g++.dg/cpp0x/nsdmi-defer7.C | 13 +
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-defer7.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 23c6a2fd30e..527a858ea52 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -36316,7 +36316,6 @@ cp_parser_cache_defarg (cp_parser *parser, bool nsdmi)
 
  /* If we run out of tokens, issue an error message.  */
case CPP_EOF:
-   case CPP_PRAGMA_EOL:
  error_at (token->location, "file ends in default argument");
  return error_mark_node;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-defer7.C 
b/gcc/testsuite/g++.dg/cpp0x/nsdmi-defer7.C
new file mode 100644
index 000..3bef636ccbd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-defer7.C
@@ -0,0 +1,13 @@
+// PR c++/118147
+// { dg-do compile { target c++11 } }
+
+struct F {
+  int i = []{
+#pragma message "test"  // { dg-message "test" }
+return 1;
+  }();
+};
+
+struct G {
+  int i =
+#pragma GCC diagnostic push  // { dg-error "file ends in default 
argument|expected" }
-- 
2.47.0



Re: [PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-12-20 Thread Richard Sandiford
"Robin Dapp"  writes:
>> "Robin Dapp"  writes:
>>> But here I intended to just change the encoded value in memory and to
>>> prevent it from aliasing with other encoded values.
>>> In an actual instruction the values we "padded" (or that are beyond the
>>> vector length) are ignored.  For the purpose of hashing a mask of
>>> "1010" and "10101010" should be distinct which they are not right now
>>> because we don't stop at the length so to say.
>>
>> But what modes do "1010" and "10101010" represent here?  And which code
>> treats them as equal?  (It should be ok if the hashes are equal, as long
>> as the equality function distinguishes them.)
>
> I realize I didn't fully detail that in the original post, sorry :/
>
> In my test case we have two functions one with the first mask ("1010" for
> V4BI) and another one performing a similar operation just for
> V8BI ("10101010").
>
> In optimize_constant_pool we use native_encode_rtx to encode the value
> into a buffer (of size GET_MODE_SIZE which is one byte here).
> IIRC the whole function works at byte granularity.
>
> htab->find_slot_with_hash
> then finds the same hash for the V4BI and V8BI masks, and we wrongly unify it.
Ah, OK, thanks!  I don't remember seeing that code before.

In some ways, merging the entries seems like a nice feature... if it worked
correctly.

But optimize_constant_pool does mean that native_encode_rtx and
output_constant_pool_2 must agree on what the byte image looks like,
even for padding bits.  I wonder if we should instead get rid of:

case MODE_VECTOR_BOOL:
  {
gcc_assert (GET_CODE (x) == CONST_VECTOR);

/* Pick the smallest integer mode that contains at least one
   whole element.  Often this is byte_mode and contains more
   than one element.  */
unsigned int nelts = GET_MODE_NUNITS (mode);
unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));

/* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but
   only properly handle cases where the difference is less than a
   byte.  */
gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode) <
BITS_PER_UNIT);

/* Build the constant up one integer at a time.  */
unsigned int elts_per_int = int_bits / elt_bits;
for (unsigned int i = 0; i < nelts; i += elts_per_int)
  {
unsigned HOST_WIDE_INT value = 0;
unsigned int limit = MIN (nelts - i, elts_per_int);
for (unsigned int j = 0; j < limit; ++j)
{
  auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
  value |= (elt & mask) << (j * elt_bits);
}
output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
i != 0 ? MIN (align, int_bits) : align);
  }
break;
  }

from output_constant_pool_2 and make it defer to native_encode_rtx
instead.  That seems like the most direct way of avoiding mishaps.

E.g. another way in which different routines could make different choices
is in whether, for SVE's VNx8BI (say), they fill the upper bit of each
2-bit element with 0s or with a copy of the low bit.  Both choices are
valid in principle, and sharing the same code between both routines
would make sure that they make the same choice.

Thanks,
Richard


Re: [PATCH] testsuite: Add tests for PR118149

2024-12-20 Thread Jakub Jelinek
On Fri, Dec 20, 2024 at 03:56:41PM +0100, Christoph Müllner wrote:
> > Also, why are you using dg-additional-options in tree-ssa/ ?  I think the
> > default there is just -pedantic-errors which you don't really need for these
> > tests, so just dg-options?
> 
> I will change accordingly for all vector-* tests and satd-hadamard.c
> and include that in the patch that fixes vector-11.c.

Note, dg-additional-options is desirable e.g. in */vect/* tests, because
there the defaults include tons of options needed to vectorize on various
arches and the vect_int etc. effective targets rely on those.
Similarly to other parts of the testsuite which have some extra default
options which are needed.  Just gcc.dg/* or gcc.dg/tree-ssa/* isn't it.

Jakub



Re: [PATCH] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
On Fri, Dec 20, 2024 at 4:07 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 20, 2024 at 03:56:41PM +0100, Christoph Müllner wrote:
> > > Also, why are you using dg-additional-options in tree-ssa/ ?  I think the
> > > default there is just -pedantic-errors which you don't really need for 
> > > these
> > > tests, so just dg-options?
> >
> > I will change accordingly for all vector-* tests and satd-hadamard.c
> > and include that in the patch that fixes vector-11.c.
>
> Note, dg-additional-options is desirable e.g. in */vect/* tests, because
> there the defaults include tons of options needed to vectorize on various
> arches and the vect_int etc. effective targets rely on those.
> Similarly to other parts of the testsuite which have some extra default
> options which are needed.  Just gcc.dg/* or gcc.dg/tree-ssa/* isn't it.

Ok, understood.
The vector-*.c tests that I was referring to are all in gcc/dg/tree-ssa.

>
> Jakub
>


[PATCH 2/2] testsuite: tree-ssa: Fix i686/-m32 fails for vector-*.c tests

2024-12-20 Thread Christoph Müllner
FAILs have been reported for several tree-ssa vector-*.c tests
on i686-linux or on x86_64-linux with -m32.
This patch addresses these fails by setting the necessary -msse2 flags.

This patch also streamlines all tests to use dg-options instead
of dg-additional-options.  This is in line with most other tests
in gcc.dg/tree-ssa.

Tested with the following board config in RUNTESTFLAGS:
  --target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse}

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/satd-hadamard.c: Rename dg-additional-options
to dg-options.
* gcc.dg/tree-ssa/vector-10.c: Rename dg-additional-options
to dg-options and add -msse2 to it.
* gcc.dg/tree-ssa/vector-11.c: Likewise.
* gcc.dg/tree-ssa/vector-8.c: Rename dg-additional-options
to dg-options.
* gcc.dg/tree-ssa/vector-9.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/vector-10.c | 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/vector-11.c | 6 ++
 gcc/testsuite/gcc.dg/tree-ssa/vector-8.c  | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/vector-9.c  | 4 ++--
 5 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c 
b/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
index 6042378f165..3e562dabe9f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop4-details" } */
-/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3 -fdump-tree-forwprop4-details" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c
index bb1ed92dc90..30561444b66 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-10.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
 
@@ -105,7 +106,7 @@ void f4 (vec *p_v_in_1, vec *p_v_out_1, vec *p_v_out_2)
   v_y = v_1 - v_2;
   v_out_1 = __builtin_shuffle (v_x, v_y, sel);
 
-  /* Won't merge because of dependency.  */
+  /* Won't blend because of dependency.  */
   v_in_2 = foo (v_out_1);
 
   /* Second vec perm sequence.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
index e4102d318d2..4791d7e58c4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-11.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
 
@@ -27,9 +28,6 @@ void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
   v_y = v_1 + v_2;
   v_out_2 = __builtin_shuffle (v_y, v_x, sel);
 
-  /* Won't blend because the narrowed sequence
- utilizes three of the four lanes.  */
-
   *p_v_out_1 = v_out_1;
   *p_v_out_2 = v_out_2;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
index ba9a0187c10..2fb97b5aded 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
-/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
index 1aa2ef99c3c..ce69dc8f6ba 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
-/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int;
 
-- 
2.47.1



[PATCH 1/2] testsuite: Add tests for PR118149

2024-12-20 Thread Christoph Müllner
A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
This patch adds two test cases for PR118149.
These tests are different than other tests in that one of the
vec-perm selectors contains indices in descending order (1, 1, 0, 0),
which is the root cause for the ICE observed in PR118149.

PR 118149

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr118149-2.c: New test.
* gcc.dg/tree-ssa/pr118149.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c | 37 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c   | 20 
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118149.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
new file mode 100644
index 000..2c86dccb145
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+
+typedef int vec __attribute__((vector_size (4 * sizeof (float;
+
+void f1 (vec *p_v_in, vec *p_v_out_1, vec *p_v_out_2)
+{
+  vec sel00 = { 1, 1, 3, 3 };
+  vec sel01 = { 0, 0, 2, 2 };
+  vec sel10 = { 3, 3, 2, 2 };
+  vec sel11 = { 1, 1, 0, 0 };
+  vec sel = { 0, 1, 6, 7 };
+  vec v_1, v_2, v_x, v_y, v_out_1, v_out_2;
+  vec v_in = *p_v_in;
+
+  /* First vec perm sequence.  */
+  v_1 = __builtin_shuffle (v_in, v_in, sel00);
+  v_2 = __builtin_shuffle (v_in, v_in, sel01);
+  v_x = v_2 - v_1;
+  v_y = v_1 + v_2;
+  v_out_1 = __builtin_shuffle (v_y, v_x, sel);
+
+  /* Second vec perm sequence.  */
+  v_1 = __builtin_shuffle (v_in, v_in, sel10);
+  v_2 = __builtin_shuffle (v_in, v_in, sel11);
+  v_x = v_2 - v_1;
+  v_y = v_1 + v_2;
+  v_out_2 = __builtin_shuffle (v_y, v_x, sel);
+
+  *p_v_out_1 = v_out_1;
+  *p_v_out_2 = v_out_2;
+}
+
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
new file mode 100644
index 000..1507dea133a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } */
+/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+
+float *fastconv_parse_dst;
+
+void fastconv_parse ()
+{
+  float r3k = fastconv_parse_dst[1] - fastconv_parse_dst[3],
+i0k = fastconv_parse_dst[4] + fastconv_parse_dst[6],
+i1k = fastconv_parse_dst[4] - fastconv_parse_dst[6],
+i2k = fastconv_parse_dst[5] + fastconv_parse_dst[7];
+  fastconv_parse_dst[1] = fastconv_parse_dst[0];
+  fastconv_parse_dst[4] = fastconv_parse_dst[5] = i0k - i2k;
+  fastconv_parse_dst[6] = fastconv_parse_dst[7] = i1k + r3k;
+}
+
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 0, 6, 6 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 7, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
-- 
2.47.1



Re: [PATCH 1/2] testsuite: Add tests for PR118149

2024-12-20 Thread Jakub Jelinek
On Fri, Dec 20, 2024 at 04:22:19PM +0100, Christoph Müllner wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118149-2.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
> +/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */

This line should be dg-additional-options, otherwise the last one overrides
the previous one.

> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-forwprop4-details -Wno-psabi" } */
> +/* { dg-options "-msse2" { target i?86-*-* x86_64-*-* } } */

Ditto.

Jakub



[committed] i386: Disable SImode/DImode moves from/to mask regs without avx512bw [PR118067]

2024-12-20 Thread Uros Bizjak
SImode and DImode moves from/to mask registers are valid only with AVX512BW,
so mark relevant alternatives in *movsi_internal and *movdi_internal as such.

Even with the patch, the testcase still fails, but now with:

pr118067.c: In function ‘foo’:
pr118067.c:13:1: internal compiler error: maximum number of generated
reload insns per insn achieved (90)
   13 | }
  | ^
0x2c3b581 internal_error(char const*, ...)
../../git/gcc/gcc/diagnostic-global-context.cc:517
0xb68938 lra_constraints(bool)
../../git/gcc/gcc/lra-constraints.cc:5411
0xb51a0d lra(_IO_FILE*, int)
../../git/gcc/gcc/lra.cc:2449
0xaf9f4d do_reload
../../git/gcc/gcc/ira.cc:5977
0xafa462 execute
../../git/gcc/gcc/ira.cc:6165

PR target/118067

gcc/ChangeLog:

* config/i386/i386.md (*movdi_internal):
Disable alternatives from/to mask registers without AVX512BW.
(*movsi_internal): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6edcb6dc657..11815c59c53 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2642,12 +2642,16 @@ (define_insn "*movdi_internal"
   [(set (attr "isa")
  (cond [(eq_attr "alternative" "0,1,17,18")
  (const_string "nox64")
-   (eq_attr "alternative" "2,3,4,5,10,11,23,25")
+   (eq_attr "alternative" "2,3,4,5,10,11")
  (const_string "x64")
(eq_attr "alternative" "19,20")
  (const_string "x64_sse2")
+   (eq_attr "alternative" "23,25")
+ (const_string "x64_avx512bw")
(eq_attr "alternative" "21,22")
  (const_string "sse2")
+   (eq_attr "alternative" "24,26,27")
+ (const_string "avx512bw")
   ]
   (const_string "*")))
(set (attr "type")
@@ -2870,6 +2874,8 @@ (define_insn "*movsi_internal"
   [(set (attr "isa")
  (cond [(eq_attr "alternative" "12,13")
  (const_string "sse2")
+   (eq_attr "alternative" "14,15,16,17")
+ (const_string "avx512bw")
   ]
   (const_string "*")))
(set (attr "type")


Re: [PATCH] Add RISC-V/rv64gc as a secondary platform

2024-12-20 Thread Palmer Dabbelt

On Fri, 20 Dec 2024 12:54:54 PST (-0800), jeffreya...@gmail.com wrote:



On 12/20/24 9:48 AM, Palmer Dabbelt wrote:

On Thu, 19 Dec 2024 15:01:26 PST (-0800), jeffreya...@gmail.com wrote:



On 12/19/24 3:08 PM, Palmer Dabbelt wrote:


I agree lacking B and V makes us very clearly uncompetitive in the space
where these sort of things matter (ie, binary compatible distros and
long term stability type things) -- the gap is just too big to close by
doing clever things in the hardware.  Maybe just B and V isn't enough,
it's hard to tell, but lacking them seems pretty clearly uncompetitive.

I'm not sure B is so scary on the SW side of things, it's been mostly
performance issues we've been fixing.  V is huge, though, and we've
generally found a bunch of V-related functional codegen bugs.  Without
reliable hardware to test against (and do distro builds and such) it
just seems premature to declare that being as stable as the other ports
on the list.

It wouldn't take much to push me into agreeing to B -- it's not scary in
any way.  There's just notable systems out there that don't implement B,
but I wouldn't mind leaving them behind for this change.

V has real performance concerns.  I haven't tested it performance-wise
on the BPI recently, but when I last did the general rule of thumb was
the more vector you did, the worse it performed *especially* for FP.


Yep.  TBH that's actually my biggest worry here: it might be that just V
isn't enough, which means we have another few generations of HW before
all the V add-on extensions coalesce into something that's widely
implemented.  I'm not really sure there, though -- the HW guys can be
pretty clever so maybe V gets us close enough to be interesting.

Well, I think Robin and I would argue that V can be made to perform.  We
see that with our design.  But that's because Robin & others have been
focused on the performance for months.  The gains that have been made
have been significant and have brought the design & codegen to a point
where it performs largely at expectation.  But it's also the case that
our design is not generally available


That's the problem with this HW stuff, the timelines are just so long to 
get anything out and visible that it's hard to tell what the actual 
state of the world is.



I don't doubt significant gains could be made that would make V behave
better on widely available designs.  They don't have the option of uarch
adjustments, so the upside is limited by that.  But with a good
understanding of the uarch and good PMU data significant gains almost
certainly could be made.


Yep, and I think all of the designs that went through without SW 
feedback are going to need a lot of work to get even reasonable 
performance out of.  I don't think there's really anything V (or RISC-V) 
specific there, it's just how that design methodology ends up, we're 
just seeing it on the V side of things because that's the first 
complicated part of RISC-V.



Point being I do think V is viable and will continue to improve as the
feedback loops become better established.


That'd be great, certainly way better than having to wait for another 
round of ISA design.  Hopefully I'm just seeing more of the results of 
the wrong way of doing things and that's overly coloring things, it's my 
first round of this so I don't really know how it all fits together.


I guess we'll just have to wait and see when the HW starts to show up ;)


jeff


Re: [PATCH v2] libstdc++: fix a dangling reference crash in ranges::is_permutation

2024-12-20 Thread Patrick Palka
On Fri, 20 Dec 2024, Giuseppe D'Angelo wrote:

> Hello,
> 
> On 20/12/2024 13:23, Giuseppe D'Angelo wrote:
> > Hi,
> > 
> > The implementation of ranges::is_permutation may create a dangling
> > reference, which then results (sometimes) in a crash. A minimal example
> > that shows the problem under ASAN is https://gcc.godbolt.org/z/7bP9nE8fK

D'oh, good catch..

> > 
> > The attached patch fixes it. I've tested on Linux x86-64. Adding a
> > negative test for this is somehow challenging (how do you test you're
> > not using a dangling reference?), but running the modified test under
> > ASAN shows the fix in place.

I'd expect a constexpr version of the test to reliably fail as soon as
it encounters the UB.

> > 
> > Do you need me to create a report on bugzilla and cross-reference it
> > from the patch?

That'd be good since we'll probably want to backport the fix to the
release branches and the PR could reflect that.

> Better patch attached.
> 
> Thanks,
> -- 
> Giuseppe D'Angelo
> 

> Subject: [PATCH] libstdc++: fix a dangling reference crash in
>  ranges::is_permutation
> 
> The code was caching the result of `invoke(proj, *it)` in a local
> `auto &&` variable. The problem is that this may create dangling
> references, for instance in case `proj` is `std::identity` (the common
> case) and `*it` produces a prvalue.
> 
> Rather than finding a "creative" solution (e.g. use the value category
> of `*it`, and the return type of calling the projection on that, to
> determine whether we can keep a reference or we need a value), get rid
> of the caching and call `invoke` as needed. In the common case the
> projection is cheap, and we are allowed to dereference the same iterator
> more than once (they're forward). This also sounds more correct because
> we pass the correct value category (obtained from applying the
> projection to the iterator) to the comparator.

Makes sense, but it might be preferable for sake of QoI to fix this
without introducing extra dereferences or projection applications
if possible.

auto&& is supposed to perform lifetime extension, but that only happens
for an outermost temporary and not any temporaries within a
subexpression IIUC.  So how about if we use a second auto&& for the
*__scan subexpression so that lifetime extension occurs there?

> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/ranges_algo.h (__is_permutation_fn): Do not cache
>   the projection result in a local variable, as that may create
>   dangling references.
>   * testsuite/25_algorithms/is_permutation/constrained.cc: Add a
>   test with a range returning prvalues.
> 
> Signed-off-by: Giuseppe D'Angelo 
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   |  3 +--
>  .../25_algorithms/is_permutation/constrained.cc   | 11 +++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index 772bf4dd997..4d3c4325e2c 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -567,9 +567,8 @@ namespace ranges
>  
>   for (auto __scan = __first1; __scan != __last1; ++__scan)
> {
> - auto&& __proj_scan = std::__invoke(__proj1, *__scan);
>   auto __comp_scan = [&]  (_Tp&& __arg) -> bool {
> -   return std::__invoke(__pred, __proj_scan,

If we go with the second auto&& approach then we should perfect forward
__proj_scan here as you alluded to.  That might seem unsafe at first
glance (if say the project returns an rvalue) but since the predicate is
regular_invocable it mustn't modify its arguments IIUC.

> +   return std::__invoke(__pred, std::__invoke(__proj1, *__scan),
>  std::forward<_Tp>(__arg));
>   };
>   if (__scan != ranges::find_if(__first1, __scan,
> diff --git 
> a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc 
> b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
> index 2fbebe37609..4981d2c07ce 100644
> --- a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
> +++ b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
> @@ -19,6 +19,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -76,10 +77,20 @@ test03()
>while (std::next_permutation(std::begin(cx), std::end(cx)));
>  }
>  
> +void
> +test04()
> +{
> +  int x[] = { 4, 3, 2, 1 };
> +  auto y = std::views::iota(1, 5);
> +  VERIFY( ranges::is_permutation(x, y) );
> +  VERIFY( ranges::is_permutation(y, x) );
> +}
> +
>  int
>  main()
>  {
>test01();
>test02();
>test03();
> +  test04();
>  }
> -- 
> 2.34.1
> 



[PATCH] RISC-V: vector absolute difference expander [PR117722]

2024-12-20 Thread Vineet Gupta
This improves codegen for x264 sum of absolute difference routines.
The insn count is same, but we avoid double widening ops and ensuing
whole register moves.

Also for more general applicability, we chose to implement abs diff
vs. the sum of abs diff variant.

Suggested-by: Robin Dapp 
Co-developed-by: Pan Li 
Signed-off-by: Vineet Gupta 

PR target/117722

gcc/ChangeLog:
* config/riscv/autovec.md: Add uabd expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr117722.c: New test.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/autovec.md   | 26 +++
 .../gcc.target/riscv/rvv/autovec/pr117722.c   | 23 
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2529dc77f221..4678906fb918 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2928,3 +2928,29 @@
 riscv_vector::expand_strided_store (mode, operands);
 DONE;
   })
+
+; 
+; == Absolute difference (not including sum)
+; 
+(define_expand "uabd3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+rtx max = gen_reg_rtx (mode);
+insn_code icode = code_for_pred (UMAX, mode);
+rtx ops1[] = {max, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
+
+rtx min = gen_reg_rtx (mode);
+icode = code_for_pred (UMIN, mode);
+rtx ops2[] = {min, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
+
+icode = code_for_pred (MINUS, mode);
+rtx ops3[] = {operands[0], max, min};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
+
+DONE;
+  });
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
new file mode 100644
index ..c633f31ef25b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O2" } */
+
+/* Generate sum of absolute difference as sub (max, min).
+   This helps with x264 sad routines.  */
+
+inline int abs(int i)
+{
+  return (i < 0 ? -i : i);
+}
+
+int pixel_sad_n(unsigned char *pix1, unsigned char *pix2, int n)
+{
+  int sum = 0;
+  for( int i = 0; i < n; i++ )
+   sum += abs(pix1[x] - pix2[x]);
+
+  return sum;
+}
+
+/* { dg-final { scan-assembler {vmin\.v} } } */
+/* { dg-final { scan-assembler {vmax\.v} } } */
+/* { dg-final { scan-assembler {vsub\.v} } } */
-- 
2.43.0



[PATCH v2] RISC-V: vector absolute difference expander [PR117722]

2024-12-20 Thread Vineet Gupta
Changes since v1:
 - Fix snafu in test.

This improves codegen for x264 sum of absolute difference routines.
The insn count is same, but we avoid double widening ops and ensuing
whole register moves.

Also for more general applicability, we chose to implement abs diff
vs. the sum of abs diff variant.

Suggested-by: Robin Dapp 
Co-developed-by: Pan Li 
Signed-off-by: Vineet Gupta 

PR target/117722

gcc/ChangeLog:
* config/riscv/autovec.md: Add uabd expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr117722.c: New test.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/autovec.md   | 26 +++
 .../gcc.target/riscv/rvv/autovec/pr117722.c   | 23 
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2529dc77f221..4678906fb918 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2928,3 +2928,29 @@
 riscv_vector::expand_strided_store (mode, operands);
 DONE;
   })
+
+; 
+; == Absolute difference (not including sum)
+; 
+(define_expand "uabd3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+rtx max = gen_reg_rtx (mode);
+insn_code icode = code_for_pred (UMAX, mode);
+rtx ops1[] = {max, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
+
+rtx min = gen_reg_rtx (mode);
+icode = code_for_pred (UMIN, mode);
+rtx ops2[] = {min, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
+
+icode = code_for_pred (MINUS, mode);
+rtx ops3[] = {operands[0], max, min};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
+
+DONE;
+  });
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
new file mode 100644
index ..b675930818e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O2" } */
+
+/* Generate sum of absolute difference as sub (max, min).
+   This helps with x264 sad routines.  */
+
+inline int abs(int i)
+{
+  return (i < 0 ? -i : i);
+}
+
+int pixel_sad_n(unsigned char *pix1, unsigned char *pix2, int n)
+{
+  int sum = 0;
+  for( int i = 0; i < n; i++ )
+   sum += abs(pix1[i] - pix2[i]);
+
+  return sum;
+}
+
+/* { dg-final { scan-assembler {vmin\.v} } } */
+/* { dg-final { scan-assembler {vmax\.v} } } */
+/* { dg-final { scan-assembler {vsub\.v} } } */
-- 
2.43.0



Re: [PATCH v2] libstdc++: fix a dangling reference crash in ranges::is_permutation

2024-12-20 Thread Patrick Palka
On Fri, 20 Dec 2024, Patrick Palka wrote:

> On Fri, 20 Dec 2024, Giuseppe D'Angelo wrote:
> 
> > Hello,
> > 
> > On 20/12/2024 13:23, Giuseppe D'Angelo wrote:
> > > Hi,
> > > 
> > > The implementation of ranges::is_permutation may create a dangling
> > > reference, which then results (sometimes) in a crash. A minimal example
> > > that shows the problem under ASAN is https://gcc.godbolt.org/z/7bP9nE8fK
> 
> D'oh, good catch..
> 
> > > 
> > > The attached patch fixes it. I've tested on Linux x86-64. Adding a
> > > negative test for this is somehow challenging (how do you test you're
> > > not using a dangling reference?), but running the modified test under
> > > ASAN shows the fix in place.
> 
> I'd expect a constexpr version of the test to reliably fail as soon as
> it encounters the UB.
> 
> > > 
> > > Do you need me to create a report on bugzilla and cross-reference it
> > > from the patch?
> 
> That'd be good since we'll probably want to backport the fix to the
> release branches and the PR could reflect that.
> 
> > Better patch attached.
> > 
> > Thanks,
> > -- 
> > Giuseppe D'Angelo
> > 
> 
> > Subject: [PATCH] libstdc++: fix a dangling reference crash in
> >  ranges::is_permutation
> > 
> > The code was caching the result of `invoke(proj, *it)` in a local
> > `auto &&` variable. The problem is that this may create dangling
> > references, for instance in case `proj` is `std::identity` (the common
> > case) and `*it` produces a prvalue.
> > 
> > Rather than finding a "creative" solution (e.g. use the value category
> > of `*it`, and the return type of calling the projection on that, to
> > determine whether we can keep a reference or we need a value), get rid
> > of the caching and call `invoke` as needed. In the common case the
> > projection is cheap, and we are allowed to dereference the same iterator
> > more than once (they're forward). This also sounds more correct because
> > we pass the correct value category (obtained from applying the
> > projection to the iterator) to the comparator.
> 
> Makes sense, but it might be preferable for sake of QoI to fix this
> without introducing extra dereferences or projection applications
> if possible.
> 
> auto&& is supposed to perform lifetime extension, but that only happens
> for an outermost temporary and not any temporaries within a
> subexpression IIUC.  So how about if we use a second auto&& for the
> *__scan subexpression so that lifetime extension occurs there?
> 
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/bits/ranges_algo.h (__is_permutation_fn): Do not cache
> > the projection result in a local variable, as that may create
> > dangling references.
> > * testsuite/25_algorithms/is_permutation/constined.cc: Add a
> > test with a range returning prvalues.
> > 
> > Signed-off-by: Giuseppe D'Angelo 
> > ---
> >  libstdc++-v3/include/bits/ranges_algo.h   |  3 +--
> >  .../25_algorithms/is_permutation/constrained.cc   | 11 +++
> >  2 files changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > b/libstdc++-v3/include/bits/ranges_algo.h
> > index 772bf4dd997..4d3c4325e2c 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -567,9 +567,8 @@ namespace ranges
> >  
> > for (auto __scan = __first1; __scan != __last1; ++__scan)
> >   {
> > -   auto&& __proj_scan = std::__invoke(__proj1, *__scan);
> > auto __comp_scan = [&]  (_Tp&& __arg) -> bool {
> > - return std::__invoke(__pred, __proj_scan,
> 
> If we go with the second auto&& approach then we should perfect forward
> __proj_scan here as you alluded to.  That might seem unsafe at first
> glance (if say the project returns an rvalue) but since the predicate is
> regular_invocable it mustn't modify its arguments IIUC.

N.B. that we currently don't perfect forward __proj_scan here is a
minor bug separate from the dangling reference bug.

The predicate's constraints permit invocability with
iter_reference_t not necessarily with iter_reference_t&,
but here we're currently invoking it with the latter.

> 
> > + return std::__invoke(__pred, std::__invoke(__proj1, *__scan),
> >std::forward<_Tp>(__arg));
> > };
> > if (__scan != ranges::find_if(__first1, __scan,
> > diff --git 
> > a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc 
> > b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
> > index 2fbebe37609..4981d2c07ce 100644
> > --- a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
> > +++ b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
> > @@ -19,6 +19,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -76,10 +77,20 @@ test03()
> >while (std::next_permutation(std::begin(cx), std::end(cx)));
> >  }
> >  
> > +void
> > +test04()
> > +{

Re: [PATCH v3] libstdc++: add initializer_list constructor to std::span (P2447)

2024-12-20 Thread Giuseppe D'Angelo

Hello,

On 20/12/2024 18:05, Jonathan Wakely wrote:

So I think I'll just finish testing it like that, as one test, and push that.


OK, thanks.

But in fact, I've realized that it doesn't need to be run at all? 
There's nothing that happens at runtime. I've therefore made the test 
just a compile time one.


Thanks,
--
Giuseppe D'Angelo
From 9036ca04aaf07557c7dc14d995d2a8d4b41d5289 Mon Sep 17 00:00:00 2001
From: Giuseppe D'Angelo 
Date: Tue, 3 Dec 2024 16:56:45 +0100
Subject: [PATCH] libstdc++: add initializer_list constructor to std::span
 (P2447R6)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit implements P2447R6. The code is straightforward (just one
extra constructor, with constraints and conditional explicit).

I decided to suppress -Winit-list-lifetime because otherwise it would
give too many false positives. The new constructor is meant to be used
as a parameter-passing interface (this is a design choice, see
P2447R6/§2) and, as such, the initializer_list won't dangle despite
GCC's warnings.

The new constructor isn't 100% backwards compatible. A couple of
examples are included in Annex C, but I have also lifted some more
from R4. A new test checks for the old and the new behaviors.

libstdc++-v3/ChangeLog:

	* include/bits/version.def: Added the new feature-testing macro.
	* include/bits/version.h (defined): Regenerated.
	* include/std/span: Added constructor from initializer_list.
	* testsuite/23_containers/span/init_list_cons.cc: New test.
	* testsuite/23_containers/span/init_list_cons_neg.cc: New test.

Signed-off-by: Giuseppe D'Angelo 
---
 libstdc++-v3/include/bits/version.def |  8 +++
 libstdc++-v3/include/bits/version.h   | 10 +++
 libstdc++-v3/include/std/span | 17 +
 .../23_containers/span/init_list_cons.cc  | 65 +++
 .../23_containers/span/init_list_cons_neg.cc  | 36 ++
 5 files changed, 136 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc

diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def
index 3b31cff5194..62b8252e02d 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1869,6 +1869,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = span_initializer_list;
+  values = {
+v = 202311;
+cxxmin = 26;
+  };
+};
+
 ftms = {
   name = text_encoding;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h
index ef27ae5e4fa..16cdae920a1 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2075,6 +2075,16 @@
 #endif /* !defined(__cpp_lib_saturation_arithmetic) && defined(__glibcxx_want_saturation_arithmetic) */
 #undef __glibcxx_want_saturation_arithmetic
 
+#if !defined(__cpp_lib_span_initializer_list)
+# if (__cplusplus >  202302L)
+#  define __glibcxx_span_initializer_list 202311L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_span_initializer_list)
+#   define __cpp_lib_span_initializer_list 202311L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_span_initializer_list) && defined(__glibcxx_want_span_initializer_list) */
+#undef __glibcxx_want_span_initializer_list
+
 #if !defined(__cpp_lib_text_encoding)
 # if (__cplusplus >  202302L) && _GLIBCXX_HOSTED && (_GLIBCXX_USE_NL_LANGINFO_L)
 #  define __glibcxx_text_encoding 202306L
diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index e8043c02c9a..c8aa5c02635 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -39,6 +39,7 @@
 #endif
 
 #define __glibcxx_want_span
+#define __glibcxx_want_span_initializer_list
 #include 
 
 #ifdef __cpp_lib_span // C++ >= 20 && concepts
@@ -46,6 +47,9 @@
 #include 
 #include 
 #include 
+#ifdef __cpp_lib_span_initializer_list
+# include 
+#endif
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -228,6 +232,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	: _M_ptr(ranges::data(__range)), _M_extent(ranges::size(__range))
 	{ }
 
+#if __cpp_lib_span_initializer_list >= 202311L // >= C++26
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-list-lifetime"
+	constexpr
+	explicit(extent != dynamic_extent)
+	span(initializer_list __il)
+	requires (is_const_v<_Type>)
+	: _M_ptr(__il.begin()), _M_extent(__il.size())
+	{
+	}
+#pragma GCC diagnostic pop
+#endif
+
   constexpr
   span(const span&) noexcept = default;
 
diff --git a/libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc b/libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc
new file mode 100644
index 000..1dc30ab1a50
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc
@@ -0,0 +1,65 @@
+// { dg-do compile { target c++26 } }
+
+#include 
+#include 
+
+#if !defined(__cpp_lib_sp

Re: [PATCH v2] libstdc++: fix a dangling reference crash in ranges::is_permutation

2024-12-20 Thread Giuseppe D'Angelo

Hello,

On 20/12/2024 13:23, Giuseppe D'Angelo wrote:

Hi,

The implementation of ranges::is_permutation may create a dangling
reference, which then results (sometimes) in a crash. A minimal example
that shows the problem under ASAN is https://gcc.godbolt.org/z/7bP9nE8fK

The attached patch fixes it. I've tested on Linux x86-64. Adding a
negative test for this is somehow challenging (how do you test you're
not using a dangling reference?), but running the modified test under
ASAN shows the fix in place.

Do you need me to create a report on bugzilla and cross-reference it
from the patch?

Better patch attached.

Thanks,
--
Giuseppe D'Angelo
From 2fe40c1dd49662c12d46dcb08b14c2ee68f9ea82 Mon Sep 17 00:00:00 2001
From: Giuseppe D'Angelo 
Date: Fri, 20 Dec 2024 12:55:01 +0100
Subject: [PATCH] libstdc++: fix a dangling reference crash in
 ranges::is_permutation

The code was caching the result of `invoke(proj, *it)` in a local
`auto &&` variable. The problem is that this may create dangling
references, for instance in case `proj` is `std::identity` (the common
case) and `*it` produces a prvalue.

Rather than finding a "creative" solution (e.g. use the value category
of `*it`, and the return type of calling the projection on that, to
determine whether we can keep a reference or we need a value), get rid
of the caching and call `invoke` as needed. In the common case the
projection is cheap, and we are allowed to dereference the same iterator
more than once (they're forward). This also sounds more correct because
we pass the correct value category (obtained from applying the
projection to the iterator) to the comparator.

libstdc++-v3/ChangeLog:

	* include/bits/ranges_algo.h (__is_permutation_fn): Do not cache
	the projection result in a local variable, as that may create
	dangling references.
	* testsuite/25_algorithms/is_permutation/constrained.cc: Add a
	test with a range returning prvalues.

Signed-off-by: Giuseppe D'Angelo 
---
 libstdc++-v3/include/bits/ranges_algo.h   |  3 +--
 .../25_algorithms/is_permutation/constrained.cc   | 11 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
index 772bf4dd997..4d3c4325e2c 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -567,9 +567,8 @@ namespace ranges
 
 	for (auto __scan = __first1; __scan != __last1; ++__scan)
 	  {
-	auto&& __proj_scan = std::__invoke(__proj1, *__scan);
 	auto __comp_scan = [&]  (_Tp&& __arg) -> bool {
-	  return std::__invoke(__pred, __proj_scan,
+	  return std::__invoke(__pred, std::__invoke(__proj1, *__scan),
    std::forward<_Tp>(__arg));
 	};
 	if (__scan != ranges::find_if(__first1, __scan,
diff --git a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
index 2fbebe37609..4981d2c07ce 100644
--- a/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/is_permutation/constrained.cc
@@ -19,6 +19,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -76,10 +77,20 @@ test03()
   while (std::next_permutation(std::begin(cx), std::end(cx)));
 }
 
+void
+test04()
+{
+  int x[] = { 4, 3, 2, 1 };
+  auto y = std::views::iota(1, 5);
+  VERIFY( ranges::is_permutation(x, y) );
+  VERIFY( ranges::is_permutation(y, x) );
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }
-- 
2.34.1



smime.p7s
Description: S/MIME Cryptographic Signature


[PATCH] c++: Avoid infinite recursion when deducing template arguments for invalid code [PR118078]

2024-12-20 Thread Simon Martin
We currently fail due to "infinite recursion" on the following invalid
code with -std=c++20 and above

=== cut here ===
template  struct S { struct U { const S s; } u; };
S t{2};
=== cut here ===

The problem is that reshape_init_class for S calls reshape_init_r for
its field S::u, that calls reshape_init_class for S::U, that calls
reshape_init_r for field S::U::s that calls reshape_init_class for its
type S, etc.

This patch fixes the issue by erroring out in reshape_init_class if we
detect that we're about to call reshape_init_r for a type that's part of
our "TYPE_CONTEXT chain".

An alternative was to change the check in grokdeclarator that rejects
fields with an incomplete type to check for the field type's
TYPE_BEING_DECLARED (which sounds like the right thing to do), however
erroring for the definition of U::s breaks a bunch of existing test
cases, and it would also make us diverge from clang and MSVC (but behave
like EDG) - see https://godbolt.org/z/hT8d1GWa3. It feels to me like we
don't want that (I'm happy to revisit if people think it's OK).

Successfully tested on x86_64-pc-linux-gnu.

PR c++/118078

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Don't trigger infinite recursion
for invalid "recursive types".

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction118.C: New test.
* g++.dg/cpp2a/class-deduction-aggr16.C: New test.

---
 gcc/cp/decl.cc| 21 ---
 .../g++.dg/cpp1z/class-deduction118.C |  8 +++
 .../g++.dg/cpp2a/class-deduction-aggr16.C |  7 +++
 3 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 42e83f880f9..ea55ea4c0e5 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7315,9 +7315,24 @@ reshape_init_class (tree type, reshape_iter *d, bool 
first_initializer_p,
  d->cur++;
}
   else
-   field_init = reshape_init_r (TREE_TYPE (field), d,
-/*first_initializer_p=*/NULL_TREE,
-complain);
+   {
+ /* Make sure that we won't be calling ourselves recursively, which
+could happen with "recursive template types" (PR c++/118078).  */
+ tree ctx = TYPE_CONTEXT (type);
+ while (ctx && CLASS_TYPE_P (ctx))
+   {
+ if (ctx == TYPE_MAIN_VARIANT (TREE_TYPE (field))) {
+ if (complain & tf_error)
+   error ("field %qD has incomplete type %qT", field,
+  TREE_TYPE (field));
+ return error_mark_node;
+ }
+ ctx = TYPE_CONTEXT (ctx);
+   }
+ field_init = reshape_init_r (TREE_TYPE (field), d,
+  /*first_initializer_p=*/NULL_TREE,
+  complain);
+   }
 
   if (field_init == error_mark_node)
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
new file mode 100644
index 000..b14d62df793
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
@@ -0,0 +1,8 @@
+// PR c++/118078
+// { dg-do "compile" { target c++11 } }
+
+template 
+struct S { struct U { const S s; } u; };
+S t{2};   // { dg-error "invalid use of template-name 'S' without an argument 
list" "" { target c++14_down } }
+ // { dg-error "class template argument deduction failed" "" { target 
c++17 } .-1 }
+ // { dg-error "no matching function for call to 'S\\\(int\\\)'" "" { 
target c++17 } .-2 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
new file mode 100644
index 000..feab11927c1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
@@ -0,0 +1,7 @@
+// PR c++/118078
+// { dg-do "compile" { target c++20 } }
+
+template 
+struct S { const struct U { struct V { volatile S s; } v; } u; };
+S t{2};   // { dg-error "class template argument deduction failed" }
+ // { dg-error "no matching function for call to 'S\\\(int\\\)'" "" { 
target *-*-* } .-1 }
-- 
2.44.0



Re: [PATCH v3 2/2] testsuite: tree-ssa: Fix i686/-m32 fails for vector-*.c tests

2024-12-20 Thread Jakub Jelinek
On Fri, Dec 20, 2024 at 05:38:57PM +0100, Christoph Müllner wrote:
> FAILs have been reported for several tree-ssa vector-*.c tests
> on i686-linux or on x86_64-linux with -m32.
> This patch addresses these fails by setting the necessary -msse2 flags.
> 
> This patch also streamlines all tests to use dg-options instead
> of dg-additional-options.  This is in line with most other tests
> in gcc.dg/tree-ssa.
> 
> Tested with the following board config in RUNTESTFLAGS:
>   --target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse}
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/satd-hadamard.c: Rename dg-additional-options
>   to dg-options.
>   * gcc.dg/tree-ssa/vector-10.c: Rename dg-additional-options
>   to dg-options and add -msse2 to it.
>   * gcc.dg/tree-ssa/vector-11.c: Likewise.
>   * gcc.dg/tree-ssa/vector-8.c: Rename dg-additional-options
>   to dg-options.
>   * gcc.dg/tree-ssa/vector-9.c: Likewise.
> 
> Signed-off-by: Christoph Müllner 

Ok.

Jakub



Re: [PATCH v3 1/2] testsuite: Add tests for PR118149

2024-12-20 Thread Jakub Jelinek
On Fri, Dec 20, 2024 at 05:38:56PM +0100, Christoph Müllner wrote:
> A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
> This patch adds two test cases for PR118149.
> These tests are different than other tests in that one of the
> vec-perm selectors contains indices in descending order (1, 1, 0, 0),
> which is the root cause for the ICE observed in PR118149.
> 
>   PR 118149
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/pr118149-2.c: New test.
>   * gcc.dg/tree-ssa/pr118149.c: New test.

Ok for trunk.

Jakub



[PATCH] arm: [MVE intrinsics] Another fix for moves of tuples (PR target/118131)

2024-12-20 Thread Christophe Lyon
Commit r15-6389-g670df03e5294a3 only partially fixed support for moves
of large modes: despite the introduction of V2x* and V4x* modes in
r15-6245-g4f4e13dd235b to support MVE tuples, we still need to support
TI, OI and XI modes, which appear for instance in gcc.dg/pr100887.c.

The problem was noticed when running the testsuite with
-mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
where several tests would ICE in output_move_neon.

gcc/ChangeLog:

PR target/118131
* config/arm/arm.h (VALID_MVE_STRUCT_MODE): Accept TI, OI and XI
modes again.
---
 gcc/config/arm/arm.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index b2044db938b..4246b34b6e0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1128,7 +1128,10 @@ extern const int arm_arch_cde_coproc_bits[];
|| (MODE) == CImode || (MODE) == XImode)
 
 #define VALID_MVE_STRUCT_MODE(MODE)\
-  ((MODE) == V2x16QImode   \
+  ((MODE) == TImode\
+   || (MODE) == OImode \
+   || (MODE) == XImode \
+   || (MODE) == V2x16QImode\
|| (MODE) == V2x8HImode \
|| (MODE) == V2x4SImode \
|| (MODE) == V2x8HFmode \
-- 
2.34.1



Re: [PATCH v2] RISC-V: vector absolute difference expander [PR117722]

2024-12-20 Thread 钟居哲
lgtm



juzhe.zh...@rivai.ai
 
From: Vineet Gupta
Date: 2024-12-21 06:18
To: gcc-patches
CC: gnu-toolchain; Robin Dapp; Pan Li; Juzhe Zhong; Jeff Law; Vineet Gupta
Subject: [PATCH v2] RISC-V: vector absolute difference expander [PR117722]
Changes since v1:
- Fix snafu in test.
 
This improves codegen for x264 sum of absolute difference routines.
The insn count is same, but we avoid double widening ops and ensuing
whole register moves.
 
Also for more general applicability, we chose to implement abs diff
vs. the sum of abs diff variant.
 
Suggested-by: Robin Dapp 
Co-developed-by: Pan Li 
Signed-off-by: Vineet Gupta 
 
PR target/117722
 
gcc/ChangeLog:
* config/riscv/autovec.md: Add uabd expander.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr117722.c: New test.
 
Signed-off-by: Vineet Gupta 
---
gcc/config/riscv/autovec.md   | 26 +++
.../gcc.target/riscv/rvv/autovec/pr117722.c   | 23 
2 files changed, 49 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2529dc77f221..4678906fb918 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2928,3 +2928,29 @@
 riscv_vector::expand_strided_store (mode, operands);
 DONE;
   })
+
+; 
+; == Absolute difference (not including sum)
+; 
+(define_expand "uabd3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+rtx max = gen_reg_rtx (mode);
+insn_code icode = code_for_pred (UMAX, mode);
+rtx ops1[] = {max, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
+
+rtx min = gen_reg_rtx (mode);
+icode = code_for_pred (UMIN, mode);
+rtx ops2[] = {min, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
+
+icode = code_for_pred (MINUS, mode);
+rtx ops3[] = {operands[0], max, min};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
+
+DONE;
+  });
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
new file mode 100644
index ..b675930818e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117722.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O2" } */
+
+/* Generate sum of absolute difference as sub (max, min).
+   This helps with x264 sad routines.  */
+
+inline int abs(int i)
+{
+  return (i < 0 ? -i : i);
+}
+
+int pixel_sad_n(unsigned char *pix1, unsigned char *pix2, int n)
+{
+  int sum = 0;
+  for( int i = 0; i < n; i++ )
+   sum += abs(pix1[i] - pix2[i]);
+
+  return sum;
+}
+
+/* { dg-final { scan-assembler {vmin\.v} } } */
+/* { dg-final { scan-assembler {vmax\.v} } } */
+/* { dg-final { scan-assembler {vsub\.v} } } */
-- 
2.43.0
 
 


[PATCH] ifcombine field-merge: improve handling of dwords

2024-12-20 Thread Alexandre Oliva
On Dec 20, 2024, Jakub Jelinek  wrote:

> On Wed, Dec 18, 2024 at 12:59:11AM -0300, Alexandre Oliva wrote:
>> * gcc.dg/field-merge-16.c: New.

> Note the test FAILs on i686-linux or on x86_64-linux with -m32.

Indeed, thanks.  Here's a fix.


On 32-bit hosts, data types with 64-bit alignment aren't getting
treated as desired by ifcombine field-merging: we limit the choice of
modes at BITS_PER_WORD sizes, but when deciding the boundary for a
split, we'd limit the choice only by the alignment, so we wouldn't
even consider a split at an odd 32-bit boundary.  Fix that by limiting
the boundary choice by word choice as well.

Now, this would still leave misaligned 64-bit fields in 64-bit-aligned
data structures unhandled by ifcombine on 32-bit hosts.  We already
need to loading them as double words, and if they're not byte-aligned,
the code gets really ugly, but ifcombine could improve it if it allows
double-word loads as a last resort.  I've added that.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit
boundary choice by word size as well.  Try aligned double-word
loads as a last resort.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-17.c: New.
---
 gcc/gimple-fold.cc|   30 +++---
 gcc/testsuite/gcc.dg/field-merge-17.c |   46 +
 2 files changed, 73 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-17.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 2d6e2074416f5..0e832158a47b3 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8381,16 +8381,40 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
 {
   /* Consider the possibility of recombining loads if any of the
 fields straddles across an alignment boundary, so that either
-part can be loaded along with the other field.  */
+part can be loaded along with the other field.  Since we
+limit access modes to BITS_PER_WORD, don't exceed that,
+otherwise on a 32-bit host and a 64-bit-aligned data
+structure, we'll fail the above for a field that straddles
+across two words, and would fail here for not even trying to
+split it at between 32-bit words.  */
   HOST_WIDE_INT boundary = compute_split_boundary_from_align
-   (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
+   (MIN (ll_align, BITS_PER_WORD),
+ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
 
   if (boundary < 0
  || !get_best_mode (boundary - first_bit, first_bit, 0, ll_end_region,
 ll_align, BITS_PER_WORD, volatilep, &lnmode)
  || !get_best_mode (end_bit - boundary, boundary, 0, ll_end_region,
 ll_align, BITS_PER_WORD, volatilep, &lnmode2))
-   return 0;
+   {
+ if (ll_align <= BITS_PER_WORD)
+   return 0;
+
+ /* As a last resort, try double-word access modes.  This
+enables us to deal with misaligned double-word fields
+that straddle across 3 separate words.  */
+ boundary = compute_split_boundary_from_align
+   (MIN (ll_align, 2 * BITS_PER_WORD),
+ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
+ if (boundary < 0
+ || !get_best_mode (boundary - first_bit, first_bit,
+0, ll_end_region, ll_align, 2 * BITS_PER_WORD,
+volatilep, &lnmode)
+ || !get_best_mode (end_bit - boundary, boundary,
+0, ll_end_region, ll_align, 2 * BITS_PER_WORD,
+volatilep, &lnmode2))
+   return 0;
+   }
 
   /* If we can't have a single load, but can with two, figure out whether
 the two compares can be separated, i.e., whether the entirety of the
diff --git a/gcc/testsuite/gcc.dg/field-merge-17.c 
b/gcc/testsuite/gcc.dg/field-merge-17.c
new file mode 100644
index 0..06c8ec16e86c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-17.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-ifcombine-details" } */
+
+/* Check that we can optimize misaligned double-words.  */
+
+struct s {
+  short a;
+  long long b;
+  int c;
+  long long d;
+  short e;
+} __attribute__ ((packed, aligned (8)));
+
+struct s p = { 0, 0, 0, 0, 0 };
+
+__attribute__ ((__noinline__, __noipa__, __noclone__))
+int fp ()
+{
+  if (p.a
+  || p.b
+  || p.c
+  || p.d
+  || p.e)
+return 1;
+  else
+return -1;
+}
+
+int main () {
+  /* Unlikely, but play safe.  */
+  if (sizeof (long long) == sizeof (short))
+return 0;
+  if (fp () > 0)
+__builtin_abort ();
+  unsigned char *pc = (unsigned char *)&p;
+  for (int i = 0; i < sizeof (p); i++)
+{
+  pc[i] = 1;
+  if (fp () < 0)
+   __builtin_abort ()

Re: [r15-6361 Regression] FAIL: gcc.dg/field-merge-16.c scan-tree-dump-times ifcombine "optimizing" 6 on Linux/x86_64

2024-12-20 Thread Alexandre Oliva
On Dec 19, 2024, "haochen.jiang via Gcc-regression" 
 wrote:

> FAIL: gcc.dg/field-merge-16.c scan-tree-dump-times ifcombine "optimizing" 6

ACK, thanks.

I expect each of the following two patches contains a fix for this failure:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672161.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672162.html
They're meant to go together, but each one should suffice.

Apologies for the noise

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] testsuite: generalize ifcombine field-merge tests [PR118025]

2024-12-20 Thread Alexandre Oliva
On Dec 20, 2024, Jakub Jelinek  wrote:

> On Wed, Dec 18, 2024 at 12:59:11AM -0300, Alexandre Oliva wrote:
>> * gcc.dg/field-merge-16.c: New.

> Note the test FAILs on i686-linux or on x86_64-linux with -m32.

Also fixed herein.


A number of tests that check for specific ifcombine transformations
fail on AVR and PRU targets, whose type sizes and alignments aren't
conducive of the expected transformations.  Adjust the expectations.

Most execution tests should run successfully regardless of the
transformations, but a few that could conceivably fail if short and
char have the same bit width now check for that and bypass the tests
that would fail.

Conversely, one test that had such a runtime test, but that would work
regardless, no longer has that runtime test, and its types are
narrowed so that the transformations on 32-bit targets are more likely
to be the same as those that used to take place on 64-bit targets.
This latter change is somewhat obviated by a separate patch, but I've
left it in place anyway.

Regstrapped on x86_64-linux-gnu.  I'd appreciate if someone who can test
AVR and PRU would confirm that it fixes all field-merge-* failures.  Ok
to install?


for  gcc/testsuite/ChangeLog

PR testsuite/118025
* field-merge-1.c: Skip BIT_FIELD_REF counting on AVR and PRU.
* field-merge-3.c: Bypass the test if short doesn't have the
expected size.
* field-merge-8.c: Likewise.
* field-merge-9.c: Likewise.  Skip optimization counting on
AVR and PRU.
* field-merge-14.c: Skip optimization counting on AVR and PRU.
* field-merge-15.c: Likewise.
* field-merge-17.c: Likewise.
* field-merge-16.c: Likewise.  Drop runtime bypass.  Use
smaller types.
---
 gcc/testsuite/gcc.dg/field-merge-1.c  |2 +-
 gcc/testsuite/gcc.dg/field-merge-13.c |2 +-
 gcc/testsuite/gcc.dg/field-merge-14.c |3 ++-
 gcc/testsuite/gcc.dg/field-merge-15.c |2 +-
 gcc/testsuite/gcc.dg/field-merge-16.c |   17 +++--
 gcc/testsuite/gcc.dg/field-merge-17.c |2 +-
 gcc/testsuite/gcc.dg/field-merge-3.c  |2 ++
 gcc/testsuite/gcc.dg/field-merge-8.c  |2 ++
 gcc/testsuite/gcc.dg/field-merge-9.c  |4 +++-
 9 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c 
b/gcc/testsuite/gcc.dg/field-merge-1.c
index 1818e104437e1..4405d40ee79d8 100644
--- a/gcc/testsuite/gcc.dg/field-merge-1.c
+++ b/gcc/testsuite/gcc.dg/field-merge-1.c
@@ -58,7 +58,7 @@ int main () {
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" { target { 
! { avr-*-* pru-*-* } } } } } */
 /* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } 
} } } */
 /* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || 
x86_64-*-* } } } } */
 /* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || 
rs6000-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/field-merge-13.c 
b/gcc/testsuite/gcc.dg/field-merge-13.c
index 7e4f4c499347f..eeef73338f8e5 100644
--- a/gcc/testsuite/gcc.dg/field-merge-13.c
+++ b/gcc/testsuite/gcc.dg/field-merge-13.c
@@ -90,4 +90,4 @@ int main () {
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "optimizing" 9 "ifcombine" } } */
+/* { dg-final { scan-tree-dump-times "optimizing" 9 "ifcombine" { target { ! { 
avr-*-* pru-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/field-merge-14.c 
b/gcc/testsuite/gcc.dg/field-merge-14.c
index 91d84cfebf196..73259e0936e4e 100644
--- a/gcc/testsuite/gcc.dg/field-merge-14.c
+++ b/gcc/testsuite/gcc.dg/field-merge-14.c
@@ -1,7 +1,8 @@
 /* { dg-do run } */
 /* { dg-options "-O -fdump-tree-ifcombine-details" } */
 
-/* Check that we don't get confused by multiple conversions.  */
+/* Check that we don't get confused by multiple conversions.  Conceivably, we
+   could combine both tests using b, but the current logic won't do that.  */
 
 __attribute__((noipa))
 int f(int *a,int *d)
diff --git a/gcc/testsuite/gcc.dg/field-merge-15.c 
b/gcc/testsuite/gcc.dg/field-merge-15.c
index 34641e893c92f..fc38464527161 100644
--- a/gcc/testsuite/gcc.dg/field-merge-15.c
+++ b/gcc/testsuite/gcc.dg/field-merge-15.c
@@ -33,4 +33,4 @@ int main () {
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "optimizing" 6 "ifcombine" } } */
+/* { dg-final { scan-tree-dump-times "optimizing" 6 "ifcombine" { target { ! { 
avr-*-* pru-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/field-merge-16.c 
b/gcc/testsuite/gcc.dg/field-merge-16.c
index 2ca23ea663a45..afdaf45b6a949 100644
--- a/gcc/testsuite/gcc.dg/field-merge-16.c
+++ b/gcc/testsuite/gcc.dg/field-merge-16.c
@@ -4,17 +4,17 @@
 /* Check that tests for sign-extension bits are handled correctly.  */
 
 struct s {
-  short a;
-  short b;
-  unsigned short c;
-  unsigned short d;
-} __attribute__ ((aligned (8)));
+  signed char a;
+  signed char b;
+  unsigned char c;
+  u