Re: [PATCH 1/3] Come up with startswith function.

2021-05-10 Thread Richard Biener via Gcc-patches
On Wed, Apr 21, 2021 at 11:39 AM Martin Liška  wrote:
>
> On 4/21/21 9:32 AM, Arnaud Charlet wrote:
> >> gcc/ada/ChangeLog:
> >>
> >>  * adadecode.c (has_prefix): Remove has_prefix and replace it
> >>  with startswith.
> >>  (__gnat_decode): Likewise.
> >
> > This change is not OK: adadecode.c is also a runtime file and as such cannot
> > include compiler include files.
> >
> >>  * gcc-interface/utils.c (def_builtin_1): Use startswith
> >>  function instead of strncmp.
> >
> >>  * init.c (__gnat_install_handler): Likewise.
> >
> > Same for init.c which is both a host and a runtime/target file.
> >
> > Only the change in utils.c is OK.
> >
> > Arno
> >
>
> Thank you for a quick reply.
>
> There's an updated version of the patch.

OK.

Thanks,
Richard.

> Cheers,
> Martin
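
For readers following the thread, here is a minimal sketch of the kind of prefix helper under discussion. It assumes the helper is a thin wrapper around strncmp; the two is_builtin_name functions and the "__builtin_" literal are hypothetical illustrations, not code from the patch.

```cpp
#include <cassert>
#include <cstring>

// Sketch of a prefix test: true when STR begins with PREFIX.
// An empty PREFIX matches every string.
static inline bool
startswith (const char *str, const char *prefix)
{
  return std::strncmp (str, prefix, std::strlen (prefix)) == 0;
}

// Open-coded form that such a helper replaces: the literal and its
// length (10) have to be kept in sync by hand.  Hypothetical example.
static inline bool
is_builtin_name_open_coded (const char *name)
{
  return std::strncmp (name, "__builtin_", 10) == 0;
}

// The same check written with the helper.  Hypothetical example.
static inline bool
is_builtin_name (const char *name)
{
  return startswith (name, "__builtin_");
}

// Small self-check; call it from main to exercise the sketch.
static inline void
demo ()
{
  assert (startswith ("__builtin_memcpy", "__builtin_"));
  assert (!startswith ("memcpy", "__builtin_"));
  assert (is_builtin_name ("__builtin_abs")
          == is_builtin_name_open_coded ("__builtin_abs"));
}
```

The point of the wrapper is that the prefix length is computed from the prefix itself, so the two can no longer drift apart.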


Re: [PATCH] arm: Fix wrong code with MVE V2DImode loads and stores [PR99960]

2021-05-10 Thread Alex Coplan via Gcc-patches
Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568669.html

On 26/04/2021 11:15, Alex Coplan via Gcc-patches wrote:
> Hi,
> 
> As the PR shows, we currently miscompile V2DImode loads and stores for
> MVE. We're currently using 64-bit loads/stores, but need to be using
> 128-bit vector loads and stores.
> 
> Some intrinsics tests were checking that we (incorrectly) used the
> 64-bit loads/stores: these have been updated.
> 
> Regression tested an arm-eabi cross configured
> --with-arch=armv8.1-m.main+mve --with-float=hard. The patch has the
> following effect on the testsuite:
> 
> FAIL->PASS: c-c++-common/torture/vector-compare-1.c   -O0  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O0  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O1  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O2  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O3 -g  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -Os  execution test
> FAIL->PASS: gcc.dg/compat/vector-1 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O1  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O2  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -Os  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-1.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-2.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-3.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-4.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O1  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O2  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -Os  execution test
> FAIL->PASS: gcc.dg/torture/pr61346.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL->PASS: gcc.dg/torture/pr61346.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/vshuf-v2df.c   -O2  execution test
> FAIL->PASS: gcc.dg/torture/vshuf-v2di.c   -O2  execution test
> 
> Bootstrap and regtest on arm-linux-gnueabihf in progress.
> 
> OK for trunk and eventual backports to 11 and 10 branches if regstrap
> looks good?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/99960
>   * config/arm/mve.md (*mve_mov): Simplify output code. Use
>   vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/99960
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
>   Update now that we're (correctly) using full 128-bit vector
>   loads/stores.
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c:
>   Likewise.


RE: [PATCH] arm: Fix wrong code with MVE V2DImode loads and stores [PR99960]

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches


> -----Original Message-----
> From: Alex Coplan 
> Sent: 26 April 2021 11:15
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Fix wrong code with MVE V2DImode loads and stores
> [PR99960]
> 
> Hi,
> 
> As the PR shows, we currently miscompile V2DImode loads and stores for
> MVE. We're currently using 64-bit loads/stores, but need to be using
> 128-bit vector loads and stores.
> 
> Some intrinsics tests were checking that we (incorrectly) used the
> 64-bit loads/stores: these have been updated.
> 
> Regression tested an arm-eabi cross configured
> --with-arch=armv8.1-m.main+mve --with-float=hard. The patch has the
> following effect on the testsuite:
> 
> FAIL->PASS: c-c++-common/torture/vector-compare-1.c   -O0  execution
> test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O0  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O1  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O2  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O2 -flto -fno-use-linker-
> plugin -flto-partition=none  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O2 -flto -fuse-linker-plugin -
> fno-fat-lto-objects  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -O3 -g  execution test
> FAIL->PASS: gcc.c-torture/execute/pr92618.c   -Os  execution test
> FAIL->PASS: gcc.dg/compat/vector-1 c_compat_x_tst.o-c_compat_y_tst.o
> execute
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O1  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O2  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O2 -flto -fno-use-linker-plugin -flto-
> partition=none  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O2 -flto -fuse-linker-plugin -fno-fat-
> lto-objects  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O3 -fomit-frame-pointer -funroll-
> loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/pr52407.c   -Os  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-1.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-2.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-3.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr57748-4.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O0  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O1  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O2  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O2 -flto -fno-use-linker-plugin -flto-
> partition=none  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O2 -flto -fuse-linker-plugin -fno-fat-
> lto-objects  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/pr58041.c   -Os  execution test
> FAIL->PASS: gcc.dg/torture/pr61346.c   -O3 -fomit-frame-pointer -funroll-
> loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL->PASS: gcc.dg/torture/pr61346.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/vshuf-v2df.c   -O2  execution test
> FAIL->PASS: gcc.dg/torture/vshuf-v2di.c   -O2  execution test
> 
> Bootstrap and regtest on arm-linux-gnueabihf in progress.
> 
> OK for trunk and eventual backports to 11 and 10 branches if regstrap
> looks good?

Ok. I had looked at it earlier but had forgotten to reply...
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/99960
>   * config/arm/mve.md (*mve_mov): Simplify output code. Use
>   vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/99960
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
>   Update now that we're (correctly) using full 128-bit vector
>   loads/stores.
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c:
>   Likewise.


[PATCH] Fix awk substr invocation in libgo buildsystem

2021-05-10 Thread Christoph Höger
The awk script used a zero-based index which worked on surprisingly
many platforms. According to the man page, however, the function
expects one-based indexing.

For reference see this bug in the go git repository:

https://github.com/golang/go/issues/45843

Signed-off-by: Christoph Höger 
---
 ChangeLog | 4 
 libgo/mklinknames.awk | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 2174ab1ea90..495e6f79b76 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2021-05-10  Christoph Höger  
+
+   * libgo/mklinknames.awk: Fix awk substr invocation
+
 2021-05-04  Nick Clifton  
 
* configure.ac (AC_PROG_CC): Replace with AC_PROG_CC_C99.
diff --git a/libgo/mklinknames.awk b/libgo/mklinknames.awk
index 71cb3be7966..0e49c07349e 100644
--- a/libgo/mklinknames.awk
+++ b/libgo/mklinknames.awk
@@ -37,7 +37,7 @@ BEGIN {
 # The goal is to extract "__timegm50".
 if ((def | getline fndef) > 0 && match(fndef, "__asm__\\(\"\\*?")) {
asmname = substr(fndef, RSTART + RLENGTH)
-   asmname = substr(asmname, 0, length(asmname) - 2)
+   asmname = substr(asmname, 1, length(asmname) - 2)
printf("//go:linkname %s %s\n", gofnname, asmname)
 } else {
# Assume the asm name is the same as the declared C name.
-- 
2.31.1
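
The one-based convention behind this fix can be sketched outside awk. The helper below is a hypothetical C++ model, not the awk implementation: substr1 returns at most n characters starting at position m, counting from 1, and refuses positions below 1 instead of guessing what a given awk would do with them. The input string in the self-check is made up to end in two unwanted characters.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical model of a one-based substr: at most N characters of S
// starting at position M, where the first character is position 1.
// Positions below 1 are rejected here rather than guessed at.
static inline std::string
substr1 (const std::string &s, std::size_t m, std::size_t n)
{
  if (m < 1)
    throw std::invalid_argument ("position must be >= 1");
  if (m > s.size ())
    return std::string ();
  return s.substr (m - 1, n);
}

// Drop the last two characters, mirroring the shape of the patched line:
// substr(asmname, 1, length(asmname) - 2).
static inline std::string
strip_last_two (const std::string &s)
{
  if (s.size () < 2)
    return std::string ();
  return substr1 (s, 1, s.size () - 2);
}

// Small self-check; call it from main to exercise the sketch.
static inline void
demo ()
{
  assert (strip_last_two ("__timegm50\")") == "__timegm50");
  assert (substr1 ("hello", 2, 3) == "ell");
}
```

Starting at 1 makes the length argument mean exactly "how many characters to keep", which is what the patched line relies on.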



Re: RFC: Changing AC_PROG_CC to AC_PROG_CC_C99 in top level configure

2021-05-10 Thread Iain Sandoe

Alan Modra  wrote:


On Wed, May 05, 2021 at 08:05:29AM +0100, Iain Sandoe wrote:

Alan Modra via Gcc-patches  wrote:

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 97d6f3863cb..cc3b1b6d666 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -73,8 +73,8 @@ struct stringop_algs
{
 const enum stringop_alg unknown_size;
 const struct stringop_strategy {
-const int max;
-const enum stringop_alg alg;
+int max;
+enum stringop_alg alg;
   int noalign;
 } size [MAX_STRINGOP_ALGS];
};


does this relate to / fix PR 100246 (which seems to fire for some GCC
versions as well
as older clang)?


Yes, looks like the same issue.  I started making a similar fix to the
one you attached to the PR, then laziness kicked in after noticing the
errors were only given on the const elements.


I added a third variant to the PR (as below), which preserves the const-ness
but provides a CTOR.  TBH, I have no especial preference for the solution,
but it would be nice to commit one of them :-)

cheers
Iain

The #ifdef __cplusplus condition is there because this header gets pulled in
by gcov stuff, which is built with a C compiler.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 97d6f38..a417c93 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -73,6 +73,11 @@ struct stringop_algs
 {
   const enum stringop_alg unknown_size;
   const struct stringop_strategy {
+#ifdef __cplusplus
+stringop_strategy(int _max = -1, enum stringop_alg _alg = libcall,
+ int _noalign = false)
+  : max (_max), alg (_alg), noalign (_noalign) {}
+#endif
 const int max;
 const enum stringop_alg alg;
 int noalign;
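
To see how the constructor variant behaves, here is a cut-down, C++-only sketch. The enum values and the array size are shortened stand-ins for the real definitions in i386.h, example_algs is a hypothetical table, and the #ifdef __cplusplus guard is left out because the sketch is never compiled as C.

```cpp
#include <cassert>

// Hypothetical, shortened stand-ins for the real definitions in i386.h.
enum stringop_alg { no_stringop, libcall, loop };
#define MAX_STRINGOP_ALGS 4

struct stringop_algs
{
  const enum stringop_alg unknown_size;
  const struct stringop_strategy {
    // Constructor with defaults, so elements left out of an initializer
    // list still get well-defined values for the const members.
    stringop_strategy (int _max = -1, enum stringop_alg _alg = libcall,
                       int _noalign = false)
      : max (_max), alg (_alg), noalign (_noalign) {}
    const int max;
    const enum stringop_alg alg;
    int noalign;
  } size [MAX_STRINGOP_ALGS];
};

// Only two of the four strategies are spelled out; the remaining two are
// built from the constructor's default arguments.
static const stringop_algs example_algs = {
  libcall, {{6, loop, false}, {-1, libcall, false}}
};

// Small self-check; call it from main to exercise the sketch.
static inline void
demo ()
{
  assert (example_algs.size[0].max == 6);
  assert (example_algs.size[2].max == -1);
  assert (example_algs.size[3].alg == libcall);
}
```

The members stay const, and partially initialized tables keep compiling because every omitted element goes through the constructor.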




Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Eric Botcazou
> Ready for master?

This breaks the build for me:

make[3]: *** No rule to make target '/home/eric/cvs/gcc/gcc/version.c', needed 
by 'ada/stamp-sdefault'.  Stop.
make[3]: *** Waiting for unfinished jobs

-- 
Eric Botcazou




Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Eric Botcazou
> Ready for master?

/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: ada/
gnatvsn.o: in function `gnatvsn__gnat_version_string':
/home/eric/cvs/gcc/gcc/ada/gnatvsn.adb:67: undefined reference to 
`version_string'
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /
home/eric/cvs/gcc/gcc/ada/gnatvsn.adb:69: undefined reference to 
`version_string'
collect2: error: ld returned 1 exit status
make[3]: *** [/home/eric/cvs/gcc/gcc/ada/gcc-interface/Make-lang.in:691: 
gnatbind] Error 1
make[3]: *** Waiting for unfinished jobs
rm gcov.pod fsf-funding.pod lto-dump.pod gfdl.pod gpl.pod cpp.pod gcov-
dump.pod gcc.pod gcov-tool.pod
make[3]: Leaving directory '/home/eric/build/gcc/native/gcc'
make[2]: *** [Makefile:4781: all-stage1-gcc] Error 2

ada/gnatvsn.adb imports version_string from version.c

-- 
Eric Botcazou





Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Martin Liška

On 5/10/21 11:01 AM, Eric Botcazou wrote:

Ready for master?


/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: ada/
gnatvsn.o: in function `gnatvsn__gnat_version_string':
/home/eric/cvs/gcc/gcc/ada/gnatvsn.adb:67: undefined reference to
`version_string'
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /
home/eric/cvs/gcc/gcc/ada/gnatvsn.adb:69: undefined reference to
`version_string'
collect2: error: ld returned 1 exit status
make[3]: *** [/home/eric/cvs/gcc/gcc/ada/gcc-interface/Make-lang.in:691:
gnatbind] Error 1
make[3]: *** Waiting for unfinished jobs
rm gcov.pod fsf-funding.pod lto-dump.pod gfdl.pod gpl.pod cpp.pod gcov-
dump.pod gcc.pod gcov-tool.pod
make[3]: Leaving directory '/home/eric/build/gcc/native/gcc'
make[2]: *** [Makefile:4781: all-stage1-gcc] Error 2

ada/gnatvsn.adb imports version_string from version.c



Sorry for the breakage. Apparently, we'll still need a version.c file in the ada
folder (as it's imported in gcc/ada/gnatvsn.adb). Using the attached patch I get to:

../../gnatbind -I../rts -I. -I/home/marxin/Programming/gcc/gcc/ada -I- -I../rts 
-I. -I/home/marxin/Programming/gcc/gcc/ada -static -x -x 
/dev/shm/objdir/gcc/ada/tools/gnatclean.ali

../../gnatlink -v gnatcmd -o ../../gnat \

  --GCC="../../xgcc -B../../ -I- -I../rts -I. -I/home/marxin/Programming/gcc/gcc/ada" 
--LINK="../../xg++ -B../../ -B../../../x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-B../../../x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs 
-L../../../x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-L../../../x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -static-libstdc++ -static-libgcc 
-static-libstdc++ -static-libgcc " ../link.o ../targext.o ../../ggc-none.o 
../../libcommon-target.a ../../libcommon.a ../../../libcpp/libcpp.a ../rts/libgnat.a   
../../../libbacktrace/.libs/libbacktrace.a ../../../libiberty/libiberty.a   -no-pie



GNATLINK 12.0.0 20210510 (experimental)

Copyright (C) 1995-2021, Free Software Foundation, Inc.

xgcc -c -gnatA -gnatWb -gnatiw -B../../ -I- -I../rts -I. 
-I/home/marxin/Programming/gcc/gcc/ada -gnatws 
/dev/shm/objdir/gcc/ada/tools/b~gnatcmd.adb

/dev/shm/objdir/gcc/xg++ b~gnatcmd.o ../link.o ../targext.o ../../ggc-none.o 
../rts/ada.o ../rts/a-charac.o ../rts/a-chlat1.o ../rts/interfac.o 
../rts/system.o ../rts/s-addope.o ../rts/s-imgint.o ../rts/s-io.o 
../rts/s-parame.o ../rts/s-crtl.o ../rts/i-cstrea.o ../rts/s-stoele.o 
../rts/s-stache.o ../rts/s-strhas.o ../rts/s-htable.o ../rts/s-string.o 
../rts/s-traent.o ../rts/s-unstyp.o ../rts/s-imguns.o ../rts/s-wchcon.o 
../rts/s-wchjis.o ../rts/s-wchcnv.o ../rts/s-carun8.o ../rts/s-conca2.o 
../rts/s-traceb.o ../rts/s-excdeb.o ../rts/s-valuti.o ../rts/s-valllu.o 
../rts/s-vallli.o ../rts/s-wchstw.o ../rts/a-elchha.o ../rts/a-exctra.o 
../rts/s-addima.o ../rts/s-bitops.o ../rts/s-boustr.o ../rts/s-casuti.o 
../rts/s-exctab.o ../rts/a-contai.o ../rts/a-ioexce.o ../rts/a-string.o 
../rts/a-strmap.o ../rts/a-stmaco.o ../rts/i-c.o ../rts/s-except.o 
../rts/s-excmac.o ../rts/a-chahan.o ../rts/s-exctra.o ../rts/s-memory.o 
../rts/s-mmap.o ../rts/s-mmauni.o ../rts/s-mmosin.o ../rts/s-objrea.o 
../rts/s-dwalin.o ../rts/s-os_lib.o ../rts/s-secsta.o ../rts/s-soliin.o 
../rts/s-soflin.o ../rts/s-stalib.o ../rts/s-trasym.o ../rts/a-except.o 
../rts/a-assert.o ../rts/a-comlin.o ../rts/a-tags.o ../rts/a-stream.o 
../rts/gnat.o ../rts/g-htable.o ../rts/g-os_lib.o ../rts/s-ficobl.o 
../rts/s-finroo.o ../rts/a-finali.o ../rts/s-fileio.o ../rts/s-vaenu8.o 
../rts/a-textio.o ../rts/s-assert.o ./debug.o ./types.o ./alloc.o ./gnatvsn.o 
./hostparm.o ./opt.o ./csets.o ./output.o ./rident.o ./table.o ./widechar.o 
./namet.o ./fmap.o ./sdefault.o ./targparm.o ./osint.o ./switch.o ./usage.o 
./gnatcmd.o ../../libcommon-target.a ../../libcommon.a ../../../libcpp/libcpp.a 
../rts/libgnat.a ../../../libbacktrace/.libs/libbacktrace.a 
../../../libiberty/libiberty.a -no-pie -o ../../gnat -L../rts/ -L./ 
-L/home/marxin/Programming/gcc/gcc/ada/ 
-L/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/12.0.0/adalib/ 
/dev/shm/objdir/gcc/ada/rts/libgnat.a -ldl -B../../ 
-B../../../x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-B../../../x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs 
-L../../../x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-L../../../x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -static-libstdc++ 
-static-libgcc -static-libstdc++ -static-libgcc

/usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: 
./gnatvsn.o: in function `gnatvsn__gnat_version_string':

/home/marxin/Programming/gcc/gcc/ada/gnatvsn.adb:69: undefined reference to 
`gnat_version_string'

/usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: 
/home/marxin/Programming/gcc/gcc/ada/gnatvsn.adb:67: undefined reference to 
`gnat_version_string'

collect2: error: ld returned 1 exit status


Do you know Eric where version.o nee

Re: [GOVERNANCE] Where to file complaints re project-maintainers?

2021-05-10 Thread Jakub Jelinek via Gcc-patches
On Sun, May 09, 2021 at 07:48:50PM -0700, Ian Lance Taylor via Gcc-patches 
wrote:
> On Sun, May 9, 2021 at 8:33 AM abebeos  
> wrote:
> >
> > To me this sounds quite like an "disorganized mess, where bullies, abusers 
> > and even IT-fascists can thrive".
> >
> > It is clear to me that some gcc project maintainers, the steering committee 
> > and bountysource are crossing ethical (if not legal) boundaries.
> 
> The GCC project maintainers and the steering committee are definitely
> not crossing ethical or legal boundaries here.
> 
> I don't know anything about Bountysource.  Bountysource is completely
> separate from GCC.  It appears from your link that John Paul Adrian
> Glaubitz posted a bounty for some GCC work.  A number of people and
> organizations supported the bounty, but the GCC project itself did
> not.  Although the work is for GCC, the GCC project has nothing to do
> with that bounty.  That is handled entirely by Bountysource.

Yeah, all that happened on the GCC project side is the agreement
to deprecate and eventually remove ports that still rely on internal
details that were obsolete 20 years ago, see
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01256.html
and then patch review of changes that were posted to gcc-patches.
The GCC reviewers review posted patches based on the technical
merits and whether copyright assignment for parts that require copyright
assignment is available, regardless of whether the people who submit their
work did the work in their spare time without being compensated for it,
whether their employers compensated them for it, whether they got contracted by
some company for that work or other means (e.g. bountysource).
All that is outside of the scope of the GCC project.
Bountysource AFAIK has its own terms and rules and I believe ultimately it
is the people who donated money for it that vote about that.

Jakub



Re: [PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings

2021-05-10 Thread Chung-Lin Tang

On 2021/5/7 8:35 PM, Thomas Schwinge wrote:

On 2021-05-05T23:17:25+0800, Chung-Lin Tang via 
Gcc-patches  wrote:

This patch implements relaxing the requirements when a map with the implicit 
attribute encounters
an overlapping existing map.  [...]

Oh, oh, these data mapping interfaces/semantics are getting more and
more "convoluted"...  %-\ (Not your fault, of course.)

Haven't looked in too much detail in the patch/implementation (I'm not
very well-versed in the exact OpenMP semantics anyway), but I suppose we
should do similar things for OpenACC, too.  I think we even currently do
have a gimplification-level "hack" to replicate data clauses' array
bounds for implicit data clauses on compute constructs, if the default
"complete" mapping is going to clash with a "limited" mapping that's
specified in an outer OpenACC 'data' directive.  (That, of course,
doesn't work for the general case of non-lexical scoping, or dynamic
OpenACC 'enter data', etc., I suppose) I suppose your method could easily
replace and improve that; we shall look into that later.

That said, in your patch, is this current implementation (explicitly)
meant or not meant to be active for OpenACC, too, or just OpenMP (I
couldn't quickly tell), and/or is it (implicitly?) a no-op for OpenACC?


It appears that I have inadvertently enabled it for OpenACC as well!
But everything was tested together, so I assume it works okay for that mode as 
well.

The entire set of implicit-specific actions are enabled by the setting of
'OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1' inside 
gimplify.c:gimplify_adjust_omp_clauses_1,
so in case you want to disable it for OpenACC again, that's where you need to 
add the guard condition.


Also, another adjustment in this patch is how implicitly created clauses are 
added to the current
clause list in gimplify_adjust_omp_clauses(). Instead of simply appending the 
new clauses to the end,
this patch adds them at the position "after initial non-map clauses, but right 
before any existing
map clauses".

Probably you haven't been testing such a configuration; I've just pushed
"Fix up 'c-c++-common/goacc/firstprivate-mappings-1.c' for C, non-LP64"
to devel/omp/gcc-10 branch in commit
c51cc3b96f0b562deaffcfbcc51043aed216801a, see attached.


Thanks, I was relying on eyeballing to know where to fix testcases like this;
I did fix another similar case, but missed this one.




The reason for this is: when combined with other map clauses, for example:

#pragma omp target map(rec.ptr[:N])
for (int i = 0; i < N; i++)
  rec.ptr[i] += 1;

There will be an implicit map created for map(rec), because of the access 
inside the target region.
The expectation is that 'rec' is implicitly mapped, and then the pointed 
array-section part by 'rec.ptr'
will be mapped, and then attachment to the 'rec.ptr' field of the mapped 'rec' 
(in that order).

If the implicit 'map(rec)' is appended to the end, instead of placed before 
other maps, the attachment
operation will not find anything to attach to, and the entire region will fail.

But that doesn't (negatively) affect user-visible semantics (OpenMP, and
also OpenACC, if applicable), in that more/bigger objects then get mapped
than were before?  (I suppose not?)


It probably won't affect user level semantics, although we should look out if 
this change in convention
exposes some other bugs.

Chung-Lin
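
The map(rec.ptr[:N]) fragment discussed in this message can be expanded into a complete function. This is only a sketch: the record type and the increment_all wrapper are hypothetical, and when the code is built without OpenMP support the pragma is ignored and the loop runs on the host. With OpenMP enabled, the ordering described above applies: the implicit map of 'rec' has to come first, then the array section, then the attachment to 'rec.ptr'.

```cpp
#include <cassert>

// Hypothetical record type: a struct whose pointer member refers to an
// array that lives outside the struct.
struct record
{
  int *ptr;
};

// Increment the first N elements through rec.ptr inside a target region.
// 'rec' itself is not named in any map clause, so it is mapped implicitly
// because the region accesses it.
static inline void
increment_all (int *data, int N)
{
  record rec = { data };
  #pragma omp target map(rec.ptr[:N])
  for (int i = 0; i < N; i++)
    rec.ptr[i] += 1;
}

// Small self-check; call it from main to exercise the sketch.
static inline void
demo ()
{
  int values[3] = { 1, 2, 3 };
  increment_all (values, 3);
  assert (values[0] == 2 && values[1] == 3 && values[2] == 4);
}
```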


Re: [PATCH 02/12] Allow generating pseudo register with specific alignment

2021-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> "H.J. Lu via Gcc-patches"  writes:
>> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu  wrote:
>> >>
>> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
>> >>  wrote:
>> >> >
>> >> > "H.J. Lu via Gcc-patches"  writes:
>> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
>> >> > >  wrote:
>> >> > >>
>> >> > >> "H.J. Lu via Gcc-patches"  writes:
>> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers so 
>> >> > >> > that
>> >> > >> > associated hard registers can be properly spilled onto stack.  But 
>> >> > >> > there
>> >> > >> > are cases where associated hard registers will never be spilled 
>> >> > >> > onto
>> >> > >> > stack.  gen_reg_rtx is changed to take an argument for register 
>> >> > >> > alignment
>> >> > >> > so that stack realignment can be avoided when not needed.
>> >> > >>
>> >> > >> How is it guaranteed that they will never be spilled though?
>> >> > >> I don't think that that guarantee exists for any kind of pseudo,
>> >> > >> except perhaps for the temporary pseudos that the RA creates to
>> >> > >> replace (match_scratch …)es.
>> >> > >>
>> >> > >
>> >> > > The caller of creating pseudo registers with specific alignment must
>> >> > > guarantee that they will never be spilled.   I am only using it in
>> >> > >
>> >> > >   /* Make operand1 a register if it isn't already.  */
>> >> > >   if (can_create_pseudo_p ()
>> >> > >   && !register_operand (op0, mode)
>> >> > >   && !register_operand (op1, mode))
>> >> > > {
>> >> > >   /* NB: Don't increase stack alignment requirement when forcing
>> >> > >  operand1 into a pseudo register to copy data from one memory
>> >> > >  location to another since it doesn't require a spill.  */
>> >> > >   emit_move_insn (op0,
>> >> > >   force_reg (GET_MODE (op0), op1,
>> >> > >  (UNITS_PER_WORD * BITS_PER_UNIT)));
>> >> > >   return;
>> >> > > }
>> >> > >
>> >> > > for vector moves.  RA shouldn't spill it.
>> >> >
>> >> > But this is the point: it's a case of hoping that the RA won't spill it,
>> >> > rather than having a guarantee that it won't.
>> >> >
>> >> > Even if the moves start out adjacent, they could be separated by later
>> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA 
>> >> > scheduling
>> >> > isn't enabled by default for x86, but it can still be enabled 
>> >> > explicitly.)
>> >> > Or if the same data is being copied to two locations, we might reuse
>> >> > values loaded by the first copy for the second copy as well.
>> >
>> > There are cases where pseudo vector registers are created as pure
>> > temporary registers in the backend and they shouldn't ever be spilled
>> > to stack.   They will be spilled to stack only if there are other 
>> > non-temporary
>> > vector register usage in which case stack will be properly re-aligned.
>> > Caller of creating pseudo registers with specific alignment guarantees
>> > that they are used only as pure temporary registers.
>>
>> I don't think there's really a distinct category of pure temporary
>> registers though.  The things I mentioned above can happen for any
>> kind of pseudo register.
>
> I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
> Do we generally handle those well?  That is, are they again subject
> to be allocated by RA when no longer live?

Yeah, using hard registers should work.  Of course, any given fixed choice
of hard register has the potential to be suboptimal in some situation,
but it should be safe.

Thanks,
Richard


[PATCH] Fix genversion linker error.

2021-05-10 Thread Martin Liška

When renaming gcov-iov to genversion I forgot one hunk that led
to the following linker error:

ld: error: build/genversion.o: requires dynamic R_X86_64_32 reloc which may 
overflow at runtime; recompile with -fPIC

It's a copy&paste error and I'm going to push it.

Martin

gcc/ChangeLog:

* Makefile.in: Add missing genversion rule.
---
 gcc/Makefile.in | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8091057a8a3..487db220d8c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3032,6 +3032,10 @@ CFLAGS-build/genversion.o += -DBASEVER=$(BASEVER_s) 
-DDATESTAMP=$(DATESTAMP_s) \
 
 build/genversion.o: genversion.c $(BCONFIG_H) $(SYSTEM_H)
 
+build/genversion$(build_exeext): build/genversion.o
+   +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) \
+   build/genversion.o -o $@
+
 version.h: s-version; @true
 s-version: build/genversion$(build_exeext)
build/genversion$(build_exeext) > tmp-version.h
--
2.31.1



Re: [PATCH] middle-end/100464 - avoid spurious TREE_ADDRESSABLE in folding debug stmts

2021-05-10 Thread Richard Biener
On Sun, 9 May 2021, Jason Merrill wrote:

> On 5/7/21 6:21 AM, Richard Biener wrote:
> > On Fri, May 7, 2021 at 12:17 PM Richard Biener  wrote:
> >>
> >> canonicalize_constructor_val was setting TREE_ADDRESSABLE on bases
> >> of ADDR_EXPRs but that's futile when we're dealing with CTOR values
> >> in debug stmts.  This rips out the code which was added for Java
> >> and should have been an assertion when we didn't have debug stmts.
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages
> >> which revealed PR100468 for which I added the cp/class.c hunk below.
> >> Re-testing with that in progress.
> >>
> >> OK for trunk and branch?  It looks like this C++ code is new in GCC 11.
> > 
> > I mislooked, the code is old.
> > 
> > This hunk also breaks (or fixes) g++.dg/tree-ssa/array-temp1.C where
> > the gimplifier previously passes the
> > 
> > && (flag_merge_constants >= 2 || !TREE_ADDRESSABLE (object))
> > 
> > check guarding it against unifying addresses of different instances
> > of variables.  Clearly in the case of the testcase there are addresses to
> > this variable as part of the initializer list construction.  So the hunk
> > fixes
> > wrong-code, but it breaks the testcase.
> > 
> > Any comments?  I can of course change the testcase accordingly.
> 
> Hmm, I suppose if the optimization is wrong for PR38615, it's also wrong for
> the initializer_list variant:
> 
> extern "C" void abort (void);
> #include <initializer_list>
> 
> int f(int t, const int *a)
> {
>   std::initializer_list<int> b = { 1, 2, 3 };
>   const int *p = b.begin();
>   if (!t)
>     return f (1, p);
>   return p == a;
> }
> 
> int main(void)
> {
>   if (f(0, 0))
>     abort ();
>   return 0;
> }
> 
> so adjusting the array-temp testcase seems like the right answer.

Done as follows.  Bootstrapped / tested on x86_64-unknown-linux-gnu,
pushed to trunk so far.

Richard.

>From a076632e274abe344ca7648b7c7f299273d4cbe0 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 7 May 2021 09:51:18 +0200
Subject: [PATCH] middle-end/100464 - avoid spurious TREE_ADDRESSABLE in
 folding debug stmts
To: gcc-patches@gcc.gnu.org

canonicalize_constructor_val was setting TREE_ADDRESSABLE on bases
of ADDR_EXPRs but that's futile when we're dealing with CTOR values
in debug stmts.  This rips out the code which was added for Java
and should have been an assertion when we didn't have debug stmts.
To not regress g++.dg/tree-ssa/array-temp1.C we have to adjust the
testcase to not look for a no longer applied invalid optimization.

2021-05-10  Richard Biener  

PR middle-end/100464
PR c++/100468
gcc/
* gimple-fold.c (canonicalize_constructor_val): Do not set
TREE_ADDRESSABLE.

gcc/cp/
* call.c (set_up_extended_ref_temp): Mark the temporary
addressable if the TARGET_EXPR was.

gcc/testsuite/
* gcc.dg/pr100464.c: New testcase.
* g++.dg/tree-ssa/array-temp1.C: Adjust.
---
 gcc/cp/call.c   |  2 ++
 gcc/gimple-fold.c   |  4 +++-
 gcc/testsuite/g++.dg/tree-ssa/array-temp1.C |  6 --
 gcc/testsuite/gcc.dg/pr100464.c | 16 
 4 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr100464.c

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d985e4e8eda..f07e09a36d1 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -12478,6 +12478,8 @@ set_up_extended_ref_temp (tree decl, tree expr, 
vec **cleanups,
  VAR.  */
   if (TREE_CODE (expr) != TARGET_EXPR)
 expr = get_target_expr (expr);
+  else if (TREE_ADDRESSABLE (expr))
+TREE_ADDRESSABLE (var) = 1;
 
   if (TREE_CODE (decl) == FIELD_DECL
   && extra_warnings && !TREE_NO_WARNING (decl))
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index aa33779b753..768ef89d876 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -245,7 +245,9 @@ canonicalize_constructor_val (tree cval, tree from_decl)
   if (TREE_TYPE (base) == error_mark_node)
return NULL_TREE;
   if (VAR_P (base))
-   TREE_ADDRESSABLE (base) = 1;
+   /* ???  We should be able to assert that TREE_ADDRESSABLE is set,
+  but since the use can be in a debug stmt we can't.  */
+   ;
   else if (TREE_CODE (base) == FUNCTION_DECL)
{
  /* Make sure we create a cgraph node for functions we'll reference.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/array-temp1.C 
b/gcc/testsuite/g++.dg/tree-ssa/array-temp1.C
index 97c2e0521c9..3df7aadd30a 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/array-temp1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/array-temp1.C
@@ -13,9 +13,3 @@ int f()
   using AR = const int[];
   return AR{ 1,42,3,4,5,6,7,8,9,0 }[5];
 }
-
-int g()
-{
-  std::initializer_list a = {1,42,3};
-  return a.begin()[0];
-}
diff --git a/gcc/testsuite/gcc.dg/pr100464.c b/gcc/testsuite/gcc.dg/pr100464.c
new file mode 100644
index 000..46cc37dff54
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100464.c
@@ -0,0 +1,

[PATCH][PUSHED] gcc_update: fix check for local source tree.

2021-05-10 Thread Martin Liška

Pushed as obvious.

Thanks,
Martin

contrib/ChangeLog:

* gcc_update: Start using reload.c instead of version.c.
---
 contrib/gcc_update | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 45a27b76cc3..04dc4907fd9 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -243,7 +243,7 @@ apply_patch () {
 }
 
 # Check whether this indeed looks like a local tree.

-if [ ! -f gcc/version.c ]; then
+if [ ! -f gcc/reload.c ]; then
 echo "This does not seem to be a GCC tree!"
 exit
 fi
--
2.31.1



[committed] libphobos: Fix visibility of std.process.searchPathFor

2021-05-10 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adjusts the visibility of std.process.searchPathFor so it can
be used from other modules in the phobos library.  In particular, this
symbol is used by std.file.thisExePath on OpenBSD.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline and backported to releases/gcc-11.

Regards,
Iain.

---
libphobos/ChangeLog:

* src/MERGE: Merge upstream phobos 32cfe9b61.
---
 libphobos/src/MERGE | 2 +-
 libphobos/src/std/process.d | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libphobos/src/MERGE b/libphobos/src/MERGE
index 6f9740404ef..49622c5c548 100644
--- a/libphobos/src/MERGE
+++ b/libphobos/src/MERGE
@@ -1,4 +1,4 @@
-e6907ff3e28d3c43469c46df4a0426726ecb8631
+32cfe9b61570d52d9885b0208fd20de0d351b51e
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/phobos repository.
diff --git a/libphobos/src/std/process.d b/libphobos/src/std/process.d
index 9cbeca8e9a8..63ec49365b9 100644
--- a/libphobos/src/std/process.d
+++ b/libphobos/src/std/process.d
@@ -887,7 +887,7 @@ version (Windows) @system unittest
 // Searches the PATH variable for the given executable file,
 // (checking that it is in fact executable).
 version (Posix)
-private string searchPathFor(in char[] executable)
+package(std) string searchPathFor(in char[] executable)
 @trusted //TODO: @safe nothrow
 {
 import std.algorithm.iteration : splitter;
-- 
2.27.0



[committed] d: Fix qualifier ignored in alias definition if parentheses are not present

2021-05-10 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes a regression where the qualifier was ignored in an
alias definition if parentheses were not present.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline and backported to releases/gcc-11.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd b7d146c4c.
---
 gcc/d/dmd/MERGE   | 2 +-
 gcc/d/dmd/dsymbolsem.c| 7 +--
 gcc/testsuite/gdc.test/compilable/test21898.d | 7 +++
 3 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/test21898.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index 86fb308d759..d29d462f42f 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-0450061c8de71328815da9323bd35c92b37d51d2
+b7d146c4c34469f876a63f26ff19091a7f9d54d7
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/dsymbolsem.c b/gcc/d/dmd/dsymbolsem.c
index 33f74edf6ba..7a44ed2c41d 100644
--- a/gcc/d/dmd/dsymbolsem.c
+++ b/gcc/d/dmd/dsymbolsem.c
@@ -4880,8 +4880,11 @@ static void aliasInstanceSemantic(TemplateInstance 
*tempinst, Scope *sc, Templat
 
 TemplateTypeParameter *ttp = 
(*tempdecl->parameters)[0]->isTemplateTypeParameter();
 Type *ta = isType(tempinst->tdtypes[0]);
-Declaration *d = new AliasDeclaration(tempinst->loc, ttp->ident, 
ta->addMod(tempdecl->onemember->isAliasDeclaration()->type->mod));
-d->storage_class |= STCtemplateparameter;
+AliasDeclaration *ad = tempdecl->onemember->isAliasDeclaration();
+
+// Note: qualifiers can be in both 'ad.type.mod' and 'ad.storage_class'
+Declaration *d = new AliasDeclaration(tempinst->loc, ttp->ident, 
ta->addMod(ad->type->mod));
+d->storage_class |= STCtemplateparameter | ad->storage_class;
 dsymbolSemantic(d, sc);
 
 paramscope->pop();
diff --git a/gcc/testsuite/gdc.test/compilable/test21898.d 
b/gcc/testsuite/gdc.test/compilable/test21898.d
new file mode 100644
index 000..9ac18b8cb6b
--- /dev/null
+++ b/gcc/testsuite/gdc.test/compilable/test21898.d
@@ -0,0 +1,7 @@
+// https://issues.dlang.org/show_bug.cgi?id=21898
+
+alias Works(T) = immutable(T);
+alias Fails(T) = immutable T;
+
+static assert(is(Works!int == immutable int));
+static assert(is(Fails!int == immutable int));
-- 
2.27.0



Re: [Patch] contrib/gcc-changelog: Add/improve --help

2021-05-10 Thread Martin Liška

On 5/7/21 11:28 AM, Tobias Burnus wrote:

Hi all, hi Martin,

when running the scripts manually, I tend to get confused
which one is which.  --help helps a bit :-)


Good idea.

I see some "Q000 Remove bad quotes" flake8 errors.
Please use a multiline Python string instead:

"""
first_line
second_line
...
"""

Martin



OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf





Re: [Patch] contrib/gcc-changelog: Detect if same file appears twice

2021-05-10 Thread Martin Liška

On 5/7/21 12:39 PM, Tobias Burnus wrote:

Test for a copied-but-not-fully-edited error.

OK?


Yes, please install it.

Thanks,
Martin



Tobias





[PATCH] tree-optimization/100492 - avoid irreducible regions in loop distribution

2021-05-10 Thread Richard Biener
When we distribute away a condition we rely on the ability to
change it to either 1 != 0 or 0 != 0 depending on the direction
of the exit branch in the respective loop.  But when the loop
contains an irreducible sub-region then for the conditions inside
it this fails and can lead to infinite loops being generated.

Avoid distributing loops with irreducible sub-regions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

Richard.

2021-05-10  Richard Biener  

PR tree-optimization/100492
* tree-loop-distribution.c (find_seed_stmts_for_distribution):
Find nothing when the loop contains an irreducible region.

* gcc.dg/torture/pr100492.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr100492.c | 26 +
 gcc/tree-loop-distribution.c| 10 ++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr100492.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr100492.c 
b/gcc/testsuite/gcc.dg/torture/pr100492.c
new file mode 100644
index 000..75229c8813b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr100492.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-loop-distribution" } */
+
+extern void abort (void);
+
+signed char a, c;
+int b, d, *e = &d, g;
+signed static char f;
+int main() {
+  int h = 0;
+  int a_ = a;
+  for (; a_ < 1; a = ++a_) {
+int *i[5], **j = &i[4], ***k[3][2] = {{&j}}, l = &k[2][1], *m = &l;
+char *n = &c;
+f = *e = g = 0;
+for (; g < 2; g++) {
+  for (b = 0; b < 3; b++)
+h = (h && (*n = 0)) == 0;
+  if (g)
+break;
+}
+  }
+  if (f != 0)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 8b91a30d1c6..65aa1df4aba 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -3203,6 +3203,16 @@ find_seed_stmts_for_distribution (class loop *loop, 
vec *work_list)
   /* Initialize the worklist with stmts we seed the partitions with.  */
   for (unsigned i = 0; i < loop->num_nodes; ++i)
 {
+  /* In irreducible sub-regions we don't know how to redirect
+conditions, so fail.  See PR100492.  */
+  if (bbs[i]->flags & BB_IRREDUCIBLE_LOOP)
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "loop %d contains an irreducible region.\n",
+loop->num);
+ work_list->truncate (0);
+ break;
+   }
   for (gphi_iterator gsi = gsi_start_phis (bbs[i]);
   !gsi_end_p (gsi); gsi_next (&gsi))
{
-- 
2.26.2
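
For readers unfamiliar with the term: an irreducible region is a cycle in the control-flow graph that can be entered at more than one block, so it has no single loop header. The sketch below is only an illustration of that shape in plain C. The function name count_irreducible is made up here, and this is not the PR100492 testcase.

```c
/* Hypothetical illustration only: the cycle formed by labels `a' and `b'
   can be entered at either label, so it has no unique header block and
   is irreducible.  */
int
count_irreducible (int enter_at_b, int n)
{
  int steps = 0;

  if (enter_at_b)
    goto b;			/* Second entry into the cycle.  */

a:
  steps += 1;
  if (steps >= n)
    return steps;
b:
  steps += 2;
  if (steps < n)
    goto a;
  return steps;
}
```

Because control can reach the cycle through either label, a pass that wants to rewrite an exit condition inside it cannot rely on a single, well-defined loop entry, which is presumably why the patch above simply refuses to distribute such loops.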


Re: [PATCH][PUSHED] gcc_update: fix check for local source tree.

2021-05-10 Thread Richard Biener via Gcc-patches
On Mon, May 10, 2021 at 1:06 PM Martin Liška  wrote:
>
> Pushed as obvious.

Of course reload.c is a particularly bad choice of file to check for ... ;)

Richard.

> Thanks,
> Martin
>
> contrib/ChangeLog:
>
> * gcc_update: Start using reload.c instead of version.c.
> ---
>   contrib/gcc_update | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/contrib/gcc_update b/contrib/gcc_update
> index 45a27b76cc3..04dc4907fd9 100755
> --- a/contrib/gcc_update
> +++ b/contrib/gcc_update
> @@ -243,7 +243,7 @@ apply_patch () {
>   }
>
>   # Check whether this indeed looks like a local tree.
> -if [ ! -f gcc/version.c ]; then
> +if [ ! -f gcc/reload.c ]; then
>   echo "This does not seem to be a GCC tree!"
>   exit
>   fi
> --
> 2.31.1
>


Re: [PATCH 1/9] arm: MVE: Convert vcmp[eq|ne]* in arm_mve.h to use only 's' builtin version

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping for the series?

On Fri, 30 Apr 2021 at 16:09, Christophe Lyon
 wrote:
>
> There is no need to have a signed and an unsigned version of these
> builtins. This is similar to what we do for Neon in arm_neon.h.
> This mechanical patch enables later cleanup patches.
>
> 2021-03-01  Christophe Lyon  
>
> gcc/
> * config/arm/arm_mve.h (__arm_vcmpeq*u*, __arm_vcmpne*u*): Call
> the 's' version of the builtin.
> ---
>  gcc/config/arm/arm_mve.h | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 3a40c6e..e4dfe91 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -3695,21 +3695,21 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpneq_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpneq_sv16qi ((int8x16_t)__a, (int8x16_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpneq_sv8hi ((int16x8_t)__a, (int16x8_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return __builtin_mve_vcmpneq_uv4si (__a, __b);
> +  return __builtin_mve_vcmpneq_sv4si ((int32x4_t)__a, (int32x4_t)__b);
>  }
>
>  __extension__ extern __inline int8x16_t
> @@ -3932,7 +3932,7 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpneq_n_sv16qi ((int8x16_t)__a, (int8_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
> @@ -3953,14 +3953,14 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpeqq_sv16qi ((int8x16_t)__a, (int8x16_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_n_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpeqq_n_sv16qi ((int8x16_t)__a, (int8_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
> @@ -4774,7 +4774,7 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u16 (uint16x8_t __a, uint16_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpneq_n_sv8hi ((int16x8_t)__a, (int16_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
> @@ -4795,14 +4795,14 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpeqq_sv8hi ((int16x8_t)__a, (int16x8_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_n_u16 (uint16x8_t __a, uint16_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_n_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpeqq_n_sv8hi ((int16x8_t)__a, (int16_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
> @@ -5616,7 +5616,7 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u32 (uint32x4_t __a, uint32_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_uv4si (__a, __b);
> +  return __builtin_mve_vcmpneq_n_sv4si ((int32x4_t)__a, (int32_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
> @@ -5637,14 +5637,14 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_uv4si (__a, __b);
> +  return __builtin_mve_vcmpeqq_sv4si ((int32x4_t)__a, (int32x4_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_n_u32 (uint32x4_t __a, uint32_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_n_uv4si (__a, __b);
> +  return __builtin_mve_vcmpeqq_n_sv4si ((int32x4_t)__a, (int32_t)__b);
>  }
>
>  __extension__ extern __inline mve_pred16_t
> --
> 2.7.4
>
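
The reasoning in the quoted patch (equality and inequality do not depend on signedness) can be checked with plain scalar C. This is a minimal sketch that assumes nothing about the MVE builtins themselves: eq_mask_u8 and eq_mask_s8 are hypothetical helper names, and the 16-bit mask is only loosely modelled on a per-lane predicate result.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16

/* Hypothetical helper: bit i is set when the unsigned lanes compare equal.  */
static uint16_t
eq_mask_u8 (const uint8_t *a, const uint8_t *b)
{
  uint16_t mask = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] == b[i])
      mask |= (uint16_t) (1u << i);
  return mask;
}

/* Same comparison after reinterpreting the bytes as signed lanes.  */
static uint16_t
eq_mask_s8 (const uint8_t *a, const uint8_t *b)
{
  int8_t sa[LANES], sb[LANES];
  uint16_t mask = 0;

  memcpy (sa, a, sizeof sa);	/* Reinterpret bits, no value conversion.  */
  memcpy (sb, b, sizeof sb);
  for (int i = 0; i < LANES; i++)
    if (sa[i] == sb[i])
      mask |= (uint16_t) (1u << i);
  return mask;
}
```

Both helpers return the same mask for any input, since two lanes hold equal values exactly when they hold identical bit patterns, whichever way the bits are interpreted. Ordered comparisons would not share this property.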


Re: [PATCH] testsuite/arm: Add mve-vsub-scalar-1.c test

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping?

On Fri, 30 Apr 2021 at 16:06, Christophe Lyon
 wrote:
>
> This patch adds a test similar to mve-vsub_1.c, but operating on a
> scalar as second argument. For the moment we do not select the T2 vsub
> variant operating on a scalar final argument, and we use vadd of the
> opposite.
>
> 2021-04-26  Christophe Lyon  
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vsub-scalar-1.c: New test.
> ---
>  .../gcc.target/arm/simd/mve-vsub-scalar-1.c| 47 
> ++
>  1 file changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c
> new file mode 100644
> index 000..61a9a0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)   \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, \
> +TYPE##BITS##_t *a) { \
> +int i; \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP 1; \
> +}  \
> +}
> +
> +/* 128-bit vectors.  */
> +FUNC_IMM(s, int, 32, 4, -, vsubimm)
> +FUNC_IMM(u, uint, 32, 4, -, vsubimm)
> +FUNC_IMM(s, int, 16, 8, -, vsubimm)
> +FUNC_IMM(u, uint, 16, 8, -, vsubimm)
> +FUNC_IMM(s, int, 8, 16, -, vsubimm)
> +FUNC_IMM(u, uint, 8, 16, -, vsubimm)
> +
> +/* For the moment we do not select the T2 vsub variant operating on a scalar
> +   final argument, and we use vadd of the opposite.  */
> +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, r[0-9]+} 2 
> { xfail *-*-* } } } */
> +
> +void test_vsubimm_f32 (float * dest, float * a) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] - 5.0;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, r[0-9]+} 1 
> { xfail *-*-* } } } */
> +
> +/* Note that dest[i] = a[i] + 5.0f16 is not vectorized.  */
> +void test_vsubimm_f16 (__fp16 * dest, __fp16 * a) {
> +  int i;
> +  __fp16 b = 5.0f16;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] - b;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, r[0-9]+} 1 
> { xfail *-*-* } } } */
> --
> 2.7.4
>
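
The remark that a scalar subtraction is currently emitted as a vadd of the opposite rests on modular arithmetic: for fixed-width integer lanes, subtracting s and adding the negation of s give the same result. A small scalar sketch of that identity follows; the function names are hypothetical and nothing here models the actual code generation.

```c
#include <stdint.h>

/* Subtract a scalar from each lane directly.  */
static void
sub_scalar_u8 (uint8_t *dest, const uint8_t *a, uint8_t s, int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = (uint8_t) (a[i] - s);
}

/* Add the negated scalar instead; results agree modulo 2^8.  */
static void
add_opposite_u8 (uint8_t *dest, const uint8_t *a, uint8_t s, int n)
{
  uint8_t neg = (uint8_t) (0u - s);
  for (int i = 0; i < n; i++)
    dest[i] = (uint8_t) (a[i] + neg);
}
```

The two loops produce identical arrays for every input, including the wrap-around case where a lane holds 0.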


Re: [PATCH] testsuite/arm: Add mve-vmul-scalar-1.c test

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping?

On Fri, 30 Apr 2021 at 16:06, Christophe Lyon
 wrote:
>
> Support for vmul has been present for a while, but it was lacking a
> test for the scalar variant.
>
> This patch adds one, precisely noting that we do not yet use the T2
> variants of vmul, which take a scalar as final argument.
>
> 2021-04-22  Christophe Lyon  
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vmul-scalar-1: New.
> ---
>  .../gcc.target/arm/simd/mve-vmul-scalar-1.c| 60 
> ++
>  1 file changed, 60 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> new file mode 100644
> index 000..22be452
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)   \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, \
> +TYPE##BITS##_t *a) { \
> +int i; \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP 5; \
> +}  \
> +}
> +
> +/* 128-bit vectors.  */
> +FUNC_IMM(s, int, 32, 4, *, vmulimm)
> +FUNC_IMM(u, uint, 32, 4, *, vmulimm)
> +FUNC_IMM(s, int, 16, 8, *, vmulimm)
> +FUNC_IMM(u, uint, 16, 8, *, vmulimm)
> +FUNC_IMM(s, int, 8, 16, *, vmulimm)
> +FUNC_IMM(u, uint, 8, 16, *, vmulimm)
> +
> +/* For the moment we do not select the T2 vmul variant operating on a scalar
> +   final argument.  */
> +/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vmul\.i8\tq[0-9]+, q[0-9]+, r[0-9]+} 2 
> { xfail *-*-* } } } */
> +
> +void test_vmul_f32 (float * dest, float * a, float * b) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] * b[1];
> +  }
> +}
> +void test_vmulimm_f32 (float * dest, float * a) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] * 5.0;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vmul\.f32\tq[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +
> +void test_vmul_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> +  int i;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] * b[i];
> +  }
> +}
> +
> +/* Note that dest[i] = a[i] * 5.0f16 is not vectorized.  */
> +void test_vmulimm_f16 (__fp16 * dest, __fp16 * a) {
> +  int i;
> +  __fp16 b = 5.0f16;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] * b;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> --
> 2.7.4
>


Re: [PATCH] testsuite/arm: Add mve-vadd-scalar-1.c test

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping?

On Fri, 30 Apr 2021 at 16:06, Christophe Lyon
 wrote:
>
> This patch adds a test for the scalar mode of vadd, precisely noting
> that we do not yet use the T2 variants of vadd, which take a scalar as
> final argument.
>
> 2021-04-22  Christophe Lyon  
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vadd-scalar-1: New.
> ---
>  .../gcc.target/arm/simd/mve-vadd-scalar-1.c| 47 
> ++
>  1 file changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> new file mode 100644
> index 000..bbf70e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)   \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, \
> +TYPE##BITS##_t *a) { \
> +int i; \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP 1; \
> +}  \
> +}
> +
> +/* 128-bit vectors.  */
> +FUNC_IMM(s, int, 32, 4, +, vaddimm)
> +FUNC_IMM(u, uint, 32, 4, +, vaddimm)
> +FUNC_IMM(s, int, 16, 8, +, vaddimm)
> +FUNC_IMM(u, uint, 16, 8, +, vaddimm)
> +FUNC_IMM(s, int, 8, 16, +, vaddimm)
> +FUNC_IMM(u, uint, 8, 16, +, vaddimm)
> +
> +/* For the moment we do not select the T2 vadd variant operating on a scalar
> +   final argument.  */
> +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, r[0-9]+} 
> 2 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, r[0-9]+} 2 
> { xfail *-*-* } } } */
> +
> +void test_vaddimm_f32 (float * dest, float * a) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] + 5.0;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, r[0-9]+} 1 
> { xfail *-*-* } } } */
> +
> +/* Note that dest[i] = a[i] + 5.0f16 is not vectorized.  */
> +void test_vaddimm_f16 (__fp16 * dest, __fp16 * a) {
> +  int i;
> +  __fp16 b = 5.0f16;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] + b;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, r[0-9]+} 1 
> { xfail *-*-* } } } */
> --
> 2.7.4
>


Re: [PATCH] testsuite/arm: Add mve-vadd-1.c test

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping?

On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
 wrote:
>
> Support for vadd has been present for a while, but it was lacking a
> test.
>
> 2021-04-22  Christophe Lyon  
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vadd-1.c: New.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c | 43 
> ++
>  1 file changed, 43 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> new file mode 100644
> index 000..15a9daa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, \
> +TYPE##BITS##_t *a, 
> TYPE##BITS##_t *b) { \
> +int i; \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP b[i];  \
> +}  \
> +}
> +
> +/* 128-bit vectors.  */
> +FUNC(s, int, 32, 4, +, vadd)
> +FUNC(u, uint, 32, 4, +, vadd)
> +FUNC(s, int, 16, 8, +, vadd)
> +FUNC(u, uint, 16, 8, +, vadd)
> +FUNC(s, int, 8, 16, +, vadd)
> +FUNC(u, uint, 8, 16, +, vadd)
> +
> +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, q[0-9]+} 
> 2 } } */
> +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, q[0-9]+} 
> 2 } } */
> +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, q[0-9]+} 2 
> } } */
> +
> +void test_vadd_f32 (float * dest, float * a, float * b) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] + b[i];
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, q[0-9]+} 1 
> } } */
> +
> +void test_vadd_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> +  int i;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] + b[i];
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, q[0-9]+} 1 
> } } */
> --
> 2.7.4
>


Re: [PATCH] testsuite/arm: Factorize and increase coverage in mve-sub_1.c

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping?

On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
 wrote:
>
> Use a template macro to factorize the existing test functions.
>
> This patch also adds a version to check subtraction with __fp16 type.
>
> 2021-04-26  Christophe Lyon  
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vsub_1.c: Factorize and add __fp16 test.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c | 60 
> +-
>  1 file changed, 21 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> index 842e5c6..5a6c345 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> @@ -5,60 +5,42 @@
>
>  #include 
>
> -void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
> -  int i;
> -  for (i=0; i<4; i++) {
> -dest[i] = a[i] - b[i];
> -  }
> +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, \
> +TYPE##BITS##_t *a, 
> TYPE##BITS##_t *b) { \
> +int i; \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP b[i];  \
> +}  \
>  }
>
> -void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
> -  int i;
> -  for (i=0; i<4; i++) {
> -dest[i] = a[i] - b[i];
> -  }
> -}
> +/* 128-bit vectors.  */
> +FUNC(s, int, 32, 4, -, vsub)
> +FUNC(u, uint, 32, 4, -, vsub)
> +FUNC(s, int, 16, 8, -, vsub)
> +FUNC(u, uint, 16, 8, -, vsub)
> +FUNC(s, int, 8, 16, -, vsub)
> +FUNC(u, uint, 8, 16, -, vsub)
>
>  /* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 
> 2 } } */
> -
> -void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
> -  int i;
> -  for (i=0; i<8; i++) {
> -dest[i] = a[i] - b[i];
> -  }
> -}
> -
> -void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
> -  int i;
> -  for (i=0; i<8; i++) {
> -dest[i] = a[i] - b[i];
> -  }
> -}
> -
>  /* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 
> 2 } } */
> +/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 
> } } */
>
> -void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
> -  int i;
> -  for (i=0; i<16; i++) {
> -dest[i] = a[i] - b[i];
> -  }
> -}
> -
> -void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
> +void test_vsub_f32 (float * dest, float * a, float * b) {
>int i;
> -  for (i=0; i<16; i++) {
> +  for (i=0; i<4; i++) {
>  dest[i] = a[i] - b[i];
>}
>  }
> +/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 
> 1 } } */
>
> -/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 
> } } */
>
> -void test_vsub_f32 (float * dest, float * a, float * b) {
> +void test_vsub_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
>int i;
> -  for (i=0; i<4; i++) {
> +  for (i=0; i<8; i++) {
>  dest[i] = a[i] - b[i];
>}
>  }
>
> -/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 
> 1 } } */
> +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 
> 1 } } */
>
> --
> 2.7.4
>
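
As an aside on the factorization technique: a single token-pasting macro can stamp out the whole family of per-type loops. The following is a portable sketch of that pattern, not the testsuite file itself. It drops the dg- directives, and the const qualifiers and the C99 restrict spelling are choices made for this illustration.

```c
#include <stdint.h>

/* One macro expands to a family of element-wise functions; names follow
   the pattern test_<NAME>_<SIGN><BITS>x<NB>.  */
#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
  void test_ ## NAME ## _ ## SIGN ## BITS ## x ## NB			\
    (TYPE ## BITS ## _t *restrict dest,					\
     const TYPE ## BITS ## _t *a, const TYPE ## BITS ## _t *b)		\
  {									\
    for (int i = 0; i < NB; i++)					\
      dest[i] = a[i] OP b[i];						\
  }

FUNC (s, int, 32, 4, -, vsub)	/* defines test_vsub_s32x4 */
FUNC (u, uint, 16, 8, -, vsub)	/* defines test_vsub_u16x8 */
```

Each FUNC line replaces one hand-written function, so adding a new element type or operator costs one line instead of a copied loop.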


Re: [PATCH] testsuite/arm: Improve mve-vshr.c

2021-05-10 Thread Christophe Lyon via Gcc-patches
Ping?

On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
 wrote:
>
> Vector right shifts by immediate use vshr, while right shifts by
> vectors instead use vneg and vshl.
>
> This patch adds the corresponding scan-assembler-times that were
> missing.
>
> 2021-04-22  Christophe Lyon  
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vshr.c: Add more scan-assembler-times.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index d4e658c..d4258e9 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -55,5 +55,12 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
>
>  /* MVE has only 128-bit vectors, so we can vectorize only half of the
> functions above.  */
> +/* Vector right shifts use vneg and left shifts.  */
> +/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
> +/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
> +/* { dg-final { scan-assembler-times {vneg.s[0-9]+  q[0-9]+, q[0-9]+} 6 } } 
> */
> +
> +
> +/* Shift by immediate.  */
>  /* { dg-final { scan-assembler-times {vshr.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
>  /* { dg-final { scan-assembler-times {vshr.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
> --
> 2.7.4
>
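
Editorial sketch: the vneg + vshl sequence checked by the test above can be modelled lane by lane. This is only a model, assuming that the register form of vshl treats a negative per-lane count as a right shift by that amount; the helper names `vshl_u` and `shift_right_by_vector` are hypothetical and cover unsigned lanes only.

```python
def vshl_u(value, shift, bits=8):
    """Model one unsigned lane of a register-form vector shift left.

    Assumption: a negative shift count means "shift right by -shift".
    """
    mask = (1 << bits) - 1
    value &= mask
    if shift >= 0:
        return (value << shift) & mask
    return value >> -shift


def shift_right_by_vector(values, counts, bits=8):
    """Right-shift each lane by its own count, as vneg followed by vshl."""
    negated = [-c for c in counts]                                # the vneg step
    return [vshl_u(v, n, bits) for v, n in zip(values, negated)]  # the vshl step
```

Under that assumption, a right shift by a vector of counts needs no dedicated instruction, which is why the test expects vneg and vshl for the vector case and vshr only for the immediate case.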


[PATCH] testsuite/100452 - fix g++.dg/vect/slp-pr99971.cc

2021-05-10 Thread Richard Biener
This makes sure to align data so targets without unaligned
accesses can vectorize it.

Tested on x86_64-unknown-linux-gnu and sparc-solaris by Rainer, pushed.

2021-05-10  Richard Biener  

PR testsuite/100452
* g++.dg/vect/slp-pr99971.cc: Align data.
---
 gcc/testsuite/g++.dg/vect/slp-pr99971.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr99971.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr99971.cc
index bec6418d4e8..cf22b3331d2 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr99971.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr99971.cc
@@ -22,7 +22,7 @@ struct A
   d -= that.d;
   return *this;
 }
-};
+} __attribute__((aligned(__BIGGEST_ALIGNMENT__)));
 
 void test(A& x, A const& y1, A const& y2)
 {
-- 
2.26.2


RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-10 Thread Richard Biener
On Fri, 7 May 2021, Tamar Christina wrote:

> Hi Richi,
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 7, 2021 12:46 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Wed, 5 May 2021, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch adds support for a dot product where the sign of the
> > > multiplication arguments differ. i.e. one is signed and one is
> > > unsigned but the precisions are the same.
> > >
> > > #define N 480
> > > #define SIGNEDNESS_1 unsigned
> > > #define SIGNEDNESS_2 signed
> > > #define SIGNEDNESS_3 signed
> > > #define SIGNEDNESS_4 unsigned
> > >
> > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > SIGNEDNESS_3 char *restrict a,
> > >SIGNEDNESS_4 char *restrict b)
> > > {
> > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > {
> > >   int av = a[i];
> > >   int bv = b[i];
> > >   SIGNEDNESS_2 short mult = av * bv;
> > >   res += mult;
> > > }
> > >   return res;
> > > }
> > >
> > > The operations are performed as if the operands were extended to a 32-bit
> > value.
> > > As such this operation isn't valid if there is an intermediate
> > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > >
> > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped
> > > the same optab is used but the operands are flipped in the optab
> > expansion.
> > >
> > > To support this the patch extends the dot-product detection to
> > > optionally ignore operands with different signs and stores this
> > > information in the optab subtype which is now made a bitfield.
> > >
> > > The subtype can now additionally control which optab an EXPR can expand
> > to.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * optabs.def (usdot_prod_optab): New.
> > >   * doc/md.texi: Document it.
> > >   * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > >   * optabs-tree.h (enum optab_subtype): Likewise.
> > >   * optabs.c (expand_widen_pattern_expr): Likewise.
> > >   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > >   * tree-vect-loop.c (vect_determine_dot_kind): New.
> > >   (vectorizable_reduction): Query dot-product kind.
> > >   * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > optional
> > >   optab subtype.
> > >   (vect_joust_widened_type, vect_widened_op_tree): Optionally
> > ignore
> > >   mismatch types.
> > >   (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> > d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > f2
> > > e66bc80d7d23 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > takes
> > > an additional mask operand  @item @samp{sdot_prod@var{m}}  @cindex
> > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > @samp{udot_prod@var{m}}
> > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > +@samp{usdot_prod@var{m}}
> > >  Compute the sum of the products of two signed/unsigned elements.
> > > -Operand 1 and operand 2 are of the same mode. Their product, which is
> > > of a -wider mode, is computed and added to operand 3. Operand 3 is of
> > > a mode equal or -wider than the mode of the product. The result is
> > > placed in operand 0, which -is of the same mode as operand 3.
> > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > +Their product, which is of a wider mode, is computed and added to
> > operand 3.
> > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > +The result is placed in operand 0, which is of the same mode as operand 
> > > 3.
> > 
> > This doesn't really say what the 's', 'u' and 'us' specify.  Since we're 
> > doing a
> > widen multiplication and then a non-widening addition we only need to
> > know the effective sign of the multiplication so I think the existing 's' 
> > and 'u'
> > are enough to cover all cases?
> 
> The existing 's' and 'u' enforce that both operands of the multiplication are 
> of the
> same sign.  So for e.g. 'u' both operand must be unsigned.
> 
> In the `us` case one can be signed and one unsigned. Operationally this does 
> a sign
> extension to the wider type for the signed value, and the unsigned value gets 
> zero extended
> first, and then converts it to unsigned to perform the
> unsigned multiplication, conforming to the C promotion rules.
> 
> TL;DR; Without a new optab I can't tell during expansion which semantic the 
> operation
> had at the gimple/C level as modes don't carry signs.
> 
> Long version:
> 
> T
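
Editorial sketch: the point made above, that modes do not carry signs and the same bytes therefore give different dot products depending on how each operand is extended, can be checked with a small model. The helper names and the sample byte values are hypothetical; the accumulator is a plain Python integer, so the truncation to `short` in the quoted C example is not modelled.

```python
def sign_extend8(byte):
    """Interpret the low 8 bits as a signed value."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def zero_extend8(byte):
    """Interpret the low 8 bits as an unsigned value."""
    return byte & 0xFF


def dot_prod(acc, a, b, extend_a, extend_b):
    """Widen each element with the given extension, multiply, accumulate."""
    for x, y in zip(a, b):
        acc += extend_a(x) * extend_b(y)
    return acc


A = [0xFF, 0x02]   # raw bytes; their meaning depends on the extension
B = [0xFE, 0x03]

both_signed = dot_prod(0, A, B, sign_extend8, sign_extend8)      # 's' semantics
both_unsigned = dot_prod(0, A, B, zero_extend8, zero_extend8)    # 'u' semantics
mixed_us = dot_prod(0, A, B, zero_extend8, sign_extend8)         # unsigned * signed
mixed_su = dot_prod(0, A, B, sign_extend8, zero_extend8)         # signed * unsigned
```

All four results differ for these inputs, and swapping the operands of one mixed form gives the other, which matches the remark that the flipped-sign case can reuse the same optab with the operands exchanged.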

Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Bernd Edlinger
Hi Eric,

I have a slightly different issue, last week it was still okay,
but now I get (using gcc-4.8 as bootstrap compiler):

gcc -std=gnu99 -c -g  -gnatpg -gnatwns -gnata -W -Wall -nostdinc -I- -I. 
-Iada/generated -Iada -Iada/gcc-interface -I../../gcc-trunk/gcc/ada 
-I../../gcc-trunk/gcc/ada/gcc-interface -Iada/libgnat 
-I../../gcc-trunk/gcc/ada/libgnat ../../gcc-trunk/gcc/ada/atree.adb -o 
ada/atree.o
atree.adb:569:30: "Shift_Right" is not visible (more references follow)
atree.adb:569:30: non-visible declaration at interfac.ads:147
atree.adb:569:30: non-visible declaration at interfac.ads:127
atree.adb:569:30: non-visible declaration at interfac.ads:107
atree.adb:569:30: non-visible declaration at interfac.ads:87
atree.adb:569:30: non-visible declaration at s-unstyp.ads:220
atree.adb:569:30: non-visible declaration at s-unstyp.ads:200
atree.adb:569:30: non-visible declaration at s-unstyp.ads:180
atree.adb:569:30: non-visible declaration at s-unstyp.ads:160
atree.adb:569:30: non-visible declaration at s-unstyp.ads:140
atree.adb:569:30: non-visible declaration at s-unstyp.ads:120
atree.adb:631:26: "Shift_Left" is not visible (more references follow)
atree.adb:631:26: non-visible declaration at interfac.ads:143
atree.adb:631:26: non-visible declaration at interfac.ads:123
atree.adb:631:26: non-visible declaration at interfac.ads:103
atree.adb:631:26: non-visible declaration at interfac.ads:83
atree.adb:631:26: non-visible declaration at s-unstyp.ads:216
atree.adb:631:26: non-visible declaration at s-unstyp.ads:196
atree.adb:631:26: non-visible declaration at s-unstyp.ads:176
atree.adb:631:26: non-visible declaration at s-unstyp.ads:156
atree.adb:631:26: non-visible declaration at s-unstyp.ads:136
atree.adb:631:26: non-visible declaration at s-unstyp.ads:116
make[3]: *** [ada/atree.o] Error 1
make[3]: *** Waiting for unfinished jobs


On 5/10/21 10:51 AM, Eric Botcazou wrote:
>> Ready for master?
> 
> This breaks the build for me:
> 
> make[3]: *** No rule to make target '/home/eric/cvs/gcc/gcc/version.c', 
> needed 
> by 'ada/stamp-sdefault'.  Stop.
> make[3]: *** Waiting for unfinished jobs
> 


Thanks
Bernd.


[PATCH] Fix awk substr invocation in libgo buildsystem

2021-05-10 Thread choeger
From: Christoph Höger 

The awk script used a zero-based index which worked on surprisingly
many platforms. According to the man page, however, the function
expects one-based indexing.

For reference see this bug in the go git repository:

https://github.com/golang/go/issues/45843

Signed-off-by: Christoph Höger 
---
 ChangeLog | 4 
 libgo/mklinknames.awk | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 2174ab1ea90..495e6f79b76 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2021-05-10  Christoph Höger  
+
+   * libgo/mklinknames.awk: Fix awk substr invocation
+
 2021-05-04  Nick Clifton  
 
* configure.ac (AC_PROG_CC): Replace with AC_PROG_CC_C99.
diff --git a/libgo/mklinknames.awk b/libgo/mklinknames.awk
index 71cb3be7966..0e49c07349e 100644
--- a/libgo/mklinknames.awk
+++ b/libgo/mklinknames.awk
@@ -37,7 +37,7 @@ BEGIN {
 # The goal is to extract "__timegm50".
 if ((def | getline fndef) > 0 && match(fndef, "__asm__\\(\"\\*?")) {
asmname = substr(fndef, RSTART + RLENGTH)
-   asmname = substr(asmname, 0, length(asmname) - 2)
+   asmname = substr(asmname, 1, length(asmname) - 2)
printf("//go:linkname %s %s\n", gofnname, asmname)
 } else {
# Assume the asm name is the same as the declared C name.
-- 
2.31.1



RE: [PATCH] testsuite/arm: Add mve-vsub-scalar-1.c test

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:07
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] testsuite/arm: Add mve-vsub-scalar-1.c test
> 
> This patch adds a test similar to mve-vsub_1.c, but operates on a
> scalar as second argument. For the moment we do not select the T2 vsub
> variant operating on a scalar final argument, and we use vadd of the
> opposite.

Ok.
Thanks,
Kyrill

> 
> 2021-04-26  Christophe Lyon  
> 
>   gcc/testsuite/
>   * gcc.target/arm/simd/mve-vsub-scalar-1.c: New test.
> ---
>  .../gcc.target/arm/simd/mve-vsub-scalar-1.c| 47
> ++
>  1 file changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-
> 1.c
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c
> b/gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c
> new file mode 100644
> index 000..61a9a0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub-scalar-1.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME) \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t *
> __restrict__ dest, \
> +  TYPE##BITS##_t *a) { \
> +int i;   \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP 1;   \
> +}   \
> +}
> +
> +/* 128-bit vectors.  */
> +FUNC_IMM(s, int, 32, 4, -, vsubimm)
> +FUNC_IMM(u, uint, 32, 4, -, vsubimm)
> +FUNC_IMM(s, int, 16, 8, -, vsubimm)
> +FUNC_IMM(u, uint, 16, 8, -, vsubimm)
> +FUNC_IMM(s, int, 8, 16, -, vsubimm)
> +FUNC_IMM(u, uint, 8, 16, -, vsubimm)
> +
> +/* For the moment we do not select the T2 vsub variant operating on a
> scalar
> +   final argument, and we use vadd of the opposite.  */
> +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +
> +void test_vsubimm_f32 (float * dest, float * a) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] - 5.0;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, r[0-9]+} 1
> { xfail *-*-* } } } */
> +
> +/* Note that dest[i] = a[i] + 5.0f16 is not vectorized.  */
> +void test_vsubimm_f16 (__fp16 * dest, __fp16 * a) {
> +  int i;
> +  __fp16 b = 5.0f16;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] - b;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, r[0-9]+} 1
> { xfail *-*-* } } } */
> --
> 2.7.4



RE: [PATCH] testsuite/arm: Add mve-vmul-scalar-1.c test

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:06
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] testsuite/arm: Add mve-vmul-scalar-1.c test
> 
> Support for vmul has been present for a while, but it was lacking a
> test for the scalar variant.
> 
> This patch adds one, precisely noting that we do not yet use the T2
> variants of vmul, which take a scalar as final argument.

Ok.
Thanks, I think the vmul-by-scalar code generation is something Victor is 
working on.
Kyrill

> 
> 2021-04-22  Christophe Lyon  
> 
>   gcc/testsuite/
>   * gcc.target/arm/simd/mve-vmul-scalar-1: New.
> ---
>  .../gcc.target/arm/simd/mve-vmul-scalar-1.c| 60
> ++
>  1 file changed, 60 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-
> 1.c
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> b/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> new file mode 100644
> index 000..22be452
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME) \
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t *
> __restrict__ dest, \
> +  TYPE##BITS##_t *a) { \
> +int i;   \
> +for (i=0; i<NB; i++) { \
> +  dest[i] = a[i] OP 5;   \
> +}   \
> +}
> +
> +/* 128-bit vectors.  */
> +FUNC_IMM(s, int, 32, 4, *, vmulimm)
> +FUNC_IMM(u, uint, 32, 4, *, vmulimm)
> +FUNC_IMM(s, int, 16, 8, *, vmulimm)
> +FUNC_IMM(u, uint, 16, 8, *, vmulimm)
> +FUNC_IMM(s, int, 8, 16, *, vmulimm)
> +FUNC_IMM(u, uint, 8, 16, *, vmulimm)
> +
> +/* For the moment we do not select the T2 vmul variant operating on a
> scalar
> +   final argument.  */
> +/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {vmul\.i8\tq[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +
> +void test_vmul_f32 (float * dest, float * a, float * b) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] * b[1];
> +  }
> +}
> +void test_vmulimm_f32 (float * dest, float * a) {
> +  int i;
> +  for (i=0; i<4; i++) {
> +dest[i] = a[i] * 5.0;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vmul\.f32\tq[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> +
> +void test_vmul_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> +  int i;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] * b[i];
> +  }
> +}
> +
> +/* Note that dest[i] = a[i] * 5.0f16 is not vectorized.  */
> +void test_vmulimm_f16 (__fp16 * dest, __fp16 * a) {
> +  int i;
> +  __fp16 b = 5.0f16;
> +  for (i=0; i<8; i++) {
> +dest[i] = a[i] * b;
> +  }
> +}
> +/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, r[0-9]+} 2
> { xfail *-*-* } } } */
> --
> 2.7.4



RE: [PATCH 1/9] arm: MVE: Convert vcmp[eq|ne]* in arm_mve.h to use only 's' builtin version

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 1/9] arm: MVE: Convert vcmp[eq|ne]* in arm_mve.h to use
> only 's' builtin version
> 
> There is no need to have a signed and an unsigned version of these
> builtins. This is similar to what we do for Neon in arm_neon.h.
> This mechanical patch enables later cleanup patches.

Ok.
Thanks, the patches up to 4/9 seem good mechanical clean ups, the code gen 
changes are after 5/9. I'll get to them soon...
Kyrill

> 
> 2021-03-01  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm_mve.h (__arm_vcmpeq*u*, __arm_vcmpne*u*):
> Call
>   the 's' version of the builtin.
> ---
>  gcc/config/arm/arm_mve.h | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 3a40c6e..e4dfe91 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -3695,21 +3695,21 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpneq_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpneq_sv16qi ((int8x16_t)__a, (int8x16_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpneq_sv8hi ((int16x8_t)__a, (int16x8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return __builtin_mve_vcmpneq_uv4si (__a, __b);
> +  return __builtin_mve_vcmpneq_sv4si ((int32x4_t)__a, (int32x4_t)__b);
>  }
> 
>  __extension__ extern __inline int8x16_t
> @@ -3932,7 +3932,7 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpneq_n_sv16qi ((int8x16_t)__a, (int8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
> @@ -3953,14 +3953,14 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpeqq_sv16qi ((int8x16_t)__a, (int8x16_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_n_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpeqq_n_sv16qi ((int8x16_t)__a, (int8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
> @@ -4774,7 +4774,7 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u16 (uint16x8_t __a, uint16_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpneq_n_sv8hi ((int16x8_t)__a, (int16_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
> @@ -4795,14 +4795,14 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpeqq_sv8hi ((int16x8_t)__a, (int16x8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_n_u16 (uint16x8_t __a, uint16_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_n_uv8hi (__a, __b);
> +  return __builtin_mve_vcmpeqq_n_sv8hi ((int16x8_t)__a, (int16_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
> @@ -5616,7 +5616,7 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u32 (uint32x4_t __a, uint32_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_uv4si (__a, __b);
> +  return __builtin_mve_vcmpneq_n_sv4si ((int32x4_t)__a, (int32_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
> @@ -5637,14 +5637,14 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_uv4si (__a, __b);
> +  return __builtin_mve_vcmpeqq_sv4si ((int32x4_t)__a, (int32x4_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial
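
Editorial sketch: the claim above that vcmpeq/vcmpne need only one builtin rests on equality being a property of the bit pattern, independent of signed or unsigned interpretation. The following model checks this exhaustively for 8-bit lanes; `as_signed` and the two comparison helpers are hypothetical names, not part of arm_mve.h.

```python
def as_signed(bits, width=8):
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def lanes_equal_unsigned(a, b):
    """Per-lane equality on the raw (unsigned) patterns."""
    return [x == y for x, y in zip(a, b)]


def lanes_equal_signed(a, b):
    """Per-lane equality after casting both operands to signed."""
    return [as_signed(x) == as_signed(y) for x, y in zip(a, b)]


# Equality agrees for every pair of 8-bit patterns ...
equality_is_sign_agnostic = all(
    (x == y) == (as_signed(x) == as_signed(y))
    for x in range(256) for y in range(256)
)

# ... whereas an ordered comparison does not: 0xFF > 0x01 only when unsigned.
ordering_depends_on_sign = (0xFF > 0x01) != (as_signed(0xFF) > as_signed(0x01))
```

This is why the casts to the signed vector types in the patch are harmless for eq/ne, while ordered comparisons keep distinct signed and unsigned forms.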

[PATCH] gcc-changelog: Accept ref_name argument in GitCommit.

2021-05-10 Thread Martin Liška

Hello.

As Jakub correctly noticed, we can remove ChangeLog locations only based
on branch (master, a release branch). The patch removes 2 locations for the
current master (or any further release) based on the current master.

gccadmin/hooks-bin/commit_checker needs to be updated correspondingly with:

    commits = parse_git_revisions(os.environ['GIT_DIR'], commit_rev, ref_name)

Thoughts?
Martin

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Remove ChangeLog locations
based on ref_name.
---
 contrib/gcc-changelog/git_commit.py | 29 +
 contrib/gcc-changelog/git_repository.py |  5 +++--
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index b28f7deac23..89b12cda712 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -19,8 +19,9 @@
 import difflib
 import os
 import re
+import sys
 
-changelog_locations = {
+default_changelog_locations = {
 'c++tools',
 'config',
 'contrib',
@@ -287,7 +288,7 @@ class GitInfo:
 
 
 class GitCommit:
-def __init__(self, info, strict=True, commit_to_info_hook=None):
+def __init__(self, info, strict=True, commit_to_info_hook=None, 
ref_name=None):
 self.original_info = info
 self.info = info
 self.message = None
@@ -300,6 +301,7 @@ class GitCommit:
 self.cherry_pick_commit = None
 self.revert_commit = None
 self.commit_to_info_hook = commit_to_info_hook
+self.init_changelog_locations(ref_name)
 
 # Skip Update copyright years commits
 if self.info.lines and self.info.lines[0] == 'Update copyright years.':
@@ -361,15 +363,14 @@ class GitCommit:
 else:
 return False
 
-@classmethod
-def find_changelog_location(cls, name):
+def find_changelog_location(self, name):
 if name.startswith('\t'):
 name = name[1:]
 if name.endswith(':'):
 name = name[:-1]
 if name.endswith('/'):
 name = name[:-1]
-return name if name in changelog_locations else None
+return name if name in self.changelog_locations else None
 
 @classmethod
 def format_git_author(cls, author):
@@ -389,6 +390,17 @@ class GitCommit:
 modified_files.append((parts[2], 'A'))
 return modified_files
 
+def init_changelog_locations(self, ref_name):
+self.changelog_locations = list(default_changelog_locations)
+if ref_name:
+version = sys.maxsize
+if ref_name.startswith('refs/heads/releases/gcc-'):
+version = int(ref_name.split('-')[-1])
+if version >= 12:
+# HSA and BRIG were removed in GCC 12
+self.changelog_locations.remove('gcc/brig')
+self.changelog_locations.remove('libhsail-rt')
+
 def parse_lines(self, all_are_ignored):
 body = self.info.lines
 
@@ -586,7 +598,7 @@ class GitCommit:
 for file in entry.files:
 location = self.get_file_changelog_location(file)
 if (location == ''
-   or (location and location in changelog_locations)):
+   or (location and location in self.changelog_locations)):
 if changelog and changelog != location:
 msg = 'could not deduce ChangeLog file, ' \
   'not unique location'
@@ -606,11 +618,10 @@ class GitCommit:
 return True
 return False
 
-@classmethod
-def get_changelog_by_path(cls, path):
+def get_changelog_by_path(self, path):
 components = path.split('/')
 while components:
-if '/'.join(components) in changelog_locations:
+if '/'.join(components) in self.changelog_locations:
 break
 components = components[:-1]
 return '/'.join(components)
diff --git a/contrib/gcc-changelog/git_repository.py 
b/contrib/gcc-changelog/git_repository.py
index a0e293d756d..501c0d931f5 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -29,7 +29,7 @@ except ImportError:
 from git_commit import GitCommit, GitInfo, decode_path
 
 
-def parse_git_revisions(repo_path, revisions, strict=True):
+def parse_git_revisions(repo_path, revisions, strict=True, ref_name=None):
 repo = Repo(repo_path)
 
 def commit_to_info(commit):
@@ -73,6 +73,7 @@ def parse_git_revisions(repo_path, revisions, strict=True):
 
 for commit in commits:
 git_commit = GitCommit(commit_to_info(commit.hexsha), strict=strict,
-   commit_to_info_hook=commit_to_info)
+   commit_to_info_hook=commit_to_info,
+   ref_name=ref_name)
 parsed_commits.a
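
Editorial sketch: the ref_name-based pruning introduced by `init_changelog_locations` in the patch above can be exercised standalone. This mirrors the logic of the hunk but is not the patch itself: `DEFAULT_CHANGELOG_LOCATIONS` is a hypothetical, shortened stand-in for the real set, and the function returns a set and uses `discard` where the patch builds a list and calls `remove`.

```python
import sys

# Hypothetical subset; the real default_changelog_locations is much longer.
DEFAULT_CHANGELOG_LOCATIONS = {
    'contrib', 'gcc', 'gcc/brig', 'libgo', 'libhsail-rt',
}


def changelog_locations_for(ref_name=None):
    """Return the ChangeLog locations that are valid for the given ref."""
    locations = set(DEFAULT_CHANGELOG_LOCATIONS)
    if ref_name:
        # Anything that is not a release branch (e.g. master) counts as newest.
        version = sys.maxsize
        if ref_name.startswith('refs/heads/releases/gcc-'):
            version = int(ref_name.split('-')[-1])
        if version >= 12:
            # HSA and BRIG were removed in GCC 12.
            locations.discard('gcc/brig')
            locations.discard('libhsail-rt')
    return locations
```

With no ref_name the defaults are returned unchanged, so existing callers that do not pass the new argument keep their current behaviour.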

Re: [PATCH/RFC] Add a new memory gathering optimization for loop (PR98598)

2021-05-10 Thread Richard Biener via Gcc-patches
On Fri, May 7, 2021 at 6:29 AM Feng Xue OS  wrote:
>
> >> gcc/
> >> PR tree-optimization/98598
> >> * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o.
> >> * common.opt (-ftree-loop-mgo): New option.
> >
> > Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a 
> > good
> > name.  -floop-mgo would be better but still I'd have no idea what this 
> > would do.
> >
> > I don't have a good suggestion here other than to expand it to
> > -floop-gather-memory (?!).
>
> OK. Better than "mgo", this abbr. is only a term for development use.
>
> > The option documentation isn't informative either.
> >
> > From:
> >
> >   outer-loop ()
> > {
> >   inner-loop (iter, iter_count)
> > {
> >   Type1 v1 = LOAD (iter);
> >   Type2 v2 = LOAD (v1);
> >   Type3 v3 = LOAD (v2);
> >   ...
> >   iter = NEXT (iter);
> > }
> > }
> >
> > To:
> >
> >   typedef struct cache_elem
> > {
> >   bool   init;
> >   Type1  c_v1;
> >   Type2  c_v2;
> >   Type3  c_v3;
> > } cache_elem;
> >
> >   cache_elem *cache_arr = calloc (iter_count, sizeof (cache_elem));
> >
> >   outer-loop ()
> > {
> >   size_t cache_idx = 0;
> >
> >   inner-loop (iter, iter_count)
> > {
> >   if (!cache_arr[cache_idx]->init)
> > {
> >   v1 = LOAD (iter);
> >   v2 = LOAD (v1);
> >   v3 = LOAD (v2);
> >
> >   cache_arr[cache_idx]->init = true;
> >   cache_arr[cache_idx]->c_v1 = v1;
> >   cache_arr[cache_idx]->c_v2 = v2;
> >   cache_arr[cache_idx]->c_v3 = v3;
> > }
> >   else
> > {
> >   v1 = cache_arr[cache_idx]->c_v1;
> >   v2 = cache_arr[cache_idx]->c_v2;
> >   v3 = cache_arr[cache_idx]->c_v3;
> > }
> >   ...
> >   cache_idx++;
> >   iter = NEXT (iter);
> > }
> > }
> >
> >   free (cache_arr);
> >
> > This is a _very_ special transform.  What it seems to do is
> > optimize the dependent loads for outer loop iteration n > 1
> > by caching the result(s).  If that's possible then you should
> > be able to distribute the outer loop to one doing the caching
> > and one using the cache.  Then this transform would be more
> > like a traditional array expansion of scalars?  In some cases
> > also loop interchange could remove the need for the caching.
> >
> > Doing MGO as the very first loop pass thus looks bad, I think
> > MGO should be much later, for example after interchange.
> > I also think that MGO should work in concert with loop
> > distribution (which could have an improved cost model)
> > rather than being a separate pass.
> >
> > Your analysis phase looks quite expensive, building sth
> > like a on-the side representation very closely matching SSA.
> > It seems to work from PHI defs to uses, which looks backwards.
>
> Did not catch this point very clearly. Would you please detail it more?

I don't remember exactly but you are building a lot of data structures
that resemble ones readily available when you do find_dep_loads.
You're looking at each and every stmt searching for operands
matching up sth and then you're walking SSA uses again.
And the searching uses linear vector walks (vec::contains).

> > You seem to roll your own dependence analysis code :/  Please
> > have a look at loop distribution.
> >
> > Also you build an actual structure type for reasons that escape
> > me rather than simply accessing the allocated storage at
> > appropriate offsets.
> >
> > I think simply calling 'calloc' isn't OK because you might need
> > aligned storage and because calloc might not be available.
> > Please at least use 'malloc' and make sure MALLOC_ABI_ALIGNMENT
> > is large enough for the data you want to place (or perform
> > dynamic re-alignment yourself).  We probably want some generic
> > middle-end utility to obtain aligned allocated storage at some
> > point.
> >
> > As said above I think you want to re-do this transform as
> > a loop distribution transform.  I think if caching works then
> > the loads should be distributable and the loop distribution
> > transform should be enhanced to expand the scalars to arrays.
>
> I checked the loop distribution code, and its trigger strategy seems
> to be very conservative: it currently targets only simple, regular
> index-based loops and cannot handle linked-list traversal, which
> consists of a series of discrete memory accesses and is where MGO
> would matter a lot. Additionally, for some complicated cases, we
> cannot completely decompose MGO into two separate loops for
> "do caching" and "use caching" respectively. An example:
>
> for (i = 0; i < N; i++)
>   {
> for (j = 0; j < i; j++)
>{
>Type1 v1 = LOAD_FN1 (j);
>Type2 v2 = LOAD_FN2 (v1);
>Type3 v3 = LOAD_FN3 (v2);
>
>...
>
>condition = ...
>   
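
Editorial sketch: the From/To transform quoted earlier in this message can be imitated on a linked list to show where the saving comes from. This is an illustration only, not the proposed pass; it assumes the loaded values do not change between outer iterations and that the inner loop visits the same elements in the same order each time. All class and function names are hypothetical.

```python
class Leaf:
    def __init__(self, value):
        self.value = value


class Mid:
    def __init__(self, leaf):
        self.leaf = leaf


class Node:
    def __init__(self, mid, next_node):
        self.mid = mid
        self.next = next_node


def build_list(values):
    """Build a singly linked list whose payload sits behind two pointers."""
    head = None
    for v in reversed(values):
        head = Node(Mid(Leaf(v)), head)
    return head


def baseline(head, outer_iters):
    """Original loop nest: the dependent load chain runs on every visit."""
    total = chains = 0
    for _ in range(outer_iters):
        node = head
        while node is not None:
            total += node.mid.leaf.value   # LOAD(LOAD(LOAD(iter)))
            chains += 1
            node = node.next               # iter = NEXT (iter)
    return total, chains


def with_cache(head, outer_iters, iter_count):
    """Transformed loop nest: the chain runs once per inner index."""
    cache = [None] * iter_count            # None plays the role of !init
    total = chains = 0
    for _ in range(outer_iters):
        cache_idx = 0
        node = head
        while node is not None:
            if cache[cache_idx] is None:
                cache[cache_idx] = node.mid.leaf.value
                chains += 1
            total += cache[cache_idx]
            cache_idx += 1
            node = node.next
    return total, chains
```

Both versions still walk the list itself; only the chain of dependent loads hanging off each element is replaced by a contiguous array access after the first outer iteration.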

RE: [PATCH 2/9] arm: MVE: Cleanup vcmpne/vcmpeq builtins

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 2/9] arm: MVE: Cleanup vcmpne/vcmpeq builtins
> 
> After the previous patch, we no longer need to emit the unsigned
> variants of vcmpneq/vcmpeqq. This patch removes them as well as the
> corresponding iterator entries.

Ok.
Thanks,
Kyrill

> 
> 2021-03-01  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm_mve_builtins.def (vcmpneq_u): Remove.
>   (vcmpneq_n_u): Likewise.
>   (vcmpeqq_u,): Likewise.
>   (vcmpeqq_n_u): Likewise.
>   * config/arm/iterators.md (supf): Remove VCMPNEQ_U,
> VCMPEQQ_U,
>   VCMPEQQ_N_U and VCMPNEQ_N_U.
>   * config/arm/mve.md (mve_vcmpneq): Remove  iteration.
>   (mve_vcmpeqq_n): Likewise.
>   (mve_vcmpeqq): Likewise.
>   (mve_vcmpneq_n): Likewise.
> 
> arm_mve_builtins.def: Remove vcmpneq_u, vcmpneq_n_u, vcmpeqq_u,
> vcmpeqq_n_u.
> iterators.md: Update VCMPNEQ VCMPEQQ VCMPEQQ_N VCMPNEQ_N
> mve.md: Remove vcmpneq_s vcmpeqq_n_u vcmpeqq_u, vcmpneq_n_u,
> ---
>  gcc/config/arm/arm_mve_builtins.def |  4 
>  gcc/config/arm/iterators.md | 15 +++
>  gcc/config/arm/mve.md   | 16 
>  3 files changed, 15 insertions(+), 20 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve_builtins.def
> b/gcc/config/arm/arm_mve_builtins.def
> index 460f6ba..ee34fd1 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -90,7 +90,6 @@ VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi,
> v8hi, v4si)
>  VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
>  VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_s, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpneq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_u, v16qi, v8hi, v4si)
> @@ -118,11 +117,8 @@ VAR3 (BINOP_UNONE_UNONE_UNONE,
> vhsubq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpneq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_u, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpeqq_u, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpeqq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8fb723e..0aba93f 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -1279,13 +1279,12 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s")
> (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>  (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
>  (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U
> "u")
>  (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s")
> (VSHLQ_S "s")
> -(VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
> +(VADDLVQ_P_U "u") (VCMPNEQ_S "s")
>  (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
>  (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
>  (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBRSRQ_N_S "s")
> -(VBRSRQ_N_U "u") (VCMPEQQ_S "s") (VCMPEQQ_U "u")
> -(VCMPEQQ_N_S "s") (VCMPEQQ_N_U "u")
> (VCMPNEQ_N_S "s")
> -(VCMPNEQ_N_U "u")
> +(VBRSRQ_N_U "u") (VCMPEQQ_S "s")
> +(VCMPEQQ_N_S "s") (VCMPNEQ_N_S "s")
>  (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s")
>  (VHADDQ_U "u") (VHSUBQ_N_S "s")
>   (VHSUBQ_N_U "u")
>  (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u")
> (VHSUBQ_U "u")
> @@ -1541,16 +1540,16 @@ (define_int_iterator VCREATEQ [VCREATEQ_U
> VCREATEQ_S])
>  (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U])
>  (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S
> VCVTQ_N_FROM_F_U])
>  (define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U])
> -(define_int_iterator VCMPNEQ [VCMPNEQ_U VCMPNEQ_S])
> +(define_int_iterator VCMPNEQ [VCMPNEQ_S])
>  (define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U])
>  (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
>  (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
>  (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
>  (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
>  (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
> -(define_int_iterator VCMPEQQ [VCMPEQQ_U VCMPEQQ_S])
> -(de

RE: [PATCH 3/9] arm: MVE: Remove _s and _u suffixes from vcmp* builtins.

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 3/9] arm: MVE: Remove _s and _u suffixes from vcmp*
> builtins.
> 
> This patch brings more unification in the vector comparison builtins,
> by removing the useless 's' (signed) suffix since we no longer need
> unsigned versions.
> 

Ok.
Thanks,
Kyrill

> 2021-03-01  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm_mve.h (__arm_vcmp*): Remove 's' suffix.
>   * config/arm/arm_mve_builtins.def (vcmp*): Remove 's' suffix.
>   * config/arm/mve.md (mve_vcmp*): Remove 's' suffix in pattern
>   names.
> ---
>  gcc/config/arm/arm_mve.h| 120 ++-
> -
>  gcc/config/arm/arm_mve_builtins.def |  32 +-
>  gcc/config/arm/mve.md   |  64 +--
>  3 files changed, 108 insertions(+), 108 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index e4dfe91..5d78269 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -3674,42 +3674,42 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_s8 (int8x16_t __a, int8x16_t __b)
>  {
> -  return __builtin_mve_vcmpneq_sv16qi (__a, __b);
> +  return __builtin_mve_vcmpneq_v16qi (__a, __b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_s16 (int16x8_t __a, int16x8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_sv8hi (__a, __b);
> +  return __builtin_mve_vcmpneq_v8hi (__a, __b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_s32 (int32x4_t __a, int32x4_t __b)
>  {
> -  return __builtin_mve_vcmpneq_sv4si (__a, __b);
> +  return __builtin_mve_vcmpneq_v4si (__a, __b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpneq_sv16qi ((int8x16_t)__a, (int8x16_t)__b);
> +  return __builtin_mve_vcmpneq_v16qi ((int8x16_t)__a, (int8x16_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_sv8hi ((int16x8_t)__a, (int16x8_t)__b);
> +  return __builtin_mve_vcmpneq_v8hi ((int16x8_t)__a, (int16x8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return __builtin_mve_vcmpneq_sv4si ((int32x4_t)__a, (int32x4_t)__b);
> +  return __builtin_mve_vcmpneq_v4si ((int32x4_t)__a, (int32x4_t)__b);
>  }
> 
>  __extension__ extern __inline int8x16_t
> @@ -3932,49 +3932,49 @@ __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmpneq_n_sv16qi ((int8x16_t)__a, (int8_t)__b);
> +  return __builtin_mve_vcmpneq_n_v16qi ((int8x16_t)__a, (int8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmphiq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmphiq_uv16qi (__a, __b);
> +  return __builtin_mve_vcmphiq_v16qi (__a, __b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmphiq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmphiq_n_uv16qi (__a, __b);
> +  return __builtin_mve_vcmphiq_n_v16qi (__a, __b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_sv16qi ((int8x16_t)__a, (int8x16_t)__b);
> +  return __builtin_mve_vcmpeqq_v16qi ((int8x16_t)__a, (int8x16_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpeqq_n_u8 (uint8x16_t __a, uint8_t __b)
>  {
> -  return __builtin_mve_vcmpeqq_n_sv16qi ((int8x16_t)__a, (int8_t)__b);
> +  return __builtin_mve_vcmpeqq_n_v16qi ((int8x16_t)__a, (int8_t)__b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpcsq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return __builtin_mve_vcmpcsq_uv16qi (__a, __b);
> +  return __builtin_mve_vcmpcsq_v16qi (__a, __b);
>  }
> 
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__al

RE: [PATCH 4/9] arm: MVE: Factorize all vcmp* integer patterns

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 4/9] arm: MVE: Factorize all vcmp* integer patterns
> 
> After removing the signed and unsigned suffixes in the previous
> patches, we can now factorize the vcmp* patterns: there is no longer
> an asymmetry where operators do not have the same set of signed and
> unsigned variants.
> 
> This will make maintenance easier.

Ok.
Thanks,
Kyrill
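
The factorization can be pictured as a table lookup: each comparison code selects a condition name and a type letter, and a single template produces the mnemonic. The sketch below is a simplified model in plain C, not GCC's machine-description machinery; the enum, struct and function names are hypothetical, the q1/q2 operands are fixed placeholders, and only the table contents are taken from the mve_cmp_op and mve_cmp_type attributes in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the codes listed in MVE_COMPARISONS.  */
enum cmp_code
{
  CMP_EQ, CMP_GE, CMP_GEU, CMP_GT, CMP_GTU, CMP_LE, CMP_LT, CMP_NE,
  CMP_COUNT
};

struct cmp_info
{
  const char *op;    /* Mirrors mve_cmp_op.  */
  const char *type;  /* Mirrors mve_cmp_type.  */
};

static const struct cmp_info cmp_table[CMP_COUNT] = {
  [CMP_EQ]  = { "eq", "i" },
  [CMP_GE]  = { "ge", "s" },
  [CMP_GEU] = { "cs", "u" },
  [CMP_GT]  = { "gt", "s" },
  [CMP_GTU] = { "hi", "u" },
  [CMP_LE]  = { "le", "s" },
  [CMP_LT]  = { "lt", "s" },
  [CMP_NE]  = { "ne", "i" },
};

/* Return nonzero if CODE maps to an unsigned comparison.  */
int
cmp_is_unsigned (enum cmp_code code)
{
  assert (code < CMP_COUNT);
  return strcmp (cmp_table[code].type, "u") == 0;
}

/* Write a simplified vector-vector compare for CODE and ELEM_BITS into
   BUF and return what snprintf returns.  */
int
format_vcmp (char *buf, size_t len, enum cmp_code code, int elem_bits)
{
  assert (code < CMP_COUNT);
  return snprintf (buf, len, "vcmp.%s%d %s, q1, q2",
                   cmp_table[code].type, elem_bits, cmp_table[code].op);
}
```

With one table and one template, adding or removing a comparison is a one-line change, which is the maintenance benefit the patch description refers to.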

> 
> MVE has a different set of vector comparison operators than Neon,
> so we have to introduce dedicated iterators.
> 
> 2021-03-01  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (MVE_COMPARISONS): New.
>   (mve_cmp_op): New.
>   (mve_cmp_type): New.
>   * config/arm/mve.md (mve_vcmpq_): New,
> merge all
>   mve_vcmp patterns.
>   (mve_vcmpneq_, mve_vcmpcsq_n_,
> mve_vcmpcsq_)
>   (mve_vcmpeqq_n_, mve_vcmpeqq_,
> mve_vcmpgeq_n_)
>   (mve_vcmpgeq_, mve_vcmpgtq_n_,
> mve_vcmpgtq_)
>   (mve_vcmphiq_n_, mve_vcmphiq_,
> mve_vcmpleq_n_)
>   (mve_vcmpleq_, mve_vcmpltq_n_,
> mve_vcmpltq_)
>   (mve_vcmpneq_n_, mve_vcmpltq_n_,
> mve_vcmpltq_)
>   (mve_vcmpneq_n_): Remove.
> ---
>  gcc/config/arm/iterators.md |   8 ++
>  gcc/config/arm/mve.md   | 250 
> 
>  2 files changed, 27 insertions(+), 231 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 0aba93f..29347f7 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -285,6 +285,8 @@ (define_code_iterator GTUGEU [gtu geu])
> 
>  ;; Comparisons for vc
>  (define_code_iterator COMPARISONS [eq gt ge le lt])
> +;; Comparisons for MVE
> +(define_code_iterator MVE_COMPARISONS [eq ge geu gt gtu le lt ne])
> 
>  ;; A list of ...
>  (define_code_iterator IOR_XOR [ior xor])
> @@ -336,8 +338,14 @@ (define_code_attr arith_shift_insn
>  (define_code_attr cmp_op [(eq "eq") (gt "gt") (ge "ge") (lt "lt") (le "le")
>(gtu "gt") (geu "ge")])
> 
> +(define_code_attr mve_cmp_op [(eq "eq") (gt "gt") (ge "ge") (lt "lt") (le 
> "le")
> +  (gtu "hi") (geu "cs") (ne "ne")])
> +
>  (define_code_attr cmp_type [(eq "i") (gt "s") (ge "s") (lt "s") (le "s")])
> 
> +(define_code_attr mve_cmp_type [(eq "i") (gt "s") (ge "s") (lt "s") (le "s")
> +(gtu "u") (geu "u") (ne "i")])
> +
>  (define_code_attr vfml_op [(plus "a") (minus "s")])
> 
>  (define_code_attr ss_op [(ss_plus "qadd") (ss_minus "qsub")])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index e9f095d..40baff7 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -836,17 +836,30 @@ (define_insn "mve_vaddlvq_p_v4si"
> (set_attr "length""8")])
> 
>  ;;
> -;; [vcmpneq_])
> +;; [vcmpneq_, vcmpcsq_, vcmpeqq_, vcmpgeq_, vcmpgtq_, vcmphiq_,
> vcmpleq_, vcmpltq_])
>  ;;
> -(define_insn "mve_vcmpneq_"
> +(define_insn "mve_vcmpq_"
>[
> (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
> - (match_operand:MVE_2 2 "s_register_operand" "w")]
> -  VCMPNEQ))
> + (MVE_COMPARISONS:HI (match_operand:MVE_2 1
> "s_register_operand" "w")
> + (match_operand:MVE_2 2 "s_register_operand" "w")))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vcmp.%#  , %q1, %q2"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vcmpcsq_n_, vcmpeqq_n_, vcmpgeq_n_, vcmpgtq_n_, vcmphiq_n_,
> vcmpleq_n_, vcmpltq_n_, vcmpneq_n_])
> +;;
> +(define_insn "mve_vcmpq_n_"
> +  [
> +   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> + (MVE_COMPARISONS:HI (match_operand:MVE_2 1
> "s_register_operand" "w")
> + (match_operand: 2 "s_register_operand" "r")))
>]
>"TARGET_HAVE_MVE"
> -  "vcmp.i%#  ne, %q1, %q2"
> +  "vcmp.%#  , %q1, %2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -1005,231 +1018,6 @@ (define_expand "cadd3"
>  )
> 
>  ;;
> -;; [vcmpcsq_n_])
> -;;
> -(define_insn "mve_vcmpcsq_n_"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
> - (match_operand: 2 "s_register_operand" "r")]
> -  VCMPCSQ_N_U))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vcmp.u%#   cs, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpcsq_])
> -;;
> -(define_insn "mve_vcmpcsq_"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
> - (match_operand:MVE_2 2 "s_register_operand" "w")]
> -  VCMPCSQ_U))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vcmp.u%#   cs, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpeqq_n_])
> -;;
> -(define_insn "mve_vcmpeqq_n_"
> -  [
> -   (set (match_operand:HI 0 "vpr_regis

RE: [PATCH 5/9] arm: MVE: Factorize vcmp_*f*

2021-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 5/9] arm: MVE: Factorize vcmp_*f*
> 
> As in the previous patch, we factorize the vcmp_*f* patterns to make
> maintenance easier.

Ok.
Thanks,
Kyrill

> 
> 2021-03-12  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (MVE_FP_COMPARISONS): New.
>   * config/arm/mve.md (mve_vcmpq_f)
>   (mve_vcmpq_n_f): New, merge all vcmp_*f*
>   patterns.
>   (mve_vcmpeqq_f, mve_vcmpeqq_n_f,
> mve_vcmpgeq_f)
>   (mve_vcmpgeq_n_f, mve_vcmpgtq_f)
>   (mve_vcmpgtq_n_f, mve_vcmpleq_f)
>   (mve_vcmpleq_n_f, mve_vcmpltq_f)
>   (mve_vcmpltq_n_f, mve_vcmpneq_f)
>   (mve_vcmpneq_n_f): Remove.
>   * config/arm/unspecs.md (VCMPEQQ_F, VCMPEQQ_N_F,
> VCMPGEQ_F)
>   (VCMPGEQ_N_F, VCMPGTQ_F, VCMPGTQ_N_F, VCMPLEQ_F,
> VCMPLEQ_N_F)
>   (VCMPLTQ_F, VCMPLTQ_N_F, VCMPNEQ_F, VCMPNEQ_N_F):
> Remove.
> ---
>  gcc/config/arm/iterators.md |   1 +
>  gcc/config/arm/mve.md   | 172 
> +++-
>  gcc/config/arm/unspecs.md   |  12 
>  3 files changed, 11 insertions(+), 174 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 29347f7..95df8bd 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -287,6 +287,7 @@ (define_code_iterator GTUGEU [gtu geu])
>  (define_code_iterator COMPARISONS [eq gt ge le lt])
>  ;; Comparisons for MVE
>  (define_code_iterator MVE_COMPARISONS [eq ge geu gt gtu le lt ne])
> +(define_code_iterator MVE_FP_COMPARISONS [eq ge gt le lt ne])
> 
>  ;; A list of ...
>  (define_code_iterator IOR_XOR [ior xor])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 40baff7..7c846a4 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1926,182 +1926,30 @@ (define_insn "mve_vcaddq"
>  ])
> 
>  ;;
> -;; [vcmpeqq_f])
> +;; [vcmpeqq_f, vcmpgeq_f, vcmpgtq_f, vcmpleq_f, vcmpltq_f, vcmpneq_f])
>  ;;
> -(define_insn "mve_vcmpeqq_f"
> +(define_insn "mve_vcmpq_f"
>[
> (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
> - (match_operand:MVE_0 2 "s_register_operand" "w")]
> -  VCMPEQQ_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcmp.f%#   eq, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpeqq_n_f])
> -;;
> -(define_insn "mve_vcmpeqq_n_f"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
> - (match_operand: 2 "s_register_operand" "r")]
> -  VCMPEQQ_N_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcmp.f%#   eq, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpgeq_f])
> -;;
> -(define_insn "mve_vcmpgeq_f"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
> - (match_operand:MVE_0 2 "s_register_operand" "w")]
> -  VCMPGEQ_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcmp.f%#   ge, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpgeq_n_f])
> -;;
> -(define_insn "mve_vcmpgeq_n_f"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
> - (match_operand: 2 "s_register_operand" "r")]
> -  VCMPGEQ_N_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcmp.f%#   ge, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpgtq_f])
> -;;
> -(define_insn "mve_vcmpgtq_f"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
> - (match_operand:MVE_0 2 "s_register_operand" "w")]
> -  VCMPGTQ_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcmp.f%#   gt, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vcmpgtq_n_f])
> -;;
> -(define_insn "mve_vcmpgtq_n_f"
> -  [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> - (unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
> - (match_operand: 2 "s_register_operand" "r")]
> -  VCMPGTQ_N_F))
> + (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1
> "s_register_operand" "w")
> +(match_operand:MVE_0 2 "s_register_operand"
> "w")))
>]
>"TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcmp.f%#   gt, %q1, %2"
> +  "vcmp.f%#   , %q1, %q2"
>[(set_attr "type" "mve_move")
>  ])
> 
>  ;;
> -;; [vcmpleq_f])
> +;; [vcmpeqq_n_f, vcmpgeq_n_f, vcmpgtq_n_f, vcmpleq_n_f, vcmpltq_n_f,
> vcmpneq_n_f])
>  ;;
> -(define_insn "mve_vcmpleq_f"
> +(define_i

Re: [PATCH] AArch64: Improve GOT addressing

2021-05-10 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches  writes:
> Improve GOT addressing by emitting the instructions as a pair.  This reduces
> register pressure and improves code quality. With -fPIC codesize improves by
> 0.65% and SPECINT2017 improves by 0.25%.
>
> Passes bootstrap and regress. OK for commit?

Normally we should only put two instructions in the same define_insn
if there's a specific ABI or architectural reason for not separating
them.  Doing it purely for optimisation reasons is going against the
general direction of travel.  So I think the first question is: why
don't we simply delay the split until after reload instead, since
that's the more normal way of handling this kind of thing?

Also, the patch means that we use RTL of the form:

  (set (reg:PTR R)
   (unspec:PTR [(mem:PTR (symbol_ref:PTR S))]
   UNSPEC_GOTSMALLPIC))

to represent the move of S into R.  This should just be represented as:

  (set (reg:PTR R) (symbol_ref:PTR S))

and go through the normal move patterns.
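
For context, the instruction pair under discussion is the one that loads a symbol's address from the GOT. The following is a minimal source-level sketch, assuming an AArch64 ELF target, the small code model and -fPIC; the variable and function names are made up, and the assembly in the comment is what one would typically expect rather than output captured from this patch.

```c
#include <assert.h>

/* A global with default visibility.  When this file is compiled with
   -fPIC, the symbol may be preempted at link time, so its address is
   normally fetched from the GOT rather than computed directly.  */
int counter = 7;

/* On AArch64 the address load is typically an ADRP+LDR pair such as:
       adrp  x0, :got:counter
       ldr   x0, [x0, #:got_lo12:counter]
   followed by the load of the value itself.  */
int
read_counter (void)
{
  return counter;
}

/* Add BY to the counter and return the new value (illustration only).  */
int
bump_counter (int by)
{
  assert (by >= 0);
  counter += by;
  return counter;
}
```

Compiling this with -O2 -fPIC -S on an AArch64 toolchain is one way to inspect how the two instructions are scheduled relative to the surrounding code.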

Thanks,
Richard

> ChangeLog:
> 2021-05-05  Wilco Dijkstra  
>
> * config/aarch64/aarch64.md (ldr_got_small_): Emit ADRP+LDR GOT 
> sequence.
> (ldr_got_small_sidi): Likewise.
> * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): 
> Remove tmp_reg.
> (aarch64_print_operand): Correctly print got_lo12 in L specifier.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..32c5c76d3c001a79d2a69b7f8243f1f1f605f901
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3625,27 +3625,21 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
>  
>   rtx insn;
>   rtx mem;
> - rtx tmp_reg = dest;
>   machine_mode mode = GET_MODE (dest);
>  
> - if (can_create_pseudo_p ())
> -   tmp_reg = gen_reg_rtx (mode);
> -
> - emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
>   if (mode == ptr_mode)
> {
>   if (mode == DImode)
> -   insn = gen_ldr_got_small_di (dest, tmp_reg, imm);
> +   insn = gen_ldr_got_small_di (dest, imm);
>   else
> -   insn = gen_ldr_got_small_si (dest, tmp_reg, imm);
> +   insn = gen_ldr_got_small_si (dest, imm);
>  
>   mem = XVECEXP (SET_SRC (insn), 0, 0);
> }
>   else
> {
>   gcc_assert (mode == Pmode);
> -
> - insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm);
> + insn = gen_ldr_got_small_sidi (dest, imm);
>   mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
> }
>  
> @@ -11019,7 +11013,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>switch (aarch64_classify_symbolic_expression (x))
>   {
>   case SYMBOL_SMALL_GOT_4G:
> -   asm_fprintf (asm_out_file, ":lo12:");
> +   asm_fprintf (asm_out_file, ":got_lo12:");
> break;
>  
>   case SYMBOL_SMALL_TLSGD:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> abfd84526745d029ad4953eabad6dd17b159a218..36c5c054f86e9cdd1f0945cdbc1beb47aa7ad80a
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6705,25 +6705,23 @@ (define_insn "add_losym_"
>  
>  (define_insn "ldr_got_small_"
>[(set (match_operand:PTR 0 "register_operand" "=r")
> - (unspec:PTR [(mem:PTR (lo_sum:PTR
> -   (match_operand:PTR 1 "register_operand" "r")
> -   (match_operand:PTR 2 "aarch64_valid_symref" 
> "S")))]
> + (unspec:PTR [(mem:PTR (match_operand:PTR 1 "aarch64_valid_symref" "S"))]
>   UNSPEC_GOTSMALLPIC))]
>""
> -  "ldr\\t%0, [%1, #:got_lo12:%c2]"
> -  [(set_attr "type" "load_")]
> +  "adrp\\t%0, %A1\;ldr\\t%0, [%0, %L1]"
> +  [(set_attr "type" "load_")
> +   (set_attr "length" "8")]
>  )
>  
>  (define_insn "ldr_got_small_sidi"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (zero_extend:DI
> -  (unspec:SI [(mem:SI (lo_sum:DI
> -  (match_operand:DI 1 "register_operand" "r")
> -  (match_operand:DI 2 "aarch64_valid_symref" "S")))]
> +  (unspec:SI [(mem:SI (match_operand:DI 1 "aarch64_valid_symref" "S"))]
>   UNSPEC_GOTSMALLPIC)))]
>"TARGET_ILP32"
> -  "ldr\\t%w0, [%1, #:got_lo12:%c2]"
> -  [(set_attr "type" "load_4")]
> +  "adrp\\t%0, %A1\;ldr\\t%w0, [%0, %L1]"
> +  [(set_attr "type" "load_4")
> +   (set_attr "length" "8")]
>  )
>  
>  (define_insn "ldr_got_small_28k_"


Re: [PATCH 1/2] REE: PR rtl-optimization/100264: Handle more PARALLEL SET expressions

2021-05-10 Thread Christoph Müllner via Gcc-patches
On Thu, May 6, 2021 at 5:29 AM Jim Wilson  wrote:
>
> On Fri, Apr 30, 2021 at 4:10 PM Christoph Müllner via Gcc-patches 
>  wrote:
>>
>> On Sat, May 1, 2021 at 12:48 AM Jeff Law  wrote:
>> > On 4/26/2021 5:38 AM, Christoph Muellner via Gcc-patches wrote:
>> > > [ree] PR rtl-optimization/100264: Handle more PARALLEL SET expressions
>> > >
>> > >  PR rtl-optimization/100264
>> > >  * ree.c (get_sub_rtx): Ignore SET expressions without register
>> > >  destinations.
>> > >  (merge_def_and_ext): Eliminate destination check for register
>> > >  as such SET expressions can't occur anymore.
>> > >  (combine_reaching_defs): Likewise.
>> >
>> > This is pretty sensible.  Do you have commit privs for GCC?
>
>
> This looks reasonable to me also.  But I tried a build and check with an 
> rv64gc/lp64d linux toolchain built from riscv-gnu-toolchain and I get two 
> extra failures in the gfortran testsuite.
>
> /scratch/jimw/fsf-testing/patched/riscv-gcc/gcc/testsuite/gfortran.dg/typebound\
> _operator_3.f03:93:21: internal compiler error: in get_sub_rtx, at ree.c:705
> 0x15664f8 get_sub_rtx
> ../../../patched/riscv-gcc/gcc/ree.c:705
> 0x15672ce merge_def_and_ext
> ../../../patched/riscv-gcc/gcc/ree.c:719
> 0x15672ce combine_reaching_defs
> ../../../patched/riscv-gcc/gcc/ree.c:1020
> 0x1568308 find_and_remove_re
> ../../../patched/riscv-gcc/gcc/ree.c:1319
> 0x1568308 rest_of_handle_ree
> ../../../patched/riscv-gcc/gcc/ree.c:1390
> 0x1568308 execute
> ../../../patched/riscv-gcc/gcc/ree.c:1418
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> compiler exited with status 1
> FAIL: gfortran.dg/typebound_operator_3.f03   -Os  (internal compiler error)
> FAIL: gfortran.dg/typebound_operator_3.f03   -Os  (test for excess errors)

The issue comes from the assertion at the end of get_sub_rtx().
It is no longer valid, because a PARALLEL expression may now contain
no SET expression with a register destination. Without my patch, the
assertion would trigger only for a PARALLEL expression without any SET.

My solution is to eliminate the assertion, as the function is supposed
to return NULL if no matching SET is found (even before my patch).
This also shortens the function a bit, because the logic for
non-PARALLEL SETs can be simplified.
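
The intended behaviour can be sketched with a toy model: scan the elements, skip anything that is not a SET with a register destination, and return NULL when there is no candidate or more than one. This is a simplified illustration only; the types and the function name below are hypothetical stand-ins and not GCC's RTL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for RTL expressions.  */
enum expr_kind { EXPR_SET, EXPR_CLOBBER, EXPR_USE };

struct expr
{
  enum expr_kind kind;
  bool dest_is_reg;  /* Only meaningful for EXPR_SET.  */
};

/* Return the single SET with a register destination among the N
   elements of ELEMS, or NULL if there is none or more than one.  */
struct expr *
find_single_reg_set (struct expr *elems, size_t n)
{
  assert (elems != NULL || n == 0);

  struct expr *found = NULL;
  for (size_t i = 0; i < n; i++)
    {
      if (elems[i].kind != EXPR_SET || !elems[i].dest_is_reg)
        continue;
      if (found != NULL)
        return NULL;  /* A second candidate: give up.  */
      found = &elems[i];
    }

  /* FOUND may legitimately be NULL here, so there is no assertion;
     callers are expected to check the result.  */
  return found;
}
```

In this model a list containing only non-register SETs yields NULL, which is the case that tripped the old assertion.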

Thanks,
Christoph


[PATCH v2] REE: PR rtl-optimization/100264: Handle more PARALLEL SET expressions

2021-05-10 Thread Christoph Muellner via Gcc-patches
Move the check for register targets (i.e. REG_P ()) into the function
get_sub_rtx () and change the restriction of REE to "only one child of
a PARALLEL expression is a SET register expression" (was "only one child of
a PARALLEL expression is a SET expression").

This allows REE to handle more PARALLEL SET expressions.

gcc/ChangeLog:
PR rtl-optimization/100264
* ree.c (get_sub_rtx): Ignore SET expressions without register
destinations and remove assertion, as it is not valid anymore
with this new behaviour.
(merge_def_and_ext): Eliminate destination check for register
as such SET expressions can't occur anymore.
(combine_reaching_defs): Likewise.
---
 gcc/ree.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/gcc/ree.c b/gcc/ree.c
index 65457c582c6a..e31ca2fa1a80 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -658,10 +658,11 @@ make_defs_and_copies_lists (rtx_insn *extend_insn, 
const_rtx set_pat,
   return ret;
 }
 
-/* If DEF_INSN has single SET expression, possibly buried inside
-   a PARALLEL, return the address of the SET expression, else
-   return NULL.  This is similar to single_set, except that
-   single_set allows multiple SETs when all but one is dead.  */
+/* If DEF_INSN has single SET expression with a register
+   destination, possibly buried inside a PARALLEL, return
+   the address of the SET expression, else return NULL.
+   This is similar to single_set, except that single_set
+   allows multiple SETs when all but one is dead.  */
 static rtx *
 get_sub_rtx (rtx_insn *def_insn)
 {
@@ -675,6 +676,8 @@ get_sub_rtx (rtx_insn *def_insn)
   rtx s_expr = XVECEXP (PATTERN (def_insn), 0, i);
   if (GET_CODE (s_expr) != SET)
 continue;
+ if (!REG_P (SET_DEST (s_expr)))
+   continue;
 
   if (sub_rtx == NULL)
 sub_rtx = &XVECEXP (PATTERN (def_insn), 0, i);
@@ -686,14 +689,12 @@ get_sub_rtx (rtx_insn *def_insn)
 }
 }
   else if (code == SET)
-sub_rtx = &PATTERN (def_insn);
-  else
 {
-  /* It is not a PARALLEL or a SET, what could it be ? */
-  return NULL;
+   rtx s_expr = PATTERN (def_insn);
+   if (REG_P (SET_DEST (s_expr)))
+ sub_rtx = &PATTERN (def_insn);
 }
 
-  gcc_assert (sub_rtx != NULL);
   return sub_rtx;
 }
 
@@ -712,13 +713,12 @@ merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, 
ext_state *state)
   if (sub_rtx == NULL)
 return false;
 
-  if (REG_P (SET_DEST (*sub_rtx))
-  && (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
+  if (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
  || ((state->modified[INSN_UID (def_insn)].kind
   == (cand->code == ZERO_EXTEND
   ? EXT_MODIFIED_ZEXT : EXT_MODIFIED_SEXT))
  && state->modified[INSN_UID (def_insn)].mode
-== ext_src_mode)))
+== ext_src_mode))
 {
   if (GET_MODE_UNIT_SIZE (GET_MODE (SET_DEST (*sub_rtx)))
  >= GET_MODE_UNIT_SIZE (cand->mode))
@@ -853,8 +853,7 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, 
ext_state *state)
 CAND->insn, then this transformation is not safe.  Note we have
 to test in the widened mode.  */
   rtx *dest_sub_rtx = get_sub_rtx (def_insn);
-  if (dest_sub_rtx == NULL
- || !REG_P (SET_DEST (*dest_sub_rtx)))
+  if (dest_sub_rtx == NULL)
return false;
 
   rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (set)),
@@ -947,8 +946,7 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, 
ext_state *state)
break;
 
  rtx *dest_sub_rtx2 = get_sub_rtx (def_insn2);
- if (dest_sub_rtx2 == NULL
- || !REG_P (SET_DEST (*dest_sub_rtx2)))
+ if (dest_sub_rtx2 == NULL)
break;
 
  /* On RISC machines we must make sure that changing the mode of
-- 
2.31.1



Re: [PATCH] testsuite/arm: Add mve-vmul-scalar-1.c test

2021-05-10 Thread Christophe Lyon via Gcc-patches
On Mon, 10 May 2021 at 13:50, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 30 April 2021 15:06
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] testsuite/arm: Add mve-vmul-scalar-1.c test
> >
> > Support for vmul has been present for a while, but it was lacking a
> > test for the scalar variant.
> >
> > This patch adds one, precisely noting that we do not yet use the T2
> > variants of vmul, which take a scalar as final argument.
>
> Ok.

Thanks

> Thanks, I think the vmul-by-scalar code generation is something Victor is 
> working on.

Ack, good to know, that's on my list too :-)

I asked a question about vadd-with-scalar last week on IRC,
wondering how/if the vectorizer could actually take advantage of
vadd qX, qY, rZ, since ISTM that it only checks if a vector add with
the same 3 (vector) types is available. I guess the same applies to
vmul-by-scalar?

> Kyrill
>
> >
> > 2021-04-22  Christophe Lyon  
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/simd/mve-vmul-scalar-1: New.
> > ---
> >  .../gcc.target/arm/simd/mve-vmul-scalar-1.c| 60
> > ++
> >  1 file changed, 60 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-
> > 1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> > b/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> > new file mode 100644
> > index 000..22be452
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul-scalar-1.c
> > @@ -0,0 +1,60 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include 
> > +
> > +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME) \
> > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t *
> > __restrict__ dest, \
> > +  TYPE##BITS##_t *a) { \
> > +int i;   \
> > +for (i=0; i<NB; i++) {   \
> > +  dest[i] = a[i] OP 5;   \
> > +}  \
> > +}
> > +
> > +/* 128-bit vectors.  */
> > +FUNC_IMM(s, int, 32, 4, *, vmulimm)
> > +FUNC_IMM(u, uint, 32, 4, *, vmulimm)
> > +FUNC_IMM(s, int, 16, 8, *, vmulimm)
> > +FUNC_IMM(u, uint, 16, 8, *, vmulimm)
> > +FUNC_IMM(s, int, 8, 16, *, vmulimm)
> > +FUNC_IMM(u, uint, 8, 16, *, vmulimm)
> > +
> > +/* For the moment we do not select the T2 vmul variant operating on a
> > scalar
> > +   final argument.  */
> > +/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, 
> > r[0-9]+} 2
> > { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, 
> > r[0-9]+} 2
> > { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-times {vmul\.i8\tq[0-9]+, q[0-9]+, r[0-9]+} 
> > 2
> > { xfail *-*-* } } } */
> > +
> > +void test_vmul_f32 (float * dest, float * a, float * b) {
> > +  int i;
> > +  for (i=0; i<4; i++) {
> > +dest[i] = a[i] * b[1];
> > +  }
> > +}
> > +void test_vmulimm_f32 (float * dest, float * a) {
> > +  int i;
> > +  for (i=0; i<4; i++) {
> > +dest[i] = a[i] * 5.0;
> > +  }
> > +}
> > +/* { dg-final { scan-assembler-times {vmul\.f32\tq[0-9]+, q[0-9]+, 
> > r[0-9]+} 2
> > { xfail *-*-* } } } */
> > +
> > +void test_vmul_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> > +  int i;
> > +  for (i=0; i<8; i++) {
> > +dest[i] = a[i] * b[i];
> > +  }
> > +}
> > +
> > +/* Note that dest[i] = a[i] * 5.0f16 is not vectorized.  */
> > +void test_vmulimm_f16 (__fp16 * dest, __fp16 * a) {
> > +  int i;
> > +  __fp16 b = 5.0f16;
> > +  for (i=0; i<8; i++) {
> > +dest[i] = a[i] * b;
> > +  }
> > +}
> > +/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, 
> > r[0-9]+} 2
> > { xfail *-*-* } } } */
> > --
> > 2.7.4
>


Re: [PATCH, rs6000] Add ALTIVEC_REGS as pressure class

2021-05-10 Thread Pat Haugen via Gcc-patches
On 5/7/21 6:00 PM, Segher Boessenkool wrote:
>> --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
>> @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,
>>  /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
>>  /* { dg-final { scan-assembler-times "vslw" 1 } } */
>>  /* { dg-final { scan-assembler-times "vsld" 1 } } */
>> -/* { dg-final { scan-assembler-times "xxlor" 3 } } */
>> +/* { dg-final { scan-assembler-times "xxlor" 2 } } */
>>  /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
>>  /* { dg-final { scan-assembler-times "vrldnm" 2 } } */
> So what is this replaced with?  Was it an "xxlmr" and it is just
> unnecessary now?

Different RA choice made the reg copy unnecessary.

<   xxspltib 0,8
<   xxlor 32,0,0
---
>   xxspltib 32,8

-Pat


RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-10 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Monday, May 10, 2021 12:40 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Fri, 7 May 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, May 7, 2021 12:46 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Wed, 5 May 2021, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > > This patch adds support for a dot product where the signs of the
> > > > > multiplication arguments differ, i.e. one is signed and one is
> > > > > unsigned but the precisions are the same.
> > > >
> > > > #define N 480
> > > > #define SIGNEDNESS_1 unsigned
> > > > #define SIGNEDNESS_2 signed
> > > > #define SIGNEDNESS_3 signed
> > > > #define SIGNEDNESS_4 unsigned
> > > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > SIGNEDNESS_3 char *restrict a,
> > > >SIGNEDNESS_4 char *restrict b)
> > > > {
> > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > {
> > > >   int av = a[i];
> > > >   int bv = b[i];
> > > >   SIGNEDNESS_2 short mult = av * bv;
> > > >   res += mult;
> > > > }
> > > >   return res;
> > > > }
> > > >
> > > > > The operations are performed as if the operands were extended to a
> > > > > 32-bit value.  As such, this operation isn't valid if there is an
> > > > > intermediate conversion to an unsigned value, i.e. if SIGNEDNESS_2
> > > > > is unsigned.
> > > > >
> > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped,
> > > > > the same optab is used but the operands are flipped in the optab
> > > > > expansion.
> > > > >
> > > > > To support this the patch extends the dot-product detection to
> > > > > optionally ignore operands with different signs and stores this
> > > > > information in the optab subtype which is now made a bitfield.
> > > > >
> > > > > The subtype can now additionally control which optab an EXPR can
> > > > > expand to.
> > > > >
> > > > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * optabs.def (usdot_prod_optab): New.
> > > > * doc/md.texi: Document it.
> > > > * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > * optabs-tree.h (enum optab_subtype): Likewise.
> > > > * optabs.c (expand_widen_pattern_expr): Likewise.
> > > > * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > * tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > (vectorizable_reduction): Query dot-product kind.
> > > > > * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > > > optional optab subtype.
> > > > > (vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > > > ignore mismatch types.
> > > > (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > >
> > > > --- inline copy of patch --
> > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > --- a/gcc/doc/md.texi
> > > > > +++ b/gcc/doc/md.texi
> > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > >  @item @samp{sdot_prod@var{m}}
> > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > -is of the same mode as operand 3.
> > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > >
> > > > This doesn't really say what the 's', 'u' and 'us' specify.  Since
> > > > we're doing a widen multiplication and then a non-widening addition
> > > > we only need to know the effective sign of the multiplication so I
> > > > think the existing 's' and 'u' are enough to cover all cases?
> >
> > The existing 's' and 'u' enforce that both op
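
A minimal scalar sketch of the semantics described in the quoted cover
letter, assuming signed char times unsigned char elements that are widened
to int before multiplying.  The function names are hypothetical and are not
taken from the patch.  The second function routes each product through an
unsigned 16-bit temporary, which corresponds to the SIGNEDNESS_2 == unsigned
case that the cover letter says is not a valid match.

```c
#include <stdint.h>

/* Scalar reference for a mixed-sign dot product: A holds signed bytes,
   B holds unsigned bytes.  Both operands are widened to 32 bits before
   the multiply, so every product keeps its sign and cannot overflow.  */
int32_t
usdot_ref (int32_t acc, const int8_t *a, const uint8_t *b, int n)
{
  for (int i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Same loop, but each product is first stored in an unsigned 16-bit
   temporary.  Negative products wrap modulo 65536 and are then added as
   large positive values, so the result differs from usdot_ref.  */
int32_t
usdot_unsigned_tmp (int32_t acc, const int8_t *a, const uint8_t *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      uint16_t mult = (uint16_t) ((int32_t) a[i] * (int32_t) b[i]);
      acc += mult;
    }
  return acc;
}
```

With a = { -1, 2, -3, 4 } and b = { 200, 10, 1, 255 } the first function
returns 837, while the second returns 131909 because -200 and -3 are
reinterpreted as 65336 and 65533.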

Re: [PATCH 1/2] opts: change write_symbols to support bitmasks

2021-05-10 Thread Richard Biener via Gcc-patches
On Thu, May 6, 2021 at 2:31 AM Indu Bhagat via Gcc-patches
 wrote:
>
> To support multiple debug formats, we need to move away from explicit
> enumeration of each individual combination of debug formats.

debug_set_names with its static buffer seems unused?  You wire quite some
APIs with gcc_assert on having a single bit set - that doesn't look forward
looking.

I suppose the BTF followups will "fix" this, but see comments below.
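
To make the single-bit concern concrete, here is a minimal sketch, assuming
one bit per debug format.  The SK_* names and values are hypothetical and do
not mirror the definitions the patch adds to flag-types.h.  It counts the
selected formats and lets a caller classify a multi-format mask instead of
asserting on it.

```c
#include <stdint.h>

/* Hypothetical one-bit-per-format masks; the real values in
   gcc/flag-types.h may differ.  */
#define SK_NO_DEBUG     0u
#define SK_DBX_DEBUG    (1u << 0)
#define SK_DWARF2_DEBUG (1u << 1)
#define SK_XCOFF_DEBUG  (1u << 2)
#define SK_VMS_DEBUG    (1u << 3)

/* Number of debug formats selected in MASK (a population count).  */
unsigned
sk_debug_set_count (uint32_t mask)
{
  unsigned count = 0;
  while (mask)
    {
      mask &= mask - 1;   /* Clear the lowest set bit.  */
      ++count;
    }
  return count;
}

/* Classify a write_symbols-style mask for a PCH check without asserting:
   0 = no debug info or DWARF only, so PCH can be used;
   1 = exactly one other format, which a warning could name;
   2 = several formats requested, which needs its own diagnostic.  */
int
sk_pch_debug_status (uint32_t mask)
{
  if (mask == SK_NO_DEBUG || mask == SK_DWARF2_DEBUG)
    return 0;
  return sk_debug_set_count (mask) <= 1 ? 1 : 2;
}
```

The point of returning a status rather than asserting is that a mask with
two bits set becomes an ordinary, diagnosable input once more than one
format can be requested at a time.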

> gcc/c-family/ChangeLog:
>
> * c-opts.c (c_common_post_options): Adjust access to debug_type_names.
> * c-pch.c (struct c_pch_validity): Use type uint32_t.
> (pch_init): Renamed member.
> (c_common_valid_pch): Adjust access to debug_type_names.
>
> gcc/ChangeLog:
>
> * common.opt: Change type to support bitmasks.
> * flag-types.h (enum debug_info_type): Rename enumerator constants.
> (NO_DEBUG): New bitmask.
> (DBX_DEBUG): Likewise.
> (DWARF2_DEBUG): Likewise.
> (XCOFF_DEBUG): Likewise.
> (VMS_DEBUG): Likewise.
> (VMS_AND_DWARF2_DEBUG): Likewise.
> * flags.h (debug_set_to_format): New function declaration.
> (debug_set_count): Likewise.
> (debug_set_names): Likewise.
> * opts.c (debug_type_masks): Array of bitmasks for debug formats.
> (debug_set_to_format): New function definition.
> (debug_set_count): Likewise.
> (debug_set_names): Likewise.
> (set_debug_level): Update access to debug_type_names.
> * toplev.c: Likewise.
>
> gcc/objc/ChangeLog:
>
> * objc-act.c (synth_module_prologue): Use uint32_t instead of enum
> debug_info_type.
> ---
>  gcc/c-family/c-opts.c |  10 +++--
>  gcc/c-family/c-pch.c  |  12 +++---
>  gcc/common.opt        |   2 +-
>  gcc/flag-types.h      |  29 ++
>  gcc/flags.h           |  17 +++-
>  gcc/objc/objc-act.c   |   2 +-
>  gcc/opts.c            | 109 +-
>  gcc/toplev.c          |   9 +++--
>  8 files changed, 158 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
> index 89e05a4..e463240 100644
> --- a/gcc/c-family/c-opts.c
> +++ b/gcc/c-family/c-opts.c
> @@ -1112,9 +1112,13 @@ c_common_post_options (const char **pfilename)
>   /* Only -g0 and -gdwarf* are supported with PCH, for other
>  debug formats we warn here and refuse to load any PCH files.  */
>   if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
> -   warning (OPT_Wdeprecated,
> -"the %qs debug format cannot be used with "
> -"pre-compiled headers", debug_type_names[write_symbols]);
> +   {
> + gcc_assert (debug_set_count (write_symbols) <= 1);

Why this assert?  If it is needed at all, simply include the count check in
the condition of the warning.

> + warning (OPT_Wdeprecated,
> +  "the %qs debug format cannot be used with "
> +  "pre-compiled headers",
> +  debug_type_names[debug_set_to_format (write_symbols)]);

Maybe simply emit another diagnostic if debug_set_count > 1.

> +   }
> }
>else if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
> c_common_no_more_pch ();
> diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
> index fd94c37..6804388 100644
> --- a/gcc/c-family/c-pch.c
> +++ b/gcc/c-family/c-pch.c
> @@ -52,7 +52,7 @@ enum {
>
>  struct c_pch_validity
>  {
> -  unsigned char debug_info_type;
> +  uint32_t pch_write_symbols;
>signed char match[MATCH_SIZE];
>void (*pch_init) (void);
>size_t target_data_length;
> @@ -108,7 +108,7 @@ pch_init (void)
>pch_outfile = f;
>
>memset (&v, '\0', sizeof (v));
> -  v.debug_info_type = write_symbols;
> +  v.pch_write_symbols = write_symbols;
>{
>  size_t i;
>  for (i = 0; i < MATCH_SIZE; i++)
> @@ -252,13 +252,15 @@ c_common_valid_pch (cpp_reader *pfile, const char *name, int fd)
>/* The allowable debug info combinations are that either the PCH file
>   was built with the same as is being used now, or the PCH file was
>   built for some kind of debug info but now none is in use.  */
> -  if (v.debug_info_type != write_symbols
> +  if (v.pch_write_symbols != write_symbols
>&& write_symbols != NO_DEBUG)
>  {
> +  gcc_assert (debug_set_count (v.pch_write_symbols) <= 1);
> +  gcc_assert (debug_set_count (write_symbols) <= 1);

So the read-in PCH will have at most one bit set but I don't think
you can assert on write_symbols here.

Otherwise looks OK.  Did you check for write_symbols uses in FEs and targets?

Richard.

>cpp_warning (pfile, CPP_W_INVALID_PCH,
>"%s: created with -g%s, but used with -g%s", name,
> -  debug_type_names[v.debug_info_type],
> -  debug_type_names[write_symbols]);
> +  debug_type

Re: [PATCH 1/2] ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)

2021-05-10 Thread Richard Biener via Gcc-patches
On Tue, Apr 27, 2021 at 5:25 PM Martin Jambor  wrote:
>
> Hi,
>
> PR 93385 reveals that if the user explicitly disables DCE, IPA-SRA
> can leave behind statements which are useless because their results
> are eventually not used but can have problematic side effects,
> especially since their inputs are bogus now that useless parameters
> were removed.
>
> This patch fixes the problem by doing a def-use walk when
> materializing clones, marking which statements should not be copied
> and which SSA_NAMEs do not need to be computed because eventually they
> would be DCEd.
>
> When an argument of a call within such a function is removed,
> however, that change needs to be communicated to call redirection code.
> This is call specific information and therefore cannot be reasonably
> encoded in clone node summary and has to be put in call summaries.
> Combining these with stuff in performed_splits in clone_info would be
> very cumbersome and therefore this patch removes performed_splits and
> moves all the information into call summaries too.  This also has the
> advantage that the code is hopefully a bit easier to understand and we
> do not need any special dummy variables.
>
> The new edge summaries are private to ipa-param-manipulation.c and
> hopefully will never be needed elsewhere.  Each one simply contains 1) a
> mapping from the original argument indices to the actual indices in the
> call statement as it is now, 2) information needed to identify
> arguments representing pass-through IPA-SRA splits which have
> been added to the call arguments in place of an original
> argument/reference and 3) a delta to the index where va_args may
> start.
>
> Bootstrapped and tested on x86_64-linux, i686-linux and aarch64-linux.
> Also LTO-bootstrapped and LTO-profiledbootstrapped on x86_64-linux.
>
> OK for trunk?

I've tried to have a look at this patch but it does a lot of IPA specific
refactoring(?), so the actual DCE bits are hard to find.  Is it possible
to split the patch up or is it too entangled?
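
For orientation, the def-use walk described in the quoted cover letter might
look roughly like the sketch below.  It works on a toy statement array rather
than GCC's gimple, all names are hypothetical, and it assumes the analysis
has already shown that nothing observable depends on the removed parameter,
so everything reachable from it can be marked as not to be copied.

```c
#include <stdbool.h>

#define TOY_MAX_USERS 4
#define TOY_MAX_STMTS 16

/* Toy statement: it defines one value, and USERS lists the indices of the
   statements that consume that value.  This is not GCC's gimple.  */
struct toy_stmt
{
  int n_users;
  int users[TOY_MAX_USERS];
};

/* Mark in DEAD every statement reachable from ROOT along def-use edges,
   where ROOT stands for the definition of the removed parameter.
   Returns the number of statements marked, or -1 on invalid input.  */
int
toy_mark_dead (const struct toy_stmt *stmts, int n_stmts, int root,
               bool *dead)
{
  int worklist[TOY_MAX_STMTS];
  int top = 0;
  int marked = 0;

  if (n_stmts > TOY_MAX_STMTS || root < 0 || root >= n_stmts)
    return -1;

  for (int i = 0; i < n_stmts; ++i)
    dead[i] = false;

  /* Each statement is pushed at most once, so the worklist cannot
     overflow.  */
  dead[root] = true;
  worklist[top++] = root;
  marked = 1;

  while (top > 0)
    {
      const struct toy_stmt *s = &stmts[worklist[--top]];
      for (int u = 0; u < s->n_users; ++u)
        {
          int user = s->users[u];
          if (!dead[user])
            {
              dead[user] = true;
              worklist[top++] = user;
              ++marked;
            }
        }
    }
  return marked;
}
```

A clone materializer could then skip every statement whose DEAD flag is set
when copying the function body; statements that do not depend on the removed
parameter stay unmarked and are copied as usual.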

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2021-03-24  Martin Jambor  
>
> PR ipa/93385
> * symtab-clones.h (clone_info): Removed member param_adjustments.
> * ipa-param-manipulation.h: Adjust initial comment to reflect how we
> deal with pass-through splits now.
> (ipa_param_performed_split): Removed.
> (ipa_param_adjustments::modify_call): Adjusted parameters.
> (class ipa_param_body_adjustments): New members m_dead_stmts,
> m_dead_ssas, mark_dead_statements, modify_call_argument and
> m_new_call_arg_modification_info.  Adjusted parameters of
> register_replacement, modify_gimple_stmt and modify_call_stmt.
> (ipa_verify_edge_has_no_modifications): Declare.
> * ipa-param-manipulation.c (struct pass_through_split_map): New type.
> (ipa_edge_modification_info): Likewise.
> (ipa_edge_modification_sum): Likewise.
> (ipa_edge_modifications): New edge summary.
> (ipa_verify_edge_has_no_modifications): New function.
> (transitive_split_p): Removed.
> (transitive_split_map): Likewise.
> (init_transitive_splits): Likewise.
> (ipa_param_adjustments::modify_call): Adjusted to use the new edge
> summary instead of performed_splits.
> (ipa_param_body_adjustments::register_replacement): Drop dummy
> parameter, set base_index of the created ipa_param_body_replacement.
> (phi_arg_will_live_p): New function.
> (ipa_param_body_adjustments::mark_dead_statements): New method.
> (ipa_param_body_adjustments::common_initialization): Call it.  Do not
> create IPA_SRA dummy decls.
> (ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
> new members.
> (simple_tree_swap_info): Removed.
> (remap_split_decl_to_dummy): Likewise.
> (record_argument_state_1): New function.
> (record_argument_state): Likewise.
> (ipa_param_body_adjustments::modify_call_stmt): New parameter
> orig_stmt.  Do not work with dummy decls, save necessary info about
> changes to ipa_edge_modifications.
> (ipa_param_body_adjustments::modify_gimple_stmt): New parameter
> orig_stmt, pass it to modify_call_stmt.
> (ipa_param_body_adjustments::modify_cfun_body): Adjust call to
> modify_gimple_stmt.
> * tree-inline.c (remap_gimple_stmt): Do not copy dead statements,
> reset dead debug statements, pass original statement to
> modify_gimple_stmt.
> (copy_phis_for_bb): Do not copy dead PHI nodes.
> (expand_call_inline): Do not remap performed_splits.
> (update_clone_info): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2021-03-22  Martin Jambor  
>
> PR ipa/93385
> * gcc.dg/ipa/pr93385.c: New test.
> * gcc.dg/ipa/ipa-sra-23.c: Likewise.
> * gcc.dg/ipa/ipa-

Re: [PATCH 2/2] ipa-sra: Improve debug info for removed parameters (PR 93385)

2021-05-10 Thread Richard Biener via Gcc-patches
On Tue, Apr 27, 2021 at 5:26 PM Martin Jambor  wrote:
>
> Hi,
>
> Whereas the previous patch fixed issues with code left behind after
> IPA-SRA removed a parameter but only reset all affected debug bind
> statements, this one updates them with expressions which can allow the
> debugger to print the removed value - see the added test-case.
>
> Even though I originally did not want to create DEBUG_EXPR_DECLs for
> intermediate values, I ended up doing so, because otherwise the code
> started creating statements like
>
>   # DEBUG __aD.198693 => &MEM[(const struct _Alloc_nodeD.171110 *)D#195]._M_tD.184726->_M_implD.171154
>
> which not only is a bit scary but also gimple-fold ICEs on
> it. Therefore I decided they are probably quite necessary and have
> them.
>
> The patch simply notes each removed SSA name present in a debug
> statement and then works from it backwards, looking if it can
> reconstruct the expression it represents (which can fail if a
> non-degenerate PHI node is in the way).  If it can, it populates two
> hash maps with those expressions so that 1) removed assignments are
> replaced with a debug bind defining a new intermediate debug_decl_expr
> and 2) existing debug binds that refer to SSA names that are being
> removed now refer to corresponding debug_decl_exprs.

Isn't this what insert_debug_temp_for_var_def already does when you
remove a stmt and if you take care to do that back-to-front?  So with
IPA SRA removing a parameter you'd "only" need to make sure to
set up a debug stmt for the parameter itself and that be picked up
for the (uninitialized) default-def you map to?

> If a removed parameter is passed to another function, the debugging
> information still cannot describe its value there - see the xfailed
> test in the testcase.  I sort of know what needs to be done but the
> handling of debug information for removed parameters is LTO unfriendly
> in general and so needs a bit more work.
>
> Bootstrapped and tested on x86_64-linux, i686-linux and aarch64-linux.
> Also LTO-bootstrapped and LTO-profiledbootstrapped on x86_64-linux.
>
> OK for trunk?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2021-03-29  Martin Jambor  
>
> PR ipa/93385
> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
> members remap_with_debug_expressions, m_dead_ssa_debug_equiv,
> m_dead_stmt_debug_equiv and prepare_debug_expressions.  Added
> parameter to mark_dead_statements.
> * ipa-param-manipulation.c: Include tree-phinodes.h and cfgexpand.h.
> (ipa_param_body_adjustments::mark_dead_statements): New parameter
> debugstack, push into it all SSA names used in debug statements,
> produce m_dead_ssa_debug_equiv mapping for the removed param.
> (replace_with_mapped_expr): New function.
> (ipa_param_body_adjustments::remap_with_debug_expressions): Likewise.
> (ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
> (ipa_param_body_adjustments::common_initialization): Gather and
> process SSA which will be removed but are in debug statements.
> Simplify.
> (ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
> new members.
> * tree-inline.c (remap_gimple_stmt): Create a debug bind when possible
> when avoiding a copy of an unnecessary statement.  Remap removed SSA
> names in existing debug statements.
> (tree_function_versioning): Do not create DEBUG_EXPR_DECL for removed
> parameters if we have already done so.
>
> gcc/testsuite/ChangeLog:
>
> 2021-03-29  Martin Jambor  
>
> PR ipa/93385
> * gcc.dg/guality/ipa-sra-1.c: New test.
> ---
>  gcc/ipa-param-manipulation.c             | 281 ++-
>  gcc/ipa-param-manipulation.h             |  12 +-
>  gcc/testsuite/gcc.dg/guality/ipa-sra-1.c |  45
>  gcc/tree-inline.c                        |  45 ++--
>  4 files changed, 306 insertions(+), 77 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c
>
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index 3e07fd72fe2..a202501fc95 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "alloc-pool.h"
>  #include "symbol-summary.h"
>  #include "symtab-clones.h"
> +#include "tree-phinodes.h"
> +#include "cfgexpand.h"
>
>
>  /* Actual prefixes of different newly synthetized parameters.  Keep in sync
> @@ -989,10 +991,12 @@ phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
>
>  /* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
>     any replacement or splitting.  REPL is the replacement VAR_SECL to base any
> -   remaining uses of a removed parameter on.  */
> +   remaining uses of a removed parameter on.  Push all removed SSA names that
> +   are used within debug statem

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-10 Thread Richard Biener
On Mon, 10 May 2021, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, May 10, 2021 12:40 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Fri, 7 May 2021, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > > where the sign for the multiplicant changes.
> > > >
> > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > > This patch adds support for a dot product where the signs of the
> > > > > > multiplication arguments differ, i.e. one is signed and one is
> > > > > > unsigned but the precisions are the same.
> > > > >
> > > > > #define N 480
> > > > > #define SIGNEDNESS_1 unsigned
> > > > > #define SIGNEDNESS_2 signed
> > > > > #define SIGNEDNESS_3 signed
> > > > > #define SIGNEDNESS_4 unsigned
> > > > >
> > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > > SIGNEDNESS_3 char *restrict a,
> > > > >SIGNEDNESS_4 char *restrict b)
> > > > > {
> > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > {
> > > > >   int av = a[i];
> > > > >   int bv = b[i];
> > > > >   SIGNEDNESS_2 short mult = av * bv;
> > > > >   res += mult;
> > > > > }
> > > > >   return res;
> > > > > }
> > > > >
> > > > > > The operations are performed as if the operands were extended to a
> > > > > > 32-bit value.  As such, this operation isn't valid if there is an
> > > > > > intermediate conversion to an unsigned value, i.e. if SIGNEDNESS_2
> > > > > > is unsigned.
> > > > > >
> > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped,
> > > > > > the same optab is used but the operands are flipped in the optab
> > > > > > expansion.
> > > > > >
> > > > > > To support this the patch extends the dot-product detection to
> > > > > > optionally ignore operands with different signs and stores this
> > > > > > information in the optab subtype which is now made a bitfield.
> > > > > >
> > > > > > The subtype can now additionally control which optab an EXPR can
> > > > > > expand to.
> > > > > >
> > > > > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * optabs.def (usdot_prod_optab): New.
> > > > >   * doc/md.texi: Document it.
> > > > >   * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > >   * optabs-tree.h (enum optab_subtype): Likewise.
> > > > >   * optabs.c (expand_widen_pattern_expr): Likewise.
> > > > >   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > >   * tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > >   (vectorizable_reduction): Query dot-product kind.
> > > > > >   * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > > > >   optional optab subtype.
> > > > > >   (vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > > > >   ignore mismatch types.
> > > > >   (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > >
> > > > > --- inline copy of patch --
> > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > > --- a/gcc/doc/md.texi
> > > > > > +++ b/gcc/doc/md.texi
> > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > > >  @item @samp{sdot_prod@var{m}}
> > > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > > -is of the same mode as operand 3.
> > > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > > >
> > > > This doesn't really say what the 's', 'u' and 'us' specify.  Since

Re: [PATCH 02/12] Allow generating pseudo register with specific alignment

2021-05-10 Thread H.J. Lu via Gcc-patches
On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
> >  wrote:
> >>
> >> "H.J. Lu via Gcc-patches"  writes:
> >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu  wrote:
> >> >>
> >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
> >> >>  wrote:
> >> >> >
> >> >> > "H.J. Lu via Gcc-patches"  writes:
> >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
> >> >> > >  wrote:
> >> >> > >>
> >> >> > >> "H.J. Lu via Gcc-patches"  writes:
> >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers so
> >> >> > >> > that associated hard registers can be properly spilled onto stack.
> >> >> > >> > But there are cases where associated hard registers will never be
> >> >> > >> > spilled onto stack.  gen_reg_rtx is changed to take an argument for
> >> >> > >> > register alignment so that stack realignment can be avoided when
> >> >> > >> > not needed.
> >> >> > >>
> >> >> > >> How is it guaranteed that they will never be spilled though?
> >> >> > >> I don't think that that guarantee exists for any kind of pseudo,
> >> >> > >> except perhaps for the temporary pseudos that the RA creates to
> >> >> > >> replace (match_scratch …)es.
> >> >> > >>
> >> >> > >
> >> >> > > The caller of creating pseudo registers with specific alignment must
> >> >> > > guarantee that they will never be spilled.   I am only using it in
> >> >> > >
> >> >> > >   /* Make operand1 a register if it isn't already.  */
> >> >> > >   if (can_create_pseudo_p ()
> >> >> > >   && !register_operand (op0, mode)
> >> >> > >   && !register_operand (op1, mode))
> >> >> > > {
> >> >> > >   /* NB: Don't increase stack alignment requirement when forcing
> >> >> > >  operand1 into a pseudo register to copy data from one memory
> >> >> > >  location to another since it doesn't require a spill.  */
> >> >> > >   emit_move_insn (op0,
> >> >> > >   force_reg (GET_MODE (op0), op1,
> >> >> > >  (UNITS_PER_WORD * BITS_PER_UNIT)));
> >> >> > >   return;
> >> >> > > }
> >> >> > >
> >> >> > > for vector moves.  RA shouldn't spill it.
> >> >> >
> >> >> > But this is the point: it's a case of hoping that the RA won't spill
> >> >> > it, rather than having a guarantee that it won't.
> >> >> >
> >> >> > Even if the moves start out adjacent, they could be separated by later
> >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA
> >> >> > scheduling isn't enabled by default for x86, but it can still be
> >> >> > enabled explicitly.)
> >> >> > Or if the same data is being copied to two locations, we might reuse
> >> >> > values loaded by the first copy for the second copy as well.
> >> >
> >> > There are cases where pseudo vector registers are created as pure
> >> > temporary registers in the backend and they shouldn't ever be spilled
> >> > to stack.   They will be spilled to stack only if there is other
> >> > non-temporary vector register usage, in which case the stack will be
> >> > properly re-aligned.
> >> > Caller of creating pseudo registers with specific alignment guarantees
> >> > that they are used only as pure temporary registers.
> >>
> >> I don't think there's really a distinct category of pure temporary
> >> registers though.  The things I mentioned above can happen for any
> >> kind of pseudo register.
> >
> > I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
> > Do we generally handle those well?  That is, are they again subject
> > to be allocated by RA when no longer live?
>
> Yeah, using hard registers should work.  Of course, any given fixed choice
> of hard register has the potential to be suboptimal in some situation,
> but it should be safe.

I tried hard registers.  The generated code isn't as good as with pseudo
registers.  But I want to avoid aligning the stack when YMM registers are
only used to inline memcpy/memset.  Any suggestions?

Thanks.

-- 
H.J.


Re: [GOVERNANCE] Where to file complaints re project-maintainers?

2021-05-10 Thread abebeos via Gcc-patches
It is just fascinating to see how you don't realize that this affects
mainly gcc.

On Mon, 10 May 2021 at 01:42, Eric Botcazou 
wrote:

> > It is a gcc issue, see the very first link you've quoted (
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92729).
>
> IIUC you're complaining about the bounty process, not about the GCC PR, so
> this technical list is not the appropriate place to do it.  AFAICS you
> have already filed a complaint with Bountysource, so it's up to them to
> decide whether to accept or reject it.
>
> --
> Eric Botcazou
>
>
>


Re: [GOVERNANCE] Where to file complaints re project-maintainers?

2021-05-10 Thread abebeos via Gcc-patches
Again, just heavily fascinating to see how you ignore the overall essence
of this, which is of course directly related to gcc.

(bountysource is just a secondary disaster, it all starts here, at gcc.)



On Mon, 10 May 2021 at 12:19, Jakub Jelinek  wrote:

> On Sun, May 09, 2021 at 07:48:50PM -0700, Ian Lance Taylor via Gcc-patches wrote:
> > On Sun, May 9, 2021 at 8:33 AM abebeos wrote:
> > >
> > > To me this sounds quite like a "disorganized mess, where bullies,
> > > abusers and even IT-fascists can thrive".
> > >
> > > It is clear to me that some gcc project maintainers, the steering
> > > committee and bountysource are crossing ethical (if not legal) boundaries.
> >
> > The GCC project maintainers and the steering committee are definitely
> > not crossing ethical or legal boundaries here.
> >
> > I don't know anything about Bountysource.  Bountysource is completely
> > separate from GCC.  It appears from your link that John Paul Adrian
> > Glaubitz posted a bounty for some GCC work.  A number of people and
> > organizations supported the bounty, but the GCC project itself did
> > not.  Although the work is for GCC, the GCC project has nothing to do
> > with that bounty.  That is handled entirely by Bountysource.
>
> Yeah, all that happened on the GCC project side is the agreement
> to deprecate and eventually remove ports that still rely on internal
> details that were obsolete 20 years ago, see
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01256.html
> and then patch review of changes that were posted to gcc-patches.
> The GCC reviewers review posted patches based on the technical
> merits and whether copyright assignment for parts that require copyright
> assignment is available, regardless of whether the people who submit their
> work did the work in their spare time without being compensated for it,
> whether their employers compensated them for it, whether they got
> contracted by some company for that work or other means (e.g. bountysource).
> All that is outside of the scope of the GCC project.
> Bountysource AFAIK has its own terms and rules and I believe ultimately it
> is the people who donated money for it that vote about that.
>
> Jakub
>
>


Re: [PATCH, rs6000] Add ALTIVEC_REGS as pressure class

2021-05-10 Thread Peter Bergner via Gcc-patches
On 5/10/21 7:52 AM, Pat Haugen wrote:
> On 5/7/21 6:00 PM, Segher Boessenkool wrote:
>> So what is this replaced with?  Was it an "xxlmr" and it is just
>> unnecessary now?
> 
> Different RA choice made the reg copy unnecessary.
> 
> < xxspltib 0,8
> < xxlor 32,0,0
> ---
>>  xxspltib 32,8

Given how we use xxlor's for vsx reg copies and how easily they
can change, I'm not sure we should even be counting them at all,
since they can change with the phase of the moon or the day of 
the week.

Peter



Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-10 Thread Richard Biener via Gcc-patches
On Sat, May 8, 2021 at 10:05 AM Kewen.Lin  wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/5/7 下午5:43, Richard Biener wrote:
> > On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>
> >> When I was investigating density_test heuristics, I noticed that
> >> the current rs6000_density_test could be used for single scalar
> >> iteration cost calculation, through the call trace:
> >>   vect_compute_single_scalar_iteration_cost
> >> -> rs6000_finish_cost
> >>  -> rs6000_density_test
> >>
> >> It looks unexpected given its descriptive function comments, and Bill
> >> helped to confirm this needs to be fixed (thanks!).
> >>
> >> So this patch is to check the passed data, if it's the same as
> >> the one in loop_vinfo, it indicates it's working on vector version
> >> cost calculation, otherwise just early return.
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> >>
> >> Nothing remarkable was observed with SPEC2017 Power9 full run.
> >>
> >> Is it ok for trunk?
> >
> > +  /* Only care about cost of vector version, so exclude scalar
> > version here.  */
> > +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
> > +return;
> >
> > Hmm, looks like a quite "random" test to me.  What about adding a
> > parameter to finish_cost () (or init_cost?) indicating the cost kind?
> >
>
> I originally wanted to change the hook interface, but noticed that
> the finish_cost in function vect_estimate_min_profitable_iters is
> the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> which looks like enough to differentiate scalar version costing from
> vector version costing for a loop.  Do you mean this observation/
> assumption could easily be broken sometime later?

Yes, this field is likely to become stale.

>
> The attached patch adds one new parameter to indicate the
> costing kind explicitly, as you suggested.
>
> Does it look better?
>
> gcc/ChangeLog:
>
> * doc/tm.texi: Regenerated.
> * target.def (init_cost): Add new parameter costing_for_scalar.
> * targhooks.c (default_init_cost): Adjust for new parameter.
> * targhooks.h (default_init_cost): Likewise.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
> (vect_compute_single_scalar_iteration_cost): Likewise.
> (vect_analyze_loop_2): Likewise.
> * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
> (vect_bb_vectorization_profitable_p): Likewise.
> * tree-vectorizer.h (init_cost): Likewise.
> * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
> * config/i386/i386.c (ix86_init_cost): Likewise.
> * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
>
> > OTOH we already pass scalar_stmt to individual add_stmt_cost,
> > so not sure whether the context really matters.  That said,
> > the density test looks "interesting" ... the intent was that finish_cost
> > might look at gathered data from add_stmt, not that it looks at
> > the GIMPLE IL ... so why are you not counting vector_stmt vs.
> > scalar_stmt entries in vect_body and using that for this metric?
> >
>
> Good to know the intention behind finish_cost, thanks!
>
> I'm afraid that the check on vector_stmt and scalar_stmt entries
> from add_stmt_cost doesn't work for the density test here.  The
> density test focuses on the vector version itself; there are some
> stmts whose relevance is marked as vect_unused_in_scope, and IIUC
> they won't be passed down when costing either version.  But the
> existing density check would like to know the cost for the
> non-vectorized part.  The current implementation does:
>
>  vec_cost = data->cost[vect_body]
>
>   if (!STMT_VINFO_RELEVANT_P (stmt_info)
>   && !STMT_VINFO_IN_PATTERN_P (stmt_info))
> not_vec_cost++
>
>  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
>
> it takes those irrelevant stmts into account, and then, having
> both costs from the non-vectorized part (not_vec_cost)
> and vectorized part (cost[vect_body]), it can calculate the
> vectorization code density ratio.

Yes, but then what "relevant" stmts are actually needed and what
not is missed by your heuristics.  It's really some GIGO one
I fear - each vectorized data reference will add a pointer IV
(eventually commoned by IVOPTs later) and pointer value updates
that are not accounted for in costing (the IVs and updates in the
scalar code are marked as not relevant).  Are those the stmts
this heuristic wants to look at?

The patch looks OK btw.

Thanks,
Richard.

>
> BR,
> Kewen
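
For readers following the density discussion, the ratio quoted above can be
sketched in plain C.  This is only an illustrative model under assumptions:
struct stmt_flags, count_not_vec and density_pct are hypothetical names and
not the rs6000 back end's actual code; the two flag fields merely stand in
for STMT_VINFO_RELEVANT_P and STMT_VINFO_IN_PATTERN_P.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-statement flags the density test
   consults; not a GCC data structure.  */
struct stmt_flags
{
  int relevant;    /* models STMT_VINFO_RELEVANT_P (stmt_info) */
  int in_pattern;  /* models STMT_VINFO_IN_PATTERN_P (stmt_info) */
};

/* Count statements that are neither relevant nor part of a pattern,
   i.e. the "not_vec_cost" term of the quoted computation.  */
int
count_not_vec (const struct stmt_flags *stmts, size_t n)
{
  int not_vec_cost = 0;
  for (size_t i = 0; i < n; i++)
    if (!stmts[i].relevant && !stmts[i].in_pattern)
      not_vec_cost++;
  return not_vec_cost;
}

/* density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost).
   Returning 0 for an empty body is a choice made for this sketch.  */
int
density_pct (int vec_cost, int not_vec_cost)
{
  assert (vec_cost >= 0 && not_vec_cost >= 0);
  if (vec_cost + not_vec_cost == 0)
    return 0;
  return (vec_cost * 100) / (vec_cost + not_vec_cost);
}
```

With a vect_body cost of 6 and two statements counted as not vectorized,
the integer division gives 600 / 8 = 75.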


Re: [PATCH 02/12] Allow generating pseudo register with specific alignment

2021-05-10 Thread Richard Biener via Gcc-patches
On Mon, May 10, 2021 at 3:29 PM H.J. Lu  wrote:
>
> On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
>  wrote:
> >
> > Richard Biener via Gcc-patches  writes:
> > > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
> > >  wrote:
> > >>
> > >> "H.J. Lu via Gcc-patches"  writes:
> > >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu  wrote:
> > >> >>
> > >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
> > >> >>  wrote:
> > >> >> >
> > >> >> > "H.J. Lu via Gcc-patches"  writes:
> > >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
> > >> >> > >  wrote:
> > >> >> > >>
> > >> >> > >> "H.J. Lu via Gcc-patches"  writes:
> > >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers 
> > >> >> > >> > so that
> > >> >> > >> > associated hard registers can be properly spilled onto stack.  
> > >> >> > >> > But there
> > >> >> > >> > are cases where associated hard registers will never be 
> > >> >> > >> > spilled onto
> > >> >> > >> > stack.  gen_reg_rtx is changed to take an argument for 
> > >> >> > >> > register alignment
> > >> >> > >> > so that stack realignment can be avoided when not needed.
> > >> >> > >>
> > >> >> > >> How is it guaranteed that they will never be spilled though?
> > >> >> > >> I don't think that that guarantee exists for any kind of pseudo,
> > >> >> > >> except perhaps for the temporary pseudos that the RA creates to
> > >> >> > >> replace (match_scratch …)es.
> > >> >> > >>
> > >> >> > >
> > >> >> > > The caller of creating pseudo registers with specific alignment 
> > >> >> > > must
> > >> >> > > guarantee that they will never be spilled.   I am only using it in
> > >> >> > >
> > >> >> > >   /* Make operand1 a register if it isn't already.  */
> > >> >> > >   if (can_create_pseudo_p ()
> > >> >> > >   && !register_operand (op0, mode)
> > >> >> > >   && !register_operand (op1, mode))
> > >> >> > > {
> > >> >> > >   /* NB: Don't increase stack alignment requirement when 
> > >> >> > > forcing
> > >> >> > >  operand1 into a pseudo register to copy data from one 
> > >> >> > > memory
> > >> >> > >  location to another since it doesn't require a spill.  */
> > >> >> > >   emit_move_insn (op0,
> > >> >> > >   force_reg (GET_MODE (op0), op1,
> > >> >> > >  (UNITS_PER_WORD * 
> > >> >> > > BITS_PER_UNIT)));
> > >> >> > >   return;
> > >> >> > > }
> > >> >> > >
> > >> >> > > for vector moves.  RA shouldn't spill it.
> > >> >> >
> > >> >> > But this is the point: it's a case of hoping that the RA won't 
> > >> >> > spill it,
> > >> >> > rather than having a guarantee that it won't.
> > >> >> >
> > >> >> > Even if the moves start out adjacent, they could be separated by 
> > >> >> > later
> > >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA 
> > >> >> > scheduling
> > >> >> > isn't enabled by default for x86, but it can still be enabled 
> > >> >> > explicitly.)
> > >> >> > Or if the same data is being copied to two locations, we might reuse
> > >> >> > values loaded by the first copy for the second copy as well.
> > >> >
> > >> > There are cases where pseudo vector registers are created as pure
> > >> > temporary registers in the backend and they shouldn't ever be spilled
> > >> > to stack.   They will be spilled to stack only if there are other 
> > >> > non-temporary
> > >> > vector register usage in which case stack will be properly re-aligned.
> > >> > Caller of creating pseudo registers with specific alignment guarantees
> > >> > that they are used only as pure temporary registers.
> > >>
> > >> I don't think there's really a distinct category of pure temporary
> > >> registers though.  The things I mentioned above can happen for any
> > >> kind of pseudo register.
> > >
> > > I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
> > > Do we generally handle those well?  That is, are they again subject
> > > to be allocated by RA when no longer live?
> >
> > Yeah, using hard registers should work.  Of course, any given fixed choice
> > of hard register has the potential to be suboptimal in some situation,
> > but it should be safe.
>
> I tried hard registers.  The generated code isn't as good as pseudo registers.
> But I want to avoid aligning the stack when YMM registers are only used to
> inline memcpy/memset.  Any suggestions?

I wonder if we can mark pseudos with a new reg flag, like 'nospill' and
enforce this in LRA or ICE if we can't?  That said, we should be able
to verify our assumption holds.  Now, we then of course need to avoid
CSE re-using such pseudo in ways that could lead to spilling
(not sure how that could happen, but ...).

Did you investigate closer what made the hardreg case generate worse
code?  Can we hide the copies behind UNSPECs and split them late
after reload?  Or is that too awkward to support when generating the
sequence from the middle-end (I suppose it's not going via the optabs?)

Ri

[committed] libstdc++: Rename test type to avoid clashing with std::any

2021-05-10 Thread Jonathan Wakely via Gcc-patches
When PCH are enabled this test file includes  and so the
using-directive brings std::any into the global scope. It isn't
currently a problem, because the -std option in the dg-options means
that PCH is not used. If that option is removed, the test fails with PCH
and passes without.

This just renames the type to avoid the name clash (and also the 'none'
type for consistency).

libstdc++-v3/ChangeLog:

* testsuite/20_util/variant/compile.cc: Rename 'any' to avoid
clash with std::any.

Tested powerpc64le-linux. Committed to trunk.

commit 2bbacc18b35e44d45676a46eced26129f8f8378a
Author: Jonathan Wakely 
Date:   Mon May 10 13:57:49 2021

libstdc++: Rename test type to avoid clashing with std::any

When PCH are enabled this test file includes  and so the
using-directive brings std::any into the global scope. It isn't
currently a problem, because the -std option in the dg-options means
that PCH is not used. If that option is removed, the test fails with PCH
and passes without.

This just renames the type to avoid the name clash (and also the 'none'
type for consistency).

libstdc++-v3/ChangeLog:

* testsuite/20_util/variant/compile.cc: Rename 'any' to avoid
clash with std::any.

diff --git a/libstdc++-v3/testsuite/20_util/variant/compile.cc b/libstdc++-v3/testsuite/20_util/variant/compile.cc
index 33f198c2cc3..e5042751e66 100644
--- a/libstdc++-v3/testsuite/20_util/variant/compile.cc
+++ b/libstdc++-v3/testsuite/20_util/variant/compile.cc
@@ -172,15 +172,15 @@ void arbitrary_ctor()
   static_assert(is_constructible_v, ConvertibleToBool>);
 }
 
-struct none { none() = delete; };
-struct any { template  any(T&&) {} };
+struct None { None() = delete; };
+struct Any { template  Any(T&&) {} };
 
 void in_place_index_ctor()
 {
   variant a(in_place_index<0>, "a");
   variant b(in_place_index<1>, {'a'});
 
-  static_assert(!is_constructible_v, 
std::in_place_index_t<0>>, "PR libstdc++/90165");
+  static_assert(!is_constructible_v, 
std::in_place_index_t<0>>, "PR libstdc++/90165");
 }
 
 void in_place_type_ctor()
@@ -188,7 +188,7 @@ void in_place_type_ctor()
   variant a(in_place_type, "a");
   variant b(in_place_type, {'a'});
   static_assert(!is_constructible_v, 
in_place_type_t, const char*>);
-  static_assert(!is_constructible_v, 
std::in_place_type_t>, "PR libstdc++/90165");
+  static_assert(!is_constructible_v, 
std::in_place_type_t>, "PR libstdc++/90165");
 }
 
 void dtor()


Re: [PATCH] AArch64: Improve GOT addressing

2021-05-10 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> Normally we should only put two instructions in the same define_insn
> if there's a specific ABI or architectural reason for not separating
> them.  Doing it purely for optimisation reasons is going against the
> general direction of travel.  So I think the first question is: why
> don't we simply delay the split until after reload instead, since
> that's the more normal way of handling this kind of thing?

Well there are no optimizations that benefit from them being split, and there
is no gain from scheduling them independently. Keeping them together
means the linker could perform relaxations on the pair without adding new
relocations. So if we split after reload we'd still want to keep them together.

> This should just be represented as:
>
>  (set (reg:PTR R) (symbol_ref:PTR S))
>
> and go through the normal move patterns.

Yes that should be feasible. Note the UNSPEC is unnecessary even in the
original pattern - given it uses the standard HIGH operator for GOT accesses,
we can also use LO_SUM for the 2nd part since the indirection is hidden.

Cheers,
Wilco

Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-10 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin via Gcc-patches"  writes:
> on 2021/5/7 下午5:43, Richard Biener wrote:
>> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
>>  wrote:
>>>
>>> Hi,
>>>
>>> When I was investigating density_test heuristics, I noticed that
>>> the current rs6000_density_test could be used for single scalar
>>> iteration cost calculation, through the call trace:
>>>   vect_compute_single_scalar_iteration_cost
>>> -> rs6000_finish_cost
>>>  -> rs6000_density_test
>>>
>>> It looks unexpected given its descriptive function comments, and Bill
>>> helped to confirm this needs to be fixed (thanks!).
>>>
>>> So this patch is to check the passed data, if it's the same as
>>> the one in loop_vinfo, it indicates it's working on vector version
>>> cost calculation, otherwise just early return.
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>>
>>> Nothing remarkable was observed with SPEC2017 Power9 full run.
>>>
>>> Is it ok for trunk?
>> 
>> +  /* Only care about cost of vector version, so exclude scalar
>> version here.  */
>> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
>> +return;
>> 
>> Hmm, looks like a quite "random" test to me.  What about adding a
>> parameter to finish_cost () (or init_cost?) indicating the cost kind?
>> 
>
> I originally wanted to change the hook interface, but noticed that
> the finish_cost in function vect_estimate_min_profitable_iters is
> the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> which looks like enough to differentiate scalar version costing from
> vector version costing for a loop.  Do you mean this observation/
> assumption could easily be broken sometime later?
>
> The attached patch adds one new parameter to indicate the
> costing kind explicitly, as you suggested.
>
> Does it look better?
>
> gcc/ChangeLog:
>
>   * doc/tm.texi: Regenerated.
>   * target.def (init_cost): Add new parameter costing_for_scalar.
>   * targhooks.c (default_init_cost): Adjust for new parameter.
>   * targhooks.h (default_init_cost): Likewise.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
>   (vect_compute_single_scalar_iteration_cost): Likewise.
>   (vect_analyze_loop_2): Likewise.
>   * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
>   (vect_bb_vectorization_profitable_p): Likewise.
>   * tree-vectorizer.h (init_cost): Likewise.
>   * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
>   * config/i386/i386.c (ix86_init_cost): Likewise.
>   * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.  

Just wanted to say thanks for doing this.  I hit the same problem
when doing the Neoverse V1 tuning near the end of stage 4.  Due to
the extreme lateness of the changes, I couldn't reasonably ask for
target-independent help at that time, but this patch will make
things simpler for AArch64. :-)

Richard
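
To make the interface change concrete, here is a minimal sketch in C of
passing the costing kind when the cost data is created, so that the finish
step can skip vector-only heuristics.  Everything here is an assumption for
illustration: struct cost_data and the sketch_* functions are hypothetical
and do not reproduce the real target hook signatures; only the
costing_for_scalar name comes from the ChangeLog above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-costing state; not GCC's actual cost data type.  */
struct cost_data
{
  bool costing_for_scalar;  /* recorded once, at init time */
  unsigned body_cost;       /* accumulated statement cost */
  bool density_test_ran;    /* set when the vector-only heuristic runs */
};

/* Model of init_cost with the new flag: the caller states up front
   whether this costing is for the scalar version.  */
struct cost_data *
sketch_init_cost (bool costing_for_scalar)
{
  struct cost_data *data = calloc (1, sizeof *data);
  assert (data != NULL);
  data->costing_for_scalar = costing_for_scalar;
  return data;
}

/* Model of add_stmt_cost: just accumulate.  */
void
sketch_add_stmt_cost (struct cost_data *data, unsigned cost)
{
  data->body_cost += cost;
}

/* Model of finish_cost: vector-only heuristics are skipped for scalar
   costing, with no need to compare pointers against loop_vinfo.  */
void
sketch_finish_cost (struct cost_data *data)
{
  if (data->costing_for_scalar)
    return;
  data->density_test_ran = true;
}
```

The point of the design is that the flag travels with the cost data itself,
so the check does not depend on which caller happens to pass which pointer.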


[PATCH] OpenMP: Add support for 'close' in map clause

2021-05-10 Thread Marcel Vollweiler

Hi,

This patch adds handling for the map-type-modifier 'close' in the map
clause that was introduced with OpenMP 5.0: "The close map-type-modifier
is a hint to the runtime to allocate memory close to the target device."

In OpenMP 5.0 'close' can be used beside/together with 'always' in a
list of map-type-modifiers.

With this patch, 'close' will be parsed and ignored for C and C++. A
patch for Fortran will be provided separately.
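
As a rough illustration of the syntax described above, the following C
sketch accepts 'always' and 'close' at most once each, allows an optional
comma after each modifier, and then looks for a map-kind followed by a
colon.  It is a simplified model working on an array of strings, not the
c-parser.c code: parse_map_modifiers, struct map_modifiers and the status
enum are hypothetical, and the sketch deliberately ignores the case where
'always' or 'close' is actually a variable name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum parse_status { PARSE_OK, PARSE_DUP_ALWAYS, PARSE_DUP_CLOSE };

struct map_modifiers
{
  int always;        /* 1 if 'always' was seen */
  int close;         /* 1 if 'close' was seen (parsed, then ignored) */
  const char *kind;  /* map-kind; defaults to "tofrom" */
};

/* TOKS is a NULL-terminated token array such as
   { "always", ",", "close", "to", ":", "x", NULL }.  */
enum parse_status
parse_map_modifiers (const char *const *toks, struct map_modifiers *out)
{
  assert (toks != NULL && out != NULL);
  out->always = 0;
  out->close = 0;
  out->kind = "tofrom";

  size_t pos = 0;
  while (toks[pos] != NULL)
    {
      const char *p = toks[pos];
      if (strcmp (p, "always") == 0)
        {
          if (out->always)
            return PARSE_DUP_ALWAYS;  /* modifier given twice */
          out->always = 1;
        }
      else if (strcmp (p, "close") == 0)
        {
          if (out->close)
            return PARSE_DUP_CLOSE;   /* modifier given twice */
          out->close = 1;
        }
      else
        {
          /* An identifier followed by ':' is taken as the map-kind;
             anything else ends the modifier list.  */
          if (toks[pos + 1] != NULL && strcmp (toks[pos + 1], ":") == 0)
            out->kind = p;
          break;
        }

      pos++;
      /* The separating comma is optional.  */
      if (toks[pos] != NULL && strcmp (toks[pos], ",") == 0)
        pos++;
    }
  return PARSE_OK;
}
```

In this model "always, close to: x" and "close always to: x" both parse to
the same result, while "close close from: x" is rejected.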

Marcel
OpenMP: Add support for 'close' in map clause

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_clause_map): Support map-type-modifier 
'close'.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_clause_map): Support map-type-modifier 
'close'.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/map-6.c: New test.
* c-c++-common/gomp/map-7.c: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 5cdeb21..78cba7f 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15643,14 +15643,19 @@ c_parser_omp_clause_depend (c_parser *parser, tree 
list)
map-kind:
  alloc | to | from | tofrom | release | delete
 
-   map ( always [,] map-kind: variable-list ) */
+   map ( always [,] map-kind: variable-list )
+
+   OpenMP 5.0:
+   map ( [map-type-modifier[,] ...] map-kind: variable-list )
+
+   map-type-modifier:
+ always | close */
 
 static tree
 c_parser_omp_clause_map (c_parser *parser, tree list)
 {
   location_t clause_loc = c_parser_peek_token (parser)->location;
   enum gomp_map_kind kind = GOMP_MAP_TOFROM;
-  int always = 0;
   enum c_id_kind always_id_kind = C_ID_NONE;
   location_t always_loc = UNKNOWN_LOCATION;
   tree always_id = NULL_TREE;
@@ -15660,37 +15665,54 @@ c_parser_omp_clause_map (c_parser *parser, tree list)
   if (!parens.require_open (parser))
 return list;
 
-  if (c_parser_next_token_is (parser, CPP_NAME))
+  int always = 0;
+  int close = 0;
+  int pos = 1;
+  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)
 {
-  c_token *tok = c_parser_peek_token (parser);
+  c_token *tok = c_parser_peek_nth_token_raw (parser, pos);
   const char *p = IDENTIFIER_POINTER (tok->value);
-  always_id_kind = tok->id_kind;
-  always_loc = tok->location;
-  always_id = tok->value;
   if (strcmp ("always", p) == 0)
{
- c_token *sectok = c_parser_peek_2nd_token (parser);
- if (sectok->type == CPP_COMMA)
+ if (always)
{
- c_parser_consume_token (parser);
- c_parser_consume_token (parser);
- always = 2;
+ c_parser_error (parser, "expected modifier %<always%> only once");
+ parens.skip_until_found_close (parser);
+ return list;
+   }
+
+ always_id_kind = tok->id_kind;
+ always_loc = tok->location;
+ always_id = tok->value;
+
+ always++;
+   }
+  else if (strcmp ("close", p) == 0)
+   {
+ if (close)
+   {
+ c_parser_error (parser, "expected modifier %<close%> only once");
+ parens.skip_until_found_close (parser);
+ return list;
}
- else if (sectok->type == CPP_NAME)
+
+ close++;
+   }
+  else if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == 
CPP_COLON)
+   {
+ for (int i = 1; i < pos; ++i)
{
- p = IDENTIFIER_POINTER (sectok->value);
- if (strcmp ("alloc", p) == 0
- || strcmp ("to", p) == 0
- || strcmp ("from", p) == 0
- || strcmp ("tofrom", p) == 0
- || strcmp ("release", p) == 0
- || strcmp ("delete", p) == 0)
-   {
- c_parser_consume_token (parser);
- always = 1;
-   }
+ c_parser_peek_token(parser);
+ c_parser_consume_token (parser);
}
+ break;
}
+  else
+   break;
+
+  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COMMA)
+   pos++;
+  pos++;
 }
 
   if (c_parser_next_token_is (parser, CPP_NAME)
@@ -15719,35 +15741,6 @@ c_parser_omp_clause_map (c_parser *parser, tree list)
   c_parser_consume_token (parser);
   c_parser_consume_token (parser);
 }
-  else if (always)
-{
-  if (always_id_kind != C_ID_ID)
-   {
- c_parser_error (parser, "expected identifier");
- parens.skip_until_found_close (parser);
- return list;
-   }
-
-  tree t = lookup_name (always_id);
-  if (t == NULL_TREE)
-   {
- undeclared_variable (always_loc, always_id);
- t = error_mark_node;
-   }
-  if (t != error_mark_node)
-   {
- tree u = build_omp_clause (clause_loc, OMP_CLAUSE_MAP);
- OMP_CLAUSE_DECL (u) =

Re: [PATCH 02/12] Allow generating pseudo register with specific alignment

2021-05-10 Thread H.J. Lu via Gcc-patches
On Mon, May 10, 2021 at 6:59 AM Richard Biener
 wrote:
>
> On Mon, May 10, 2021 at 3:29 PM H.J. Lu  wrote:
> >
> > On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener via Gcc-patches  writes:
> > > > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
> > > >  wrote:
> > > >>
> > > >> "H.J. Lu via Gcc-patches"  writes:
> > > >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu  wrote:
> > > >> >>
> > > >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
> > > >> >>  wrote:
> > > >> >> >
> > > >> >> > "H.J. Lu via Gcc-patches"  writes:
> > > >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
> > > >> >> > >  wrote:
> > > >> >> > >>
> > > >> >> > >> "H.J. Lu via Gcc-patches"  writes:
> > > >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo 
> > > >> >> > >> > registers so that
> > > >> >> > >> > associated hard registers can be properly spilled onto 
> > > >> >> > >> > stack.  But there
> > > >> >> > >> > are cases where associated hard registers will never be 
> > > >> >> > >> > spilled onto
> > > >> >> > >> > stack.  gen_reg_rtx is changed to take an argument for 
> > > >> >> > >> > register alignment
> > > >> >> > >> > so that stack realignment can be avoided when not needed.
> > > >> >> > >>
> > > >> >> > >> How is it guaranteed that they will never be spilled though?
> > > >> >> > >> I don't think that that guarantee exists for any kind of 
> > > >> >> > >> pseudo,
> > > >> >> > >> except perhaps for the temporary pseudos that the RA creates to
> > > >> >> > >> replace (match_scratch …)es.
> > > >> >> > >>
> > > >> >> > >
> > > >> >> > > The caller of creating pseudo registers with specific alignment 
> > > >> >> > > must
> > > >> >> > > guarantee that they will never be spilled.   I am only using it 
> > > >> >> > > in
> > > >> >> > >
> > > >> >> > >   /* Make operand1 a register if it isn't already.  */
> > > >> >> > >   if (can_create_pseudo_p ()
> > > >> >> > >   && !register_operand (op0, mode)
> > > >> >> > >   && !register_operand (op1, mode))
> > > >> >> > > {
> > > >> >> > >   /* NB: Don't increase stack alignment requirement when 
> > > >> >> > > forcing
> > > >> >> > >  operand1 into a pseudo register to copy data from one 
> > > >> >> > > memory
> > > >> >> > >  location to another since it doesn't require a spill.  
> > > >> >> > > */
> > > >> >> > >   emit_move_insn (op0,
> > > >> >> > >   force_reg (GET_MODE (op0), op1,
> > > >> >> > >  (UNITS_PER_WORD * 
> > > >> >> > > BITS_PER_UNIT)));
> > > >> >> > >   return;
> > > >> >> > > }
> > > >> >> > >
> > > >> >> > > for vector moves.  RA shouldn't spill it.
> > > >> >> >
> > > >> >> > But this is the point: it's a case of hoping that the RA won't 
> > > >> >> > spill it,
> > > >> >> > rather than having a guarantee that it won't.
> > > >> >> >
> > > >> >> > Even if the moves start out adjacent, they could be separated by 
> > > >> >> > later
> > > >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA 
> > > >> >> > scheduling
> > > >> >> > isn't enabled by default for x86, but it can still be enabled 
> > > >> >> > explicitly.)
> > > >> >> > Or if the same data is being copied to two locations, we might 
> > > >> >> > reuse
> > > >> >> > values loaded by the first copy for the second copy as well.
> > > >> >
> > > >> > There are cases where pseudo vector registers are created as pure
> > > >> > temporary registers in the backend and they shouldn't ever be spilled
> > > >> > to stack.   They will be spilled to stack only if there are other 
> > > >> > non-temporary
> > > >> > vector register usage in which case stack will be properly 
> > > >> > re-aligned.
> > > >> > Caller of creating pseudo registers with specific alignment 
> > > >> > guarantees
> > > >> > that they are used only as pure temporary registers.
> > > >>
> > > >> I don't think there's really a distinct category of pure temporary
> > > >> registers though.  The things I mentioned above can happen for any
> > > >> kind of pseudo register.
> > > >
> > > > I wonder if for the cases HJ thinks of it is appropriate to use 
> > > > hardregs?
> > > > Do we generally handle those well?  That is, are they again subject
> > > > to be allocated by RA when no longer live?
> > >
> > > Yeah, using hard registers should work.  Of course, any given fixed choice
> > > of hard register has the potential to be suboptimal in some situation,
> > > but it should be safe.
> >
> > I tried hard registers.  The generated code isn't as good as pseudo 
> > registers.
> > But I want to avoid aligning the stack when YMM registers are only used to
> > inline memcpy/memset.  Any suggestions?
>
> I wonder if we can mark pseudos with a new reg flag, like 'nospill' and
> enforce this in LRA or ICE if we can't?  That said, we should be able
> to verify our assumption holds.  Now, we then of course need to avoid
> CSE re-using such pseu

Re: [GCC][PATCH] arm: Remove duplicate definitions from arm_mve.h (pr100419).

2021-05-10 Thread Richard Earnshaw via Gcc-patches




On 05/05/2021 13:39, Srinath Parvathaneni via Gcc-patches wrote:

Hi Richard,


-Original Message-
From: Richard Earnshaw 
Sent: 05 May 2021 11:15
To: Srinath Parvathaneni ; gcc-
patc...@gcc.gnu.org
Cc: Richard Earnshaw 
Subject: Re: [GCC][PATCH] arm: Remove duplicate definitions from
arm_mve.h (pr100419).



On 05/05/2021 10:56, Srinath Parvathaneni via Gcc-patches wrote:

Hi All,

This patch removes several duplicated intrinsic definitions from
arm_mve.h mentioned in PR100419 and also fixes the wrong arguments
in a few of the intrinsics' polymorphic variants.

Regression tested and found no issues.

Ok for master ? GCC-11 and GCC-10 branch backports?
gcc/ChangeLog:

2021-05-04  Srinath Parvathaneni  

  PR target/100419
  * config/arm/arm_mve.h (__arm_vstrwq_scatter_offset): Fix wrong
  arguments.

  (__arm_vcmpneq): Remove duplicate definition.
  (__arm_vstrwq_scatter_offset_p): Likewise.
  (__arm_vmaxq_x): Likewise.
  (__arm_vmlsdavaq): Likewise.
  (__arm_vmlsdavaxq): Likewise.
  (__arm_vmlsdavq_p): Likewise.
  (__arm_vmlsdavxq_p): Likewise.
  (__arm_vrmlaldavhaq): Likewise.
  (__arm_vstrbq_p): Likewise.
  (__arm_vstrbq_scatter_offset): Likewise.
  (__arm_vstrbq_scatter_offset_p): Likewise.
  (__arm_vstrdq_scatter_offset): Likewise.
  (__arm_vstrdq_scatter_offset_p): Likewise.
  (__arm_vstrdq_scatter_shifted_offset): Likewise.
  (__arm_vstrdq_scatter_shifted_offset_p): Likewise.

Co-authored-by: Joe Ramsay  


Let's take this example:

-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 =
(p1); \
+#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 =
(p0); \
 __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]:
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(p0, int32_t *), __p1,
__ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]:
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(p0, uint32_t *),
__p1,
__ARM_mve_coerce(__p2, uint32x4_t)));})
+  _Generic( (int
(*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]:
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int32_t *), p1,
__ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]:
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *),
p1,
__ARM_mve_coerce(__p2, uint32x4_t)));})

It removes the safe shadow copy of p1 but adds a safe shadow copy of p0.
   Why?  Isn't it better (and safer) to just create shadow copies of all
the arguments and let the compiler worry about when it's safe to
eliminate them?


As you already know, polymorphic variants are used to select the intrinsics
based on the type of their arguments.

Consider the following code from arm_mve.h:
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrwq_scatter_offset_s32 (int32_t * __base, uint32x4_t __offset, 
int32x4_t __value)
{
   __builtin_mve_vstrwq_scatter_offset_sv4si ((__builtin_neon_si *) __base, 
__offset, __value);
}

__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrwq_scatter_offset_u32 (uint32_t * __base, uint32x4_t __offset, 
uint32x4_t __value)
{
   __builtin_mve_vstrwq_scatter_offset_uv4si ((__builtin_neon_si *) __base, 
__offset, __value);
}

__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrwq_scatter_offset_f32 (float32_t * __base, uint32x4_t __offset, 
float32x4_t __value)
{
   __builtin_mve_vstrwq_scatter_offset_fv4sf ((__builtin_neon_si *) __base, 
__offset, __value);
}

Which of the above 3 functions is called from the following polymorphic
variant is decided based on the types of the arguments p0, p1 and p2.
#define __arm_vstrwq_scatter_offset(p0,p1,p2)

For the 3 function definitions mentioned above, only the types of arguments
1 (p0) and 3 (p2) vary, whereas the type of the second argument (p1) is the
same (uint32x4_t).

This is the reason we need shadow copies of only p0 and p2 to determine the
actual function to be called; the type of p1 is irrelevant. Previously p1
was wrongly used to determine the function instead of p0, and that is a bug,
which is fixed in this patch.

Since the type of p1 is irrelevant in deciding the function to be called, I
believe adding a shadow copy for p1 (__typeof(p1) __p1 = (p1) ) in this macro
expansion is of no use. Considering we have more than 250 polymorphic
variants defined in the arm_mve.h header, this would result in more than 250
lines of extra code.



Ah sorry, I'd missed that this was using the _Generic() feature and that 
p1 was only being used once in each variant.


On that basis, OK

Re: [PR66791][ARM] Replace __builtin_neon_vtst*

2021-05-10 Thread Richard Earnshaw via Gcc-patches




On 06/05/2021 01:14, Prathamesh Kulkarni via Gcc-patches wrote:

Hi,
The attached patch replaces __builtin_neon_vtst* (a, b) with (a & b) != 0.
Bootstrapped and tested on arm-linux-gnueabihf and cross-tested on arm*-*-*.
OK to commit ?

Thanks,
Prathamesh



You're missing the ChangeLog details.

Also, if you're removing these, do we still need the neon_vtst
patterns in neon.md?  They generate unspecs, so if they're no longer
needed for these expansions, they can likely be dropped entirely.


R.


Re: [PATCH, rs6000] Add ALTIVEC_REGS as pressure class

2021-05-10 Thread Segher Boessenkool
On Mon, May 10, 2021 at 08:53:31AM -0500, Peter Bergner wrote:
> On 5/10/21 7:52 AM, Pat Haugen wrote:
> > On 5/7/21 6:00 PM, Segher Boessenkool wrote:
> >> So what is this replaced with?  Was it an "xxlmr" and it is just
> >> unnecessary now?
> > 
> > Different RA choice made the reg copy unnecessary.
> > 
> > <   xxspltib 0,8
> > <   xxlor 32,0,0
> > ---
> >>xxspltib 32,8
> 
> Given how we use xxlor's for vsx reg copies and how easily they
> can change, I'm not sure we should even be counting them at all,
> since they can change with the phase of the moon or the day of 
> the week.

Yeah -- otoh, it is probably a good idea to keep it for some simpler
testcases, so we are alerted to regressions on this.  It's a tradeoff,
there is no one best way.  But maybe remove it if it "randomly" changed
on a testcase?


Segher


[PATCH] PR fortran/98411 - [10/11/12 Regression] Pointless warning for static variables

2021-05-10 Thread Harald Anlauf via Gcc-patches
A simple, self-explanatory patch to avoid a wrong warning.

Regtested on x86_64-pc-linux-gnu.

OK for mainline?  Affected branches?

Thanks,
Harald


PR fortran/98411 - Pointless warning for static variables

Variables with explicit SAVE attribute cannot end up on the stack.
There is no point in checking whether they should be moved off the
stack to static storage.

gcc/fortran/ChangeLog:

PR fortran/98411
* trans-decl.c (gfc_finish_var_decl): Add check for explicit SAVE
attribute.

gcc/testsuite/ChangeLog:

PR fortran/98411
* gfortran.dg/pr98411.f90: New test.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index cc9d85543ca..7cded0a3ede 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -738,6 +738,7 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)
   /* Keep variables larger than max-stack-var-size off stack.  */
   if (!(sym->ns->proc_name && sym->ns->proc_name->attr.recursive)
   && !sym->attr.automatic
+  && sym->attr.save != SAVE_EXPLICIT
   && INTEGER_CST_P (DECL_SIZE_UNIT (decl))
   && !gfc_can_put_var_on_stack (DECL_SIZE_UNIT (decl))
 	 /* Put variable length auto array pointers always into stack.  */
diff --git a/gcc/testsuite/gfortran.dg/pr98411.f90 b/gcc/testsuite/gfortran.dg/pr98411.f90
new file mode 100644
index 000..249afaea419
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr98411.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-options "-Wall -fautomatic -fmax-stack-var-size=100" }
+! PR fortran/98411 - Pointless warning for static variables
+
+module try
+  implicit none
+  integer, save :: a(1000)
+contains
+  subroutine initmodule
+real, save :: b(1000)
+logical:: c(1000) ! { dg-warning "moved from stack to static storage" }
+a(1) = 42
+b(2) = 3.14
+c(3) = .true.
+  end subroutine initmodule
+end module try


[PATCH] i386: Force V2SI mode operands to registers in expand_sse_movcc

2021-05-10 Thread Uros Bizjak via Gcc-patches
For some reason the middle-end does not enforce operand
predicates for vcond patterns.

2021-05-10  Uroš Bizjak  

gcc/
* config/i386/i386-expand.c (ix86_expand_sse_movcc)
: Force op_true to register.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e9f11bca78a..5cfde5b3d30 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3707,6 +3707,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
rtx op_false)
 case E_V2SImode:
   if (TARGET_SSE4_1)
{
+ op_true = force_reg (mode, op_true);
+
  gen = gen_mmx_pblendvb;
  if (mode != V8QImode)
d = gen_reg_rtx (V8QImode);


Re: [PATCH] c++: dependent operator expression lookup [PR51577]

2021-05-10 Thread Jason Merrill via Gcc-patches

On 5/9/21 10:33 PM, Patrick Palka wrote:

This unconditionally enables the maybe_save_operator_binding mechanism
for all function templates, so that when resolving a dependent operator
expression from a function template we ignore later-declared
namespace-scope bindings that weren't visible at template definition
time.  This patch additionally makes the mechanism apply to dependent
comma and compound-assignment operator expressions.

Note that this doesn't fix the testcases in PR83035 or PR99692 because
there the dependent operator expressions aren't at function scope.  I'm
not sure exactly how to fix these testcases using the current approach,
since although in both testcases we'll have a TEMPLATE_DECL to associate
the lookup result with, at instantiation time we won't have an
appropriate binding level to push to.  I wonder if we could instead encode
dependent operator expressions as appropriately-flagged CALL_EXPRs?


That sounds plausible.


Bootstrapped and regtested on x86_64-pc-linux-gnu, and tested on cmcstl2
and range-v3 and numerous boost libraries, does this look OK for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/51577
* name-lookup.c (maybe_save_operator_binding): Unconditionally
enable for all function templates, not just generic lambdas.
* typeck.c (build_x_compound_expr): Call maybe_save_operator_binding
in the type-dependent case.
(build_x_modify_expr): Likewise.  Move declaration of 'op'
closer to its first use.

gcc/testsuite/ChangeLog:

PR c++/51577
* g++.dg/lookup/operator-3.C: New test.
---
  gcc/cp/name-lookup.c |  15 ++--
  gcc/cp/typeck.c  |  17 ++--
  gcc/testsuite/g++.dg/lookup/operator-3.C | 109 +++
  3 files changed, 128 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/lookup/operator-3.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 4e84e2f9987..a6c9e68a19e 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -9116,7 +9116,7 @@ static const char *const op_bind_attrname = "operator 
bindings";
  void
  maybe_save_operator_binding (tree e)
  {
-  /* This is only useful in a generic lambda.  */
+  /* This is only useful in a template.  */
if (!processing_template_decl)
  return;
  
@@ -9124,13 +9124,12 @@ maybe_save_operator_binding (tree e)

if (!cfn)
  return;
  
-  /* Do this for lambdas and code that will emit a CMI.  In a module's

- GMF we don't yet know whether there will be a CMI.  */
-  if (!module_has_cmi_p () && !global_purview_p () && !current_lambda_expr())
- return;
-
-  tree fnname = ovl_op_identifier (false, TREE_CODE (e));
-  if (!fnname)
+  tree fnname;
+  if(TREE_CODE (e) == MODOP_EXPR)
+fnname = ovl_op_identifier (true, TREE_CODE (TREE_OPERAND (e, 1)));
+  else
+fnname = ovl_op_identifier (false, TREE_CODE (e));
+  if (!fnname || fnname == assign_op_identifier)
  return;
  
tree attributes = DECL_ATTRIBUTES (cfn);

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 50d0f1e6a62..6826fcf139c 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -7272,7 +7272,11 @@ build_x_compound_expr (location_t loc, tree op1, tree 
op2,
  {
if (type_dependent_expression_p (op1)
  || type_dependent_expression_p (op2))
-   return build_min_nt_loc (loc, COMPOUND_EXPR, op1, op2);
+   {
+ result = build_min_nt_loc (loc, COMPOUND_EXPR, op1, op2);
+ maybe_save_operator_binding (result);
+ return result;
+   }
op1 = build_non_dependent_expr (op1);
op2 = build_non_dependent_expr (op2);
  }
@@ -8936,7 +8940,6 @@ build_x_modify_expr (location_t loc, tree lhs, enum 
tree_code modifycode,
tree orig_lhs = lhs;
tree orig_rhs = rhs;
tree overload = NULL_TREE;
-  tree op = build_nt (modifycode, NULL_TREE, NULL_TREE);
  
if (lhs == error_mark_node || rhs == error_mark_node)

  return cp_expr (error_mark_node, loc);
@@ -8946,9 +8949,12 @@ build_x_modify_expr (location_t loc, tree lhs, enum 
tree_code modifycode,
if (modifycode == NOP_EXPR
  || type_dependent_expression_p (lhs)
  || type_dependent_expression_p (rhs))
-return build_min_nt_loc (loc, MODOP_EXPR, lhs,
-build_min_nt_loc (loc, modifycode, NULL_TREE,
-  NULL_TREE), rhs);
+   {
+ tree op = build_min_nt_loc (loc, modifycode, NULL_TREE, NULL_TREE);
+ tree rval = build_min_nt_loc (loc, MODOP_EXPR, lhs, op, rhs);
+ maybe_save_operator_binding (rval);
+ return rval;
+   }
  
lhs = build_non_dependent_expr (lhs);

rhs = build_non_dependent_expr (rhs);
@@ -8956,6 +8962,7 @@ build_x_modify_expr (location_t loc, tree lhs, enum 
tree_code modifycode,
  
if (modifycode != NOP_EXPR)

  {
+  tree op = build_nt (modifycode, NULL_TREE, NULL_TREE);
tree 

Re: [PATCH] AArch64: Improve GOT addressing

2021-05-10 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
>> Normally we should only put two instructions in the same define_insn
>> if there's a specific ABI or architectural reason for not separating
>> them.  Doing it purely for optimisation reasons is going against the
>> general direction of travel.  So I think the first question is: why
>> don't we simply delay the split until after reload instead, since
>> that's the more normal way of handling this kind of thing?
>
> Well there are no optimizations that benefit from them being split, and there
> is no gain from scheduling them independently. Keeping them together
> means the linker could perform relaxations on the pair without adding new
> relocations. So if we split after reload we'd still want to keep them 
> together.

The burden of proof is the other way though: there has to be a specific
reason for keeping the instructions together, rather than a specific
reason for splitting them.  How we optimise things after RA changes
with time.

Thanks,
Richard


Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Eric Botcazou
> I have a slightly different issue, last week it was still okay,
> but now I get (using gcc-4.8 as bootstrap compiler):
> 
> gcc -std=gnu99 -c -g  -gnatpg -gnatwns -gnata -W -Wall -nostdinc -I- -I.
> -Iada/generated -Iada -Iada/gcc-interface -I../../gcc-trunk/gcc/ada
> -I../../gcc-trunk/gcc/ada/gcc-interface -Iada/libgnat
> -I../../gcc-trunk/gcc/ada/libgnat ../../gcc-trunk/gcc/ada/atree.adb -o
> ada/atree.o
> atree.adb:569:30: "Shift_Right" is not visible (more references follow)
> atree.adb:569:30: non-visible declaration at interfac.ads:147
> atree.adb:569:30: non-visible declaration at interfac.ads:127
> atree.adb:569:30: non-visible declaration at interfac.ads:107
> atree.adb:569:30: non-visible declaration at interfac.ads:87
> atree.adb:569:30: non-visible declaration at s-unstyp.ads:220
> atree.adb:569:30: non-visible declaration at s-unstyp.ads:200
> atree.adb:569:30: non-visible declaration at s-unstyp.ads:180
> atree.adb:569:30: non-visible declaration at s-unstyp.ads:160
> atree.adb:569:30: non-visible declaration at s-unstyp.ads:140
> atree.adb:569:30: non-visible declaration at s-unstyp.ads:120
> atree.adb:631:26: "Shift_Left" is not visible (more references follow)
> atree.adb:631:26: non-visible declaration at interfac.ads:143
> atree.adb:631:26: non-visible declaration at interfac.ads:123
> atree.adb:631:26: non-visible declaration at interfac.ads:103
> atree.adb:631:26: non-visible declaration at interfac.ads:83
> atree.adb:631:26: non-visible declaration at s-unstyp.ads:216
> atree.adb:631:26: non-visible declaration at s-unstyp.ads:196
> atree.adb:631:26: non-visible declaration at s-unstyp.ads:176
> atree.adb:631:26: non-visible declaration at s-unstyp.ads:156
> atree.adb:631:26: non-visible declaration at s-unstyp.ads:136
> atree.adb:631:26: non-visible declaration at s-unstyp.ads:116
> make[3]: *** [ada/atree.o] Error 1
> make[3]: *** Waiting for unfinished jobs

Yes, the merge is incomplete, temporarily replace

  pragma Provide_Shift_Operators (Slot);

in atree.ads with

  function Shift_Left (S : Slot; V : Natural) return Slot;
  pragma Import (Intrinsic, Shift_Left);

  function Shift_Right (S : Slot; V : Natural) return Slot;
  pragma Import (Intrinsic, Shift_Right);

-- 
Eric Botcazou


Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Eric Botcazou
> Do you know Eric where version.o needs to be added to be included in the
> problematic command line?

Presumably to the beginning of TOOLS_LIBS in ada/gcc-interface/Makefile.in:

TOOLS_LIBS = ../version.o ../link.o ../targext.o ../../ggc-none.o \

-- 
Eric Botcazou

[committed] libstdc++: Adjust expected errors in tests when compiled as C++20

2021-05-10 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/20_util/scoped_allocator/69293_neg.cc: Add dg-error
for additional errors in C++20.
* 
testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc:
Likewise.
* testsuite/20_util/uses_allocator/69293_neg.cc: Likewise.
* testsuite/27_io/filesystem/path/io/dr2989.cc: Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit 23972128c83e62011b583f06b32c8501c096b7d8
Author: Jonathan Wakely 
Date:   Mon May 10 15:10:45 2021

libstdc++: Adjust expected errors in tests when compiled as C++20

libstdc++-v3/ChangeLog:

* testsuite/20_util/scoped_allocator/69293_neg.cc: Add dg-error
for additional errors in C++20.
* 
testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc:
Likewise.
* testsuite/20_util/uses_allocator/69293_neg.cc: Likewise.
* testsuite/27_io/filesystem/path/io/dr2989.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc 
b/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
index 5efd849ff36..fd37374447f 100644
--- a/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
@@ -47,6 +47,7 @@ test01()
   auto p = sa.allocate(1);
   sa.construct(p);  // this is required to be ill-formed
   // { dg-error "failed: .* uses_allocator is true" "" { target *-*-* } 0 }
+  // { dg-error "too many initializers for 'X'" "" { target c++2a } 0 }
 }
 
 // Needed because of PR c++/92193
diff --git 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
index b17c6a0b90c..626f2e1c6ee 100644
--- 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
+++ 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
@@ -48,3 +48,5 @@ test02()
 }
 
 // { dg-error "value type is destructible" "" { target *-*-* } 0 }
+// { dg-error "use of deleted function" "" { target c++20 } 0 }
+// { dg-error "is private within this context" "" { target c++20 } 0 }
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc 
b/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
index 921ebbc87de..2c5a62190e8 100644
--- a/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
@@ -45,5 +45,5 @@ test01()
   alloc_type a;
   std::tuple t(std::allocator_arg, a); // this is required to be ill-formed
   // { dg-error "failed: .* uses_allocator is true" "" { target *-*-* } 0 }
-  // { dg-error "no matching function for call" "" { target c++2a } 0 }
+  // { dg-error "too many initializers for 'X'" "" { target c++2a } 0 }
 }
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc
index c5cda776477..e858e3508bd 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc
@@ -33,3 +33,4 @@ void foo(std::iostream& s) {
   s >> p; // { dg-error "no match" }
 }
 // { dg-prune-output "no type .*enable_if" }
+// { dg-prune-output "template constraint failure" }


[committed 1/9] libstdc++: Remove redundant -std=gnu++17 options from PSTL tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches
GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.

commit 646e6c652448bfd8fca535d91f588b4606295a72
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 options from PSTL tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* 
testsuite/20_util/specialized_algorithms/pstl/uninitialized_construct.cc:
Remove -std=gnu++17 from dg-options.
* 
testsuite/20_util/specialized_algorithms/pstl/uninitialized_copy_move.cc:
Likewise.
* 
testsuite/20_util/specialized_algorithms/pstl/uninitialized_fill_destroy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_merge/inplace_merge.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_merge/merge.cc: Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/copy_if.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/copy_move.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/fill.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/generate.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/is_partitioned.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/partition.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/partition_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/remove.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/remove_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/replace.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/replace_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/swap_ranges.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/transform_unary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/unique.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_modifying_operations/unique_copy_equal.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/adjacent_find.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/all_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/any_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/count.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/equal.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_end.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_first_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_if.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/for_each.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/none_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/reverse.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/reverse_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/search_n.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/is_heap.cc: Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/is_sorted.cc:
Likewise.
* 
testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/al

[committed 2/9] libstdc++: Remove redundant -std=gnu++17 options from filesystem tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.


commit aa60ff1c8879f67557efc188b1d18d008458c76a
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 options from filesystem tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/27_io/basic_filebuf/open/char/path.cc: Remove
-std=gnu++17 from dg-options directive.
* testsuite/27_io/basic_fstream/cons/char/path.cc: Likewise.
* testsuite/27_io/basic_fstream/open/char/path.cc: Likewise.
* testsuite/27_io/basic_ifstream/cons/char/path.cc: Likewise.
* testsuite/27_io/basic_ifstream/open/char/path.cc: Likewise.
* testsuite/27_io/basic_ofstream/cons/char/path.cc: Likewise.
* testsuite/27_io/basic_ofstream/open/char/path.cc: Likewise.
* testsuite/27_io/filesystem/directory_entry/86597.cc: Likewise.
* testsuite/27_io/filesystem/directory_entry/lwg3171.cc:
Likewise.
* testsuite/27_io/filesystem/file_status/1.cc: Likewise.
* testsuite/27_io/filesystem/filesystem_error/cons.cc: Likewise.
* testsuite/27_io/filesystem/filesystem_error/copy.cc: Likewise.
* testsuite/27_io/filesystem/iterators/91067.cc: Likewise.
* testsuite/27_io/filesystem/iterators/caching.cc: Likewise.
* testsuite/27_io/filesystem/iterators/directory_iterator.cc:
Likewise.
* testsuite/27_io/filesystem/iterators/pop.cc: Likewise.
* testsuite/27_io/filesystem/iterators/recursion_pending.cc:
Likewise.
* testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
* testsuite/27_io/filesystem/operations/absolute.cc: Likewise.
* testsuite/27_io/filesystem/operations/all.cc: Likewise.
* testsuite/27_io/filesystem/operations/canonical.cc: Likewise.
* testsuite/27_io/filesystem/operations/copy.cc: Likewise.
* testsuite/27_io/filesystem/operations/copy_file.cc: Likewise.
* testsuite/27_io/filesystem/operations/create_directories.cc:
Likewise.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Likewise.
* testsuite/27_io/filesystem/operations/create_symlink.cc:
Likewise.
* testsuite/27_io/filesystem/operations/current_path.cc:
Likewise.
* testsuite/27_io/filesystem/operations/equivalent.cc: Likewise.
* testsuite/27_io/filesystem/operations/exists.cc: Likewise.
* testsuite/27_io/filesystem/operations/file_size.cc: Likewise.
* testsuite/27_io/filesystem/operations/is_empty.cc: Likewise.
* testsuite/27_io/filesystem/operations/last_write_time.cc:
Likewise.
* testsuite/27_io/filesystem/operations/permissions.cc:
Likewise.
* testsuite/27_io/filesystem/operations/proximate.cc: Likewise.
* testsuite/27_io/filesystem/operations/read_symlink.cc:
Likewise.
* testsuite/27_io/filesystem/operations/relative.cc: Likewise.
* testsuite/27_io/filesystem/operations/remove.cc: Likewise.
* testsuite/27_io/filesystem/operations/remove_all.cc: Likewise.
* testsuite/27_io/filesystem/operations/rename.cc: Likewise.
* testsuite/27_io/filesystem/operations/resize_file.cc:
Likewise.
* testsuite/27_io/filesystem/operations/space.cc: Likewise.
* testsuite/27_io/filesystem/operations/status.cc: Likewise.
* testsuite/27_io/filesystem/operations/symlink_status.cc:
Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Likewise.
* testsuite/27_io/filesystem/operations/weakly_canonical.cc:
Likewise.
* testsuite/27_io/filesystem/path/append/path.cc: Likewise.
* testsuite/27_io/filesystem/path/append/source.cc: Likewise.
* testsuite/27_io/filesystem/path/assign/assign.cc: Likewise.
* testsuite/27_io/filesystem/path/assign/copy.cc: Likewise.
* testsuite/27_io/filesystem/path/compare/compare.cc: Likewise.
* testsuite/27_io/filesystem/path/compare/lwg2936.cc: Likewise.
* testsuite/27_io/filesystem/path/compare/path.cc: Likewise.
* testsuite/27_io/filesystem/path/compare/strings.cc: Likewise.
* testsuite/27_io/filesystem/path/concat/92853.cc: Likewise.

[committed 3/9] libstdc++: Remove redundant -std=gnu++17 options from any/optional/variant tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.

commit 8240175b87e331c87993876e782971eda46f9a6e
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 option from any/optional/variant tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/20_util/any/assign/1.cc: Remove -std=gnu++17 from
dg-options directive.
* testsuite/20_util/any/assign/2.cc: Likewise.
* testsuite/20_util/any/assign/emplace.cc: Likewise.
* testsuite/20_util/any/assign/exception.cc: Likewise.
* testsuite/20_util/any/assign/self.cc: Likewise.
* testsuite/20_util/any/cons/1.cc: Likewise.
* testsuite/20_util/any/cons/2.cc: Likewise.
* testsuite/20_util/any/cons/90415.cc: Likewise.
* testsuite/20_util/any/cons/92156.cc: Likewise.
* testsuite/20_util/any/cons/aligned.cc: Likewise.
* testsuite/20_util/any/cons/explicit.cc: Likewise.
* testsuite/20_util/any/cons/in_place.cc: Likewise.
* testsuite/20_util/any/cons/nontrivial.cc: Likewise.
* testsuite/20_util/any/make_any.cc: Likewise.
* testsuite/20_util/any/misc/any_cast.cc: Likewise.
* testsuite/20_util/any/misc/any_cast_neg.cc: Likewise.
* testsuite/20_util/any/misc/any_cast_no_rtti.cc: Likewise.
* testsuite/20_util/any/misc/swap.cc: Likewise.
* testsuite/20_util/any/modifiers/1.cc: Likewise.
* testsuite/20_util/any/modifiers/83658.cc: Likewise.
* testsuite/20_util/any/modifiers/92156.cc: Likewise.
* testsuite/20_util/any/observers/type.cc: Likewise.
* testsuite/20_util/any/requirements.cc: Likewise.
* testsuite/20_util/any/typedefs.cc: Likewise.
* testsuite/20_util/optional/77288.cc: Likewise.
* testsuite/20_util/optional/84601.cc: Likewise.
* testsuite/20_util/optional/assignment/1.cc: Likewise.
* testsuite/20_util/optional/assignment/2.cc: Likewise.
* testsuite/20_util/optional/assignment/3.cc: Likewise.
* testsuite/20_util/optional/assignment/4.cc: Likewise.
* testsuite/20_util/optional/assignment/5.cc: Likewise.
* testsuite/20_util/optional/assignment/6.cc: Likewise.
* testsuite/20_util/optional/assignment/7.cc: Likewise.
* testsuite/20_util/optional/assignment/8.cc: Likewise.
* testsuite/20_util/optional/assignment/9.cc: Likewise.
* testsuite/20_util/optional/bad_access.cc: Likewise.
* testsuite/20_util/optional/cons/77727.cc: Likewise.
* testsuite/20_util/optional/cons/85642.cc: Likewise.
* testsuite/20_util/optional/cons/copy.cc: Likewise.
* testsuite/20_util/optional/cons/deduction.cc: Likewise.
* testsuite/20_util/optional/cons/default.cc: Likewise.
* testsuite/20_util/optional/cons/move.cc: Likewise.
* testsuite/20_util/optional/cons/trivial.cc: Likewise.
* testsuite/20_util/optional/cons/value.cc: Likewise.
* testsuite/20_util/optional/cons/value_neg.cc: Likewise.
* testsuite/20_util/optional/constexpr/cons/default.cc:
Likewise.
* testsuite/20_util/optional/constexpr/cons/value.cc: Likewise.
* testsuite/20_util/optional/constexpr/in_place.cc: Likewise.
* testsuite/20_util/optional/constexpr/make_optional.cc:
Likewise.
* testsuite/20_util/optional/constexpr/nullopt.cc: Likewise.
* testsuite/20_util/optional/constexpr/observers/1.cc: Likewise.
* testsuite/20_util/optional/constexpr/observers/2.cc: Likewise.
* testsuite/20_util/optional/constexpr/observers/3.cc: Likewise.
* testsuite/20_util/optional/constexpr/observers/4.cc: Likewise.
* testsuite/20_util/optional/constexpr/observers/5.cc: Likewise.
* testsuite/20_util/optional/constexpr/relops/1.cc: Likewise.
* testsuite/20_util/optional/constexpr/relops/2.cc: Likewise.
* testsuite/20_util/optional/constexpr/relops/3.cc: Likewise.
* testsuite/20_util/optional/constexpr/relops/4.cc: Likewise.
* testsuite/20_util/optional/constexpr/relops/5.cc: Likewise.
* testsuite/20_util/optional/constexpr/relops/6.cc: Likewise.
* testsuite/20_util/optional/hash.cc: Likewise.
* testsuite/20_util/optional/in_place.cc: Likewise.
* testsuite/20_

[committed 4/9] libstdc++: Remove redundant -std=gnu++17 options from concurrency tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.


commit 9cd88c022fcad783997cd4111b2e6c3700c4b15b
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 option from concurrency tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/29_atomics/atomic/69769.cc: Remove -std=gnu++17 from
dg-options.
* testsuite/29_atomics/atomic/is_always_lock_free.cc:
* testsuite/29_atomics/atomic/requirements/typedefs.cc:
* testsuite/29_atomics/atomic_integral/is_always_lock_free.cc:
* testsuite/29_atomics/atomic_integral/requirements/typedefs.cc:
* testsuite/30_threads/lock_guard/cons/deduction.cc: Likewise.
* testsuite/30_threads/scoped_lock/cons/1.cc: Likewise.
* testsuite/30_threads/scoped_lock/cons/deduction.cc: Likewise.
* testsuite/30_threads/scoped_lock/requirements/explicit_instantiation.cc:
Likewise.
* testsuite/30_threads/scoped_lock/requirements/typedefs.cc:
Likewise.
* testsuite/30_threads/shared_lock/70766.cc: Likewise.
* testsuite/30_threads/shared_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/shared_mutex/cons/assign_neg.cc:
Likewise.
* testsuite/30_threads/shared_mutex/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/shared_mutex/requirements/standard_layout.cc:
Likewise.
* testsuite/30_threads/shared_mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/shared_mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/shared_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/unique_lock/cons/deduction.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/69769.cc b/libstdc++-v3/testsuite/29_atomics/atomic/69769.cc
index 8258594e67c..d9d0821b8f2 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/69769.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/69769.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 // { dg-require-atomic-builtins "" }
 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/is_always_lock_free.cc b/libstdc++-v3/testsuite/29_atomics/atomic/is_always_lock_free.cc
index 063076ad274..2fa3a40e98c 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/is_always_lock_free.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/is_always_lock_free.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 
 #include 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/requirements/typedefs.cc b/libstdc++-v3/testsuite/29_atomics/atomic/requirements/typedefs.cc
index 28adc675939..c730e1ab117 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/requirements/typedefs.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/requirements/typedefs.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 // { dg-require-atomic-builtins "" }
 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_integral/is_always_lock_free.cc b/libstdc++-v3/testsuite/29_atomics/atomic_integral/is_always_lock_free.cc
index 1d18c93092d..c866118943c 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_integral/is_always_lock_free.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_integral/is_always_lock_free.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 
 #include 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/typedefs.cc b/libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/typedefs.cc
index da27c48e97d..981a85bb504 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/typedefs.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/typedefs.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 // { dg-require-atomic-builtins "" }
 
diff --git a/libstdc++-v3/testsuite/30_threads/lock_guard/cons/deduction.cc b/libstdc++-v3/testsuite/30_threads/lock_guard/cons

[committed 5/9] libstdc++: Remove redundant -std=gnu++17 options from PMR tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.
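
As a rough illustration of the point above, the sketch below shows what a
test header might look like once the explicit option is gone.  The file body
and the helper starts_with_gnu are hypothetical and are not taken from any
test in this series; only the dg-do line mirrors the directives visible in
the diffs.  With no dg-options line pinning the dialect, the target selector
alone decides whether the test is applicable to the -std mode in use.

```cpp
// Hypothetical test file, shown only to illustrate the directive layout.
//
// Before the cleanup the header would also have carried:
//   { dg-options "-std=gnu++17" }
// which pins the test to a single dialect.

// { dg-do compile { target c++17 } }

#include <string_view>

// A small C++17-only check: std::string_view is usable in constant
// expressions, so the whole test can be expressed with static_assert.
constexpr bool
starts_with_gnu (std::string_view s)
{
  return s.substr (0, 3) == "gnu";
}

static_assert (starts_with_gnu ("gnu++17"), "GNU dialect name");
static_assert (!starts_with_gnu ("c++17"), "strict dialect name");
```

Because this is a compile-only test, it needs no main function; the
static_assert lines are the whole check.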

Tested powerpc64le-linux. Committed to trunk.

commit 7a4e52e44a8c9e6c59060adc691de5144d3c6940
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 option from PMR tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/20_util/memory_resource/1.cc: Remove -std=gnu++17
from dg-options directive.
* testsuite/20_util/memory_resource/2.cc: Likewise.
* testsuite/20_util/monotonic_buffer_resource/1.cc: Likewise.
* testsuite/20_util/monotonic_buffer_resource/93208.cc:
Likewise.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc:
Likewise.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc:
Likewise.
* testsuite/20_util/monotonic_buffer_resource/release.cc:
Likewise.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
Likewise.
* testsuite/20_util/polymorphic_allocator/1.cc: Likewise.
* testsuite/20_util/polymorphic_allocator/construct_pair.cc:
Likewise.
* testsuite/20_util/polymorphic_allocator/resource.cc: Likewise.
* testsuite/20_util/polymorphic_allocator/select.cc: Likewise.
* testsuite/20_util/synchronized_pool_resource/allocate.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/allocate_single.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/cons.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/cons_single.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/is_equal.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/multithreaded.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/options.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/release.cc:
Likewise.
* testsuite/20_util/synchronized_pool_resource/release_single.cc:
Likewise.
* testsuite/20_util/unsynchronized_pool_resource/allocate-max-chunks.cc:
Likewise.
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
Likewise.
* testsuite/20_util/unsynchronized_pool_resource/cons.cc:
Likewise.
* testsuite/20_util/unsynchronized_pool_resource/is_equal.cc:
Likewise.
* testsuite/20_util/unsynchronized_pool_resource/options.cc:
Likewise.
* testsuite/20_util/unsynchronized_pool_resource/release.cc:
Likewise.
* testsuite/21_strings/basic_string/types/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/deque/types/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/deque/types/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/forward_list/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/forward_list/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/list/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/list/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/map/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/map/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/multimap/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multimap/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/multiset/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multiset/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/set/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/set/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/unordered_map/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_map/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/unordered_set/pmr_typedefs.cc:
   

[committed 6/9] libstdc++: Remove redundant -std=gnu++17 options from strings tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.

commit 8087e70267ce6fa0787152963339ba987e7b514d
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 option from strings tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/21_strings/basic_string/79162.cc: Remove
-std=gnu++17 from dg-options directive.
* testsuite/21_strings/basic_string/cons/char/7.cc: Likewise.
* testsuite/21_strings/basic_string/cons/char/79162.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/char/86138.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/char/9.cc: Likewise.
* testsuite/21_strings/basic_string/cons/char/deduction.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/char/moveable2_c++17.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/wchar_t/7.cc: Likewise.
* testsuite/21_strings/basic_string/cons/wchar_t/79162.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/wchar_t/86138.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/wchar_t/9.cc: Likewise.
* testsuite/21_strings/basic_string/cons/wchar_t/deduction.cc:
Likewise.
* testsuite/21_strings/basic_string/cons/wchar_t/moveable2_c++17.cc:
Likewise.
* testsuite/21_strings/basic_string/hash/hash.cc: Likewise.
* testsuite/21_strings/basic_string/lwg2758.cc: Likewise.
* testsuite/21_strings/basic_string/lwg2946.cc: Likewise.
* testsuite/21_strings/basic_string/modifiers/append/char/4.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/append/wchar_t/4.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/assign/char/4.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/assign/wchar_t/4.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/insert/char/3.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/insert/wchar_t/3.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/replace/char/7.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/replace/wchar_t/7.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/compare/char/2.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/compare/wchar_t/2.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/data/char/2.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/data/char/86169.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/data/wchar_t/2.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/find/char/5.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/find/wchar_t/5.cc:
Likewise.
* testsuite/21_strings/basic_string/operators/char/5.cc:
Likewise.
* testsuite/21_strings/basic_string/operators/wchar_t/5.cc:
Likewise.
* testsuite/21_strings/basic_string_view/capacity/1.cc:
Likewise.
* testsuite/21_strings/basic_string_view/capacity/empty_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/char/1.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/char/2.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/char/3.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/char/nonnull.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/wchar_t/1.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/wchar_t/2.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/wchar_t/3.cc:
Likewise.
* testsuite/21_strings/basic_string_view/cons/wchar_t/nonnull.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/char/1.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/char/2.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
Likewise.
* testsuite/21_

[committed 7/9] libstdc++: Remove redundant -std=gnu++17 options from containers tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.



[committed 8/9] libstdc++: Remove redundant -std=gnu++17 options from algorithm tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.


commit d7b2d92747f8d236050af3ec5741786f0f878716
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:54 2021

libstdc++: Remove redundant -std=gnu++17 option from algorithm tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/20_util/function_objects/searchers.cc: Remove
-std=gnu++17 from dg-options directive.
* testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct/94540.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct/94831.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/94540.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/sizes.cc:
Likewise.
* testsuite/20_util/unique_ptr/specialized_algorithms/swap_cxx17.cc:
Likewise.
* testsuite/25_algorithms/clamp/1.cc: Likewise.
* testsuite/25_algorithms/clamp/2.cc: Likewise.
* testsuite/25_algorithms/clamp/constexpr.cc: Likewise.
* testsuite/25_algorithms/clamp/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/25_algorithms/clamp/requirements/explicit_instantiation/pod.cc:
Likewise.
* testsuite/25_algorithms/for_each/for_each_n.cc: Likewise.
* testsuite/25_algorithms/for_each/for_each_n_debug.cc:
Likewise.
* testsuite/25_algorithms/sample/1.cc: Likewise.
* testsuite/25_algorithms/sample/2.cc: Likewise.
* testsuite/25_algorithms/sample/3.cc: Likewise.
* testsuite/25_algorithms/sample/81221.cc: Likewise.
* testsuite/25_algorithms/search/searcher.cc: Likewise.
* testsuite/26_numerics/exclusive_scan/1.cc: Likewise.
* testsuite/26_numerics/inclusive_scan/1.cc: Likewise.
* testsuite/26_numerics/reduce/1.cc: Likewise.
* testsuite/26_numerics/reduce/2.cc: Likewise.
* testsuite/26_numerics/transform_exclusive_scan/1.cc: Likewise.
* testsuite/26_numerics/transform_inclusive_scan/1.cc: Likewise.
* testsuite/26_numerics/transform_reduce/1.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/20_util/function_objects/searchers.cc b/libstdc++-v3/testsuite/20_util/function_objects/searchers.cc
index 4f134d91e00..f8899659cbe 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/searchers.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/searchers.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do run { target c++17 } }
 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc
index a4a7725ce3c..cabed8bd461 100644
--- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc
+++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do run { target c++17 } }
 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
index 626f2e1c6ee..e114168842e 100644
--- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/memory_management_tools/destroy_neg.cc
@@ -15,7 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct/94540.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct/94540.cc
index 51fb18949e4..865f29cddbd 100644
--- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct/94540.cc
++

[committed 9/9] libstdc++: Remove redundant -std=gnu++17 options from remaining tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

Tested powerpc64le-linux. Committed to trunk.

commit 0498d2d09a2364aae1e6b5e085c8ebb8fc517684
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:54 2021

libstdc++: Remove redundant -std=gnu++17 option from remaining tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/headers/c++2017/all_attributes.cc: Remove
-std=gnu++17 from dg-options directive.
* testsuite/17_intro/headers/c++2017/all_no_exceptions.cc:
Likewise.
* testsuite/17_intro/headers/c++2017/all_pedantic_errors.cc:
Likewise.
* testsuite/17_intro/headers/c++2017/operator_names.cc:
Likewise.
* testsuite/17_intro/headers/c++2017/parallel_mode.cc: Likewise.
* testsuite/17_intro/headers/c++2017/stdc++.cc: Likewise.
* testsuite/17_intro/headers/c++2017/stdc++_multiple_inclusion.cc:
Likewise.
* testsuite/18_support/aligned_alloc/aligned_alloc.cc: Likewise.
* testsuite/18_support/byte/81076.cc: Likewise.
* testsuite/18_support/byte/global_neg.cc: Likewise.
* testsuite/18_support/byte/ops.cc: Likewise.
* testsuite/18_support/byte/requirements.cc: Likewise.
* testsuite/18_support/headers/cfloat/values_c++17.cc: Likewise.
* testsuite/18_support/launder/1.cc: Likewise.
* testsuite/18_support/launder/nodiscard.cc: Likewise.
* testsuite/18_support/launder/requirements.cc: Likewise.
* testsuite/18_support/launder/requirements_neg.cc: Likewise.
* testsuite/18_support/new_aligned.cc: Likewise.
* testsuite/18_support/uncaught_exceptions/uncaught_exceptions.cc:
Likewise.
* testsuite/19_diagnostics/error_code/is_error_code_v.cc:
Likewise.
* testsuite/19_diagnostics/error_condition/hash.cc: Likewise.
* testsuite/20_util/addressof/requirements/constexpr.cc:
Likewise.
* testsuite/20_util/as_const/1.cc: Likewise.
* testsuite/20_util/as_const/rvalue_neg.cc: Likewise.
* testsuite/20_util/bind/83427.cc: Likewise.
* testsuite/20_util/bind/is_placeholder_v.cc: Likewise.
* testsuite/20_util/bool_constant/requirements.cc: Likewise.
* testsuite/20_util/duration/arithmetic/constexpr_c++17.cc:
Likewise.
* testsuite/20_util/duration/requirements/treat_as_floating_point_v.cc:
Likewise.
* testsuite/20_util/duration_cast/rounding.cc: Likewise.
* testsuite/20_util/enable_shared_from_this/members/weak_from_this.cc:
Likewise.
* testsuite/20_util/from_chars/1_neg.cc: Likewise.
* testsuite/20_util/from_chars/requirements.cc: Likewise.
* testsuite/20_util/function/91456.cc: Likewise.
* testsuite/20_util/function/cons/deduction.cc: Likewise.
* testsuite/20_util/function_objects/83607.cc: Likewise.
* testsuite/20_util/function_objects/invoke/59768.cc: Likewise.
* testsuite/20_util/function_objects/mem_fn/80478.cc: Likewise.
* testsuite/20_util/function_objects/not_fn/1.cc: Likewise.
* testsuite/20_util/function_objects/not_fn/87538.cc: Likewise.
* testsuite/20_util/has_unique_object_representations/requirements/explicit_instantiation.cc:
Likewise.
* testsuite/20_util/has_unique_object_representations/requirements/typedefs.cc:
Likewise.
* testsuite/20_util/has_unique_object_representations/value.cc:
Likewise.
* testsuite/20_util/hash/nullptr.cc: Likewise.
* testsuite/20_util/in_place/requirements.cc: Likewise.
* testsuite/20_util/is_aggregate/incomplete_neg.cc: Likewise.
* testsuite/20_util/is_aggregate/requirements/explicit_instantiation.cc:
Likewise.
* testsuite/20_util/is_aggregate/requirements/typedefs.cc:
Likewise.
* testsuite/20_util/is_aggregate/value.cc: Likewise.
* testsuite/20_util/is_invocable/83395.cc: Likewise.
* testsuite/20_util/is_invocable/91456.cc: Likewise.
* testsuite/20_util/is_invocable/requirements/explicit_instantiation.cc:
Likewise.
* testsuite/20_util/is_invocable/requirements/typedefs.cc:
Likewise.
* testsuite/20_util/is_invocable/value.cc: Likewise.
* testsuite/20_util/is_literal_type/deprecated-1z.cc: Likewise.
  

Re: [committed 7/9] libstdc++: Remove redundant -std=gnu++17 options from containers tests

2021-05-10 Thread Jonathan Wakely via Gcc-patches

On 10/05/21 16:29 +0100, Jonathan Wakely wrote:
> GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
> dg-options directive prevents running these tests with different modes
> such as -std=c++17 or -std=gnu++20.
>
> Tested powerpc64le-linux. Committed to trunk.

With the patch attached this time.


commit 7c85abec763095045ba3f78c6656117dd8f1fd01
Author: Jonathan Wakely 
Date:   Mon May 10 16:22:53 2021

libstdc++: Remove redundant -std=gnu++17 option from containers tests

GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/array/cons/deduction.cc: Remove
-std=gnu++17 from dg-options directive.
* testsuite/23_containers/array/cons/deduction_neg.cc: Likewise.
* testsuite/23_containers/array/element_access/constexpr_c++17.cc:
Likewise.
* testsuite/23_containers/array/requirements/constexpr_iter.cc:
Likewise.
* testsuite/23_containers/array/specialized_algorithms/swap_cxx17.cc:
Likewise.
* testsuite/23_containers/deque/cons/deduction.cc: Likewise.
* testsuite/23_containers/deque/modifiers/emplace/cxx17_return.cc:
Likewise.
* testsuite/23_containers/forward_list/cons/deduction.cc:
Likewise.
* testsuite/23_containers/forward_list/modifiers/emplace_cxx17_return.cc:
Likewise.
* testsuite/23_containers/list/cons/deduction.cc: Likewise.
* testsuite/23_containers/list/modifiers/emplace/cxx17_return.cc:
Likewise.
* testsuite/23_containers/map/cons/deduction.cc: Likewise.
* testsuite/23_containers/map/modifiers/extract.cc: Likewise.
* testsuite/23_containers/map/modifiers/insert/83226.cc:
Likewise.
* testsuite/23_containers/map/modifiers/insert_or_assign/1.cc:
Likewise.
* testsuite/23_containers/map/modifiers/merge.cc: Likewise.
* testsuite/23_containers/map/modifiers/try_emplace/1.cc:
Likewise.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multimap/modifiers/extract.cc:
Likewise.
* testsuite/23_containers/multimap/modifiers/merge.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/modifiers/extract.cc:
Likewise.
* testsuite/23_containers/multiset/modifiers/merge.cc: Likewise.
* testsuite/23_containers/priority_queue/deduction.cc: Likewise.
* testsuite/23_containers/queue/deduction.cc: Likewise.
* testsuite/23_containers/queue/members/emplace_cxx17_return.cc:
Likewise.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/set/modifiers/extract.cc: Likewise.
* testsuite/23_containers/set/modifiers/merge.cc: Likewise.
* testsuite/23_containers/set/modifiers/node_swap.cc: Likewise.
* testsuite/23_containers/stack/deduction.cc: Likewise.
* testsuite/23_containers/stack/members/emplace_cxx17_return.cc:
Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_map/modifiers/extract.cc:
Likewise.
* testsuite/23_containers/unordered_map/modifiers/insert_or_assign.cc:
Likewise.
* testsuite/23_containers/unordered_map/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_map/modifiers/try_emplace.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/modifiers/extract.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/modifiers/extract.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/modifiers/extract.cc:
Likewise.
* testsuite/23_containers/unordered_set/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/vector/bool/emplace_cxx17_return.cc:
Likewise.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Li

Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Eric Botcazou
> Do you know Eric where version.o needs to be added to be included in the
> problematic command line?

You can presumably remove it from GNATLINK_OBJS & GNATMAKE_OBJS.  And it needs 
to be added to GNAT1_C_OBJS instead of GNAT_ADA_OBJS in Make-lang.in.

-- 
Eric Botcazou





[PATCH] ipa: Get rid of IPA_NODE_REF and IPA_EDGE_REF

2021-05-10 Thread Martin Jambor
Hi,

the node and edge summaries defined in ipa-prop.h are probably the
oldest in GCC, and as a result they are the only ones that still use
macros to look them up and create them.  Honza, Martin and I agreed
that this is ugly, that the macros should be removed, and that the
ipa-prop summaries should be accessed like all the other ones, but
somehow I never got to it until now.

The patch is mostly mechanical.  Because the lookup machinery was much
simpler in the old times (something like the fast summaries we have
today), a lot of code queried for the summary multiple times for no
good reason, and I fixed that in places where it was easy.
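
A minimal sketch of the "query once and reuse the result" clean-up follows.
Everything in it is a stand-in invented for illustration: node, node_params,
node_summary and the NODE_REF macro are toy versions, not the real
cgraph_node, ipa_node_params, ipa_node_params_sum or IPA_NODE_REF.  The
lookups counter exists only to make the number of queries visible.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Toy stand-ins for a call graph node and its per-node summary.
struct node { int uid; };
struct node_params { int param_count = 0; bool versionable = false; };

// Toy hash-based summary keyed by node uid.
class node_summary
{
public:
  // Return the summary for N, or nullptr if none has been created.
  node_params *get (const node *n)
  {
    ++lookups;
    auto it = map_.find (n->uid);
    return it == map_.end () ? nullptr : it->second.get ();
  }

  // Return the summary for N, creating an empty one if necessary.
  node_params *get_create (const node *n)
  {
    assert (n != nullptr);
    ++lookups;
    auto &slot = map_[n->uid];
    if (!slot)
      slot = std::make_unique<node_params> ();
    return slot.get ();
  }

  int lookups = 0;  // Number of queries made so far (illustration only).

private:
  std::unordered_map<int, std::unique_ptr<node_params>> map_;
};

// Old style: every use of the lookup macro hides a separate query.
#define NODE_REF(SUM, N) ((SUM).get (N))

bool
old_style (node_summary &sum, const node *n)
{
  return NODE_REF (sum, n)
	 && NODE_REF (sum, n)->versionable
	 && NODE_REF (sum, n)->param_count > 0;
}

// New style: query the summary once and reuse the pointer.
bool
new_style (node_summary &sum, const node *n)
{
  node_params *info = sum.get (n);
  return info && info->versionable && info->param_count > 0;
}
```

Both functions return the same answer; the difference is that the macro
version pays for one hash lookup per use, while the direct version pays for
one lookup in total.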

Also, before we switched to hash based summaries, new summary pointers
had to be obtained whenever the underlying array could be reallocated
because of new cgraph nodes/edges.  This is no longer necessary and so
I removed the instances which I found.
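
The reason re-querying is no longer needed can be sketched with a toy
hash-based summary.  The edge_args and edge_summary types below are
hypothetical and much simpler than the real ipa_edge_args_sum; the only
assumption they share with it is that each summary is a separately
allocated object, so adding more entries does not move existing ones.
With an array of summary objects stored by value, growing the array could
instead relocate every element and leave old pointers dangling.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Toy stand-in for a per-edge summary.
struct edge_args { int arg_count = 0; };

// Toy hash-based summary keyed by edge uid.  Each entry is allocated
// separately, so rehashing the table never moves an edge_args object.
class edge_summary
{
public:
  edge_args *get_create (int edge_uid)
  {
    auto &slot = map_[edge_uid];
    if (!slot)
      slot = std::make_unique<edge_args> ();
    return slot.get ();
  }

  edge_args *get (int edge_uid) const
  {
    auto it = map_.find (edge_uid);
    return it == map_.end () ? nullptr : it->second.get ();
  }

private:
  std::unordered_map<int, std::unique_ptr<edge_args>> map_;
};

// Return true if a pointer obtained before adding EXTRA new edges still
// refers to the same live summary afterwards.
bool
pointer_survives_growth (int extra)
{
  edge_summary sum;
  edge_args *first = sum.get_create (0);
  assert (first != nullptr);
  first->arg_count = 3;

  // Adding many entries is likely to rehash the table several times.
  for (int uid = 1; uid <= extra; ++uid)
    sum.get_create (uid);

  return sum.get (0) == first && first->arg_count == 3;
}
```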

Both kinds of these non-mechanical changes should be specifically called
out in the ChangeLog.

I also removed the IS_VALID_JUMP_FUNC_INDEX macro because it is not used
anywhere.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2021-05-07  Martin Jambor  

* ipa-prop.h (IPA_NODE_REF): Removed.
(IPA_NODE_REF_GET_CREATE): Likewise.
(IPA_EDGE_REF): Likewise.
(IPA_EDGE_REF_GET_CREATE): Likewise.
(IS_VALID_JUMP_FUNC_INDEX): Likewise.
* ipa-cp.c (print_all_lattices): Replaced IPA_NODE_REF with a direct
use of ipa_node_params_sum.
(ipcp_versionable_function_p): Likewise.
(push_node_to_stack): Likewise.
(pop_node_from_stack): Likewise.
(set_single_call_flag): Replaced two IPA_NODE_REF with one single
direct use of ipa_node_params_sum.
(initialize_node_lattices): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(ipa_context_from_jfunc): Replaced IPA_EDGE_REF with a direct use of
ipa_edge_args_sum.
(ipcp_verify_propagated_values): Replaced IPA_NODE_REF with a direct
use of ipa_node_params_sum.
(self_recursively_generated_p): Likewise.
(propagate_scalar_across_jump_function): Likewise.
(propagate_context_across_jump_function): Replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum, moved the lookup after the early
exit.  Replaced IPA_NODE_REF with a direct use of ipa_node_params_sum.
(propagate_bits_across_jump_function): Replaced IPA_NODE_REF with
direct uses of ipa_node_params_sum.
(propagate_vr_across_jump_function): Likewise.
(propagate_aggregate_lattice): Likewise.
(propagate_aggs_across_jump_function): Likewise.
(propagate_constants_across_call): Likewise, also replaced
IPA_EDGE_REF with a direct use of ipa_edge_args_sum.
(good_cloning_opportunity_p): Replaced IPA_NODE_REF with a direct use
of ipa_node_params_sum.
(estimate_local_effects): Likewise.
(add_all_node_vals_to_toposort): Likewise.
(propagate_constants_topo): Likewise.
(ipcp_propagate_stage): Likewise.
(ipcp_discover_new_direct_edges): Likewise.
(calls_same_node_or_its_all_contexts_clone_p): Likewise.
(cgraph_edge_brings_value_p): Likewise (in both overloaded functions).
(get_info_about_necessary_edges): Likewise.
(want_remove_some_param_p): Likewise.
(create_specialized_node): Likewise.
(self_recursive_pass_through_p): Likewise.
(self_recursive_agg_pass_through_p): Likewise.
(find_more_scalar_values_for_callers_subset): Likewise and also
replaced IPA_EDGE_REF with direct uses of ipa_edge_args_sum, in one
case replacing two of those with a single query.
(find_more_contexts_for_caller_subset): Likewise for the
ipa_polymorphic_call_context overload.
(intersect_aggregates_with_edge): Replaced IPA_EDGE_REF with a direct
use of ipa_edge_args_sum.  Replaced IPA_NODE_REF with direct uses of
ipa_node_params_sum.
(find_aggregate_values_for_callers_subset): Likewise, also reusing
results of ipa_edge_args_sum->get.
(cgraph_edge_brings_all_scalars_for_node): Replaced IPA_NODE_REF with
direct uses of ipa_node_params_sum, replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum.
(cgraph_edge_brings_all_agg_vals_for_node): Likewise, moved node
summary query after the early exit and reused the result later.
(decide_about_value): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(decide_whether_version_node): Likewise.  Removed re-querying for
summaries after cloning.
(spread_undeadness): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(has_undead_caller_from_outside_scc_p): Likewise, reusing results of
some queries.
(identify_dead_nodes): Likewise.
(ipcp_store_bi

Re: [PATCH] ipa: Get rid of IPA_NODE_REF and IPA_EDGE_REF

2021-05-10 Thread Jan Hubicka
> Hi,
> 
> the node and edge summaries defined in ipa-prop.h are probably the
> oldest in GCC and so it happened that they are the only ones using
> macros to look them up and create them.  With Honza and Martin we
> agreed it is ugly and the macros should be removed and the ipa-prop
> summaries should be accessed like all the other ones but somehow I
> never got to it until now.
> 
> The patch is mostly mechanical.  Because the lookup machinery was much
> simpler in the old times (something like the fast summaries we have
> today), a lot of code queried for the summary multiple times for no
> good reasons and I fixed that in places where it was easy.
> 
> Also, before we switched to hash based summaries, new summary pointers
> had to be obtained whenever the underlying array could be reallocated
> because of new cgraph nodes/edges.  This is no longer necessary and so
> I removed the instances which I found.
> 
> Both kinds of these non-mechanical changes should be specifically called
> out in the ChangeLog.
> 
> I also removed the IS_VALID_JUMP_FUNC_INDEX macro because it is not used
> anywhere.
> 
> Bootstrapped and tested on x86_64-linux.  OK for trunk?
OK,
thanks!
honza
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2021-05-07  Martin Jambor  
> 
>   * ipa-prop.h (IPA_NODE_REF): Removed.
>   (IPA_NODE_REF_GET_CREATE): Likewise.
>   (IPA_EDGE_REF): Likewise.
>   (IPA_EDGE_REF_GET_CREATE): Likewise.
>   (IS_VALID_JUMP_FUNC_INDEX): Likewise.
>   * ipa-cp.c (print_all_lattices): Replaced IPA_NODE_REF with a direct
>   use of ipa_node_params_sum.
>   (ipcp_versionable_function_p): Likewise.
>   (push_node_to_stack): Likewise.
>   (pop_node_from_stack): Likewise.
>   (set_single_call_flag): Replaced two IPA_NODE_REF with one single
>   direct use of ipa_node_params_sum.
>   (initialize_node_lattices): Replaced IPA_NODE_REF with a direct use of
>   ipa_node_params_sum.
>   (ipa_context_from_jfunc): Replaced IPA_EDGE_REF with a direct use of
>   ipa_edge_args_sum.
>   (ipcp_verify_propagated_values): Replaced IPA_NODE_REF with a direct
>   use of ipa_node_params_sum.
>   (self_recursively_generated_p): Likewise.
>   (propagate_scalar_across_jump_function): Likewise.
>   (propagate_context_across_jump_function): Replaced IPA_EDGE_REF with a
>   direct use of ipa_edge_args_sum, moved the lookup after the early
>   exit.  Replaced IPA_NODE_REF with a direct use of ipa_node_params_sum.
>   (propagate_bits_across_jump_function): Replaced IPA_NODE_REF with
>   direct uses of ipa_node_params_sum.
>   (propagate_vr_across_jump_function): Likewise.
>   (propagate_aggregate_lattice): Likewise.
>   (propagate_aggs_across_jump_function): Likewise.
>   (propagate_constants_across_call): Likewise, also replaced
>   IPA_EDGE_REF with a direct use of ipa_edge_args_sum.
>   (good_cloning_opportunity_p): Replaced IPA_NODE_REF with a direct use
>   of ipa_node_params_sum.
>   (estimate_local_effects): Likewise.
>   (add_all_node_vals_to_toposort): Likewise.
>   (propagate_constants_topo): Likewise.
>   (ipcp_propagate_stage): Likewise.
>   (ipcp_discover_new_direct_edges): Likewise.
>   (calls_same_node_or_its_all_contexts_clone_p): Likewise.
>   (cgraph_edge_brings_value_p): Likewise (in both overloaded functions).
>   (get_info_about_necessary_edges): Likewise.
>   (want_remove_some_param_p): Likewise.
>   (create_specialized_node): Likewise.
>   (self_recursive_pass_through_p): Likewise.
>   (self_recursive_agg_pass_through_p): Likewise.
>   (find_more_scalar_values_for_callers_subset): Likewise and also
>   replaced IPA_EDGE_REF with direct uses of ipa_edge_args_sum, in one
>   case replacing two of those with a single query.
>   (find_more_contexts_for_caller_subset): Likewise for the
>   ipa_polymorphic_call_context overload.
>   (intersect_aggregates_with_edge): Replaced IPA_EDGE_REF with a direct
>   use of ipa_edge_args_sum.  Replaced IPA_NODE_REF with direct uses of
>   ipa_node_params_sum.
>   (find_aggregate_values_for_callers_subset): Likewise, also reusing
>   results of ipa_edge_args_sum->get.
>   (cgraph_edge_brings_all_scalars_for_node): Replaced IPA_NODE_REF with
>   direct uses of ipa_node_params_sum, replaced IPA_EDGE_REF with a
>   direct use of ipa_edge_args_sum.
>   (cgraph_edge_brings_all_agg_vals_for_node): Likewise, moved node
>   summary query after the early exit and reused the result later.
>   (decide_about_value): Replaced IPA_NODE_REF with a direct use of
>   ipa_node_params_sum.
>   (decide_whether_version_node): Likewise.  Removed re-querying for
>   summaries after cloning.
>   (spread_undeadness): Replaced IPA_NODE_REF with a direct use of
>   ipa_node_params_sum.
>   (has_undead_caller_from_outside_scc_p): Likewise, reusing result

Re: [PATCH 02/12] Allow generating pseudo register with specific alignment

2021-05-10 Thread Richard Sandiford via Gcc-patches
"H.J. Lu"  writes:
> On Mon, May 10, 2021 at 6:59 AM Richard Biener
>  wrote:
>>
>> On Mon, May 10, 2021 at 3:29 PM H.J. Lu  wrote:
>> >
>> > On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
>> >  wrote:
>> > >
>> > > Richard Biener via Gcc-patches  writes:
>> > > > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
>> > > >  wrote:
>> > > >>
>> > > >> "H.J. Lu via Gcc-patches"  writes:
>> > > >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu  wrote:
>> > > >> >>
>> > > >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
>> > > >> >>  wrote:
>> > > >> >> >
>> > > >> >> > "H.J. Lu via Gcc-patches"  writes:
>> > > >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
>> > > >> >> > >  wrote:
>> > > >> >> > >>
>> > > >> >> > >> "H.J. Lu via Gcc-patches"  writes:
>> > > >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo 
>> > > >> >> > >> > registers so that
>> > > >> >> > >> > associated hard registers can be properly spilled onto 
>> > > >> >> > >> > stack.  But there
>> > > >> >> > >> > are cases where associated hard registers will never be 
>> > > >> >> > >> > spilled onto
>> > > >> >> > >> > stack.  gen_reg_rtx is changed to take an argument for 
>> > > >> >> > >> > register alignment
>> > > >> >> > >> > so that stack realignment can be avoided when not needed.
>> > > >> >> > >>
>> > > >> >> > >> How is it guaranteed that they will never be spilled though?
>> > > >> >> > >> I don't think that that guarantee exists for any kind of 
>> > > >> >> > >> pseudo,
>> > > >> >> > >> except perhaps for the temporary pseudos that the RA creates 
>> > > >> >> > >> to
>> > > >> >> > >> replace (match_scratch …)es.
>> > > >> >> > >>
>> > > >> >> > >
>> > > >> >> > > The caller of creating pseudo registers with specific 
>> > > >> >> > > alignment must
>> > > >> >> > > guarantee that they will never be spilled.   I am only using 
>> > > >> >> > > it in
>> > > >> >> > >
>> > > >> >> > >   /* Make operand1 a register if it isn't already.  */
>> > > >> >> > >   if (can_create_pseudo_p ()
>> > > >> >> > >   && !register_operand (op0, mode)
>> > > >> >> > >   && !register_operand (op1, mode))
>> > > >> >> > > {
>> > > >> >> > >   /* NB: Don't increase stack alignment requirement when 
>> > > >> >> > > forcing
>> > > >> >> > >  operand1 into a pseudo register to copy data from one 
>> > > >> >> > > memory
>> > > >> >> > >  location to another since it doesn't require a spill. 
>> > > >> >> > >  */
>> > > >> >> > >   emit_move_insn (op0,
>> > > >> >> > >   force_reg (GET_MODE (op0), op1,
>> > > >> >> > >  (UNITS_PER_WORD * 
>> > > >> >> > > BITS_PER_UNIT)));
>> > > >> >> > >   return;
>> > > >> >> > > }
>> > > >> >> > >
>> > > >> >> > > for vector moves.  RA shouldn't spill it.
>> > > >> >> >
>> > > >> >> > But this is the point: it's a case of hoping that the RA won't 
>> > > >> >> > spill it,
>> > > >> >> > rather than having a guarantee that it won't.
>> > > >> >> >
>> > > >> >> > Even if the moves start out adjacent, they could be separated by 
>> > > >> >> > later
>> > > >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA 
>> > > >> >> > scheduling
>> > > >> >> > isn't enabled by default for x86, but it can still be enabled 
>> > > >> >> > explicitly.)
>> > > >> >> > Or if the same data is being copied to two locations, we might 
>> > > >> >> > reuse
>> > > >> >> > values loaded by the first copy for the second copy as well.
>> > > >> >
>> > > >> > There are cases where pseudo vector registers are created as pure
>> > > >> > temporary registers in the backend and they shouldn't ever be 
>> > > >> > spilled
>> > > >> > to stack.   They will be spilled to stack only if there are other 
>> > > >> > non-temporary
>> > > >> > vector register usage in which case stack will be properly 
>> > > >> > re-aligned.
>> > > >> > Caller of creating pseudo registers with specific alignment 
>> > > >> > guarantees
>> > > >> > that they are used only as pure temporary registers.
>> > > >>
>> > > >> I don't think there's really a distinct category of pure temporary
>> > > >> registers though.  The things I mentioned above can happen for any
>> > > >> kind of pseudo register.
>> > > >
>> > > > I wonder if for the cases HJ thinks of it is appropriate to use 
>> > > > hardregs?
>> > > > Do we generally handle those well?  That is, are they again subject
>> > > > to be allocated by RA when no longer live?
>> > >
>> > > Yeah, using hard registers should work.  Of course, any given fixed 
>> > > choice
>> > > of hard register has the potential to be suboptimal in some situation,
>> > > but it should be safe.
>> >
>> > I tried hard registers.  The generated code isn't as good as pseudo 
>> > registers.
>> > But I want to avoid aligning the stack when YMM registers are only used to
>> > inline memcpy/memset.  Any suggestions?
>>
>> I wonder if we can mark pseudos with a new reg flag, like 'nospill' and

Re: [PATCH]AArch64: Have -mcpu=native and -march=native enable extensions when CPU is unknown

2021-05-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> Currently, when using -mcpu=native or -march=native on a CPU that is unknown to
> the compiler, the compiler just uses -march=armv8-a and enables none
> of the extensions.
>
> To make this a bit more useful, this patch changes it to still use -march=armv8-a
> but to enable the extensions.  We still cannot do tuning, but at least if using
> this on a future SVE core the compiler will at the very least enable SVE etc.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

OK, thanks.

We'll have to collectively remember that this means we shouldn't try
to enforce minimum architecture versions for features in future.
E.g. -march=armv8-a+sve should remain valid.

Richard
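
A minimal sketch of the fallback described in the patch, for readers skimming the thread.  This is not the code in driver-aarch64.c: the table, the struct and pick_option are hypothetical stand-ins, the extension string is passed in rather than parsed from /proc/cpuinfo, and a known core is reduced to a bare -mcpu= option.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the driver's core table.  */
struct core_entry
{
  const char *name;
  unsigned implementer;
  unsigned part;
};

static const struct core_entry known_cores[] = {
  { "cortex-a53", 0x41, 0xd03 },
  { "cortex-a72", 0x41, 0xd08 },
  { NULL, 0, 0 }
};

/* Architecture used when the core is not in the table (an assumption
   mirroring the DEFAULT_ARCH idea in the patch).  */
#define FALLBACK_ARCH "armv8-a"

/* Write "-mcpu=<name>" for a known core.  For an unknown core, write
   "-march=<FALLBACK_ARCH><detected_exts>" so that detected extensions
   are still enabled even though no tuning is possible.  */
static void
pick_option (unsigned implementer, unsigned part, const char *detected_exts,
	     char *out, size_t out_len)
{
  assert (out != NULL && out_len > 0 && detected_exts != NULL);
  for (const struct core_entry *c = known_cores; c->name != NULL; c++)
    if (c->implementer == implementer && c->part == part)
      {
	snprintf (out, out_len, "-mcpu=%s", c->name);
	return;
      }
  snprintf (out, out_len, "-march=%s%s", FALLBACK_ARCH, detected_exts);
}
```

With an implementer of 0xff, as in the info_16 test input, no table entry matches, so the result is the fallback architecture plus whatever extensions were detected.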

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/driver-aarch64.c (DEFAULT_ARCH): New.
>   (host_detect_local_cpu): Use it.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/cpunative/info_16: New test.
>   * gcc.target/aarch64/cpunative/info_17: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_16.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_17.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/driver-aarch64.c 
> b/gcc/config/aarch64/driver-aarch64.c
> index 
> e2935a1156412c898ea086feb0d698ec92107652..b58591d497461cae6e8014fa39afd9dd26ae67bf
>  100644
> --- a/gcc/config/aarch64/driver-aarch64.c
> +++ b/gcc/config/aarch64/driver-aarch64.c
> @@ -58,6 +58,8 @@ struct aarch64_core_data
>  #define INVALID_IMP ((unsigned char) -1)
>  #define INVALID_CORE ((unsigned)-1)
>  #define ALL_VARIANTS ((unsigned)-1)
> +/* Default architecture to use if -mcpu=native did not detect a known CPU.  
> */
> +#define DEFAULT_ARCH "8A"
>  
>  #define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
> PART, VARIANT) \
>{ CORE_NAME, #ARCH, IMP, PART, VARIANT, FLAGS },
> @@ -390,10 +392,18 @@ host_detect_local_cpu (int argc, const char **argv)
>  && (aarch64_cpu_data[i].variant == ALL_VARIANTS
>  || variants[0] == aarch64_cpu_data[i].variant))
> break;
> +
>if (aarch64_cpu_data[i].name == NULL)
> -goto not_found;
> + {
> +   aarch64_arch_driver_info* arch_info
> + = get_arch_from_id (DEFAULT_ARCH);
> +
> +   gcc_assert (arch_info);
>  
> -  if (arch)
> +   res = concat ("-march=", arch_info->name, NULL);
> +   default_flags = arch_info->flags;
> + }
> +  else if (arch)
>   {
> const char *arch_id = aarch64_cpu_data[i].arch;
> aarch64_arch_driver_info* arch_info = get_arch_from_id (arch_id);
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 
> b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
> new file mode 100644
> index 
> ..b0679579d9167d46c832e55cb63d9077f7a80f70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
> @@ -0,0 +1,8 @@
> +processor: 0
> +BogoMIPS : 100.00
> +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
> +CPU implementer  : 0xff
> +CPU architecture: 8
> +CPU variant  : 0x0
> +CPU part : 0xd08
> +CPU revision : 2
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 
> b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
> new file mode 100644
> index 
> ..b0679579d9167d46c832e55cb63d9077f7a80f70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
> @@ -0,0 +1,8 @@
> +processor: 0
> +BogoMIPS : 100.00
> +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
> +CPU implementer  : 0xff
> +CPU architecture: 8
> +CPU variant  : 0x0
> +CPU part : 0xd08
> +CPU revision : 2
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c 
> b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> new file mode 100644
> index 
> ..a424e7c56c782ca6e6917248e2fa7a18eb94e06a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
> +/* { dg-set-compiler-env-var GCC_CPUINFO 
> "$srcdir/gcc.target/aarch64/cpunative/info_16" } */
> +/* { dg-additional-options "-mcpu=native" } */
> +
> +int main()
> +{
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+crc\+dotprod\+sve2} 
> } } */
> +
> +/* Test a normal looking procinfo.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c 
> b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> new file mode 100644
> index 
> ..8104761be927275207318a834f03041b627856b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { { aarch64*-*-linu

Re: [PATCH] arm: remove error in CPP_SPEC when float-abi soft and hard are used together

2021-05-10 Thread Richard Earnshaw via Gcc-patches




On 22/04/2021 08:01, Christophe Lyon via Gcc-patches wrote:

arm.h has had this error message since 1997, and was never updated to
take softfp into account. Anyway, it seems it was useful long ago, but
it is no longer needed since option parsing has been improved:
-mfloat-abi is handled via arm.opt and updates the var_float_abi
variable. So, the last instance of -mfloat-abi= on the command line
wins.



Yeah, at the time it was added, the specs lines were used to directly
add preprocessor values, and it was impossible to deal with a
command line where both variants were specified in terms of adding (or
not adding) the correct defines.  As you say, things have improved
significantly in that area and we can now eliminate this error.
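
To illustrate the "last instance wins" behaviour mentioned above, here is a small sketch.  It is only an illustration of the rule, not how arm.opt or the option machinery is implemented; last_float_abi is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the value of the last "-mfloat-abi=" option in ARGV, or
   FALLBACK if the option never appears.  */
static const char *
last_float_abi (int argc, const char **argv, const char *fallback)
{
  static const char prefix[] = "-mfloat-abi=";
  const size_t prefix_len = sizeof prefix - 1;
  const char *result = fallback;

  assert (argv != NULL || argc == 0);
  for (int i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, prefix_len) == 0)
      /* A later option silently overrides any earlier one.  */
      result = argv[i] + prefix_len;
  return result;
}
```

Given "-mfloat-abi=soft ... -mfloat-abi=hard" the helper returns "hard", which is why a spec-level error for the combination is no longer needed.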


I may have missed it, but do you have a similar patch for eliminating 
the big/little endian error?  That was added for the same basic reason, 
but is now redundant as well.



This patch just removes this error message, thus enabling many more
tests to pass on arm-eabi:

* with -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd (2 more passes)
gcc.target/arm/pr52375.c
g++.target/arm/pr99593.C (test for excess errors)

* with -mthumb/-mfloat-abi=soft/-march=armv6s-m (115 more passes in C, 90 more 
in C++)
gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors)
gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors)
gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors)
g++.target/arm/no_unique_address_1.C
g++.target/arm/no_unique_address_2.C

* with -mthumb/-mfloat-abi=soft/-march=armv7-m (153 more passes in C, 90 more 
in C++)
gcc.dg/pr59418.c (test for excess errors)
gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors)
gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors)
gcc.target/arm/bfloat16_scalar_2_1.c (test for excess errors)
gcc.target/arm/bfloat16_scalar_3_1.c (test for excess errors)
gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors)
gcc.target/arm/pr52375.c (test for excess errors)
gcc.target/arm/simd/vld1_bf16_1.c (test for excess errors)
gcc.target/arm/simd/vldn_lane_bf16_1.c (test for excess errors)
gcc.target/arm/simd/vst1_bf16_1.c (test for excess errors)
gcc.target/arm/simd/vstn_lane_bf16_1.c (test for excess errors)
g++.target/arm/no_unique_address_1.C
g++.target/arm/no_unique_address_2.C

* with -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp (65 more passes)
gcc.target/arm/atomic-comp-swap-release-acquire-3.c (test for excess errors)
gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-not dmb
gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times ldaex 4
gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times stlex 4
gcc.target/arm/atomic-op-acq_rel-3.c (test for excess errors)
gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times ldaex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times stlex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-acquire-3.c (test for excess errors)
gcc.target/arm/atomic-op-acquire-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times ldaex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-char-3.c (test for excess errors)
gcc.target/arm/atomic-op-char-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-char-3.c scan-assembler-times ldrexb\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-char-3.c scan-assembler-times strexb\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-consume-3.c (test for excess errors)
gcc.target/arm/atomic-op-consume-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-consume-3.c scan-assembler-times ldaex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-consume-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-int-3.c (test for excess errors)
gcc.target/arm/atomic-op-int-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-int-3.c scan-assembler-times ldrex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-int-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-relaxed-3.c (test for excess errors)

Re: [PATCH] arm: remove error in CPP_SPEC when float-abi soft and hard are used together

2021-05-10 Thread Christophe Lyon via Gcc-patches
On Mon, 10 May 2021 at 18:32, Richard Earnshaw
 wrote:
>
>
>
> On 22/04/2021 08:01, Christophe Lyon via Gcc-patches wrote:
> > arm.h has had this error message since 1997, and was never updated to
> > take softfp into account. Anyway, it seems it was useful long ago, but
> > it is no longer needed since option parsing has been improved:
> > -mfloat-abi is handled via arm.opt and updates the var_float_abi
> > variable. So, the last instance of -mfloat-abi= on the command line
> > wins.
> >
>
> Yeah, at the time it was added, the specs lines were used to directly
> add preprocessor values, and it was impossible to deal with a
> command line where both variants were specified in terms of adding (or
> not adding) the correct defines.  As you say, things have improved
> significantly in that area and we can now eliminate this error.
>
> I may have missed it, but do you have a similar patch for eliminating
> the big/little endian error?  That was added for the same basic reason,
> but is now redundant as well.

No, because I didn't see any undesirable error related to that in the
validation logs, but sure, I can prepare one.

>
> > This patch just removes this error message, thus enabling many more
> > tests to pass on arm-eabi:
> >
> > * with -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd (2 more passes)
> > gcc.target/arm/pr52375.c
> > g++.target/arm/pr99593.C (test for excess errors)
> >
> > * with -mthumb/-mfloat-abi=soft/-march=armv6s-m (115 more passes in C, 90 
> > more in C++)
> > gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors)
> > gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors)
> > gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors)
> > gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors)
> > g++.target/arm/no_unique_address_1.C
> > g++.target/arm/no_unique_address_2.C
> >
> > * with -mthumb/-mfloat-abi=soft/-march=armv7-m (153 more passes in C, 90 
> > more in C++)
> > gcc.dg/pr59418.c (test for excess errors)
> > gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors)
> > gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors)
> > gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors)
> > gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors)
> > gcc.target/arm/bfloat16_scalar_2_1.c (test for excess errors)
> > gcc.target/arm/bfloat16_scalar_3_1.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors)
> > gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors)
> > gcc.target/arm/pr52375.c (test for excess errors)
> > gcc.target/arm/simd/vld1_bf16_1.c (test for excess errors)
> > gcc.target/arm/simd/vldn_lane_bf16_1.c (test for excess errors)
> > gcc.target/arm/simd/vst1_bf16_1.c (test for excess errors)
> > gcc.target/arm/simd/vstn_lane_bf16_1.c (test for excess errors)
> > g++.target/arm/no_unique_address_1.C
> > g++.target/arm/no_unique_address_2.C
> >
> > * with -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp (65 more passes)
> > gcc.target/arm/atomic-comp-swap-release-acquire-3.c (test for excess errors)
> > gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-not dmb
> > gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times 
> > ldaex 4
> > gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times 
> > stlex 4
> > gcc.target/arm/atomic-op-acq_rel-3.c (test for excess errors)
> > gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-not dmb
> > gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times ldaex\tr[0-9]+, 
> > \\[r[0-9]+\\] 6
> > gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times stlex\t...?, 
> > r[0-9]+, \\[r[0-9]+\\] 6
> > gcc.target/arm/atomic-op-acquire-3.c (test for excess errors)
> > gcc.target/arm/atomic-op-acquire-3.c scan-assembler-not dmb
> > gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times ldaex\tr[0-9]+, 
> > \\[r[0-9]+\\] 6
> > gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times strex\t...?, 
> > r[0-9]+, \\[r[0-9]+\\] 6
> > gcc.target/arm/atomic-op-char-3.c (test for excess errors)
> > gcc.target/arm/atomic-op-char-3.c scan-assembler-not dmb
> > gcc.target/arm/atomic-op-char-3.c scan-assembler-times ldrexb\tr[0-9]+, 
> > \\[r[0-9]+\\] 6
> > gcc.target/arm/atomic-op-char-3.c scan-assembler-times strexb\t...?, 
> > r[0-9]+, \\[r[0-9]+\\] 6
> > gcc.target/arm/atomic-op-consume-3.c (test for excess errors)
> > gcc.target/arm/atomic-op-consume-3.c scan-assembler-not dmb
> > gcc.target/arm/atomic-op-consume-3.c scan-assembler-times ldae

Re: [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.

2021-05-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 4edee99051c4e2112b546becca47da32aae21df2..c9fb8e702732dd311fb10de17126432e2a63a32b
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -648,6 +648,22 @@ (define_expand "dot_prod"
>DONE;
>  })
>  
> +;; Auto-vectorizer pattern for usdot
> +(define_expand "usdot_prod"
> +  [(set (match_operand:VS 0 "register_operand")
> + (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
> + (match_operand: 2 "register_operand")]
> +  UNSPEC_USDOT)
> + (match_operand:VS 3 "register_operand")))]
> +  "TARGET_I8MM"
> +{
> +  emit_insn (
> +gen_aarch64_usdot (operands[3], operands[3], operands[1],
> +operands[2]));
> +  emit_move_insn (operands[0], operands[3]);
> +  DONE;
> +})

We can't modify operands[3] here; it's an input rather than an output.

It looks like this would work with just the {…} removed though.
The pattern will match aarch64_usdot on its own accord.

Even better would be to rename __builtin_aarch64_usdot… to
__builtin_usdot_prod…, change its arguments so that they line up
with the optabs, and change arm_neon.h to match.

> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
> new file mode 100644
> index 
> ..b99a945903c043c7410becaf6f09496dd038410d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> +
> +#define N 480
> +#define SIGNEDNESS_1 unsigned
> +#define SIGNEDNESS_2 signed
> +#define SIGNEDNESS_3 signed
> +#define SIGNEDNESS_4 unsigned
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> +   SIGNEDNESS_4 char *restrict b)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +{
> +  int av = a[i];
> +  int bv = b[i];
> +  SIGNEDNESS_2 short mult = av * bv;
> +  res += mult;
> +}
> +  return res;
> +}
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> +   SIGNEDNESS_4 char *restrict a)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +{
> +  int av = a[i];
> +  int bv = b[i];
> +  SIGNEDNESS_2 short mult = av * bv;
> +  res += mult;
> +}
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
> new file mode 100644
> index 
> ..094dd51cea62e0ba05ec3505657bf05320e5fdbb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+i8mm+sve" } */
> +
> +#define N 480
> +#define SIGNEDNESS_1 unsigned
> +#define SIGNEDNESS_2 signed
> +#define SIGNEDNESS_3 signed
> +#define SIGNEDNESS_4 unsigned
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> +   SIGNEDNESS_4 char *restrict b)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +{
> +  int av = a[i];
> +  int bv = b[i];
> +  SIGNEDNESS_2 short mult = av * bv;
> +  res += mult;
> +}
> +  return res;
> +}
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> +   SIGNEDNESS_4 char *restrict a)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +{
> +  int av = a[i];
> +  int bv = b[i];
> +  SIGNEDNESS_2 short mult = av * bv;
> +  res += mult;
> +}
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tusdot\t} 2 } } */

Guess this is personal preference, but I don't think the SIGNEDNESS_*
macros add anything when used like this.  I remember doing something
similar in the past when including .c files from other .c files(!)
in order to avoid cut-&-paste, but there doesn't seem much benefit
for standalone files like these.

Thanks,
Richard
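
For reference, a sketch of what function f from the test could look like with the SIGNEDNESS_* macros expanded by hand.  The dg- directives are omitted, the name usdot_ref is made up for this sketch, and the loop counter is a plain int rather than __INTPTR_TYPE__; it assumes a 16-bit short.

```c
#include <assert.h>

#define N 480

/* Mixed-sign dot product: signed elements of A times unsigned elements
   of B, accumulated into an unsigned result, as in f from the test.  */
unsigned int
usdot_ref (unsigned int res, const signed char *restrict a,
	   const unsigned char *restrict b)
{
  assert (a != 0 && b != 0);
  for (int i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      /* |av * bv| is at most 128 * 255, so it fits in 16 bits.  */
      short mult = av * bv;
      res += mult;
    }
  return res;
}
```

The second function, g, would differ only in which pointer is the signed one.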


RE: [PATCH]AArch64: Have -mcpu=native and -march=native enable extensions when CPU is unknown

2021-05-10 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, May 10, 2021 5:31 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64: Have -mcpu=native and -march=native enable
> extensions when CPU is unknown
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > Currently, when using -mcpu=native or -march=native on a CPU that is
> > unknown to the compiler, the compiler just uses
> > -march=armv8-a and enables none of the extensions.
> >
> > To make this a bit more useful, this patch changes it to still use
> > -march=armv8-a but to enable the extensions.  We still cannot do
> > tuning, but at least if using this on a future SVE core the compiler will
> > at the very least enable SVE etc.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> OK, thanks.
> 
> We'll have to collectively remember that this means we shouldn't try to
> enforce minimum architecture versions for features in future.
> E.g. -march=armv8-a+sve should remain valid.

The Attached testcases would fail if we do forget! 😊

Regards,
Tamar


> Richard
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/driver-aarch64.c (DEFAULT_ARCH): New.
> > (host_detect_local_cpu): Use it.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/cpunative/info_16: New test.
> > * gcc.target/aarch64/cpunative/info_17: New test.
> > * gcc.target/aarch64/cpunative/native_cpu_16.c: New test.
> > * gcc.target/aarch64/cpunative/native_cpu_17.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/driver-aarch64.c
> > b/gcc/config/aarch64/driver-aarch64.c
> > index
> >
> e2935a1156412c898ea086feb0d698ec92107652..b58591d497461cae6e8014fa3
> 9af
> > d9dd26ae67bf 100644
> > --- a/gcc/config/aarch64/driver-aarch64.c
> > +++ b/gcc/config/aarch64/driver-aarch64.c
> > @@ -58,6 +58,8 @@ struct aarch64_core_data  #define INVALID_IMP
> > ((unsigned char) -1)  #define INVALID_CORE ((unsigned)-1)  #define
> > ALL_VARIANTS ((unsigned)-1)
> > +/* Default architecture to use if -mcpu=native did not detect a known
> > +CPU.  */ #define DEFAULT_ARCH "8A"
> >
> >  #define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS,
> COSTS, IMP, PART, VARIANT) \
> >{ CORE_NAME, #ARCH, IMP, PART, VARIANT, FLAGS }, @@ -390,10
> +392,18
> > @@ host_detect_local_cpu (int argc, const char **argv)
> >  && (aarch64_cpu_data[i].variant == ALL_VARIANTS
> >  || variants[0] == aarch64_cpu_data[i].variant))
> >   break;
> > +
> >if (aarch64_cpu_data[i].name == NULL)
> > -goto not_found;
> > +   {
> > + aarch64_arch_driver_info* arch_info
> > +   = get_arch_from_id (DEFAULT_ARCH);
> > +
> > + gcc_assert (arch_info);
> >
> > -  if (arch)
> > + res = concat ("-march=", arch_info->name, NULL);
> > + default_flags = arch_info->flags;
> > +   }
> > +  else if (arch)
> > {
> >   const char *arch_id = aarch64_cpu_data[i].arch;
> >   aarch64_arch_driver_info* arch_info = get_arch_from_id (arch_id);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
> > b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
> > new file mode 100644
> > index
> >
> ..b0679579d9167d46c832e55cb
> 63d
> > 9077f7a80f70
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
> > @@ -0,0 +1,8 @@
> > +processor  : 0
> > +BogoMIPS   : 100.00
> > +Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve
> sve2
> > +CPU implementer: 0xff
> > +CPU architecture: 8
> > +CPU variant: 0x0
> > +CPU part   : 0xd08
> > +CPU revision   : 2
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
> > b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
> > new file mode 100644
> > index
> >
> ..b0679579d9167d46c832e55cb
> 63d
> > 9077f7a80f70
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
> > @@ -0,0 +1,8 @@
> > +processor  : 0
> > +BogoMIPS   : 100.00
> > +Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve
> sve2
> > +CPU implementer: 0xff
> > +CPU architecture: 8
> > +CPU variant: 0x0
> > +CPU part   : 0xd08
> > +CPU revision   : 2
> > diff --git
> > a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> > b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> > new file mode 100644
> > index
> >
> ..a424e7c56c782ca6e6917248e2
> fa
> > 7a18eb94e06a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
> > +/* { dg-set-compiler-env-var GCC_CPUINFO
> > +"$srcdir/gcc.target/aarch64/cpunative/info_16" } */
> > +/* { dg

Re: [PATCH] libstdc++: Fix wrong thread waking on notify [PR100334]

2021-05-10 Thread Jonathan Wakely via Gcc-patches

On 03/05/21 09:43 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

This should also be backported to gcc-11


The additional _M_laundered data member changes the object layout.
That isn't safe for the branch. Would it be possible to smuggle that
flag in the least significant bit of the _M_addr member, which is
always aligned to more than one byte? Just on the gcc-11 branch, not
for trunk.
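
A minimal sketch of the low-bit trick suggested above, written in plain C for brevity.  The helper names are hypothetical and nothing here is libstdc++ code; it only assumes the stored address is aligned to at least two bytes, so bit 0 is free to carry the flag without changing the object layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack FLAG into the least significant bit of ADDR.  */
static inline uintptr_t
pack_addr_flag (const void *addr, bool flag)
{
  uintptr_t bits = (uintptr_t) addr;
  assert ((bits & 1u) == 0);	/* Requires alignment of at least 2.  */
  return bits | (uintptr_t) flag;
}

/* Recover the original address by masking off the flag bit.  */
static inline const void *
unpack_addr (uintptr_t packed)
{
  return (const void *) (packed & ~(uintptr_t) 1);
}

/* Recover the flag stored in bit 0.  */
static inline bool
unpack_flag (uintptr_t packed)
{
  return (packed & 1u) != 0;
}
```

Every reader of the member then has to mask before using the address, which is the price of keeping the layout unchanged on the branch.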



libstdc++/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until observe value change.
(__waiter_base::_M_laundered): New member.
(__watier_base::_M_notify): Check _M_laundered to determine
whether to wake one or all.
(__detail::__atomic_compare): Do not implicitly convert
result of __buildtin_memcpmp to bool,


Typos, and the description doesn't seem accurate (it wasn't implicitly
converting it to bool, there was always an explicit comparison, but
now it's == rather than !=).


(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.



OK for trunk with a fixed changelog, but we need a different patch for
the branch.




Re: [PATCH] OpenMP: Add support for 'close' in map clause

2021-05-10 Thread Jakub Jelinek via Gcc-patches
On Mon, May 10, 2021 at 04:11:39PM +0200, Marcel Vollweiler wrote:
> @@ -15660,37 +15665,54 @@ c_parser_omp_clause_map (c_parser *parser, tree 
> list)
>if (!parens.require_open (parser))
>  return list;
>  
> -  if (c_parser_next_token_is (parser, CPP_NAME))
> +  int always = 0;
> +  int close = 0;
> +  int pos = 1;
> +  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)

Nice, totally missed that Joseph has added this.

>  {
> -  c_token *tok = c_parser_peek_token (parser);
> +  c_token *tok = c_parser_peek_nth_token_raw (parser, pos);
>const char *p = IDENTIFIER_POINTER (tok->value);
> -  always_id_kind = tok->id_kind;
> -  always_loc = tok->location;
> -  always_id = tok->value;
>if (strcmp ("always", p) == 0)
>   {
> -   c_token *sectok = c_parser_peek_2nd_token (parser);
> -   if (sectok->type == CPP_COMMA)
> +   if (always)
>   {
> -   c_parser_consume_token (parser);
> -   c_parser_consume_token (parser);
> -   always = 2;
> +   c_parser_error (parser, "expected modifier %<always%> only once");

The usual wording would be
"too many %<always%> modifiers"

> +   parens.skip_until_found_close (parser);
> +   return list;
> + }
> +
> +   always_id_kind = tok->id_kind;
> +   always_loc = tok->location;
> +   always_id = tok->value;

But you don't need any of the always_{id_kind,loc,id} variables anymore,
so they should be removed and everything that touches them too.

> +
> +   always++;
> + }
> +  else if (strcmp ("close", p) == 0)
> + {
> +   if (close)
> + {
> +   c_parser_error (parser, "expected modifier %<close%> only once");

Similarly.

> +   parens.skip_until_found_close (parser);
> +   return list;
>   }
> -   else if (sectok->type == CPP_NAME)
> +
> +   close++;
> + }
> +  else if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COLON)

IMHO you should at least check that tok->type == CPP_NAME before
checking pos + 1 token's type, you don't want to skip over CPP_EOF,
CPP_PRAGMA_EOF, or even CPP_CLOSE_PAREN etc.
Perhaps by adding
  if (tok->type != CPP_NAME)
    break;
right after c_token *tok = c_parser_peek_nth_token_raw (parser, pos); ?

> + {
> +   for (int i = 1; i < pos; ++i)
>   {
> -   p = IDENTIFIER_POINTER (sectok->value);
> -   if (strcmp ("alloc", p) == 0
> -   || strcmp ("to", p) == 0
> -   || strcmp ("from", p) == 0
> -   || strcmp ("tofrom", p) == 0
> -   || strcmp ("release", p) == 0
> -   || strcmp ("delete", p) == 0)
> - {
> -   c_parser_consume_token (parser);
> -   always = 1;
> - }
> +   c_parser_peek_token(parser);

Formatting, space before (

> +   c_parser_consume_token (parser);
>   }
> +   break;

And, IMHO something should clear always and close (btw, might be better
to use close_modifier as variable name and for consistency always_modifier)
unless we reach the CPP_COLON case.

Because we don't want
  map (always, close)
to imply
  map (always, close, tofrom: always, close)
but
  map (tofrom: always, close)
and my reading of your changes suggests that we actually use the
*_ALWAYS* kinds in that case.
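
The point about clearing the modifiers can be illustrated with a small stand-alone sketch. It does not use GCC's parser API: the `Token` model, `MapModifiers` and `parse_map_modifiers` are hypothetical names introduced only for this illustration, and the early error on a repeated modifier simply mirrors the behaviour of the patch under review. The scan peeks ahead without consuming anything and only commits the modifiers when it ends on a map type followed by a colon.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy token model; illustrative only, not GCC's c_token.
enum class Tok { Name, Comma, Colon, CloseParen, Eof };
struct Token { Tok type; std::string text; };

struct MapModifiers {
  bool always_modifier = false;
  bool close_modifier = false;
  std::size_t consumed = 0;  // number of modifier/comma tokens to consume
  bool error = false;        // a modifier appeared more than once
};

// Peek over "always" / "close" names, optionally separated by commas.
// The modifiers only take effect if the scan stops at "<map-type> :".
// Otherwise nothing is consumed and both flags are cleared, so that
// "always, close" is parsed as an ordinary variable list.
MapModifiers parse_map_modifiers(const std::vector<Token>& toks) {
  MapModifiers m;
  auto peek = [&](std::size_t i) -> const Token& {
    static const Token eof{Tok::Eof, ""};
    return i < toks.size() ? toks[i] : eof;
  };

  std::size_t pos = 0;
  bool saw_colon = false;
  while (peek(pos).type == Tok::Name) {
    const std::string& p = peek(pos).text;
    if (p == "always") {
      if (m.always_modifier) { m.error = true; return m; }
      m.always_modifier = true;
    } else if (p == "close") {
      if (m.close_modifier) { m.error = true; return m; }
      m.close_modifier = true;
    } else if (peek(pos + 1).type == Tok::Colon) {
      saw_colon = true;  // reached the map type: modifiers are real
      break;
    } else {
      break;
    }
    if (peek(pos + 1).type == Tok::Comma)
      ++pos;
    ++pos;
  }

  if (!saw_colon)
    return MapModifiers{};  // forget any tentative modifiers
  m.consumed = pos;
  return m;
}
```

With this shape, `always, close)` yields no modifiers and consumes nothing, while `always, close, tofrom: x` sets both flags and reports four tokens to skip before the map type.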

> +   cp_parser_error (parser,
> +"expected modifier %<always%> only once");

See above.

> +   cp_parser_skip_to_closing_parenthesis (parser,
> +  /*recovering=*/true,
> +  /*or_comma=*/false,
> +  /*consume_paren=*/true);
> +   return list;
> + }
> +
> +   always = true;
> + }
> +  else if (strcmp ("close", p) == 0)
> + {
> +   if (close)
> + {
> +   cp_parser_error (parser,
> +"expected modifier %<close%> only once");

Likewise.

> +  else if (cp_lexer_peek_nth_token (parser->lexer, pos + 1)->type
> +== CPP_COLON)
> + {
> +   for (int i = 1; i < pos; ++i)
> + cp_lexer_consume_token (parser->lexer);
> +   break;
> + }
> +  else
> + break;
> +
> +  if (cp_lexer_peek_nth_token (parser->lexer, pos + 1)->type == CPP_COMMA)
> + pos++;
> +  pos++;
>  }

Again, I don't see anything that would clear always/close if it didn't reach
the CPP_COLON case.

And it should be covered in the testcase.

Jakub



Re: [PATCH 2/2] ipa-sra: Improve debug info for removed parameters (PR 93385)

2021-05-10 Thread Martin Jambor
On Mon, May 10 2021, Richard Biener wrote:
> On Tue, Apr 27, 2021 at 5:26 PM Martin Jambor  wrote:
>>
>> Hi,
>>
>> Whereas the previous patch fixed issues with code left behind after
>> IPA-SRA removed a parameter but only reset all affected debug bind
>> statements, this one updates them with expressions which can allow the
>> debugger to print the removed value - see the added test-case.
>>
>> Even though I originally did not want to create DEBUG_EXPR_DECLs for
>> intermediate values, I ended up doing so, because otherwise the code
>> started creating statements like
>>
>># DEBUG __aD.198693 => &MEM[(const struct _Alloc_nodeD.171110 *)D#195]._M_tD.184726->_M_implD.171154
>>
>> which not only is a bit scary but also gimple-fold ICEs on
>> it. Therefore I decided they are probably quite necessary and have
>> them.
>>
>> The patch simply notes each removed SSA name present in a debug
>> statement and then works from it backwards, looking if it can
>> reconstruct the expression it represents (which can fail if a
>> non-degenerate PHI node is in the way).  If it can, it populates two
>> hash maps with those expressions so that 1) removed assignments are
>> replaced with a debug bind defining a new intermediate debug_decl_expr
>> and 2) existing debug binds that refer to SSA names that are being
>> removed now refer to corresponding debug_decl_exprs.
>
> Isn't this what insert_debug_temp_for_var_def already does when you
> remove a stmt and if you take care to do that back-to-front?  So with
> IPA SRA removing a parameter you'd "only" need to make sure to
> set up a debug stmt for the parameter itself and that be picked up
> for the (uninitialized) default-def you map to?
>

But there is no removal, the dead statements creating dead SSAs are not
even copied when tree-inline.c does its thing, such SSAs are actually
mapped to error_mark_node.

The code is heavily inspired by what removal does but (IIRC I hope) it
is also much simpler because IPA-SRA can only remove limited classes of
scalars.

Martin



>> If a removed parameter is passed to another function, the debugging
>> information still cannot describe its value there - see the xfailed
>> test in the testcase.  I sort of know what needs to be done but the
>> handling of debug information for removed parameters is LTO unfriendly
>> in general and so needs a bit more work.
>>
>> Bootstrapped and tested on x86_64-linux, i686-linux and aarch64-linux.
>> Also LTO-bootstrapped and LTO-profiledbootstrapped on x86_64-linux.
>>
>> OK for trunk?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2021-03-29  Martin Jambor  
>>
>> PR ipa/93385
>> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
>> members remap_with_debug_expressions, m_dead_ssa_debug_equiv,
>> m_dead_stmt_debug_equiv and prepare_debug_expressions.  Added
>> parameter to mark_dead_statements.
>> * ipa-param-manipulation.c: Include tree-phinodes.h and cfgexpand.h.
>> (ipa_param_body_adjustments::mark_dead_statements): New parameter
>> debugstack, push into it all SSA names used in debug statements,
>> produce m_dead_ssa_debug_equiv mapping for the removed param.
>> (replace_with_mapped_expr): New function.
>> (ipa_param_body_adjustments::remap_with_debug_expressions): Likewise.
>> (ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
>> (ipa_param_body_adjustments::common_initialization): Gather and
>> process SSA which will be removed but are in debug statements.  Simplify.
>> (ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
>> new members.
>> * tree-inline.c (remap_gimple_stmt): Create a debug bind when possible
>> when avoiding a copy of an unnecessary statement.  Remap removed SSA
>> names in existing debug statements.
>> (tree_function_versioning): Do not create DEBUG_EXPR_DECL for removed
>> parameters if we have already done so.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2021-03-29  Martin Jambor  
>>
>> PR ipa/93385
>> * gcc.dg/guality/ipa-sra-1.c: New test.


Re: [PATCH 0.5/2] ipa-sra: Restructure how cloning and call redirection communicate (PR 93385)

2021-05-10 Thread Martin Jambor
Hi,

On Mon, May 10 2021, Richard Biener wrote:
> I've tried to have a look at this patch but it does a lot of IPA specific
> refactoring(?), so the actual DCE bits are hard to find.  Is it possible
> to split the patch up or is it too entangled?
>

Yes:

I was asked by Richi to split my fix for PR 93385 for easier review
into IPA-SRA materialization refactoring and the actual DCE addition.
Fortunately it was mostly natural except for a temporary weird
condition in ipa_param_body_adjustments::modify_call_stmt.

This is the first part which basically replaces performed_splits in
clone_info and the code which generates it, keeps it up-to-date and
consumes it with new edge summaries which are much nicer.  It simply
contains 1) a mapping from the original argument indices to the actual
indices in the call statement as it is now, 2) information needed to
identify arguments representing pass-through IPA-SRA splits which
have been added to the call arguments in place of an original
argument/reference and 3) a delta to the index where va_args may start
- so basically directly all the information that the consumer of
performed_splits had to compute and we also do not need the weird
dummy declarations.
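
As a rough illustration of the three pieces of information listed above, the summary could be modelled along the following lines. This is only a sketch: `PassThroughSplit`, `EdgeModificationInfo` and the helper functions are simplified, hypothetical names for this example and are not meant to reproduce the types in the patch.

```cpp
#include <cassert>
#include <vector>

// One piece of a split aggregate that is passed through to a callee.
struct PassThroughSplit {
  int base_index;        // index of the original argument that was split
  unsigned unit_offset;  // offset of this piece within the original argument
  int new_index;         // position of the piece in the current call
};

// Hypothetical per-call-edge summary.
struct EdgeModificationInfo {
  // 1) original argument index -> index in the call as it is now,
  //    or -1 if the argument no longer appears as such.
  std::vector<int> index_map;
  // 2) arguments that stand for pieces of an original argument.
  std::vector<PassThroughSplit> pass_through_map;
  // 3) how far the position of the first variadic argument has moved.
  int always_copy_delta = 0;
};

// Where does original argument ORIG_INDEX live now?  -1 if it is gone.
int current_index(const EdgeModificationInfo& info, int orig_index) {
  if (orig_index < 0 || orig_index >= static_cast<int>(info.index_map.size()))
    return -1;
  return info.index_map[orig_index];
}

// Where do variadic arguments start after the modification?
int first_vararg_index(const EdgeModificationInfo& info, int orig_first_vararg) {
  return orig_first_vararg + info.always_copy_delta;
}
```

For a call that was originally `g (a, b, s, ...)` where `b` was removed and `s` was split into three pieces, the summary would map `a` to 0, map `b` and `s` to -1, list the three pieces at positions 1 to 3, and carry a delta of +1 because variadic arguments now start at index 4 rather than 3.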

The main disadvantage is that the information has to be created (and
kept up-to-date) for all call graph edges associated with the given
statement from all clones (including inline clones) of the clone where
splitting or removal happened first.  But all of this happens during
clone materialization so the only effect on WPA memory consumption is
the removal of a pointer from clone_info.

The statement modification code also has to know the statement from
the original function in order to be able to locate the edge summaries
which at this point are still keyed to these.  However, the code is
already quite heavily dependent on how things are structured in
tree-inline.c and in order to fix bugs like these it probably has to
be.

The subsequent patch needs this new information to be able to remove
arguments from calls during materialization and communicate this
information to the call redirection.

The patch from which this one is split off introduced a field of
ipa_param_body_adjustments called m_new_call_arg_modification_info which
was not needed for anything; I have removed it.

The patch is so far only lightly tested but I have verified that
together with the second one they make up pretty much exactly the
original one (modulo m_new_call_arg_modification_info) which I did
bootstrap this morning.  I will of course bootstrap it independently
too.

What do you think?

Martin


2021-05-10  Martin Jambor  

PR ipa/93385
* symtab-clones.h (clone_info): Removed member param_adjustments.
* ipa-param-manipulation.h: Adjust initial comment to reflect how we
deal with pass-through splits now.
(ipa_param_performed_split): Removed.
(ipa_param_adjustments::modify_call): Adjusted parameters.
(class ipa_param_body_adjustments): Adjusted parameters of
register_replacement, modify_gimple_stmt and modify_call_stmt.
(ipa_verify_edge_has_no_modifications): Declare.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Remove
performed_splits processing, pass only edge to padjs->modify_call,
check that call arguments were not modified if they should not have
been.
* cgraphclones.c (cgraph_node::create_clone): Do not copy performed
splits.
* ipa-param-manipulation.c (struct pass_through_split_map): New type.
(ipa_edge_modification_info): Likewise.
(ipa_edge_modification_sum): Likewise.
(ipa_edge_modifications): New edge summary.
(ipa_verify_edge_has_no_modifications): New function.
(transitive_split_p): Removed.
(transitive_split_map): Likewise.
(init_transitive_splits): Likewise.
(ipa_param_adjustments::modify_call): Adjusted to use the new edge
summary instead of performed_splits.
(ipa_param_body_adjustments::register_replacement): Drop dummy
parameter, set base_index of the created ipa_param_body_replacement.
(phi_arg_will_live_p): New function.
(ipa_param_body_adjustments::common_initialization): Do not create
IPA_SRA dummy decls.
(simple_tree_swap_info): Removed.
(remap_split_decl_to_dummy): Likewise.
(record_argument_state_1): New function.
(record_argument_state): Likewise.
(ipa_param_body_adjustments::modify_call_stmt): New parameter
orig_stmt.  Do not work with dummy decls, save necessary info about
changes to ipa_edge_modifications.
(ipa_param_body_adjustments::modify_gimple_stmt): New parameter
orig_stmt, pass it to modify_call_stmt.
(ipa_param_body_adjustments::modify_cfun_body): Adjust call to
modify_gimple_stmt.
* tree-inline.c (remap_gimple_stmt): Pass original statement to
modify_gimple_stmt.

Re: [PATCH 1.0/2] ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)

2021-05-10 Thread Martin Jambor
Hi,

On Mon, May 10 2021, Richard Biener wrote:
> I've tried to have a look at this patch but it does a lot of IPA specific
> refactoring(?), so the actual DCE bits are hard to find.  Is it possible
> to split the patch up or is it too entangled?
>

Yes:

I was asked by Richi to split my fix for PR 93385 for easier review
into IPA-SRA materialization refactoring and the actual DCE addition.
This is the second part that actually contains the DCE of statements
that IPA-SRA should not leave behind because they can have problematic
side effects, even if they are useless, so that we do not depend on
tree-dce to remove them for correctness.

The patch fixes the problem by doing a def-use walk when materializing
clones, marking which statements should not be copied and which
SSA_NAMEs do not need to be computed because eventually they would be
DCEd.  We do this on the original function body and tree-inline simply
does not copy statements which are "dead."
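
The def-use walk described above can be sketched on a toy IR. Everything here is an assumption made for illustration: `Stmt`, `DeadInfo` and `mark_dead` are invented names, the IR is not GIMPLE, and the sketch presumes an earlier analysis has already established that the removed parameter only feeds computations whose results are not needed. Statements with side effects are left alone in this model; the real patch handles calls separately by dropping the dead argument.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy SSA-like statement: defines at most one value and uses some values.
struct Stmt {
  int def;                // id of the value defined, or -1 for none
  std::vector<int> uses;  // ids of the values used
  bool has_side_effects;  // e.g. a call or store that must be kept
};

struct DeadInfo {
  std::set<int> dead_values;  // values that never need to be computed
  std::set<int> dead_stmts;   // indices of statements not to be copied
};

// Worklist walk from the removed parameter along def-use chains.
DeadInfo mark_dead(const std::vector<Stmt>& body, int removed_param) {
  DeadInfo info;
  std::vector<int> stack{removed_param};
  info.dead_values.insert(removed_param);

  while (!stack.empty()) {
    int v = stack.back();
    stack.pop_back();
    for (int i = 0; i < static_cast<int>(body.size()); ++i) {
      const Stmt& s = body[i];
      bool uses_v = std::find(s.uses.begin(), s.uses.end(), v) != s.uses.end();
      if (!uses_v || s.has_side_effects || info.dead_stmts.count(i))
        continue;
      info.dead_stmts.insert(i);
      // The value defined by a dead statement is dead as well.
      if (s.def >= 0 && info.dead_values.insert(s.def).second)
        stack.push_back(s.def);
    }
  }
  return info;
}
```

A copier in the style of tree-inline would then consult `dead_stmts` and skip those statements instead of copying them and relying on a later DCE pass.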

The only complication is removing dead argument calls because that
needs to be communicated to callee redirection code using the
infrastructure introduced by the previous patch.

I added all testcases of the original patch to this one, although some
probably test behavior introduced in the previous patch.

The patch is so far only lightly tested but I have verified that
together with the first one they make up pretty much exactly the
original one (modulo m_new_call_arg_modification_info) which I did
bootstrap this morning.  I will of course bootstrap it independently
too.

What do you think?

Martin


gcc/ChangeLog:

2021-05-10  Martin Jambor  

PR ipa/93385
* ipa-param-manipulation.h (class ipa_param_body_adjustments): New
members m_dead_stmts and m_dead_ssas.
* ipa-param-manipulation.c (phi_arg_will_live_p): New function.
(ipa_param_body_adjustments::mark_dead_statements): Likewise.
(ipa_param_body_adjustments::common_initialization): Call it on
all removed but not split parameters.
(ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
new members.
(ipa_param_body_adjustments::modify_call_stmt): Remove arguments that
are dead.
* tree-inline.c (remap_gimple_stmt): Do not copy dead statements, reset
dead debug statements.
(copy_phis_for_bb): Do not copy dead PHI nodes.

gcc/testsuite/ChangeLog:

2021-03-22  Martin Jambor  

PR ipa/93385
* gcc.dg/ipa/pr93385.c: New test.
* gcc.dg/ipa/ipa-sra-23.c: Likewise.
* gcc.dg/ipa/ipa-sra-24.c: Likewise.
* g++.dg/ipa/ipa-sra-4.C: Likewise.
---
 gcc/ipa-param-manipulation.c  | 142 +++---
 gcc/ipa-param-manipulation.h  |   6 ++
 gcc/testsuite/g++.dg/ipa/ipa-sra-4.C  |  37 +++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c |  24 +
 gcc/testsuite/gcc.dg/ipa/ipa-sra-24.c |  20 
 gcc/testsuite/gcc.dg/ipa/pr93385.c|  27 +
 gcc/tree-inline.c |  18 +++-
 7 files changed, 256 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/ipa-sra-4.C
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-24.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93385.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 424b8e5343f..d7d73542856 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -969,6 +969,97 @@ ipa_param_body_adjustments::carry_over_param (tree t)
   return new_parm;
 }
 
+/* Return true if BLOCKS_TO_COPY is NULL or if PHI has an argument ARG in
+   position that corresponds to an edge that is coming from a block that has
+   the corresponding bit set in BLOCKS_TO_COPY.  */
+
+static bool
+phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
+{
+  bool arg_will_survive = false;
+  if (!blocks_to_copy)
+arg_will_survive = true;
+  else
+for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+  if (gimple_phi_arg_def (phi, i) == arg
+ && bitmap_bit_p (blocks_to_copy,
+  gimple_phi_arg_edge (phi, i)->src->index))
+   {
+ arg_will_survive = true;
+ break;
+   }
+  return arg_will_survive;
+}
+
+/* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
+   any replacement or splitting.  REPL is the replacement VAR_SECL to base any
+   any replacement or splitting.  REPL is the replacement VAR_DECL to base any
+
+void
+ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
+{
+  /* Current IPA analyses which remove unused parameters never remove a
+ non-gimple register ones which have any use except as parameters in other
+ calls, so we can safely leave them as they are.  */
+  if (!is_gimple_reg (dead_param))
+return;
+  tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param);
+  if (!parm_ddef || has_zero_uses (parm_ddef))
+return;
+
+  auto_vec stack;
+  m_dead_ssas.a

Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Martin Liška

On 5/10/21 5:59 PM, Eric Botcazou wrote:

Do you know Eric where version.o needs to be added to be included in the
problematic command line?


You can presumably remove it from GNATLINK_OBJS & GNATMAKE_OBJS.  And it needs
to be added to GNAT1_C_OBJS instead of GNAT_ADA_OBJS in Make-lang.in.



Thank you Eric.

The following patch fixes that, ready for master?

Thanks,
Martin
From 217785ed3df1d959498e7668b577606a9f5b51e2 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 10 May 2021 10:22:43 +0200
Subject: [PATCH] Fix missing version_string in Ada.

gcc/ada/ChangeLog:

	PR bootstrap/100506
	* Make-generated.in: Replace version.c with ada/version.c.
	* gcc-interface/Make-lang.in: Add version.o to GNAT1_C_OBJS.
	Add version.o to GNAT_ADA_OBJS and GNATBIND_OBJS.
	* gcc-interface/Makefile.in: Add version.o to TOOLS_LIBS.
	* gnatvsn.adb: Start using a new C symbol gnat_version_string.
	* version.c: New file.
---
 gcc/ada/Make-generated.in  |  2 +-
 gcc/ada/gcc-interface/Make-lang.in |  4 +++-
 gcc/ada/gcc-interface/Makefile.in  |  4 ++--
 gcc/ada/gnatvsn.adb|  2 +-
 gcc/ada/version.c  | 34 ++
 5 files changed, 41 insertions(+), 5 deletions(-)
 create mode 100644 gcc/ada/version.c

diff --git a/gcc/ada/Make-generated.in b/gcc/ada/Make-generated.in
index 237444c7a26..3a65da9b962 100644
--- a/gcc/ada/Make-generated.in
+++ b/gcc/ada/Make-generated.in
@@ -87,7 +87,7 @@ ada/stamp-snames : ada/snames.ads-tmpl ada/snames.adb-tmpl ada/snames.h-tmpl ada
 	touch ada/stamp-snames
 
 ada/sdefault.adb: ada/stamp-sdefault ; @true
-ada/stamp-sdefault : $(srcdir)/version.c Makefile
+ada/stamp-sdefault : $(srcdir)/ada/version.c Makefile
 	$(ECHO) "pragma Style_Checks (Off);" >tmp-sdefault.adb
 	$(ECHO) "with Osint; use Osint;" >>tmp-sdefault.adb
 	$(ECHO) "package body Sdefault is" >>tmp-sdefault.adb
diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in
index 969022e21a7..c8c02d3f795 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -247,7 +247,8 @@ GNAT1_C_OBJS = ada/adadecode.o ada/adaint.o ada/argv.o ada/cio.o \
  ada/cstreams.o ada/env.o ada/init.o ada/initialize.o ada/raise.o \
  ada/raise-gcc.o \
  ada/seh_init.o ada/targext.o ada/cuintp.o ada/decl.o ada/rtfinal.o \
- ada/rtinit.o ada/misc.o ada/utils.o ada/utils2.o ada/trans.o ada/targtyps.o
+ ada/rtinit.o ada/misc.o ada/utils.o ada/utils2.o ada/trans.o ada/targtyps.o \
+ ada/version.o
 
 # Object files from Ada sources that are used by gnat1
 GNAT_ADA_OBJS =	\
@@ -648,6 +649,7 @@ GNATBIND_OBJS = \
  ada/uintp.o  \
  ada/uname.o  \
  ada/urealp.o \
+ ada/version.o\
  ada/widechar.o
 
 # Language-independent object files.
diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
index 333e2035455..2598cea2b19 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -250,7 +250,7 @@ LIBS = $(LIBINTL) $(LIBICONV) $(LIBBACKTRACE) $(LIBIBERTY) $(SYSLIBS)
 LIBDEPS = $(LIBINTL_DEP) $(LIBICONV_DEP) $(LIBBACKTRACE) $(LIBIBERTY)
 # Default is no TGT_LIB; one might be passed down or something
 TGT_LIB =
-TOOLS_LIBS = ../link.o ../targext.o ../../ggc-none.o ../../libcommon-target.a \
+TOOLS_LIBS = ../version.o ../link.o ../targext.o ../../ggc-none.o ../../libcommon-target.a \
   ../../libcommon.a ../../../libcpp/libcpp.a $(LIBGNAT) $(LIBINTL) $(LIBICONV) \
   ../$(LIBBACKTRACE) ../$(LIBIBERTY) $(SYSLIBS) $(TGT_LIB)
 
@@ -302,7 +302,7 @@ ADA_INCLUDES_FOR_SUBDIR = -I. -I$(fsrcdir)/ada
 	$(CC) -c $(ALL_ADAFLAGS) $(ADA_INCLUDES) $< $(OUTPUT_OPTION)
 
 # how to regenerate this file
-Makefile: ../config.status $(srcdir)/ada/gcc-interface/Makefile.in $(srcdir)/ada/Makefile.in $(srcdir)/version.c
+Makefile: ../config.status $(srcdir)/ada/gcc-interface/Makefile.in $(srcdir)/ada/Makefile.in $(srcdir)/ada/version.c
 	cd ..; \
 	LANGUAGES="$(CONFIG_LANGUAGES)" \
 	CONFIG_HEADERS= \
diff --git a/gcc/ada/gnatvsn.adb b/gcc/ada/gnatvsn.adb
index 578a1aa9743..d6d2a5a3ace 100644
--- a/gcc/ada/gnatvsn.adb
+++ b/gcc/ada/gnatvsn.adb
@@ -53,7 +53,7 @@ package body Gnatvsn is
--  version.c using the zero-based convention of the C language.
--  The size is not the real one, which does not matter since we will
--  check for the nul character in Gnat_Version_String.
-   pragma Import (C, Version_String, "version_string");
+   pragma Import (C, Version_String, "gnat_version_string");
 
-
-- Gnat_Version_String --
diff --git a/gcc/ada/version.c b/gcc/ada/version.c
new file mode 100644
index 000..e6cc6124040
--- /dev/null
+++ b/gcc/ada/version.c
@@ -0,0 +1,34 @@
+/
+ *  *
+ * GNAT COMPILER COMPONENTS *
+ *   

Re: [PATCH] Bump LTO_major_version to 11.

2021-05-10 Thread Eric Botcazou
> The following patch fixes that, ready for master?

Sure, thanks!

-- 
Eric Botcazou




Re: [PATCH 2/2 v2] rs6000: Guard density_test only for vector version

2021-05-10 Thread Segher Boessenkool
Hi!

On Sat, May 08, 2021 at 04:12:18PM +0800, Kewen.Lin wrote:
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5234,6 +5234,8 @@ typedef struct _rs6000_cost_data
>/* For each vectorized loop, this var holds TRUE iff a non-memory vector
>   instruction is needed by the vectorization.  */
>bool vect_nonmem;
> +  /* Indicates costing for the scalar version of a loop or block.  */
> +  bool costing_for_scalar;
>  } rs6000_cost_data;

"... this is costing for ..."?

> @@ -5255,6 +5257,12 @@ rs6000_density_test (rs6000_cost_data *data)
>int vec_cost = data->cost[vect_body], not_vec_cost = 0;
>int i, density_pct;
>  
> +  /* This density test only cares about the cost of vector version of the
> + loop, early return if it's costing for the scalar version (namely
> + computing single scalar iteration cost).  */
> +  if (data->costing_for_scalar)
> +return;

"..., so immediately return if we are passed costing for ..."?

The patch is okay for trunk with those or similar changes.  Thanks!


Segher


[committed] libstdc++: Implement proposed resolution to LWG 3548

2021-05-10 Thread Jonathan Wakely via Gcc-patches
This has been tentatively approved by LWG. The deleter from a unique_ptr
can be moved into the shared_ptr (at least, since LWG 2802). This uses
std::forward<_Del>(__r.get_deleter()) not std::move(__r.get_deleter())
because we don't want to convert the deleter to an rvalue when _Del is
an lvalue reference type.

This also adds a missing is_move_constructible_v constraint to the
shared_ptr(unique_ptr&&) constructor, which is inherited from the
shared_ptr(Y*, D) constructor due to the use of "equivalent to" in the
specified effects.
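
The difference between the two expressions can be checked in isolation. The snippet below is a minimal sketch, not code from the patch: `Deleter`, `forwarded_t` and `forwards_as_rvalue` are hypothetical names used only here. It relies on the fact that `get_deleter()` on a non-const `unique_ptr` yields an lvalue in both cases, so the value category of the forwarded expression is decided purely by the template argument.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Deleter {
  void operator()(int* p) const { delete p; }
};

// Type of std::forward<Del>(d) when d is an lvalue of the deleter type,
// which is what get_deleter() returns on a non-const unique_ptr.
template<typename Del>
using forwarded_t = decltype(std::forward<Del>(
    std::declval<typename std::remove_reference<Del>::type&>()));

// A non-reference deleter type is forwarded as an rvalue (it can be moved).
static_assert(std::is_same<forwarded_t<Deleter>, Deleter&&>::value,
              "non-reference deleter is forwarded as an rvalue");

// An lvalue-reference deleter type stays an lvalue (it is not moved from).
static_assert(std::is_same<forwarded_t<Deleter&>, Deleter&>::value,
              "reference deleter is forwarded as an lvalue");

// std::move would produce an rvalue regardless of the deleter type.
static_assert(std::is_same<decltype(std::move(std::declval<Deleter&>())),
                           Deleter&&>::value,
              "std::move always yields an rvalue");

template<typename Del>
constexpr bool forwards_as_rvalue() {
  return std::is_rvalue_reference<forwarded_t<Del>>::value;
}
```

So for `unique_ptr<int, Deleter>` the deleter may be moved into the control block, while for `unique_ptr<int, Deleter&>` the referenced deleter object is left untouched.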

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (__shared_count(unique_ptr&&)):
Initialize a non-reference deleter from an rvalue, as per LWG
3548.
(__shared_ptr::_UniqCompatible): Add missing constraint.
* testsuite/20_util/shared_ptr/cons/lwg3548.cc: New test.
* testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc: Check
constraints.

Tested powerpc64le-linux. Committed to trunk.

commit 5edc0c15f1667cc2a5deb664b25c007b35d259f6
Author: Jonathan Wakely 
Date:   Mon May 10 20:46:38 2021

libstdc++: Implement proposed resolution to LWG 3548

This has been tentatively approved by LWG. The deleter from a unique_ptr
can be moved into the shared_ptr (at least, since LWG 2802). This uses
std::forward<_Del>(__r.get_deleter()) not std::move(__r.get_deleter())
because we don't want to convert the deleter to an rvalue when _Del is
an lvalue reference type.

This also adds a missing is_move_constructible_v constraint to the
shared_ptr(unique_ptr&&) constructor, which is inherited from the
shared_ptr(Y*, D) constructor due to the use of "equivalent to" in the
specified effects.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (__shared_count(unique_ptr&&)):
Initialize a non-reference deleter from an rvalue, as per LWG
3548.
(__shared_ptr::_UniqCompatible): Add missing constraint.
* testsuite/20_util/shared_ptr/cons/lwg3548.cc: New test.
* testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc: Check
constraints.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 71099afbf7a..eb9ad23ba1e 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -684,8 +684,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  using _Alloc_traits = allocator_traits<_Alloc>;
  _Alloc __a;
  _Sp_cd_type* __mem = _Alloc_traits::allocate(__a, 1);
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 3548. shared_ptr construction from unique_ptr should move
+ // (not copy) the deleter
  _Alloc_traits::construct(__a, __mem, __r.release(),
-  __r.get_deleter());  // non-throwing
+  std::forward<_Del>(__r.get_deleter()));
  _M_pi = __mem;
}
 
@@ -1070,9 +1073,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Constraint for construction from unique_ptr:
       template<typename _Yp, typename _Del, typename _Res = void,
 	       typename _Ptr = typename unique_ptr<_Yp, _Del>::pointer>
-   using _UniqCompatible = typename enable_if<__and_<
- __sp_compatible_with<_Yp*, _Tp*>, is_convertible<_Ptr, element_type*>
- >::value, _Res>::type;
+   using _UniqCompatible = __enable_if_t<__and_<
+ __sp_compatible_with<_Yp*, _Tp*>,
+ is_convertible<_Ptr, element_type*>,
+ is_move_constructible<_Del>
+ >::value, _Res>;
 
   // Constraint for assignment from unique_ptr:
   template
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/lwg3548.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/lwg3548.cc
new file mode 100644
index 000..d6ec7b1d057
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/lwg3548.cc
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+
+#include <memory>
+
+// LWG 3548
+// shared_ptr construction from unique_ptr should move (not copy) the deleter
+
+struct D
+{
+  D() { }
+  D(D&&) { }
+  void operator()(int* p) const { delete p; }
+};
+
+std::unique_ptr<int, D> u;
+std::shared_ptr<int> s1(std::move(u));
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc
index f4cf3ddda2e..d7ca51a4aa6 100644
--- a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc
@@ -58,10 +58,25 @@ test02()
   VERIFY( D::count == 0 ); // LWG 2415
 }
 
+void
+test03()
+{
+  struct D
+  {
+D() = default;
+D(const D&) = delete; // not copyable or movable
+void operator()(int* p) const { delete p; }
+  };
+
+  using namespace std;
+  static_assert( ! is_constructible<shared_ptr<int>, unique_ptr<int, D>>(),
+"Constraints: is_move_constructible_v is true" );
+}
+
 int
 main()
 {
   test01();
   test02();
-  return 0;
+  test03();
 }

