RE: [r14-5578 Regression] FAIL: gfortran.dg/gomp/pr27573.f90 -O (test for excess errors) on Linux/x86_64

2023-11-27 Thread Jiang, Haochen
> -----Original Message-----
> From: Sebastian Huber 
> Sent: Monday, November 27, 2023 3:58 PM
> To: haochen.jiang ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> 
> Subject: Re: [r14-5578 Regression] FAIL: gfortran.dg/gomp/pr27573.f90 -O (test
> for excess errors) on Linux/x86_64
> 
> On 26.11.23 12:18, haochen.jiang wrote:
> > On Linux/x86_64,
> >
> > a350a74d6113e3a84943266eb691275951c109d9 is the first bad commit
> > commit a350a74d6113e3a84943266eb691275951c109d9
> > Author: Sebastian Huber
> > Date:   Sat Oct 21 15:52:15 2023 +0200
> >
> >  gcov: Add gen_counter_update()
> >
> > caused
> >
> > FAIL: gcc.dg/gomp/pr27573.c (internal compiler error: verify_gimple
> > failed)
> > FAIL: gcc.dg/gomp/pr27573.c (test for excess errors)
> > FAIL: gcc.dg/profile-update-warning.c (internal compiler error:
> > verify_gimple failed)
> > FAIL: gcc.dg/profile-update-warning.c (test for excess errors)
> > FAIL: gfortran.dg/gomp/pr27573.f90   -O  (internal compiler error: 
> > verify_gimple
> failed)
> > FAIL: gfortran.dg/gomp/pr27573.f90   -O  (test for excess errors)
> 
> The errors were fixed by:
> 
> commit 41aacdea55c5d795a7aa195357d966645845d00e
> Author: Sebastian Huber 
> Date:   Mon Nov 20 15:26:38 2023 +0100
> 
>  gcov: Fix integer types in gen_counter_update()
> 
> commit a034cca0a222598cb42302c059262b654685ff19
> Author: Sebastian Huber 
> Date:   Mon Nov 20 14:48:03 2023 +0100
> 
>  gcov: Use unshare_expr() in gen_counter_update()
> 

Hi Sebastian,

Thanks for your fix! This mail was sent automatically and was delayed by the
earlier bootstrap failure on trunk. If everything is fixed now, that is
fine.

Thx,
Haochen

> --
> embedded brains GmbH & Co. KG
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/


Re: [r14-5666 Regression] FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1 on Linux/x86_64

2023-11-27 Thread Sebastian Huber

On 26.11.23 12:18, haochen.jiang wrote:

On Linux/x86_64,

41aacdea55c5d795a7aa195357d966645845d00e is the first bad commit
commit 41aacdea55c5d795a7aa195357d966645845d00e
Author: Sebastian Huber
Date:   Mon Nov 20 15:26:38 2023 +0100

 gcov: Fix integer types in gen_counter_update()

caused

FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile "Read 
tp_first_run: 0" 1
FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile "Read 
tp_first_run: 2" 1


Please have a look at:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638104.html



Re: [PATCH] RISC-V: Remove incorrect function gate gather_scatter_valid_offset_mode_p

2023-11-27 Thread Robin Dapp
On 11/25/23 09:24, Juzhe-Zhong wrote:
> Coming back to review the gather/scatter code, I noticed that
> gather_scatter_valid_offset_mode_p looks odd.
> gather_scatter_valid_offset_mode_p is supposed to block vluxei64/vsuxei64 on
> RV32 systems.
> However, it fails to do that because it is passed the data mode instead of
> the index mode:
> 
> riscv_vector::gather_scatter_valid_offset_mode_p (mode)
> It should be RATIO2I instead of RATIO2.
> So we have the following iterators, which can already block this
> situation:
> 
> (define_mode_iterator RATIO8I [
>   RVVM1QI
>   RVVM2HI
>   RVVM4SI
>   (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
> ])
> 

Ah, good you noticed this.  I had it on my TODO list to check
why we didn't handle several cases properly.  In my patch I
already figured we don't need to "double exclude" the patterns
(in valid_offset_mode_p as well as in the iterator) but didn't
want to change more than necessary.  It looks more reasonable
with your change now.

LGTM.

Regards
 Robin
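
As a rough illustration of the problem discussed above (this is not the GCC code): a gate on the offset mode only does anything if it is handed the index mode. The sketch below models a vector mode purely by its element width. `struct mode_model` and `offset_mode_ok` are hypothetical names, and the condition mirrors the `TARGET_VECTOR_ELEN_64 && TARGET_64BIT` guard on RVVM8DI in the quoted iterator.

```c
#include <stdbool.h>

/* Hypothetical model: a vector mode is described only by its element
   width in bits.  This is not GCC's machine_mode.  */
struct mode_model
{
  unsigned elt_bits;
};

/* 64-bit index elements (vluxei64/vsuxei64) are only usable when the
   target has ELEN=64 and is RV64; narrower index elements always pass.  */
static bool
offset_mode_ok (struct mode_model mode, bool elen64, bool target_64bit)
{
  if (mode.elt_bits == 64)
    return elen64 && target_64bit;
  return true;
}
```

With QI data elements and DI index elements on RV32, calling `offset_mode_ok` with the data mode returns true (nothing is blocked), while calling it with the index mode returns false. That is the difference between passing RATIO2 and RATIO2I in this simplified model.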


Re: [PATCH] testsuite, x86: Handle a broken assembler.

2023-11-27 Thread FX Coudert
Hi,

I’d like to ping that patch from Iain Sandoe. It would clear up a number of 
failures in the darwin testsuite.

Thanks,
FX



> --- 8< ---
> 
> Earlier assembler support for complex fp16 on x86_64 Darwin is broken. This
> adds an additional test to the existing target-supports that fails for the
> broken assemblers but works for the newer, fixed, ones.
> 
> gcc/testsuite/ChangeLog:
> 
> * lib/target-supports.exp: Test an asm line that fails on broken
> Darwin assembler versions.
> 
> Signed-off-by: Iain Sandoe 
> ---
> gcc/testsuite/lib/target-supports.exp | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index f0b692a2e19..61ab063afbe 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -10062,6 +10062,7 @@ proc check_effective_target_avx512fp16 { } {
> void foo (void)
> {
> asm volatile ("vmovw %edi, %xmm0");
> + asm volatile ("vfcmulcph %xmm1, %xmm2, %xmm3{%k1}");
> }
> } "-O2 -mavx512fp16" ]
> }
> -- 
> 2.39.2 (Apple Git-143)


Re: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]

2023-11-27 Thread Jakub Jelinek
On Mon, Nov 27, 2023 at 07:55:52AM +, Tamar Christina wrote:
> > For POPCOUNT I've introduced recently a way to provide custom expand_*
> > function and decide there what optimizations to use, even when it otherwise
> > is an integral unary optab ifn.
> > 
> 
> Oh that sounds interesting, do you have a commit for me to look at? I couldn't
> spot anything obvious in the history.

https://gcc.gnu.org/r14-5613

Jakub



RE: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]

2023-11-27 Thread Tamar Christina
> -----Original Message-----
> From: Jakub Jelinek 
> Sent: Monday, November 27, 2023 8:13 AM
> To: Tamar Christina 
> Cc: Xi Ruoyao ; Segher Boessenkool
> ; David Edelsohn ; gcc-
> patc...@gcc.gnu.org; Andrew Pinski 
> Subject: Re: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in
> the backend [PR112606]
> 
> On Mon, Nov 27, 2023 at 07:55:52AM +, Tamar Christina wrote:
> > > For POPCOUNT I've introduced recently a way to provide custom
> > > expand_* function and decide there what optimizations to use, even
> > > when it otherwise is an integral unary optab ifn.
> > >
> >
> > Oh that sounds interesting, do you have a commit for me to look at? I
> > couldn't spot anything obvious in the history.
> 
> https://gcc.gnu.org/r14-5613

Oh, that's nice! If that's the case, a simpler fix could be to let COPYSIGN
become one of these as well, and then just have PPC do a FAIL on the abs and
neg cases.

expand_copysign already does the fneg (fabs ()) rewriting, through
expand_copysign_absneg, if the target rejects the optab.

That would also fix the i386 and Arm assembly scan failures and the phi-opts
case when the IFN isn't available. I can do that if you prefer, since those
are on my list to fix anyway.

Thanks,
Tamar

> 
>   Jakub
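
For readers following the subject line, the two forms under discussion compute the same value. The sketch below is only a user-level illustration in C, not the optab or RTL code. The function names are hypothetical, and the comparison helper checks the sign bit explicitly so that -0.0 and NaN inputs are covered as well.

```c
#include <math.h>
#include <stdbool.h>

/* The canonical form named in the subject: copysign (x, -1).  */
static double
neg_abs_via_copysign (double x)
{
  return copysign (x, -1.0);
}

/* The form the backend would like to get back: -abs (x).  */
static double
neg_abs_via_fabs (double x)
{
  return -fabs (x);
}

/* Compare value and sign bit, so that -0.0 and 0.0 are told apart and
   NaN results compare by sign only.  */
static bool
same_value_and_sign (double a, double b)
{
  if (isnan (a) || isnan (b))
    return isnan (a) && isnan (b) && !signbit (a) == !signbit (b);
  return a == b && !signbit (a) == !signbit (b);
}
```

Both helpers always return a value with the sign bit set, which is why either form can stand in for the other.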



[PATCH] Add C intrinsics for scalar crypto extension

2023-11-27 Thread Liao Shihua
This patch adds C intrinsics for the scalar crypto extension.
Because riscv-c-api
(https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44/files) already
includes the zbkb/zbkc/zbkx intrinsics in the bit manipulation extension,
this patch only supports the zkn*/zks* intrinsics.
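
A minimal usage sketch, assuming the header is installed as riscv_crypto.h as proposed here and that the compiler defines __riscv_zknd and __riscv_xlen for the selected -march. `HAVE_ZKND64` and `aes64_decrypt_step` are hypothetical names used only for this illustration; on any other target the helper simply reports that the intrinsic is unavailable.

```c
#include <stdint.h>

#if defined(__riscv_zknd) && __riscv_xlen == 64
#include <riscv_crypto.h>
#define HAVE_ZKND64 1
#else
#define HAVE_ZKND64 0
#endif

/* Hypothetical helper: apply __riscv_aes64ds when Zknd is available on
   RV64.  Returns 1 and stores the result in *out on success, 0 when the
   intrinsic is not available for this target.  */
static int
aes64_decrypt_step (uint64_t x, uint64_t y, uint64_t *out)
{
#if HAVE_ZKND64
  *out = __riscv_aes64ds (x, y);
  return 1;
#else
  (void) x;
  (void) y;
  (void) out;
  return 0;
#endif
}
```

Guarding on the feature macros keeps the same source buildable on targets without the extension.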

gcc/ChangeLog:

* config.gcc: Add riscv_crypto.h.
* config/riscv/riscv_crypto.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zknd32.c: Use intrinsics instead of builtins.
* gcc.target/riscv/zknd64.c: Likewise.
* gcc.target/riscv/zkne32.c: Likewise.
* gcc.target/riscv/zkne64.c: Likewise.
* gcc.target/riscv/zknh-sha256-32.c: Likewise.
* gcc.target/riscv/zknh-sha256-64.c: Likewise.
* gcc.target/riscv/zknh-sha512-32.c: Likewise.
* gcc.target/riscv/zknh-sha512-64.c: Likewise.
* gcc.target/riscv/zksed32.c: Likewise.
* gcc.target/riscv/zksed64.c: Likewise.
* gcc.target/riscv/zksh32.c: Likewise.
* gcc.target/riscv/zksh64.c: Likewise.

---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv_crypto.h   | 219 ++
 gcc/testsuite/gcc.target/riscv/zknd32.c   |   6 +-
 gcc/testsuite/gcc.target/riscv/zknd64.c   |  12 +-
 gcc/testsuite/gcc.target/riscv/zkne32.c   |   6 +-
 gcc/testsuite/gcc.target/riscv/zkne64.c   |  10 +-
 .../gcc.target/riscv/zknh-sha256-32.c |  22 +-
 .../gcc.target/riscv/zknh-sha256-64.c |  10 +-
 .../gcc.target/riscv/zknh-sha512-32.c |  14 +-
 .../gcc.target/riscv/zknh-sha512-64.c |  10 +-
 gcc/testsuite/gcc.target/riscv/zksed32.c  |   6 +-
 gcc/testsuite/gcc.target/riscv/zksed64.c  |   6 +-
 gcc/testsuite/gcc.target/riscv/zksh32.c   |   6 +-
 gcc/testsuite/gcc.target/riscv/zksh64.c   |   6 +-
 14 files changed, 288 insertions(+), 47 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_crypto.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b88591b6fd8..d67fe8b6a6f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -548,7 +548,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_crypto.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/riscv_crypto.h b/gcc/config/riscv/riscv_crypto.h
new file mode 100644
index 000..149c1132e10
--- /dev/null
+++ b/gcc/config/riscv/riscv_crypto.h
@@ -0,0 +1,219 @@
+/* RISC-V 'K' Extension intrinsics include file.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef __RISCV_CRYPTO_H
+#define __RISCV_CRYPTO_H
+
+#include 
+
+#if defined (__cplusplus)
+extern "C" {
+#endif
+
+#if defined(__riscv_zknd)
+#if __riscv_xlen == 32
+#define __riscv_aes32dsi(x, y, bs) __builtin_riscv_aes32dsi(x, y, bs)
+#define __riscv_aes32dsmi(x, y, bs) __builtin_riscv_aes32dsmi(x, y, bs)
+#endif
+
+#if __riscv_xlen == 64
+static __inline__ uint64_t __attribute__ ((__always_inline__, __nodebug__))
+__riscv_aes64ds (uint64_t __x, uint64_t __y)
+{
+  return __builtin_riscv_aes64ds (__x, __y);
+}
+
+static __inline__ uint64_t __attribute__ ((__always_inline__, __nodebug__))
+__riscv_aes64dsm (uint64_t __x, uint64_t __y)
+{
+  return __builtin_riscv_aes64dsm (__x, __y);
+}
+
+static __inline__ uint64_t __attribute__ ((__always_inline__, __nodebug__))
+__riscv_aes64im (uint64_t __x)
+{
+  return __builtin_riscv_aes64im (__x);
+}
+#endif
+#endif // defined (__riscv_zknd)
+
+#if defined(__riscv_zkne)
+#if __riscv_xlen == 32
+#define __riscv_aes32esi(x, y, bs) __builtin_riscv_aes32esi(x, y, bs)
+#define __riscv_aes32esmi(x, y, bs) __builtin_riscv_aes32esmi(x, y, bs)
+#endif
+
+#if __riscv_xlen == 64
+sta

Re: [PATCH] testsuite, x86: Handle a broken assembler.

2023-11-27 Thread Richard Biener
On Mon, Nov 27, 2023 at 9:11 AM FX Coudert  wrote:
>
> Hi,
>
> I’d like to ping that patch from Iain Sandoe. It would clear up a number of 
> failures in the darwin testsuite.

OK.



Re: [PATCH] s390: Fix builtins floating-point convert to/from fixed

2023-11-27 Thread Stefan Schulze Frielinghaus
Ping.

On Tue, Nov 14, 2023 at 04:19:59PM +0100, Stefan Schulze Frielinghaus wrote:
> Remove flags for non-existing operands 2 and 3.
> 
> Bootstrapped on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtins.def
>   (s390_vcefb,s390_vcdgb,s390_vcelfb,s390_vcdlgb,s390_vcfeb,s390_vcgdb,
>   s390_vclfeb,s390_vclgdb): Remove flags for non-existing operands
>   2 and 3.
> ---
>  gcc/config/s390/s390-builtins.def | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtins.def
> index 964d86c74a0..5bcf0d16ba3 100644
> --- a/gcc/config/s390/s390-builtins.def
> +++ b/gcc/config/s390/s390-builtins.def
> @@ -2840,10 +2840,10 @@ OB_DEF (s390_vec_double,
> s390_vec_double_s64,s390_vec_double_u64,
>  OB_DEF_VAR (s390_vec_double_s64,s390_vcdgb, 0,   
>0,  BT_OV_V2DF_V2DI)
>  OB_DEF_VAR (s390_vec_double_u64,s390_vcdlgb,0,   
>0,  BT_OV_V2DF_UV2DI)
>  
> -B_DEF  (s390_vcefb, floatv4siv4sf2, 0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SF_V4SI)
> -B_DEF  (s390_vcdgb, floatv2div2df2, 0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DI)
> -B_DEF  (s390_vcelfb,floatunsv4siv4sf2,  0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SF_UV4SI)
> -B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_UV2DI)
> +B_DEF  (s390_vcefb, floatv4siv4sf2, 0,   
>B_VXE2, 0,  BT_FN_V4SF_V4SI)
> +B_DEF  (s390_vcdgb, floatv2div2df2, 0,   
>B_VX,   0,  BT_FN_V2DF_V2DI)
> +B_DEF  (s390_vcelfb,floatunsv4siv4sf2,  0,   
>B_VXE2, 0,  BT_FN_V4SF_UV4SI)
> +B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0,   
>B_VX,   0,  BT_FN_V2DF_UV2DI)
>  
>  OB_DEF (s390_vec_signed,
> s390_vec_signed_flt,s390_vec_signed_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
>  OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, B_VXE2,  
>0,  BT_OV_V4SI_V4SF)
> @@ -2853,10 +2853,10 @@ OB_DEF (s390_vec_unsigned,  
> s390_vec_unsigned_flt,s390_vec_unsigned_
>  OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,B_VXE2,  
>0,  BT_OV_UV4SI_V4SF)
>  OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0,   
>0,  BT_OV_UV2DI_V2DF)
>  
> -B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SI_V4SF)
> -B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DI_V2DF)
> -B_DEF  (s390_vclfeb,fixuns_truncv4sfv4si2, 0,
>B_VXE2, O2_U4 | O3_U3,  BT_FN_UV4SI_V4SF)
> -B_DEF  (s390_vclgdb,fixuns_truncv2dfv2di2, 0,
>B_VX,   O2_U4 | O3_U3,  BT_FN_UV2DI_V2DF)
> +B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0,   
>B_VXE2, 0,  BT_FN_V4SI_V4SF)
> +B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0,   
>B_VX,   0,  BT_FN_V2DI_V2DF)
> +B_DEF  (s390_vclfeb,fixuns_truncv4sfv4si2, 0,
>B_VXE2, 0,  BT_FN_UV4SI_V4SF)
> +B_DEF  (s390_vclgdb,fixuns_truncv2dfv2di2, 0,
>B_VX,   0,  BT_FN_UV2DI_V2DF)
>  
>  B_DEF  (s390_vfisb, vec_fpintv4sf,  0,   
>B_VXE,  O2_U4 | O3_U3,  BT_FN_V4SF_V4SF_UCHAR_UCHAR)
>  B_DEF  (s390_vfidb, vec_fpintv2df,  0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DF_UCHAR_UCHAR)
> -- 
> 2.41.0
> 


Re: [PATCH] s390: Fix constraint for insn *cmphi_ccu

2023-11-27 Thread Stefan Schulze Frielinghaus
Ping.

On Wed, Oct 25, 2023 at 11:27:33AM +0200, Stefan Schulze Frielinghaus wrote:
> Currently for an unsigned 16-bit comparison between memory and an
> immediate where the high bit is set, a clc is emitted.  This is because
> the constant is created for mode HI and therefore sign extended.  This
> means constraint D does not hold anymore.  Since the mode already
> restricts the immediate to 16 bit, it is enough to make use of
> constraint n and chop off the high bits in the output template.
> 
> Bootstrapped and regtested on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md (*cmphi_ccu): For immediate operand 1 make
>   use of constraint n instead of D and chop off high bits in the
>   output template.
> ---
>  gcc/config/s390/s390.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 3f29ba21442..777a20f8e77 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -1355,13 +1355,13 @@
>  (define_insn "*cmphi_ccu"
>[(set (reg CC_REGNUM)
>  (compare (match_operand:HI 0 "nonimmediate_operand" "d,d,Q,Q,BQ")
> - (match_operand:HI 1 "general_operand"  "Q,S,D,BQ,Q")))]
> + (match_operand:HI 1 "general_operand"  "Q,S,n,BQ,Q")))]
>"s390_match_ccmode (insn, CCUmode)
> && !register_operand (operands[1], HImode)"
>"@
> clm\t%0,3,%S1
> clmy\t%0,3,%S1
> -   clhhsi\t%0,%1
> +   clhhsi\t%0,%x1
> #
> #"
>[(set_attr "op_type" "RS,RSY,SIL,SS,SS")
> -- 
> 2.41.0
> 
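
The sign-extension issue described in the patch above can be reproduced in plain C. This is only a model, under two assumptions that the mail implies but does not spell out: that constraint D accepts an unsigned 16-bit constant, and that the `%x1` output modifier prints the low 16 bits of the operand. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 16-bit constant once it is created for HImode: the value is
   sign-extended into a wider host integer.  */
static int64_t
himode_const (uint16_t bits)
{
  return bits >= 0x8000 ? (int64_t) bits - 0x10000 : (int64_t) bits;
}

/* Assumed meaning of constraint D: an unsigned 16-bit constant.  */
static bool
fits_unsigned_16 (int64_t value)
{
  return value >= 0 && value <= 0xffff;
}

/* What chopping off the high bits in the output template amounts to.  */
static unsigned
low_16_bits (int64_t value)
{
  return (unsigned) ((uint64_t) value & 0xffffu);
}
```

In this model 0x8000 becomes -32768, which no longer passes the unsigned range check, while masking with 0xffff recovers the bit pattern the instruction needs.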


Re: [PATCH] s390: Streamline NNPA builtins with their LLVM counterparts

2023-11-27 Thread Stefan Schulze Frielinghaus
Ping.

On Thu, Nov 16, 2023 at 01:07:30PM +0100, Stefan Schulze Frielinghaus wrote:
> For the opaque NNP-data type prefer unsigned over signed integer types.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtin-types.def: Add/remove types.
>   * config/s390/s390-builtins.def
>   (s390_vclfnhs,s390_vclfnls,s390_vcrnfs,s390_vcfn,s390_vcnf):
>   Replace type V8HI with UV8HI.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/zvector/vec-nnpa-fp16-convert.c: Replace V8HI
>   types with UV8HI.
>   * gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c: Ditto.
>   * gcc.target/s390/zvector/vec_convert_from_fp16.c: Ditto.
>   * gcc.target/s390/zvector/vec_convert_to_fp16.c: Ditto.
>   * gcc.target/s390/zvector/vec_extend_to_fp32_hi.c: Ditto.
>   * gcc.target/s390/zvector/vec_extend_to_fp32_lo.c: Ditto.
>   * gcc.target/s390/zvector/vec_round_from_fp32.c: Ditto.
> ---
>  gcc/config/s390/s390-builtin-types.def |  5 ++---
>  gcc/config/s390/s390-builtins.def  | 10 +-
>  .../gcc.target/s390/zvector/vec-nnpa-fp16-convert.c|  6 +++---
>  .../gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c  |  2 +-
>  .../gcc.target/s390/zvector/vec_convert_from_fp16.c|  4 ++--
>  .../gcc.target/s390/zvector/vec_convert_to_fp16.c  |  4 ++--
>  .../gcc.target/s390/zvector/vec_extend_to_fp32_hi.c|  2 +-
>  .../gcc.target/s390/zvector/vec_extend_to_fp32_lo.c|  2 +-
>  .../gcc.target/s390/zvector/vec_round_from_fp32.c  |  2 +-
>  9 files changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtin-types.def 
> b/gcc/config/s390/s390-builtin-types.def
> index 3d8b30cdcc8..0bf759bd77a 100644
> --- a/gcc/config/s390/s390-builtin-types.def
> +++ b/gcc/config/s390/s390-builtin-types.def
> @@ -265,9 +265,9 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, 
> BT_V2DF)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
> +DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
> -DEF_FN_TYPE_2 (BT_FN_V4SF_V8HI_UINT, BT_V4SF, BT_V8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_INT_VOIDCONSTPTR, BT_V4SI, BT_INT, BT_VOIDCONSTPTR)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_UV4SI_UV4SI, BT_V4SI, BT_UV4SI, BT_UV4SI)
> @@ -279,7 +279,6 @@ DEF_FN_TYPE_2 (BT_FN_V8HI_BV8HI_V8HI, BT_V8HI, BT_BV8HI, 
> BT_V8HI)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_UV8HI_UV8HI, BT_V8HI, BT_UV8HI, BT_UV8HI)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_V16QI_V16QI, BT_V8HI, BT_V16QI, BT_V16QI)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_V4SI_V4SI, BT_V8HI, BT_V4SI, BT_V4SI)
> -DEF_FN_TYPE_2 (BT_FN_V8HI_V8HI_UINT, BT_V8HI, BT_V8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_V8HI_V8HI, BT_V8HI, BT_V8HI, BT_V8HI)
>  DEF_FN_TYPE_2 (BT_FN_VOID_UINT64PTR_UINT64, BT_VOID, BT_UINT64PTR, BT_UINT64)
>  DEF_FN_TYPE_2 (BT_FN_VOID_V2DF_FLTPTR, BT_VOID, BT_V2DF, BT_FLTPTR)
> @@ -317,6 +316,7 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, 
> BT_UV8HI, BT_USHORT, BT_I
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI)
> +DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
> BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, 
> BT_UV16QI, BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_V16QI)
> @@ -347,7 +347,6 @@ DEF_FN_TYPE_3 (BT_FN_V4SI_V4SI_V4SI_V4SI, BT_V4SI, 
> BT_V4SI, BT_V4SI, BT_V4SI)
>  DEF_FN_TYPE_3 (BT_FN_V4SI_V8HI_V8HI_V4SI, BT_V4SI, BT_V8HI, BT_V8HI, BT_V4SI)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_UV8HI_UV8HI_INTPTR, BT_V8HI, BT_UV8HI, BT_UV8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V16QI_V16QI_V8HI, BT_V8HI, BT_V16QI, BT_V16QI, 
> BT_V8HI)
> -DEF_FN_TYPE_3 (BT_FN_V8HI_V4SF_V4SF_UINT, BT_V8HI, BT_V4SF, BT_V4SF, BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V4SI_V4SI_INTPTR, BT_V8HI, BT_V4SI, BT_V4SI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V8HI_V8HI_INTPTR, BT_V8HI, BT_V8HI, BT_V8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V8HI_V8HI_V8HI, BT_V8HI, BT_V8HI, BT_V8HI, BT_V8HI)
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtins.def
> index 964d86c74a0..f331eba100a 100644
> --- a/gcc/config/s390/s390-builtins.def
> +++ b/gcc/config/s390/s390-builtins.def
> @@ -3037,10 +3037,10 @@ B_DEF  (s390_vstrszf,vstrszv4si,  
>   0,
>  
>  /* arch 14 builtins */
>  
> -B_DEF  (s390_vclfn

Re: [PATCH] s390: Fix builtins floating-point convert to/from fixed

2023-11-27 Thread Andreas Krebbel
Ok, thanks!

Andreas

On 11/27/23 10:11, Stefan Schulze Frielinghaus wrote:
> Ping.
> 



Re: [PATCH] s390: Fix constraint for insn *cmphi_ccu

2023-11-27 Thread Andreas Krebbel
Ok, thanks!

Andreas

On 11/27/23 10:12, Stefan Schulze Frielinghaus wrote:
> Ping.
> 



Re: [PATCH] s390: Streamline NNPA builtins with their LLVM counterparts

2023-11-27 Thread Andreas Krebbel
Ok, thanks!

Andreas

On 11/27/23 10:12, Stefan Schulze Frielinghaus wrote:
> Ping.
> 

[PATCH] tree-optimization/112706 - missed simplification of condition

2023-11-27 Thread Richard Biener
We lack a match.pd pattern recognizing ptr + o ==/!= ptr + o'.
The following extends handling we have for integral types to
pointers.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112706
* match.pd (ptr + o ==/!=/- ptr + o'): New patterns.

* gcc.dg/tree-ssa/pr112706.c: New testcase.
---
 gcc/match.pd |  9 +
 gcc/testsuite/gcc.dg/tree-ssa/pr112706.c | 15 +++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112706.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 61e5d3441f4..95225e4ca5f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2596,6 +2596,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))))
(op @0 @1))))
+/* And similar for pointers.  */
+(for op (eq ne)
+ (simplify
+  (op (pointer_plus @0 @1) (pointer_plus @0 @2))
+  (op @1 @2)))
+(simplify
+ (pointer_diff (pointer_plus @0 @1) (pointer_plus @0 @2))
+ (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
+  (convert (minus @1 @2))))
 
 /* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
 (for op (lt le ge gt)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112706.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr112706.c
new file mode 100644
index 000..217730b99b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112706.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+int *ptr;
+void link_error ();
+void
+test ()
+{
+  int *ptr1 = ptr + 10;
+  int *ptr2 = ptr + 20;
+  if (ptr1 == ptr2)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "if" "fre1" } } */
-- 
2.35.3
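
To illustrate the identity behind the new eq/ne patterns, here is a minimal sketch in plain C++. It is not the match.pd code, and the helper names same_address and same_offset are made up for this example. It assumes both offsets stay inside (or one past) the same array, so the pointer arithmetic is well defined.

```cpp
#include <cstddef>

// Hypothetical helper: compare two pointers built from the same base.
// Both offsets must stay within (or one past) the underlying array.
bool same_address(const int *base, std::ptrdiff_t o1, std::ptrdiff_t o2)
{
  return base + o1 == base + o2;
}

// What the comparison reduces to once the common base is dropped.
bool same_offset(std::ptrdiff_t o1, std::ptrdiff_t o2)
{
  return o1 == o2;
}
```

With the base cancelled, the ptr1 == ptr2 test in the new testcase becomes a comparison of two constants, which is presumably why the dump is expected to contain no "if".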


[PATCH] s390: Fixup builtins vec_rli and verll

2023-11-27 Thread Stefan Schulze Frielinghaus
Commit 248df13b966f46649e16dc3c8c92b263790ef503 restricted the rotate
count to immediates.  Although the documentation of vec_rli (Vector
Element Rotate Left Immediate) can be read as if it were restricted to
immediates, this is not the case.  Thus, revert this commit.

In order to finally allow register operands, the rotate count must be of
type unsigned char since the expander expects it to be of mode QI.  The
previously used type unsigned integer worked out for immediates since
those are of VOID mode anyway.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Remove types.
* config/s390/s390-builtins.def (O_U64): Remove 64-bit literal support.
Don't restrict s390_vec_rli and s390_verll[bhfg] to immediates.
* config/s390/s390.cc (s390_const_operand_ok): Remove 64-bit
literal support.
---
 gcc/config/s390/s390-builtin-types.def |  4 --
 gcc/config/s390/s390-builtins.def  | 60 +++---
 gcc/config/s390/s390.cc|  6 +--
 3 files changed, 27 insertions(+), 43 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 6799b883e29..6d2a3f912b8 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -216,7 +216,6 @@ DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR_INT, BT_UV16QI, BT_UCHAR, 
BT_INT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR_UCHAR, BT_UV16QI, BT_UCHAR, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_INTPTR, BT_UV16QI, BT_UV16QI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_UCHAR, BT_UV16QI, BT_UV16QI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_UINT, BT_UV16QI, BT_UV16QI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_UV16QI, BT_UV16QI, BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV2DI_UV2DI, BT_UV16QI, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV4SI_UV4SI, BT_UV16QI, BT_UV4SI, BT_UV4SI)
@@ -225,7 +224,6 @@ DEF_FN_TYPE_2 (BT_FN_UV2DI_UCHAR_UCHAR, BT_UV2DI, BT_UCHAR, 
BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_ULONGLONG_INT, BT_UV2DI, BT_ULONGLONG, BT_INT)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV16QI_UV16QI, BT_UV2DI, BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI_UCHAR, BT_UV2DI, BT_UV2DI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI_UINT, BT_UV2DI, BT_UV2DI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI_UV2DI, BT_UV2DI, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV4SI_UV4SI, BT_UV2DI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV8HI_UV8HI, BT_UV2DI, BT_UV8HI, BT_UV8HI)
@@ -236,7 +234,6 @@ DEF_FN_TYPE_2 (BT_FN_UV4SI_UV16QI_UV16QI, BT_UV4SI, 
BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV2DI_UV2DI, BT_UV4SI, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_INTPTR, BT_UV4SI, BT_UV4SI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_UCHAR, BT_UV4SI, BT_UV4SI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_UINT, BT_UV4SI, BT_UV4SI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_UV4SI, BT_UV4SI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV8HI_UV8HI, BT_UV4SI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UCHAR_UCHAR, BT_UV8HI, BT_UCHAR, BT_UCHAR)
@@ -245,7 +242,6 @@ DEF_FN_TYPE_2 (BT_FN_UV8HI_UV16QI_UV16QI, BT_UV8HI, 
BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV4SI_UV4SI, BT_UV8HI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UCHAR, BT_UV8HI, BT_UV8HI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UINT, BT_UV8HI, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_BV16QI_V16QI, BT_V16QI, BT_BV16QI, BT_V16QI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_UINT_VOIDCONSTPTR, BT_V16QI, BT_UINT, 
BT_VOIDCONSTPTR)
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index f5540106adc..b09c303adc0 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -28,7 +28,6 @@
 #undef O_U12
 #undef O_U16
 #undef O_U32
-#undef O_U64
 
 #undef O_M12
 
@@ -89,11 +88,6 @@
 #undef O3_U32
 #undef O4_U32
 
-#undef O1_U64
-#undef O2_U64
-#undef O3_U64
-#undef O4_U64
-
 #undef O1_M12
 #undef O2_M12
 #undef O3_M12
@@ -163,21 +157,20 @@
 #define O_U12    7 /* unsigned 16 bit literal */
 #define O_U16    8 /* unsigned 16 bit literal */
 #define O_U32    9 /* unsigned 32 bit literal */
-#define O_U64   10 /* unsigned 64 bit literal */
 
-#define O_M12   11 /* matches bitmask of 12 */
+#define O_M12   10 /* matches bitmask of 12 */
 
-#define O_S2    12 /* signed  2 bit literal */
-#define O_S3    13 /* signed  3 bit literal */
-#define O_S4    14 /* signed  4 bit literal */
-#define O_S5    15 /* signed  5 bit literal */
-#define O_S8    16 /* signed  8 bit literal */
-#define O_S12   17 /* signed 12 bit literal */
-#define O_S16   18 /* signed 16 bit literal */
-#define O_S32   19 /* signed 32 bit literal */
+#define O_S2    11 /* signed  2 bit literal */
+#define O_S3    12 /* signed  3 bit literal *

Re: [PATCH] PR tree-optimization/111922 - Ensure wi_fold arguments match precisions.

2023-11-27 Thread Richard Biener
On Fri, Nov 24, 2023 at 5:53 PM Andrew MacLeod  wrote:
>
> The problem here is that IPA is calling something like operator_minus
> with 2 operands, one with precision 32 (int) and one with precision 64
> (pointer). There are various ways this can happen as mentioned in the PR.
>
> Regardless of whether IPA should be promoting types or not when calling
> into range-ops, range-ops does not support mismatched precision in its
> arguments and it does not have the context to know what should be
> promoted/changed.  It is expected that the caller will ensure the
> operands are compatible.
>
> However, it is not really practical for the caller to know this with
> more context. Some operations support different precision or even
> types, i.e. shifts, or casts, etc.  It seems silly to require IPA to
> have a big switch to see what the tree code is and match up/promote/or
> bail if operands don't match...
>
> Range-ops routines probably shouldn't crash when this happens either, so
> this patch takes the conservative approach and returns VARYING if there
> is a mismatch in the arguments' precision.
>
> Fixes the problem and bootstraps on x86_64-pc-linux-gnu with no new
> regressions.
>
> OK for trunk?
>
> Andrew
>
> PS  If you would rather we trap in these cases and fix the callers, then
> I'd suggest we change these to checking_asserts instead.  I have also
> prepared a version that does a gcc_checking_assert instead of returning
> varying and done a bootstrap/testrun.  Of course, the callers will
> have to be changed.

Yes, I'd very much prefer that - otherwise we get hard to find missed
optimizations when one botches the argument (types).

Richard.

>
> It bootstraps fine in that variation too, and all the testcases (except
> this one of course) pass.  It's clearly not a common occurrence, and my
> inclination is to apply this patch so we silently move on and simply
> don't provide useful range info. That is all the callers in these cases
> are likely to do anyway...
>
>
>
>
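
The two options discussed in this thread can be contrasted with a toy model. This is only a sketch under stated assumptions: toy_range, fold_minus and fold_minus_checked are hypothetical names, not GCC's irange or range-ops API, and overflow of the bounds is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an integer range; not GCC's irange/wide_int types.
struct toy_range
{
  unsigned precision;   // bit width of the operand type
  std::int64_t lo, hi;  // inclusive bounds
  bool varying;         // true when nothing is known about the value
};

// Conservative variant: silently give up (VARYING) on mismatched precision.
toy_range fold_minus(const toy_range &a, const toy_range &b)
{
  if (a.varying || b.varying || a.precision != b.precision)
    return {a.precision, 0, 0, true};
  return {a.precision, a.lo - b.hi, a.hi - b.lo, false};
}

// Checking variant: a mismatch is treated as a caller bug and traps.
toy_range fold_minus_checked(const toy_range &a, const toy_range &b)
{
  assert(a.precision == b.precision && "caller must match precisions");
  if (a.varying || b.varying)
    return {a.precision, 0, 0, true};
  return {a.precision, a.lo - b.hi, a.hi - b.lo, false};
}
```

The first variant never crashes but hides the mismatch as a missed optimization. The second surfaces it in checking builds, at the cost of having to fix every caller first.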


Re: [PATCH v2 3/7] aarch64: Add eh_return compile tests

2023-11-27 Thread Szabolcs Nagy
The 11/26/2023 14:37, Richard Sandiford wrote:
> Szabolcs Nagy  writes:
> > +++ b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mbranch-protection=pac-ret+leaf" } */
> 
> Probably best to add -fno-schedule-insns -fno-schedule-insns2, so that the
> instructions in the check-function-bodies are in a more predictable order.
> 
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +/*
> > +**foo:
> > +** hint25 // paciasp
> > +** stp x0, x1, .*
> > +** stp x2, x3, .*
> > +** cbz w2, .*
> > +** mov x4, 0
> > +** ldp x2, x3, .*
> > +** ldp x0, x1, .*
> > +** cbz x4, .*
> > +** add sp, sp, x5
> > +** br  x6
> > +** hint29 // autiasp
> > +** ret
> > +** mov x5, x0
> > +** mov x6, x1
> > +** mov x4, 1
> > +** b   .*
> > +*/
> 
> What's the significance of x3 here?  It looks from the function definition
> like it should be undefined.  And what are the stps and ldps doing?

x0,..,x3 are preserved registers for eh (EH_RETURN_DATA_REGNO).

they are saved in the prologue and restored in the epilogue so
they can pass arguments to eh, which i think is relevant to an
eh_return test, although if the compiler knows they are not
clobbered then it could eliminate the save/restore.

> If those aren't an important part of the test, it might be better
> to stub them out with "...", e.g.:

i can do that.

> /*
> **foo:
> **hint25 // paciasp
> **...
> **cbz w2, .*
> **mov x4, 0
> **...
> **cbz x4, .*
> **add sp, sp, x5
> **br  x6
> **hint29 // autiasp
> **ret
> **mov x5, x0
> **mov x6, x1
> **mov x4, 1
> **b   .*
> */
> 
> LGTM otherwise.

thanks.


Re: [PATCH] s390: Fixup builtins vec_rli and verll

2023-11-27 Thread Andreas Krebbel
On 11/27/23 10:53, Stefan Schulze Frielinghaus wrote:
> Commit 248df13b966f46649e16dc3c8c92b263790ef503 restricted the rotate
> count to immediates.  Although the documentation of vec_rli (Vector
> Element Rotate Left Immediate) can be read as if it were restricted to
> immediates, this is not the case.  Thus, revert this commit.
> 
> In order to finally allow register operands, the rotate count must be of
> type unsigned char since the expander expects it to be of mode QI.  The
> previously used type unsigned integer worked out for immediates since
> those are of VOID mode anyway.
> 
> Bootstrapped and regtested on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtin-types.def: Remove types.
>   * config/s390/s390-builtins.def (O_U64): Remove 64-bit literal support.
>   Don't restrict s390_vec_rli and s390_verll[bhfg] to immediates.
>   * config/s390/s390.cc (s390_const_operand_ok): Remove 64-bit
>   literal support.

Ok, Thanks!

Andreas
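
As a portable model of what an element rotate with a non-immediate count looks like, here is a minimal sketch. It is not the s390 builtin: rotl32 and rotate_elements are hypothetical names, the 4 x 32-bit shape is just one example, and reducing the count modulo the element width is an assumption of this sketch. The count is an unsigned char, matching the type the quoted description says the expander expects.

```cpp
#include <array>
#include <cstdint>

// Rotate one 32-bit element left. The count is reduced modulo the
// element width (an assumption of this sketch), so 32 behaves like 0.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned char count)
{
  return (count & 31u) == 0
           ? x
           : (x << (count & 31u)) | (x >> (32u - (count & 31u)));
}

// Hypothetical stand-in for a 4 x 32-bit vector rotate. The count is a
// run-time value here, i.e. it does not have to be an immediate.
inline std::array<std::uint32_t, 4>
rotate_elements(std::array<std::uint32_t, 4> v, unsigned char count)
{
  for (auto &e : v)
    e = rotl32(e, count);
  return v;
}
```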



Re: [wwwdocs][patch][OpenACC] gcc-14/changes.html: OpenACC - mention support for first 2.7 features

2023-11-27 Thread Tobias Burnus

On 26.11.23 09:48, Gerald Pfeifer wrote:

On Fri, 24 Nov 2023, Tobias Burnus wrote:

Comments before I commit it?

+  https://gcc.gnu.org/wiki/OpenACC
+OpenACC 2.7: The self clause was added to be used on
+  compute constructs and the default clause for data
+  constructs.
+  
+  

Where does that  come from? I'm afraid this won't validate/render
properly.


That's the disadvantage of splitting patches from the same file ... The
'' starts / should start just before the quoted line, namely as
attached. (Updated patch attached - minor changes but syntactically relevant.)

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Registration court: 
München, HRB 106955
gcc-14/changes.html: OpenACC - mention support for first 2.7 features

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 2088ee91..4ceed13d 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -90,6 +91,13 @@ a work-in-progress.
 
   
   
+  https://gcc.gnu.org/wiki/OpenACC";>OpenACC
+  
+OpenACC 2.7: The self clause was added to be used on
+  compute constructs and the default clause for data
+  constructs.
+  
+  
   For offload-device code generated via OpenMP and OpenACC, the math
   and the Fortran runtime libraries will now automatically be linked,
   when the user or compiler links them on the host side. Thus, it is no



Re: [patch] OpenMP: Add -Wopenmp and use it

2023-11-27 Thread Christophe Lyon
Hi!

On Fri, 24 Nov 2023 at 15:08, Jakub Jelinek  wrote:
>
> On Fri, Nov 24, 2023 at 02:51:28PM +0100, Tobias Burnus wrote:
> > Following the general trend to add a "[-W...]" to the warning messages
> > for both better grouping of the warnings and - more importantly - for
> > providing a means to silence such a warning (or to -Werror= them
> > explicitly), this patch replaces several '0' by OPT_Wopenmp.
> >
> > Comments or remarks before I commit it?
>
> LGTM, thanks for working on it.
>
> Jakub
>

I think the lack of final '.' in:
gcc/c-family/c.opt
+ Warn about suspicious OpenMP code

has caused the following regressions:
Running gcc:gcc.misc-tests/help.exp ...
FAIL: compiler driver --help=c option(s): "^ +-.*[^:.]$" absent from
output: "  -WopenmpWarn about suspicious OpenMP
code"
FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]$" absent from
output: "  -WopenmpWarn about suspicious OpenMP
code"
FAIL: compiler driver --help=fortran option(s): "^ +-.*[^:.]$" absent
from output: "  -WopenmpWarn about suspicious
OpenMP code"
FAIL: compiler driver --help=warnings option(s): "^ +-.*[^:.]$" absent
from output: "  -WopenmpWarn about suspicious
OpenMP code"

I think you have received a notification from our CI about that?

Can you check it's as simple as that?

Thanks,

Christophe
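
The guess above can be checked directly against the pattern quoted in the FAIL lines. The sketch below uses std::regex, whose default ECMAScript syntax is assumed to treat this particular pattern the same way as the Tcl regexp in help.exp; help_line_lacks_terminator is a made-up name.

```cpp
#include <regex>
#include <string>

// Returns true when an option help line matches the pattern that
// help.exp requires to be absent from the output, i.e. the line starts
// with spaces and a dash but does not end in '.' or ':'.
bool help_line_lacks_terminator(const std::string &line)
{
  static const std::regex pattern("^ +-.*[^:.]$");
  return std::regex_search(line, pattern);
}
```

Applied to the -Wopenmp help text, the version without the final '.' matches and so trips the test, while the version ending in '.' does not match.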


Re: [patch] OpenMP: Add -Wopenmp and use it

2023-11-27 Thread Jakub Jelinek
On Mon, Nov 27, 2023 at 11:20:20AM +0100, Christophe Lyon wrote:
> On Fri, 24 Nov 2023 at 15:08, Jakub Jelinek  wrote:
> > > Comments or remarks before I commit it?
> >
> > LGTM, thanks for working on it.
> >
> > Jakub
> >
> 
> I think the lack of final '.' in:
> gcc/c-family/c.opt
> + Warn about suspicious OpenMP code

Tobias has fixed that a few commits later:
r14-5835-g6eb1507107dee3e67e3a136e2917b93cdffba7c4

Sorry for missing that during patch review.

Jakub



Re: [patch] OpenMP: Add -Wopenmp and use it

2023-11-27 Thread Tobias Burnus

Hi,

On 27.11.23 11:20, Christophe Lyon wrote:


I think the lack of final '.' in:


Indeed - but you are lagging a bit behind:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638128.html

[committed] c-family/c.opt (-Wopenmp): Add missing tailing '.'

Fri Nov 24 18:56:21 GMT 2023

Committed as r14-5835-g6eb1507107dee3

Tobias



Re: [PATCH v5] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-27 Thread Alex Coplan
On 23/11/2023 12:41, Marek Polacek wrote:
> On Mon, Nov 20, 2023 at 05:29:58PM -0500, Jason Merrill wrote:
> > On 11/17/23 09:50, Alex Coplan wrote:
> > > Hi,
> > > 
> > > This is a v5 patch to address Marek's feedback here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635157.html
> > > 
> > > I also implemented Jason's suggestion to use constexpr for the tables
> > > from this review:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634484.html
> > > 
> > > I'll attach the incremental change in reply to Marek's review to make
> > > things easier to compare.
> > > 
> > > Bootstrapped/regtested on aarch64-linux-gnu.  Bootstrap/regtest on
> > > x86_64-apple-darwin in progress (on top of this libsanitizer fix:
> > > https://github.com/llvm/llvm-project/issues/72639).
> > > 
> > > OK for trunk if testing passes?
> > 
> > > --- a/gcc/c-family/c-common.h
> > > +/* Implemented in c/c-objc-common.cc.  */
> > > +extern void c_register_features ();
> > 
> > I think this declaration should go in c-objc-common.h, though the C
> > maintainers might prefer c-lang.h or c-tree.h.
> > 
> > > +/* Implemented in cp/cp-objcp-common.cc.  */
> > > +extern void cp_register_features ();
> > 
> > And this one in cp-objc-common.h.
> > 
> > With that change the patch is OK on Friday if Marek doesn't have any other
> > notes.
> 
> v5 looks good to me.  Thanks,

Many thanks both for the reviews, this is now pushed (with Jason's
above changes implemented) as g:06280a906cb3dc80cf5e07cf3335b758848d488d.

Alex

> 
> Marek
> 


Re: [PATCH] c++: fix noexcept checking for trivial operations [PR96090]

2023-11-27 Thread Nathaniel Shead
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634626.html.

I've been made aware since constructing this patch of CWG2820, which has
a proposed resolution that would change the result of the testcase
'noexcept(yesthrow_t())' (and similarly for the library builtin), but as
it hasn't yet been accepted I think at least ensuring the builtin
matches the behaviour of the operator is probably still sensible.

On Sun, Oct 29, 2023 at 12:43:28PM +1100, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> 
> -- >8 --
> 
> This patch stops eager folding of trivial operations (construction and
> assignment) from occurring when checking for noexceptness. This was
> previously done in PR c++/53025, but only for copy/move construction,
> and the __is_nothrow_xible builtins did not receive the same treatment
> when they were added.
> 
> To handle `is_nothrow_default_constructible`, the patch also ensures
> that when no parameters are passed we do value initialisation instead of
> just building the constructor call: in particular, value-initialisation
> doesn't necessarily actually invoke the constructor for trivial default
> constructors, and so we need to handle this case as well.
> 
>   PR c++/96090
>   PR c++/100470
> 
> gcc/cp/ChangeLog:
> 
>   * call.cc (build_over_call): Prevent folding of trivial special
>   members when checking for noexcept.
>   * method.cc (constructible_expr): Perform value-initialisation
>   for empty parameter lists.
>   (is_nothrow_xible): Treat as noexcept operator.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/noexcept81.C: New test.
>   * g++.dg/ext/is_nothrow_constructible7.C: New test.
>   * g++.dg/ext/is_nothrow_constructible8.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/call.cc| 17 ++---
>  gcc/cp/method.cc  | 19 --
>  gcc/testsuite/g++.dg/cpp0x/noexcept81.C   | 36 +++
>  .../g++.dg/ext/is_nothrow_constructible7.C| 20 ++
>  .../g++.dg/ext/is_nothrow_constructible8.C| 63 +++
>  5 files changed, 141 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept81.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible7.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible8.C
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index c1fb8807d3f..ac02b0633ed 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -10231,15 +10231,16 @@ build_over_call (struct z_candidate *cand, int 
> flags, tsubst_flags_t complain)
>/* Avoid actually calling copy constructors and copy assignment operators,
>   if possible.  */
>  
> -  if (! flag_elide_constructors && !force_elide)
> +  if (!force_elide 
> +  && (!flag_elide_constructors
> +   /* It's unsafe to elide the operation when handling
> +  a noexcept-expression, it may evaluate to the wrong
> +  value (c++/53025, c++/96090).  */
> +   || cp_noexcept_operand != 0))
>  /* Do things the hard way.  */;
> -  else if (cand->num_convs == 1 
> -   && (DECL_COPY_CONSTRUCTOR_P (fn) 
> -   || DECL_MOVE_CONSTRUCTOR_P (fn))
> -/* It's unsafe to elide the constructor when handling
> -   a noexcept-expression, it may evaluate to the wrong
> -   value (c++/53025).  */
> -&& (force_elide || cp_noexcept_operand == 0))
> +  else if (cand->num_convs == 1
> +&& (DECL_COPY_CONSTRUCTOR_P (fn)
> +|| DECL_MOVE_CONSTRUCTOR_P (fn)))
>  {
>tree targ;
>tree arg = argarray[num_artificial_parms_for (fn)];
> diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
> index a70dd5d6adc..3c978e2369d 100644
> --- a/gcc/cp/method.cc
> +++ b/gcc/cp/method.cc
> @@ -2091,6 +2091,7 @@ constructible_expr (tree to, tree from)
>  {
>tree expr;
>cp_unevaluated cp_uneval_guard;
> +  const int len = TREE_VEC_LENGTH (from);
>if (CLASS_TYPE_P (to))
>  {
>tree ctype = to;
> @@ -2098,11 +2099,16 @@ constructible_expr (tree to, tree from)
>if (!TYPE_REF_P (to))
>   to = cp_build_reference_type (to, /*rval*/false);
>tree ob = build_stub_object (to);
> -  vec_alloc (args, TREE_VEC_LENGTH (from));
> -  for (tree arg : tree_vec_range (from))
> - args->quick_push (build_stub_object (arg));
> -  expr = build_special_member_call (ob, complete_ctor_identifier, &args,
> - ctype, LOOKUP_NORMAL, tf_none);
> +  if (len == 0)
> + expr = build_value_init (ctype, tf_none);
> +  else
> + {
> +   vec_alloc (args, TREE_VEC_LENGTH (from));
> +   for (tree arg : tree_vec_range (from))
> + args->quick_push (build_stub_object (arg));
> +   expr = build_special_member_call (ob, complete_ctor_identifier, &args,
> + ctype, LOOKUP_NORMAL, tf_none);
> +

Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-11-27 Thread Nathaniel Shead
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635071.html.

On Fri, Nov 03, 2023 at 12:34:06PM +1100, Nathaniel Shead wrote:
> Oh, this also fixes PR102284 and its other linked PRs (apart from
> fields); I forgot to note that in the commit.
> 
> On Fri, Nov 03, 2023 at 12:18:29PM +1100, Nathaniel Shead wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > 
> > > I'm not entirely sure if the change I made to have destructors clobber with
> > > CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to
> > > have broken by doing this and I wasn't able to find anything else that
> > > really depended on this distinction other than a warning pass. Otherwise I
> > > could experiment with a new clobber kind for destructor calls.
> > 
> > -- >8 --
> > 
> > This patch adds checks for using objects after they've been manually
> > destroyed via explicit destructor call. Currently this is only
> > implemented for 'top-level' objects; FIELD_DECLs and individual elements
> > of arrays will need a lot more work to track correctly and are left for
> > a future patch.
> > 
> > The other limitation is that destruction of parameter objects is checked
> > too 'early', happening at the end of the function call rather than the
> > end of the owning full-expression as they should be for consistency;
> > see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
> > good way to link the constructed parameter declarations with the
> > variable declarations that are actually destroyed later on to propagate
> > their lifetime status, so I'm leaving this for a later patch.
> > 
> > PR c++/71093
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (build_trivial_dtor_call): Mark pseudo-destructors as
> > ending lifetime.
> > * constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
> > return NULL_TREE for objects we're initializing.
> > (constexpr_global_ctx::destroy_value): Rename from remove_value.
> > Only mark real variables as outside lifetime.
> > (constexpr_global_ctx::clear_value): New function.
> > (destroy_value_checked): New function.
> > (cxx_eval_call_expression): Defer complaining about non-constant
> > arg0 for operator delete. Use remove_value_safe.
> > (cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
> > (outside_lifetime_error): Include name of object we're
> > accessing.
> > (cxx_eval_store_expression): Handle clobbers. Improve error
> > messages.
> > (cxx_eval_constant_expression): Use remove_value_safe. Clear
> > bind variables before entering body.
> > * decl.cc (build_clobber_this): Mark destructors as ending
> > lifetime.
> > (start_preparsed_function): Pass false to build_clobber_this.
> > (begin_destructor_body): Pass true to build_clobber_this.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
> > * g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
> > * g++.dg/cpp2a/bitfield2.C: Likewise.
> > * g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
> > * g++.dg/cpp1y/constexpr-lifetime7.C: New test.
> > * g++.dg/cpp2a/constexpr-lifetime1.C: New test.
> > * g++.dg/cpp2a/constexpr-lifetime2.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >  gcc/cp/call.cc|   2 +-
> >  gcc/cp/constexpr.cc   | 149 +++---
> >  gcc/cp/decl.cc|  10 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime1.C|   2 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime2.C|   2 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime3.C|   2 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime4.C|   2 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime7.C|  93 +++
> >  gcc/testsuite/g++.dg/cpp2a/bitfield2.C|   2 +-
> >  .../g++.dg/cpp2a/constexpr-lifetime1.C|  21 +++
> >  .../g++.dg/cpp2a/constexpr-lifetime2.C|  23 +++
> >  gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  17 +-
> >  12 files changed, 292 insertions(+), 33 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime7.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime1.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime2.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index 2eb54b5b6ed..e5e9c6c44f8 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -9682,7 +9682,7 @@ build_trivial_dtor_call (tree instance, bool 
> > no_ptr_deref)
> >  }
> >  
> >/* A trivial destructor should still clobber the object.  */
> > -  tree clobber = build_clobber (TREE_TYPE (instance));
> > +  tree clobber = build_clobber (TREE_TYPE (instance), CLOBBER_EOL);
> >return build2 (MODIFY_EXPR, void_type_node,
> >  i

Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Andrew Stubbs
I tried this patch for AMD GCN. We have a similar problem with excess 
extends, but also for vector modes. Each lane has a minimum 32 bits and 
GCC's normal assumption is that vector registers have precisely the 
number of bits they need, so the amdgcn backend patterns have explicit 
sign/zero extends for QImode and HImode for the instructions that might 
need it. It would be cool if this pass could eliminate some of those, 
but at this point I just wanted to check it didn't break anything.


Unfortunately I get a crash building libgcc:


during RTL pass: ext_dce
conftest.c: In function 'main':
conftest.c:16:1: internal compiler error: RTL check: expected code 'subreg', 
have 'reg' in ext_dce_process_uses, at ext-dce.cc:421
   16 | }
  | ^
0x8c7aa3 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
/scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/rtl.cc:770
0xa76a27 ext_dce_process_uses
/scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:421
0x1aeca5c ext_dce_process_bb
/scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:651
0x1aeca5c ext_dce
/scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:802
0x1aeca5c execute
/scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:868
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
configure:3812: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "GNU C Runtime Library"
| #define PACKAGE_TARNAME "libgcc"
| #define PACKAGE_VERSION "1.0"
| #define PACKAGE_STRING "GNU C Runtime Library 1.0"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL "http://www.gnu.org/software/libgcc/"
| /* end confdefs.h.  */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }


I have no idea if this is an unhandled case or a case that shouldn't 
exist, but it's trying to do "SUBREG_BYTE (dst).is_constant ()" for a 
very simple instruction:


(set (reg/i:SI 168 v8)
(const_int 0 [0]))

This seems pretty basic to me, but there is some hidden complexity. It's 
possible that the pass has correctly identified that "v8" can hold more 
than just a single integer: in this case we're using a single lane of a 
vector register. No extend is needed here though. The register has 2048 
bits, but only 32 are active in SImode.


Andrew


[PATCH] vect: Avoid duplicate_and_interleave for uniform vectors [PR112661]

2023-11-27 Thread Richard Sandiford
can_duplicate_and_interleave_p checks whether we know a way of
building a particular VLA SLP invariant.  g:60034ecf25597bd515f
skipped that test for booleans, to support MASK_LEN_GATHER_LOAD
calls with a dummy all-ones mask.  But there's nothing fundamentally
different about VLA masks vs VLA data vectors.  If we have a VLA mask
that isn't all-ones, we need some way of loading it.  This ultimately
led to the ICE in the PR.

This patch fixes it by applying can_duplicate_and_interleave_p
to masks, while also adding a special path for uniform vectors
(of all kinds) to support the MASK_LEN_GATHER_LOAD usage.  This
also fixes an XFAIL in pr36648.cc for SVE.

The patch is mostly Richard's.  My only changes were to skip
redundant conversions and to use gimple_build_vector_from_val
for all eligible vectors.

Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
OK to install?

Richard


2023-11-27  Richard Biener  
Richard Sandiford  

gcc/
PR tree-optimization/112661
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Defer duplicate-and-
interleave test to...
(vect_build_slp_tree_2): ...here, once we have all the operands.
Skip the test for uniform vectors.
(vect_create_constant_vectors): Detect uniform vectors.  Avoid
redundant conversions in that case.  Use gimple_build_vector_from_val
to build the vector.

gcc/testsuite/
* g++.dg/vect/pr36648.cc: Remove XFAIL for VLA load-lanes.
---
 gcc/testsuite/g++.dg/vect/pr36648.cc |  2 +-
 gcc/tree-vect-slp.cc | 56 +++-
 2 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr36648.cc 
b/gcc/testsuite/g++.dg/vect/pr36648.cc
index 8d24d3d445d..7bda82899d0 100644
--- a/gcc/testsuite/g++.dg/vect/pr36648.cc
+++ b/gcc/testsuite/g++.dg/vect/pr36648.cc
@@ -25,6 +25,6 @@ int main() { }
targets, ! vect_no_align is a sufficient test.  */
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
{ { !  vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && 
vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { { { ! vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && 
vect_hw_misalign } } xfail { vect_variable_length && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { { { ! vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && 
vect_hw_misalign } } } } } */
 
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4a09b3c2aca..6799b9375ae 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -763,18 +763,6 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
{
  tree type = TREE_TYPE (oprnd);
  dt = dts[i];
- if ((dt == vect_constant_def
-  || dt == vect_external_def)
- && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
- && TREE_CODE (type) != BOOLEAN_TYPE
- && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Build SLP failed: invalid type of def "
-"for variable-length SLP %T\n", oprnd);
- return -1;
-   }
 
  /* For the swapping logic below force vect_reduction_def
 for the reduction op in a SLP reduction group.  */
@@ -2395,7 +2383,7 @@ out:
   /* Create SLP_TREE nodes for the definition node/s.  */
   FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
 {
-  slp_tree child;
+  slp_tree child = nullptr;
   unsigned int j;
 
   /* We're skipping certain operands from processing, for example
@@ -2443,6 +2431,29 @@ out:
   if (oprnd_info->first_dt == vect_external_def
  || oprnd_info->first_dt == vect_constant_def)
{
+ if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ())
+   {
+ tree op0;
+ tree uniform_val = op0 = oprnd_info->ops[0];
+ for (j = 1; j < oprnd_info->ops.length (); ++j)
+   if (!operand_equal_p (uniform_val, oprnd_info->ops[j]))
+ {
+   uniform_val = NULL_TREE;
+   break;
+ }
+ if (!uniform_val
+ && !can_duplicate_and_interleave_p (vinfo,
+ oprnd_info->ops.length (),
+ TREE_TYPE (op0)))
+   {
+ matches[j] = false;
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: invalid type of def "
+   

[PATCH] Treat "p" in asms as addressing VOIDmode

2023-11-27 Thread Richard Sandiford
check_asm_operands was inconsistent about how it handled "p" after
RA compared to before RA.  Before RA it tested the address with a
void (unknown) memory mode:

case CT_ADDRESS:
  /* Every address operand can be reloaded to fit.  */
  result = result || address_operand (op, VOIDmode);
  break;

After RA it deferred to constrain_operands, which used the mode
of the operand:

if ((GET_MODE (op) == VOIDmode
 || SCALAR_INT_MODE_P (GET_MODE (op)))
&& (strict <= 0
|| (strict_memory_address_p
 (recog_data.operand_mode[opno], op
  win = true;

Using the mode of the operand matches reload's behaviour:

  else if (insn_extra_address_constraint
   (lookup_constraint (constraints[i])))
{
  address_operand_reloaded[i]
= find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
recog_data.operand[i],
recog_data.operand_loc[i],
i, operand_type[i], ind_levels, insn);

It allowed the special predicate address_operand to be used, with the
mode of the operand being the mode of the addressed memory, rather than
the mode of the address itself.  For example, vax has:

(define_insn "*movaddr"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(match_operand:VAXfp 1 "address_operand" "p"))
   (clobber (reg:CC VAX_PSL_REGNUM))]
  "reload_completed"
  "mova %a1,%0")

where operand 1 is an SImode expression that can address memory of
mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
but recog_data.operand_mode[1] is the VAXfp mode.

But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
do this, and instead pass VOIDmode.  So I think this traditional use
of address_operand is effectively an old-reload-only feature.

And it seems like no modern port cares.  I think ports have generally
moved to using different address constraints instead, rather than
relying on "p" with different operand modes.  Target-specific address
constraints post-date the code above.

The big advantage of using different constraints is that it works
for asms too.  And that (to finally get to the point) is the problem
fixed in this patch.  For the aarch64 test:

  void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }

everything up to and including RA required the operand to be a
valid VOIDmode address.  But post-RA check_asm_operands and
constrain_operands instead required it to be valid for
recog_data.operand_mode[0].  Since asms have no syntax for
specifying an operand mode that's separate from the operand itself,
operand_mode[0] is simply Pmode (i.e. DImode).

This meant that we required one mode before RA and a different mode
after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
more conservative/restricted range than DImode.  So if a post-RA pass
tried to form a new address, it would use a laxer condition than the
pre-RA passes.
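
To make the mismatch concrete, here is a toy model in C.  The offset rules
are an assumption for illustration only, loosely shaped like load/store
immediate forms; they are not GCC's aarch64 address classification, and
offset_ok_for_size and offset_ok_for_any_size are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy rule (assumed, not GCC's): an offset is valid for an access of SIZE
   bytes if it fits a signed 9-bit unscaled form or an unsigned 12-bit form
   scaled by SIZE.  */
static bool
offset_ok_for_size (long offset, long size)
{
  assert (size > 0);
  if (offset >= -256 && offset <= 255)
    return true;
  return offset >= 0 && offset % size == 0 && offset / size < 4096;
}

/* Toy "wildcard" rule, standing in for a VOIDmode check: the offset must be
   valid for every access size, so the accepted range is narrower.  */
static bool
offset_ok_for_any_size (long offset)
{
  static const long sizes[] = { 1, 2, 4, 8, 16 };
  for (unsigned i = 0; i < sizeof sizes / sizeof sizes[0]; ++i)
    if (!offset_ok_for_size (offset, sizes[i]))
      return false;
  return true;
}
```

Under these assumed rules an offset of 6 passes both checks, while an offset
of 4096 passes the 8-byte check but fails the wildcard check.  A pass that
only applies the laxer check could therefore form an address that the
stricter check would have rejected.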

This happened with the late-combine pass that I posted in October:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
which in turn triggered an error from aarch64_print_operand_address.

This patch takes the (hopefully) conservative fix of using VOIDmode for
asms but continuing to use the operand mode for .md insns, so as not
to break ports that still use reload.

Fixing this made me realise that recog_level2 was doing duplicate
work for asms after RA.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Pass VOIDmode to
strict_memory_address_p for 'p' constraints in asms.
* rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
for asms.

gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: New test.
---
 gcc/recog.cc   | 18 +++---
 gcc/rtl-ssa/changes.cc |  4 +++-
 .../gcc.target/aarch64/prfm_imm_offset_2.c |  2 ++
 3 files changed, 16 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_2.c

diff --git a/gcc/recog.cc b/gcc/recog.cc
index eaab79c25d7..bff7be1aec1 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -3191,13 +3191,17 @@ constrain_operands (int strict, alternative_mask 
alternatives)
   strictly valid, i.e., that all pseudos requiring hard regs
   have gotten them.  We also want to make sure we have a
   valid mode.  */
-   if ((GET_MODE (op) == VOIDmode
-|| SCALAR_INT_MODE_P (GET_MODE (op)))
-   && (strict <= 0
-   || (strict_memory_address_p
-(recog_data.operand_mode[opno], op
- win = true;
-   break;
+  

Re: [PATCH] vect: Avoid duplicate_and_interleave for uniform vectors [PR112661]

2023-11-27 Thread Richard Biener
On Mon, 27 Nov 2023, Richard Sandiford wrote:

> can_duplicate_and_interleave_p checks whether we know a way of
> building a particular VLA SLP invariant.  g:60034ecf25597bd515f
> skipped that test for booleans, to support MASK_LEN_GATHER_LOAD
> calls with a dummy all-ones mask.  But there's nothing fundamentally
> different about VLA masks vs VLA data vectors.  If we have a VLA mask
> that isn't all-ones, we need some way of loading it.  This ultimately
> led to the ICE in the PR.
> 
> This patch fixes it by applying can_duplicate_and_interleave_p
> to masks, while also adding a special path for uniform vectors
> (of all kinds) to support the MASK_LEN_GATHER_LOAD usage.  This
> also fixes an XFAIL in pr36648.cc for SVE.
> 
> The patch is mostly Richard's.  My only changes were to skip
> redundant conversions and to use gimple_build_vector_from_val
> for all eligible vectors.
> 
> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
> OK to install?

OK.

Thanks for picking up.

Richard.

> Richard
> 
> 
> 2023-11-27  Richard Biener  
>   Richard Sandiford  
> 
> gcc/
>   PR tree-optimization/112661
>   * tree-vect-slp.cc (vect_get_and_check_slp_defs): Defer duplicate-and-
>   interleave test to...
>   (vect_build_slp_tree_2): ...here, once we have all the operands.
>   Skip the test for uniform vectors.
>   (vect_create_constant_vectors): Detect uniform vectors.  Avoid
>   redundant conversions in that case.  Use gimple_build_vector_from_val
>   to build the vector.
> 
> gcc/testsuite/
>   * g++.dg/vect/pr36648.cc: Remove XFAIL for VLA load-lanes.
> ---
>  gcc/testsuite/g++.dg/vect/pr36648.cc |  2 +-
>  gcc/tree-vect-slp.cc | 56 +++-
>  2 files changed, 40 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/testsuite/g++.dg/vect/pr36648.cc 
> b/gcc/testsuite/g++.dg/vect/pr36648.cc
> index 8d24d3d445d..7bda82899d0 100644
> --- a/gcc/testsuite/g++.dg/vect/pr36648.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr36648.cc
> @@ -25,6 +25,6 @@ int main() { }
> targets, ! vect_no_align is a sufficient test.  */
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { { { !  vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && 
> vect_hw_misalign } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { { { ! vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && 
> vect_hw_misalign } } xfail { vect_variable_length && vect_load_lanes } } } } 
> */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { { { ! vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && 
> vect_hw_misalign } } } } } */
>  
>  
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4a09b3c2aca..6799b9375ae 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -763,18 +763,6 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>   {
> tree type = TREE_TYPE (oprnd);
> dt = dts[i];
> -   if ((dt == vect_constant_def
> -|| dt == vect_external_def)
> -   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> -   && TREE_CODE (type) != BOOLEAN_TYPE
> -   && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> - {
> -   if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -  "Build SLP failed: invalid type of def "
> -  "for variable-length SLP %T\n", oprnd);
> -   return -1;
> - }
>  
> /* For the swapping logic below force vect_reduction_def
>for the reduction op in a SLP reduction group.  */
> @@ -2395,7 +2383,7 @@ out:
>/* Create SLP_TREE nodes for the definition node/s.  */
>FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
>  {
> -  slp_tree child;
> +  slp_tree child = nullptr;
>unsigned int j;
>  
>/* We're skipping certain operands from processing, for example
> @@ -2443,6 +2431,29 @@ out:
>if (oprnd_info->first_dt == vect_external_def
> || oprnd_info->first_dt == vect_constant_def)
>   {
> +   if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ())
> + {
> +   tree op0;
> +   tree uniform_val = op0 = oprnd_info->ops[0];
> +   for (j = 1; j < oprnd_info->ops.length (); ++j)
> + if (!operand_equal_p (uniform_val, oprnd_info->ops[j]))
> +   {
> + uniform_val = NULL_TREE;
> + break;
> +   }
> +   if (!uniform_val
> +   && !can_duplicate_and_interleave_p (vinfo,
> +   oprnd_info->ops.length (),
> +   TREE_TYPE (op0)))
> + {
> +   matches[j] = false;
>

[PATCH] s390: Add missing builtin type

2023-11-27 Thread Stefan Schulze Frielinghaus
One builtin type slipped through the cracks of the last commits.

Bootstrapped on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def (BT_FN_UV8HI_UV8HI_UINT):
Add missing builtin type.
---
 gcc/config/s390/s390-builtin-types.def | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 6d2a3f912b8..5057f342f0b 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -242,6 +242,7 @@ DEF_FN_TYPE_2 (BT_FN_UV8HI_UV16QI_UV16QI, BT_UV8HI, 
BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV4SI_UV4SI, BT_UV8HI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UCHAR, BT_UV8HI, BT_UV8HI, BT_UCHAR)
+DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UINT, BT_UV8HI, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_BV16QI_V16QI, BT_V16QI, BT_BV16QI, BT_V16QI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_UINT_VOIDCONSTPTR, BT_V16QI, BT_UINT, 
BT_VOIDCONSTPTR)
-- 
2.41.0



Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-11-27 Thread Robin Dapp
> The easiest way to avoid running into the alias analysis problem is
> to scrap the MEM_EXPR when we expand the internal functions for
> partial loads/stores.  That avoids the disambiguation we run into
> which is realizing that we store to an object of less size as
> the size of the mode we appear to store.
> 
> After the patch we see just
> 
>   [1  S64 A32]
> 
> so we preserve the alias set, the alignment and the size (the size
> is redundant if the MEM isn't BLKmode).  That's still not good
> in case the RTL alias oracle would implement the same
> disambiguation but it fends off the gimple one.
> 
> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> and --param=vect-partial-vector-usage=1.

On riscv we're seeing a similar problem across the testsuite
and several execution failures as a result.  In the case I
looked at we move a scalar load upwards over a partial store
that aliases the load.

I independently arrived at the spot mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
before knowing about the PR.

I can confirm that your RFC patch fixes at least two of the
failures.  I haven't checked the others, but they are very likely
similar.

Regards
 Robin



Re: [PATCH] s390: Add missing builtin type

2023-11-27 Thread Andreas Krebbel
On 11/27/23 13:38, Stefan Schulze Frielinghaus wrote:
> One builtin type slipped through the cracks of the last commits.
> 
> Bootstrapped on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtin-types.def (BT_FN_UV8HI_UV8HI_UINT):
>   Add missing builtin type.

Ok

Andreas



RE: [PATCH 21/21]Arm: Add MVE cbranch implementation

2023-11-27 Thread Kyrylo Tkachov
Hi Tamar,

> -Original Message-
> From: Tamar Christina 
> Sent: Monday, November 6, 2023 7:43 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH 21/21]Arm: Add MVE cbranch implementation
> 
> Hi All,
> 
> This adds an implementation for conditional branch optab for MVE.
> 
> Unfortunately MVE has rather limited operations on VPT.P0, we are missing the
> ability to do P0 comparisons and logical OR on P0.
> 
> For that reason we can only support cbranch with 0, as for comparing to a 0
> predicate we don't need to actually do a comparison, we only have to check 
> that
> any bit is set within P0.
> 
> Because we can only do P0 comparisons with 0, the costing of the comparison 
> was
> reduced in order for the compiler not to try to push 0 to a register thinking
> it's too expensive.  For the cbranch implementation to be safe we must see the
> constant 0 vector.
> 
> For the lack of logical OR on P0 we can't really work around.  This means MVE
> can't support cases where the sizes of operands in the comparison don't match,
> i.e. when one operand has been unpacked.
> 
> For e.g.
> 
> void f1 ()
> {
>   for (int i = 0; i < N; i++)
> {
>   b[i] += a[i];
>   if (a[i] > 0)
>   break;
> }
> }
> 
> For 128-bit vectors we generate:
> 
> vcmp.s32        gt, q3, q1
> vmrs    r3, p0  @ movhi
> cbnz    r3, .L2
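
The "any bit set in P0" test above can be sketched in plain C.  This is a
model under stated assumptions, not the MVE pattern itself: it assumes a
16-bit predicate with one bit per byte, so each 32-bit lane owns four bits,
and predicate_gt_s32 and branch_taken are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a 16-bit predicate for a 4 x 32-bit signed "greater than" compare.
   Each 32-bit lane sets or clears its own group of four predicate bits.  */
static uint16_t
predicate_gt_s32 (const int32_t a[4], const int32_t b[4])
{
  uint16_t p0 = 0;
  for (int lane = 0; lane < 4; ++lane)
    if (a[lane] > b[lane])
      p0 |= (uint16_t) (0xFu << (4 * lane));
  return p0;
}

/* The branch needs no predicate comparison: moving the predicate to a
   general register and testing it against zero is enough.  */
static bool
branch_taken (uint16_t p0)
{
  return p0 != 0;
}
```

This mirrors the vcmp/vmrs/cbnz sequence quoted above: the compare fills the
predicate, the predicate is read as an integer, and the branch fires if that
integer is nonzero.
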
> 
> MVE does not have 64-bit vector comparisons, as such that is also not 
> supported.
> 
> Bootstrapped arm-none-linux-gnueabihf and regtested with
> -march=armv8.1-m.main+mve -mfpu=auto and no issues.
> 
> Ok for master?
> 

This is okay once the rest goes in.
Thanks,
Kyrill

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
>   compares.
>   * config/arm/mve.md (cbranch4): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp (vect_early_break): Add MVE.
>   * gcc.target/arm/mve/vect-early-break-cbranch.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index
> 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84
> b255a24eb51e32 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code,
> enum rtx_code outer_code,
>  || TARGET_HAVE_MVE)
> && simd_immediate_valid_for_move (x, mode, NULL, NULL))
>   *cost = COSTS_N_INSNS (1);
> +  else if (TARGET_HAVE_MVE
> +&& outer_code == COMPARE
> +&& VALID_MVE_PRED_MODE (mode))
> + /* MVE allows very limited instructions on VPT.P0,  however comparisons
> +to 0 do not require us to materialze this constant or require a
> +predicate comparison as we can go through SImode.  For that reason
> +allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
> +registers as we can't compare two predicates.  */
> + *cost = COSTS_N_INSNS (1);
>else
>   *cost = COSTS_N_INSNS (4);
>return true;
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index
> 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38
> 306dd43cbebfb3f 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -6880,6 +6880,21 @@ (define_expand
> "vcond_mask_"
>DONE;
>  })
> 
> +(define_expand "cbranch4"
> +  [(set (pc) (if_then_else
> +   (match_operator 0 "expandable_comparison_operator"
> +[(match_operand:MVE_7 1 "register_operand")
> + (match_operand:MVE_7 2 "zero_operand")])
> +   (label_ref (match_operand 3 "" ""))
> +   (pc)))]
> +  "TARGET_HAVE_MVE"
> +{
> +  rtx val = gen_reg_rtx (SImode);
> +  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
> +  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, 
> operands[3]));
> +  DONE;
> +})
> +
>  ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
>  (define_expand "@arm_mve_reinterpret"
>[(set (match_operand:MVE_vecs 0 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
> b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
> new file mode 100644
> index
> ..c3b8506dca0b2b044e6869a6
> c8259d663c1ff930
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
> @@ -0,0 +1,117 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +/*
> +** f1:
> +**   ...
> +**   vcmp.s32gt, q[0-9]+, q[0-9]+
> +**   vmrsr[0-9]+, p0 @ movhi
> +**   cbnzr[0-9]+, \.L[0-9]+
> +*

Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-11-27 Thread Andre Vieira (lists)

Hi Stam,

Just some comments.

+/* Recursively scan through the DF chain backwards within the basic block and
+   determine if any of the USEs of the original insn (or the USEs of the insns

s/Recursively scan/Scan/ as you no longer recurse, thanks for that by the way :)

+   where thy were DEF-ed, etc., recursively) were affected by implicit VPT

remove recursively for the same reasons.

+  if (!CONST_INT_P (cond_counter_iv.step) || !CONST_INT_P (cond_temp_iv.step))
+   return NULL;
+  /* Look at the steps and swap around the rtx's if needed.  Error out if
+one of them cannot be identified as constant.  */
+  if (INTVAL (cond_counter_iv.step) != 0 && INTVAL (cond_temp_iv.step) != 0)
+   return NULL;

Move the comment above the if before, as the erroring out it talks about 
is there.


+  emit_note_after ((enum insn_note)NOTE_KIND (insn), BB_END (body));
 space after 'insn_note)'

@@ -173,14 +176,14 @@ doloop_condition_get (rtx_insn *doloop_pat)
   if (! REG_P (reg))
 return 0;
 -  /* Check if something = (plus (reg) (const_int -1)).
+  /* Check if something = (plus (reg) (const_int -n)).
  On IA-64, this decrement is wrapped in an if_then_else.  */
   inc_src = SET_SRC (inc);
   if (GET_CODE (inc_src) == IF_THEN_ELSE)
 inc_src = XEXP (inc_src, 1);
   if (GET_CODE (inc_src) != PLUS
   || XEXP (inc_src, 0) != reg
-  || XEXP (inc_src, 1) != constm1_rtx)
+  || !CONST_INT_P (XEXP (inc_src, 1)))

Do we ever check that inc_src is negative? We used to check if it was 
-1, now we only check it's a constant, but not a negative one, so I 
suspect this needs a:

|| INTVAL (XEXP (inc_src, 1)) >= 0
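
As a sketch of the combined condition I have in mind, in plain C rather than
rtx code (struct toy_operand and is_valid_decrement are hypothetical names,
not anything in loop-doloop.cc):

```c
#include <stdbool.h>

/* Hypothetical miniature of an rtx operand: either a constant integer or
   something else, such as a register.  */
struct toy_operand
{
  bool is_const_int;
  long value;
};

/* Accept only decrements: the step must be a constant and strictly
   negative.  A zero or positive constant is rejected, as is a
   non-constant step.  */
static bool
is_valid_decrement (struct toy_operand step)
{
  return step.is_const_int && step.value < 0;
}
```

With only the CONST_INT_P test, a step of +4 would be accepted; the extra
sign test is what rejects it.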

@@ -492,7 +519,8 @@ doloop_modify (class loop *loop, class niter_desc *desc,
 case GE:
   /* Currently only GE tests against zero are supported.  */
   gcc_assert (XEXP (condition, 1) == const0_rtx);
-
+  /* FALLTHRU */
+case GTU:
   noloop = constm1_rtx;

I spent a very long time staring at this trying to understand why noloop 
= constm1_rtx for GTU, where I thought it should've been (count & 
(n-1)). For the current use of doloop it doesn't matter because ARM is 
the only target using it and you set desc->noloop_assumptions to 
null_rtx in 'arm_attempt_dlstp_transform' so noloop is never used. 
However, if a different target accepts this GTU pattern then this target 
agnostic code will do the wrong thing.  I suggest we either:
 - set noloop to what we think might be the correct value, which if you 
ask me should be 'count & (XEXP (condition, 1))',
 - or add a gcc_assert (GET_CODE (condition) != GTU); under the if 
(desc->noloop_assumptions); part and document why.  I have a slight 
preference for the assert given otherwise we are adding code that we 
can't test.


LGTM otherwise (but I don't have the power to approve this ;)).

Kind regards,
Andre

From: Stamatis Markianos-Wright 
Sent: Thursday, November 16, 2023 11:36 AM
To: Stamatis Markianos-Wright via Gcc-patches; Richard Earnshaw; Richard 
Sandiford; Kyrylo Tkachov
Subject: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low 
Overhead Loops


Pinging back to the top of reviewers' inboxes due to worry about Stage 1
End in a few days :)


See the last email for the latest version of the 2/2 patch. The 1/2
patch is A-Ok from Kyrill's earlier target-backend review.


On 10/11/2023 12:41, Stamatis Markianos-Wright wrote:


On 06/11/2023 17:29, Stamatis Markianos-Wright wrote:


On 06/11/2023 11:24, Richard Sandiford wrote:

Stamatis Markianos-Wright  writes:

One of the main reasons for reading the arm bits was to try to answer
the question: if we switch to a downcounting loop with a GE
condition,
how do we make sure that the start value is not a large unsigned
number that is interpreted as negative by GE?  E.g. if the loop
originally counted up in steps of N and used an LTU condition,
it could stop at a value in the range [INT_MAX + 1, UINT_MAX].
But the loop might never iterate if we start counting down from
most values in that range.

Does the patch handle that?
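
The concern can be reproduced in a scaled-down model.  The sketch below is
an illustration only: it uses an 8-bit counter instead of a 32-bit one so
that it runs instantly, a step of 1, and hypothetical helper names
(as_signed8, iterations_counting_up, iterations_counting_down_signed).

```c
#include <assert.h>

/* Read the low 8 bits of V as a two's-complement signed value.  */
static int
as_signed8 (unsigned v)
{
  v &= 0xFFu;
  return v >= 128u ? (int) v - 256 : (int) v;
}

/* Original loop shape: count up from 0 with an unsigned "less than" test.  */
static unsigned
iterations_counting_up (unsigned limit)
{
  assert (limit > 0 && limit <= 0xFFu);
  unsigned n = 0;
  for (unsigned i = 0; i < limit; ++i)
    ++n;
  return n;
}

/* Converted shape: start at LIMIT, subtract 1 after each iteration, and
   continue while the 8-bit counter, read as signed, is greater than 0.  */
static unsigned
iterations_counting_down_signed (unsigned limit)
{
  assert (limit > 0 && limit <= 0xFFu);
  unsigned n = 0;
  unsigned i = limit;
  do
    {
      ++n;
      i = (i - 1u) & 0xFFu;
    }
  while (as_signed8 (i) > 0);
  return n;
}
```

For a limit of 100 both shapes run 100 iterations.  For a limit of 200 the
up-counting loop runs 200 iterations, but the down-counting loop stops after
one, because 199 reads as a negative number in 8 bits.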

So AFAICT this is actually handled in the generic code in
`doloop_valid_p`:

Loops of this kind fail because they are "desc->infinite", so no
loop-doloop conversion is attempted at all (even for standard
dls/le loops).

Thanks to that check I haven't been able to trigger anything like the
behaviour you describe, do you think the doloop_valid_p checks are
robust enough?

The loops I was thinking of are provably not infinite though. E.g.:

   for (unsigned int i = 0; i < UINT_MAX - 100; ++i)
 ...

is known to terminate.  And doloop conversion is safe with the normal
count-down-by-1 approach, so I don't think current code would need
to reject it.  I.e. a conversion to:

   unsigned int i = UINT_MAX - 101;
   do
 ...
   while (--i != ~0U);

would be safe, but a conversion to:

   int i = UINT_MAX - 101;
   do
 ...
   while ((i -= step, i > 0));

wouldn't, becau

RE: [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation

2023-11-27 Thread Kyrylo Tkachov
Hi Tamar,

> -Original Message-
> From: Tamar Christina 
> Sent: Monday, November 6, 2023 7:43 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation
> 
> Hi All,
> 
> This adds an implementation for conditional branch optab for AArch32.
> 
> For e.g.
> 
> void f1 ()
> {
>   for (int i = 0; i < N; i++)
> {
>   b[i] += a[i];
>   if (a[i] > 0)
>   break;
> }
> }
> 
> For 128-bit vectors we generate:
> 
> vcgt.s32        q8, q9, #0
> vpmax.u32       d7, d16, d17
> vpmax.u32       d7, d7, d7
> vmov    r3, s14 @ int
> cmp     r3, #0
> 
> and for 64-bit vectors we can omit one vpmax; the other is still needed to
> compress to 32 bits.
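
The reduction quoted above can be modelled in plain C.  This is a sketch for
illustration, not the expander: max_u32, vpmax_u32 and any_lane_set are
hypothetical helpers, and the 128-bit comparison result is modelled as four
32-bit lanes that are either all-ones or zero.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t
max_u32 (uint32_t x, uint32_t y)
{
  return x > y ? x : y;
}

/* Model of a pairwise unsigned max on 64-bit halves: the low result lane is
   the max of the two lanes of N, the high result lane the max of the two
   lanes of M.  Results are computed before storing, so D may alias N or M.  */
static void
vpmax_u32 (uint32_t d[2], const uint32_t n[2], const uint32_t m[2])
{
  uint32_t lo = max_u32 (n[0], n[1]);
  uint32_t hi = max_u32 (m[0], m[1]);
  d[0] = lo;
  d[1] = hi;
}

/* Reduce a 4-lane mask to one 32-bit value and test it against zero.  */
static bool
any_lane_set (const uint32_t mask[4])
{
  uint32_t d7[2];
  vpmax_u32 (d7, &mask[0], &mask[2]);   /* 128 bits down to 64 */
  vpmax_u32 (d7, d7, d7);               /* 64 bits down to 32 */
  return d7[0] != 0;                    /* move to core register, compare with 0 */
}
```

After the two pairwise steps the low lane holds the maximum of all four
lanes, so it is nonzero exactly when some lane of the comparison was true.
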
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master?
> 

This is okay once the prerequisites go in.
Thanks,
Kyrill

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/arm/neon.md (cbranch4): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp (vect_early_break): Add AArch32.
>   * gcc.target/arm/vect-early-break-cbranch.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index
> d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc
> 344d2243dcb63 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -408,6 +408,45 @@ (define_insn "vec_extract"
>[(set_attr "type" "neon_store1_one_lane,neon_to_gp")]
>  )
> 
> +;; Patterns comparing two vectors and conditionally jump.
> +;; Avdanced SIMD lacks a vector != comparison, but this is a quite common
> +;; operation.  To not pay the penalty for inverting == we can map our any
> +;; comparisons to all i.e. any(~x) => all(x).
> +;;
> +;; However unlike the AArch64 version, we can't optimize this further as the
> +;; chain is too long for combine due to these being unspecs so it doesn't 
> fold
> +;; the operation to something simpler.
> +(define_expand "cbranch4"
> +  [(set (pc) (if_then_else
> +   (match_operator 0 "expandable_comparison_operator"
> +[(match_operand:VDQI 1 "register_operand")
> + (match_operand:VDQI 2 "zero_operand")])
> +   (label_ref (match_operand 3 "" ""))
> +   (pc)))]
> +  "TARGET_NEON"
> +{
> +  rtx mask = operands[1];
> +
> +  /* For 128-bit vectors we need an additional reductions.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (mode)))
> +{
> +  /* Always reduce using a V4SI.  */
> +  mask = gen_reg_rtx (V2SImode);
> +  rtx low = gen_reg_rtx (V2SImode);
> +  rtx high = gen_reg_rtx (V2SImode);
> +  emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
> +  emit_insn (gen_neon_vget_highv4si (high, operands[1]));
> +  emit_insn (gen_neon_vpumaxv2si (mask, low, high));
> +}
> +
> +  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
> +
> +  rtx val = gen_reg_rtx (SImode);
> +  emit_move_insn (val, gen_lowpart (SImode, mask));
> +  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, 
> operands[3]));
> +  DONE;
> +})
> +
>  ;; This pattern is renamed from "vec_extract" to
>  ;; "neon_vec_extract" and this pattern is called
>  ;; by define_expand in vec-common.md file.
> diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> new file mode 100644
> index
> ..2c05aa10d26ed4ac9785672e
> 6e3b4355cef046dc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> @@ -0,0 +1,136 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-require-effective-target arm32 } */
> +/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +/* f1:
> +**   ...
> +**   vcgt.s32q[0-9]+, q[0-9]+, #0
> +**   vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
> +**   vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
> +**   vmovr[0-9]+, s[0-9]+@ int
> +**   cmp r[0-9]+, #0
> +**   bne \.L[0-9]+
> +**   ...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +{
> +  b[i] += a[i];
> +  if (a[i] > 0)
> + break;
> +}
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   vcge.s32q[0-9]+, q[0-9]+, #0
> +**   vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
> +**   vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
> +**   vmovr[0-9]+, s[0-9]+@ int
> +**   cmp r[0-9]+, #0
> +**   bne \.L[0-9]+
> +**   ...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +{
> +  b[i] += a[i];
> +  if (a[i] >= 0)
> + break;
> +}
> +}
> +
> +/*
> +** f3:
> +**   ...
> +**   vceq.i32q[0-9]+, q[0-9]+, #0
> +**   vpmax.u32   d[

[PATCH] libsanitizer: Check assembler support for symbol assignment [PR112563]

2023-11-27 Thread Rainer Orth
The recent libsanitizer import broke the build on Solaris/SPARC with the
native as:

/usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol 
"__sanitizer_internal_memset" is used but not defined
/usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol 
"__sanitizer_internal_memcpy" is used but not defined
/usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol 
"__sanitizer_internal_memmove" is used but not defined

Since none of the alternatives considered in the PR worked out, this
patch checks if the assembler does support symbol assignment, disabling
the code otherwise.  This returns the code to the way it was up to LLVM 16.

Bootstrapped without regressions on sparc-sun-solaris2.11 (as and gas),
i386-pc-solaris2.11, x86_64-pc-linux-gnu, and x86_64-apple-darwin21.6.0.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2023-11-23  Rainer Orth  

libsanitizer:
PR libsanitizer/112563
* configure.ac (libsanitizer_cv_as_sym_assign): Check for
assembler symbol assignment support.
* configure, config.h.in: Regenerate.
* sanitizer_common/sanitizer_redefine_builtins.h: Include config.h.
Check HAVE_AS_SYM_ASSIGN.

# HG changeset patch
# Parent  1f757467f1bed35373c55b65cde4f9b0506172f5
libsanitizer: Require assembler support for sanitizer_redefine_builtins.h [PR112563]

diff --git a/libsanitizer/configure.ac b/libsanitizer/configure.ac
--- a/libsanitizer/configure.ac
+++ b/libsanitizer/configure.ac
@@ -214,6 +214,19 @@ if test "$libsanitizer_cv_sys_atomic" = 
 	[Define to 1 if you have the __atomic functions])
 fi
 
+# Check if assembler supports symbol assignment.
+AC_CACHE_CHECK([assembler symbol assignment],
+[libsanitizer_cv_as_sym_assign],
+[AC_COMPILE_IFELSE(
+  [AC_LANG_PROGRAM([],
+		   [asm("a = b");])],
+  [libsanitizer_cv_as_sym_assign=yes],
+  [libsanitizer_cv_as_sym_assign=no])])
+if test "$libsanitizer_cv_as_sym_assign" = "yes"; then
+  AC_DEFINE([HAVE_AS_SYM_ASSIGN], 1,
+  	[Define to 1 if assembler supports symbol assignment])
+fi
+
 # The library needs to be able to read the executable itself.  Compile
 # a file to determine the executable format.  The awk script
 # filetype.awk prints out the file type.
diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
--- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
+++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
@@ -12,6 +12,7 @@
 #ifndef SANITIZER_DEFS_H
 #define SANITIZER_DEFS_H
 
+#include "config.h"
 #include "sanitizer_platform.h"
 #include "sanitizer_redefine_builtins.h"
 
diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
--- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
+++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
@@ -15,7 +15,7 @@
 #define SANITIZER_REDEFINE_BUILTINS_H
 
 // The asm hack only works with GCC and Clang.
-#if !defined(_WIN32)
+#if !defined(_WIN32) && defined(HAVE_AS_SYM_ASSIGN)
 
 asm("memcpy = __sanitizer_internal_memcpy");
 asm("memmove = __sanitizer_internal_memmove");
@@ -50,7 +50,7 @@ using vector = Define_SANITIZER_COMMON_N
 }  // namespace std
 
 #  endif  // __cpluplus
-#endif// !_WIN32
+#endif// !_WIN32 && HAVE_AS_SYM_ASSIGN
 
 #  endif  // SANITIZER_REDEFINE_BUILTINS_H
 #endif// SANITIZER_COMMON_NO_REDEFINE_BUILTINS


Re: [PATCH] libsanitizer: Check assembler support for symbol assignment [PR112563]

2023-11-27 Thread Jakub Jelinek
On Mon, Nov 27, 2023 at 01:56:46PM +0100, Rainer Orth wrote:
> The recent libsanitizer import broke the build on Solaris/SPARC with the
> native as:
> 
> /usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol 
> "__sanitizer_internal_memset" is used but not defined
> /usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol 
> "__sanitizer_internal_memcpy" is used but not defined
> /usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol 
> "__sanitizer_internal_memmove" is used but not defined
> 
> Since none of the alternatives considered in the PR worked out, this
> patch checks if the assembler does support symbol assignment, disabling
> the code otherwise.  This returns the code to the way it was up to LLVM 16.
> 
> Bootstrapped without regressions on sparc-sun-solaris2.11 (as and gas),
> i386-pc-solaris2.11, x86_64-pc-linux-gnu, and x86_64-apple-darwin21.6.0.
> 
> Ok for trunk?
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2023-11-23  Rainer Orth  
> 
>   libsanitizer:
>   PR libsanitizer/112563
>   * configure.ac (libsanitizer_cv_as_sym_assign): Check for
>   assembler symbol assignment support.
>   * configure, config.h.in: Regenerate.
>   * sanitizer_common/sanitizer_redefine_builtins.h: Include config.h.
>   Check HAVE_AS_SYM_ASSIGN.

Can you please
1) split it into 2 patches, one touching config* which is owned by GCC (and
   Makefiles, see later), one just
   sanitizer_common/sanitizer_redefine_builtins.h
2) avoid using config.h; instead use AC_SUBST and add @HAVE_AS_SYM_ASSIGN@
   to Makefile.am's DEFS where needed (either expanding to nothing or
   -DHAVE_AS_SYM_ASSIGN=1)?  The reason is to minimize changes to imported
   sources

Once the sanitizer_common/sanitizer_redefine_builtins.h change (just
the && defined(HAVE_AS_SYM_ASSIGN) addition) patch is committed and pushed
upstream, add its commit hash to LOCAL_PATCHES.
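
A minimal sketch of what point 2 amounts to on the C side, assuming the
build passes -DHAVE_AS_SYM_ASSIGN=1 via DEFS when the configure check
succeeds.  The helper sym_assign_enabled() is hypothetical and only
reports which branch of the guard was compiled; it is not part of
libsanitizer.

```c
/* Sketch only: sym_assign_enabled() is a hypothetical helper.
   HAVE_AS_SYM_ASSIGN is assumed to arrive on the compiler command line
   (-DHAVE_AS_SYM_ASSIGN=1), not from an included config.h.  */
static int
sym_assign_enabled (void)
{
#if !defined(_WIN32) && defined(HAVE_AS_SYM_ASSIGN)
  /* In the real header this branch emits the symbol assignments,
     e.g. asm("memcpy = __sanitizer_internal_memcpy").  */
  return 1;
#else
  /* Assembler without symbol assignment: the asm hack is skipped.  */
  return 0;
#endif
}
```

With this shape the imported header only needs the extra
defined(HAVE_AS_SYM_ASSIGN) test and no new #include.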

Thanks.

Note, your ChangeLog entry was pretending config.h include has been added
to one header, but it went to a different one instead.

> # HG changeset patch
> # Parent  1f757467f1bed35373c55b65cde4f9b0506172f5
> libsanitizer: Require assembler support for sanitizer_redefine_builtins.h 
> [PR112563]
> 
> diff --git a/libsanitizer/configure.ac b/libsanitizer/configure.ac
> --- a/libsanitizer/configure.ac
> +++ b/libsanitizer/configure.ac
> @@ -214,6 +214,19 @@ if test "$libsanitizer_cv_sys_atomic" = 
>   [Define to 1 if you have the __atomic functions])
>  fi
>  
> +# Check if assembler supports symbol assignment.
> +AC_CACHE_CHECK([assembler symbol assignment],
> +[libsanitizer_cv_as_sym_assign],
> +[AC_COMPILE_IFELSE(
> +  [AC_LANG_PROGRAM([],
> +[asm("a = b");])],
> +  [libsanitizer_cv_as_sym_assign=yes],
> +  [libsanitizer_cv_as_sym_assign=no])])
> +if test "$libsanitizer_cv_as_sym_assign" = "yes"; then
> +  AC_DEFINE([HAVE_AS_SYM_ASSIGN], 1,
> + [Define to 1 if assembler supports symbol assignment])
> +fi
> +
>  # The library needs to be able to read the executable itself.  Compile
>  # a file to determine the executable format.  The awk script
>  # filetype.awk prints out the file type.
> diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h 
> b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
> --- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
> @@ -12,6 +12,7 @@
>  #ifndef SANITIZER_DEFS_H
>  #define SANITIZER_DEFS_H
>  
> +#include "config.h"
>  #include "sanitizer_platform.h"
>  #include "sanitizer_redefine_builtins.h"
>  
> diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h 
> b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> --- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> @@ -15,7 +15,7 @@
>  #define SANITIZER_REDEFINE_BUILTINS_H
>  
>  // The asm hack only works with GCC and Clang.
> -#if !defined(_WIN32)
> +#if !defined(_WIN32) && defined(HAVE_AS_SYM_ASSIGN)
>  
>  asm("memcpy = __sanitizer_internal_memcpy");
>  asm("memmove = __sanitizer_internal_memmove");
> @@ -50,7 +50,7 @@ using vector = Define_SANITIZER_COMMON_N
>  }  // namespace std
>  
>  #  endif  // __cpluplus
> -#endif// !_WIN32
> +#endif// !_WIN32 && HAVE_AS_SYM_ASSIGN
>  
>  #  endif  // SANITIZER_REDEFINE_BUILTINS_H
>  #endif// SANITIZER_COMMON_NO_REDEFINE_BUILTINS


Jakub



[V4] [C PATCH 1/4] c23: tag compatibility rules for struct and unions

2023-11-27 Thread Martin Uecker


Note that there is an additional change in parser_xref_tag
to address the issue regarding completeness in redefinition
which affects also structs / unions.  The test c23-tag-6.c
was changed accordingly.


c23: tag compatibility rules for struct and unions

Implement redeclaration and compatibility rules for
structures and unions in C23.

gcc/c/:
* c-decl.cc (previous_tag): New function.
(parser_xref_tag): Find earlier definition.
(get_parm_info): Turn off warning for C23.
(start_struct): Allow redefinitions.
(finish_struct): Diagnose conflicts.
* c-tree.h (comptypes_same_p): Add prototype.
* c-typeck.cc (comptypes_same_p): New function
(comptypes_internal): Activate comparison of tagged types.
(convert_for_assignment): Ignore qualifiers.
(digest_init): Add error.
(initialized_elementwise_p): Allow compatible types.

gcc/testsuite/:
* gcc.dg/c23-enum-7.c: Remove warning.
* gcc.dg/c23-tag-1.c: New test.
* gcc.dg/c23-tag-2.c: New deactivated test.
* gcc.dg/c23-tag-3.c: New test.
* gcc.dg/c23-tag-4.c: New test.
* gcc.dg/c23-tag-5.c: New deactivated test.
* gcc.dg/c23-tag-6.c: New test.
* gcc.dg/c23-tag-7.c: New test.
* gcc.dg/c23-tag-8.c: New test.
* gcc.dg/gnu23-tag-1.c: New test.
* gcc.dg/gnu23-tag-2.c: New test.
* gcc.dg/gnu23-tag-3.c: New test.
* gcc.dg/gnu23-tag-4.c: New test.
---
 gcc/c/c-decl.cc| 72 +++---
 gcc/c/c-tree.h |  1 +
 gcc/c/c-typeck.cc  | 38 +---
 gcc/testsuite/gcc.dg/c23-enum-7.c  |  6 +--
 gcc/testsuite/gcc.dg/c23-tag-1.c   | 67 +++
 gcc/testsuite/gcc.dg/c23-tag-2.c   | 43 ++
 gcc/testsuite/gcc.dg/c23-tag-3.c   | 16 +++
 gcc/testsuite/gcc.dg/c23-tag-4.c   | 26 +++
 gcc/testsuite/gcc.dg/c23-tag-5.c   | 33 ++
 gcc/testsuite/gcc.dg/c23-tag-6.c   | 58 
 gcc/testsuite/gcc.dg/c23-tag-7.c   | 12 +
 gcc/testsuite/gcc.dg/c23-tag-8.c   | 10 +
 gcc/testsuite/gcc.dg/gnu23-tag-1.c | 10 +
 gcc/testsuite/gcc.dg/gnu23-tag-2.c | 19 
 gcc/testsuite/gcc.dg/gnu23-tag-3.c | 28 
 gcc/testsuite/gcc.dg/gnu23-tag-4.c | 31 +
 16 files changed, 454 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-5.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-6.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-7.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-8.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-4.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 64d3a941cb9..ebe1708b977 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2039,6 +2039,28 @@ locate_old_decl (tree decl)
decl, TREE_TYPE (decl));
 }
 
+
+/* Helper function.  For a tagged type, it finds the declaration
+   for a visible tag declared in the same scope if such a
+   declaration exists.  */
+static tree
+previous_tag (tree type)
+{
+  struct c_binding *b = NULL;
+  tree name = TYPE_NAME (type);
+
+  if (name)
+b = I_TAG_BINDING (name);
+
+  if (b)
+b = b->shadowed;
+
+  if (b && B_IN_CURRENT_SCOPE (b))
+return b->decl;
+
+  return NULL_TREE;
+}
+
 /* Subroutine of duplicate_decls.  Compare NEWDECL to OLDDECL.
Returns true if the caller should proceed to merge the two, false
if OLDDECL should simply be discarded.  As a side effect, issues
@@ -8573,11 +8595,14 @@ get_parm_info (bool ellipsis, tree expr)
  if (TREE_CODE (decl) != UNION_TYPE || b->id != NULL_TREE)
{
  if (b->id)
-   /* The %s will be one of 'struct', 'union', or 'enum'.  */
-   warning_at (b->locus, 0,
-   "%<%s %E%> declared inside parameter list"
-   " will not be visible outside of this definition or"
-   " declaration", keyword, b->id);
+   {
+ /* The %s will be one of 'struct', 'union', or 'enum'.  */
+ if (!flag_isoc23)
+   warning_at (b->locus, 0,
+   "%<%s %E%> declared inside parameter list"
+   " will not be visible outside of this definition or"
+   " declaration", keyword, b->id);
+   }
  else
/* The %s will be one of 'struct', 'union', or 'enum'.  */
warning_at (b->locus, 0,
@@ -8668,6 +8693,16 

[V4] [PATCH 2/4] c23: tag compatibility rules for enums

2023-11-27 Thread Martin Uecker


(only tests were changed)


c23: tag compatibility rules for enums

Allow redefinition of enum types and enumerators.  Diagnose
nested redefinitions including redefinitions in the enum
specifier for enum types with fixed underlying type.

gcc/c:
* c-tree.h (c_parser_enum_specifier): Add parameter.
* c-decl.cc (start_enum): Allow redefinition.
(finish_enum): Diagnose conflicts.
(build_enumerator): Set context.
(diagnose_mismatched_decls): Diagnose conflicting enumerators.
(push_decl): Preserve context for enumerators.
* c-parser.cc (c_parser_enum_specifier): Remember when
seen is from an enum type which is not yet defined.

gcc/testsuite/:
* gcc.dg/c23-tag-enum-1.c: New test.
* gcc.dg/c23-tag-enum-2.c: New test.
* gcc.dg/c23-tag-enum-3.c: New test.
* gcc.dg/c23-tag-enum-4.c: New test.
* gcc.dg/c23-tag-enum-5.c: New test.
* gcc.dg/gnu23-tag-enum-1.c: New test.
---
 gcc/c/c-decl.cc | 65 +
 gcc/c/c-parser.cc   |  5 +-
 gcc/c/c-tree.h  |  3 +-
 gcc/c/c-typeck.cc   |  5 +-
 gcc/testsuite/gcc.dg/c23-tag-enum-1.c   | 56 +
 gcc/testsuite/gcc.dg/c23-tag-enum-2.c   | 17 +++
 gcc/testsuite/gcc.dg/c23-tag-enum-3.c   |  7 +++
 gcc/testsuite/gcc.dg/c23-tag-enum-4.c   | 22 +
 gcc/testsuite/gcc.dg/c23-tag-enum-5.c   | 18 +++
 gcc/testsuite/gcc.dg/gnu23-tag-enum-1.c | 19 
 10 files changed, 205 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-4.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-5.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-enum-1.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index ebe1708b977..bcc09ba479e 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2114,9 +2114,24 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
  given scope.  */
   if (TREE_CODE (olddecl) == CONST_DECL)
 {
-  auto_diagnostic_group d;
-  error ("redeclaration of enumerator %q+D", newdecl);
-  locate_old_decl (olddecl);
+  if (flag_isoc23
+ && TYPE_NAME (DECL_CONTEXT (newdecl))
+ && DECL_CONTEXT (newdecl) != DECL_CONTEXT (olddecl)
+ && TYPE_NAME (DECL_CONTEXT (newdecl)) == TYPE_NAME (DECL_CONTEXT (olddecl)))
+   {
+ if (!simple_cst_equal (DECL_INITIAL (olddecl), DECL_INITIAL (newdecl)))
+   {
+ auto_diagnostic_group d;
+ error ("conflicting redeclaration of enumerator %q+D", newdecl);
+ locate_old_decl (olddecl);
+   }
+   }
+  else
+   {
+ auto_diagnostic_group d;
+ error ("redeclaration of enumerator %q+D", newdecl);
+ locate_old_decl (olddecl);
+   }
   return false;
 }
 
@@ -3277,8 +3292,11 @@ pushdecl (tree x)
 
   /* Must set DECL_CONTEXT for everything not at file scope or
  DECL_FILE_SCOPE_P won't work.  Local externs don't count
- unless they have initializers (which generate code).  */
+ unless they have initializers (which generate code).  We
+ also exclude CONST_DECLs because enumerators will get the
+ type of the enum as context.  */
   if (current_function_decl
+  && TREE_CODE (x) != CONST_DECL
   && (!VAR_OR_FUNCTION_DECL_P (x)
  || DECL_INITIAL (x) || !TREE_PUBLIC (x)))
 DECL_CONTEXT (x) = current_function_decl;
@@ -9747,7 +9765,7 @@ layout_array_type (tree t)
 
 tree
 start_enum (location_t loc, struct c_enum_contents *the_enum, tree name,
-   tree fixed_underlying_type)
+   tree fixed_underlying_type, bool potential_nesting_p)
 {
   tree enumtype = NULL_TREE;
   location_t enumloc = UNKNOWN_LOCATION;
@@ -9759,9 +9777,26 @@ start_enum (location_t loc, struct c_enum_contents *the_enum, tree name,
   if (name != NULL_TREE)
 enumtype = lookup_tag (ENUMERAL_TYPE, name, true, &enumloc);
 
+  if (enumtype != NULL_TREE && TREE_CODE (enumtype) == ENUMERAL_TYPE)
+{
+  /* If the type is currently being defined or if we have seen an
+incomplete version which is now complete, this is a nested
+redefinition.  The latter happens if the redefinition occurs
+inside the enum specifier itself.  */
+  if (C_TYPE_BEING_DEFINED (enumtype)
+ || (potential_nesting_p && TYPE_VALUES (enumtype) != NULL_TREE))
+   error_at (loc, "nested redefinition of %<enum %E%>", name);
+
+  /* For C23 we allow redefinitions.  We set to zero and check for
+consistency later.  */
+  if (flag_isoc23 && TYPE_VALUES (enumtype) != NULL_TREE)
+   enumtype = NULL_TREE;
+}
+
   if (enumtype == NULL_TREE || TREE_CODE (enumtype) != ENUMERAL_TYPE)
 {
   enumtype = make_node (ENUME

[V4] [PATCH 4/4] c23: construct composite type for tagged types

2023-11-27 Thread Martin Uecker


(this patch was still not updated and needs more work, so
only included now for completeness) 


c23: construct composite type for tagged types

Support for constructing composite type for structs and unions
in C23.
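
For illustration only: C already forms a composite type for pointers to
compatible array types in a conditional expression, and this patch aims
to do the analogous thing for structs and unions under C23.  The sketch
below shows only the existing array case, so it does not need a C23
compiler; composite_array_size() is a hypothetical name.

```c
#include <stddef.h>

/* Sketch only: composite_array_size() is a hypothetical helper.
   The operands point to int[] and int[3], which are compatible, so the
   conditional expression has the composite type int (*)[3].  The sizeof
   operand is not evaluated, so the null pointers are never used.  */
static size_t
composite_array_size (void)
{
  int (*unknown)[] = 0;
  int (*three)[3] = 0;
  return sizeof (*(1 ? unknown : three));
}
```

The result equals 3 * sizeof (int), because the array bound is taken
from whichever operand supplies it.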

gcc/c:
* c-typeck.cc (composite_type_internal): Adapted from
composite_type to support structs and unions.
(composite_type): New wrapper function.
(build_conditional_operator): Return composite type.

gcc/testsuite:
* gcc.dg/c23-tag-composite-1.c: New test.
* gcc.dg/c23-tag-composite-2.c: New test.
* gcc.dg/c23-tag-composite-3.c: New test.
* gcc.dg/c23-tag-composite-4.c: New test.
---
 gcc/c/c-typeck.cc  | 114 +
 gcc/testsuite/gcc.dg/c23-tag-composite-1.c |  26 +
 gcc/testsuite/gcc.dg/c23-tag-composite-2.c |  16 +++
 gcc/testsuite/gcc.dg/c23-tag-composite-3.c |  17 +++
 gcc/testsuite/gcc.dg/c23-tag-composite-4.c |  21 
 5 files changed, 176 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-4.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 00eb65dbcce..7901368c9fd 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -381,8 +381,15 @@ build_functype_attribute_variant (tree ntype, tree otype, tree attrs)
nonzero; if that isn't so, this may crash.  In particular, we
assume that qualifiers match.  */
 
+struct composite_cache {
+  tree t1;
+  tree t2;
+  tree composite;
+  struct composite_cache* next;
+};
+
 tree
-composite_type (tree t1, tree t2)
+composite_type_internal (tree t1, tree t2, struct composite_cache* cache)
 {
   enum tree_code code1;
   enum tree_code code2;
@@ -427,7 +434,8 @@ composite_type (tree t1, tree t2)
   {
tree pointed_to_1 = TREE_TYPE (t1);
tree pointed_to_2 = TREE_TYPE (t2);
-   tree target = composite_type (pointed_to_1, pointed_to_2);
+   tree target = composite_type_internal (pointed_to_1,
+  pointed_to_2, cache);
 t1 = build_pointer_type_for_mode (target, TYPE_MODE (t1), false);
t1 = build_type_attribute_variant (t1, attributes);
return qualify_type (t1, t2);
@@ -435,7 +443,8 @@ composite_type (tree t1, tree t2)
 
 case ARRAY_TYPE:
   {
-   tree elt = composite_type (TREE_TYPE (t1), TREE_TYPE (t2));
+   tree elt = composite_type_internal (TREE_TYPE (t1), TREE_TYPE (t2),
+   cache);
int quals;
tree unqual_elt;
tree d1 = TYPE_DOMAIN (t1);
@@ -503,9 +512,61 @@ composite_type (tree t1, tree t2)
return build_type_attribute_variant (t1, attributes);
   }
 
-case ENUMERAL_TYPE:
 case RECORD_TYPE:
 case UNION_TYPE:
+  if (flag_isoc23 && !comptypes_same_p (t1, t2))
+   {
+ gcc_checking_assert (COMPLETE_TYPE_P (t1) && COMPLETE_TYPE_P (t2));
+ gcc_checking_assert (comptypes (t1, t2));
+
+ /* If a composite type for these two types is already under
+construction, return it.  */
+
+ for (struct composite_cache *c = cache; c != NULL; c = c->next)
+   if (c->t1 == t1 && c->t2 == t2)
+  return c->composite;
+
+ /* Otherwise, create a new type node and link it into the cache.  */
+
+ tree n = make_node (code1);
+ struct composite_cache cache2 = { t1, t2, n, cache };
+ cache = &cache2;
+
+ tree f1 = TYPE_FIELDS (t1);
+ tree f2 = TYPE_FIELDS (t2);
+ tree fields = NULL_TREE;
+
+ for (tree a = f1, b = f2; a && b;
+  a = DECL_CHAIN (a), b = DECL_CHAIN (b))
+   {
+ tree ta = TREE_TYPE (a);
+ tree tb = TREE_TYPE (b);
+
+ gcc_assert (DECL_NAME (a) == DECL_NAME (b));
+ gcc_assert (comptypes (ta, tb));
+
+ tree f = build_decl (input_location, FIELD_DECL, DECL_NAME (a),
+  composite_type_internal (ta, tb, cache));
+
+ DECL_FIELD_CONTEXT (f) = n;
+ DECL_CHAIN (f) = fields;
+ fields = f;
+   }
+
+ TYPE_NAME (n) = TYPE_NAME (t1);
+ TYPE_FIELDS (n) = nreverse (fields);
+ TYPE_ATTRIBUTES (n) = attributes;
+ layout_type (n);
+ n = build_type_attribute_variant (n, attributes);
+ n = qualify_type (n, t1);
+
+ gcc_checking_assert (comptypes (n, t1));
+ gcc_checking_assert (comptypes (n, t2));
+
+ return n;
+   }
+  /* FALLTHRU */
+case ENUMERAL_TYPE:
   if (attributes != NULL)
{
  /* Try harder not to create a new aggregate type.  */
@@ -520,7 +581,8 @@ composite_type (tree t1, tree t2)
   /* Function types: prefer the one that specified arg types.

[V4] [PATCH 3/4] c23: aliasing of compatible tagged types

2023-11-27 Thread Martin Uecker


(this mostly got an extended description and more
comments, also tests were reorganized)



c23: aliasing of compatible tagged types

Tell the backend which types are equivalent by setting
TYPE_CANONICAL to one struct in the set of equivalent
structs.  Structs are considered equivalent by ignoring
all sizes of arrays nested in types below field level.

The following two structs are incompatible and lvalues
with these types can be assumed not to alias:

 struct foo { int a[3]; };
 struct foo { int a[4]; };

The following two structs are also incompatible, but
will get the same TYPE_CANONICAL and it is then not
exploited that lvalues with those types can not alias:

 struct bar { int (*p)[3]; };
 struct bar { int (*p)[4]; };

The reason is that both are compatible with

 struct bar { int (*p)[]; };

and therefore are in the same equivalence class.  For
the same reason all enums with the same underlying type
are in the same equivalence class.  Tests are added
for the expected aliasing behavior with optimization.
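
As a rough illustration of why the two struct bar variants land in one
equivalence class, the sketch below checks the member pointer types
directly with C11 _Generic, which selects an association whose type is
compatible with the controlling expression.  It works on the pointer
types, not on the structs, so it does not depend on C23; the function
names are hypothetical.

```c
#include <stdbool.h>

/* Sketch only: all three helpers are hypothetical.  */

/* int (*)[3] is compatible with int (*)[] ...  */
static bool
ptr3_compatible_with_unknown (void)
{
  return _Generic ((int (*)[3]) 0, int (*)[]: true, default: false);
}

/* ... and so is int (*)[4] ...  */
static bool
ptr4_compatible_with_unknown (void)
{
  return _Generic ((int (*)[4]) 0, int (*)[]: true, default: false);
}

/* ... but int (*)[3] and int (*)[4] are not compatible with each
   other: compatibility is not transitive.  */
static bool
ptr3_compatible_with_ptr4 (void)
{
  return _Generic ((int (*)[3]) 0, int (*)[4]: true, default: false);
}
```

Since compatibility is not transitive, an equivalence relation for
TYPE_CANONICAL has to be coarser than compatibility, which is what
ignoring the nested array sizes achieves.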

gcc/c:
* c-decl.cc (c_struct_hasher): Hash stable for struct
types.
(c_struct_hasher::hash, c_struct_hasher::equal): New
functions.
(finish_struct): Set TYPE_CANONICAL to first struct in
equivalence class.
* c-objc-common.cc (c_get_alias_set): Let structs or
unions with variable size alias anything.
* c-tree.h (comptypes_equiv): New prototype.
* c-typeck.cc (comptypes_equiv): New function.
(comptypes_internal): Implement equivalence mode.
(tagged_types_tu_compatible): Implement equivalence mode.

gcc/testsuite:
* gcc.dg/c23-tag-2.c: Activate.
* gcc.dg/c23-tag-6.c: Activate.
* gcc.dg/c23-tag-alias-1.c: New test.
* gcc.dg/c23-tag-alias-2.c: New test.
* gcc.dg/gnu23-tag-alias-1.c: New test.
* gcc.dg/gnu23-tag-alias-2.c: New test.
* gcc.dg/gnu23-tag-alias-3.c: New test.
* gcc.dg/gnu23-tag-alias-4.c: New test.
* gcc.dg/gnu23-tag-alias-5.c: New test.
* gcc.dg/gnu23-tag-alias-6.c: New test.
* gcc.dg/gnu23-tag-alias-7.c: New test.
---
 gcc/c/c-decl.cc  |  51 ++-
 gcc/c/c-objc-common.cc   |   5 ++
 gcc/c/c-tree.h   |   1 +
 gcc/c/c-typeck.cc|  31 +++
 gcc/testsuite/gcc.dg/c23-tag-2.c |   2 +-
 gcc/testsuite/gcc.dg/c23-tag-5.c |   2 +-
 gcc/testsuite/gcc.dg/c23-tag-alias-1.c   |  49 +++
 gcc/testsuite/gcc.dg/c23-tag-alias-2.c   |  50 +++
 gcc/testsuite/gcc.dg/c23-tag-alias-3.c   |  32 +++
 gcc/testsuite/gcc.dg/c23-tag-alias-4.c   |  54 
 gcc/testsuite/gcc.dg/gnu23-tag-alias-1.c |  33 +++
 gcc/testsuite/gcc.dg/gnu23-tag-alias-2.c |  85 ++
 gcc/testsuite/gcc.dg/gnu23-tag-alias-3.c |  83 ++
 gcc/testsuite/gcc.dg/gnu23-tag-alias-4.c |  36 
 gcc/testsuite/gcc.dg/gnu23-tag-alias-5.c | 107 +++
 gcc/testsuite/gcc.dg/gnu23-tag-alias-6.c |  60 +
 gcc/testsuite/gcc.dg/gnu23-tag-alias-7.c |  93 
 17 files changed, 771 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-4.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-4.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-5.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-6.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-7.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index bcc09ba479e..68cba131704 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -634,6 +634,36 @@ public:
   auto_vec<tree> typedefs_seen;
 };
 
+
+/* Hash table for structs and unions.  */
+struct c_struct_hasher : ggc_ptr_hash<tree_node>
+{
+  static hashval_t hash (tree t);
+  static bool equal (tree, tree);
+};
+
+/* Hash a RECORD or UNION.  */
+hashval_t
+c_struct_hasher::hash (tree type)
+{
+  inchash::hash hstate;
+
+  hstate.add_int (TREE_CODE (type));
+  hstate.add_object (TYPE_NAME (type));
+
+  return hstate.end ();
+}
+
+/* Compare two RECORD or UNION types.  */
+bool
+c_struct_hasher::equal (tree t1,  tree t2)
+{
+  return comptypes_equiv_p (t1, t2);
+}
+
+/* All tagged types, kept so that TYPE_CANONICAL can be set correctly.  */
+static GTY (()) hash_table<c_struct_hasher> *c_struct_htab;
+
 /* Information for the struct or union currently being parsed, or
NULL if not parsing a struct or union.  */
 static class c_struct_parse_info *struct_parse_info;
@@ -8713,7 +8743,8 @@ parser_xref_tag (location_t loc, enum tree_code code, tree name,
   ref = lookup_tag (code, name, has_enu

Re: [PATCH] libsanitizer: Check assembler support for symbol assignment [PR112563]

2023-11-27 Thread Rainer Orth
Hi Jakub,

>> 2023-11-23  Rainer Orth  
>> 
>>  libsanitizer:
>>  PR libsanitizer/112563
>>  * configure.ac (libsanitizer_cv_as_sym_assign): Check for
>>  assembler symbol assignment support.
>>  * configure, config.h.in: Regenerate.
>>  * sanitizer_common/sanitizer_redefine_builtins.h: Include config.h.
>>  Check HAVE_AS_SYM_ASSIGN.
>
> Can you please
> 1) split it into 2 patches, one touching config* which is owned by GCC (and
>Makefiles, see later), one just 
> sanitizer_common/sanitizer_redefine_builtins.h
> 2) avoid using config.h in, instead use AC_SUBST and add @HAVE_AS_SYM_ASSIGN@
>to Makefile.am's DEFS where needed (either expanding to nothing or
>-DHAVE_AS_SYM_ASSIGN=1)?  The reason is to minimize changes to imported
>sources
>
> Once the sanitizer_common/sanitizer_redefine_builtins.h change (just
> the && defined(HAVE_AS_SYM_ASSIGN) addition) patch is committed and pushed
> upstream, add its commit has LOCAL_PATCHES.

But will they accept a patch to check a macro never set anywhere in and
irrelevant to LLVM?  That's why I kept all in one patch, to be GCC-local.

If we go (or at least try) this upstream route, should I wait for
approval there and then commit both parts to GCC, keeping it in my local
tree until then?

> Note, your ChangeLog entry was pretending config.h include has been added
> to one header, but it went to a different one instead.

Drats, that's what you get for starting one way and adjusting later ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] RISC-V: Fix VSETVL PASS regression

2023-11-27 Thread Juzhe-Zhong
This patch is a regression fix, not an optimization: trunk GCC generates
more redundant vsetvl instructions than GCC-13.

This is the case:

bb 2:
  def a2 (vsetvl a2, zero)
bb 3:
  use a2
bb 4:
  use a2 (vle)

before this patch:

bb 2:
vsetvl a2 zero
bb 3:
vsetvl zero, zero > should be eliminated.
bb 4:
vle.v

The root cause is that we didn't mark bb 3 as transparent because of
incorrect code.  bb 3 doesn't modify "a2", it only uses it, so the VSETVL
status from bb 2 is available to bb 3 and bb 4:

bb 2 -> bb 3 -> bb4.
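
A toy model of the distinction described above, not the code in
riscv-vsetvl.cc: every type and function name here is hypothetical, and
registers are reduced to bits in a mask.  The real check in
compute_lcm_local_properties looks at defs of the AVL register; the
sketch only shows that a plain use should not clear transparency while
a def should.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction: bit n of each mask stands for
   register n.  */
struct toy_insn
{
  unsigned defs;
  unsigned uses;
};

/* Behaviour before the fix (as modelled here): a mere use of the
   register also makes the block non-transparent.  */
static bool
toy_transparent_old_p (const struct toy_insn *insns, size_t n, unsigned reg)
{
  for (size_t i = 0; i < n; i++)
    if ((insns[i].defs | insns[i].uses) & (1u << reg))
      return false;
  return true;
}

/* Behaviour after the fix (as modelled here): only a redefinition of
   the register kills transparency.  */
static bool
toy_transparent_new_p (const struct toy_insn *insns, size_t n, unsigned reg)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i].defs & (1u << reg))
      return false;
  return true;
}
```

For a block like bb 3 that contains a single instruction using a2
(register 12 in this model), the old predicate reports non-transparent
and the new one reports transparent.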

Another regression fix is anticipation calculation:

bb 4:
use a5 (sub)
use a5 (vle)

The VSETVL status of vle should be considered anticipated as long as the
a5 used by both sub and vle comes from the same def.

Tested on zvl128b with no regressions.

I am going to test on zvl256/zvl512/zvl1024.

PR target/112713

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(pre_vsetvl::compute_lcm_local_properties): Fix regression.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112713-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112713-2.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 29 
 .../gcc.target/riscv/rvv/vsetvl/pr112713-1.c  | 24 ++
 .../gcc.target/riscv/rvv/vsetvl/pr112713-2.c  | 47 +++
 3 files changed, 91 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112713-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112713-2.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 74367ec8d8e..b3e07d4c3aa 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1433,9 +1433,23 @@ private:
 
   inline bool modify_or_use_vl_p (insn_info *i, const vsetvl_info &info)
   {
-return info.has_vl ()
-  && (find_access (i->uses (), REGNO (info.get_vl ()))
-  || find_access (i->defs (), REGNO (info.get_vl ())));
+if (info.has_vl ())
+  {
+   if (find_access (i->defs (), REGNO (info.get_vl ())))
+ return true;
+   if (find_access (i->uses (), REGNO (info.get_vl ())))
+ {
+   resource_info resource = full_register (REGNO (info.get_vl ()));
+   def_lookup dl1 = crtl->ssa->find_def (resource, i);
+   def_lookup dl2 = crtl->ssa->find_def (resource, info.get_insn ());
+   if (dl1.matching_set () || dl2.matching_set ())
+ return true;
+   /* If their VLs are coming from same def, we still want to fuse
+  their VSETVL demand info to gain better performance.  */
+   return dl1.prev_def (i) != dl2.prev_def (i);
+ }
+  }
+return false;
   }
   inline bool modify_avl_p (insn_info *i, const vsetvl_info &info)
   {
@@ -1702,7 +1716,7 @@ public:
for (insn_info *i = next_insn->prev_nondebug_insn (); i != prev_insn;
 i = i->prev_nondebug_insn ())
  {
-   // no def amd use of vl
+   // no def and use of vl
if (!ignore_vl && modify_or_use_vl_p (i, info))
  return false;
 
@@ -2635,11 +2649,8 @@ pre_vsetvl::compute_lcm_local_properties ()
 
  for (const insn_info *insn : bb->real_nondebug_insns ())
{
- if ((info.has_nonvlmax_reg_avl ()
-  && find_access (insn->defs (), REGNO (info.get_avl ())))
- || (info.has_vl ()
- && find_access (insn->uses (),
- REGNO (info.get_vl ()))))
+ if (info.has_nonvlmax_reg_avl ()
+ && find_access (insn->defs (), REGNO (info.get_avl ())))
{
  bitmap_clear_bit (m_transp[bb_index], i);
  break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112713-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112713-1.c
new file mode 100644
index 000..76402ab6167
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112713-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+size_t
+foo (char const *buf, size_t len)
+{
+   size_t sum = 0;
+   size_t vl = __riscv_vsetvlmax_e8m8();
+   size_t step = vl * 4;
+   const char *it = buf, *end = buf + len;
+   for(; it + step <= end; ) {
+   it += vl;
+   vint8m8_t v3 = __riscv_vle8_v_i8m8((void*)it, vl); it += vl;
+   vbool1_t m3 = __riscv_vmsgt_vx_i8m8_b1(v3, -65, vl);
+   sum += __riscv_vcpop_m_b1(m3, vl);
+   }
+   return sum;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112713-2.c 
b/gcc/testsuite/gcc

Re: [PATCH] libsanitizer: Check assembler support for symbol assignment [PR112563]

2023-11-27 Thread Jakub Jelinek
On Mon, Nov 27, 2023 at 02:20:25PM +0100, Rainer Orth wrote:
> Hi Jakub,
> 
> >> 2023-11-23  Rainer Orth  
> >> 
> >>libsanitizer:
> >>PR libsanitizer/112563
> >>* configure.ac (libsanitizer_cv_as_sym_assign): Check for
> >>assembler symbol assignment support.
> >>* configure, config.h.in: Regenerate.
> >>* sanitizer_common/sanitizer_redefine_builtins.h: Include config.h.
> >>Check HAVE_AS_SYM_ASSIGN.
> >
> > Can you please
> > 1) split it into 2 patches, one touching config* which is owned by GCC (and
> >Makefiles, see later), one just 
> > sanitizer_common/sanitizer_redefine_builtins.h
> > 2) avoid using config.h in, instead use AC_SUBST and add 
> > @HAVE_AS_SYM_ASSIGN@
> >to Makefile.am's DEFS where needed (either expanding to nothing or
> >-DHAVE_AS_SYM_ASSIGN=1)?  The reason is to minimize changes to imported
> >sources
> >
> > Once the sanitizer_common/sanitizer_redefine_builtins.h change (just
> > the && defined(HAVE_AS_SYM_ASSIGN) addition) patch is committed and pushed
> > upstream, add its commit has LOCAL_PATCHES.
> 
> But will they accept a patch to check a macro never set anywhere in and
> irrelevant to LLVM?  That's why I kept all in one patch, to be GCC-local.

I meant the patch would be gcc local.
But, for later we need only the changes to the imported files be in one
commit, not others, because merge.sh will not revert the GCC owned changes,
just the imported ones, and so that is what should be reapplied.
And, the preference of not using config.h is because we do it like that
for other stuff already (exactly to minimize amount of local changes).

> If we go (or at least try) this upstream route, should I wait for
> approval there and than commit both parts to GCC, keeping it in my local
> tree until then?
> 
> > Note, your ChangeLog entry was pretending config.h include has been added
> > to one header, but it went to a different one instead.
> 
> Drats, that's what you get for starting one way and adjusting later ;-)

Jakub



Re: [PATCH] libsanitizer: Check assembler support for symbol assignment [PR112563]

2023-11-27 Thread Rainer Orth
Hi Jakub,

>> But will they accept a patch to check a macro never set anywhere in and
>> irrelevant to LLVM?  That's why I kept all in one patch, to be GCC-local.
>
> I meant the patch would be gcc local.
> But, for later we need only the changes to the imported files be in one
> commit, not others, because merge.sh will not revert the GCC owned changes,
> just the imported ones, and so that is what should be reapplied.
> And, the preference of not using config.h is because we do it like that
> for other stuff already (exactly to minimize amount of local changes).

ah, now I get it.  Will rework the patch accordingly.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[committed] amdgcn: Disallow TImode vector permute

2023-11-27 Thread Andrew Stubbs
This fixes an ICE that affects some testsuite compiles that use vector 
extensions, but probably not much real code (certainly not for offloading).


Andrew

amdgcn: Disallow TImode vector permute

We don't support it and it doesn't happen without vector extensions, so
just remove the unhandled case.

Fixes gcc.dg/pr78575.c failure.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Disallow TImode.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 52c8a0e409c..22d2b6ebf6d 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5050,7 +5050,9 @@ gcn_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
  rtx dst, rtx src0, rtx src1,
  const vec_perm_indices & sel)
 {
-  if (vmode != op_mode)
+  if (vmode != op_mode
+  || !VECTOR_MODE_P (vmode)
+  || GET_MODE_INNER (vmode) == TImode)
 return false;
 
   unsigned int nelt = GET_MODE_NUNITS (vmode);


[PATCH] tree-optimization/112653 - PTA and return

2023-11-27 Thread Richard Biener
The following separates the escape solution for return stmts not
only during points-to solving but also for later querying.  This
requires adjusting the points-to-global tests to include escapes
through returns.  Technically the patch replaces the existing
post-processing which computes the transitive closure of the
returned value solution by a proper artificial variable with
transitive closure constraints.  Instead of adding the solution
to escaped we track it separately.
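
For illustration, a minimal C sketch of the kind of function this affects. It uses the standard malloc and memmove rather than the builtins, and copy_buffer is a hypothetical name, not something from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of the N bytes at SRC, or NULL on allocation
   failure.  DST comes straight from malloc and leaves the function only
   through the return statement, so it cannot overlap SRC inside the
   function; that is the fact that allows the memmove to be treated as a
   memcpy.  */
char *
copy_buffer (const char *src, size_t n)
{
  char *dst = malloc (n);
  if (dst != NULL)
    memmove (dst, src, n);
  return dst;
}
```

The point of tracking the return escape separately is that a pointer which escapes only by being returned does not have to be treated like one that was stored to global memory.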

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

PR tree-optimization/112653
* gimple-ssa.h (gimple_df): Add escaped_return solution.
* tree-ssa.cc (init_tree_ssa): Reset it.
(delete_tree_ssa): Likewise.
* tree-ssa-structalias.cc (escaped_return_id): New.
(find_func_aliases): Handle non-IPA return stmts by
adding to ESCAPED_RETURN.
(set_uids_in_ptset): Adjust HEAP escaping to also cover
escapes through return.
(init_base_vars): Initialize ESCAPED_RETURN.
(compute_points_to_sets): Replace ESCAPED post-processing
with recording the ESCAPED_RETURN solution.
* tree-ssa-alias.cc (ref_may_alias_global_p_1): Check
the ESCAPED_RETURN solution.
(dump_alias_info): Dump it.
* cfgexpand.cc (update_alias_info_with_stack_vars): Update it.
* ipa-icf.cc (sem_item_optimizer::fixup_points_to_sets):
Likewise.
* tree-parloops.cc (expand_call_inline): Reset it.
* tree-sra.cc (maybe_add_sra_candidate): Check it.

* gcc.dg/tree-ssa/pta-return-1.c: New testcase.
---
 gcc/cfgexpand.cc |   3 +-
 gcc/gimple-ssa.h |   4 +-
 gcc/ipa-icf.cc   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pta-return-1.c |  16 +++
 gcc/tree-inline.cc   |   5 +-
 gcc/tree-parloops.cc |   5 +-
 gcc/tree-sra.cc  |   2 +-
 gcc/tree-ssa-alias.cc|   6 +-
 gcc/tree-ssa-structalias.cc  | 124 +++
 gcc/tree-ssa.cc  |   2 +
 10 files changed, 85 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pta-return-1.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index e58327b239b..feed001f3c9 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -863,7 +863,8 @@ update_alias_info_with_stack_vars (void)
 
   add_partitioned_vars_to_ptset (&cfun->gimple_df->escaped,
 decls_to_partitions, &visited, temp);
-
+  add_partitioned_vars_to_ptset (&cfun->gimple_df->escaped_return,
+decls_to_partitions, &visited, temp);
   delete decls_to_partitions;
   BITMAP_FREE (temp);
 }
diff --git a/gcc/gimple-ssa.h b/gcc/gimple-ssa.h
index f2cffa2b159..79637058f70 100644
--- a/gcc/gimple-ssa.h
+++ b/gcc/gimple-ssa.h
@@ -76,8 +76,10 @@ struct GTY(()) gimple_df {
   /* Artificial variable used for the virtual operand FUD chain.  */
   tree vop;
 
-  /* The PTA solution for the ESCAPED artificial variable.  */
+  /* The PTA solution for the ESCAPED and ESCAPED_RETURN artificial
+ variables.  */
   struct pt_solution escaped;
+  struct pt_solution escaped_return;
 
   /* A map of decls to artificial ssa-names that point to the partition
  of the decl.  */
diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index bbdfd445397..c72c9d57a80 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -3506,6 +3506,7 @@ sem_item_optimizer::fixup_points_to_sets (void)
&& SSA_NAME_PTR_INFO (name))
  fixup_pt_set (&SSA_NAME_PTR_INFO (name)->pt);
   fixup_pt_set (&fn->gimple_df->escaped);
+  fixup_pt_set (&fn->gimple_df->escaped_return);
 
/* The above gets us to 99% I guess, at least catching the
  address compares.  Below also gets us aliasing correct
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-return-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pta-return-1.c
new file mode 100644
index 000..9c2416e7810
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-return-1.c
@@ -0,0 +1,16 @@
+/* PR112653 */
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+char *test;
+char *
+copy_test ()
+{
+  char *test2 = __builtin_malloc (1000);
+  __builtin_memmove (test2, test, 1000);
+  return test2;
+}
+
+/* We should be able to turn the memmove into memcpy by means of alias
+   analysis.  */
+/* { dg-final { scan-tree-dump "memcpy" "optimized" } } */
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 0b14118b94b..59847166842 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -5147,7 +5147,10 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id,
 
   /* Reset the escaped solution.  */
   if (cfun->gimple_df)
-pt_solution_reset (&cfun->gimple_df->escaped);
+{
+  pt_solution_reset (&cfun->g

bpf: Throw error when external libcalls are generated.

2023-11-27 Thread Cupertino Miranda


Author: Cupertino Miranda 
Hi everyone,

The attached patch is a temporary solution for the lack of a proper linker
and of external library linking on the eBPF platform.
Any calls created by the compiler that would usually be defined within
libgcc will end up undefined in bpftool when the GCC-compiled code is
passed to it.

This patch moves that error forward into the compiler, by verifying whether
any of those calls are being generated and reporting them as errors.

Looking forward to your comments.

Cheers,
Cupertino

commit c2110ae497c7ff83c309f172bc265973652b760d
This patch enables errors when external calls are created.

When architectural limitations or the use of builtins cause the compiler
to create calls to functions in external libraries that implement the
functionality, GCC will now report an error stating that these function
calls are not compatible with the eBPF target.
Examples are the use of __builtin_memmove, and signed division in
BPF ISA v3 or below, which requires a call to __divdi3.
This is currently an eBPF limitation: eBPF does not support linking of
object files, only "raw" non-linked ones.  Those object files are
loaded and relocated by libbpf and the kernel.
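
A minimal sketch of source code of the kind described above, written as plain C. The function names are hypothetical. On a hosted target these compile to native instructions or ordinary library calls; the claim that they lead to external calls such as __divdi3 applies to the BPF target as described in this message, not to the host.

```c
#include <string.h>

/* Signed 64-bit division.  Per the description above, BPF ISA v3 and
   below would need a call to __divdi3 from libgcc for this.  */
long long
signed_div (long long a, long long b)
{
  return a / b;
}

/* Overlapping copy: drop the first byte of BUF by moving the remaining
   LEN - 1 bytes down by one.  This may be emitted as an out-of-line
   memmove call.  */
void
shift_left_by_one (char *buf, size_t len)
{
  if (len > 1)
    memmove (buf, buf + 1, len - 1);
}
```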

gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_output_call): Report error in case the
function call is for a builtin.
(bpf_external_libcall): Added target hook to detect and report
error when other external calls that are not builtins.

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 0c9d5257c384..1c84113055b1 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -744,6 +744,15 @@ bpf_output_call (rtx target)
xops[0] = GEN_INT (TREE_INT_CST_LOW (TREE_VALUE (attr_args)));
output_asm_insn ("call\t%0", xops);
  }
+   else if (fndecl_built_in_p (decl))
+ {
+   /* For now lets report this as an error while we are not able to
+  link eBPF object files.  In particular with libgcc.  */
+   tree name = DECL_NAME (decl);
+   error ("call to external builtin %s in function, which is not supported by "
+  "eBPF", name != NULL_TREE ? IDENTIFIER_POINTER (name) : "(anon)");
+   output_asm_insn ("call 0", NULL);
+ }
else
  output_asm_insn ("call\t%0", &target);

@@ -763,6 +772,18 @@ bpf_output_call (rtx target)
   return "";
 }

+static void
+bpf_external_libcall (rtx fun)
+{
+  tree decl = SYMBOL_REF_DECL (fun);
+  tree name = DECL_NAME (decl);
+  error ("call to external libcall %s in function, which is not supported by "
+"eBPF", name != NULL_TREE ? IDENTIFIER_POINTER (name) : "(anon)");
+}
+
+#undef  TARGET_ASM_EXTERNAL_LIBCALL
+#define TARGET_ASM_EXTERNAL_LIBCALL bpf_external_libcall
+
 /* Print register name according to assembly dialect.  In normal
syntax registers are printed like %rN where N is the register
number.
diff --git a/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c b/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
index 4036570ac601..fec720584e48 100644
--- a/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
+++ b/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
@@ -6,7 +6,7 @@ foo (int *p, int *expected, int desired)
 {
   return __atomic_compare_exchange (p, expected, &desired, 0,
__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
-}
+} /* { dg-error "call to external builtin" } */

 int
 foo64 (long *p, long *expected, long desired)
diff --git a/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c b/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
index 044a2f76474b..ea1b8e48928a 100644
--- a/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
+++ b/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
@@ -9,7 +9,7 @@ long
 test_atomic_fetch_add (long x)
 {
   return __atomic_fetch_add (&val, x, __ATOMIC_ACQUIRE);
-}
+} /* { dg-error "call to external builtin" } */

 long
 test_atomic_fetch_sub (long x)
diff --git a/gcc/testsuite/gcc.target/bpf/atomic-op-3.c b/gcc/testsuite/gcc.target/bpf/atomic-op-3.c
index b2ce28926347..fefafd6b748f 100644
--- a/gcc/testsuite/gcc.target/bpf/atomic-op-3.c
+++ b/gcc/testsuite/gcc.target/bpf/atomic-op-3.c
@@ -20,7 +20,7 @@ void
 test_atomic_and (int x)
 {
   __atomic_and_fetch (&val, x, __ATOMIC_ACQUIRE);
-}
+} /* { dg-error "call to external builtin" } */

 void
 test_atomic_nand (int x)
diff --git a/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c b/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c
index 3b6324e966b8..eab695bf388c 100644
--- a/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c
+++ b/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c
@@ -7,7 +7,7 @@ int foo (int *p, int *new)
   int old;
   __atomic_exchange (p, new, &old, __ATOMIC_RELAXED);
   return old;
-}
+} /* { dg-error "call to external builtin" } */

 int foo64 (long *p, long *new)
 {
diff --git a/gcc/testsuite/gcc.target/bpf/diag-sdiv.c

[pushed] aarch64: Move and generalise vect_all_same

2023-11-27 Thread Richard Sandiford
The fix for PR106329 needs a way of testing for a ptrue of a particular
element size.  We already had such a function for svlast, so this patch
moves it to common code and generalises it to work with all kinds of
vectors.
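
To make the property concrete, here is a minimal C sketch on a plain byte array. It is only a model: it ignores GCC's pattern-based VECTOR_CST encoding entirely, and the name all_same_with_step is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if, for every in-range I, element STEP*I of V equals
   element 0.  NELTS is the number of elements in V.  Returns false for
   an empty vector or a zero STEP.  */
bool
all_same_with_step (const unsigned char *v, size_t nelts, size_t step)
{
  if (nelts == 0 || step == 0)
    return false;
  for (size_t i = 0; i < nelts; i += step)
    if (v[i] != v[0])
      return false;
  return true;
}
```

For a predicate stored one flag per byte, { 1, 0, 0, 0, 1, 0, 0, 0 } passes with a step of 4 but fails with a step of 1, which is the distinction a test for a ptrue of a particular element size relies on.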

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64-sve-builtins.h (vector_cst_all_same): Declare.
* config/aarch64/aarch64-sve-builtins.cc (vector_cst_all_same): New
function, a generalized replacement of...
* config/aarch64/aarch64-sve-builtins-base.cc
(svlast_impl::vect_all_same): ...this.
(svlast_impl::fold): Update accordingly.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 17 ++--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 20 +++
 gcc/config/aarch64/aarch64-sve-builtins.h |  2 ++
 3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 9010ecca6da..a6e527bedd1 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1105,19 +1105,6 @@ public:
   bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
   bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
 
-  bool vect_all_same (tree v, int step) const
-  {
-int i;
-int nelts = vector_cst_encoded_nelts (v);
-tree first_el = VECTOR_CST_ENCODED_ELT (v, 0);
-
-for (i = 0; i < nelts; i += step)
-  if (!operand_equal_p (VECTOR_CST_ENCODED_ELT (v, i), first_el, 0))
-   return false;
-
-return true;
-  }
-
   /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
  BIT_FIELD_REF lowers to Advanced SIMD element extract, so we have to
  ensure the index of the element being accessed is in the range of a
@@ -1142,7 +1129,7 @@ public:
   without a linear search of the predicate vector:
   1.  LASTA if predicate is all true, return element 0.
   2.  LASTA if predicate all false, return element 0.  */
-   if (is_lasta () && vect_all_same (pred, step_1))
+   if (is_lasta () && vector_cst_all_same (pred, step_1))
  {
b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
bitsize_int (step * BITS_PER_UNIT), bitsize_int (0));
@@ -1152,7 +1139,7 @@ public:
/* Handle the all-false case for LASTB where SVE VL == 128b -
   return the highest numbered element.  */
if (is_lastb () && known_eq (BYTES_PER_SVE_VECTOR, 16)
-   && vect_all_same (pred, step_1)
+   && vector_cst_all_same (pred, step_1)
&& integer_zerop (VECTOR_CST_ENCODED_ELT (pred, 0)))
  {
b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 161a14edde7..b61156302cf 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2541,6 +2541,26 @@ function_checker::check ()
   return shape->check (*this);
 }
 
+/* Return true if V is a vector constant and if, for every in-range integer I,
+   element STEP*I is equal to element 0.  */
+bool
+vector_cst_all_same (tree v, unsigned int step)
+{
+  if (TREE_CODE (v) != VECTOR_CST)
+return false;
+
+  /* VECTOR_CST_NELTS_PER_PATTERN applies to any multiple of
+ VECTOR_CST_NPATTERNS.  */
+  unsigned int lcm = least_common_multiple (step, VECTOR_CST_NPATTERNS (v));
+  unsigned int nelts = lcm * VECTOR_CST_NELTS_PER_PATTERN (v);
+  tree first_el = VECTOR_CST_ENCODED_ELT (v, 0);
+  for (unsigned int i = 0; i < nelts; i += step)
+if (!operand_equal_p (VECTOR_CST_ENCODED_ELT (v, i), first_el, 0))
+  return false;
+
+  return true;
+}
+
 gimple_folder::gimple_folder (const function_instance &instance, tree fndecl,
  gimple_stmt_iterator *gsi_in, gcall *call_in)
   : function_call_info (gimple_location (call_in), instance, fndecl),
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index a301570b82e..d646df1c026 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -672,6 +672,8 @@ extern tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
 extern tree acle_svpattern;
 extern tree acle_svprfop;
 
+bool vector_cst_all_same (tree, unsigned int);
+
 /* Return the ACLE type svbool_t.  */
 inline tree
 get_svbool_t (void)
-- 
2.25.1



[pushed] aarch64: Remove redundant zeroing/merging in SVE intrinsics [PR106326]

2023-11-27 Thread Richard Sandiford
Many predicated SVE intrinsics provide three forms of predication:
zeroing, merging, and any/dont-care.  All three are equivalent when
the predicate is all-true, so this patch drops the zeroing and
merging in that case.
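
The equivalence can be sketched on plain arrays. This is a scalar model in C, not the SVE intrinsics: add_z, add_m and add_x are hypothetical helpers that mimic the zeroing, merging and dont-care forms of a predicated add, with the merging form taking inactive lanes from the first input.

```c
#include <stdbool.h>
#include <stddef.h>

/* Zeroing form: inactive lanes of the result become 0.  */
void
add_z (int *res, const bool *pred, const int *x, const int *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = pred[i] ? x[i] + y[i] : 0;
}

/* Merging form: inactive lanes take the value of the first input.  */
void
add_m (int *res, const bool *pred, const int *x, const int *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = pred[i] ? x[i] + y[i] : x[i];
}

/* Dont-care form: inactive lanes are unspecified, so this model simply
   computes every lane and ignores the predicate.  */
void
add_x (int *res, const bool *pred, const int *x, const int *y, size_t n)
{
  (void) pred;
  for (size_t i = 0; i < n; i++)
    res[i] = x[i] + y[i];
}
```

When every element of pred is true, neither fallback value is ever selected, so all three produce the same result.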

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
PR target/106326
* config/aarch64/aarch64-sve-builtins.h (is_ptrue): Declare.
* config/aarch64/aarch64-sve-builtins.cc (is_ptrue): New function.
(gimple_folder::redirect_pred_x): Likewise.
(gimple_folder::fold): Use it.

gcc/testsuite/
PR target/106326
* gcc.target/aarch64/sve/acle/general/pr106326_1.c: New test.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc|  46 +++
 gcc/config/aarch64/aarch64-sve-builtins.h |   3 +
 .../aarch64/sve/acle/general/pr106326_1.c | 378 ++
 3 files changed, 427 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index b61156302cf..ee81282a0be 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2561,6 +2561,17 @@ vector_cst_all_same (tree v, unsigned int step)
   return true;
 }
 
+/* Return true if V is a constant predicate that acts as a ptrue when
+   predicating STEP-byte elements.  */
+bool
+is_ptrue (tree v, unsigned int step)
+{
+  return (TREE_CODE (v) == VECTOR_CST
+ && TYPE_MODE (TREE_TYPE (v)) == VNx16BImode
+ && integer_nonzerop (VECTOR_CST_ENCODED_ELT (v, 0))
+ && vector_cst_all_same (v, step));
+}
+
 gimple_folder::gimple_folder (const function_instance &instance, tree fndecl,
  gimple_stmt_iterator *gsi_in, gcall *call_in)
   : function_call_info (gimple_location (call_in), instance, fndecl),
@@ -2635,6 +2646,37 @@ gimple_folder::redirect_call (const function_instance &instance)
   return call;
 }
 
+/* Redirect _z and _m calls to _x functions if the predicate is all-true.
+   This allows us to use unpredicated instructions, where available.  */
+gimple *
+gimple_folder::redirect_pred_x ()
+{
+  if (pred != PRED_z && pred != PRED_m)
+return nullptr;
+
+  if (gimple_call_num_args (call) < 2)
+return nullptr;
+
+  tree lhs_type = TREE_TYPE (TREE_TYPE (fndecl));
+  tree arg0_type = type_argument_type (TREE_TYPE (fndecl), 1);
+  tree arg1_type = type_argument_type (TREE_TYPE (fndecl), 2);
+  if (!VECTOR_TYPE_P (lhs_type)
+  || !VECTOR_TYPE_P (arg0_type)
+  || !VECTOR_TYPE_P (arg1_type))
+return nullptr;
+
+  auto lhs_step = element_precision (lhs_type);
+  auto rhs_step = element_precision (arg1_type);
+  auto step = MAX (lhs_step, rhs_step);
+  if (!multiple_p (step, BITS_PER_UNIT)
+  || !is_ptrue (gimple_call_arg (call, 0), step / BITS_PER_UNIT))
+return nullptr;
+
+  function_instance instance (*this);
+  instance.pred = PRED_x;
+  return redirect_call (instance);
+}
+
 /* Fold the call to constant VAL.  */
 gimple *
 gimple_folder::fold_to_cstu (poly_uint64 val)
@@ -2707,6 +2749,10 @@ gimple_folder::fold ()
   if (!lhs && TREE_TYPE (gimple_call_fntype (call)) != void_type_node)
 return NULL;
 
+  /* First try some simplifications that are common to many functions.  */
+  if (auto *call = redirect_pred_x ())
+return call;
+
   return base->fold (*this);
 }
 
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index d646df1c026..b9148c51b28 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -500,6 +500,8 @@ public:
   tree load_store_cookie (tree);
 
   gimple *redirect_call (const function_instance &);
+  gimple *redirect_pred_x ();
+
   gimple *fold_to_cstu (poly_uint64);
   gimple *fold_to_pfalse ();
   gimple *fold_to_ptrue ();
@@ -673,6 +675,7 @@ extern tree acle_svpattern;
 extern tree acle_svprfop;
 
 bool vector_cst_all_same (tree, unsigned int);
+bool is_ptrue (tree, unsigned int);
 
 /* Return the ACLE type svbool_t.  */
 inline tree
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c
new file mode 100644
index 000..34604a8df6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c
@@ -0,0 +1,378 @@
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** add1:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add1 (svint32_t x, svint32_t y)
+{
+  return svadd_z (svptrue_b8 (), x, y);
+}
+
+/*
+** add2:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add2 (svint32_t x, svint32_t y)
+{
+  return svadd_z (svptrue_b16 (), x, y);
+}
+
+/*
+** add3:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add3 (svint32_t x

hurd: Add multilib paths for gnu-x86_64

2023-11-27 Thread Thomas Schwinge
Hi!

On 2023-10-28T21:19:59+0200, Samuel Thibault  wrote:
> We need the multilib paths in gcc to find e.g. glibc crt files on
> Debian.

ACK.

> This is essentially based on t-linux64 version.

Yes, but isn't the overall setup diverged from GNU/Linux?

Currently, x86_64 GNU/Hurd first gets 'i386/t-linux64', whose definitions
are only later:

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -5828,6 +5828,9 @@ case ${target} in
>   visium-*-*)
>   target_cpu_default2="TARGET_CPU_$with_cpu"
>   ;;
> + x86_64-*-gnu*)
> + tmake_file="$tmake_file i386/t-gnu64"
> + ;;
>  esac

... then here (effectively) overwritten by 'i386/t-gnu64'.  Instead, I
suppose, we should handle 'i386/t-linux64' and 'i386/t-gnu64' alike, and
resolve relevant configuration differences.

As far as I can tell, 'i386/t-linux64' is also used for multilib-enabled
('test x$enable_targets = xall') x86 GNU/Linux, and that's not
(correspondingly) done for x86 GNU/Hurd?

However, such things can certainly be resolved incrementally, later on.
I understand that your change does work for you as-is, so I've now pushed
that to master branch in commit 5707e9db9c398d311defc80c5b7822c9a07ead60
"hurd: Add multilib paths for gnu-x86_64", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 5707e9db9c398d311defc80c5b7822c9a07ead60 Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Sat, 6 May 2023 13:50:36 +0200
Subject: [PATCH] hurd: Add multilib paths for gnu-x86_64

We need the multilib paths in gcc to find e.g. glibc crt files on
Debian.  This is essentially based on t-linux64 version.

gcc/ChangeLog:

	* config/i386/t-gnu64: New file.
	* config.gcc [x86_64-*-gnu*]: Add i386/t-gnu64 to
	tmake_file.
---
 gcc/config.gcc  |  3 +++
 gcc/config/i386/t-gnu64 | 38 ++
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/config/i386/t-gnu64

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3000379cafc..e62849c1230 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5973,6 +5973,9 @@ case ${target} in
 	visium-*-*)
 		target_cpu_default2="TARGET_CPU_$with_cpu"
 		;;
+	x86_64-*-gnu*)
+		tmake_file="$tmake_file i386/t-gnu64"
+		;;
 esac
 
 t=
diff --git a/gcc/config/i386/t-gnu64 b/gcc/config/i386/t-gnu64
new file mode 100644
index 000..23ee6823d65
--- /dev/null
+++ b/gcc/config/i386/t-gnu64
@@ -0,0 +1,38 @@
+# Copyright (C) 2002-2023 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# On Debian, Ubuntu and other derivative distributions, the 32bit libraries
+# are found in /lib32 and /usr/lib32, /lib64 and /usr/lib64 are symlinks to
+# /lib and /usr/lib, while other distributions install libraries into /lib64
+# and /usr/lib64.  The LSB does not enforce the use of /lib64 and /usr/lib64,
+# it doesn't tell anything about the 32bit libraries on those systems.  Set
+# MULTILIB_OSDIRNAMES according to what is found on the target.
+
+# To support i386, x86-64 and x32 libraries, the directory structrue
+# should be:
+#
+# 	/lib has i386 libraries.
+# 	/lib64 has x86-64 libraries.
+# 	/libx32 has x32 libraries.
+#
+comma=,
+MULTILIB_OPTIONS= $(subst $(comma),/,$(TM_MULTILIB_CONFIG))
+MULTILIB_DIRNAMES   = $(patsubst m%, %, $(subst /, ,$(MULTILIB_OPTIONS)))
+MULTILIB_OSDIRNAMES = m64=../lib64$(call if_multiarch,:x86_64-gnu)
+MULTILIB_OSDIRNAMES+= m32=$(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call if_multiarch,:i386-gnu)
+MULTILIB_OSDIRNAMES+= mx32=../libx32$(call if_multiarch,:x86_64-gnux32)
-- 
2.34.1



Re: bpf: Throw error when external libcalls are generated.

2023-11-27 Thread Jose E. Marchesi


Hi Cuper.
OK. Thanks for the patch.

> Hi everyone,
>
> The attached patch is a temporary solution for the lack of a proper linker
> and of external library linking on the eBPF platform.
> Any calls created by the compiler that would usually be defined within
> libgcc will end up undefined in bpftool when the GCC-compiled code is
> passed to it.
>
> This patch moves that error forward into the compiler, by verifying whether
> any of those calls are being generated and reporting them as errors.
>
> Looking forward to your comments.
>
> Cheers,
> Cupertino
>
> commit c2110ae497c7ff83c309f172bc265973652b760d
> Author: Cupertino Miranda 
> Date:   Thu Nov 23 22:28:01 2023 +
>
> This patch enables errors when external calls are created.
> 
> When architectural limitations or the use of builtins cause the compiler
> to create calls to functions in external libraries that implement the
> functionality, GCC will now report an error stating that these function
> calls are not compatible with the eBPF target.
> Examples are the use of __builtin_memmove, and signed division in
> BPF ISA v3 or below, which requires a call to __divdi3.
> This is currently an eBPF limitation: eBPF does not support linking of
> object files, only "raw" non-linked ones.  Those object files are
> loaded and relocated by libbpf and the kernel.
> 
> gcc/ChangeLog:
> * config/bpf/bpf.cc (bpf_output_call): Report error in case the
> function call is for a builtin.
> (bpf_external_libcall): Added target hook to detect and report
> error when other external calls that are not builtins.
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 0c9d5257c384..1c84113055b1 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -744,6 +744,15 @@ bpf_output_call (rtx target)
>   xops[0] = GEN_INT (TREE_INT_CST_LOW (TREE_VALUE (attr_args)));
>   output_asm_insn ("call\t%0", xops);
> }
> + else if (fndecl_built_in_p (decl))
> +   {
> + /* For now lets report this as an error while we are not able to
> +link eBPF object files.  In particular with libgcc.  */
> + tree name = DECL_NAME (decl);
> + error ("call to external builtin %s in function, which is not supported by "
> +"eBPF", name != NULL_TREE ? IDENTIFIER_POINTER (name) : "(anon)");
> + output_asm_insn ("call 0", NULL);
> +   }
>   else
> output_asm_insn ("call\t%0", &target);
>  
> @@ -763,6 +772,18 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +static void
> +bpf_external_libcall (rtx fun)
> +{
> +  tree decl = SYMBOL_REF_DECL (fun);
> +  tree name = DECL_NAME (decl);
> +  error ("call to external libcall %s in function, which is not supported by "
> +  "eBPF", name != NULL_TREE ? IDENTIFIER_POINTER (name) : "(anon)");
> +}
> +
> +#undef  TARGET_ASM_EXTERNAL_LIBCALL
> +#define TARGET_ASM_EXTERNAL_LIBCALL bpf_external_libcall
> +
>  /* Print register name according to assembly dialect.  In normal
> syntax registers are printed like %rN where N is the register
> number.
> diff --git a/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c b/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
> index 4036570ac601..fec720584e48 100644
> --- a/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
> +++ b/gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
> @@ -6,7 +6,7 @@ foo (int *p, int *expected, int desired)
>  {
>return __atomic_compare_exchange (p, expected, &desired, 0,
>   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
> -}
> +} /* { dg-error "call to external builtin" } */
>  
>  int
>  foo64 (long *p, long *expected, long desired)
> diff --git a/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c b/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
> index 044a2f76474b..ea1b8e48928a 100644
> --- a/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
> +++ b/gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
> @@ -9,7 +9,7 @@ long
>  test_atomic_fetch_add (long x)
>  {
>return __atomic_fetch_add (&val, x, __ATOMIC_ACQUIRE);
> -}
> +} /* { dg-error "call to external builtin" } */
>  
>  long
>  test_atomic_fetch_sub (long x)
> diff --git a/gcc/testsuite/gcc.target/bpf/atomic-op-3.c b/gcc/testsuite/gcc.target/bpf/atomic-op-3.c
> index b2ce28926347..fefafd6b748f 100644
> --- a/gcc/testsuite/gcc.target/bpf/atomic-op-3.c
> +++ b/gcc/testsuite/gcc.target/bpf/atomic-op-3.c
> @@ -20,7 +20,7 @@ void
>  test_atomic_and (int x)
>  {
>__atomic_and_fetch (&val, x, __ATOMIC_ACQUIRE);
> -}
> +} /* { dg-error "call to external builtin" } */
>  
>  void
>  test_atomic_nand (int x)
> diff --git a/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c b/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c
> index 3b6324e966b8..eab695bf388c 100644
> --- a/gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c
> +++ b/gcc/testsuite/gcc.target/bpf/atomic-xchg-

hurd: Ad default-pie and static-pie support

2023-11-27 Thread Thomas Schwinge
Hi!

On 2023-10-28T21:20:39+0200, Samuel Thibault  wrote:
> This fixes the Hurd spec in the default-pie case, and adds static-pie
> support.

I understand that your change does work for you as-is, so I've now pushed
that to master branch in commit c768917402d4cba69a92c737e56e177f5b8ab0df
"hurd: Ad default-pie and static-pie support", see attached.


Grüße
 Thomas


From c768917402d4cba69a92c737e56e177f5b8ab0df Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Sat, 6 May 2023 13:55:44 +0200
Subject: [PATCH] hurd: Ad default-pie and static-pie support

This fixes the Hurd spec in the default-pie case, and adds static-pie
support.

gcc/ChangeLog:

	* config/i386/gnu.h: Use PIE_SPEC, add static-pie case.
	* config/i386/gnu64.h: Use PIE_SPEC, add static-pie case.
---
 gcc/config/i386/gnu.h   | 6 +++---
 gcc/config/i386/gnu64.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
index 8dc6d9ee4e3..e776144f96c 100644
--- a/gcc/config/i386/gnu.h
+++ b/gcc/config/i386/gnu.h
@@ -27,12 +27,12 @@ along with GCC.  If not, see <http://www.gnu.org/licenses/>.
 #undef	STARTFILE_SPEC
 #if defined HAVE_LD_PIE
 #define STARTFILE_SPEC \
-  "%{!shared: %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+  "%{!shared: %{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;" PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC ":crtbeginS.o%s;:crtbegin.o%s}"
 #else
 #define STARTFILE_SPEC \
   "%{!shared: %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+   crti.o%s %{static:crtbeginT.o%s;shared:crtbeginS.o%s;:crtbegin.o%s}"
 #endif
 
 #ifdef TARGET_LIBC_PROVIDES_SSP
diff --git a/gcc/config/i386/gnu64.h b/gcc/config/i386/gnu64.h
index a411f0e802a..332372fa067 100644
--- a/gcc/config/i386/gnu64.h
+++ b/gcc/config/i386/gnu64.h
@@ -31,10 +31,10 @@ along with GCC.  If not, see <http://www.gnu.org/licenses/>.
 #undef	STARTFILE_SPEC
 #if defined HAVE_LD_PIE
 #define STARTFILE_SPEC \
-  "%{!shared: %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+  "%{!shared: %{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;" PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC ":crtbeginS.o%s;:crtbegin.o%s}"
 #else
 #define STARTFILE_SPEC \
   "%{!shared: %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC ":crtbeginS.o%s;:crtbegin.o%s}"
 #endif
-- 
2.34.1
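
For readers unfamiliar with spec strings, the selection made by the
HAVE_LD_PIE variant of STARTFILE_SPEC above can be modelled roughly as
follows. This is only an illustrative sketch: the struct, its field
names and the helper functions are hypothetical, and the `pie` field
stands in for whatever PIE_SPEC matches in a given configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag set; names are illustrative, not GCC's.  */
struct link_flags
{
  bool shared;      /* -shared */
  bool is_static;   /* -static */
  bool static_pie;  /* -static-pie */
  bool pie;         /* stands in for a PIE_SPEC match */
  bool profile;     /* -pg, -p or -profile */
};

/* First startup object, or NULL when -shared suppresses it.
   Alternatives are tried in the order they appear in the spec.  */
static const char *
pick_crt1 (const struct link_flags *f)
{
  assert (f != NULL);
  if (f->shared)
    return NULL;
  if (f->profile)
    {
      if (f->static_pie)
        return "grcrt0.o";
      return f->is_static ? "gcrt0.o" : "gcrt1.o";
    }
  if (f->static_pie)
    return "rcrt0.o";
  if (f->is_static)
    return "crt0.o";
  return f->pie ? "Scrt1.o" : "crt1.o";
}

/* The crtbegin variant placed after crti.o.  */
static const char *
pick_crtbegin (const struct link_flags *f)
{
  assert (f != NULL);
  if (f->is_static)
    return "crtbeginT.o";
  if (f->shared || f->static_pie || f->pie)
    return "crtbeginS.o";
  return "crtbegin.o";
}

/* Compare two names, treating NULL as "no file".  */
static bool
same_name (const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp (a, b) == 0;
}

/* True when F selects exactly the given pair of startup files.  */
static bool
selects (const struct link_flags *f, const char *crt1,
         const char *crtbegin)
{
  return same_name (pick_crt1 (f), crt1)
         && same_name (pick_crtbegin (f), crtbegin);
}
```

In this model a plain link picks crt1.o and crtbegin.o, a PIE link
picks Scrt1.o and crtbeginS.o, and -static-pie picks rcrt0.o and
crtbeginS.o.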



Re: [PATCH 1/3] [GCC] arm: vst1_types_x2 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 06/10/2023 12:55, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1 intrinsic for arm32.


We avoid the terms arm32 and arm64 in the gnu toolchain because of the 
potential for conflict with wild-card regexps for old CPUs (arm3* and 
arm6*).  Please just use 'arm' in this case.
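
To illustrate the clash, here is a minimal sketch assuming the
wild-cards are shell-style patterns, checked with POSIX fnmatch(); the
helper name glob_matches is made up for this example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* True when NAME is caught by the shell-style wildcard PATTERN.  */
static bool
glob_matches (const char *pattern, const char *name)
{
  assert (pattern != NULL && name != NULL);
  return fnmatch (pattern, name, 0) == 0;
}
```

With this helper, glob_matches ("arm3*", "arm32") and
glob_matches ("arm6*", "arm64") both hold, while plain "arm" is caught
by neither pattern.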



This patch adds the _x2 variants of the vst1 intrinsic. Tests use xN so that 
the latter variants (_x3, _x4) could be added.
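
As a rough picture of what an _x2 store does, the sketch below is a
portable model, not the real intrinsic: vst1_s8_x2 writes the two
8-lane vectors of its argument to consecutive memory without
interleaving. The model_* types and function are hypothetical
stand-ins so that the example builds on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for int8x8_t and int8x8x2_t (illustrative only).  */
typedef struct { int8_t lane[8]; } model_int8x8_t;
typedef struct { model_int8x8_t val[2]; } model_int8x8x2_t;

/* Store val[0] then val[1] to 16 consecutive bytes at DST.  */
static void
model_vst1_s8_x2 (int8_t *dst, model_int8x8x2_t src)
{
  assert (dst != NULL);
  memcpy (dst, src.val[0].lane, sizeof src.val[0].lane);
  memcpy (dst + 8, src.val[1].lane, sizeof src.val[1].lane);
}
```

On an Arm target the equivalent call would simply be
vst1_s8_x2 (dst, src) with the types from arm_neon.h.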

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/



Please reformat this text section to be no more than 70 columns, so that 
it doesn't produce awkward line wraps in the commit message.
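
One simple way to check a message before sending is sketched below;
the function name fits_columns is hypothetical and the check counts
bytes, so it assumes plain ASCII text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when no line of TEXT is longer than LIMIT characters.  */
static bool
fits_columns (const char *text, size_t limit)
{
  size_t col = 0;

  assert (text != NULL);
  for (; *text != '\0'; text++)
    {
      if (*text == '\n')
        col = 0;                /* a new line starts at column 0 */
      else if (++col > limit)
        return false;           /* this line is too wide */
    }
  return true;
}
```

Calling fits_columns (msg, 70) on the commit message text would flag
the long description lines quoted above.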



gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1_u8_x2, vst1_u16_x2, vst1_u32_x2, vst1_u64_x32): New.


This line appears to be a duplicate of the one below, except for the 
bogus ..._x32 at the end.  I think just drop it.



 (vst1_s8_x2, vst1_s16_x2, vst1_s32_x2, vst1_s64_x2): New.
 (vst1_f16_x2, vst1_f32_x2): New.
 (vst1_p8_x2, vst1_p16_x2, vst1_p64_x2): New.
 (vst1_bf16_x2): New.
 * config/arm/arm_neon_builtins.def (vst1_x2): New entries.
 * config/arm/neon.md (vst1_x2): New.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1_base_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1_p64_xN_1.c: Add new tests.


OK with the above changes.

R.


---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/neon.md|  10 ++
  .../gcc.target/arm/simd/vst1_base_xN_1.c  |  67 ++
  .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |  13 ++
  .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |  13 ++
  .../gcc.target/arm/simd/vst1_p64_xN_1.c   |  13 ++
  7 files changed, 231 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c03be9912f8..4bd6093281b 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11242,6 +11242,14 @@ vst1_p64 (poly64_t * __a, poly64x1_t __b)
__builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11271,6 +11279,38 @@ vst1_s64 (int64_t * __a, int64x1_t __b)
__builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x2 (int8_t * __a, int8x8x2_t __b)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x2 (int16_t * __a, int16x4x2_t __b)
+{
+  union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x2 (int32_t * __a, int32x2x2_t __b)
+{
+  union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
+{
+  union { int64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11287,6 +11327,24 @@ vst1_f32 (float32_t * __a, float32x2_t __b)
__builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b);
  }
  
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x2 (fl

Re: [PATCH 1/3] [GCC] arm: vst1_types_x2 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 27/11/2023 14:56, Richard Earnshaw wrote:



On 06/10/2023 12:55, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN 
variants of the vst1 intrinsic for arm32.


We avoid the terms arm32 and arm64 in the gnu toolchain because of the 
potential for conflict with wild-card regexps for old CPUs (arm3* and 
arm6*).  Please just use 'arm' in this case.


This patch adds the _x2 variants of the vst1 intrinsic. Tests use xN 
so that the latter variants (_x3, _x4) could be added.


ACLE documents are at 
https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at 
https://developer.arm.com/documentation/ddi0487/latest/




Please reformat this text section to be no more than 70 columns, so that 
it doesn't produce awkward line wraps in the commit message.



gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1_u8_x2, vst1_u16_x2, vst1_u32_x2, vst1_u64_x32): New.


This line appears to be a duplicate of the one below, except for the 
bogus ..._x32 at the end.  I think just drop it.


Ah, the difference is u vs s.  So please fix the last entry.

R.




 (vst1_s8_x2, vst1_s16_x2, vst1_s32_x2, vst1_s64_x2): New.
 (vst1_f16_x2, vst1_f32_x2): New.
 (vst1_p8_x2, vst1_p16_x2, vst1_p64_x2): New.
 (vst1_bf16_x2): New.
 * config/arm/arm_neon_builtins.def (vst1_x2): New entries.
 * config/arm/neon.md (vst1_x2): New.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1_base_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1_p64_xN_1.c: Add new tests.


OK with the above changes.

R.


---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/neon.md    |  10 ++
  .../gcc.target/arm/simd/vst1_base_xN_1.c  |  67 ++
  .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |  13 ++
  .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |  13 ++
  .../gcc.target/arm/simd/vst1_p64_xN_1.c   |  13 ++
  7 files changed, 231 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c03be9912f8..4bd6093281b 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11242,6 +11242,14 @@ vst1_p64 (poly64_t * __a, poly64x1_t __b)
    __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
  }
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11271,6 +11279,38 @@ vst1_s64 (int64_t * __a, int64x1_t __b)
    __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
  }
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x2 (int8_t * __a, int8x8x2_t __b)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x2 (int16_t * __a, int16x4x2_t __b)
+{
+  union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x2 (int32_t * __a, int32x2x2_t __b)
+{
+  union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
+{
+  union { int64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11287,6 +11327,24 @@ vst1_f32 (float32_t * __a, float32x2_t __b)
    __builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b);
  }
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension

Re: [PATCH 2/3] [GCC] arm: vst1_types_x3 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 06/10/2023 12:55, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1 intrinsic for arm32.
This patch adds the _x3 variants of the vst1 intrinsic.



OK, but see comments on the first patch about naming and formatting.

R.


ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1_u8_x3, vst1_u16_x3, vst1_u32_x3, vst1_u64_x3): New.
 (vst1_s8_x3, vst1_s16_x3, vst1_s32_x3, vst1_s64_x3): New.
 (vst1_f16_x3, vst1_f32_x3): New.
 (vst1_p8_x3, vst1_p16_x3, vst1_p64_x3): New.
 (vst1_bf16_x3): New.
 * config/arm/arm_neon_builtins.def (vst1_x3): New entries.
 * config/arm/neon.md (vst1_x3): New.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1_base_xN_1.c: Add new test.
 * gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new test.
 * gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new test.
 * gcc.target/arm/simd/vst1_p64_xN_1.c: Add new test.
---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/neon.md|  10 ++
  .../gcc.target/arm/simd/vst1_base_xN_1.c  |  63 +-
  .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   7 +-
  .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   7 +-
  .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   7 +-
  7 files changed, 202 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 4bd6093281b..b01171e5966 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11250,6 +11250,14 @@ vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
__builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __b)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11311,6 +11319,38 @@ vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
__builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x3 (int16_t * __a, int16x4x3_t __b)
+{
+  union { int16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x3 (int32_t * __a, int32x2x3_t __b)
+{
+  union { int32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x3 (int64_t * __a, int64x1x3_t __b)
+{
+  union { int64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11345,6 +11385,24 @@ vst1_f32_x2 (float32_t * __a, float32x2x2_t __b)
__builtin_neon_vst1_x2v2sf ((__builtin_neon_sf *) __a, __bu.__o);
  }
  
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x3 (float16_t * __a, float16x4x3_t __b)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x3 (float32_t * __a, float32x2x3_t __b)
+{
+  union { float32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11405,6 +11463,38 @@ vst1_u64_x2 (uint64_t * __a, uint64x1x2_t __b)
__builtin_neon_vst1_x2di

Re: [PATCH 3/3] [GCC] arm: vst1_types_x4 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 06/10/2023 12:56, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1 intrinsic for arm32.
This patch adds the _x4 variants of the vst1 intrinsic.


OK, but please see comment on first patch about naming and formatting.

R.



ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1_u8_x4, vst1_u16_x4, vst1_u32_x4, vst1_u64_x4): New.
 (vst1_s8_x4, vst1_s16_x4, vst1_s32_x4, vst1_s64_x4): New.
 (vst1_f16_x4, vst1_f32_x4): New.
 (vst1_p8_x4, vst1_p16_x4, vst1_p64_x4): New.
 (vst1_bf16_x4): New.
 * config/arm/arm_neon_builtins.def (vst1_x4): New entries.
 * config/arm/neon.md (vst1_x4): New.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1_base_xN_1.c: Add new test.
 * gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new test.
 * gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new test.
 * gcc.target/arm/simd/vst1_p64_xN_1.c: Add new test.
---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/neon.md|  10 ++
  .../gcc.target/arm/simd/vst1_base_xN_1.c  |  62 +-
  .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   6 +-
  .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   7 +-
  .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   7 +-
  7 files changed, 200 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b01171e5966..41e645d8352 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11258,6 +11258,14 @@ vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __b)
__builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x4 (poly64_t * __a, poly64x1x4_t __b)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11351,6 +11359,38 @@ vst1_s64_x3 (int64_t * __a, int64x1x3_t __b)
__builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x4 (int8_t * __a, int8x8x4_t __b)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x4 (int16_t * __a, int16x4x4_t __b)
+{
+  union { int16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x4 (int32_t * __a, int32x2x4_t __b)
+{
+  union { int32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x4 (int64_t * __a, int64x1x4_t __b)
+{
+  union { int64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11403,6 +11443,24 @@ vst1_f32_x3 (float32_t * __a, float32x2x3_t __b)
__builtin_neon_vst1_x3v2sf ((__builtin_neon_sf *) __a, __bu.__o);
  }
  
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x4 (float16_t * __a, float16x4x4_t __b)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x4 (float32_t * __a, float32x2x4_t __b)
+{
+  union { float32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11495,6 +11553,38 @@ vst1_u64_x3 (uint64_t * __a, uint64x1x3_t __b)
__builtin_neon_vst1_x3

Re: [PATCH v5] gcc: Introduce -fhardened

2023-11-27 Thread Marek Polacek
On Sun, Nov 26, 2023 at 11:59:04AM +0100, FX Coudert wrote:
> Hi Marek,
> 
> The new test at gcc.target/i386/cf_check-6.c fails on darwin with:
>   Excess errors:
>   cc1: warning: '-fhardened' not supported for this target
> 
> Other tests are only run on Linux, so I added this to 
> gcc.target/i386/cf_check-6.c as well.
> Pushed as 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e40a13eaca4d87ec33beb0d9d31985e0023bfe3e

Ah, right.  Thanks a lot, that is the correct fix.

Marek



Re: [PATCH 1/3] [GCC] arm: vst1q_types_x2 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 10/10/2023 15:04, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1q intrinsic for AArch32.
This patch adds the _x2 variants of the vst1q intrinsic. Tests use xN so that 
the latter variants (_x3, _x4) could be added.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1q_u8_x2, vst1q_u16_x2, vst1q_u32_x2, vst1q_u64_x32): New.


The same issues that I noted on the previous set apply here too.

Otherwise OK.

R.


 (vst1q_s8_x2, vst1q_s16_x2, vst1q_s32_x2, vst1q_s64_x2): New.
 (vst1q_f16_x2, vst1q_f32_x2): New.
 (vst1q_p8_x2, vst1q_p16_x2, vst1q_p64_x2): New.
 (vst1q_bf16_x2): New.
 * config/arm/arm_neon_builtins.def (vst1<_x2): New entries.
 * config/arm/neon.md (neon_vst1_x2): Updated from neon_vst1_x2.
 * config/arm/iterators.md (VMEMX2): New mode iterator.
 (VMEMX2_q): New mode attribute.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/iterators.md   |   6 +
  gcc/config/arm/neon.md|   6 +-
  .../gcc.target/arm/simd/vst1q_base_xN_1.c |  70 +++
  .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |  13 ++
  .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |  13 ++
  .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |  13 ++
  8 files changed, 233 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_base_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_bf16_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_fp16_xN_1.c
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 41e645d8352..b8f3fca3060 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11327,6 +11327,38 @@ vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
__builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s8_x2 (int8_t * __a, int8x16x2_t __b)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v16qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s16_x2 (int16_t * __a, int16x8x2_t __b)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v8hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s32_x2 (int32_t * __a, int32x4x2_t __b)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v4si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s64_x2 (int64_t * __a, int64x2x2_t __b)
+{
+  union { int64x2x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
@@ -11656,6 +11688,14 @@ vst1q_p64 (poly64_t * __a, poly64x2_t __b)
__builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_p64_x2 (poly64_t * __a, poly64x2x2_t __b)
+{
+  union { poly64x2x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11701,6 +11741,24 @@ vst1q_f32 (float32_t * __a, float32x4_t __b)
__builtin_neon_vst1v4sf ((__builtin_neon_sf *) __a, __b);
  }
  
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f16_x2 (float16_t * __a, float16x8x2_t __b)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v8hf (__a, __bu.__o);
+}
+#

Re: [PATCH 2/3] [GCC] arm: vst1q_types_x3 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 10/10/2023 15:04, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1q intrinsic for AArch32.
This patch adds the _x3 variants of the vst1q intrinsic.


OK, but format lines to <= 70 columns please.

R.


ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1q_u8_x3, vst1q_u16_x3, vst1q_u32_x3, vst1q_u64_x3): New.
 (vst1q_s8_x3, vst1q_s16_x3, vst1q_s32_x3, vst1q_s64_x3): New.
 (vst1q_f16_x3, vst1q_f32_x3): New.
 (vst1q_p8_x3, vst1q_p16_x3, vst1q_p64_x3): New.
 (vst1q_bf16_x3): New.
 * config/arm/arm_neon_builtins.def (vst1q_x3): New entries.
 * config/arm/neon.md (neon_vst1q_x3): New.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/neon.md|  24 
  .../gcc.target/arm/simd/vst1q_base_xN_1.c |  60 +
  .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   6 +
  .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   6 +
  .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |   6 +
  7 files changed, 217 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b8f3fca3060..46ee888410f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11359,6 +11359,38 @@ vst1q_s64_x2 (int64_t * __a, int64x2x2_t __b)
__builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s8_x3 (int8_t * __a, int8x16x3_t __b)
+{
+  union { int8x16x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v16qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s16_x3 (int16_t * __a, int16x8x3_t __b)
+{
+  union { int16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v8hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s32_x3 (int32_t * __a, int32x4x3_t __b)
+{
+  union { int32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v4si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s64_x3 (int64_t * __a, int64x2x3_t __b)
+{
+  union { int64x2x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
@@ -11696,6 +11728,14 @@ vst1q_p64_x2 (poly64_t * __a, poly64x2x2_t __b)
__builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_p64_x3 (poly64_t * __a, poly64x2x3_t __b)
+{
+  union { poly64x2x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11759,6 +11799,24 @@ vst1q_f32_x2 (float32_t * __a, float32x4x2_t __b)
__builtin_neon_vst1q_x2v4sf (__a, __bu.__o);
  }
  
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f16_x3 (float16_t * __a, float16x8x3_t __b)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v8hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f32_x3 (float32_t * __a, float32x4x3_t __b)
+{
+  union { float32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v4sf (__a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1q_u8 (uint8_t * __a, uint8x16_t __b)
@@ -11819,6 +11877,38 @@ vst1q_u64_x2 (uint64_t * __a, uint64x2x2_t __b)
__builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__exten

Re: [PATCH 3/3] [GCC] arm: vst1q_types_x4 ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 10/10/2023 15:04, ezra.sito...@arm.com wrote:

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1q intrinsic for AArch32.
This patch adds the _x4 variants of the vst1q intrinsic.


OK, but see earlier comments about formatting.

R.



ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
 * config/arm/arm_neon.h
 (vst1q_u8_x4, vst1q_u16_x4, vst1q_u32_x4, vst1q_u64_x4): New.
 (vst1q_s8_x4, vst1q_s16_x4, vst1q_s32_x4, vst1q_s64_x4): New.
 (vst1q_f16_x4, vst1q_f32_x4): New.
 (vst1q_p8_x4, vst1q_p16_x4, vst1q_p64_x4): New.
 (vst1q_bf16_x4): New.
 * config/arm/arm_neon_builtins.def (vst1q_x4): New entries.
 * config/arm/neon.md (neon_vst1q_x4): New.

gcc/testsuite/ChangeLog:
 * gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
 * gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
---
  gcc/config/arm/arm_neon.h | 114 ++
  gcc/config/arm/arm_neon_builtins.def  |   1 +
  gcc/config/arm/neon.md|  26 
  .../gcc.target/arm/simd/vst1q_base_xN_1.c |  59 +
  .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   8 +-
  .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   6 +
  .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |   6 +
  7 files changed, 219 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 46ee888410f..df3e23b6e95 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11391,6 +11391,38 @@ vst1q_s64_x3 (int64_t * __a, int64x2x3_t __b)
__builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s8_x4 (int8_t * __a, int8x16x4_t __b)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v16qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s16_x4 (int16_t * __a, int16x8x4_t __b)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v8hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s32_x4 (int32_t * __a, int32x4x4_t __b)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v4si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s64_x4 (int64_t * __a, int64x2x4_t __b)
+{
+  union { int64x2x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
@@ -11736,6 +11768,14 @@ vst1q_p64_x3 (poly64_t * __a, poly64x2x3_t __b)
__builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_p64_x4 (poly64_t * __a, poly64x2x4_t __b)
+{
+  union { poly64x2x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
  #pragma GCC pop_options
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11817,6 +11857,24 @@ vst1q_f32_x3 (float32_t * __a, float32x4x3_t __b)
__builtin_neon_vst1q_x3v4sf (__a, __bu.__o);
  }
  
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f16_x4 (float16_t * __a, float16x8x4_t __b)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v8hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f32_x4 (float32_t * __a, float32x4x4_t __b)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v4sf (__a, __bu.__o);
+}
+
  __extension__ extern __inline void
  __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
  vst1q_u8 (uint8_t * __a, uint8x16_t __b)
@@ -11909,6 +11967,38 @@ vst1q_u64_x3 (uint64_t * __a, uint64x2x3_t __b)
__builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
  }
  
+__extension__ extern __inline void

+
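
For readers unfamiliar with the _x4 stores added by this patch: as far as I understand the ACLE, vst1q_<type>_x4 writes the four 128-bit vectors of its argument to consecutive memory, without the interleaving that vst4q performs. Below is a minimal portable C model of that behaviour for the s32 case. The names model_int32x4_t, model_int32x4x4_t and model_vst1q_s32_x4 are hypothetical and exist only for this sketch; they are not the arm_neon.h definitions, which go through the builtin shown in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for int32x4_t and int32x4x4_t.  */
typedef struct { int32_t lane[4]; } model_int32x4_t;
typedef struct { model_int32x4_t val[4]; } model_int32x4x4_t;

/* Model of vst1q_s32_x4: store val[0] .. val[3] back to back
   (16 int32_t values in total), with no interleaving between
   the four vectors.  */
static void
model_vst1q_s32_x4 (int32_t *a, model_int32x4x4_t b)
{
  assert (a != NULL);
  for (int v = 0; v < 4; v++)
    memcpy (a + 4 * v, b.val[v].lane, sizeof b.val[v].lane);
}
```

With vector v holding the values 10*v + lane, the output buffer reads 0, 1, 2, 3, 10, 11, 12, 13, 20, ... in that order; an interleaving store would instead begin 0, 10, 20, 30.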

Re: [PATCH 0/3] [GCC] arm: vld1_types_xN ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 19/10/2023 14:41, ezra.sito...@arm.com wrote:

Add xN variants of vld1_types intrinsic for AArch32.




These patches are all OK, but please fix the commit message formatting 
as with earlier series.


R.


Re: [PATCH 0/3] [GCC] arm: vld1q_types_xN ACLE intrinsics

2023-11-27 Thread Richard Earnshaw




On 06/10/2023 10:49, ezra.sito...@arm.com wrote:

Add xN variants of vld1q_types intrinsic.




These patches are all OK, but please fix commit message formatting in 
line with the comments on the earlier series.


R.


Re: [patch] OpenMP: Add -Wopenmp and use it

2023-11-27 Thread Christophe Lyon
On Mon, 27 Nov 2023 at 11:33, Tobias Burnus  wrote:
>
> Hi,
>
> On 27.11.23 11:20, Christophe Lyon wrote:
>
> > I think the lack of final '.' in:
>
> Indeed - but you are lagging a bit behind:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638128.html
>
> [committed] c-family/c.opt (-Wopenmp): Add missing tailing '.'
>
> Fri Nov 24 18:56:21 GMT 2023
>
> Committed as r14-5835-g6eb1507107dee3
>

Great, thanks! Sorry for the noise, it's a bit hard and error-prone to
track which regressions have already been fixed and/or are being worked on.
Our bisect started at r14-5830, just a bit too early :-)

Thanks,

Christophe


> Tobias
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: PR111754

2023-11-27 Thread Prathamesh Kulkarni
On Fri, 24 Nov 2023 at 03:13, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> >>  wrote:
> >> >
> >> > Prathamesh Kulkarni  writes:
> >> > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> >> > >  wrote:
> >> > >> So I think the PR could be solved by something like the attached.
> >> > >> Do you agree?  If so, could you base the patch on this instead?
> >> > >>
> >> > >> Only tested against the self-tests.
> >> > >>
> >> > >> Thanks,
> >> > >> Richard
> >> > >>
> >> > >> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > >> index 40767736389..00fce4945a7 100644
> >> > >> --- a/gcc/fold-const.cc
> >> > >> +++ b/gcc/fold-const.cc
> >> > >> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, 
> >> > >> tree arg1, const vec_perm_indices &sel,
> >> > >>unsigned res_npatterns, res_nelts_per_pattern;
> >> > >>unsigned HOST_WIDE_INT res_nelts;
> >> > >>
> >> > >> -  /* (1) If SEL is a suitable mask as determined by
> >> > >> - valid_mask_for_fold_vec_perm_cst_p, then:
> >> > >> - res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> >> > >> - res_nelts_per_pattern = max of nelts_per_pattern between
> >> > >> -ARG0, ARG1 and SEL.
> >> > >> - (2) If SEL is not a suitable mask, and TYPE is VLS then:
> >> > >> - res_npatterns = nelts in result vector.
> >> > >> - res_nelts_per_pattern = 1.
> >> > >> - This exception is made so that VLS ARG0, ARG1 and SEL work as 
> >> > >> before.  */
> >> > >> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> >> > >> -{
> >> > >> -  res_npatterns
> >> > >> -   = std::max (VECTOR_CST_NPATTERNS (arg0),
> >> > >> -   std::max (VECTOR_CST_NPATTERNS (arg1),
> >> > >> - sel.encoding ().npatterns ()));
> >> > >> +  /* First try to implement the fold in a VLA-friendly way.
> >> > >> +
> >> > >> + (1) If the selector is simply a duplication of N elements, the
> >> > >> +result is likewise a duplication of N elements.
> >> > >> +
> >> > >> + (2) If the selector is N elements followed by a duplication
> >> > >> +of N elements, the result is too.
> >> > >>
> >> > >> -  res_nelts_per_pattern
> >> > >> -   = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> >> > >> -   std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> >> > >> - sel.encoding ().nelts_per_pattern ()));
> >> > >> + (3) If the selector is N elements followed by an interleaving
> >> > >> +of N linear series, the situation is more complex.
> >> > >>
> >> > >> +valid_mask_for_fold_vec_perm_cst_p detects whether we
> >> > >> +can handle this case.  If we can, then each of the N linear
> >> > >> +series either (a) selects the same element each time or
> >> > >> +(b) selects a linear series from one of the input patterns.
> >> > >> +
> >> > >> +If (b) holds for one of the linear series, the result
> >> > >> +will contain a linear series, and so the result will have
> >> > >> +the same shape as the selector.  If (a) holds for all of
> >> > >> +the linear series, the result will be the same as (2) above.
> >> > >> +
> >> > >> +(b) can only hold if one of the input patterns has a
> >> > >> +stepped encoding.  */
> >> > >> +  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> >> > >> +{
> >> > >> +  res_npatterns = sel.encoding ().npatterns ();
> >> > >> +  res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> >> > >> +  if (res_nelts_per_pattern == 3
> >> > >> + && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> >> > >> + && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> >> > >> +   res_nelts_per_pattern = 2;
> >> > > Um, in this case, should we set:
> >> > > res_nelts_per_pattern = max (nelts_per_pattern (arg0), 
> >> > > nelts_per_pattern(arg1))
> >> > > if both have nelts_per_pattern == 1 ?
> >> >
> >> > No, it still needs to be 2 even if arg0 and arg1 are duplicates.
> >> > E.g. consider a selector that picks the first element of arg0
> >> > followed by a duplicate of the first element of arg1.
> >> >
> >> > > Also I suppose this matters only for non-integral element type, since
> >> > > for integral element type,
> >> > > vector_cst_elt will return the correct value even if the element is
> >> > > not explicitly encoded and input vector is dup ?
> >> >
> >> > Yeah, but it might help even for integers.  If we build fewer
> >> > elements explicitly, and so read fewer implicitly-encoded inputs,
> >> > there's less risk of running into:
> >> >
> >> >   if (!can_div_trunc_p (sel[i], len, &q, &r))
> >> > {
> >> >   if (reason)
> >> > *reason = "cannot divide selector element by arg len";
> >> 
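
To make the "it still needs to be 2" point concrete, here is a small C model of the pattern-based vector encoding discussed above. It is only a sketch: struct enc_vec, enc_vec_elt and perm_elt are hypothetical names, the vector length is a fixed number rather than a poly_int, and the real logic lives in fold_vec_perm_cst and the vector builder.

```c
#include <assert.h>

/* Hypothetical model of an encoded constant vector: NPATTERNS
   interleaved patterns, each with NELTS_PER_PATTERN (1, 2 or 3)
   explicitly encoded elements, stored in vector order.  */
struct enc_vec
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
  const long *elts;   /* npatterns * nelts_per_pattern values.  */
};

/* Return element I of V.  Beyond the encoded elements, a pattern
   continues as a duplicate of its last element (1 or 2 encoded
   elements) or as a linear series (3 encoded elements).  */
static long
enc_vec_elt (const struct enc_vec *v, unsigned i)
{
  assert (v->nelts_per_pattern >= 1 && v->nelts_per_pattern <= 3);
  unsigned pattern = i % v->npatterns;
  unsigned count = i / v->npatterns;

  if (count < v->nelts_per_pattern)
    return v->elts[count * v->npatterns + pattern];

  long last = v->elts[(v->nelts_per_pattern - 1) * v->npatterns + pattern];
  if (v->nelts_per_pattern < 3)
    return last;

  /* Stepped pattern: extend with the step between the last two
     encoded elements.  */
  long prev = v->elts[v->npatterns + pattern];
  return last + (long) (count - 2) * (last - prev);
}

/* Element I of a permute of ARG0 and ARG1 (LEN elements each) by
   selector SEL.  Selector values are assumed non-negative.  */
static long
perm_elt (const struct enc_vec *arg0, const struct enc_vec *arg1,
          const struct enc_vec *sel, unsigned len, unsigned i)
{
  unsigned s = (unsigned) enc_vec_elt (sel, i) % (2 * len);
  return s < len ? enc_vec_elt (arg0, s) : enc_vec_elt (arg1, s - len);
}
```

Take arg0 as a duplicate of 7 and arg1 as a duplicate of 9 (both one pattern with one element), and a selector encoded as { 0, len } with one pattern and two elements. The result is 7 followed by all 9s, which cannot be described with one element per pattern even though both inputs can.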

Re: [PATCH] aarch64: Improve cost of `a ? {-,}1 : b`

2023-11-27 Thread Richard Sandiford
Andrew Pinski  writes:
> While looking into PR 112454, I found the cost for
> `(if_then_else (cmp) (const_int 1) (reg))` was being recorded as 8
> (or `COSTS_N_INSNS (2)`) but it should have been 4 (or `COSTS_N_INSNS (1)`).
> This improves the cost by not adding the cost of `(const_int 1)` to
> the total cost.
>
> It does not fix PR 112454 as that requires other changes to forwprop
> the `(const_int 1)` earlier than combine.
>
> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_if_then_else_costs):
>   Don't add the cost of `1` or `-1`.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index f6f6f94bf43..63241c5aaa5 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -11642,9 +11642,16 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx 
> op2, int *cost, bool speed)
>   /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
>   op1 = XEXP (inner, 0);
>   }
> -
> -  *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> -  *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
> +  if (op2 == constm1_rtx || op2 == const1_rtx)
> + *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> +  else if (op1 == constm1_rtx || op1 == const1_rtx)
> + *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);

It looks like this is really an extra option on top of the previous
if-else chain, since it only applies when OP1 and OP2 are still the
operands of the if_then_else.  So how about:

  else if (op1 == constm1_rtx || op1 == const1_rtx)
{
  /* Use CSINV.  */
  *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
  return true;
}
  else if (op2 == constm1_rtx || op2 == const1_rtx)
{
  /* Use CSINV.  */
  *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
  return true;
}

leaving the code to fall through to:

  *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
  *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
  return true;

as it does currently.  OK in that form if you agree.

Let me know if you don't.  But in that case:

> +  else
> + {
> +   *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> +   *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 1, speed);

should be 2, speed

> + }
> +  

Thanks,
Richard
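
A toy C model of the costing rule under discussion may help when reading the two variants of the patch. Everything here is an assumption made for illustration: struct model_op, model_op_cost and model_if_then_else_cost are hypothetical names, and the "a constant costs one instruction, a register costs nothing" rule is a simplification of what rtx_cost actually returns. Only the scale (one instruction = 4) matches COSTS_N_INSNS.

```c
#include <assert.h>
#include <stdbool.h>

/* Same scale as COSTS_N_INSNS: one instruction costs 4.  */
#define MODEL_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical operand: either a register or an integer constant.  */
struct model_op
{
  bool is_const;
  long value;   /* Only meaningful when is_const is true.  */
};

/* Toy operand cost: registers are free; any constant is assumed to
   cost one instruction to materialise.  */
static int
model_op_cost (struct model_op op)
{
  return op.is_const ? MODEL_COSTS_N_INSNS (1) : 0;
}

static bool
model_is_one_or_minus_one (struct model_op op)
{
  return op.is_const && (op.value == 1 || op.value == -1);
}

/* Cost of (if_then_else (cmp) op1 op2), starting from the cost of the
   conditional-select instruction itself.  An arm that is 1 or -1 is
   absorbed by the CSINC/CSINV form, so only the other arm is costed.  */
static int
model_if_then_else_cost (struct model_op op1, struct model_op op2)
{
  int cost = MODEL_COSTS_N_INSNS (1);

  if (model_is_one_or_minus_one (op1))
    cost += model_op_cost (op2);
  else if (model_is_one_or_minus_one (op2))
    cost += model_op_cost (op1);
  else
    cost += model_op_cost (op1) + model_op_cost (op2);
  return cost;
}

/* The case from the patch description:
   (if_then_else (cmp) (const_int 1) (reg)).  */
static int
model_example_cost (void)
{
  struct model_op one = { true, 1 };
  struct model_op reg = { false, 0 };
  int cost = model_if_then_else_cost (one, reg);
  assert (cost == MODEL_COSTS_N_INSNS (1));
  return cost;
}
```

In this model the example costs 4 rather than 8, while a constant such as 5 in one arm still adds the cost of materialising it.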


Re: Ping: [PATCH] Fix PR112419

2023-11-27 Thread Jeff Law




On 11/23/23 10:05, Hans-Peter Nilsson wrote:

From: Hans-Peter Nilsson 
Date: Thu, 16 Nov 2023 05:24:06 +0100


From: Martin Uecker 
Date: Tue, 07 Nov 2023 06:56:25 +0100



Am Montag, dem 06.11.2023 um 21:01 -0700 schrieb Jeff Law:


On 11/6/23 20:58, Hans-Peter Nilsson wrote:

This patch caused a testsuite regression: there's now an
"excess error" failure for gcc.dg/Wnonnull-4.c for 32-bit
targets (and 64-bit targets testing with a "-m32" option)
after your r14-5115-g6e9ee44d96e5.  It's logged as PR112419.

It caused failures for just about every target ;(  Presumably it worked
on x86_64...


I do not think this is a true regression
just a problem with the test on 32-bit which somehow surfaced
due to the change.

The excess error is:

FAIL: gcc.dg/Wnonnull-4.c (test for excess errors)
Excess errors:
/home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/Wnonnull-4.c:144:3:
 warning: 'fda_n_5' specified size 4294967256 exceeds maximum object size
2147483647 [-Wstringop-overflow=]

I think the warning was suppressed before due to the other (nonnull)
warning which I removed in this case.

I think the simple fix might be to turn off -Wstringop-overflow.


No, that triggers many of the dg-warnings that are tested.

(I didn't pay attention to the actual warning messages and
tried to pursue that at first.)

Maybe it's best to actually expect the warning, like
so.

Maintainers of 16-bit targets will have to address their
concerns separately.  For example, they may choose to not
run the test at all.

Ok to commit?

Subject: [PATCH] gcc.dg/Wnonnull-4.c: Handle new overflow warning for 32-bit 
targets [PR112419]

PR testsuite/112419
* gcc.dg/Wnonnull-4.c (test_fda_n_5): Expect warning for exceeding
maximum object size for 32-bit targets.
---
  gcc/testsuite/gcc.dg/Wnonnull-4.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/Wnonnull-4.c 
b/gcc/testsuite/gcc.dg/Wnonnull-4.c
index 1f14fbba45df..d63e76da70a2 100644
--- a/gcc/testsuite/gcc.dg/Wnonnull-4.c
+++ b/gcc/testsuite/gcc.dg/Wnonnull-4.c
@@ -142,6 +142,7 @@ void test_fda_n_5 (int r_m1)
T (  1);  // { dg-bogus "argument 2 of variable length array 
'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value is 1" }
T (  9);  // { dg-bogus "argument 2 of variable length array 
'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value is 9" }
T (max);  // { dg-bogus "argument 2 of variable length array 
'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value is \\d+" }
+// { dg-warning "size 4294967256 exceeds maximum object size" "" { target 
ilp32 } .-1 }
  }

Unfortunately I think we need to go back to the original issue that 
Martin (I think) dismissed.


Specifically, this is a regression.  It's very clear that prior to the 
patch in question there was no diagnostic about the size of the 
requested memory allocation and after the patch in question we get the 
"exceeds maximum object size" diagnostic.


Now one explanation could be that the diagnostic is warranted and it was 
a bug that the diagnostic hadn't been emitted prior to Martin's patch. 
In this case some kind of dg-blah is warranted, but I don't think anyone 
has made this argument.



Jeff


Re: [PATCH] aarch64: Improve cost of `a ? {-,}1 : b`

2023-11-27 Thread Richard Sandiford
Richard Sandiford  writes:
> Andrew Pinski  writes:
>> While looking into PR 112454, I found the cost for
>> `(if_then_else (cmp) (const_int 1) (reg))` was being recorded as 8
>> (or `COSTS_N_INSNS (2)`) but it should have been 4 (or `COSTS_N_INSNS (1)`).
>> This improves the cost by not adding the cost of `(const_int 1)` to
>> the total cost.
>>
>> It does not fix PR 112454 as that requires other changes to forwprop
>> the `(const_int 1)` earlier than combine.
>>
>> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64.cc (aarch64_if_then_else_costs):
>>  Don't add the cost of `1` or `-1`.
>>
>> Signed-off-by: Andrew Pinski 
>> ---
>>  gcc/config/aarch64/aarch64.cc | 13 ++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index f6f6f94bf43..63241c5aaa5 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -11642,9 +11642,16 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx 
>> op2, int *cost, bool speed)
>>  /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
>>  op1 = XEXP (inner, 0);
>>  }
>> -
>> -  *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
>> -  *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
>> +  if (op2 == constm1_rtx || op2 == const1_rtx)
>> +*cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
>> +  else if (op1 == constm1_rtx || op1 == const1_rtx)
>> +*cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
>
> It looks like this is really an extra option on top of the previous
> if-else chain, since it only applies when OP1 and OP2 are still the
> operands of the if_then_else.  So how about:
>
>   else if (op1 == constm1_rtx || op1 == const1_rtx)
> {
> /* Use CSINV.  */

eh, of course I meant CSINV or CSINC...

> *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
> return true;
> }
>   else if (op2 == constm1_rtx || op2 == const1_rtx)
> {
> /* Use CSINV.  */
> *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> return true;
> }
>
> leaving the code to fall through to:
>
>   *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
>   *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
>   return true;
>
> as it does currently.  OK in that form if you agree.
>
> Let me know if you don't.  But in that case:
>
>> +  else
>> +{
>> +  *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
>> +  *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 1, speed);
>
> should be 2, speed
>
>> +}
>> +  
>
> Thanks,
> Richard


Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-11-27 Thread Jeff Law




On 11/27/23 05:39, Robin Dapp wrote:

The easiest way to avoid running into the alias analysis problem is
to scrap the MEM_EXPR when we expand the internal functions for
partial loads/stores.  That avoids the disambiguation we run into,
which is realizing that we store to an object smaller than
the size of the mode we appear to store.

After the patch we see just

   [1  S64 A32]

so we preserve the alias set, the alignment and the size (the size
is redundant if the MEM isn't BLKmode).  That's still not good
in case the RTL alias oracle would implement the same
disambiguation but it fends off the gimple one.

This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
and --param=vect-partial-vector-usage=1.


On riscv we're seeing a similar problem across the testsuite
and several execution failures as a result.  In the case I
looked at we move a scalar load upwards over a partial store
that aliases the load.

I independently arrived at the spot mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
before knowing about the PR.

I can confirm that your RFC patch fixes at least two of the
failures,  I haven't checked the others but very likely
they are similar.

FWIW, it should always be safe to ignore the memory attributes.  So if 
there's a reasonable condition here, then we can use it and just ignore 
the attribute.


Does the attribute on a partial load/store indicate the potential 
load/store size or does it indicate the actual known load/store size? 
If the former, then we probably need to treat it as a may-read/may-write 
kind of reference.


Jeff
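
The "potential size versus actual size" distinction can be sketched in plain C. This is only a model with hypothetical names (MODEL_NLANES, model_len_store, model_store_then_load), assuming a 16-lane vector mode of 32-bit elements; it is not how the internal functions are expanded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_NLANES = 16 };   /* Lanes in the hypothetical vector mode.  */

/* Model of a length-controlled partial store: only the first LEN of
   the MODEL_NLANES lanes are written.  The bytes actually stored
   (LEN * 4) can be far fewer than the size of the vector mode
   (MODEL_NLANES * 4 = 64).  */
static void
model_len_store (int32_t *dst, const int32_t *src, size_t len)
{
  assert (len <= MODEL_NLANES);
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i];
}

/* A two-element object written by a partial store whose mode is much
   larger than the object, followed by a scalar load from it.  */
static int32_t
model_store_then_load (void)
{
  int32_t obj[2] = { 0, 0 };
  int32_t lanes[MODEL_NLANES];

  for (int i = 0; i < MODEL_NLANES; i++)
    lanes[i] = 100 + i;

  model_len_store (obj, lanes, 2);   /* Writes 8 bytes, not 64.  */
  return obj[1];                     /* Must stay after the store.  */
}
```

A disambiguation that reasons "a 64-byte store cannot target an 8-byte object" would let the load of obj[1] move above the store and read 0 instead of 101, which matches the kind of failure described above.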


Re: GCC/Rust libgrust-v2/to-submit branch

2023-11-27 Thread Thomas Schwinge
Hi!

On 2023-11-21T16:20:22+0100, Arthur Cohen  wrote:
> A newer version of the library has been force-pushed to the branch
> `libgrust-v2/to-submit`.

> On 11/20/23 15:55, Thomas Schwinge wrote:
>> Arthur and Pierre-Emmanuel have prepared a GCC/Rust libgrust-v2/to-submit
>> branch: .
>> In that one, most of the issues raised have been addressed, and which
>> I've now successfully "tested" in my different GCC configurations,
>> requiring just one additional change (see end of this email).  I'm using
>> "tested" in quotes here, as libgrust currently is still missing its
>> eventual content, and still is without actual users, so we may still be
>> up for surprises later on.  ;-)

>> On 2023-10-27T22:41:52+0200, I wrote:
>>> On 2023-09-27T00:25:16+0200, I wrote:
 don't we also directly need to
 incorporate here a few GCC/Rust master branch follow-on commits, like:

- commit 171ea4e2b3e202067c50f9c206974fbe1da691c0 "fixup: Fix bootstrap 
 build"
- commit 61cbe201029658c32e5c360823b9a1a17d21b03c "fixup: Fix missing 
 build dependency"
>>>
>>> I've not yet run into the need for these two.  Let's please leave these
>>> out of the upstream submission for now, until we understand what exactly
>>> these are necessary for.
>>
>> (Still the same.)
>
> Do you mean that we should remove the content of these commits from the
> submission? If so, I believe it's now done.

That's correct.  My theory is that "fixup: Fix bootstrap build" can be
dropped altogether (that is, reverted on GCC/Rust master branch; I'll
look into that, later), and "fixup: Fix missing build dependency" will be
necessary once the GCC/Rust front end links against libgrust (that is,
will then move into that commit).

>>> However:
>>>
- commit 6a8b207b9ef7f9038e0cae7766117428783825d8 "libgrust: Add 
 dependency to libstdc++"
>>>
>>> ... this one definitely is necessary right now; see discussion in
>>> 
>>> "Disable target libgrust if we're not building target libstdc++".
>>
>> This one still isn't in the GCC/Rust libgrust-v2/to-submit branch -- but
>> having now tested that branch, I'm now no longer seeing the respective
>> build failure.  Isn't that change "libgrust: Add dependency to libstdc++"
>> still necessary, conceptually?  (Maybe we're just lucky, currently?)
>> I'll be sure to re-test in my different GCC configurations once libgrust
>> gains actual content and use.  (..., which might then re-expose the
>> original problem?)

So I guess I really just was lucky in my testing, because: later I
actually again did run into the need for that commit, so:

> This commit was integrated into another one:
>
> fb31093105e build: Add libgrust as compilation modules
>
> (on libgrust-v2/to-submit as of 2 minutes ago)

ACK.

> --- a/gcc/rust/config-lang.in
> +++ b/gcc/rust/config-lang.in

> +target_libs="target-libffi target-libbacktrace target-libgrust"

 Please don't add back 'target-libffi' and 'target-libbacktrace' here;
 just 'target-libgrust'.  (As is present in GCC/Rust master branch, and
 per commit 7411eca498beb13729cc2acec77e68250940aa81
 "Rust: Don't depend on unused 'target-libffi', 'target-libbacktrace'".)
>>>
>>> ... that change is necessary, too.
>>
>> That's still unchanged in the GCC/Rust libgrust-v2/to-submit branch;
>> please apply to 'gcc/rust/config-lang.in':
>>
>>  -target_libs="target-libffi target-libbacktrace target-libgrust"
>>  +target_libs=target-libgrust

(That's now been addressed, too.)

>> Then, still should re-order the commits so that (re)generation of
>> auto-generated files comes before use of libgrust (so that later
>> bisection doesn't break), and move the 'contrib/gcc_update' update into
>> the commit that adds the auto-generated files.
>
> Do you mean that the regeneration should happen before the commit adding
> the proc_macro library? Or that when we keep going and adding more
> commits on top of this, we need to make sure the regeneration commit
> happens before any code starts using/depending on libgrust/?

My point is: once the 'gcc/rust/config-lang.in' changes appear (when a
'git bisect' tests commit "build: Add libgrust as compilation modules",
by chance), the GCC build system will then try to build libgrust.  But
given that, at that time in the commit history, the libgrust build system
('libgrust/configure' etc.) is not yet present, the GCC build will fail.

So I suggest:

  - "libgrust: Add entry for maintainers and stub changelog file"
  - "libgrust: Add libproc_macro and build system"
... plus 'autoreconf' in 'libgrust/' folded in.
... plus 'contrib/gcc_update' update moved here.
  - "build: Add libgrust as compilation modules"
... plus "Disable target libgrust if missing libstdc++" folded in.
... plus 'autoreconf' and 'autogen' in '/' folded in.
  - "Regenerate build files"

Re: Ping: [PATCH] Fix PR112419

2023-11-27 Thread Martin Uecker
Am Montag, dem 27.11.2023 um 08:36 -0700 schrieb Jeff Law:
> 
> On 11/23/23 10:05, Hans-Peter Nilsson wrote:
> > > From: Hans-Peter Nilsson 
> > > Date: Thu, 16 Nov 2023 05:24:06 +0100
> > > 
> > > > From: Martin Uecker 
> > > > Date: Tue, 07 Nov 2023 06:56:25 +0100
> > > 
> > > > Am Montag, dem 06.11.2023 um 21:01 -0700 schrieb Jeff Law:
> > > > > 
> > > > > On 11/6/23 20:58, Hans-Peter Nilsson wrote:
> > > > > > This patch caused a testsuite regression: there's now an
> > > > > > "excess error" failure for gcc.dg/Wnonnull-4.c for 32-bit
> > > > > > targets (and 64-bit targets testing with a "-m32" option)
> > > > > > after your r14-5115-g6e9ee44d96e5.  It's logged as PR112419.
> > > > > It caused failures for just about every target ;(  Presumably it 
> > > > > worked
> > > > > on x86_64...
> > > > 
> > > > I do not think this is a true regression
> > > > just a problem with the test on 32-bit which somehow surfaced
> > > > due to the change.
> > > > 
> > > > The excess error is:
> > > > 
> > > > FAIL: gcc.dg/Wnonnull-4.c (test for excess errors)
> > > > Excess errors:
> > > > /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/Wnonnull-4.c:144:3:
> > > >  warning: 'fda_n_5' specified size 4294967256 exceeds maximum object 
> > > > size
> > > > 2147483647 [-Wstringop-overflow=]
> > > > 
> > > > I think the warning was suppressed before due to the other (nonnull)
> > > > warning which I removed in this case.
> > > > 
> > > > I think the simple fix might be to turn off -Wstringop-overflow.
> > > 
> > > No, that triggers many of the dg-warnings that are tested.
> > > 
> > > (I didn't pay attention to the actual warning messages and
> > > tried to pursue that at first.)
> > > 
> > > Maybe it's best to actually expect the warning, like
> > > so.
> > > 
> > > Maintainers of 16-bit targets will have to address their
> > > concerns separately.  For example, they may choose to not
> > > run the test at all.
> > > 
> > > Ok to commit?
> > > 
> > > Subject: [PATCH] gcc.dg/Wnonnull-4.c: Handle new overflow warning for 
> > > 32-bit targets [PR112419]
> > > 
> > >   PR testsuite/112419
> > >   * gcc.dg/Wnonnull-4.c (test_fda_n_5): Expect warning for exceeding
> > >   maximum object size for 32-bit targets.
> > > ---
> > >   gcc/testsuite/gcc.dg/Wnonnull-4.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/gcc/testsuite/gcc.dg/Wnonnull-4.c 
> > > b/gcc/testsuite/gcc.dg/Wnonnull-4.c
> > > index 1f14fbba45df..d63e76da70a2 100644
> > > --- a/gcc/testsuite/gcc.dg/Wnonnull-4.c
> > > +++ b/gcc/testsuite/gcc.dg/Wnonnull-4.c
> > > @@ -142,6 +142,7 @@ void test_fda_n_5 (int r_m1)
> > > T (  1);  // { dg-bogus "argument 2 of variable length array 
> > > 'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value 
> > > is 1" }
> > > T (  9);  // { dg-bogus "argument 2 of variable length array 
> > > 'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value 
> > > is 9" }
> > > T (max);  // { dg-bogus "argument 2 of variable length array 
> > > 'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value 
> > > is \\d+" }
> > > +// { dg-warning "size 4294967256 exceeds maximum object size" "" { 
> > > target ilp32 } .-1 }
> > >   }
> Unfortunately I think we need to go back to the original issue that 
> Martin (I think) dismissed.
> 
> Specifically, this is a regression.  It's very clear that prior to the 
> patch in question there was no diagnostic about the size of the 
> requested memory allocation and after the patch in question we get the 
> "exceeds maximum object size" diagnostic.
> 
> Now one explanation could be that the diagnostic is warranted and it was 
> a bug that the diagnostic hadn't been emitted prior to Martin's patch. 
> In this case some kind of dg-blah is warranted, but I don't think anyone 
> has made this argument.
> 
I believe the warning is correct but was suppressed before.


My plan was to split up the test case in one which is for
-Wstringop-overflow and one which is for -Wnonnull and then
one could turn off the -Wstringop-overflow for the tests
which are actually for -Wnonnull.  But adding the dg-blah
would certainly be simpler.


Martin
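
The two numbers in the diagnostic can be reproduced with 32-bit arithmetic. The sketch below assumes (this is not stated in the thread) that the bound reaching the check is -1, as the parameter name r_m1 suggests, that double is 8 bytes, and that the size type is 32 bits wide as on ilp32 targets. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte size of double[n][5] as computed in a 32-bit size type,
   assuming an 8-byte double.  Unsigned arithmetic wraps modulo
   2^32.  */
static uint32_t
model_vla_size_ilp32 (int n)
{
  return (uint32_t) n * 5u * 8u;
}

/* Largest object size on such a target, i.e. a 32-bit PTRDIFF_MAX.  */
static uint32_t
model_max_object_size_ilp32 (void)
{
  return (uint32_t) INT32_MAX;
}

/* The case assumed above: a bound of -1 gives 2^32 - 40 bytes.  */
static int
model_size_exceeds_max (int n)
{
  uint32_t size = model_vla_size_ilp32 (n);
  if (n == -1)
    assert (size == UINT32_MAX - 39u);
  return size > model_max_object_size_ilp32 ();
}
```

Under these assumptions a bound of -1 yields 4294967256 (that is, 2^32 - 40), which is above the 2147483647 limit quoted in the warning text, so the message is at least arithmetically consistent with the test input.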





Re: [PATCH v2 3/7] aarch64: Add eh_return compile tests

2023-11-27 Thread Szabolcs Nagy
The 11/26/2023 14:37, Richard Sandiford wrote:
> Szabolcs Nagy  writes:
> > +++ b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mbranch-protection=pac-ret+leaf" } */
> 
> Probably best to add -fno-schedule-insns -fno-schedule-insns2, so that the
> instructions in the check-function-bodies are in a more predictable order.
> 
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +/*
> > +**foo:
> > +** hint 25 // paciasp
> > +** stp x0, x1, .*
> > +** stp x2, x3, .*
> > +** cbz w2, .*
> > +** mov x4, 0
> > +** ldp x2, x3, .*
> > +** ldp x0, x1, .*
> > +** cbz x4, .*
> > +** add sp, sp, x5
> > +** br  x6
> > +** hint 29 // autiasp
> > +** ret
> > +** mov x5, x0
> > +** mov x6, x1
> > +** mov x4, 1
> > +** b   .*
> > +*/
> 
> What's the significance of x3 here?  It looks from the function definition
> like it should be undefined.  And what are the stps and ldps doing?
> 
> If those aren't an important part of the test, it might be better
> to stub them out with "...", e.g.:
> 
> /*
> **foo:
> **hint 25 // paciasp
> **...
> **cbz w2, .*
> **mov x4, 0
> **...
> **cbz x4, .*
> **add sp, sp, x5
> **br  x6
> **hint 29 // autiasp
> **ret
> **mov x5, x0
> **mov x6, x1
> **mov x4, 1
> **b   .*
> */
> 
> LGTM otherwise.

committed as

From cad7e1e3e0dea1922f89290bbbc27b4c44f53bf5 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy 
Date: Fri, 2 Jun 2023 14:17:02 +0100
Subject: [PATCH] aarch64: Add eh_return compile tests

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-2.c: New test.
* gcc.target/aarch64/eh_return-3.c: New test.
---
 .../gcc.target/aarch64/eh_return-2.c  |  9 ++
 .../gcc.target/aarch64/eh_return-3.c  | 28 +++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-3.c

diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-2.c 
b/gcc/testsuite/gcc.target/aarch64/eh_return-2.c
new file mode 100644
index 000..4a9d124e891
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler "add\tsp, sp, x5" } } */
+/* { dg-final { scan-assembler "br\tx6" } } */
+
+void
+foo (unsigned long off, void *handler)
+{
+  __builtin_eh_return (off, handler);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-3.c 
b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
new file mode 100644
index 000..a17baa86501
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=pac-ret+leaf -fno-schedule-insns 
-fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+**foo:
+** hint 25 // paciasp
+** ...
+** cbz w2, .*
+** mov x4, 0
+** ...
+** cbz x4, .*
+** add sp, sp, x5
+** br  x6
+** hint 29 // autiasp
+** ret
+** mov x5, x0
+** mov x4, 1
+** mov x6, x1
+** b   .*
+*/
+void
+foo (unsigned long off, void *handler, int c)
+{
+  if (c)
+return;
+  __builtin_eh_return (off, handler);
+}
-- 
2.25.1



Re: [PATCH] c++: Implement P2582R1, CTAD from inherited constructors

2023-11-27 Thread Patrick Palka
On Fri, 24 Nov 2023, Patrick Palka wrote:

> On Wed, 22 Nov 2023, Patrick Palka wrote:
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > -- >8 --
> > 
> > This patch implements C++23 class template argument deduction from
> > inherited constructors, which is specified in terms of C++20 alias
> > CTAD which we already fully support.  The rule for transforming
> > the return type of an inherited guide is specified in terms of a
> > partially specialized class template, but this patch implements it
> > in a simpler way, performing ahead of time deduction instead of
> > instantiation time deduction.  I wasn't able to find an example for
> > which this implementation strategy makes a difference, but I didn't
> > look very hard.  Support seems good enough to advertise as complete,
> > and there should be no functional change before C++23 mode.
> > 
> > There's a couple of FIXMEs, one in inherited_ctad_tweaks for recognizing
> > more forms of inherited constructors, and one in deduction_guides_for for
> > making the cache aware of base-class dependencies.
> > 
> > There doesn't seem to be a feature-test macro update for this paper.
> > 
> 
> Here's v2 with some minor changes:
> 
>   * set processing_template_decl when rewriting the return type of
> a template guide
>   * rather than adding an out parameter to type_targs_deducible_from,
> just make it return NULL_TREE or the deduced args
>   * add a testcase demonstrating each of the FIXMEs
> 
> -- >8 --
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (type_targs_deducible_from): Adjust return type.
>   * pt.cc (alias_ctad_tweaks): Handle C++23 inherited CTAD.
>   (inherited_ctad_tweaks): Define.
>   (type_targs_deducible_from): Return the deduced arguments or
>   NULL_TREE instead of a bool.  Handle 'tmpl' being a TREE_LIST
>   representing a synthetic alias template.
>   (ctor_deduction_guides_for): Do inherited_ctad_tweaks for each
>   USING_DECL in C++23 mode.
>   (deduction_guides_for): Add FIXME for stale cache entries in
>   light of inherited CTAD.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1z/class-deduction67.C: Accept in C++23 mode.
>   * g++.dg/cpp23/class-deduction-inherited1.C: New test.
>   * g++.dg/cpp23/class-deduction-inherited2.C: New test.
>   * g++.dg/cpp23/class-deduction-inherited3.C: New test.
>   * g++.dg/cpp23/class-deduction-inherited4.C: New test.
> ---
>  gcc/cp/cp-tree.h  |   2 +-
>  gcc/cp/pt.cc  | 186 +++---
>  .../g++.dg/cpp1z/class-deduction67.C  |   5 +-
>  .../g++.dg/cpp23/class-deduction-inherited1.C |  38 
>  .../g++.dg/cpp23/class-deduction-inherited2.C |  26 +++
>  .../g++.dg/cpp23/class-deduction-inherited3.C |  16 ++
>  .../g++.dg/cpp23/class-deduction-inherited4.C |  32 +++
>  7 files changed, 272 insertions(+), 33 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited2.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited3.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited4.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 7b0b7c6a17e..abc467fb290 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7457,7 +7457,7 @@ extern tree fn_type_unification (tree, 
> tree, tree,
>bool, bool);
>  extern void mark_decl_instantiated   (tree, int);
>  extern int more_specialized_fn   (tree, tree, int);
> -extern bool type_targs_deducible_from(tree, tree);
> +extern tree type_targs_deducible_from(tree, tree);
>  extern void do_decl_instantiation(tree, tree);
>  extern void do_type_instantiation(tree, tree, tsubst_flags_t);
>  extern bool always_instantiate_p (tree);
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 092e6fdfd36..8b7aa96cf01 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -223,6 +223,9 @@ static void instantiate_body (tree pattern, tree args, 
> tree d, bool nested);
>  static tree maybe_dependent_member_ref (tree, tree, tsubst_flags_t, tree);
>  static void mark_template_arguments_used (tree, tree);
>  static bool uses_outer_template_parms (tree);
> +static tree alias_ctad_tweaks (tree, tree);
> +static tree inherited_ctad_tweaks (tree, tree, tsubst_flags_t);
> +static tree deduction_guides_for (tree, bool&, tsubst_flags_t);
>  
>  /* Make the current scope suitable for access checking when we are
> processing T.  T can be FUNCTION_DECL for instantiated function
> @@ -29736,8 +29739,6 @@ is_spec_or_derived (tree etype, tree tmpl)
>return !err;
>  }
>  
> -static tree alias_ctad_tweaks (tree, tree);
> -
>  /* Return a C++20 aggregate deduction candidate for TYPE initial

Re: Ping: [PATCH] Fix PR112419

2023-11-27 Thread Martin Uecker
Am Montag, dem 27.11.2023 um 16:54 +0100 schrieb Martin Uecker:
> Am Montag, dem 27.11.2023 um 08:36 -0700 schrieb Jeff Law:
> > 
> > On 11/23/23 10:05, Hans-Peter Nilsson wrote:
> > > > From: Hans-Peter Nilsson 
> > > > Date: Thu, 16 Nov 2023 05:24:06 +0100
> > > > 
> > > > > From: Martin Uecker 
> > > > > Date: Tue, 07 Nov 2023 06:56:25 +0100
> > > > 
> > > > > Am Montag, dem 06.11.2023 um 21:01 -0700 schrieb Jeff Law:
> > > > > > 
> > > > > > On 11/6/23 20:58, Hans-Peter Nilsson wrote:
> > > > > > > This patch caused a testsuite regression: there's now an
> > > > > > > "excess error" failure for gcc.dg/Wnonnull-4.c for 32-bit
> > > > > > > targets (and 64-bit targets testing with a "-m32" option)
> > > > > > > after your r14-5115-g6e9ee44d96e5.  It's logged as PR112419.
> > > > > > It caused failures for just about every target ;(  Presumably it 
> > > > > > worked
> > > > > > on x86_64...
> > > > > 
> > > > > I do not think this is a true regression
> > > > > just a problem with the test on 32-bit which somehow surfaced
> > > > > due to the change.
> > > > > 
> > > > > The excess error is:
> > > > > 
> > > > > FAIL: gcc.dg/Wnonnull-4.c (test for excess errors)
> > > > > Excess errors:
> > > > > /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/Wnonnull-4.c:144:3:
> > > > >  warning: 'fda_n_5' specified size 4294967256 exceeds maximum object 
> > > > > size
> > > > > 2147483647 [-Wstringop-overflow=]
> > > > > 
> > > > > I think the warning was suppressed before due to the other (nonnull)
> > > > > warning which I removed in this case.
> > > > > 
> > > > > I think the simple fix might be to turn off -Wstringop-overflow.
> > > > 
> > > > No, that triggers many of the dg-warnings that are tested.
> > > > 
> > > > (I didn't pay attention to the actual warning messages and
> > > > tried to pursue that at first.)
> > > > 
> > > > Maybe it's best to actually expect the warning, like
> > > > so.
> > > > 
> > > > Maintainers of 16-bit targets will have to address their
> > > > concerns separately.  For example, they may choose to not
> > > > run the test at all.
> > > > 
> > > > Ok to commit?
> > > > 
> > > > Subject: [PATCH] gcc.dg/Wnonnull-4.c: Handle new overflow warning for 
> > > > 32-bit targets [PR112419]
> > > > 
> > > > PR testsuite/112419
> > > > * gcc.dg/Wnonnull-4.c (test_fda_n_5): Expect warning for 
> > > > exceeding
> > > > maximum object size for 32-bit targets.
> > > > ---
> > > >   gcc/testsuite/gcc.dg/Wnonnull-4.c | 1 +
> > > >   1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/gcc/testsuite/gcc.dg/Wnonnull-4.c 
> > > > b/gcc/testsuite/gcc.dg/Wnonnull-4.c
> > > > index 1f14fbba45df..d63e76da70a2 100644
> > > > --- a/gcc/testsuite/gcc.dg/Wnonnull-4.c
> > > > +++ b/gcc/testsuite/gcc.dg/Wnonnull-4.c
> > > > @@ -142,6 +142,7 @@ void test_fda_n_5 (int r_m1)
> > > > T (  1);  // { dg-bogus "argument 2 of variable length 
> > > > array 'double\\\[n]\\\[5]' is null but the corresponding bound argument 
> > > > 1 value is 1" }
> > > > T (  9);  // { dg-bogus "argument 2 of variable length 
> > > > array 'double\\\[n]\\\[5]' is null but the corresponding bound argument 
> > > > 1 value is 9" }
> > > > T (max);  // { dg-bogus "argument 2 of variable length 
> > > > array 'double\\\[n]\\\[5]' is null but the corresponding bound argument 
> > > > 1 value is \\d+" }
> > > > +// { dg-warning "size 4294967256 exceeds maximum object size" "" { 
> > > > target ilp32 } .-1 }
> > > >   }
> > Unfortunately I think we need to go back to the original issue that 
> > Martin (I think) dismissed.
> > 
> > Specifically, this is a regression.  It's very clear that prior to the 
> > patch in question there was no diagnostic about the size of the 
> > requested memory allocation and after the patch in question we get the 
> > "exceeds maximum object size" diagnostic.
> > 
> > Now one explanation could be that the diagnostic is warranted and it was 
> > a bug that the diagnostic hadn't been emitted prior to Martin's patch. 
> > In this case some kind of dg-blah is warranted, but I don't think anyone 
> > has made this argument.
> > 
> I believe the warning is correct but was suppressed before.
> 
> 
> My plan was to split up the test case in one which is for
> -Wstringop-overflow and one which is for -Wnonnull and then
> one could turn off the -Wstringop-overflow for the tests
> which are actually for -Wnonnull.  But adding the dg-blah
> would certainly be simpler.

Specifically, GCC 13.2 behaves the same way: if you use -Wno-nonnull to
suppress the warning which I removed, you get the otherwise hidden
-Wstringop-overflow warning with -m32:

See here: https://godbolt.org/z/ev5GhMonq

The warning also seems correct to me, so I suggest accepting
the proposed patch.

Martin
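
For readers wondering where the value 4294967256 in the warning comes from, the arithmetic can be sketched in C. This is an illustration only, not code from the test: it assumes the bound reaching the call is INT_MAX (a bound of -1 wraps to the same result) and that one row of double[5] occupies 40 bytes. The helper name wrapped_size32 is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* The sketch assumes the usual 8-byte double.  */
static_assert (sizeof (double) == 8, "sketch assumes 8-byte doubles");

/* Hypothetical helper: size in bytes of an n-by-5 array of doubles,
   reduced modulo 2^32 as it would be with a 32-bit size type.  */
uint32_t
wrapped_size32 (int32_t n)
{
  const uint32_t row_bytes = 5u * (uint32_t) sizeof (double);  /* 40 */
  /* Unsigned multiplication wraps modulo 2^32.  */
  return (uint32_t) n * row_bytes;
}
```

Under those assumptions both INT32_MAX and -1 give 2^32 - 40 = 4294967256, which is larger than the 2147483647 limit quoted in the diagnostic.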






RE: [PATCH] aarch64: Improve cost of `a ? {-,}1 : b`

2023-11-27 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, November 27, 2023 7:35 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Improve cost of `a ? {-,}1 : b`
> 
> Andrew Pinski  writes:
> > While looking into PR 112454, I found the cost for `(if_then_else
> > (cmp) (const_int 1) (reg))` was being recorded as 8 (or `COSTS_N_INSNS
> > (2)`) but it should have been 4 (or `COSTS_N_INSNS (1)`).
> > This improves the cost by not adding the cost of `(const_int 1)` to
> > the total cost.
> >
> > It does not fix PR 112454 as that requires other changes to
> > forwprop the `(const_int 1)` earlier than combine.
> >
> > Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc (aarch64_if_then_else_costs):
> > Don't add the cost of `1` or `-1`.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/aarch64/aarch64.cc | 13 ++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index f6f6f94bf43..63241c5aaa5 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -11642,9 +11642,16 @@ aarch64_if_then_else_costs (rtx op0, rtx op1,
> rtx op2, int *cost, bool speed)
> > /* CSINV/NEG with zero extend + const 0 (*csinv3_uxtw_insn3).  */
> > op1 = XEXP (inner, 0);
> > }
> > -
> > -  *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> > -  *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
> > +  if (op2 == constm1_rtx || op2 == const1_rtx)
> > +   *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> > +  else if (op1 == constm1_rtx || op1 == const1_rtx)
> > +   *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
> 
> It looks like this is really an extra option on top of the previous if-else 
> chain,
> since it only applies when OP1 and OP2 are still the operands of the
> if_then_else.  So how about:
> 
>   else if (op1 == constm1_rtx || op1 == const1_rtx)
> {
> /* Use CSINV.  */
> *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
> return true;
> }
>   else if (op2 == constm1_rtx || op2 == const1_rtx)
> {
> /* Use CSINV.  */
> *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> return true;
> }
> 
> leaving the code to fall through to:
> 
>   *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
>   *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 2, speed);
>   return true;
> 
> as it does currently.  OK in that form if you agree.

Yes, I think this is the correct way of implementing this. Let me test it and
get back to you.

Thanks,
Andrew

> 
> Let me know if you don't.  But in that case:
> 
> > +  else
> > +   {
> > + *cost += rtx_cost (op1, VOIDmode, IF_THEN_ELSE, 1, speed);
> > + *cost += rtx_cost (op2, VOIDmode, IF_THEN_ELSE, 1, speed);
> 
> should be 2, speed
> 
> > +   }
> > +
> 
> Thanks,
> Richard
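
As a rough illustration of what is being costed here, the C functions below have the source-level shape that corresponds to `(if_then_else (cmp) (const_int 1) (reg))` and its -1 counterpart. This is a sketch, not part of the patch: the function names are invented, and the expectation that AArch64 uses a single conditional-select style instruction for them is an assumption rather than something verified in this thread. The COSTS_N_INSNS definition is reproduced so that the 8 versus 4 figures quoted above can be checked.

```c
#include <assert.h>

/* Same scaling as the figures quoted in the thread:
   COSTS_N_INSNS (2) is 8 and COSTS_N_INSNS (1) is 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

static_assert (COSTS_N_INSNS (2) == 8, "old recorded cost");
static_assert (COSTS_N_INSNS (1) == 4, "intended cost");

/* Hypothetical example of `a ? 1 : b`.  */
long
select_one (long a, long b)
{
  return a ? 1 : b;
}

/* Hypothetical example of `a ? -1 : b`.  */
long
select_minus_one (long a, long b)
{
  return a ? -1 : b;
}
```

The point of the patch is that the constant arm should not add a second instruction's worth of cost on top of the select itself.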


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Jeff Law




On 11/26/23 09:42, rep.dot@gmail.com wrote:

On 22 November 2023 23:23:41 CET, Jeff Law  wrote:



On 11/20/23 11:56, Dimitar Dimitrov wrote:

On Sun, Nov 19, 2023 at 05:47:56PM -0700, Jeff Law wrote:
...



+  enum rtx_code xcode = GET_CODE (x);
+  if (xcode == SET)
+   {
+ const_rtx dst = SET_DEST (x);
+ rtx src = SET_SRC (x);
+ const_rtx y;
+ unsigned HOST_WIDE_INT bit = 0;
+
+ /* The code of the RHS of a SET.  */
+ enum rtx_code code = GET_CODE (src);
+
+ /* ?!? How much of this should mirror SET handling, potentially
+being shared?   */
+ if (SUBREG_BYTE (dst).is_constant () && SUBREG_P (dst))


Shouldn't SUBREG_P be checked first like:
  if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())

Yes, absolutely. It'll be fixed in the next update.
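
The ordering issue can be shown with a small stand-alone C sketch. Nothing below is GCC code: struct node, subreg_byte_is_constant and is_constant_subreg are hypothetical stand-ins for an rtx, SUBREG_BYTE (...).is_constant () and the condition under discussion. The sketch relies only on the fact that && evaluates its left operand first and skips the right operand when the left one is false.

```c
#include <assert.h>

enum node_kind { NODE_REG, NODE_SUBREG };

struct node
{
  enum node_kind kind;
  int byte_is_constant;   /* Meaningful only when kind == NODE_SUBREG.  */
};

/* Hypothetical accessor that is only valid on subreg nodes.  */
static int
subreg_byte_is_constant (const struct node *n)
{
  assert (n->kind == NODE_SUBREG);
  return n->byte_is_constant;
}

/* Check the kind first; the accessor then runs only on subreg nodes.  */
int
is_constant_subreg (const struct node *n)
{
  return n->kind == NODE_SUBREG && subreg_byte_is_constant (n);
}
```

With the operands swapped, the accessor would run on every node, including ones where its result has no meaning.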

This also highlighted that I never added pru-elf to the configurations in my 
tester.  I remember thinking that it needed to be added, but obviously that 
mental TODO got lost.  I've just fixed that.



And please drop the superfluous enum from rtx_code while at it?

Sure.
jeff


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Jeff Law




On 11/27/23 04:30, Andrew Stubbs wrote:
I tried this patch for AMD GCN. We have a similar problem with excess 
extends, but also for vector modes. Each lane has a minimum of 32 bits and 
GCC's normal assumption is that vector registers have precisely the 
number of bits they need, so the amdgcn backend patterns have explicit 
sign/zero extends for QImode and HImode for the instructions that might 
need it. It would be cool if this pass could eliminate some of those, 
but at this point I just wanted to check it didn't break anything.


Unfortunately I get a crash building libgcc:
I strongly suspect this is the same thing that was originally reported 
by Xi Ruoyao.  Just getting back on top of things after the holiday. 
I'll get the V2 posted today.


Jeff


Re: [RFC] vect: disable multiple calls of poly simdclones

2023-11-27 Thread Andre Vieira (lists)




On 06/11/2023 07:52, Richard Biener wrote:

On Fri, 3 Nov 2023, Andre Vieira (lists) wrote:


Hi,

The current codegen code to support VFs that are multiples of a simdclone
simdlen relies on BIT_FIELD_REF to create multiple input vectors.  This does not
work for non-constant simdclones, so we should disable using such clones when
the VF is a multiple of the non-constant simdlen until we change the codegen
to support those.

Enabling SVE simdclone support will cause ICEs if the vectorizer decides to
use an SVE simdclone with a VF that is larger than the simdlen. I'll be away
for the next two weeks, so can't really discuss this further.
I initially tried to solve the problem, but the way
vectorizable_simd_clone_call is structured doesn't make it easy to replace
BIT_FIELD_REF with the poly-suitable solution right now of using
unpack_{hi,lo}.


I think it should be straight-forward to use unpack_{even,odd} (it's
even/odd for VLA, right?  If lo/hi would be possible then doing
BIT_FIELD_REF would be, too?  Also you need to have multiple stages
of unpack/pack when the factor is more than 2).

There's plenty of time even during stage3 to address this.

At least your patch should have come with a testcase (or two).


Yeah I didn't add one as it didn't trigger on AArch64 without my two 
outstanding aarch64 simdclone patches.


Is there a bugreport tracking this issue?  It should affect GCN as well
I guess.


No, since I can't trigger them yet on trunk until the reviews on my 
target specific patches are done and they are committed.


I don't have a GCN backend lying around but I suspect GCN doesn't use 
poly simdlen simdclones yet either... I haven't checked. The issue 
triggers for aarch64 when trying to generate SVE simdclones for 
functions with mixed types.  I'll give the unpack thing a go locally.


Re: PR111754

2023-11-27 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> PR111754: Rework encoding of result for VEC_PERM_EXPR with constant input 
> vectors.
>
> gcc/ChangeLog:
>   PR middle-end/111754
>   * fold-const.cc (fold_vec_perm_cst): Set result's encoding to sel's
>   encoding, and set res_nelts_per_pattern to 2 if sel contains stepped
>   sequence but input vectors do not.
>   (test_nunits_min_2): New test Case 8.
>   (test_nunits_min_4): New tests Case 8 and Case 9.
>
> gcc/testsuite/ChangeLog:
>   PR middle-end/111754
>   * gcc.target/aarch64/sve/slp_3.c: Adjust code-gen.
>   * gcc.target/aarch64/sve/slp_4.c: Likewise.
>   * gcc.dg/vect/pr111754.c: New test.

OK, thanks.

Richard
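
For readers unfamiliar with the (npatterns, nelts_per_pattern) notation used in the quoted patch, here is a small C sketch of how such an encoding expands into individual elements. It is an illustration written for this thread, not GCC code: decode_elt is a made-up name, plain long values stand in for vector elements, and the layout (encoded element k of pattern p stored at index k * npatterns + p) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Return element I of a vector encoded as NPATTERNS interleaved patterns
   with NELTS_PER_PATTERN encoded elements each:
     1 element per pattern:  a, a, a, ...         (duplicate)
     2 elements per pattern: a, b, b, b, ...      (first, then duplicate)
     3 elements per pattern: a, b, b+s, b+2s, ... (stepped, s = c - b)  */
long
decode_elt (const long *enc, unsigned npatterns,
            unsigned nelts_per_pattern, size_t i)
{
  assert (npatterns > 0);
  assert (nelts_per_pattern >= 1 && nelts_per_pattern <= 3);

  size_t p = i % npatterns;   /* Which pattern element I belongs to.  */
  size_t k = i / npatterns;   /* Position of element I inside that pattern.  */

  if (k < nelts_per_pattern)
    return enc[k * npatterns + p];

  long last = enc[(nelts_per_pattern - 1) * npatterns + p];
  if (nelts_per_pattern < 3)
    return last;

  long step = last - enc[npatterns + p];
  return last + (long) (k - 2) * step;
}
```

Taking len = 8 purely as an example, the Case 8 selector { 0, len, 1, len+1, 2, len+2 } with encoding (2, 3) expands to 0, 8, 1, 9, 2, 10, 3, 11, while a (2, 1) encoding such as the expected result simply repeats its two elements.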

> Co-authored-by: Richard Sandiford 
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 332bc8aead2..dff09b81f7b 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10803,27 +10803,38 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, 
> const vec_perm_indices &sel,
>unsigned res_npatterns, res_nelts_per_pattern;
>unsigned HOST_WIDE_INT res_nelts;
>  
> -  /* (1) If SEL is a suitable mask as determined by
> - valid_mask_for_fold_vec_perm_cst_p, then:
> - res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> - res_nelts_per_pattern = max of nelts_per_pattern between
> -  ARG0, ARG1 and SEL.
> - (2) If SEL is not a suitable mask, and TYPE is VLS then:
> - res_npatterns = nelts in result vector.
> - res_nelts_per_pattern = 1.
> - This exception is made so that VLS ARG0, ARG1 and SEL work as before.  
> */
> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> -{
> -  res_npatterns
> - = std::max (VECTOR_CST_NPATTERNS (arg0),
> - std::max (VECTOR_CST_NPATTERNS (arg1),
> -   sel.encoding ().npatterns ()));
> +  /* First try to implement the fold in a VLA-friendly way.
> +
> + (1) If the selector is simply a duplication of N elements, the
> +  result is likewise a duplication of N elements.
> +
> + (2) If the selector is N elements followed by a duplication
> +  of N elements, the result is too.
> +
> + (3) If the selector is N elements followed by an interleaving
> +  of N linear series, the situation is more complex.
> +
> +  valid_mask_for_fold_vec_perm_cst_p detects whether we
> +  can handle this case.  If we can, then each of the N linear
> +  series either (a) selects the same element each time or
> +  (b) selects a linear series from one of the input patterns.
>  
> -  res_nelts_per_pattern
> - = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> - std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> -   sel.encoding ().nelts_per_pattern ()));
> +  If (b) holds for one of the linear series, the result
> +  will contain a linear series, and so the result will have
> +  the same shape as the selector.  If (a) holds for all of
> +  the linear series, the result will be the same as (2) above.
>  
> +  (b) can only hold if one of the input patterns has a
> +  stepped encoding.  */
> +
> +  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> +{
> +  res_npatterns = sel.encoding ().npatterns ();
> +  res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> +  if (res_nelts_per_pattern == 3
> +   && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> +   && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> + res_nelts_per_pattern = 2;
>res_nelts = res_npatterns * res_nelts_per_pattern;
>  }
>else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> @@ -17622,6 +17633,29 @@ test_nunits_min_2 (machine_mode vmode)
>   tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
>   validate_res (1, 3, res, expected_res);
>}
> +
> +  /* Case 8: Same as aarch64/sve/slp_3.c:
> +  arg0, arg1 are dup vectors.
> +  sel = { 0, len, 1, len+1, 2, len+2, ... } // (2, 3)
> +  So res = { arg0[0], arg1[0], ... } // (2, 1)
> +
> +  In this case, since the input vectors are dup, only the first two
> +  elements per pattern in sel are considered significant.  */
> +  {
> + tree arg0 = build_vec_cst_rand (vmode, 1, 1);
> + tree arg1 = build_vec_cst_rand (vmode, 1, 1);
> + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> + vec_perm_builder builder (len, 2, 3);
> + poly_uint64 mask_elems[] = { 0, len, 1, len + 1, 2, len + 2 };
> + builder_push_elems (builder, mask_elems);
> +
> + vec_perm_indices sel (builder, 2, len);
> + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> + tree expected_res[] = { ARG0(0), ARG1(0) };
> + validate_res (2, 1, res, expected_res);
> +  }
>  }
>  }
>  
> @@ -17790,6 +17824,44 @@ test_nunits_min_4 (machine_mode vmode)
>   ASSERT_TRUE (res == NULL_TREE);
>   ASSERT_TR

Re: [PATCH V2 3/3] OpenMP: Use enumerators for names of trait-sets and traits

2023-11-27 Thread Tobias Burnus

Hi Sandra,

[BTW: 1/3 eventually needs to be rebased as it no longer applies
cleanly; I have not checked 2/3 or 3/3 yet.]

1/3+2/3 look good to me; unless Jakub has some comments, I think they
can go in.

Regarding 3/3, some first comments. I still want to read it a bit more
carefully and play with it.

On 22.11.23 17:22, Sandra Loosemore wrote:

+static const char *const vendor_properties[] =
+  { "amd", "arm", "bsc", "cray", "fujitsu", "gnu", "ibm", "intel",
+"llvm", "nvidia", "pgi", "ti", "unknown", NULL };


Can you add "hpe"? Cf. "OpenMP API 5.2 Supplementary Source Code" at
https://www.openmp.org/specifications/
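
A NULL-terminated table like the one quoted above is typically searched with a simple loop. The sketch below is only an illustration: the table is copied from the quoted hunk, but is_known_vendor is a hypothetical helper and is not claimed to be how the patch performs the lookup.

```c
#include <assert.h>
#include <string.h>

/* Copied from the quoted hunk; "hpe" is not in this version of the list.  */
static const char *const vendor_properties[] =
  { "amd", "arm", "bsc", "cray", "fujitsu", "gnu", "ibm", "intel",
    "llvm", "nvidia", "pgi", "ti", "unknown", NULL };

/* Hypothetical lookup: return 1 if NAME appears in the table, else 0.  */
int
is_known_vendor (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; vendor_properties[i] != NULL; i++)
    if (strcmp (vendor_properties[i], name) == 0)
      return 1;
  return 0;
}
```

With the list as posted, "gnu" is found and "hpe" is not, which is why the request is to extend the table.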


+static const char *const atomic_default_mem_order_properties[] =
+  { "seq_cst", "relaxed", "acq_rel", NULL };


Can you add "acquire" and "release"? Those have been added in OpenMP 5.1
for 'omp atomic', supported since GCC 12; albeit, for requires, that's
new since 5.2.


+   { "atomic_default_mem_order",
+ (1 << OMP_TRAIT_SET_IMPLEMENTATION),
+ OMP_TRAIT_PROPERTY_ID, true,
+ atomic_default_mem_order_properties,
+   },
+   { "requires",
+ (1 << OMP_TRAIT_SET_IMPLEMENTATION),
+ OMP_TRAIT_PROPERTY_CLAUSE_LIST, true,
+ NULL
+   },
+   { "unified_address",
+ (1 << OMP_TRAIT_SET_IMPLEMENTATION),
+ OMP_TRAIT_PROPERTY_NONE, true,
+ NULL
+   },


I don't understand this code. This looks as if "requires" and "unified_address"
are on the same level but in my understanding they have to be used as in:

 match(implementation = {requires(unified_address,
   atomic_default_mem_order_properties(release))})

while from the syntax, it looks as if this would permit:

 match(implementation = {unified_address,
   atomic_default_mem_order_properties(release)})

Disclaimer: It might be that the code handles it correctly but I just misread 
it.
Or that I misread the spec.

 * * *


+   warning_at (loc, 0,
+   "unknown property %qE of %qs selector",


All '0' OpenMP warnings should now use 'OPT_Wopenmp' instead.

 * * *


-   if (selectors[i] == NULL)
+   /* Some trait sets permit extension traits which are supposed
+  to be ignored if the implementation doesn't support them.
+  GCC does not support any extension traits, and if it did, they
+  would have their own identifiers.  */


I am not sure whether I get this correctly. In my understanding

  match(implementation = {extension(ompx_myCompiler_abcd)})

should parse without error but evaluate as false / not matching. Thus, it
is not really ignored: it is parsed, but still causes a non-match.

(We can argue whether that should be silently accepted or still show a warning.)


Likewise for:
  match (implementation = { ompx_myCompiler_abcd(1) } )

albeit here a warning could make more sense than for 'extension', especially
if a typo fix would be available.

From the comment, it looks as if it is completely ignored, such that there
could still be a match.

Disclaimer: I might have misunderstood the code - or might have missed 
something in the spec.

Tobias



[PATCH v2] Fixed problem with BTF defining smaller enums.

2023-11-27 Thread Cupertino Miranda

Hi everyone,

David: Thanks for the v1 review.

Changes in this version:
 - adds a test case,
 - improves the condition logic,
 - fixes the mask typo.

Looking forward to your review.

v1 at: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636391.html

Cheers,
Cupertino

commit 3f89d352a4ee90882089142d743f8a748013b5fe
Author: Cupertino Miranda 
Date:   Fri Nov 10 14:02:30 2023 +

Fixed problem with BTF defining smaller enums.

This patch fixes BTF generation, which produced invalid output for
enums whose definition is smaller than 4 bytes, for example when
using __attribute__((mode(byte))) in the enum definition.

Two problems were identified:
 - it would incorrectly create an entry for enum64 when the size of the
   enum was different from 4.
 - it would allocate less than 4 bytes for the value entry in BTF, in
   case the type was smaller.

BTF generated was validated against clang.

gcc/ChangeLog:
* btfout.cc (btf_calc_num_vbytes): Fixed logic for enum64.
(btf_asm_enum_const): Corrected logic for enum64 and smaller
than 4 bytes values.

gcc/testsuite/ChangeLog:
* gcc.dg/debug/btf/btf-enum-small.c: Added test.

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index e07fed302c24..5f2e99ce4725 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -299,7 +299,7 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
   break;
 
 case BTF_KIND_ENUM:
-  vlen_bytes += (dtd->dtd_data.ctti_size == 0x8)
+  vlen_bytes += (dtd->dtd_data.ctti_size > 4)
 			? vlen * sizeof (struct btf_enum64)
 			: vlen * sizeof (struct btf_enum);
   break;
@@ -914,8 +914,8 @@ btf_asm_enum_const (unsigned int size, ctf_dmdef_t * dmd, unsigned int idx)
 {
   dw2_asm_output_data (4, dmd->dmd_name_offset, "ENUM_CONST '%s' idx=%u",
 		   dmd->dmd_name, idx);
-  if (size == 4)
-dw2_asm_output_data (size, dmd->dmd_value, "bte_value");
+  if (size <= 4)
+dw2_asm_output_data (size < 4 ? 4 : size, dmd->dmd_value, "bte_value");
   else
 {
   dw2_asm_output_data (4, dmd->dmd_value & 0x, "bte_value_lo32");
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
new file mode 100644
index ..eb8a1bd2c438
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
@@ -0,0 +1,28 @@
+/* Test BTF generation for small enums.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-not "bte_value_lo32" } } */
+/* { dg-final { scan-assembler-not "bte_value_hi32" } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x602\[\t \]+\[^\n\]*btt_info" 1 } } */
+/* { dg-final { scan-assembler-times " ENUM_CONST 'eSMALL' idx=0" 1 } } */
+/* { dg-final { scan-assembler-times " ENUM_CONST 'eSMALLY' idx=1" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"eSMALL.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"eSMALLY.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
+/* { dg-final { scan-assembler-times "bte_value" 2 } } */
+
+enum smalled_enum
+{
+  eSMALL,
+  eSMALLY,
+} __attribute__((mode(byte)));
+
+struct root_struct {
+  enum smalled_enum esmall;
+};
+
+enum smalled_enum
+foo(struct root_struct *root) {
+  return root->esmall;
+}
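
The two conditions changed by the patch can be restated as a pair of small C helpers. This is a sketch for illustration only: uses_enum64 and value_field_bytes are made-up names, and the 8-byte figure for the wide case assumes that the value is emitted as a low and a high 32-bit half, as the bte_value_lo32 and bte_value_hi32 labels suggest.

```c
#include <assert.h>

/* Hypothetical restatement of the check in btf_calc_num_vbytes:
   only enums wider than 4 bytes use the enum64 record.  */
int
uses_enum64 (unsigned int enum_size)
{
  assert (enum_size >= 1 && enum_size <= 8);
  return enum_size > 4;
}

/* Hypothetical restatement of the check in btf_asm_enum_const:
   the value field is never narrower than 4 bytes.  */
unsigned int
value_field_bytes (unsigned int enum_size)
{
  assert (enum_size >= 1 && enum_size <= 8);
  return enum_size <= 4 ? 4 : 8;
}
```

For the 1-byte enum in the new test this gives a plain enum record with a 4-byte value, which matches the scan-assembler-not checks for the lo32 and hi32 labels.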


[PATCH v2] Fortran: fix reallocation on assignment of polymorphic variables [PR110415]

2023-11-27 Thread Andrew Jenner

This is the second version of the patch - previous discussion at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636671.html

This patch adds the testcase from PR110415 and fixes the bug.

The problem is that in a couple of places in trans_class_assignment in
trans-expr.cc, we need to get the run-time size of the polymorphic
object from the vtbl, but we are currently getting that vtbl from the
lhs of the assignment rather than the rhs. This gives us the old value
of the size but we need to pass the new size to __builtin_malloc and
__builtin_realloc.
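
The same mistake can be sketched in plain C. The types and names below (struct vtab, struct poly, poly_assign) are hypothetical stand-ins, not the Fortran front end's data structures; the only point is that the size used for the new allocation has to be read from the right-hand side's descriptor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a vtable: it only records the dynamic size.  */
struct vtab { size_t size; };

/* Hypothetical stand-in for a polymorphic object.  */
struct poly
{
  const struct vtab *vptr;
  void *data;
};

/* Assign SRC to DST, replacing DST's heap storage.
   Returns 0 on success and -1 if allocation fails.  */
int
poly_assign (struct poly *dst, const struct poly *src)
{
  assert (dst != NULL && src != NULL && src->vptr != NULL);

  /* The size must come from the rhs; dst->vptr->size would be the old
     (possibly smaller) size.  */
  size_t new_size = src->vptr->size;

  void *p = malloc (new_size ? new_size : 1);
  if (p == NULL)
    return -1;
  if (new_size > 0)
    memcpy (p, src->data, new_size);

  free (dst->data);
  dst->data = p;
  dst->vptr = src->vptr;
  return 0;
}
```

Using the left-hand side's size here would under-allocate whenever the new dynamic type is larger than the old one.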

I'm fixing this by adding a parameter to trans_class_vptr_len_assignment
to retrieve the tree corresponding to the vptr from the object on the rhs
of the assignment, and then passing this where it is needed. In the case
where trans_class_vptr_len_assignment returns NULL_TREE for the rhs vptr
we use the lhs vptr as before.

To get this to work I also needed to change the implementation of
trans_class_vptr_len_assignment to create a temporary for the assignment
in more circumstances. Currently, the "a = func()" assignment in MAIN__
doesn't hit the "Create a temporary for complicated expressions" case
on line 9951 because "DECL_P (rse->expr)" is true - the expression has
already been placed into a temporary. That means we don't hit the "if
(temp_rhs ..." case on line 10038 and go on to get the vptr_expr from
"gfc_lval_expr_from_sym (gfc_find_vtab (&re->ts))" on line 10057 which
is the vtbl of the static type rather than the dynamic one from the rhs.
So with this fix we create an extra temporary, but that should be
optimised away in the middle-end so there should be no run-time effect.

I'm not sure if this is the best way to fix this (the Fortran front-end
is new territory for me) but I've verified that the testcase passes with
this change, fails without it, and that the change does not introduce
any FAILs when running the gfortran testcases on x86_64-pc-linux-gnu.

After the previous submission, Tobias Burnus found a closely related 
problem and contributed testcases and a fix for it, which I have 
incorporated into this version of the patch. The problem in this case is 
with the __builtin_realloc call that is executed if one polymorphic 
variable is replaced by another. The return value of this call was being 
ignored rather than used to replace the pointer being reallocated.
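
The realloc part of the fix corresponds to a well-known C pitfall, sketched below. resize_buffer is a hypothetical helper written for this illustration, not code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Resize *BUFP to NEW_SIZE bytes.  Returns 0 on success; on failure
   returns -1 and leaves *BUFP pointing at the still-valid old block.  */
int
resize_buffer (void **bufp, size_t new_size)
{
  assert (bufp != NULL && new_size > 0);

  void *p = realloc (*bufp, new_size);
  if (p == NULL)
    return -1;

  /* realloc may move the block, so the returned pointer has to replace
     the old one.  Dropping it leaves the caller with a stale pointer.  */
  *bufp = p;
  return 0;
}
```

Calling realloc and continuing to use the old pointer may appear to work when the block happens not to move, which is presumably why the asan testcases are useful here.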


Is this OK for mainline, GCC 13 and OG13?

Thanks,

Andrew

gcc/fortran/
 PR fortran/110415
 * trans-expr.cc (trans_class_vptr_len_assignment): Add
 from_vptrp parameter. Populate it. Don't check for DECL_P
 when deciding whether to create temporary.
 (trans_class_pointer_fcn, gfc_trans_pointer_assignment): Add
 NULL argument to trans_class_vptr_len_assignment calls.
 (trans_class_assignment): Get rhs_vptr from
 trans_class_vptr_len_assignment and use it for determining size
 for allocation/reallocation. Use return value from realloc.

gcc/testsuite/
 PR fortran/110415
 * gfortran.dg/pr110415.f90: New test.
 * gfortran.dg/asan/pr110415-2.f90: New test.
 * gfortran.dg/asan/pr110415-3.f90: New test.

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 50c4604a025..bfe9996ced6 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9936,7 +9936,8 @@ trans_get_upoly_len (stmtblock_t *block, gfc_expr *expr)
 static tree
 trans_class_vptr_len_assignment (stmtblock_t *block, gfc_expr * le,
 gfc_expr * re, gfc_se *rse,
-tree * to_lenp, tree * from_lenp)
+tree * to_lenp, tree * from_lenp,
+tree * from_vptrp)
 {
   gfc_se se;
   gfc_expr * vptr_expr;
@@ -9944,10 +9945,11 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
   bool set_vptr = false, temp_rhs = false;
   stmtblock_t *pre = block;
   tree class_expr = NULL_TREE;
+  tree from_vptr = NULL_TREE;
 
   /* Create a temporary for complicated expressions.  */
   if (re->expr_type != EXPR_VARIABLE && re->expr_type != EXPR_NULL
-  && rse->expr != NULL_TREE && !DECL_P (rse->expr))
+  && rse->expr != NULL_TREE)
 {
   if (re->ts.type == BT_CLASS && !GFC_CLASS_TYPE_P (TREE_TYPE (rse->expr)))
class_expr = gfc_get_class_from_expr (rse->expr);
@@ -10044,6 +10046,7 @@ trans_class_vptr_len_assignment (stmtblock_t *block, gfc_expr * le,
tmp = rse->expr;
 
  se.expr = gfc_class_vptr_get (tmp);
+ from_vptr = se.expr;
  if (UNLIMITED_POLY (re))
from_len = gfc_class_len_get (tmp);
 
@@ -10065,6 +10068,7 @@ trans_class_vptr_len_assignment (stmtblock_t *block, gfc_expr * le,
  gfc_free_expr (vptr_expr);
  gfc_add_block_to_block (block, &se.pre);
  gcc_assert (se.post.head == NULL_TREE);
+ from_vptr = se.expr;
}
   gfc_add_modify (pr

Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Joern Rennecke
On 11/20/23 11:26, Richard Sandiford wrote:

>> +
>> +  mask = GET_MODE_MASK (GET_MODE (SUBREG_REG (x))) << bit;
>> +  if (!mask)
>> + mask = -0x1ULL;
>
> Not sure I follow this.  What does the -0x1ULL constant indicate?
> Also, isn't it the mask of the outer register that is shifted, rather
> than the mask of the inner mode?  E.g. if we have:
Jeff Law:
> Inherited.  I should have marked it like the other one as needing
> investigation.  Probably the fastest way is to just rip it out for a
> test to see what breaks.

This is for support of types wider than DImode.

You have dropped support for tracking these values in various places, though.


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Joern Rennecke
 On 11/20/23 11:26, Richard Sandiford wrote:
>> +  /* ?!? What is the point of this adjustment to DST_MASK?  */
>> +  if (code == PLUS || code == MINUS
>> +  || code == MULT || code == ASHIFT)
>> + dst_mask
>> +  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
>
> Yeah, sympathise with the ?!? here :)
Jeff Law:
> Inherited.  Like the other bit of magic I think I'll do a test with them
> pulled out to see if I can make something undesirable trigger.

This represents the carry effect.  Even if the destination only cares about
some high order bits, you have to consider all lower order bits of the inputs.

For ASHIFT, you could refine this in the case of a constant shift count.
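
To make the carry argument concrete, here is a small standalone sketch of the quoted adjustment. The names floor_log2_u64 and widen_for_carry are made up for this illustration (GCC has its own floor_log2); the arithmetic is the same as in the quoted hunk.

```c
#include <stdint.h>

/* Index of the highest set bit of X, or -1 if X is zero.  Stand-in for
   GCC's floor_log2, written out so the example is self-contained.  */
static int
floor_log2_u64 (uint64_t x)
{
  int n = -1;
  while (x)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* If the destination needs the bits in DST_MASK, then for PLUS, MINUS,
   MULT and ASHIFT every input bit at or below the highest needed bit can
   influence the result, so widen the mask down to bit 0.  */
static uint64_t
widen_for_carry (uint64_t dst_mask)
{
  return dst_mask ? ((UINT64_C (2) << floor_log2_u64 (dst_mask)) - 1) : 0;
}
```

For example, a destination that only looks at bits 4..7 (mask 0xF0) still depends on bits 0..7 of the inputs, so the widened mask is 0xFF.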


[committed] arm: libgcc: tweak warning from __sync_synchronize

2023-11-27 Thread Richard Earnshaw

My previous patch to add an implementation of __sync_synchronize with
a warning trips a testsuite failure in fortran (and possibly other
languages as well) as the framework expects no blank lines in the
output, but this warning was generating one.  So remove the newline
from the end of the message and rely on the one added by the linker
instead.

Since we're there, remove the trailing period from the message as
well, since the convention seems to be not to have one.

libgcc/

* config/arm/lib1funcs.S (__sync_synchronize): Adjust warning message.
---
 libgcc/config/arm/lib1funcs.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 78887861616..40e9a7a87fb 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -2214,7 +2214,7 @@ LSYM(Lchange_\register):
 	.ascii "no effect.  Relink with\n"
 	.ascii "  -specs=sync-{none,dmb,cp15dmb}.specs\n"
 	.ascii "to specify exactly which barrier format to use and avoid "
-	.ascii "this warning.\n\0"
+	.ascii "this warning\0"
 #endif
 #endif
 #endif


[PATCH] tree-sra: Avoid returns of references to SRA candidates

2023-11-27 Thread Martin Jambor
Hi,

The enhancement to address PR 109849 contained an important thinko:
any reference that is passed to a function and does not escape must
also not happen to be aliased by the return value of the function.
This has quickly transpired as bugs PR 112711 and PR 112721.

Just as IPA-modref does a good enough job to allow us to rely on the
escaped set of variables, it seems to be doing well also on updating
EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
exactly the situation we need to avoid.  Of course, if a call
statement ignores any returned value, we also do not need to check the
flag.
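
A minimal sketch of the situation in C, under the assumption that the callee simply hands its argument back (pass_through and write_via_returned are names invented for this illustration, not part of the patch):

```c
struct box
{
  int value;
};

/* The argument does not escape to memory, but it is returned directly,
   so the caller ends up with a second name for the same object.  */
static int *
pass_through (int *p)
{
  return p;
}

/* If B were scalarized without accounting for the alias Q, the store
   through Q could be lost and the stale value 8 returned instead of 0.  */
static int
write_via_returned (void)
{
  struct box b = { 8 };
  int *q = pass_through (&b.value);
  *q = 0;
  return b.value;
}
```

This is the same pattern the pr112721.c testcase exercises, with the loop and the volatile global stripped away.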

Hopefully this does not pessimize things too much; I have verified
that the PR 109849 testcase remains quick, and so should the
benchmark it is derived from.

The patch has passed bootstrap and testing on x86_64-linux, OK for
master?

Thanks,

Martin


gcc/ChangeLog:

2023-11-27  Martin Jambor  

PR tree-optimization/112711
PR tree-optimization/112721
* tree-sra.cc (build_access_from_call_arg): New parameter
CAN_BE_RETURNED, disqualify any candidate passed by reference if it is
true.  Adjust leading comment.
(scan_function): Pass appropriate value to CAN_BE_RETURNED of
build_access_from_call_arg.

gcc/testsuite/ChangeLog:

2023-11-27  Martin Jambor  

PR tree-optimization/112711
PR tree-optimization/112721
* g++.dg/tree-ssa/pr112711.C: New test.
* gcc.dg/tree-ssa/pr112721.c: Likewise.
---
 gcc/testsuite/g++.dg/tree-ssa/pr112711.C | 31 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr112721.c | 26 +++
 gcc/tree-sra.cc  | 40 ++--
 3 files changed, 88 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr112711.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112721.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr112711.C b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C
new file mode 100644
index 000..c04524b04a7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+typedef  int i32;
+typedef unsigned int u32;
+
+static inline void write_i32(void *memory, i32 value) {
+  // swap i32 bytes as if it was u32:
+  u32 u_value = value;
+  value = __builtin_bswap32(u_value);
+
+  // llvm infers '1' alignment from destination type
+  __builtin_memcpy(__builtin_assume_aligned(memory, 1), &value, sizeof(value));
+}
+
+__attribute__((noipa))
+static void bug (void) {
+  #define assert_eq(lhs, rhs) if (lhs != rhs) __builtin_trap()
+
+  unsigned char data[5];
+  write_i32(data, -1362446643);
+  assert_eq(data[0], 0xAE);
+  assert_eq(data[1], 0xCA);
+  write_i32(data + 1, -1362446643);
+  assert_eq(data[1], 0xAE);
+}
+
+int main() {
+bug();
+return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
new file mode 100644
index 000..adf62613266
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+unsigned * volatile gv;
+
+struct a {
+  int b;
+};
+int c, e;
+long d;
+unsigned * __attribute__((noinline))
+f(unsigned *g) {
+  for (; c;)
+e = d;
+  return gv ? gv : g;
+}
+int main() {
+  int *h;
+  struct a i = {8};
+  int *j = &i.b;
+  h = (unsigned *) f(j);
+  *h = 0;
+  if (i.b != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 3a0d52675fe..6a759783990 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1268,18 +1268,27 @@ abnormal_edge_after_stmt_p (gimple *stmt, enum out_edge_check *oe_check)
 }
 
 /* Scan expression EXPR which is an argument of a call and create access
-   structures for all accesses to candidates for scalarization.  Return true if
-   any access has been inserted.  STMT must be the statement from which the
-   expression is taken.  */
+   structures for all accesses to candidates for scalarization.  Return true
+   if any access has been inserted.  STMT must be the statement from which the
+   expression is taken.  CAN_BE_RETURNED must be true if call argument flags
+   do not rule out that the argument is directly returned.  OE_CHECK is used
+   to remember result of a test for abnormal outgoing edges after this
+   statement.  */
 
 static bool
-build_access_from_call_arg (tree expr, gimple *stmt,
+build_access_from_call_arg (tree expr, gimple *stmt, bool can_be_returned,
enum out_edge_check *oe_check)
 {
   if (TREE_CODE (expr) == ADDR_EXPR)
 {
   tree base = get_base_address (TREE_OPERAND (expr, 0));
 
+  if (can_be_returned)
+   {
+ disqualify_base_of_expr (base, "Address possibly returned, "
+  "leading to an alis SRA may not know.");
+ return false;
+   }
   if (abnormal_edge_after_stmt_p (stmt, oe_check))
{
   

Re: [PATCH V2 3/3] OpenMP: Use enumerators for names of trait-sets and traits

2023-11-27 Thread Tobias Burnus

On 27.11.23 18:19, Tobias Burnus wrote:

+   { "unified_address",
+ (1 << OMP_TRAIT_SET_IMPLEMENTATION),
+ OMP_TRAIT_PROPERTY_NONE, true,
+ NULL
+   },


I don't understand this code. This looks as if "requires" and
"unified_address"
are on the same level but in my understanding they have to be used as in:

 match(implementation = {requires(unified_address,
atomic_default_mem_order_properties(release))})

while from the syntax, it looks as if this would permit:

 match(implementation = {unified_address,
atomic_default_mem_order_properties(release)})



Sandra pointed me to the spec: OpenMP 5.0 only permits the latter, i.e.
using the clause names of 'requires' directly. Since OpenMP 5.1, this
use is deprecated (removed in TR11/TR12) - in favor of the first syntax,
i.e. using them as argument to 'requires()'.

Thus, the code is fine. It also shows all the joy of needing to read
multiple spec versions at the same time without getting confused.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] tree-sra: Avoid returns of references to SRA candidates

2023-11-27 Thread Andrew Pinski
On Mon, Nov 27, 2023 at 10:16 AM Martin Jambor  wrote:
>
> Hi,
>
> The enhancement to address PR 109849 contained an important thinko:
> any reference that is passed to a function and does not escape must
> also not happen to be aliased by the return value of the function.
> This has quickly transpired as bugs PR 112711 and PR 112721.
>
> Just as IPA-modref does a good enough job to allow us to rely on the
> escaped set of variables, it seems to be doing well also on updating
> EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
> exactly the situation we need to avoid.  Of course, if a call
> statement ignores any returned value, we also do not need to check the
> flag.
>
> Hopefully this does not pessimize things too much; I have verified
> that the PR 109849 testcase remains quick, and so should the
> benchmark it is derived from.
>
> The patch has passed bootstrap and testing on x86_64-linux, OK for
> master?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-11-27  Martin Jambor  
>
> PR tree-optimization/112711
> PR tree-optimization/112721
> * tree-sra.cc (build_access_from_call_arg): New parameter
> CAN_BE_RETURNED, disqualify any candidate passed by reference if it is
> true.  Adjust leading comment.
> (scan_function): Pass appropriate value to CAN_BE_RETURNED of
> build_access_from_call_arg.
>
> gcc/testsuite/ChangeLog:
>
> 2023-11-27  Martin Jambor  
>
> PR tree-optimization/112711
> PR tree-optimization/112721
> * g++.dg/tree-ssa/pr112711.C: New test.
> * gcc.dg/tree-ssa/pr112721.c: Likewise.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr112711.C | 31 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr112721.c | 26 +++
>  gcc/tree-sra.cc  | 40 ++--
>  3 files changed, 88 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr112711.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr112711.C b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C
> new file mode 100644
> index 000..c04524b04a7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +
> +typedef  int i32;
> +typedef unsigned int u32;
> +
> +static inline void write_i32(void *memory, i32 value) {
> +  // swap i32 bytes as if it was u32:
> +  u32 u_value = value;
> +  value = __builtin_bswap32(u_value);
> +
> +  // llvm infers '1' alignment from destination type
> +  __builtin_memcpy(__builtin_assume_aligned(memory, 1), &value, sizeof(value));
> +}
> +
> +__attribute__((noipa))
> +static void bug (void) {
> +  #define assert_eq(lhs, rhs) if (lhs != rhs) __builtin_trap()
> +
> +  unsigned char data[5];
> +  write_i32(data, -1362446643);
> +  assert_eq(data[0], 0xAE);
> +  assert_eq(data[1], 0xCA);
> +  write_i32(data + 1, -1362446643);
> +  assert_eq(data[1], 0xAE);
> +}

Only a comment on this testcase: as written, it is only valid for
little-endian targets with 32-bit int.
You can modify it so that it works regardless of both, though.
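
One possible shape for such a variant, offered as a sketch rather than as the form the test should take: it fixes the width with uint32_t and derives the expected bytes from the host byte order instead of hard-coding 0xAE and 0xCA. The function names are invented here, and 0xAECAB6CD is -1362446643 reinterpreted as an unsigned 32-bit value.

```c
#include <stdint.h>
#include <string.h>

/* Store VALUE at MEMORY with its bytes reversed.  __builtin_bswap32 is a
   GCC/Clang builtin; uint32_t pins the width to 32 bits.  */
static void
write_swapped_u32 (void *memory, uint32_t value)
{
  value = __builtin_bswap32 (value);
  memcpy (memory, &value, sizeof value);
}

/* Return 1 if both (overlapping) stores produced the expected bytes.  */
static int
check_swapped_stores (void)
{
  const uint32_t v = UINT32_C (0xAECAB6CD);
  unsigned char native[4], data[5];

  memcpy (native, &v, sizeof v);	/* Bytes of V in host order.  */

  write_swapped_u32 (data, v);
  for (int i = 0; i < 4; i++)
    if (data[i] != native[3 - i])
      return 0;

  write_swapped_u32 (data + 1, v);	/* Overlapping second store.  */
  if (data[0] != native[3])
    return 0;
  for (int i = 0; i < 4; i++)
    if (data[1 + i] != native[3 - i])
      return 0;

  return 1;
}
```

Whether comparing against the host-order bytes still exercises the original SRA bug would have to be confirmed against an unfixed compiler.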

Thanks,
Andrew

> +
> +int main() {
> +bug();
> +return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
> new file mode 100644
> index 000..adf62613266
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +
> +unsigned * volatile gv;
> +
> +struct a {
> +  int b;
> +};
> +int c, e;
> +long d;
> +unsigned * __attribute__((noinline))
> +f(unsigned *g) {
> +  for (; c;)
> +e = d;
> +  return gv ? gv : g;
> +}
> +int main() {
> +  int *h;
> +  struct a i = {8};
> +  int *j = &i.b;
> +  h = (unsigned *) f(j);
> +  *h = 0;
> +  if (i.b != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index 3a0d52675fe..6a759783990 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -1268,18 +1268,27 @@ abnormal_edge_after_stmt_p (gimple *stmt, enum out_edge_check *oe_check)
>  }
>
>  /* Scan expression EXPR which is an argument of a call and create access
> -   structures for all accesses to candidates for scalarization.  Return true if
> -   any access has been inserted.  STMT must be the statement from which the
> -   expression is taken.  */
> +   structures for all accesses to candidates for scalarization.  Return true
> +   if any access has been inserted.  STMT must be the statement from which the
> +   expression is taken.  CAN_BE_RETURNED must be true if call argument flags
> +   do not rule out that the argument is directly returned.  OE_CHECK is used
> +   to remember result of a test for abnormal outgoing edges after this
> +   statement.  */
>
>  static bool
> -build_access_from_call_arg (tree expr, gimple *stmt,
> +build_access_from_call_arg (tree exp

Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Joern Rennecke
You are applying PATTERN to an INSN_LIST.
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 52032b50951..4523654538c 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -122,10 +122,9 @@ safe_for_live_propagation (rtx_code code)
optimziation phase during use handling will be.  */
 
 static void
-ext_dce_process_sets (rtx insn, bitmap livenow, bitmap live_tmp)
+ext_dce_process_sets (rtx insn, rtx pat, bitmap livenow, bitmap live_tmp)
 {
   subrtx_iterator::array_type array;
-  rtx pat = PATTERN (insn);
   FOR_EACH_SUBRTX (iter, array, pat, NONCONST)
 {
   const_rtx x = *iter;
@@ -377,7 +376,7 @@ binop_implies_op2_fully_live (rtx_code code)
eliminated in CHANGED_PSEUDOS.  */
 
 static void
-ext_dce_process_uses (rtx insn, bitmap livenow, bitmap live_tmp,
+ext_dce_process_uses (rtx insn, rtx pat, bitmap livenow, bitmap live_tmp,
  bool modify, bitmap changed_pseudos)
 {
   /* A nonlocal goto implicitly uses the frame pointer.  */
@@ -389,7 +388,6 @@ ext_dce_process_uses (rtx insn, bitmap livenow, bitmap live_tmp,
 }
 
   subrtx_var_iterator::array_type array_var;
-  rtx pat = PATTERN (insn);
   FOR_EACH_SUBRTX_VAR (iter, array_var, pat, NONCONST)
 {
   /* An EXPR_LIST (from call fusage) ends in NULL_RTX.  */
@@ -640,15 +638,16 @@ ext_dce_process_bb (basic_block bb, bitmap livenow,
   bitmap live_tmp = BITMAP_ALLOC (NULL);
 
   /* First process any sets/clobbers in INSN.  */
-  ext_dce_process_sets (insn, livenow, live_tmp);
+  ext_dce_process_sets (insn, PATTERN (insn), livenow, live_tmp);
 
   /* CALL_INSNs need processing their fusage data.  */
   if (GET_CODE (insn) == CALL_INSN)
-   ext_dce_process_sets (CALL_INSN_FUNCTION_USAGE (insn),
+   ext_dce_process_sets (insn, CALL_INSN_FUNCTION_USAGE (insn),
  livenow, live_tmp);
 
   /* And now uses, optimizing away SIGN/ZERO extensions as we go.  */
-  ext_dce_process_uses (insn, livenow, live_tmp, modify, changed_pseudos);
+  ext_dce_process_uses (insn, PATTERN (insn), livenow, live_tmp, modify,
+   changed_pseudos);
 
   /* And process fusage data for the use as well.  */
   if (GET_CODE (insn) == CALL_INSN)
@@ -663,7 +662,7 @@ ext_dce_process_bb (basic_block bb, bitmap livenow,
  if (global_regs[i])
bitmap_set_range (livenow, i * 4, 4);
 
- ext_dce_process_uses (CALL_INSN_FUNCTION_USAGE (insn),
+ ext_dce_process_uses (insn, CALL_INSN_FUNCTION_USAGE (insn),
livenow, live_tmp, modify, changed_pseudos);
}
 


Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-27 Thread Richard Sandiford
Catching up on backlog, so this might already be resolved, but:

Richard Biener  writes:
> On Tue, 7 Nov 2023, Tamar Christina wrote:
>
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Tuesday, November 7, 2023 9:43 AM
>> > To: Tamar Christina 
>> > Cc: gcc-patches@gcc.gnu.org; nd 
>> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
>> > vectorization
>> > 
>> > On Mon, 6 Nov 2023, Tamar Christina wrote:
>> > 
>> > > > -Original Message-
>> > > > From: Richard Biener 
>> > > > Sent: Monday, November 6, 2023 2:25 PM
>> > > > To: Tamar Christina 
>> > > > Cc: gcc-patches@gcc.gnu.org; nd 
>> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
>> > > > auto- vectorization
>> > > >
>> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > This patch adds initial support for early break vectorization in GCC.
>> > > > > The support is added for any target that implements a vector
>> > > > > cbranch optab, this includes both fully masked and non-masked
>> > > > > targets.
>> > > > >
>> > > > > Depending on the operation, the vectorizer may also require
>> > > > > support for boolean mask reductions using Inclusive OR.  This is
>> > > > > however only checked when the comparison would produce multiple
>> > > > > statements.
>> > > > >
>> > > > > Note: I am currently struggling to get patch 7 correct in all
>> > > > > cases and could use some feedback there.
>> > > > >
>> > > > > Concretely the kind of loops supported are of the forms:
>> > > > >
>> > > > >  for (int i = 0; i < N; i++)
>> > > > >  {
>> > > > >
>> > > > >if ()
>> > > > >  {
>> > > > >...
>> > > > >;
>> > > > >  }
>> > > > >
>> > > > >  }
>> > > > >
>> > > > > where  can be:
>> > > > >  - break
>> > > > >  - return
>> > > > >  - goto
>> > > > >
>> > > > > Any number of statements can be used before the  occurs.
>> > > > >
>> > > > > Since this is an initial version for GCC 14 it has the following
>> > > > > limitations and features:
>> > > > >
>> > > > > - Only fixed sized iterations and buffers are supported.  That is to
>> > > > >   say any vectors loaded or stored must be to statically allocated
>> > > > >   arrays with known sizes. N must also be known.  This limitation is
>> > > > >   because our primary target for this optimization is SVE.  For VLA
>> > > > >   SVE we can't easily do cross page iteration checks. The result is
>> > > > >   likely to also not be beneficial. For that reason we punt support
>> > > > >   for variable buffers till we have First-Faulting support in GCC.
>> > 
>> > Btw, for this I wonder if you thought about marking memory accesses
>> > required for the early break condition as required to be vector-size
>> > aligned, thus peeling or versioning them for alignment?  That should
>> > ensure they do not fault.
>> > 
>> > OTOH I somehow remember prologue peeling isn't supported for early break
>> > vectorization?  ..
>> > 
>> > > > > - any stores in  should not be to the same objects as in
>> > > > >   .  Loads are fine as long as they don't have the possibility to
>> > > > >   alias.  More concretely, we block RAW dependencies when the
>> > > > >   intermediate value can't be separated from the store, or the
>> > > > >   store itself can't be moved.
>> > > > > - Prologue peeling, alignment peeling and loop versioning are
>> > > > >   supported.
>> > 
>> > .. but here you say it is.  Not sure if peeling for alignment works for
>> > VLA vectors though.  Just to say x86 doesn't support first-faulting loads.
>> 
>> For VLA we support it through masking.  i.e. if you need to peel N
>> iterations, we generate a masked copy of the loop vectorized which masks
>> off the first N bits.
>> 
>> This is not typically needed, but we do support it.  But the problem with
>> this scheme and early break is obviously that the peeled loop needs to be
>> vectorized so you kinda end up with the same issue again.  So Atm it
>> rejects it for VLA.
>
> Hmm, I see.  I thought peeling by masking is an optimization.

Yeah, it's an opt-in optimisation.  No current Arm cores opt in though.

> Anyhow, I think it should still work here - since all accesses are aligned
> and we know that there's at least one original scalar iteration in the
> first masked and the following "unmasked" vector iterations there
> should never be faults for any of the aligned accesses.

Peeling via masking works by using the main loop for the "peeled"
iteration (so it's a bit of a misnomer).  The vector pointers start
out lower than the original scalar pointers, with some leading
inactive elements.
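
As a rough scalar model of that scheme (a sketch only: VF, find_first_masked and the index-based "alignment" are assumptions made for illustration, not how the vectorizer represents it), the loop below starts at the rounded-down position and treats the leading lanes as inactive. The early-exit search shows why those lanes must also be skipped when deciding where a break happened.

```c
#include <stddef.h>

enum { VF = 4 };	/* Hypothetical vectorization factor.  */

/* Return the index of the first element equal to KEY in A[START..END),
   or END if there is none.  Each outer iteration models one vector
   iteration; it starts at START rounded down to a multiple of VF, and
   lanes outside [START, END) are masked off and never read.  */
static size_t
find_first_masked (const int *a, size_t start, size_t end, int key)
{
  size_t base = start - start % VF;	/* Lower than the scalar start.  */

  for (size_t i = base; i < end; i += VF)
    for (size_t lane = 0; lane < VF; lane++)
      {
	size_t idx = i + lane;
	int active = idx >= start && idx < end;

	/* An inactive lane must not trigger the early break.  */
	if (active && a[idx] == key)
	  return idx;
      }

  return end;
}
```

With a = {7, 7, 1, 2, 7, 3} and start = 2, the matches at indices 0 and 1 sit in inactive lanes of the first vector iteration, so the search reports index 4.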

The awkwardness would be in skipping those leading inactive elements
in the epilogue, if an early break occurs in the first vector iteration.
Definitely doable, b

Re: [r14-5666 Regression] FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1 on Linux/x86_64

2023-11-27 Thread Andrew Pinski
On Mon, Nov 27, 2023 at 12:00 AM Sebastian Huber
 wrote:
>
> On 26.11.23 12:18, haochen.jiang wrote:
> > On Linux/x86_64,
> >
> > 41aacdea55c5d795a7aa195357d966645845d00e is the first bad commit
> > commit 41aacdea55c5d795a7aa195357d966645845d00e
> > Author: Sebastian Huber
> > Date:   Mon Nov 20 15:26:38 2023 +0100
> >
> >  gcov: Fix integer types in gen_counter_update()
> >
> > caused
> >
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile "Read tp_first_run: 0" 1
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1
>
> Please have a look at:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638104.html

Also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112689 .
Anyways I am testing a patch to fix this one.

Thanks,
Andrew

>
> --
> embedded brains GmbH & Co. KG
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH v2] Fixed problem with BTF defining smaller enums.

2023-11-27 Thread David Faust
Hi Cupertino,

On 11/27/23 09:21, Cupertino Miranda wrote:
> Hi everyone,
> 
> David: Thanks for the v1 review.
> 
> This version adds the following;
>  - test case,
>  - improves condition logic,
>  - fixes mask typo.
> 
> Looking forward to your review.

v2 LGTM, please apply.
Thanks!

> 
> v1 at: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636391.html
> 
> Cheers,
> Cupertino
> 
> 
> 0004-Fixed-problem-with-BTF-defining-smaller-enums.patch
> 
> commit 3f89d352a4ee90882089142d743f8a748013b5fe
> Author: Cupertino Miranda 
> Date:   Fri Nov 10 14:02:30 2023 +
> 
> Fixed problem with BTF defining smaller enums.
> 
> This patch fixes BTF generation, which would become invalid for enum
> definitions smaller than 4 bytes, for example when using
> __attribute__((mode(byte))) in the enum definition.
> 
> Two problems were identified:
>  - it would incorrectly create an entry for enum64 when the size of the
>    enum was different than 4.
>  - it would allocate less than 4 bytes for the value entry in BTF, in
>    case the type was smaller.
> 
> BTF generated was validated against clang.
> 
> gcc/ChangeLog:
> * btfout.cc (btf_calc_num_vbytes): Fixed logic for enum64.
> (btf_asm_enum_const): Corrected logic for enum64 and smaller
> than 4 bytes values.
> 
> gcc/testsuite/ChangeLog:
> * gcc.dg/debug/btf/btf-enum-small.c: Added test.
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index e07fed302c24..5f2e99ce4725 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -299,7 +299,7 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
>break;
>  
>  case BTF_KIND_ENUM:
> -  vlen_bytes += (dtd->dtd_data.ctti_size == 0x8)
> +  vlen_bytes += (dtd->dtd_data.ctti_size > 4)
>   ? vlen * sizeof (struct btf_enum64)
>   : vlen * sizeof (struct btf_enum);
>break;
> @@ -914,8 +914,8 @@ btf_asm_enum_const (unsigned int size, ctf_dmdef_t * dmd, unsigned int idx)
>  {
>dw2_asm_output_data (4, dmd->dmd_name_offset, "ENUM_CONST '%s' idx=%u",
>  dmd->dmd_name, idx);
> -  if (size == 4)
> -dw2_asm_output_data (size, dmd->dmd_value, "bte_value");
> +  if (size <= 4)
> +dw2_asm_output_data (size < 4 ? 4 : size, dmd->dmd_value, "bte_value");
>else
>  {
>dw2_asm_output_data (4, dmd->dmd_value & 0x, "bte_value_lo32");
> diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
> new file mode 100644
> index ..eb8a1bd2c438
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
> @@ -0,0 +1,28 @@
> +/* Test BTF generation for small enums.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -gbtf -dA" } */
> +
> +/* { dg-final { scan-assembler-not "bte_value_lo32" } } */
> +/* { dg-final { scan-assembler-not "bte_value_hi32" } } */
> +/* { dg-final { scan-assembler-times "\[\t \]0x602\[\t \]+\[^\n\]*btt_info" 1 } } */
> +/* { dg-final { scan-assembler-times " ENUM_CONST 'eSMALL' idx=0" 1 } } */
> +/* { dg-final { scan-assembler-times " ENUM_CONST 'eSMALLY' idx=1" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"eSMALL.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"eSMALLY.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
> +/* { dg-final { scan-assembler-times "bte_value" 2 } } */
> +
> +enum smalled_enum
> +{
> +  eSMALL,
> +  eSMALLY,
> +} __attribute__((mode(byte)));
> +
> +struct root_struct {
> +  enum smalled_enum esmall;
> +};
> +
> +enum smalled_enum
> +foo(struct root_struct *root) {
> +  return root->esmall;
> +}
> 
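
The sizing rule from the quoted patch can be restated as a small standalone sketch. The struct and function names below are stand-ins invented for this illustration (they mirror the layout of struct btf_enum and struct btf_enum64 but are not the real declarations), so treat it as a reading aid rather than as the btfout.cc code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins with the same shape as the BTF enum records.  */
struct enum32_entry
{
  uint32_t name_off;
  int32_t val;
};

struct enum64_entry
{
  uint32_t name_off;
  uint32_t val_lo32;
  uint32_t val_hi32;
};

/* Bytes needed for VLEN enumerators of an enum that is SIZE bytes wide:
   only enums wider than 4 bytes use the 64-bit records.  */
static size_t
enum_vbytes (unsigned int size, unsigned int vlen)
{
  return size > 4 ? vlen * sizeof (struct enum64_entry)
		  : vlen * sizeof (struct enum32_entry);
}

/* Bytes emitted for a single enumerator value: small enums are padded up
   to one 4-byte field, wide ones are split into lo32/hi32 halves.  */
static unsigned int
enum_value_bytes (unsigned int size)
{
  return size <= 4 ? 4 : 8;
}
```

A mode(byte) enum with two enumerators therefore takes two 8-byte records, the same as a plain 4-byte enum.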


Re: hurd: Add multilib paths for gnu-x86_64

2023-11-27 Thread Samuel Thibault
Hello,

Thomas Schwinge wrote on Mon, 27 Nov 2023 15:48:33 +0100:
> On 2023-10-28T21:19:59+0200, Samuel Thibault  wrote:
> > This is essentially based on t-linux64 version.
> 
> Yes, but isn't the overall setup diverged from GNU/Linux?

Not sure what you mean exactly?
I just meant that the content of t-gnu64 is almost the same as
t-linux64, the only difference being the multiarch path.

> Currently, x86_64 GNU/Hurd first gets 'i386/t-linux64', whose definitions
> are only later:
> 
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -5828,6 +5828,9 @@ case ${target} in
> >   visium-*-*)
> >   target_cpu_default2="TARGET_CPU_$with_cpu"
> >   ;;
> > + x86_64-*-gnu*)
> > + tmake_file="$tmake_file i386/t-gnu64"
> > + ;;
> >  esac
> 
> ... then here (effectively) overwritten by 'i386/t-gnu64'.

Yes, like it is done for the x86_64-*-freebsd*) case

> Instead, I suppose, we should handle 'i386/t-linux64' and
> 'i386/t-gnu64' alike, and resolve relevant configuration differences.

So essentially move 

tmake_file="${tmake_file} i386/t-linux64"

down from where it is currently, to the 

# Set some miscellaneous flags for particular targets.
target_cpu_default2=
case ${target} in

part? That should be fine for kfreebsd as well.

> As far as I can tell, 'i386/t-linux64' is also used for multilib-enabled
> ('test x$enable_targets = xall') x86 GNU/Linux, and that's not
> (correspondingly) done for x86 GNU/Hurd?

We don't really plan to support 32/64 multilib support in GNU/Hurd.

> However, such things can certainly be resolved incrementally, later on.
> I understand that your change does work for you as-is,

Thanks for your understanding :) that'll help pushing further in Debian.

Samuel


Re: hurd: Ad default-pie and static-pie support

2023-11-27 Thread Samuel Thibault
Thomas Schwinge wrote on Mon, 27 Nov 2023 15:52:02 +0100:
> On 2023-10-28T21:20:39+0200, Samuel Thibault  wrote:
> > This fixes the Hurd spec in the default-pie case, and adds static-pie
> > support.
> 
> I understand that your change does work for you as-is, so I've now pushed
> that to master branch in commit c768917402d4cba69a92c737e56e177f5b8ab0df
> "hurd: Ad default-pie and static-pie support", see attached.

Yes, thanks!
Samuel


> From c768917402d4cba69a92c737e56e177f5b8ab0df Mon Sep 17 00:00:00 2001
> From: Samuel Thibault 
> Date: Sat, 6 May 2023 13:55:44 +0200
> Subject: [PATCH] hurd: Ad default-pie and static-pie support
> 
> This fixes the Hurd spec in the default-pie case, and adds static-pie
> support.
> 
> gcc/ChangeLog:
> 
>   * config/i386/gnu.h: Use PIE_SPEC, add static-pie case.
>   * config/i386/gnu64.h: Use PIE_SPEC, add static-pie case.
> ---
>  gcc/config/i386/gnu.h   | 6 +++---
>  gcc/config/i386/gnu64.h | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
> index 8dc6d9ee4e3..e776144f96c 100644
> --- a/gcc/config/i386/gnu.h
> +++ b/gcc/config/i386/gnu.h
> @@ -27,12 +27,12 @@ along with GCC.  If not, see 
> .
>  #undef   STARTFILE_SPEC
>  #if defined HAVE_LD_PIE
>  #define STARTFILE_SPEC \
> -  "%{!shared: 
> %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}}
>  \
> -   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
> +  "%{!shared: 
> %{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;"
>  PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
> +   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC 
> ":crtbeginS.o%s;:crtbegin.o%s}"
>  #else
>  #define STARTFILE_SPEC \
>"%{!shared: 
> %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
> -   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
> +   crti.o%s %{static:crtbeginT.o%s;shared:crtbeginS.o%s;:crtbegin.o%s}"
>  #endif
>  
>  #ifdef TARGET_LIBC_PROVIDES_SSP
> diff --git a/gcc/config/i386/gnu64.h b/gcc/config/i386/gnu64.h
> index a411f0e802a..332372fa067 100644
> --- a/gcc/config/i386/gnu64.h
> +++ b/gcc/config/i386/gnu64.h
> @@ -31,10 +31,10 @@ along with GCC.  If not, see 
> .
>  #undef   STARTFILE_SPEC
>  #if defined HAVE_LD_PIE
>  #define STARTFILE_SPEC \
> -  "%{!shared: %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
> -   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
> +  "%{!shared: %{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;" PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
> +   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC ":crtbeginS.o%s;:crtbegin.o%s}"
>  #else
>  #define STARTFILE_SPEC \
>    "%{!shared: %{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
> -   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
> +   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC ":crtbeginS.o%s;:crtbegin.o%s}"
>  #endif
> -- 
> 2.34.1
> 


-- 
Samuel
---
For an independent, transparent and rigorous evaluation!
I support Inria's Evaluation Committee (Commission d'Évaluation).


[PATCH] Fortran: deferred-length character optional dummy arguments [PR93762,PR100651]

2023-11-27 Thread Harald Anlauf
Dear all,

the attached patch fixes the passing of deferred-length character
to optional dummy arguments: the character length shall be passed
by reference, not by value.

Original analysis of the issue by Steve in PR93762, independently
done by FX in PR100651.  The patch fixes both PRs.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As the fix is local and affects only deferred-length character,
would it be ok to backport to 13-branch?
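
To illustrate why the length has to travel by reference, here is a minimal C model of the situation. It is only a sketch under stated assumptions: the function names (set_message, forward_by_value, forward_by_reference) are hypothetical, and the code is not what gfortran generates. It models a callee that allocates the string and reports the new length, plus two ways a caller can forward its own optional argument.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callee model: MSG and LEN are both null when the optional
   argument is absent.  When present, the callee allocates the string and
   reports the new length through LEN.  */
static void set_message (char **msg, size_t *len)
{
  static const char text[] = "foo bar";
  if (msg == NULL || len == NULL)
    return;
  free (*msg);
  *msg = malloc (sizeof text - 1);
  if (*msg == NULL)
    {
      *len = 0;
      return;
    }
  memcpy (*msg, text, sizeof text - 1);
  *len = sizeof text - 1;
}

/* Forwarding with the length copied into a temporary (the by-value
   behaviour): the callee updates the temporary, so the caller's length
   is never changed.  */
static void forward_by_value (char **msg, size_t *len)
{
  int present = (msg != NULL);
  size_t tmp = present ? *len : 0;
  set_message (msg, present ? &tmp : NULL);
}

/* Forwarding with the address of the length selected by presence (the
   by-reference behaviour): a null pointer is passed when the argument is
   absent, and the callee's update is visible to the caller otherwise.  */
static void forward_by_reference (char **msg, size_t *len)
{
  int present = (msg != NULL);
  size_t *len_ref = present ? len : NULL;
  set_message (msg, len_ref);
}
```

In this model, forward_by_value leaves the caller with an allocated string but a stale length of 0, while forward_by_reference reports the length 7 that the callee set.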

Thanks,
Harald

From 8ce1c8e7d0390361a1507000b7abbf6509b2fee9 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 27 Nov 2023 20:19:11 +0100
Subject: [PATCH] Fortran: deferred-length character optional dummy arguments
 [PR93762,PR100651]

gcc/fortran/ChangeLog:

	PR fortran/93762
	PR fortran/100651
	* trans-expr.cc (gfc_conv_missing_dummy): The character length for
	deferred-length dummy arguments is passed by reference, so that its
	value can be returned.  Adjust handling for optional dummies.

gcc/testsuite/ChangeLog:

	PR fortran/93762
	PR fortran/100651
	* gfortran.dg/optional_deferred_char_1.f90: New test.
---
 gcc/fortran/trans-expr.cc |  22 +++-
 .../gfortran.dg/optional_deferred_char_1.f90  | 100 ++
 2 files changed, 118 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 50c4604a025..e992f60d8bb 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2116,10 +2116,24 @@ gfc_conv_missing_dummy (gfc_se * se, gfc_expr * arg, gfc_typespec ts, int kind)

   if (ts.type == BT_CHARACTER)
 {
-  tmp = build_int_cst (gfc_charlen_type_node, 0);
-  tmp = fold_build3_loc (input_location, COND_EXPR, gfc_charlen_type_node,
-			 present, se->string_length, tmp);
-  tmp = gfc_evaluate_now (tmp, &se->pre);
+  /* Handle deferred-length dummies that pass the character length by
+	 reference so that the value can be returned.  */
+  if (ts.deferred && INDIRECT_REF_P (se->string_length))
+	{
+	  tmp = gfc_build_addr_expr (NULL_TREE, se->string_length);
+	  tmp = fold_build3_loc (input_location, COND_EXPR, TREE_TYPE (tmp),
+ present, tmp, null_pointer_node);
+	  tmp = gfc_evaluate_now (tmp, &se->pre);
+	  tmp = build_fold_indirect_ref_loc (input_location, tmp);
+	}
+  else
+	{
+	  tmp = build_int_cst (gfc_charlen_type_node, 0);
+	  tmp = fold_build3_loc (input_location, COND_EXPR,
+ gfc_charlen_type_node,
+ present, se->string_length, tmp);
+	  tmp = gfc_evaluate_now (tmp, &se->pre);
+	}
   se->string_length = tmp;
 }
   return;
diff --git a/gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90 b/gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90
new file mode 100644
index 000..d399dd11ca2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90
@@ -0,0 +1,100 @@
+! { dg-do run }
+! PR fortran/93762
+! PR fortran/100651 - deferred-length character as optional dummy argument
+
+program main
+  implicit none
+  character(:), allocatable :: err_msg, msg3(:)
+  character(:), pointer :: err_msg2 => NULL()
+
+  ! Subroutines with optional arguments
+  call to_int ()
+  call to_int_p ()
+  call test_rank1 ()
+  call assert_code ()
+  call assert_p ()
+  call assert_rank1 ()
+
+  ! Test passing of optional arguments
+  call to_int (err_msg)
+  if (.not. allocated (err_msg)) stop 1
+  if (len (err_msg) /= 7)stop 2
+  if (err_msg(1:7) /= "foo bar") stop 3
+
+  call to_int2 (err_msg)
+  if (.not. allocated (err_msg)) stop 4
+  if (len (err_msg) /= 7)stop 5
+  if (err_msg(1:7) /= "foo bar") stop 6
+  deallocate (err_msg)
+
+  call to_int_p (err_msg2)
+  if (.not. associated (err_msg2)) stop 11
+  if (len (err_msg2) /= 8) stop 12
+  if (err_msg2(1:8) /= "poo bla ") stop 13
+  deallocate (err_msg2)
+
+  call to_int2_p (err_msg2)
+  if (.not. associated (err_msg2)) stop 14
+  if (len (err_msg2) /= 8) stop 15
+  if (err_msg2(1:8) /= "poo bla ") stop 16
+  deallocate (err_msg2)
+
+  call test_rank1 (msg3)
+  if (.not. allocated (msg3)) stop 21
+  if (len (msg3) /= 2)stop 22
+  if (size (msg3) /= 42)  stop 23
+  if (any (msg3 /= "ok")) stop 24
+  deallocate (msg3)
+
+contains
+
+  ! Deferred-length character, allocatable:
+  subroutine assert_code (err_msg0)
+character(:), optional, allocatable :: err_msg0
+if (present (err_msg0)) err_msg0 = 'foo bar'
+  end
+  ! Test: optional argument
+  subroutine to_int (err_msg1)
+character(:), optional, allocatable :: err_msg1
+call assert_code (err_msg1)
+  end
+  ! Control: non-optional argument
+  subroutine to_int2 (err_msg2)
+character(:), allocatable :: err_msg2
+call assert_code (err_msg2)
+  end
+
+  ! Rank-1:
+  subroutine assert_rank1 (msg)
+character(:), optional, allocatable, intent(out) :: msg(:)
+if (present (msg)) then
+   allocate (character(2) :: msg(42))
+   msg(:) = "ok"
+end if
+

Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Richard Sandiford
Joern Rennecke  writes:
>  On 11/20/23 11:26, Richard Sandiford wrote:
>>> +  /* ?!? What is the point of this adjustment to DST_MASK?  */
>>> +  if (code == PLUS || code == MINUS
>>> +  || code == MULT || code == ASHIFT)
>>> + dst_mask
>>> +  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
>>
>> Yeah, sympathise with the ?!? here :)
> Jeff Law:
>> Inherited.  Like the other bit of magic I think I'll do a test with them
>> pulled out to see if I can make something undesirable trigger.
>
> This represents the carry effect.  Even if the destination only cares about
> some high order bits, you have to consider all lower order bits of the inputs.
>
> For ASHIFT, you could refine this in the case of a constant shift count.

Ah, right.  Think it would be worth a comment.
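
As a possible starting point for such a comment, the carry effect can be sketched in standalone C. The helper names are hypothetical; floor_log2_u64 is a stand-in for GCC's floor_log2, and ashift_input_mask is only one reading of the constant-shift refinement mentioned above (result bit i comes from input bit i - count).

```c
#include <stdint.h>

/* Position of the highest set bit of X; X must be nonzero.
   Stand-in for GCC's floor_log2.  */
static int floor_log2_u64 (uint64_t x)
{
  int n = -1;
  while (x != 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* The adjustment from the patch: if any bit of the result is live, every
   input bit at or below the highest live bit can reach it via carries.  */
static uint64_t widen_mask_for_carry (uint64_t dst_mask)
{
  return dst_mask ? ((UINT64_C (2) << floor_log2_u64 (dst_mask)) - 1) : 0;
}

/* Assumed refinement for a left shift by a known constant COUNT: only
   the input bits that are shifted into live result bits matter.  */
static uint64_t ashift_input_mask (uint64_t dst_mask, unsigned count)
{
  return count < 64 ? dst_mask >> count : 0;
}
```

For example, with only bits 4..7 of a sum live, 0x0F + 0x01 yields 0x10 in those bits, but masking the inputs to 0xF0 first yields 0. That is why the mask for PLUS has to be widened to 0xFF.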

But I wonder whether we should centralise all this code-specific
information into a single place.  I.e. rather than having one switch to
say "PLUS is OK" or "AND is OK", and then having code-specific handling
elsewhere, we could enumerate how to handle a code.
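
A rough sketch of what such a table-driven arrangement could look like, in standalone C rather than against the real RTL codes. Everything here is hypothetical (the enum, the op_liveness structure and input_mask_for are made up for illustration), and the properties recorded are only the ones discussed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation codes; OP_DIV has no table entry on purpose.  */
enum op_code
{
  OP_AND, OP_IOR, OP_XOR, OP_PLUS, OP_MINUS, OP_MULT, OP_ASHIFT, OP_DIV,
  OP_COUNT
};

/* Per-code properties kept in one place.  */
struct op_liveness
{
  bool handled;  /* the pass knows how to narrow liveness for this code */
  bool carries;  /* low input bits can influence higher result bits */
};

static const struct op_liveness op_table[OP_COUNT] = {
  [OP_AND]    = { true, false },
  [OP_IOR]    = { true, false },
  [OP_XOR]    = { true, false },
  [OP_PLUS]   = { true, true },
  [OP_MINUS]  = { true, true },
  [OP_MULT]   = { true, true },
  [OP_ASHIFT] = { true, true },
};

/* Input bits that must be live given the live result bits DST_MASK.  */
static uint64_t input_mask_for (enum op_code code, uint64_t dst_mask)
{
  const struct op_liveness *p = &op_table[code];
  if (!p->handled)
    return UINT64_MAX;  /* conservative: everything is live */
  if (!p->carries)
    return dst_mask;
  /* Smear the highest live bit down to bit 0.  */
  uint64_t m = dst_mask;
  m |= m >> 1;
  m |= m >> 2;
  m |= m >> 4;
  m |= m >> 8;
  m |= m >> 16;
  m |= m >> 32;
  return m;
}
```

The point of the layout is that adding a code means adding one table row, instead of touching both the "is this code OK" switch and the code-specific handling elsewhere.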

Thanks,
Richard


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Richard Sandiford
[Sorry for the slow response]

Jeff Law  writes:
> On 11/20/23 11:26, Richard Sandiford wrote:
>> 
>>    scalar_int_mode outer_mode;
>>    if (!is_a <scalar_int_mode> (GET_MODE (x), &outer_mode)
>>        || GET_MODE_BITSIZE (outer_mode) > 64)
>>      continue;
> Wouldn't we also want to verify that the size is constant, or is it the 
> case that all the variable cases are vector (and would we want to 
> actually depend on that)?

Yeah, all the variable cases are vectors.  We don't support variable-length
scalars at the moment.  (And I hope that never changes. :))

>>> + /* We will handle the other operand of a binary operator
>>> +at the bottom of the loop by resetting Y.  */
>>> + if (BINARY_P (src))
>>> +   y = XEXP (src, 0);
>> 
>> What about UNARY_P, given that NOT is included in the codes above?
> We'll break that inner for(;;) then iterate into the subobject, marking 
> the relevant bits live.  FWIW, the control flow of this code continues 
> to be my biggest concern from a maintenance standpoint.  Figuring it out 
> was a major pain and I've tried to document what is and what is not 
> safe.  But it's still painful to walk through.
>
> I pondered if note_uses/note_stores would be better, but concluded we'd 
> just end up with a ton of state objects to carry around and reasoning 
> about that would be just as hard.

Feels like it would be good to handle the top-level structure explicitly
(PARALLELs, SETs, SET_SRCs, etc.), then fall back to iteration at the
point that we can no longer do better than "all registers in this expression
are fully live".

If we do that, rtx_properties might be an alternative to explicit
iteration.  The advantage of that is that it can handle destination
and sources as the top-level expression, and records whether each
register is itself a destination or source.

Thanks,
Richard


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Jeff Law




On 11/27/23 13:03, Richard Sandiford wrote:
> Joern Rennecke  writes:
>>  On 11/20/23 11:26, Richard Sandiford wrote:
>>>> +  /* ?!? What is the point of this adjustment to DST_MASK?  */
>>>> +  if (code == PLUS || code == MINUS
>>>> +  || code == MULT || code == ASHIFT)
>>>> + dst_mask
>>>> +  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
>>>
>>> Yeah, sympathise with the ?!? here :)
>> Jeff Law:
>>> Inherited.  Like the other bit of magic I think I'll do a test with them
>>> pulled out to see if I can make something undesirable trigger.
>>
>> This represents the carry effect.  Even if the destination only cares about
>> some high order bits, you have to consider all lower order bits of the inputs.
>>
>> For ASHIFT, you could refine this in the case of a constant shift count.
>
> Ah, right.  Think it would be worth a comment.

Definitely.  Wouldn't SIGN_EXTEND have a similar problem?  While we 
don't care about all the low bits, we do care about that MSB.
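
A small standalone C sketch of the SIGN_EXTEND concern, under the assumption that the live-bit tracking works on 64-bit masks as in the snippet quoted above. The function name is hypothetical and this is not code from the pass.

```c
#include <stdint.h>

/* Bits of an INNER_BITS-wide source that must be live when the bits in
   DST_MASK of its sign-extended value are live.  Low result bits map
   straight through; any live bit above the source width is a copy of
   the source's sign bit, so that bit becomes live as well.  */
static uint64_t sign_extend_input_mask (uint64_t dst_mask,
					unsigned inner_bits)
{
  if (inner_bits == 0 || inner_bits >= 64)
    return dst_mask;  /* nothing to extend in this model */

  uint64_t inner_mask = (UINT64_C (1) << inner_bits) - 1;
  uint64_t sign_bit = UINT64_C (1) << (inner_bits - 1);
  uint64_t live = dst_mask & inner_mask;
  if (dst_mask & ~inner_mask)
    live |= sign_bit;
  return live;
}
```

So for an 8-bit source, a destination that only cares about bits 8..15 still needs bit 7 of the source, and nothing below it.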





> But I wonder whether we should centralise all this code-specific
> information into a single place.  I.e. rather than having one switch to
> say "PLUS is OK" or "AND is OK", and then having code-specific handling
> elsewhere, we could enumerate how to handle a code.

Yea.  That's where I was starting to go with the code which indicates we 
can't necessarily narrow a shift count.  ie, what are the properties of 
the opcodes and how do they translate into the bits we need clear from 
LIVENOW (for sets) and the bits we need to make live (for uses).


Jeff


Re: [PATCH v3 00/11] : More warnings as errors by default

2023-11-27 Thread Sam James


Florian Weimer  writes:

> * Jeff Law:
>
>> On 11/20/23 02:55, Florian Weimer wrote:
>>> This revision addresses Marek's comment about handling
>>> -Wdeclaration-missing-parameter-type properly in conjunction with
>>> -fpermissive.  A new test (permerror-fpermissive-nowarning.c)
>>> demonstrates the expected behavior.  I added a test for -std=gnu89
>>> -fno-permissive, too.
>>>
>>> I'm including the precursor cleanup patches in this posting.  Hopefully
>>> this will make the aarch64 tester happy.
>>>
>>> Thanks,
>>> Florian
>>>
>>> Florian Weimer (11):
>>>    aarch64: Avoid -Wincompatible-pointer-types warning in Linux unwinder
>>>    aarch64: Call named function in gcc.target/aarch64/aapcs64/ice_1.c
>>>    gm2: Add missing declaration of m2pim_M2RTS_Terminate to test
>>>    Add tests for validating future C permerrors
>>>    c: Turn int-conversion warnings into permerrors
>>>    c: Turn -Wimplicit-function-declaration into a permerror
>>>    c: Turn -Wimplicit-int into a permerror
>>>    c: Do not ignore some forms of -Wimplicit-int in system headers
>>>    c: Turn -Wreturn-mismatch into a permerror
>>>    c: Turn -Wincompatible-pointer-types into a permerror
>>>    c: Add new -Wdeclaration-missing-parameter-type permerror
>
>> The series is fine by me.
>
> Thanks.
>
>> But give Marek additional time to chime in, particularly given the
>> holidays this week in the US.  Say through this time next week?
>
> [...]
>
> I'm also gathering some numbers regarding autoconf impact and potential
> silent miscompilation.

I'd actually forgotten about another element here: FreeBSD 14, which was
just released, now ships with Clang 16, so we seem to be getting some
activity from them, which is a help.

I've resumed our testing for configure diffs and am going to
focus on that for now. It's just laborious because of how many errors
are actually fine.
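
For anyone fixing up packages, a short C sketch of the kind of source changes involved may be useful. The legacy forms appear only in comments and are my own illustrative examples for the diagnostics named in the series, not cases taken from the patches or from the configure diffs. All identifiers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* -Wimplicit-function-declaration: calling helper () with no prior
   declaration.  Fix: declare (or include the header) before use.  */
static int helper (int value);

/* -Wimplicit-int: legacy `static counter;`.  Fix: spell out the type.  */
static int counter = 0;

/* -Wint-conversion: legacy `uintptr_t bits = p;` with P a pointer.
   Fix: make the conversion explicit.  */
static uintptr_t address_bits (const void *p)
{
  return (uintptr_t) p;
}

/* -Wreturn-mismatch: legacy plain `return;` in a function returning int.
   Fix: return a value.  */
static int bump (void)
{
  counter = helper (counter);
  return counter;
}

/* -Wincompatible-pointer-types: legacy call passing an `int *` here.
   Fix: pass a pointer to the expected type.  */
static long first_element (const long *values)
{
  return values[0];
}

static int helper (int value)
{
  return value + 1;
}
```

Configure probes tend to hit the first two cases most often, because a probe that fails to compile for these reasons looks the same as a genuinely missing feature.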


