Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-15 Thread HAO CHEN GUI
Hi Kewen,

On 2024/1/15 14:16, Kewen.Lin wrote:
> Considering it's stage 4 now and the impact of this patch, let's defer
> this to next stage 1, if possible could you organize the above changes
> into patches:
> 
> 1) Refactor expand_compare_loop by splitting into two functions without
>    any functional changes.
> 2) Remove some useless code like 2, 4, 5.
> 3) Some more enhancements like 1, 3, 6.
> 
> ?  It would be helpful for the review.  Thanks!

Thanks for your review comments. I will reorganize it in the next stage 1.


Re: HELP: Questions on unshare_expr

2024-01-15 Thread Eric Botcazou
> Okay, so, the "unsharing everything" is done automatically by the compiler
> before gimplification?

See the blurb at gimplify.cc:835 and below about this.

-- 
Eric Botcazou




[PATCH v4 0/3] RISC-V: Add intrinsics for Bitmanip and Scalar Crypto extensions

2024-01-15 Thread Liao Shihua
Update v3 -> v4:
  1. Typo fix.
  2. Only test *intrinsic-32 on rv32 and *intrinsic-64 on rv64.
  3. Update Copyright year to 2024.

Update v2 -> v3:
  1. Change pattern mode from X to GPR in orcb, clmul, and brev8.
  2. Add emulated testsuite.
  3. Remove duplicate testsuite between built-in and intrinsic.
  4. Typo fix.

Update v1 -> v2:
  1. Rename *_intrinsic-* to *_intrinsic-XLEN.
  2. Typo fix.
  3. Intrinsics with immediate arguments use macros at -O0.
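Item 3 refers to the pattern visible in riscv_crypto.h (patch 2/3): when __OPTIMIZE__ is defined each intrinsic is an always_inline wrapper, and at -O0 it is a macro instead. Presumably this is because a builtin whose operand must be an immediate (such as the bs byte-select operand of aes32dsi) would otherwise see an unpropagated function parameter at -O0. Below is a minimal portable sketch of the same dual-definition pattern; demo_select_byte and demo_select_byte_impl are hypothetical stand-ins for an intrinsic and its builtin, not GCC APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a builtin whose second operand must be an
   immediate in [0, 3], like the byte-select operand of aes32dsi.  */
static inline uint32_t
demo_select_byte_impl (uint32_t x, int bs)
{
  assert (bs >= 0 && bs <= 3);
  return (x >> (8 * bs)) & 0xffu;
}

#ifdef __OPTIMIZE__
/* Optimized builds: an inline wrapper.  After inlining, the constant
   argument is propagated into the body.  */
static inline uint32_t
demo_select_byte (uint32_t x, const int bs)
{
  return demo_select_byte_impl (x, bs);
}
#else
/* -O0: a macro, so bs reaches the callee as the literal the caller wrote.
   The negative-array-size trick rejects out-of-range constants at
   compile time.  */
#define demo_select_byte(x, bs)                                  \
  ((void) sizeof (char[((bs) >= 0 && (bs) <= 3) ? 1 : -1]),      \
   demo_select_byte_impl ((x), (bs)))
#endif
```

Callers write demo_select_byte (w, 3) the same way at every optimization level; only the macro form guarantees that the literal is still a constant expression at the point where it is consumed.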

This small patch series just provides a mapping from the RISC-V intrinsics
to the builtin names within GCC.

Liao Shihua (3):
  RISC-V: Remove the Scalar Bitmanip and Crypto Built-In function testsuites
  RISC-V: Add C intrinsic for Scalar Crypto Extension
  RISC-V: Add C intrinsic for Scalar Bitmanip Extension

 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/bitmanip.md                  |  10 +-
 gcc/config/riscv/crypto.md                    |   4 +-
 gcc/config/riscv/riscv-builtins.cc            |  22 ++
 gcc/config/riscv/riscv-cmo.def                |  12 +-
 gcc/config/riscv/riscv-ftypes.def             |   2 +
 gcc/config/riscv/riscv-scalar-crypto.def      |  22 +-
 gcc/config/riscv/riscv_bitmanip.h             | 297 +
 gcc/config/riscv/riscv_crypto.h               | 309 ++
 .../riscv/scalar_bitmanip_intrinsic-32.c      |  97 ++
 .../scalar_bitmanip_intrinsic-64-emulated.c   |  33 ++
 .../riscv/scalar_bitmanip_intrinsic-64.c      | 115 +++
 .../riscv/scalar_crypto_intrinsic-32.c        | 115 +++
 .../riscv/scalar_crypto_intrinsic-64.c        | 123 +++
 .../gcc.target/riscv/zbb_32_bswap-1.c         |  11 -
 gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c  |  11 -
 gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c  |  12 -
 .../riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} |   3 +-
 gcc/testsuite/gcc.target/riscv/zbbw.c         |  26 --
 gcc/testsuite/gcc.target/riscv/zbc32.c        |  23 --
 gcc/testsuite/gcc.target/riscv/zbc64.c        |  23 --
 gcc/testsuite/gcc.target/riscv/zbkb32.c       |  18 -
 gcc/testsuite/gcc.target/riscv/zbkb64.c       |   5 -
 gcc/testsuite/gcc.target/riscv/zbkc32.c       |  17 -
 gcc/testsuite/gcc.target/riscv/zbkc64.c       |  17 -
 gcc/testsuite/gcc.target/riscv/zbkx32.c       |  18 -
 gcc/testsuite/gcc.target/riscv/zbkx64.c       |  18 -
 gcc/testsuite/gcc.target/riscv/zknd32-2.c     |  28 --
 gcc/testsuite/gcc.target/riscv/zknd64-2.c     |  42 ---
 gcc/testsuite/gcc.target/riscv/zkne32-2.c     |  28 --
 gcc/testsuite/gcc.target/riscv/zkne64-2.c     |  34 --
 .../gcc.target/riscv/zknh-sha256-32.c         |  10 -
 .../gcc.target/riscv/zknh-sha256-64.c         |  28 --
 .../gcc.target/riscv/zknh-sha512-32.c         |  42 ---
 .../gcc.target/riscv/zknh-sha512-64.c         |  31 --
 gcc/testsuite/gcc.target/riscv/zksed32-2.c    |  29 --
 gcc/testsuite/gcc.target/riscv/zksed64-2.c    |  29 --
 gcc/testsuite/gcc.target/riscv/zksh32.c       |  19 --
 gcc/testsuite/gcc.target/riscv/zksh64.c       |  19 --
 39 files changed, 1149 insertions(+), 555 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_bitmanip.h
 create mode 100644 gcc/config/riscv/riscv_crypto.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c
 rename gcc/testsuite/gcc.target/riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} (59%)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbbw.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknd32-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknd64-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zkne32-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zkne64-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha256-32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha256-64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha512-32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha512-64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksed32-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksed64-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksh32.c
 delete 

[PATCH v4 1/3] RISC-V: Remove the Scalar Bitmanip and Crypto Built-In function testsuites

2024-01-15 Thread Liao Shihua
This patch series provides a mapping from the RISC-V intrinsics to the builtin
names.  Some tests are duplicated between the intrinsic and built-in function
testsuites, so remove the Scalar Bitmanip and Scalar Crypto built-in function
tests whose coverage is now included in the intrinsic tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb_32_bswap-2.c: Rename to zbb_bswap16.c and only 
test __builtin_bswap16.
* gcc.target/riscv/zbkb32.c: Remove __builtin_riscv_(un)zip, __builtin_riscv_brev8.
* gcc.target/riscv/zbkb64.c: Remove __builtin_riscv_brev8.
* gcc.target/riscv/zbb_32_bswap-1.c: Removed.
* gcc.target/riscv/zbb_bswap-1.c: Removed.
* gcc.target/riscv/zbb_bswap-2.c: Removed.
* gcc.target/riscv/zbbw.c: Removed.
* gcc.target/riscv/zbc32.c: Removed.
* gcc.target/riscv/zbc64.c: Removed.
* gcc.target/riscv/zbkc32.c: Removed.
* gcc.target/riscv/zbkc64.c: Removed.
* gcc.target/riscv/zbkx32.c: Removed.
* gcc.target/riscv/zbkx64.c: Removed.
* gcc.target/riscv/zknd32-2.c: Removed.
* gcc.target/riscv/zknd64-2.c: Removed.
* gcc.target/riscv/zkne32-2.c: Removed.
* gcc.target/riscv/zkne64-2.c: Removed.
* gcc.target/riscv/zknh-sha256-32.c: Removed.
* gcc.target/riscv/zknh-sha256-64.c: Removed.
* gcc.target/riscv/zknh-sha512-32.c: Removed.
* gcc.target/riscv/zknh-sha512-64.c: Removed.
* gcc.target/riscv/zksed32-2.c: Removed.
* gcc.target/riscv/zksed64-2.c: Removed.
* gcc.target/riscv/zksh32.c: Removed.
* gcc.target/riscv/zksh64.c: Removed.

---
 .../gcc.target/riscv/zbb_32_bswap-1.c         | 11 -
 gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c  | 11 -
 gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c  | 12 --
 .../riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} |  3 +-
 gcc/testsuite/gcc.target/riscv/zbbw.c         | 26
 gcc/testsuite/gcc.target/riscv/zbc32.c        | 23 --
 gcc/testsuite/gcc.target/riscv/zbc64.c        | 23 --
 gcc/testsuite/gcc.target/riscv/zbkb32.c       | 18
 gcc/testsuite/gcc.target/riscv/zbkb64.c       |  5 ---
 gcc/testsuite/gcc.target/riscv/zbkc32.c       | 17
 gcc/testsuite/gcc.target/riscv/zbkc64.c       | 17
 gcc/testsuite/gcc.target/riscv/zbkx32.c       | 18
 gcc/testsuite/gcc.target/riscv/zbkx64.c       | 18
 gcc/testsuite/gcc.target/riscv/zknd32-2.c     | 28 -
 gcc/testsuite/gcc.target/riscv/zknd64-2.c     | 42 ---
 gcc/testsuite/gcc.target/riscv/zkne32-2.c     | 28 -
 gcc/testsuite/gcc.target/riscv/zkne64-2.c     | 34 ---
 .../gcc.target/riscv/zknh-sha256-32.c         | 10 -
 .../gcc.target/riscv/zknh-sha256-64.c         | 28 -
 .../gcc.target/riscv/zknh-sha512-32.c         | 42 ---
 .../gcc.target/riscv/zknh-sha512-64.c         | 31 --
 gcc/testsuite/gcc.target/riscv/zksed32-2.c    | 29 -
 gcc/testsuite/gcc.target/riscv/zksed64-2.c    | 29 -
 gcc/testsuite/gcc.target/riscv/zksh32.c       | 19 -
 gcc/testsuite/gcc.target/riscv/zksh64.c       | 19 -
 25 files changed, 2 insertions(+), 539 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c
 rename gcc/testsuite/gcc.target/riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} (59%)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbbw.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknd32-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknd64-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zkne32-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zkne64-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha256-32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha256-64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha512-32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zknh-sha512-64.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksed32-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksed64-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksh32.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/zksh64.c

diff --git a/gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c b/gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c
deleted file mode 100644
index 789dda17f05..000
--- a/gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c
+++ /dev/null
@@ -1,11 +0,0 @@
-/* { d

[PATCH] lower-bitint: Fix up handling of INTEGER_CSTs in handle_operand in right shifts or comparisons [PR113370]

2024-01-15 Thread Jakub Jelinek
Hi!

The INTEGER_CST code uses the remainder bits to decide whether to use the
whole constant or just part of it and extend it at runtime, and furthermore
uses them to avoid emitting all bits even when using the (almost) whole
constant.
The problem is that the prec % (2 * limb_prec) computation it uses is
appropriate only for the normal lowering of mergeable operations, where
we process 2 limbs at a time in a loop starting with the least significant
limbs and process the remaining 0-2 limbs after the loop (there with
constant indexes).  In that case it is fine not to emit the upper
prec % (2 * limb_prec) bits into the constant, because those bits will be
extracted using an INTEGER_CST idx and so will be used directly in the
statements as INTEGER_CSTs.
In the other cases, where we either process just a single limb in a loop,
process it downwards (e.g. non-equality comparisons) or with some runtime
addends (some shifts), there is either at most one limb lowered with an
INTEGER_CST idx, after the loop (e.g. for right shifts) or before the loop
(e.g. non-equality comparisons), or all limbs are processed with
non-INTEGER_CST indexes (e.g. for left shifts, when m_var_msb is set).
Now, the m_var_msb case is already handled through
  if (m_var_msb)
    type = TREE_TYPE (op);
  else
    /* If we have a guarantee the most significant partial limb
       (if any) will be only accessed through handle_operand
       with INTEGER_CST idx, we don't need to include the partial
       limb in .rodata.  */
    type = build_bitint_type (prec - rem, 1);
but for right shifts or comparisons, prec - rem with rem computed as
prec % (2 * limb_prec) was incorrect, so the following patch fixes it
to use the 2-limb remainder only if m_upwards_2limb and the 1-limb
remainder otherwise.
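As a worked illustration of the change, the sketch below mirrors the patched rem computation in a hypothetical helper, rodata_bits (not a GCC function), which returns how many low-order bits of the constant are kept when the partial top limb(s) are left out of .rodata.  It assumes 64-bit limbs, as on x86_64.

```c
#include <assert.h>

/* Hypothetical helper mirroring the patched computation in
   bitint_large_huge::handle_operand: the number of low-order bits of a
   _BitInt(prec) INTEGER_CST emitted into .rodata (prec - rem).  */
static unsigned
rodata_bits (unsigned prec, unsigned limb_prec, int upwards_2limb)
{
  assert (limb_prec > 0);
  /* Before the fix the divisor was always 2 * limb_prec.  */
  unsigned rem = prec % ((upwards_2limb ? 2 : 1) * limb_prec);
  return prec - rem;
}
```

For the _BitInt(255) right shift in bitint-48.c, rodata_bits (255, 64, 1) is 128 while rodata_bits (255, 64, 0) is 192.  By the reasoning above, only the top partial limb (bits 192 to 254) is accessed with an INTEGER_CST index in that case, so the limb covering bits 128 to 191 must still be present in the emitted constant, which the 2-limb remainder dropped.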

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-15  Jakub Jelinek  

PR tree-optimization/113370
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand): Only
set rem to prec % (2 * limb_prec) if m_upwards_2limb, otherwise
set it to just prec % limb_prec.

* gcc.dg/torture/bitint-48.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-13 11:29:08.005574338 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-13 11:33:55.284646856 +0100
@@ -869,7 +869,7 @@ bitint_large_huge::handle_operand (tree
  && m_data[m_data_cnt + 1] == NULL_TREE))
{
  unsigned int prec = TYPE_PRECISION (TREE_TYPE (op));
- unsigned int rem = prec % (2 * limb_prec);
+ unsigned int rem = prec % ((m_upwards_2limb ? 2 : 1) * limb_prec);
  int ext;
  unsigned min_prec = bitint_min_cst_precision (op, ext);
  if (m_first)
@@ -996,7 +996,7 @@ bitint_large_huge::handle_operand (tree
   if (m_data[m_data_cnt + 1] == integer_type_node)
{
  unsigned int prec = TYPE_PRECISION (TREE_TYPE (op));
- unsigned rem = prec % (2 * limb_prec);
+ unsigned rem = prec % ((m_upwards_2limb ? 2 : 1) * limb_prec);
  int ext = wi::neg_p (wi::to_wide (op)) ? -1 : 0;
  tree c = m_data[m_data_cnt];
  unsigned min_prec = TYPE_PRECISION (TREE_TYPE (c));
--- gcc/testsuite/gcc.dg/torture/bitint-48.c.jj 2024-01-13 11:46:30.707309245 +0100
+++ gcc/testsuite/gcc.dg/torture/bitint-48.c    2024-01-13 11:46:25.493380637 +0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/113370 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 255
+_BitInt(255)
+foo (int s)
+{
+  return -(_BitInt(255)) 3 >> s;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 255
+  if (foo (51) != -1)
+    __builtin_abort ();
+#endif
+  return 0;
+}

Jakub



Re: [PATCH] lower-bitint: Fix up handle_operand_addr INTEGER_CST handling [PR113361]

2024-01-15 Thread Richard Biener
On Sat, 13 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> As the testcase shows, the INTEGER_CST handling in handle_operand_addr
> (i.e. what is used when passing address of an integer to a bitint library
> routine) wasn't correct.  If the minimum precision to represent an
> INTEGER_CST is smaller or equal to limb_prec, the code correctly uses
> m_limb_type; if the minimum precision of a _BitInt INTEGER_CST is large
> enough such that the bitint is middle, large or huge, everything is fine
> too.  But the code wasn't handling correctly e.g. __int128 constants which
> need more than limb_prec bits or _BitInt constants which on the architecture
> are considered small (say have DImode limb_mode, TImode abi_limb_mode and
> for [65, 128] bits use TImode scalar like the proposed aarch64 patch).
> Best would be to use an array of 2/3/4 limbs in that case, but we'd need to
> convert the INTEGER_CST to a CONSTRUCTOR in the right endianity etc.,
> so the code was using mid_min_prec to enforce a middle _BitInt precision.
> Except that mid_min_prec can be 0 and not computed yet, or it doesn't have
> to be the smallest middle _BitInt precision, just the smallest so far
> encountered.  So, on the testcase one possibility was that it used precision
> 65 from mid_min_prec, even when the INTEGER_CST actually needed larger
> minimum precision (96 bits at least), or crashed when mid_min_prec was 0.
> 
> The patch fixes it in 2 hunks, the first makes sure we actually try to
> create a BITINT_TYPE for the > limb_prec cases like __int128, and the second
> instead of using mid_min_prec attempts to increase mp precision until it
> isn't small anymore.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-01-13  Jakub Jelinek  
> 
>   PR tree-optimization/113361
>   * gimple-lower-bitint.cc (bitint_large_huge::handle_operand_addr):
>   Fix up determination of the type for > limb_prec constants.
> 
>   * gcc.dg/torture/bitint-47.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-12 11:23:12.0 +0100
> +++ gcc/gimple-lower-bitint.cc  2024-01-13 00:18:19.255889866 +0100
> @@ -2227,7 +2227,9 @@ bitint_large_huge::handle_operand_addr (
>mp = CEIL (min_prec, limb_prec) * limb_prec;
>if (mp == 0)
>   mp = 1;
> -  if (mp >= (unsigned) TYPE_PRECISION (TREE_TYPE (op)))
> +  if (mp >= (unsigned) TYPE_PRECISION (TREE_TYPE (op))
> +   && (TREE_CODE (TREE_TYPE (op)) == BITINT_TYPE
> +   || TYPE_PRECISION (TREE_TYPE (op)) <= limb_prec))
>   type = TREE_TYPE (op);
>else
>   type = build_bitint_type (mp, 1);
> @@ -2237,11 +2239,15 @@ bitint_large_huge::handle_operand_addr (
> if (TYPE_PRECISION (type) <= limb_prec)
>   type = m_limb_type;
> else
> - /* This case is for targets which e.g. have 64-bit
> -limb but categorize up to 128-bits _BitInts as
> -small.  We could use type of m_limb_type[2] and
> -similar instead to save space.  */
> - type = build_bitint_type (mid_min_prec, 1);
> + {
> +   while (bitint_precision_kind (mp) == bitint_prec_small)
> + mp += limb_prec;
> +   /* This case is for targets which e.g. have 64-bit
> +  limb but categorize up to 128-bits _BitInts as
> +  small.  We could use type of m_limb_type[2] and
> +  similar instead to save space.  */
> +   type = build_bitint_type (mp, 1);
> + }
>   }
>if (prec_stored)
>   {
> --- gcc/testsuite/gcc.dg/torture/bitint-47.c.jj   2024-01-13 00:23:40.627562314 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-47.c  2024-01-13 00:25:35.571025508 +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/113361 */
> +/* { dg-do run { target { bitint && int128 } } } */
> +/* { dg-options "-std=gnu23" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 129
> +int
> +foo (_BitInt(65) x)
> +{
> +  return __builtin_mul_overflow_p ((__int128) 0x << 64, x, (_BitInt(129)) 0);
> +}
> +
> +int
> +bar (_BitInt(63) x)
> +{
> +  return __builtin_mul_overflow_p ((__int128) 0x << 64, x, (_BitInt(129)) 0);
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 129
> +  if (!foo (5167856845))
> +    __builtin_abort ();
> +  if (!bar (5167856845))
> +    __builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
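The second hunk of the PR113361 patch quoted above can be illustrated with a small model of the precision chosen for an address-taken constant that needs more than one limb.  The classifier demo_is_small is hypothetical: it assumes a target with 64-bit limbs that treats _BitInts of up to 128 bits as small, like the proposed aarch64 configuration mentioned in the patch.  The sketch ignores the first hunk, where the operand's own type may be reused.

```c
#include <assert.h>

/* Hypothetical classifier: 64-bit limbs, _BitInts up to 128 bits are
   "small" (the aarch64-like case described in the patch).  */
static int
demo_is_small (unsigned prec)
{
  return prec <= 128;
}

#define DEMO_CEIL(x, y) (((x) + (y) - 1) / (y))

/* Sketch of the patched computation: round the constant's minimum
   precision up to whole limbs, then keep adding limbs while the
   precision would still be classified as small.  */
static unsigned
demo_addr_cst_prec (unsigned min_prec, unsigned limb_prec)
{
  assert (limb_prec > 0);
  unsigned mp = DEMO_CEIL (min_prec, limb_prec) * limb_prec;
  if (mp == 0)
    mp = 1;
  if (mp <= limb_prec)
    return limb_prec;  /* a single limb (m_limb_type) suffices */
  while (demo_is_small (mp))
    mp += limb_prec;
  return mp;
}
```

For the 96-bit minimum precision mentioned in the description, demo_addr_cst_prec (96, 64) first rounds up to 128, which is still small under this assumption, and then settles on 192, instead of depending on whatever mid_min_prec happened to hold.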


Re: [PATCH] lower-bitint: Fix up handling of INTEGER_CSTs in handle_operand in right shifts or comparisons [PR113370]

2024-01-15 Thread Richard Biener
On Mon, 15 Jan 2024, Jakub Jelinek wrote:


OK.


-- 
Richard Biener 


[PATCH v4 2/3] RISC-V: Add C intrinsic for Scalar Crypto Extension

2024-01-15 Thread Liao Shihua
This patch adds C intrinsics for the Scalar Crypto extension.

gcc/ChangeLog:

* config.gcc: Include riscv_crypto.h.
* config/riscv/riscv_crypto.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/scalar_crypto_intrinsic-32.c: New test.
* gcc.target/riscv/scalar_crypto_intrinsic-64.c: New test.
---
 gcc/config.gcc                         |   2 +-
 gcc/config/riscv/riscv_crypto.h        | 309 ++
 .../riscv/scalar_crypto_intrinsic-32.c | 115 +++
 .../riscv/scalar_crypto_intrinsic-64.c | 123 +++
 4 files changed, 548 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv_crypto.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-64.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d17787bc9ad..11c3a647b5e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_crypto.h"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/riscv_crypto.h b/gcc/config/riscv/riscv_crypto.h
new file mode 100644
index 000..1bfe3d7c675
--- /dev/null
+++ b/gcc/config/riscv/riscv_crypto.h
@@ -0,0 +1,309 @@
+/* RISC-V 'Scalar Crypto' Extension intrinsics include file.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef __RISCV_SCALAR_CRYPTO_H
+#define __RISCV_SCALAR_CRYPTO_H
+
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if defined (__riscv_zknd)
+
+#if __riscv_xlen == 32
+
+#ifdef __OPTIMIZE__
+
+extern __inline uint32_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes32dsi (uint32_t rs1, uint32_t rs2, const int bs)
+{
+  return __builtin_riscv_aes32dsi (rs1, rs2, bs);
+}
+
+extern __inline uint32_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes32dsmi (uint32_t rs1, uint32_t rs2, const int bs)
+{
+  return __builtin_riscv_aes32dsmi (rs1, rs2, bs);
+}
+
+#else
+#define __riscv_aes32dsi(x, y, bs) __builtin_riscv_aes32dsi (x, y, bs)
+#define __riscv_aes32dsmi(x, y, bs) __builtin_riscv_aes32dsmi (x, y, bs)
+#endif
+
+#endif
+
+#if __riscv_xlen == 64
+
+extern __inline uint64_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes64ds (uint64_t rs1, uint64_t rs2)
+{
+  return __builtin_riscv_aes64ds (rs1, rs2);
+}
+
+extern __inline uint64_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes64dsm (uint64_t rs1, uint64_t rs2)
+{
+  return __builtin_riscv_aes64dsm (rs1, rs2);
+}
+
+extern __inline uint64_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes64im (uint64_t rs1)
+{
+  return __builtin_riscv_aes64im (rs1);
+}
+#endif
+#endif // __riscv_zknd
+
+#if (defined (__riscv_zknd) || defined (__riscv_zkne)) && (__riscv_xlen == 64)
+
+#ifdef __OPTIMIZE__
+
+extern __inline uint64_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes64ks1i (uint64_t rs1, const int rnum)
+{
+  return __builtin_riscv_aes64ks1i (rs1, rnum);
+}
+
+#else
+#define __riscv_aes64ks1i(x, rnum) __builtin_riscv_aes64ks1i (x, rnum)
+#endif
+
+extern __inline uint64_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__riscv_aes64ks2 (uint64_t rs1, uint64_t rs2)
+{
+  return __builtin_riscv_aes64ks2 (rs1, rs2);
+}
+
+#endif // __riscv_zknd || __riscv_zkne
+
+#if defined (__riscv_zkne)
+
+#if __riscv_xlen == 32
+
+#ifdef __OPTIMIZE__
+
+extern __inline uint32_t
+__attribute__ ((__gnu_inline__, __alway

[PATCH v4 3/3] RISC-V: Add C intrinsic for Scalar Bitmanip Extension

2024-01-15 Thread Liao Shihua
This patch adds C intrinsics for the Scalar Bitmanip extension.
RISCV_BUILTIN_NO_PREFIX is a new riscv_builtin_description macro like
RISCV_BUILTIN, but it uses CODE_FOR_##INSN rather than CODE_FOR_riscv_##INSN.
The mode of the orcb, clmul and brev8 patterns is changed from X to GPR
because orcbsi, clmul_si and brev8_si are available on both rv32 and rv64.
They are tested in scalar_bitmanip_intrinsic-64-emulated.c.
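For reference, the semantics of the three instructions whose patterns change from X to GPR can be modeled in portable C for a 32-bit operand (the SImode forms).  These are sketches written from the RISC-V Bitmanip and Scalar Crypto descriptions of orc.b, brev8 and clmul, not code from the patch; the ref_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* orc.b: each result byte is 0xff if the source byte is nonzero,
   otherwise 0x00.  */
static uint32_t
ref_orcb32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    if ((x >> (8 * i)) & 0xffu)
      r |= 0xffu << (8 * i);
  return r;
}

/* brev8: reverse the order of the bits within each byte.  */
static uint32_t
ref_brev8_32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if ((x >> i) & 1u)
      r |= 1u << ((i & ~7) + (7 - (i & 7)));
  return r;
}

/* clmul: low 32 bits of the carry-less (XOR) product of a and b.  */
static uint32_t
ref_clmul32 (uint32_t a, uint32_t b)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if ((b >> i) & 1u)
      r ^= a << i;
  return r;
}
```

Models like these can serve as a host-side oracle when checking what an emulated rv32 operation on rv64 is expected to produce.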

gcc/ChangeLog:

* config.gcc: Include riscv_bitmanip.h.
* config/riscv/bitmanip.md: Change mode from X to GPR in orcb and
clmul patterns.
* config/riscv/crypto.md: Change mode from X to GPR in brev8 pattern.
* config/riscv/riscv-builtins.cc (AVAIL): New AVAIL.
(RISCV_BUILTIN_NO_PREFIX): New riscv_builtin_description.
* config/riscv/riscv-cmo.def (RISCV_BUILTIN): New builtins.
* config/riscv/riscv-ftypes.def (2): New ftypes.
* config/riscv/riscv-scalar-crypto.def (RISCV_BUILTIN): New builtins.
(RISCV_BUILTIN_NO_PREFIX): Ditto.
* config/riscv/riscv_bitmanip.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/scalar_bitmanip_intrinsic-32.c: New test.
* gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c: New test.
* gcc.target/riscv/scalar_bitmanip_intrinsic-64.c: New test.

---
 gcc/config.gcc                              |   2 +-
 gcc/config/riscv/bitmanip.md                |  10 +-
 gcc/config/riscv/crypto.md                  |   4 +-
 gcc/config/riscv/riscv-builtins.cc          |  22 ++
 gcc/config/riscv/riscv-cmo.def              |  12 +-
 gcc/config/riscv/riscv-ftypes.def           |   2 +
 gcc/config/riscv/riscv-scalar-crypto.def    |  22 +-
 gcc/config/riscv/riscv_bitmanip.h           | 297 ++
 .../riscv/scalar_bitmanip_intrinsic-32.c    |  97 ++
 .../scalar_bitmanip_intrinsic-64-emulated.c |  33 ++
 .../riscv/scalar_bitmanip_intrinsic-64.c    | 115 +++
 11 files changed, 600 insertions(+), 16 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_bitmanip.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 11c3a647b5e..00355509c92 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h riscv_crypto.h"
+   extra_headers="riscv_vector.h riscv_crypto.h riscv_bitmanip.h"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index fdab0017a2b..ccda25c01c1 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -443,8 +443,8 @@
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
 ;; for optimized string functions (such as strcmp).
 (define_insn "orcb2"
-  [(set (match_operand:X 0 "register_operand" "=r")
-   (unspec:X [(match_operand:X 1 "register_operand" "r")] UNSPEC_ORC_B))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (unspec:GPR [(match_operand:GPR 1 "register_operand" "r")] UNSPEC_ORC_B))]
   "TARGET_ZBB"
   "orc.b\t%0,%1"
   [(set_attr "type" "bitmanip")])
@@ -852,9 +852,9 @@
 
 ;; ZBKC or ZBC extension
 (define_insn "riscv_clmul_"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
+  (match_operand:GPR 2 "register_operand" "r")]
   UNSPEC_CLMUL))]
   "TARGET_ZBKC || TARGET_ZBC"
   "clmul\t%0,%1,%2"
diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index bf613fca056..dd2bc94ee88 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -72,8 +72,8 @@
 
 ;; ZBKB extension
 (define_insn "riscv_brev8_"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(unspec:GPR [(match_operand:GPR 1 "register_operand" "r")]
   UNSPEC_BREV8))]
   "TARGET_ZBKB"
   "brev8\t%0,%1"
diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index e85169374eb..1932ff069c6 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -105,6 +105,7 @@ AVAIL (zero32,  TARG

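For reference, here is a minimal C sketch of what the three instructions whose patterns change from X to GPR mode compute, written for 32-bit values. The `ref_*` functions are hypothetical reference models, not the intrinsics added by this series. As I read the RISC-V mode iterators, GPR covers SImode (plus DImode on RV64) while X is only the XLEN mode, which is presumably what lets 32-bit variants be expanded on RV64.

```c
#include <assert.h>
#include <stdint.h>

/* orc.b: each result byte is 0xff if the source byte is non-zero,
   otherwise 0x00.  */
static uint32_t
ref_orc_b32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint32_t) 0xff << (8 * i);
  return r;
}

/* brev8: reverse the bit order inside each byte; byte order is kept.  */
static uint32_t
ref_brev8_32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint32_t b = (x >> (8 * i)) & 0xff;
      uint32_t rev = 0;
      for (int bit = 0; bit < 8; bit++)
        if (b & (1u << bit))
          rev |= 1u << (7 - bit);
      r |= rev << (8 * i);
    }
  return r;
}

/* clmul: carry-less multiply, keeping the low 32 bits of the product.  */
static uint32_t
ref_clmul32 (uint32_t a, uint32_t b)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* A few hand-checked values.  */
static void
selftest_ref_bitmanip (void)
{
  assert (ref_orc_b32 (0x00120034u) == 0x00ff00ffu);
  assert (ref_brev8_32 (0x00000102u) == 0x00008040u);
  assert (ref_clmul32 (5, 3) == 15);   /* 0b101 ^ 0b1010 */
}
```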
Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
Hello All:

The following performance gains were measured on SPEC2017 FP benchmarks:

554.roms_r 16% gain
544.nab_r  9.98% gain
521.wrf_r  6.89% gain

Thanks & Regards
Ajit


On 14/01/24 8:55 pm, Ajit Agarwal wrote:
> Hello All:
> 
> This patch adds the vecload pass, which replaces adjacent lxv memory
> accesses with lxvp instructions.  The pass runs before the IRA pass.
> 
> The vecload pass removes one of the two adjacent lxv loads and replaces
> the pair with an lxvp.  Because one of the defining loads is removed, its
> allocno has only uses but no defs.
> 
> As a result, IRA does not assign the pair sequential registers.  Changes
> are made in the IRA register allocator to assign sequential registers to
> adjacent loads.
> 
> Some registers were cleared and not marked as profitable because a zero
> cost compared greater than the negative costs; checks are added to compare
> positive costs.
> 
> LRA is changed not to reassign these registers to different ones, keeping
> the sequential register pairs intact.
> 
> 
> contrib/check_GNU_style.sh run on patch looks good.
> 
> Bootstrapped and regtested for powerpc64-linux-gnu.
> 
> Spec2017 benchmarks are run and I get impressive benefits for some of the FP
> benchmarks.
> 
> Thanks & Regards
> Ajit
> 
> 
> rs6000: New pass for replacement of adjacent lxv with lxvp.
> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> This pass is registered before ira rtl pass.
> 
> 2024-01-14  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: Registered vecload pass.
>   * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
>   * config.gcc: Add new executable.
>   * config/rs6000/rs6000-protos.h: Add new prototype for vecload
>   pass.
>   * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * ira-color.cc: Form register pair with adjacent loads.
>   * lra-assigns.cc: Skip modifying register pair assignment.
>   * lra-int.h: Add pseudo_conflict field in lra_reg_p structure.
>   * lra.cc: Initialize pseudo_conflict field.
>   * ira-build.cc: Use of REG_FREQ.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/vecload.C: New test.
>   * g++.target/powerpc/vecload1.C: New test.
>   * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
>  gcc/config.gcc|   4 +-
>  gcc/config/rs6000/rs6000-passes.def   |   4 +
>  gcc/config/rs6000/rs6000-protos.h |   5 +-
>  gcc/config/rs6000/rs6000-vecload-opt.cc   | 432 ++
>  gcc/config/rs6000/rs6000.cc   |   8 +-
>  gcc/config/rs6000/t-rs6000|   5 +
>  gcc/ira-color.cc  | 220 -
>  gcc/lra-assigns.cc| 118 -
>  gcc/lra-int.h |   2 +
>  gcc/lra.cc|   1 +
>  gcc/testsuite/g++.target/powerpc/vecload.C|  15 +
>  gcc/testsuite/g++.target/powerpc/vecload1.C   |  22 +
>  .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
>  13 files changed, 816 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
>   ;;
>  rs6000*-*-*)
> extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
> extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..8bd172dd779 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -29,6 +29,10 @@ along with GCC; see the file COPYING3.  If not see
>   for loads and stores.  */
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
>  
> +  /* Pass to replace adjacen

[PATCH] Mark ASM_OUTPUT_FUNCTION_LABEL ()'s DECL argument as used

2024-01-15 Thread Ilya Leoshkevich
Compile tested for the ia64-elf target; bootstrap and regtest running
on x86_64-redhat-linux.  Ok for trunk when successful?



The ia64-elf build fails with the following warning (promoted to an error by -Werror):

[all 2024-01-12 16:32:34] ../../gcc/gcc/config/ia64/ia64.cc:3889:59: error: unused parameter 'decl' [-Werror=unused-parameter]
[all 2024-01-12 16:32:34]  3889 | ia64_start_function (FILE *file, const char *fnname, tree decl)

decl is passed to ASM_OUTPUT_FUNCTION_LABEL (), whose default
implementation does not use it.  Mark it as used in order to avoid the
warning.
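The idiom can be shown in isolation with a small, self-contained C sketch. `emit_label_raw` and `OUTPUT_FUNCTION_LABEL` are hypothetical stand-ins for `assemble_function_label_raw` and the default `ASM_OUTPUT_FUNCTION_LABEL`; the point is only that the `(void) (DECL)` cast counts as a use of the parameter, so building with `-Wall -Wextra -Werror` no longer fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for assemble_function_label_raw (): records the
   label instead of writing assembly, so the result can be checked.  */
static const char *last_label;

static void
emit_label_raw (FILE *file, const char *name)
{
  (void) file;
  last_label = name;
}

/* Same shape as the patched default: DECL is evaluated once and discarded,
   which counts as a use, and do { } while (0) keeps the expansion a single
   statement (safe after an unbraced if).  */
#define OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
  do                                            \
    {                                           \
      (void) (DECL);                            \
      emit_label_raw ((FILE), (NAME));          \
    }                                           \
  while (0)

/* Shaped like ia64_start_function (): DECL is only passed to the macro.
   Without the (void) cast, -Wall -Wextra -Werror rejects this function.  */
static void
start_function (FILE *file, const char *fnname, const void *decl)
{
  OUTPUT_FUNCTION_LABEL (file, fnname, decl);
}

static void
selftest_function_label (void)
{
  start_function (stdout, "foo", NULL);
  assert (strcmp (last_label, "foo") == 0);
}
```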

Reported-by: Jan-Benedict Glaw 
Suggested-by: Jan-Benedict Glaw 
Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Signed-off-by: Ilya Leoshkevich 

gcc/ChangeLog:

* defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Mark DECL as used.
---
 gcc/defaults.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 92f3e07f742..1a2ea68a543 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -149,8 +149,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not see
NAME, such as the label on a function.  */
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
-#define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  assemble_function_label_raw ((FILE), (NAME))
+#define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL)\
+  do { \
+(void) (DECL); \
+assemble_function_label_raw ((FILE), (NAME));  \
+  } while (0)
 #endif
 
 /* Output the definition of a compiler-generated label named NAME.  */
-- 
2.43.0



Re: [PATCH/RFC] Add --with-dwarf4 configure option.

2024-01-15 Thread Richard Biener
On Sun, Jan 14, 2024 at 8:32 PM Roger Sayle  wrote:
>
>
> This patch fixes three of the four unexpected failures that I'm seeing
> in the gcc testsuite on x86_64-pc-linux-gnu.  The three FAILs are:
> FAIL: gcc.c-torture/execute/fprintf-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.c-torture/execute/printf-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.c-torture/execute/user-printf.c   -O3 -g  (test for excess errors)
>
> and are caused by the linker/toolchain (GNU ld 2.27 on RedHat 7) issuing
> a link-time warning:
> /usr/bin/ld: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.

We're patching GCC on old systems to avoid this (and fallout from tools
not supporting DWARF5).

> This also explains why these c-torture tests only fail with -g.
>
> One solution might be to tweak/improve GCC's testsuite to ignore
> these warnings.  However, ideally it should also be possible to
> configure gcc not to generate dwarf5 debugging information on
> systems that don't/can't support it.  This patch supplements the
> current --with-dwarf2 configure option with the addition of a
> --with-dwarf4 option that adds a tm-dwarf4.h to $tm_file (using
> the same mechanism as --with-dwarf2) that changes/redefines
> DWARF_VERSION_DEFAULT to 4 (overriding the current default of 5).
>
> This patch has been tested on x86_64-pc-linux-gnu, with a full
> make bootstrap, both with and without --with-dwarf4.  This
> fixes the three failures above, and causes no new failures outside
> of the gcc.dg/guality directory.  Unfortunately, the guality
> testsuite contains a large number of tests that assume support
> for dwarf5 and don't (yet) check check_effective_target_dwarf5.
> Hopefully, adding --with-dwarf4 will help improve/test the
> portability of the guality testsuite.
>
> Ok for mainline?  An alternative implementation might be to
> allow integer values for $with_dwarf such that --with-dwarf5,
> --with-dwarf3 etc. do the right thing.  In fact, I'd originally
> misread the documentation and assumed --with-dwarf4 was already
> supported.

The only thing to watch out for is ordering of tm_file since there's

./config/vxworks.h:#undef DWARF_VERSION_DEFAULT
./config/vxworks.h:#define DWARF_VERSION_DEFAULT (TARGET_VXWORKS7 ? 3 : 2)

it looks like the --with-dwarf4 adjustment is before the vxworks handling
so the vxworks handling will prevail even with --with-dwarf4.  Not sure
if that's intended?
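The ordering concern can be illustrated with plain preprocessor code. This is a sketch with hypothetical stand-ins: the contents assumed for tm-dwarf4.h (an `#undef` followed by `#define DWARF_VERSION_DEFAULT 4`), the two vxworks.h lines quoted above with `TARGET_VXWORKS7` fixed to 1, and an `#ifndef` fallback of 5 standing in for the generic default. Whichever header is processed last wins.

```c
#include <assert.h>

/* 1. Assumed contents of the proposed tm-dwarf4.h.  */
#undef DWARF_VERSION_DEFAULT
#define DWARF_VERSION_DEFAULT 4

/* 2. vxworks.h, later in tm_file (lines quoted above).  TARGET_VXWORKS7 is
   fixed to 1 here purely for the example.  */
#define TARGET_VXWORKS7 1
#undef DWARF_VERSION_DEFAULT
#define DWARF_VERSION_DEFAULT (TARGET_VXWORKS7 ? 3 : 2)

/* 3. Generic fallback, only used if no target header defined the macro.  */
#ifndef DWARF_VERSION_DEFAULT
#define DWARF_VERSION_DEFAULT 5
#endif

static int
effective_dwarf_version (void)
{
  return DWARF_VERSION_DEFAULT;
}

static void
selftest_dwarf_order (void)
{
  /* vxworks.h came last, so the --with-dwarf4 header has no effect.  */
  assert (effective_dwarf_version () == 3);
}
```

Swapping blocks 1 and 2 would make the result 4, which is the behaviour question raised above.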

I think this also needs documenting in install.texi (including whether
target-specific defaults will prevail or not).

Richard.

>
> 2024-01-14  Roger Sayle  
>
> gcc/ChangeLog
> * configure.ac: Add a --with-dwarf4 option.
> * configure: Regenerate.
> * config/tm-dwarf4.h: New target file to define
> DWARF_VERSION_DEFAULT to 4.
>
>
> Thanks in advance,
> Roger
> --
>


Re: HELP: Questions on unshare_expr

2024-01-15 Thread Richard Biener
On Fri, Jan 12, 2024 at 6:30 PM Qing Zhao  wrote:
>
> Thanks a lot for the reply.
>
> > On Jan 12, 2024, at 11:28 AM, Richard Biener wrote:
> >
> >
> >
> >> On 12.01.2024 at 16:55, Qing Zhao wrote:
> >>
> >> Hi,
> >>
> >> I have some questions on using the utility routine “unshare_expr”:
> >>
> >> From my understanding, there should be NO shared nodes in a GENERIC function.
> >> Otherwise, gimplification might fail.
> >
> > There is sharing, and this is why we unshare everything before gimplification.
>
> Okay, so the “unsharing everything” is done automatically by the compiler before gimplification?
> I don’t need to worry about this?
>
> I see many places in the FE where “unshare_expr” is used, for example,
> “ubsan_instrument_division”, “ubsan_instrument_shift”, etc.

It's likely doing something during gimplification.

> So, usually, when should “unshare_expr” be used?

You should usually unshare when you are putting the same 'tree' into multiple
operands.  Using a SAVE_EXPR avoids redundant code but it also requires
that the SAVE_EXPR uses are ordered.
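A toy model may make the side-effect point concrete. It is not GCC's tree API: `struct toy`, `TOY_SAVE` and `toy_double` are hypothetical. Building `E + E` from a single shared save node evaluates `x++` once, as a SAVE_EXPR would; building it from two independent copies, which is what deep-copying (unsharing) a side-effecting operand amounts to, evaluates `x++` twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression nodes; hypothetical, not GCC's tree representation.  */
enum toy_code { TOY_POSTINC, TOY_PLUS, TOY_SAVE };

struct toy
{
  enum toy_code code;
  int *var;               /* TOY_POSTINC: evaluates to (*var)++.  */
  struct toy *op0, *op1;  /* Operands; TOY_SAVE uses only op0.  */
  int evaluated, saved;   /* TOY_SAVE: cached result of op0.  */
};

static struct toy *
toy_new (enum toy_code code, int *var, struct toy *op0, struct toy *op1)
{
  struct toy *t = calloc (1, sizeof *t);
  if (t == NULL)
    abort ();
  t->code = code;
  t->var = var;
  t->op0 = op0;
  t->op1 = op1;
  return t;
}

static int
toy_eval (struct toy *t)
{
  switch (t->code)
    {
    case TOY_POSTINC:
      return (*t->var)++;
    case TOY_PLUS:
      return toy_eval (t->op0) + toy_eval (t->op1);
    case TOY_SAVE:
      /* Like SAVE_EXPR: the first use evaluates, later uses reuse it.  */
      if (!t->evaluated)
        {
          t->saved = toy_eval (t->op0);
          t->evaluated = 1;
        }
      return t->saved;
    }
  abort ();
}

/* Evaluate "E + E" with E = x++.  With USE_SAVE, both operands share one
   save node; otherwise each operand is its own copy of x++.  */
static int
toy_double (int *x, int use_save)
{
  struct toy *inc1 = toy_new (TOY_POSTINC, x, NULL, NULL);
  struct toy *inc2 = use_save ? NULL : toy_new (TOY_POSTINC, x, NULL, NULL);
  struct toy *save = use_save ? toy_new (TOY_SAVE, NULL, inc1, NULL) : NULL;
  struct toy *plus = use_save
    ? toy_new (TOY_PLUS, NULL, save, save)
    : toy_new (TOY_PLUS, NULL, inc1, inc2);
  int result = toy_eval (plus);
  free (plus);
  free (save);
  free (inc2);
  free (inc1);
  return result;
}

static void
selftest_toy_save (void)
{
  int x = 5;
  int r = toy_double (&x, 1);
  assert (r == 10 && x == 6);   /* x++ ran once */
  x = 5;
  r = toy_double (&x, 0);
  assert (r == 11 && x == 7);   /* x++ ran twice */
}
```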

> >> Therefore, when we insert new tree nodes manually into the GENERIC function, we should
> >> make sure there are no shared nodes introduced.
> >>
> >> 1. Is the above understanding correct?
> >
> > No
> >
> >> 2. Is there any tool to check that there are no shared nodes in the GENERIC function?
> >> 3. Are there any tree nodes that are allowed to be shared in a GENERIC function? If so, what are they?
> >
> > There’s some allowed sharing on GIMPLE and a verifier.
> What’s the name of the verifier that I can search and check?

verify_node_sharing

> >
> >> 4. For the following:
> >>
> >> If both “op1” and “op2” are existing tree nodes in the current GENERIC function,
> >> and we will insert a new tree node:
> >>
> >> tree new_tree = build2 (CODE, TYPE, op1, op2);
> >>
> >>
> >> Should we add “unshare_expr” on both “op1” and “op2” as:
> >>
> >> tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2));
> >> ?
> >
> > Not necessarily, but instead you have to watch for evaluating side-effects only once.  See save_expr.
>
> Okay.  I see.
> >
> >>
> >> If op2 is a node that is allowed to be shared, will the additional “unshare_expr” on it trigger any potential problem?
> >
> > If you unshare side-effects, that’s generating wrong code.  Otherwise unsharing is safe.
>
> Okay.
> Will unnecessary unsharing produce redundant IRs?

Yes.

> All my questions about unshare_expr relate to an LTO bug that I am currently stuck on
> when using .ACCESS_WITH_SIZE in the bounds sanitizer (it only happens with -flto; without -flto there is no issue):
>
> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
> during IPA pass: modref
> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in LTO streams
> 0x14c3993 lto_write_tree
> ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
> 0x14c3aeb lto_output_tree_1
>
> And the value of the tree node that triggered the ICE is:
> (gdb) call debug_tree(expr)
>  
> nothrow
> def_stmt
> version:13 in-free-list>
>
> Is there any good way to debug an LTO bug?

This happens usually when you have a VLA type and its type fields are not
properly gimplified which usually happens because the frontend fails to
insert a gimplification point for it (a DECL_EXPR).

> Thanks a lot for the help.
>
> Qing
>
>
> >
> > Richard
> >
> >> Thanks a lot for your help.
> >>
> >> Qing
>


Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Richard Biener
On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal  wrote:
>
> Hello All:
>
> This patch adds the vecload pass, which replaces adjacent lxv memory
> accesses with lxvp instructions.  The pass runs before the IRA pass.
>
> The vecload pass removes one of the two adjacent lxv loads and replaces
> the pair with an lxvp.  Because one of the defining loads is removed, its
> allocno has only uses but no defs.
>
> As a result, IRA does not assign the pair sequential registers.  Changes
> are made in the IRA register allocator to assign sequential registers to
> adjacent loads.
>
> Some registers were cleared and not marked as profitable because a zero
> cost compared greater than the negative costs; checks are added to compare
> positive costs.
>
> LRA is changed not to reassign these registers to different ones, keeping
> the sequential register pairs intact.
>
>
> contrib/check_GNU_style.sh run on patch looks good.
>
> Bootstrapped and regtested for powerpc64-linux-gnu.
>
> Spec2017 benchmarks are run and I get impressive benefits for some of the FP
> benchmarks.

I want to point out that the aarch64 target recently got a ld/st fusion
pass which sounds related.  It would be nice to have at least common
infrastructure for this (the aarch64 one also looks quite a bit more
powerful).
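For comparison with the aarch64 pass, the core pairing idea described in the quoted patch can be sketched on a toy list of loads. This is an illustration only, under stated assumptions: `struct vload` and `pair_adjacent_loads` are hypothetical, and the sketch ignores what a real RTL pass has to check (intervening stores, dependences, and the sequential-register requirement that the IRA/LRA changes address).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of "lxv vD, OFFSET(rBASE)"; hypothetical, not GCC RTL.  */
struct vload
{
  int base;     /* base register number */
  long offset;  /* displacement in bytes */
  int paired;   /* nonzero once merged into a pair */
};

/* Greedily pair loads that use the same base register and whose offsets
   differ by exactly 16 bytes, i.e. two consecutive 16-byte vectors that a
   single 32-byte lxvp could load.  Each pair is recorded as
   {lower index, upper index}.  Returns the number of pairs formed.  */
static size_t
pair_adjacent_loads (struct vload *l, size_t n,
                     size_t pairs[][2], size_t max_pairs)
{
  size_t np = 0;
  for (size_t i = 0; i < n && np < max_pairs; i++)
    for (size_t j = 0; j < n && !l[i].paired; j++)
      if (j != i && !l[j].paired
          && l[j].base == l[i].base
          && l[j].offset == l[i].offset + 16)
        {
          pairs[np][0] = i;
          pairs[np][1] = j;
          l[i].paired = l[j].paired = 1;
          np++;
        }
  return np;
}

static void
selftest_pair_loads (void)
{
  struct vload l[] = { { 3, 0, 0 }, { 3, 32, 0 }, { 3, 16, 0 }, { 4, 48, 0 } };
  size_t pairs[2][2];
  size_t np = pair_adjacent_loads (l, 4, pairs, 2);
  /* 0(r3) and 16(r3) pair up; 32(r3) has no partner at 48(r3).  */
  assert (np == 1 && pairs[0][0] == 0 && pairs[0][1] == 2);
}
```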

> Thanks & Regards
> Ajit
>
>
> rs6000: New pass for replacement of adjacent lxv with lxvp.
>
> New pass to replace adjacent memory addresses lxv with lxvp.
> This pass is registered before ira rtl pass.
>
> 2024-01-14  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-passes.def: Registered vecload pass.
> * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
> * config.gcc: Add new executable.
> * config/rs6000/rs6000-protos.h: Add new prototype for vecload
> pass.
> * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
> * config/rs6000/t-rs6000: Add new rule.
> * ira-color.cc: Form register pair with adjacent loads.
> * lra-assigns.cc: Skip modifying register pair assignment.
> * lra-int.h: Add pseudo_conflict field in lra_reg_p structure.
> * lra.cc: Initialize pseudo_conflict field.
> * ira-build.cc: Use of REG_FREQ.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/powerpc/vecload.C: New test.
> * g++.target/powerpc/vecload1.C: New test.
> * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
>  gcc/config.gcc|   4 +-
>  gcc/config/rs6000/rs6000-passes.def   |   4 +
>  gcc/config/rs6000/rs6000-protos.h |   5 +-
>  gcc/config/rs6000/rs6000-vecload-opt.cc   | 432 ++
>  gcc/config/rs6000/rs6000.cc   |   8 +-
>  gcc/config/rs6000/t-rs6000|   5 +
>  gcc/ira-color.cc  | 220 -
>  gcc/lra-assigns.cc| 118 -
>  gcc/lra-int.h |   2 +
>  gcc/lra.cc|   1 +
>  gcc/testsuite/g++.target/powerpc/vecload.C|  15 +
>  gcc/testsuite/g++.target/powerpc/vecload1.C   |  22 +
>  .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
>  13 files changed, 816 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
> ;;
>  powerpc*-*-*)
> cpu_type=rs6000
> -   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> +   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
> extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
> extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
> extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
> ;;
>  rs6000*-*-*)
> extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
> -   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> +   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
> extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..8bd172dd779 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -29,6 +29,10 @@ along with GCC; see the file COPYING3.  If not see
>   for loads and stores.

[PATCH] c++: Fix ENABLE_SCOPE_CHECKING printing

2024-01-15 Thread Nathaniel Shead
While working on another bug, I noticed the ENABLE_SCOPE_CHECKING macro
and thought to try it out. It caused selftest to ICE. This patch is a
minimal fix to get it working again.

Probably this should use a test to stop this regressing again in the
future the next time new scope kinds are added, but given it's dependent
on an (almost certainly rarely used) build-time macro I'm not sure
exactly how you would do that?

Or alternatively I could add a `sk_count` to the end of the scope kind
list and `static_assert` that the size of the descriptor list matches?
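The `sk_count` plus `static_assert` idea could look like the following C11 sketch (in C++ the check is spelled `static_assert`). The enumerators and the shortened name list are hypothetical stand-ins, not the real scope_kind enum.

```c
#include <assert.h>
#include <string.h>

/* Shortened, hypothetical scope-kind enum.  */
enum demo_scope_kind
{
  dsk_block,
  dsk_try,
  dsk_catch,
  dsk_count   /* must stay last: the number of scope kinds */
};

static const char *const demo_scope_kind_names[] = {
  "block-scope",
  "try-scope",
  "catch-scope",
};

/* Compilation fails if a kind is added to the enum without a name here
   (or vice versa).  */
_Static_assert (sizeof demo_scope_kind_names
                / sizeof demo_scope_kind_names[0] == dsk_count,
                "demo_scope_kind_names is out of sync with the enum");

static const char *
demo_scope_kind_name (enum demo_scope_kind kind)
{
  assert ((int) kind >= 0 && kind < dsk_count);
  return demo_scope_kind_names[kind];
}

static void
selftest_scope_names (void)
{
  assert (strcmp (demo_scope_kind_name (dsk_try), "try-scope") == 0);
}
```

The check costs nothing at run time and fires in every build, not only when ENABLE_SCOPE_CHECKING is defined.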

(Also not sure if this would be appropriate for stage 4 or if it should
wait till next stage 1. I suppose this fixes a regression but I suspect
this has been broken for a very long time.)

-- >8 --

The lists of scope kinds used by ENABLE_SCOPE_CHECKING don't seem to
have been updated in a long while, causing ICEs and confusing output.
This patch brings the list into line.

gcc/cp/ChangeLog:

* name-lookup.cc (cp_binding_level_descriptor): Add missing
scope kinds.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index d827d337d3b..2e93ed183f1 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4464,11 +4464,16 @@ cp_binding_level_descriptor (cp_binding_level *scope)
 "try-scope",
 "catch-scope",
 "for-scope",
+"cond-init-scope",
+"stmt-expr-scope",
 "function-parameter-scope",
 "class-scope",
+"enum-scope",
 "namespace-scope",
 "template-parameter-scope",
-"template-explicit-spec-scope"
+"template-explicit-spec-scope",
+"transaction-scope",
+"openmp-scope"
   };
   const scope_kind kind = scope->explicit_spec_p
 ? sk_template_spec : scope->kind;
-- 
2.43.0



[patch,avr,applied] Fix PR target/113156 - ICE when building libgcc

2024-01-15 Thread Georg-Johann Lay

I went ahead and installed Andrew's patch

https://gcc.gnu.org/r14-7240

Johann

On 15.01.24 at 00:19, Levente via Gcc-help wrote:
I'm trying to set up a toolchain for avr-dd MCUs, and I get this error message when I try to compile gcc:

Lev


--

Author: Andrew Pinski 
Date:   Mon Jan 15 10:31:36 2024 +0100

AVR: target/113156 - Fix ICE due to missing "Save" on -m[long-]double= options.


Multilib options -mdouble= and -mlong-double= are not orthogonal:
TARGET_HANDLE_OPTION = avr-common.cc::avr_handle_option() sets them
such that sizeof(double) <= sizeof(long double) is always true.
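The invariant described above can be sketched as follows. This is a hypothetical model, not the code in avr-common.cc: it assumes that setting one option adjusts the other just enough to keep sizeof(double) <= sizeof(long double). The commit message does not spell out how the missing "Save" flag leads to the ICE, and the sketch does not try to.

```c
#include <assert.h>

/* Hypothetical model of the -mdouble= / -mlong-double= coupling.  */
struct fp_sizes
{
  int double_bits;       /* 32 or 64 */
  int long_double_bits;  /* 32 or 64 */
};

/* -mdouble=BITS: widen long double if needed.  */
static void
handle_mdouble (struct fp_sizes *s, int bits)
{
  s->double_bits = bits;
  if (s->long_double_bits < bits)
    s->long_double_bits = bits;
}

/* -mlong-double=BITS: narrow double if needed.  */
static void
handle_mlong_double (struct fp_sizes *s, int bits)
{
  s->long_double_bits = bits;
  if (s->double_bits > bits)
    s->double_bits = bits;
}

static void
selftest_fp_sizes (void)
{
  struct fp_sizes s = { 32, 32 };
  handle_mdouble (&s, 64);        /* -mdouble=64 drags long double up */
  assert (s.double_bits == 64 && s.long_double_bits == 64);
  handle_mlong_double (&s, 32);   /* -mlong-double=32 drags double down */
  assert (s.double_bits == 32 && s.long_double_bits == 32);
}
```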


gcc/
PR target/113156
* config/avr/avr.opt (-mdouble, -mlong-double): Add "Save" flag.
(-mbranch-cost): Set "Optimization" flag.

diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index ee0b40603f0..c9f2b4d2fe5 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -27,7 +27,7 @@ Target RejectNegative Joined Var(avr_mmcu) MissingArgError(missing device or arc

 -mmcu=MCU  Select the target MCU.

 mgas-isr-prologues
-Target Var(avr_gasisr_prologues) UInteger Init(0) Optimization
+Target Var(avr_gasisr_prologues) UInteger Init(0) Optimization
 Allow usage of __gcc_isr pseudo instructions in ISR prologues and epilogues.


 mn-flash=
@@ -61,7 +61,7 @@ Target RejectNegative Mask(NO_INTERRUPTS)
 Change the stack pointer without disabling interrupts.

 mbranch-cost=
-Target Joined RejectNegative UInteger Var(avr_branch_cost) Init(0)
+Target Joined RejectNegative UInteger Var(avr_branch_cost) Init(0) Optimization
 Set the branch costs for conditional branch instructions.  Reasonable values are small, non-negative integers.  The default branch cost is 0.


 mmain-is-OS_task
@@ -124,11 +124,11 @@ Target Mask(ABSDATA)
 Assume that all data in static storage can be accessed by LDS / STS.  This option is only useful for reduced Tiny devices.


 mdouble=
-Target Joined RejectNegative Var(avr_double) Init(0) Enum(avr_bits_e)
+Target Joined RejectNegative Var(avr_double) Init(0) Enum(avr_bits_e) Save
 -mdouble=  Use  bits wide double type.

 mlong-double=
-Target Joined RejectNegative Var(avr_long_double) Init(0) Enum(avr_bits_e)
+Target Joined RejectNegative Var(avr_long_double) Init(0) Enum(avr_bits_e) Save

 -mlong-double= Use  bits wide long double type.

 nodevicelib


[Committed] RISC-V: Add optimized dump check of VLS reduc tests

2024-01-15 Thread Juzhe-Zhong
Add more dump checks to make the tests more robust.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: Add dump check.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-9.c: Ditto.

---
 .../gcc.target/riscv/rvv/autovec/vls/reduc-1.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-10.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-11.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-12.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-13.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-14.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-15.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-16.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-17.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-18.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-19.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-2.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-20.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-21.c| 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-3.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-4.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-5.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-6.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-7.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-8.c | 14 +-
 .../gcc.target/riscv/rvv/autovec/vls/reduc-9.c | 14 +-
 21 files changed, 273 insertions(+), 21 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-1.c
index 2db25a2b05d..b6d8e6a51ed 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8" } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -fdump-tree-optimized-details" } */
 
 #include "def.h"
 
@@ -29,3 +29,15 @@ DEF_REDUC_PLUS (uint8_t, 4096)
 
 /* { dg-final { scan-assembler-times {vredsum\.vs} 22 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-10.c
index cdbbe11f611..22aace423cf 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-10.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8" } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 
--param=riscv-autove

Re: [PATCH 1/2] RISC-V: delete all the vector psabi checking.

2024-01-15 Thread juzhe.zh...@rivai.ai
LGTM. I think removing riscv_vector_abi can be done in a separate follow-up patch.

But please make sure the regression tests pass before committing.

Thanks.



juzhe.zh...@rivai.ai
 
From: yanzhang.wang
Date: 2024-01-15 14:00
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; lehua.ding; yanzhang.wang
Subject: [PATCH 1/2] RISC-V: delete all the vector psabi checking.
From: Yanzhang Wang 
 
Thanks to
https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/389, we
no longer need to maintain the psabi checking.
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_arg_has_vector): Delete.
(riscv_pass_in_vector_p): Delete.
(riscv_init_cumulative_args): Delete the checking.
(riscv_get_arg_info): Delete the checking.
(riscv_function_value): Delete the checking.
* config/riscv/riscv.h: Delete the member for checking.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/binop_vx_constraint-120.c: Delete the -Wno-psabi.
* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: Ditto.
* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Ditto.
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Ditto.
* gcc.target/riscv/rvv/base/pr110109-2.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-9.c: Ditto.
* gcc.target/riscv/rvv/base/spill-10.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.
* gcc.target/riscv/rvv/base/vlmul_ext-1.c: Ditto.
* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/base/vector-abi-1.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-2.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-3.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-4.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-5.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-6.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-7.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-8.c: Removed.
 
Signed-off-by: Yanzhang Wang 
 
---
I have tested the two patches locally and there are no regressions.
 
---
gcc/config/riscv/riscv.cc | 80 +--
gcc/config/riscv/riscv.h  |  2 -
.../riscv/rvv/base/binop_vx_constraint-120.c  |  2 +-
.../rvv/base/integer_compare_insn_shortcut.c  |  2 +-
.../riscv/rvv/base/mask_insn_shortcut.c   |  2 +-
.../rvv/base/misc_vreinterpret_vbool_vint.c   |  2 +-
.../gcc.target/riscv/rvv/base/pr110109-2.c|  2 +-
.../gcc.target/riscv/rvv/base/scalar_move-9.c |  2 +-
.../gcc.target/riscv/rvv/base/spill-10.c  |  2 +-
.../gcc.target/riscv/rvv/base/spill-11.c  |  2 +-
.../gcc.target/riscv/rvv/base/spill-9.c   |  2 +-
.../gcc.target/riscv/rvv/base/vector-abi-1.c  | 14 
.../gcc.target/riscv/rvv/base/vector-abi-2.c  | 15 
.../gcc.target/riscv/rvv/base/vector-abi-3.c  | 14 
.../gcc.target/riscv/rvv/base/vector-abi-4.c  | 16 
.../gcc.target/riscv/rvv/base/vector-abi-5.c  | 20 -
.../gcc.target/riscv/rvv/base/vector-abi-6.c  | 20 -
.../gcc.target/riscv/rvv/base/vector-abi-7.c  | 14 
.../gcc.target/riscv/rvv/base/vector-abi-8.c  | 14 
.../gcc.target/riscv/rvv/base/vlmul_ext-1.c   |  2 +-
.../base/zero_base_load_store_optimization.c  |  2 +-
.../riscv/rvv/base/zvfh-intrinsic.c   |  2 +-
.../riscv/rvv/base/zvfh-over-zvfhmin.c|  2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c|  2 +-
24 files changed, 15 insertions(+), 222 deletions(-)
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-6.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-7.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-8.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..e7f7ce605db 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4844,59 +4844,6 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
   GEN_INT (offset2;
}
-/* Return true if a vector type is included in the type TYPE.  */
-
-static bool
-riscv_arg_has_vector (const_tree type)
-{
-  if (riscv_v_ext_mode_p (TYPE_MODE (type)))
-return true;
-
-  if (!COMPLETE_TYPE_P (type))
-return false;
-
-  switch (TREE_CODE (type))
-{
-case RECORD_TYPE:
-  /* If it is a record, it is further determined whether its fields have
- vector type.  */
-  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
- if (TREE_CODE (f) == FIELD_DECL)
-   {
- tree field_type = TREE_TYPE (f);
- if (!TYPE_P (

Re: [PATCH] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-15 Thread Robin Dapp
OK, thanks.

Regards
 Robin



Re: [PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-15 Thread Robin Dapp
LGTM.

Regards
 Robin



[PATCH 2/5] tree: Extend DECL_FUNCTION_VERSIONED to an enum

2024-01-15 Thread Andrew Carlotti
This allows code to determine why a particular function is
multiversioned.  For now, this will primarily be used to preserve
existing name mangling quirks when subsequent commits change all
function multiversioning name mangling to use explicit target hooks.
However, this can also be used in future to allow more of the
multiversioning logic to be moved out of target hooks, and to allow
targets to simultaneously enable multiversioning with both 'target' and
'target_version' attributes.
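
As a minimal sketch of what widening the flag means for serialization, the
C++ below packs the four function_version_source values into two bits, low
bit first, the same way the trees_out::core_bools / trees_in::core_bools
hunks below do.  The bit_stream type and the helper functions are
hypothetical stand-ins for the module streamer's WB/RB/b () machinery, not
GCC APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the enum this patch adds to tree-core.h.
enum function_version_source
{
  FUNCTION_VERSION_NONE = 0,
  FUNCTION_VERSION_TARGET = 1,
  FUNCTION_VERSION_TARGET_CLONES = 2,
  FUNCTION_VERSION_TARGET_VERSION = 3
};

// Hypothetical bit stream: one bool per streamed bit.
struct bit_stream
{
  std::vector<bool> bits;
  std::size_t pos = 0;
  void write (bool b) { bits.push_back (b); }
  bool read () { return bits.at (pos++); }
};

// Same shape as the trees_out change: bit 0 first, then bit 1.
void write_version_source (bit_stream &s, function_version_source v)
{
  unsigned vf = v;
  s.write ((vf >> 0) & 1);
  s.write ((vf >> 1) & 1);
}

// Same shape as the trees_in change: rebuild the value bit by bit.
function_version_source read_version_source (bit_stream &s)
{
  unsigned vf = 0;
  vf |= unsigned (s.read ()) << 0;
  vf |= unsigned (s.read ()) << 1;
  return function_version_source (vf);
}

// Write V, read it back from the same stream, and check it survives.
bool round_trips (function_version_source v)
{
  bit_stream s;
  write_version_source (s, v);
  return s.bits.size () == 2 && read_version_source (s) == v;
}
```

If only bit 0 were streamed, as the old single WB/RB did,
FUNCTION_VERSION_TARGET_CLONES (2) would read back as FUNCTION_VERSION_NONE (0).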

gcc/ChangeLog:

* multiple_target.cc (expand_target_clones): Use new enum value.
* tree-core.h (enum function_version_source): New enum.
(struct tree_function_decl): Extend versioned_function to two
bits.

gcc/cp/ChangeLog:

* decl.cc (maybe_mark_function_versioned): Use new enum value.
(duplicate_decls): Preserve DECL_FUNCTION_VERSIONED enum value.
* module.cc (trees_out::core_bools): Use two bits for
function_decl.versioned_function.
(trees_in::core_bools): Ditto.


diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index b10a72a87bf0a1cabab52c1e4b657bc8a379b91e..527931cd90a0a779a508a096b2623351fd65a2e8 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1254,7 +1254,10 @@ maybe_mark_function_versioned (tree decl)
 {
   if (!DECL_FUNCTION_VERSIONED (decl))
 {
-  DECL_FUNCTION_VERSIONED (decl) = 1;
+  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET;
+  else
+   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET_VERSION;
   /* If DECL_ASSEMBLER_NAME has already been set, re-mangle
 to include the version marker.  */
   if (DECL_ASSEMBLER_NAME_SET_P (decl))
@@ -3159,7 +3162,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
   && DECL_FUNCTION_VERSIONED (olddecl))
 {
   /* Set the flag for newdecl so that it gets copied to olddecl.  */
-  DECL_FUNCTION_VERSIONED (newdecl) = 1;
+  DECL_FUNCTION_VERSIONED (newdecl) = DECL_FUNCTION_VERSIONED (olddecl);
   /* newdecl will be purged after copying to olddecl and is no longer
  a version.  */
   cgraph_node::delete_function_version_by_decl (newdecl);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index aa75e2809d8fdca14443c6b911bf725f6d286d20..ba60d0753f91ef91d45fb5d62f26118be4e34840 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5473,7 +5473,11 @@ trees_out::core_bools (tree t)
   WB (t->function_decl.looping_const_or_pure_flag);
 
   WB (t->function_decl.has_debug_args_flag);
-  WB (t->function_decl.versioned_function);
+
+  /* versioned_function is a 2 bit enum.  */
+  unsigned vf = t->function_decl.versioned_function;
+  WB ((vf >> 0) & 1);
+  WB ((vf >> 1) & 1);
 
   /* decl_type is a (misnamed) 2 bit discriminator. */
   unsigned kind = t->function_decl.decl_type;
@@ -5618,7 +5622,12 @@ trees_in::core_bools (tree t)
   RB (t->function_decl.looping_const_or_pure_flag);
   
   RB (t->function_decl.has_debug_args_flag);
-  RB (t->function_decl.versioned_function);
+
+  /* versioned_function is a 2 bit enum.  */
+  unsigned vf = 0;
+  vf |= unsigned (b ()) << 0;
+  vf |= unsigned (b ()) << 1;
+  t->function_decl.versioned_function = function_version_source (vf);
 
   /* decl_type is a (misnamed) 2 bit discriminator. */
   unsigned kind = 0;
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 1fdd279da04a7acc5e8c50f528139f19cadcd5ff..56a1934fe820e91b2fa451dcf6989382c906b98c 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -383,7 +383,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   if (decl1_v == NULL)
 decl1_v = node->insert_new_function_version ();
   before = decl1_v;
-  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+  DECL_FUNCTION_VERSIONED (node->decl) = FUNCTION_VERSION_TARGET_CLONES;
 
   for (i = 0; i < attrnum; i++)
 {
@@ -421,7 +421,8 @@ expand_target_clones (struct cgraph_node *node, bool definition)
 
   before->next = after;
   after->prev = before;
-  DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+  DECL_FUNCTION_VERSIONED (new_node->decl)
+   = FUNCTION_VERSION_TARGET_CLONES;
 }
 
   XDELETEVEC (attrs);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8a89462bd7ecac52fcdc11c0b57ccf7c190572b3..e159d53f9d11ba848c49499aa963daa2fbcbc648 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1955,6 +1955,19 @@ enum function_decl_type
   /* 0 values left */
 };
 
+/* Enumerate function multiversioning attributes.  This is used to record which
+   attribute enabled multiversioning on a function, and allows targets to
+   adjust their behaviour accordingly.  */
+
+enum function_version_source
+{
+  FUNCTION_VERSION_NONE = 0,
+  FUNCTION_VERSION_TARGET = 1,
+  FUNCTION_VERSION_TARGET_CLONES = 2,
+  FUNCTION_VERSION_TARGET_VERSION = 3
+};
+
+
 /* 

[PATCH 3/5] Change create_version_clone_with_body parameter name

2024-01-15 Thread Andrew Carlotti
The new name better describes where it is used, and will be more
suitable when subsequent commits make further changes to this function.

gcc/ChangeLog:

* cgraph.h (create_version_clone_with_body): Rename parameter
and change default value.
* cgraphclones.cc: Rename parameter.
* multiple_target.cc (create_target_clone): Update for inverted
boolean parameter.
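
The rename also inverts the sense of the boolean, so both the default and the
one explicit caller flip.  A small sketch, with hypothetical pick_old/pick_new
functions that return the name of the naming helper that would be called,
checks that the old and new parameter mappings agree:

```cpp
#include <cassert>
#include <string>

// Hypothetical labels for the two GCC naming helpers.
const std::string NUMBERED = "clone_function_name_numbered";
const std::string UNNUMBERED = "clone_function_name";

// Before the patch: bool version_decl = true.
std::string pick_old (bool version_decl = true)
{
  return version_decl ? NUMBERED : UNNUMBERED;
}

// After the patch: bool target_version = false.
std::string pick_new (bool target_version = false)
{
  return target_version ? UNNUMBERED : NUMBERED;
}
```

Because create_target_clone passed false before and passes true now, it still
gets clone_function_name; callers relying on the default still get
clone_function_name_numbered.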


diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index b4f028b3f3034056de1050ea1ab93a682197d0e1..16e2b2d045767206d5ccf12ee226f92ee10511d9 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1015,8 +1015,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET
  of the declaration.
 
- If VERSION_DECL is set true, use clone_function_name_numbered for the
- function clone.  Otherwise, use clone_function_name.
+ If TARGET_VERSION is set true, use clone_function_name to set new names.
+ Otherwise, use clone_function_name_numbered.
 
  Return the new version's cgraph node.  */
   cgraph_node *create_version_clone_with_body
@@ -1024,7 +1024,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  vec *tree_map,
  ipa_param_adjustments *param_adjustments,
  bitmap bbs_to_copy, basic_block new_entry_block, const char *clone_name,
- tree target_attributes = NULL_TREE, bool version_decl = true);
+ tree target_attributes = NULL_TREE, bool target_version = false);
 
   /* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
  corresponding to cgraph_node.  */
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 6d7bc402a29161f473aaa34fb11b24264a7e8b7c..ab9a0fe7ccc5fcf9a0a03363c66016466d39427e 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -1013,7 +1013,7 @@ cgraph_node::create_version_clone_with_body
vec *tree_map,
ipa_param_adjustments *param_adjustments,
bitmap bbs_to_copy, basic_block new_entry_block, const char *suffix,
-   tree target_attributes, bool version_decl)
+   tree target_attributes, bool target_version)
 {
   tree old_decl = decl;
   cgraph_node *new_version_node = NULL;
@@ -1034,8 +1034,8 @@ cgraph_node::create_version_clone_with_body
 new_decl = copy_node (old_decl);
 
   /* Generate a new name for the new version. */
-  tree fnname = (version_decl ? clone_function_name_numbered (old_decl, suffix)
-   : clone_function_name (old_decl, suffix));
+  tree fnname = (target_version ? clone_function_name (old_decl, suffix)
+   : clone_function_name_numbered (old_decl, suffix));
   DECL_NAME (new_decl) = fnname;
   SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
   SET_DECL_RTL (new_decl, NULL);
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 56a1934fe820e91b2fa451dcf6989382c906b98c..5fa13ee78035924e5dbd2aec1dd05192342c1a59 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -281,7 +281,7 @@ create_target_clone (cgraph_node *node, bool definition, char *name,
 {
   new_node
= node->create_version_clone_with_body (vNULL, NULL, NULL, NULL, NULL,
-   name, attributes, false);
+   name, attributes, true);
   if (new_node == NULL)
return NULL;
   new_node->force_output = true;


[PATCH 5/5] Add target hook for function version name mangling

2024-01-15 Thread Andrew Carlotti
When using "target" or "target_version" attributes, some parts of the
code assume that the default version has no function-specific mangling
while generating names for the resolver and ifunc.  Since aarch64 now
breaks that assumption, we add an explicit workaround for this issue.

Ideally we'd also use a target hook to generate the ifunc name, but it
turns out to be rather tricky to reproduce the existing x86 double
mangling quirk.

There should be no functional change, except on aarch64 where the
mangling is changed to match the latest proposed spec.

gcc/ChangeLog:

* cgraph.h (create_version_clone_with_body): Update comment.
* cgraphclones.cc: Set assembler name after attaching new
  attributes, and use new target hook.
* config/aarch64/aarch64.cc
(make_resolver_func): Change ifunc and resolver assembler names.
(aarch64_mangle_decl_assembler_name): Rename to ...
(aarch64_mangle_function_version_name): ... this, and adjust
mangling for default version.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Don't use this hook.
(TARGET_MANGLE_FUNCTION_VERSION_NAME): Use this hook instead.
* config/i386/i386-features.cc
(is_valid_asm_symbol): Copy from multiple_target.cc.
(ix86_mangle_function_version_assembler_name): Rename to ...
(ix86_mangle_function_version_name): ... this, and add different
handling for target clones.
(ix86_mangle_decl_assembler_name): Remove target version mangling.
* config/i386/i386-features.h
(ix86_mangle_function_version_name): New declaration.
* config/i386/i386.cc
(TARGET_MANGLE_FUNCTION_VERSION_NAME): Implement this hook.
* config/rs6000/rs6000.cc
(TARGET_MANGLE_FUNCTION_VERSION_NAME): Implement this hook.
(is_valid_asm_symbol): Copy from multiple_target.cc.
(rs6000_mangle_function_version_name): New hook implementation.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add TARGET_MANGLE_FUNCTION_VERSION_NAME hook.
* multiple_target.cc
(create_dispatcher_calls): Use new target hook for mangling.
(is_valid_asm_symbol): Move helper function to targets.
(create_new_asm_name): Move and inline into target hooks.
(create_target_clone): Use new target hook for mangling, and
pass "target_version" instead of 'name' parameter for dump info.
(expand_target_clones): Use new target hook for name mangling.
* target.def (name): Define mangle_function_version_name hook.

gcc/cp/ChangeLog:

* mangle.cc (get_mangled_id): Call the separate target hook for
  target version mangling.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-symbols1.C: Update for mangling fixes.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols1.C: Ditto.
* g++.target/aarch64/mvc-symbols2.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mvc-symbols4.C: Ditto.


diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 16e2b2d045767206d5ccf12ee226f92ee10511d9..4150c5ea7fce01f49971134a6f8e47cf4e1533b0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1015,8 +1015,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET
  of the declaration.
 
- If TARGET_VERSION is set true, use clone_function_name to set new names.
- Otherwise, use clone_function_name_numbered.
+ If TARGET_VERSION is set true, use targetm.mangle_function_version_name
+ to set new names.  Otherwise, use clone_function_name_numbered.
 
  Return the new version's cgraph node.  */
   cgraph_node *create_version_clone_with_body
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index ab9a0fe7ccc5fcf9a0a03363c66016466d39427e..ab8818e7057da3c0bc59f086abcdb5c577d1d935 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -1033,11 +1033,6 @@ cgraph_node::create_version_clone_with_body
   else
 new_decl = copy_node (old_decl);
 
-  /* Generate a new name for the new version. */
-  tree fnname = (target_version ? clone_function_name (old_decl, suffix)
-   : clone_function_name_numbered (old_decl, suffix));
-  DECL_NAME (new_decl) = fnname;
-  SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
   SET_DECL_RTL (new_decl, NULL);
 
   DECL_VIRTUAL_P (new_decl) = 0;
@@ -1065,6 +1060,18 @@ cgraph_node::create_version_clone_with_body
return NULL;
 }
 
+  /* Generate a new name for the new version.  */
+  tree fnname;
+  if (target_version)
+{
+  fnname = DECL_ASSEMBLER_NAME (old_decl);
+  fnname = targetm.mangle_function_version_name (new_decl, fnname);
+}
+  else
+fnname = (clone_function_name_numbere

[PATCH] tree-optimization/113385 - wrong loop father with early exit vectorization

2024-01-15 Thread Richard Biener
The following avoids splitting an edge before redirecting it.  This
allows the loop father of the new block to be correct in the first
place.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/113385
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
First redirect, then split the exit edge.
---
 gcc/tree-vect-loop-manip.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 1d6e5e045c3..c7e73f65155 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1613,11 +1613,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
{
  if (!alt_loop_exit_block)
{
- alt_loop_exit_block = split_edge (exit);
  edge res = redirect_edge_and_branch (
-   single_succ_edge (alt_loop_exit_block),
+   exit,
new_preheader);
  flush_pending_stmts (res);
+ alt_loop_exit_block = split_edge (res);
  continue;
}
  dest = alt_loop_exit_block;
-- 
2.35.3


[PATCH 0/5] Fix fmv mangling for AArch64

2024-01-15 Thread Andrew Carlotti
This patch series should have no functional change besides the mangling of some 
symbol names on AArch64.

Patch 1/5 adds lots of tests to verify that existing mangling behaviour on x86 
and PowerPC is unchanged.

Patch 2/5 extends DECL_FUNCTION_VERSIONED to a 2-bit enum.

Patches 3/5 and 4/5 are trivial refactorings.

Patch 5/5 is the only patch with any functional change, and that should be
minimal.  I've bootstrapped and tested the entire series on both AArch64 and
x86.  I've also run the new x86 and PowerPC tests on a cross-compiler (with a
temporary hack to disable ifunc availability checks) to verify that function
multiversioning still works on those platforms, with the symbol mangling
unchanged.

I'm aware that we have just started Stage 4, and this isn't actually a
regression, but is this still OK for master?



Some other things I previously tried that I couldn't make work:
- I had hoped to create an explicit target hook for the ifunc symbol name
mangling as well, but it turned out to be rather tricky to replicate the
existing double mangling weirdness for x86 (I didn't work out how to convince
the frontend to apply C++ mangling to the new symbol on-demand without breaking
other things).

- It's also awkward to try to access the base assembler name after applying
function version mangling - this is why I resorted to just reversing the
default version mangling in the AArch64 backend.  I tried delaying function
version mangling until after the resolver was generated, but that led to issues
with duplicate comdat group names from make_decl_one_only.

There may be less hacky solutions or workarounds for these issues, but they
would involve a more substantial refactoring and will have to wait until GCC 15
(or later).


[PATCH 4/5] cp: Use get_mangled_id in more places in mangle_decl

2024-01-15 Thread Andrew Carlotti
There's no functional change here, but it makes it clearer that all
three locations should be doing the same thing (aside from changes to
flag_abi_version).

gcc/cp/ChangeLog:

* mangle.cc (mangle_decl): Consistently use get_mangled_id.


diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index a04bc584586f28cb80d21b5c6d647416aa8843df..9bd684608b9e3378292cdb042184ba603b3d69aa 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -4503,8 +4503,7 @@ mangle_decl (const tree decl)
return;
 
  flag_abi_version = flag_abi_compat_version;
- id2 = mangle_decl_string (decl);
- id2 = targetm.mangle_decl_assembler_name (decl, id2);
+ id2 = get_mangled_id (decl);
  flag_abi_version = save_ver;
 
  if (id2 != id)
@@ -4519,8 +4518,7 @@ mangle_decl (const tree decl)
  || id2 == NULL_TREE)
{
  flag_abi_version = warn_abi_version;
- id2 = mangle_decl_string (decl);
- id2 = targetm.mangle_decl_assembler_name (decl, id2);
+ id2 = get_mangled_id (decl);
}
  flag_abi_version = save_ver;
 


[PATCH 1/5] testsuite: Add tests for fmv symbol presence and mangling

2024-01-15 Thread Andrew Carlotti
These tests are not intended to designate "correct" behaviour, but are
instead intended to demonstrate current behaviour, and provide a warning
if subsequent patches might lead to compatibility issues for targets
with existing function multiversioning support.
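
The aarch64 mv-symbols1.C expectations encode the current symbol scheme: the
default version keeps the plain name (_Z3foov), each other version gets a
"._M<feature>" suffix per '+'-separated feature, the resolver is
"<name>.resolver", and the ifunc symbol is double-mangled (_Z7_Z3foovv).  The
sketch below rebuilds those names from the base assembler name.  The helper
names are hypothetical and the rules are inferred only from the expected
patterns in the test; this is not GCC's mangling code.

```cpp
#include <cassert>
#include <string>

// "_Z3foov" + "sve+sve2" -> "_Z3foov._MsveMsve2" (pattern from the test).
std::string version_symbol (const std::string &base,
                            const std::string &features)
{
  std::string out = base + "._";
  std::string::size_type start = 0;
  while (true)
    {
      std::string::size_type plus = features.find ('+', start);
      std::string::size_type len
        = plus == std::string::npos ? std::string::npos : plus - start;
      out += "M" + features.substr (start, len);
      if (plus == std::string::npos)
        break;
      start = plus + 1;
    }
  return out;
}

// "_Z3foov" -> "_Z3foov.resolver".
std::string resolver_symbol (const std::string &base)
{
  return base + ".resolver";
}

// Re-mangles the base name as if it were a plain identifier
// ("_Z" + length + name) and appends the parameter encoding again
// ("v" for (), "i" for (int)).  Only reproduces the test's double
// mangling; it is not a general Itanium mangler.
std::string ifunc_symbol (const std::string &base, const std::string &params)
{
  return "_Z" + std::to_string (base.size ()) + base + params;
}
```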

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-symbols1.C: New test.
* g++.target/aarch64/mv-symbols2.C: New test.
* g++.target/aarch64/mv-symbols3.C: New test.
* g++.target/aarch64/mv-symbols4.C: New test.
* g++.target/aarch64/mv-symbols5.C: New test.
* g++.target/aarch64/mvc-symbols1.C: New test.
* g++.target/aarch64/mvc-symbols2.C: New test.
* g++.target/aarch64/mvc-symbols3.C: New test.
* g++.target/aarch64/mvc-symbols4.C: New test.
* g++.target/i386/mv-symbols1.C: New test.
* g++.target/i386/mv-symbols2.C: New test.
* g++.target/i386/mv-symbols3.C: New test.
* g++.target/i386/mv-symbols4.C: New test.
* g++.target/i386/mv-symbols5.C: New test.
* g++.target/i386/mvc-symbols1.C: New test.
* g++.target/i386/mvc-symbols2.C: New test.
* g++.target/i386/mvc-symbols3.C: New test.
* g++.target/i386/mvc-symbols4.C: New test.
* g++.target/powerpc/mvc-symbols1.C: New test.
* g++.target/powerpc/mvc-symbols2.C: New test.
* g++.target/powerpc/mvc-symbols3.C: New test.
* g++.target/powerpc/mvc-symbols4.C: New test.


diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols1.C b/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
new file mode 100644
index ..afbd9cacfc72e89ff4a06e3baae7ccc63ed64fc0
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("dotprod")))
+int foo ()
+{
+  return 3;
+}
+__attribute__((target_version("sve+sve2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target_version("sve+sve2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target_version("dotprod")))
+int foo (int)
+{
+  return 4;
+}
+
+int foo (int)
+{
+  return 2;
+}
+
+
+int bar()
+{
+  return foo ();
+}
+
+int bar(int x)
+{
+  return foo (x);
+}
+
+/* When updating any of the symbol names in these tests, make sure to also
+   update any tests for their absence in mv-symbolsN.C */
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z7_Z3foovv\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, %gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 1 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z7_Z3fooii\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3fooii, %gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3fooii,_Z3fooi\.resolver\n" 1 } } */
diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols2.C b/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
new file mode 100644
index ..54d2396f40705b6a6f7839ded78dcfddd911f7dd
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_version("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("dotprod")))
+int foo ()
+{
+  return 3;
+}
+__attribute__((target_version("sve+sve2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target_version("sve+sve2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target_version("dotprod")))
+int foo (int)
+{
+  return 4;
+}
+
+__attribute__((target_version("default")))
+int foo (int)
+{
+  return 2;
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 0 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, %gnu_indirect_function\n" 0 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 0 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\._

[Committed V3] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-15 Thread Juzhe-Zhong
Rebased in v3: rebased to the trunk and committed, as it was approved by Robin.
Updated in v2: added a dynamic LMUL test.

This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14):

GCC 13.2.0:

lui a5,%hi(a)
li  a4,19
sb  a4,%lo(a)(a5)
li  a0,0
ret

Trunk GCC:

vsetvli a5,zero,e8,mf2,ta,ma
li  a4,-32768
vid.v   v1
vsetvli zero,zero,e16,m1,ta,ma
addiw   a4,a4,104
vmv.v.i v3,15
lui a1,%hi(a)
li  a0,19
vsetvli zero,zero,e8,mf2,ta,ma
vadd.vi v1,v1,1
sb  a0,%lo(a)(a1)
vsetvli zero,zero,e16,m1,ta,ma
vzext.vf2   v2,v1
vmv.v.x v1,a4
vminu.vv    v2,v2,v3
vsrl.vv v1,v1,v2
vslidedown.vi   v1,v1,17
vmv.x.s a0,v1
snez    a0,a0
ret

The root cause is that we vectorize the code inefficiently, because we don't
cost len when NITERS < VF.  Costing it the way targets that use masks for loop
control (and rs6000) do fixes the regression.
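
The counting done by the new costs::adjust_vect_cost_per_loop function (in the
riscv-vector-costs.cc hunk) can be modelled in isolation.  In this sketch,
rgroup is a hypothetical stand-in for rgroup_controls that only records
whether rgc->type is set, and the two booleans stand for
LOOP_VINFO_FULLY_WITH_LENGTH_P and LOOP_VINFO_USING_DECREMENTING_IV_P.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for rgroup_controls.
struct rgroup
{
  bool has_type;
};

// Entry i of LOOP_VINFO_LENS describes an rgroup that needs i + 1 length
// controls, hence the loop variable name num_vectors_m1.
unsigned loop_len_body_stmts (bool fully_with_length, bool decrementing_iv,
                              const std::vector<rgroup> &lens)
{
  if (!fully_with_length || decrementing_iv)
    return 0;
  unsigned body_stmts = 0;
  for (unsigned num_vectors_m1 = 0; num_vectors_m1 < lens.size ();
       ++num_vectors_m1)
    if (lens[num_vectors_m1].has_type)
      body_stmts += num_vectors_m1 + 1;
  return body_stmts;
}
```

In the patch, the resulting count is passed to add_stmt_cost as a
scalar_stmt cost in vect_body.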

Tested with no regressions. Ok for trunk?

PR target/113281

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc 
(costs::adjust_vect_cost_per_loop): New function.
(costs::finish_cost): Adjust cost for LOOP LEN with NITERS < VF.
* config/riscv/riscv-vector-costs.h: New function.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-5.c: New test.
---
 gcc/config/riscv/riscv-vector-costs.cc| 57 +++
 gcc/config/riscv/riscv-vector-costs.h |  2 +
 .../vect/costmodel/riscv/rvv/pr113281-3.c | 18 ++
 .../vect/costmodel/riscv/rvv/pr113281-4.c | 18 ++
 .../vect/costmodel/riscv/rvv/pr113281-5.c | 18 ++
 5 files changed, 113 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-5.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 090275c7efe..90ab93b7506 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1097,9 +1097,66 @@ costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   return record_stmt_cost (stmt_info, where, count * stmt_cost);
 }
 
+/* For some target specific vectorization cost which can't be handled per stmt,
+   we check the requisite conditions and adjust the vectorization cost
+   accordingly if satisfied.  One typical example is to model and adjust
+   loop_len cost for known_lt (NITERS, VF).  */
+
+void
+costs::adjust_vect_cost_per_loop (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+  && !LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  /* In middle-end loop vectorizer, we don't count the loop_len cost in
+vect_estimate_min_profitable_iters when NITERS < VF, that is, we only
+count cost of len that we need to iterate loop more than once with VF.
+It's correct for most of the cases:
+
+E.g. VF = [4, 4]
+  for (int i = 0; i < 3; i ++)
+a[i] += b[i];
+
+We don't need to cost MIN_EXPR or SELECT_VL for the case above.
+
+However, for some inefficient vectorized cases, it does use MIN_EXPR
+to generate len.
+
+E.g. VF = [256, 256]
+
+Loop body:
+  # loop_len_110 = PHI <18(2), _119(11)>
+  ...
+  _117 = MIN_EXPR ;
+  _118 = 18 - _117;
+  _119 = MIN_EXPR <_118, POLY_INT_CST [256, 256]>;
+  ...
+
+Epilogue:
+  ...
+  _112 = .VEC_EXTRACT (vect_patt_27.14_109, _111);
+
+We cost 1 unconditionally for this situation like other targets which
+apply mask as the loop control.  */
+  rgroup_controls *rgc;
+  unsigned int num_vectors_m1;
+  unsigned int body_stmts = 0;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+   if (rgc->type)
+ body_stmts += num_vectors_m1 + 1;
+
+  add_stmt_cost (body_stmts, scalar_stmt, NULL, NULL, NULL_TREE, 0,
+vect_body);
+}
+}
+
 void
 costs::finish_cost (const vector_costs *scalar_costs)
 {
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo))
+{
+  adjust_vect_cost_per_loop (loop_vinfo);
+}
   vector_costs::finish_cost (scalar_costs);
 }
 
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index dc0d61f5d4a..4e2bbfd5ca9 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -96,6 +96,8 @@ private:
  V_REGS spills according to the analysis.  */
   bool m_has_unexpected_spills_p = false;
   void record_potential_une

[Committed V2] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-15 Thread Juzhe-Zhong
This patch fixes a 70% performance drop from GCC-13.2 to GCC-14 with
-march=rv64gcv on real hardware.

The root cause is an incorrect cost model, which causes inefficient
vectorization and a significant performance drop.

So this patch does:

1. Adjust the vector-to-scalar cost by introducing a vector-to-scalar register move cost.
2. Adjust the vec_construct cost, since we do spend NUNITS instructions to
construct the vector.
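
The two adjustments amount to simple arithmetic on the cost values.  The
sketch below uses the 2-unit register move costs from rvv_regmove_vector_cost
in this patch and assumes a known NUNITS (the real code uses
estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype))).  The base stmt_cost
passed to the vector-to-scalar function is an assumed input, and the function
names are hypothetical.

```cpp
#include <cassert>

// Register move costs as set in rvv_regmove_vector_cost by this patch.
struct regmove_cost
{
  int GR2VR, FR2VR, VR2GR, VR2FR;
};
const regmove_cost rvv_regmove = {2, 2, 2, 2};

// vec_construct: previously NUNITS - 1, now NUNITS.
int vec_construct_cost_old (int nunits) { return nunits - 1; }
int vec_construct_cost_new (int nunits) { return nunits; }

// vec_to_scalar: add a VR2FR or VR2GR move on top of the base cost.
int vec_to_scalar_cost (int stmt_cost, bool float_type)
{
  return stmt_cost + (float_type ? rvv_regmove.VR2FR : rvv_regmove.VR2GR);
}
```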

Tested on both RV32 and RV64 with no regressions.  Rebased to the trunk and
committed, as it was approved by Robin.

PR target/113247

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct regmove_vector_cost): Add vector 
to scalar regmove.
* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Ditto.
* config/riscv/riscv.cc (riscv_builtin_vectorization_cost): Adjust 
vec_construct cost.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-4.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-vector-costs.cc|   3 +
 gcc/config/riscv/riscv.cc |   4 +-
 .../vect/costmodel/riscv/rvv/pr113247-1.c | 195 ++
 .../vect/costmodel/riscv/rvv/pr113247-2.c |   6 +
 .../vect/costmodel/riscv/rvv/pr113247-3.c |   6 +
 .../vect/costmodel/riscv/rvv/pr113247-4.c |   6 +
 .../riscv/rvv/autovec/vls/reduc-19.c  |   2 +-
 .../riscv/rvv/autovec/vls/reduc-20.c  |   2 +-
 .../riscv/rvv/autovec/vls/reduc-21.c  |   2 +-
 10 files changed, 224 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-4.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4f3b677f4f9..21f6dadf113 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -255,6 +255,8 @@ struct regmove_vector_cost
 {
   const int GR2VR;
   const int FR2VR;
+  const int VR2GR;
+  const int VR2FR;
 };
 
 /* Cost for vector insn classes.  */
diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 90ab93b7506..7c9840df4e9 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1056,6 +1056,9 @@ adjust_stmt_cost (enum vect_cost_for_stmt kind, tree vectype, int stmt_cost)
 case scalar_to_vec:
   return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->FR2VR
  : costs->regmove->GR2VR);
+case vec_to_scalar:
+  return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->VR2FR
+ : costs->regmove->VR2GR);
 default:
   break;
 }
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ee1a57b321d..568db90a27d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -395,6 +395,8 @@ static const scalable_vector_cost rvv_vla_vector_cost = {
 static const regmove_vector_cost rvv_regmove_vector_cost = {
   2, /* GR2VR  */
   2, /* FR2VR  */
+  2, /* VR2GR  */
+  2, /* VR2FR  */
 };
 
 /* Generic costs for vector insn classes.  It is supposed to be the vector cost
@@ -10522,7 +10524,7 @@ riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
   return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
 case vec_construct:
-  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype)) - 1;
+  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
 
 default:
   gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c
new file mode 100644
index 000..0d09a624a00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c
@@ -0,0 +1,195 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param=riscv-autovec-lmul=dynamic" } */
+
+#include 
+
+#define Ch(x,y,z)   (z ^ (x & (y ^ z)))
+#define Maj(x,y,z)  ((x & y) | (z & (x | y)))
+
+#define SHR(x, n)(x >> n)
+#define ROTR(x,n)(SHR(x,n) | (x << (32 - n)))
+#define S1(x)(ROTR(x, 6) ^ ROTR(x,11) ^ ROTR(x,25))
+#define S0(x)(ROTR(x, 2) ^ ROTR(x,13) ^ ROTR(x,22))
+
+#define s1(x)(ROTR(x,17) ^ ROTR(x,19) ^  SHR(x,10))
+#define s0(x)(ROTR

Re: [PATCH 2/5] tree: Extend DECL_FUNCTION_VERSIONED to an enum

2024-01-15 Thread Richard Biener
On Mon, Jan 15, 2024 at 12:27 PM Andrew Carlotti
 wrote:
>
> This allows code to determine why a particular function is
> multiversioned.  For now, this will primarily be used to preserve
> existing name mangling quirks when subsequent commits change all
> function multiversioning name mangling to use explicit target hooks.
> However, this can also be used in future to allow more of the
> multiversioning logic to be moved out of target hooks, and to allow
> targets to simultaneously enable multiversioning with both 'target' and
> 'target_version' attributes.

Why does module.cc need to stream the bits?  target_clone runs long
after the FE finished.  Instead I wonder why LTO doesn't stream the bits
(tree-streamer-{in,out}.cc)?

You have four states but only mention 'target' and 'target_version'; what are the
states actually?  Can you amend the function_version_source enum
comment accordingly?

This looks like stage1 material to me.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * multiple_target.cc (expand_target_clones): Use new enum value.
> * tree-core.h (enum function_version_source): New enum.
> (struct tree_function_decl): Extend versioned_function to two
> bits.
>
> gcc/cp/ChangeLog:
>
> * decl.cc (maybe_mark_function_versioned): Use new enum value.
> (duplicate_decls): Preserve DECL_FUNCTION_VERSIONED enum value.
> * module.cc (trees_out::core_bools): Use two bits for
> function_decl.versioned_function.
> (trees_in::core_bools): Ditto.
>
>
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index b10a72a87bf0a1cabab52c1e4b657bc8a379b91e..527931cd90a0a779a508a096b2623351fd65a2e8 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -1254,7 +1254,10 @@ maybe_mark_function_versioned (tree decl)
>  {
>if (!DECL_FUNCTION_VERSIONED (decl))
>  {
> -  DECL_FUNCTION_VERSIONED (decl) = 1;
> +  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
> +   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET;
> +  else
> +   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET_VERSION;
>/* If DECL_ASSEMBLER_NAME has already been set, re-mangle
>  to include the version marker.  */
>if (DECL_ASSEMBLER_NAME_SET_P (decl))
> @@ -3159,7 +3162,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
>&& DECL_FUNCTION_VERSIONED (olddecl))
>  {
>/* Set the flag for newdecl so that it gets copied to olddecl.  */
> -  DECL_FUNCTION_VERSIONED (newdecl) = 1;
> +  DECL_FUNCTION_VERSIONED (newdecl) = DECL_FUNCTION_VERSIONED (olddecl);
>/* newdecl will be purged after copying to olddecl and is no longer
>   a version.  */
>cgraph_node::delete_function_version_by_decl (newdecl);
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index aa75e2809d8fdca14443c6b911bf725f6d286d20..ba60d0753f91ef91d45fb5d62f26118be4e34840 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -5473,7 +5473,11 @@ trees_out::core_bools (tree t)
>WB (t->function_decl.looping_const_or_pure_flag);
>
>WB (t->function_decl.has_debug_args_flag);
> -  WB (t->function_decl.versioned_function);
> +
> +  /* versioned_function is a 2 bit enum.  */
> +  unsigned vf = t->function_decl.versioned_function;
> +  WB ((vf >> 0) & 1);
> +  WB ((vf >> 1) & 1);
>
>/* decl_type is a (misnamed) 2 bit discriminator. */
>unsigned kind = t->function_decl.decl_type;
> @@ -5618,7 +5622,12 @@ trees_in::core_bools (tree t)
>RB (t->function_decl.looping_const_or_pure_flag);
>
>RB (t->function_decl.has_debug_args_flag);
> -  RB (t->function_decl.versioned_function);
> +
> +  /* versioned_function is a 2 bit enum.  */
> +  unsigned vf = 0;
> +  vf |= unsigned (b ()) << 0;
> +  vf |= unsigned (b ()) << 1;
> +  t->function_decl.versioned_function = function_version_source (vf);
>
>/* decl_type is a (misnamed) 2 bit discriminator. */
>unsigned kind = 0;
> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> index 1fdd279da04a7acc5e8c50f528139f19cadcd5ff..56a1934fe820e91b2fa451dcf6989382c906b98c 100644
> --- a/gcc/multiple_target.cc
> +++ b/gcc/multiple_target.cc
> @@ -383,7 +383,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
>if (decl1_v == NULL)
>  decl1_v = node->insert_new_function_version ();
>before = decl1_v;
> -  DECL_FUNCTION_VERSIONED (node->decl) = 1;
> +  DECL_FUNCTION_VERSIONED (node->decl) = FUNCTION_VERSION_TARGET_CLONES;
>
>for (i = 0; i < attrnum; i++)
>  {
> @@ -421,7 +421,8 @@ expand_target_clones (struct cgraph_node *node, bool definition)
>
>before->next = after;
>after->prev = before;
> -  DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
> +  DECL_FUNCTION_VERSIONED (new_node->decl)
> +   = FUNCTION_VERSION_TARGET_CLONES;
>  }
>
>XDELETEVEC (attrs)
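
The module.cc hunks stream the now 2-bit versioned_function field as two single bits, low bit first. On input they reassemble the bits in the same order. Below is a minimal round-trip sketch of that scheme in plain C. The bit buffer and its helpers are hypothetical stand-ins for module.cc's WB/RB/b() machinery, not GCC's API.

```c
#include <stdbool.h>

/* Hypothetical bit buffer standing in for module.cc's bit streaming.  */
struct bitbuf
{
  unsigned long bits;  /* Bits written so far, least significant first.  */
  unsigned wpos;       /* Next bit position to write.  */
  unsigned rpos;       /* Next bit position to read.  */
};

static void
write_bit (struct bitbuf *bb, bool b)
{
  bb->bits |= (unsigned long) b << bb->wpos++;
}

static bool
read_bit (struct bitbuf *bb)
{
  return (bb->bits >> bb->rpos++) & 1;
}

/* Stream a 2-bit value as two single bits, low bit first.  */
static void
write_2bit (struct bitbuf *bb, unsigned v)
{
  write_bit (bb, (v >> 0) & 1);
  write_bit (bb, (v >> 1) & 1);
}

/* Read the two bits back in the same order.  Each read is a separate
   statement, so the order of the reads is fixed.  */
static unsigned
read_2bit (struct bitbuf *bb)
{
  unsigned v = 0;
  v |= (unsigned) read_bit (bb) << 0;
  v |= (unsigned) read_bit (bb) << 1;
  return v;
}
```

Keeping each read in its own statement, as the trees_in hunk does, matters. Folding both reads into a single expression such as `read_bit (bb) | read_bit (bb) << 1` would leave the order of the two calls unspecified.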

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
Hello Richard:

On 15/01/24 3:03 pm, Richard Biener wrote:
> On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal  wrote:
>>
>> Hello All:
>>
>> This patch adds the vecload pass, which replaces adjacent lxv memory
>> accesses with lxvp instructions.  The pass runs before the IRA pass.
>>
>> The vecload pass removes one of the two adjacent lxv loads and replaces
>> the other with an lxvp.  Because one of the defining loads is removed,
>> its allocno has only uses and no definitions.
>>
>> As a result, IRA does not assign the pair sequential registers.  IRA is
>> therefore changed to assign sequential registers to adjacent loads.
>>
>> Some registers were being cleared and not marked as profitable because a
>> zero cost compares greater than the negative costs; checks are added to
>> compare positive costs.
>>
>> LRA is changed not to reassign these registers to different registers,
>> so that the sequential register pairs stay intact.
>>
>> contrib/check_GNU_style.sh run on patch looks good.
>>
>> Bootstrapped and regtested for powerpc64-linux-gnu.
>>
>> Spec2017 benchmarks are run and I get impressive benefits for some of the FP
>> benchmarks.
> I want to point out the aarch64 target recently got a ld/st fusion
> pass which sounds related.  It would be nice to have at least common
> infrastructure for this (the aarch64 one also looks quite more powerful)

The load/store fusion pass on aarch64 runs after the register allocator
and before the peephole2 pass. In our case, if we ran after the register
allocator, we would keep the register assigned to the lower-offset load and
remove the other load, which is adjacent to it at an offset difference of 16.

We would then be left with a single load at the lower offset. The register
the allocator assigned to that load must be lower than the one assigned to
the adjacent load. If it is not, we need to change it to the lower register
and propagate the change to all uses of the variable. Similarly, for the
adjacent load that we remove, its register needs to be propagated to all of
its uses.

In that case we would be doing the register allocator's work. In most of
our example testcases, the register allocator assigns the lower-offset load
a higher register than the adjacent load. We would therefore always have to
propagate registers, effectively redoing the register allocator's work.
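
The adjacency and register-order condition described above can be sketched as follows. This is a hypothetical model, not code from the patch or from the aarch64 pass; `struct vec_load` and both predicates are illustrative. It follows the description above in requiring that the lower-offset load own the lower register of the pair, and it ignores endianness. It also assumes that an lxvp target pair starts at an even VSX register.

```c
#include <stdbool.h>

/* Hypothetical description of one vector load; not a GCC structure.  */
struct vec_load
{
  int base_reg;   /* Base address register.  */
  long offset;    /* Byte offset from the base register.  */
  int dest_reg;   /* VSX register the allocator assigned to the result.  */
};

/* Each lxv loads 16 bytes, so two loads are pairing candidates when they
   share a base register and their offsets differ by exactly 16.  */
static bool
adjacent_lxv_p (const struct vec_load *lo, const struct vec_load *hi)
{
  return lo->base_reg == hi->base_reg && hi->offset - lo->offset == 16;
}

/* After register allocation, the loads can be fused without renaming only
   if the lower-offset load already owns the lower register of a
   sequential pair.  Assumption: the pair must start at an even register.  */
static bool
fusable_after_ra_p (const struct vec_load *lo, const struct vec_load *hi)
{
  return adjacent_lxv_p (lo, hi)
         && lo->dest_reg % 2 == 0
         && hi->dest_reg == lo->dest_reg + 1;
}
```

The message reports the "adjacent but not fusable" case as the common one: the allocator gave the lower-offset load the higher register. Fusing such a pair after register allocation would require renaming and propagating both registers.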

Is it the same, or acceptable, to use a load/store fusion pass like the
aarch64 one for our cases, considering the above scenario?

Please let me know what you think.

Thanks & Regards
Ajit
>> Thanks & Regards
>> Ajit
>>
>>
>> rs6000: New pass for replacement of adjacent lxv with lxvp.
>>
>> New pass to replace adjacent memory addresses lxv with lxvp.
>> This pass is registered before ira rtl pass.
>>
>> 2024-01-14  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> * config/rs6000/rs6000-passes.def: Registered vecload pass.
>> * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
>> * config.gcc: Add new executable.
>> * config/rs6000/rs6000-protos.h: Add new prototype for vecload
>> pass.
>> * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
>> * config/rs6000/t-rs6000: Add new rule.
>> * ira-color.cc: Form register pair with adjacent loads.
>> * lra-assigns.cc: Skip modifying register pair assignment.
>> * lra-int.h: Add pseudo_conflict field in lra_reg_p structure.
>> * lra.cc: Initialize pseudo_conflict field.
>> * ira-build.cc: Use of REG_FREQ.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/powerpc/vecload.C: New test.
>> * g++.target/powerpc/vecload1.C: New test.
>> * gcc.target/powerpc/mma-builtin-1.c: Modify test.
>> ---
>>  gcc/config.gcc|   4 +-
>>  gcc/config/rs6000/rs6000-passes.def   |   4 +
>>  gcc/config/rs6000/rs6000-protos.h |   5 +-
>>  gcc/config/rs6000/rs6000-vecload-opt.cc   | 432 ++
>>  gcc/config/rs6000/rs6000.cc   |   8 +-
>>  gcc/config/rs6000/t-rs6000|   5 +
>>  gcc/ira-color.cc  | 220 -
>>  gcc/lra-assigns.cc| 118 -
>>  gcc/lra-int.h |   2 +
>>  gcc/lra.cc|   1 +
>>  gcc/testsuite/g++.target/powerpc/vecload.C|  15 +
>>  gcc/testsuite/g++.target/powerpc/vecload1.C   |  22 +
>>  .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
>>  13 files changed, 816 insertions(+), 24 deletions(-)
>>  create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
>>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
>>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C
>>
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index f0676c830e8..4cf15e807de 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -518,7 +518,7 @@ or1k*-*-*)
>> ;;
>>  powerpc*-*-*)
>> cpu_type=rs6000
>> - 

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal



On 15/01/24 6:14 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> On 15/01/24 3:03 pm, Richard Biener wrote:
>> On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal  wrote:
>>>
>>> Hello All:
>>>
>>> This patch adds the vecload pass, which replaces adjacent lxv memory
>>> accesses with lxvp instructions.  The pass runs before the IRA pass.
>>>
>>> The vecload pass removes one of the two adjacent lxv loads and replaces
>>> the other with an lxvp.  Because one of the defining loads is removed,
>>> its allocno has only uses and no definitions.
>>>
>>> As a result, IRA does not assign the pair sequential registers.  IRA is
>>> therefore changed to assign sequential registers to adjacent loads.
>>>
>>> Some registers were being cleared and not marked as profitable because a
>>> zero cost compares greater than the negative costs; checks are added to
>>> compare positive costs.
>>>
>>> LRA is changed not to reassign these registers to different registers,
>>> so that the sequential register pairs stay intact.
>>>
>>>
>>> contrib/check_GNU_style.sh run on patch looks good.
>>>
>>> Bootstrapped and regtested for powerpc64-linux-gnu.
>>>
>>> Spec2017 benchmarks are run and I get impressive benefits for some of the FP
>>> benchmarks.
>> I want to point out the aarch64 target recently got a ld/st fusion
>> pass which sounds related.  It would be nice to have at least common
>> infrastructure for this (the aarch64 one also looks quite more powerful)
> 
> The load/store fusion pass on aarch64 runs after the register allocator
> and before the peephole2 pass. In our case, if we ran after the register
> allocator, we would keep the register assigned to the lower-offset load and
> remove the other load, which is adjacent to it at an offset difference of 16.
>
> We would then be left with a single load at the lower offset. The register
> the allocator assigned to that load must be lower than the one assigned to
> the adjacent load. If it is not, we need to change it to the lower register
> and propagate the change to all uses of the variable. Similarly, for the
> adjacent load that we remove, its register needs to be propagated to all of
> its uses.
>
> In that case we would be doing the register allocator's work. In most of
> our example testcases, the register allocator assigns the lower-offset load
> a higher register than the adjacent load. We would therefore always have to
> propagate registers, effectively redoing the register allocator's work.
>
> Is it the same, or acceptable, to use a load/store fusion pass like the
> aarch64 one for our cases, considering the above scenario?
>
> Please let me know what you think.
> 

Also, Mike and Kewen suggested running this pass before the IRA register
allocator; they are on the To list. They have other concerns about doing
it after register allocation, which they raised in another mail thread.

Mike and Kewen, please respond.

Thanks & Regards
Ajit

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-15 Thread Maxim Kuvyrkov
Hi Vladimir,
Hi Jeff,

Richard and Alexander have reviewed this patch and [I assume] have no
further comments.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> This patch prevents sched-deps.cc:find_inc() from creating an exponential
> number of dependencies, which become memory and compilation-time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
>
> [For simplicity, let's assume find_inc(backwards==true)].
> In this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
>
> In PR96388's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out on -ftime-report or memory usage.
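
Below is a simplified counting model of the example above. It shows the N*M growth and the effect of the cap that the patch adds. It is an illustration under stated assumptions, not GCC code:

- Phase 1 adds exactly one backward dependency on sp_insnB per mem_insnA*.
- Each mem_insnB* has exactly one backward dependency of its own, on sp_insnB.
- The `max_pending` argument stands in for param_max_pending_list_length, whose default is taken to be 32.

```c
/* Simplified model (not GCC code) of the example above.
   Phase 1: each of the N mem_insnA* insns adds one dependency to
   sp_insnB's backward list.
   Phase 2: find_inc runs once per mem_insnB* and copies all of sp_insnB's
   backward dependencies to it, unless the cap rejects the opportunity.
   Returns the total number of dependencies created.  */
static long long
count_new_deps (long long n, long long m, long long max_pending, int use_cap)
{
  long long created = 0;
  long long sp_b_back_deps = 1;  /* sp_insnB depends on sp_insnA.  */

  /* Phase 1: N new backward dependencies on sp_insnB.  */
  created += n;
  sp_b_back_deps += n;

  /* Phase 2: one find_inc call per mem_insnB*.  */
  for (long long j = 0; j < m; j++)
    {
      long long n_mem_deps = 1;               /* Assumed: only sp_insnB.  */
      long long n_inc_deps = sp_b_back_deps;  /* sp_insnB's backward deps.  */

      /* The patch's heuristic: give up when the product is too large.  */
      if (use_cap && n_mem_deps * n_inc_deps >= max_pending * max_pending)
        continue;
      created += n_inc_deps;
    }
  return created;
}
```

With N = M = 10000, the uncapped model creates 100,020,000 dependencies. The capped model keeps only the 10,000 from phase 1, because every phase-2 candidate exceeds 32 * 32. Small blocks are unaffected by the cap.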
>
> gcc/ChangeLog:
>
> PR rtl-optimization/96388
> PR rtl-optimization/111554
> * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 48 +++
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..005fc0f567e 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
> in MII has been filled in, this function is called to find a suitable
> add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>sd_iterator_def sd_it;
>dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>while (sd_iterator_cond (&sd_it, &dep))
>  {
>dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>rtx_insn *pro = DEP_PRO (dep);
>rtx_insn *con = DEP_CON (dep);
> -  rtx_insn *inc_cand = backwards ? pro : con;
> +  rtx_insn *inc_cand;
> +  int n_inc_deps;
> +
>if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
> goto next;
> +
> +  if (backwards)
> +   {
> + inc_cand = pro;
> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +   }
> +  else
> +   {
> + inc_cand = con;
> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +   }
> +
> +  /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
> +for mem_insn.  This by itself is not a problem, since each mem_insn
> +will have only a few inc_insns associated with it.  However, if
> +we consider that a single inc_insn may have a lot of mem_insns, AND,
> +on top of that, a few other inc_insns associated with it --
> +those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
> +dependencies created for them.  This may cause an exponential
> +growth of memory usage and scheduling time.
> +See PR96388 for details.
> +We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +insns, and drop opportunities for breaking modifiable_mem dependencies
> +when dependency lists grow beyond reasonable size.  */
> +  if (n_mem_deps * n_inc_deps
> + >= param_max_pending_list_length * param_max_pending_list_length)
> +   goto next;
> +
>if (parse_add_or_inc (mii, inc_cand, backwards))
> {
>   struct dep_replacement *desc;
> @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>   desc->insn = mii->mem_insn;
>   move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
>  INSN_SPEC_BACK_DEPS (con));
> +
> + /* Make sure that n_inc_deps above is consistent with dependencies
> +we creat

Re: [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc

2024-01-15 Thread Maxim Kuvyrkov
Dear RTL maintainers,

Gently ping.  This patch adds a couple of new functions to lists.cc, which
then are used to simplify logic in the scheduler.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> This patch simplifies logic behind deps_join(), which will be
> important for the upcoming improvements of sched-deps.cc logging.
>
> The only functional change is that when deps_join() is called with
> empty state for the 2nd argument, it will not reverse INSN_ and
> EXPR_LISTs in the 1st argument.  Before this patch the lists were
> reversed due to use of concat_*_LIST().  Now, with copy_*_LIST()
> used for this case, the lists will remain in the original order.
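
The ordering difference can be seen with a toy singly linked list. The names below are hypothetical stand-ins for copy_*_LIST, concat_*_LIST and the new concat_insn_list/concat_expr_list helpers, not the rtl.h API. Prepending element by element (the concat approach) reverses the copied elements, while appending through a tail pointer (the copy approach) preserves their order.

```c
#include <stdlib.h>

/* Minimal stand-in for an INSN_LIST/EXPR_LIST node (hypothetical).  */
struct node
{
  int elem;
  struct node *next;
};

static struct node *
cons (int elem, struct node *next)
{
  struct node *n = malloc (sizeof *n);
  if (n == NULL)
    abort ();
  n->elem = elem;
  n->next = next;
  return n;
}

/* Like copy_*_LIST: append through a tail pointer, preserving order.  */
static struct node *
copy_list (const struct node *link)
{
  struct node *head = NULL;
  struct node **tail = &head;
  for (; link; link = link->next)
    {
      *tail = cons (link->elem, NULL);
      tail = &(*tail)->next;
    }
  return head;
}

/* Like concat_*_LIST: prepend each element of COPY to OLD, so the copied
   elements end up in reverse order in front of OLD.  */
static struct node *
concat_list (const struct node *copy, struct node *old)
{
  struct node *res = old;
  for (; copy; copy = copy->next)
    res = cons (copy->elem, res);
  return res;
}

/* Like the new concat_insn_list helper: plain copy when OLD is empty.  */
static struct node *
join_list (const struct node *copy, struct node *old)
{
  return old ? concat_list (copy, old) : copy_list (copy);
}
```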
>
> gcc/ChangeLog:
>
> * lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions.
> * rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare.
> * sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers.
> (concat_insn_mem_list): Simplify.
> (deps_join): Update.
> ---
>  gcc/lists.cc | 30 +++-
>  gcc/rtl.h|  4 +++-
>  gcc/sched-rgn.cc | 51 +++-
>  3 files changed, 61 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/lists.cc b/gcc/lists.cc
> index 2cdf37ad533..83e7bf32176 100644
> --- a/gcc/lists.cc
> +++ b/gcc/lists.cc
> @@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp)
>free_list ((rtx *)listp, &unused_insn_list);
>  }
>
> +/* Make a copy of the EXPR_LIST list LINK and return it.  */
> +rtx_expr_list *
> +copy_EXPR_LIST (rtx_expr_list *link)
> +{
> +  rtx_expr_list *new_queue;
> +  rtx_expr_list **pqueue = &new_queue;
> +
> +  for (; link; link = link->next ())
> +{
> +  rtx x = link->element ();
> +  rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x, NULL);
> +  *pqueue = newlink;
> +  pqueue = (rtx_expr_list **)&XEXP (newlink, 1);
> +}
> +  *pqueue = NULL;
> +  return new_queue;
> +}
> +
>  /* Make a copy of the INSN_LIST list LINK and return it.  */
>  rtx_insn_list *
>  copy_INSN_LIST (rtx_insn_list *link)
> @@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link)
>return new_queue;
>  }
>
> +/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD.  */
> +rtx_expr_list *
> +concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old)
> +{
> +  rtx_expr_list *new_rtx = old;
> +  for (; copy; copy = copy->next ())
> +new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (), new_rtx);
> +  return new_rtx;
> +}
> +
>  /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD.  */
>  rtx_insn_list *
>  concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old)
>  {
>rtx_insn_list *new_rtx = old;
> -  for (; copy ; copy = copy->next ())
> +  for (; copy; copy = copy->next ())
>  {
>new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx);
>PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy));
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e4b6cc0dbb5..7e952d7cbeb 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **);
>  extern void free_INSN_LIST_list (rtx_insn_list **);
>  extern void free_EXPR_LIST_node (rtx);
>  extern void free_INSN_LIST_node (rtx);
> +extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
>  extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx);
> +extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *);
>  extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *);
> +extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *);
>  extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *);
> -extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
>  extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **);
>  extern rtx remove_list_elem (rtx, rtx *);
>  extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **);
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..da3ec0458ff 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail)
>
>  static class deps_desc *bb_deps;
>
> +/* Return a new insn_list with all the elements from the two input lists.  */
> +static rtx_insn_list *
> +concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
> +{
> +  if (!old)
> +return copy_INSN_LIST (copy);
> +  return concat_INSN_LIST (copy, old);
> +}
> +
> +/* Return a new expr_list with all the elements from the two input lists.  */
> +static rtx_expr_list *
> +concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old)
> +{
> +  if (!old)
> +return copy_EXPR_LIST (copy);
> +  return concat_EXPR_LIST (copy, old);
> +}
> +
>  static void
>  concat_insn_mem_list (rtx_insn_list *copy_insns,
>   rtx_expr_list *copy_mems,
>   rtx_insn_list **old_insns_p,
>   rtx_expr_list **old_mems_p)
>  {
> -  rtx_insn_list *new_insns = 

Re: [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().

2024-01-15 Thread Maxim Kuvyrkov
Dear scheduler maintainers,

Gentle ping.  This patch is borderline trivial and affects only the lucky
few who debug sched-deps.cc code.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> Better propagate flags from dump_lists() into dump_dep() and
> add a missing "]" in dump_lists().
>
> gcc/ChangeLog:
>
> * sched-deps.cc (DUMP_DEP_PRO): Improve comment.
> (dump_dep_flags): Remove.
> (DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue
> numbering from DUMP_DEP_* flags.
> (dump_lists): Update and fix.
> ---
>  gcc/sched-deps.cc | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 005fc0f567e..4d357079a7a 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t);
>  /* Define flags for dump_dep ().  */
>
>  /* Dump producer of the dependence.  */
> -#define DUMP_DEP_PRO (2)
> +#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and
> +   DUMP_LISTS_ALL.  */
>
>  /* Dump consumer of the dependence.  */
>  #define DUMP_DEP_CON (4)
> @@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags)
>fprintf (dump, ">");
>  }
>
> -/* Default flags for dump_dep ().  */
> -static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON);
> -
>  /* Dump all fields of DEP to STDERR.  */
>  void
>  sd_debug_dep (dep_t dep)
> @@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it)
>  }
>
>  /* Dump size of the lists.  */
> -#define DUMP_LISTS_SIZE (2)
> +#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1)  */
>
>  /* Dump dependencies of the lists.  */
> -#define DUMP_LISTS_DEPS (4)
> +#define DUMP_LISTS_DEPS (64)
>
>  /* Dump all information about the lists.  */
>  #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS)
>
>  /* Dump deps_lists of INSN specified by TYPES to DUMP.
> -   FLAGS is a bit mask specifying what information about the lists needs
> -   to be printed.
> +   FLAGS is a bit mask specifying what information about the lists and
> +   the individual deps needs to be printed, this is a combination of
> +   DUMP_DEP_* and DUMP_LISTS_* flags.
> If FLAGS has the very first bit set, then dump all information about
> -   the lists and propagate this bit into the callee dump functions.  */
> +   the lists and deps, and propagate this bit into the callee dump functions.  */
>  static void
>  dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
>  {
> @@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
>  {
>FOR_EACH_DEP (insn, types, sd_it, dep)
> {
> - dump_dep (dump, dep, dump_dep_flags | all);
> + dump_dep (dump, dep, flags | all);
>   fprintf (dump, " ");
> }
>  }
> +
> +  fprintf (dump, "]");
>  }
>
>  /* Dump all information about deps_lists of INSN specified by TYPES
> --
> 2.34.1
>
>
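
dump_lists() now forwards its FLAGS to dump_dep(), so the DUMP_LISTS_* bits must not overlap the DUMP_DEP_* bits. Under the old values, DUMP_LISTS_SIZE (2) was the same bit as DUMP_DEP_PRO (2). Below is a small check of the bit layout, under these assumptions:

- DUMP_DEP_STATUS = 16 is inferred from the "(DUMP_DEP_STATUS << 1)" comment.
- DUMP_DEP_TYPE is omitted because its value is not shown.
- The OLD_/NEW_ macro names and both helper functions are hypothetical.

```c
/* Values shown in the patch; DUMP_DEP_STATUS is inferred from the
   "(DUMP_DEP_STATUS << 1)" comment next to DUMP_LISTS_SIZE.  */
#define DUMP_ALL_BIT        1   /* Reserved for the *_ALL handling.  */
#define DUMP_DEP_PRO        2
#define DUMP_DEP_CON        4
#define DUMP_DEP_STATUS     16
#define OLD_DUMP_LISTS_SIZE 2
#define OLD_DUMP_LISTS_DEPS 4
#define NEW_DUMP_LISTS_SIZE 32
#define NEW_DUMP_LISTS_DEPS 64

/* Hypothetical decoder: does FLAGS request the list sizes, given the bit
   value used for DUMP_LISTS_SIZE?  */
static int
wants_list_size (int flags, int lists_size_bit)
{
  return (flags & (DUMP_ALL_BIT | lists_size_bit)) != 0;
}

/* Nonzero if the DUMP_DEP_* bits and the given DUMP_LISTS_* bits are
   disjoint, so a single FLAGS word can carry both kinds of request.  */
static int
flags_disjoint_p (int lists_size_bit, int lists_deps_bit)
{
  int dep_bits = DUMP_DEP_PRO | DUMP_DEP_CON | DUMP_DEP_STATUS;
  return (dep_bits & (lists_size_bit | lists_deps_bit)) == 0;
}
```

With the old numbering, a request to dump only the producer would also be read as "dump the list sizes". With the new numbering the two requests are independent.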

-- 
Maxim Kuvyrkov
www.linaro.org


Re: [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis

2024-01-15 Thread Maxim Kuvyrkov
Dear scheduler maintainers,

Gentle ping.  This is a trivial patch, which makes debugging sched-deps.cc
slightly more enjoyable.

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> gcc/ChangeLog:
>
> * sched-deps.cc (sd_add_dep, find_inc): Add logging about
> dependency creation.
> ---
>  gcc/sched-deps.cc | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 4d357079a7a..2a87158ba4b 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p)
>   in the bitmap caches of dependency information.  */
>if (true_dependency_cache != NULL)
>  set_dependency_caches (dep);
> +
> +  if (sched_verbose >= 9)
> +{
> +  fprintf (sched_dump, "created dependency ");
> +  dump_dep (sched_dump, dep, 1);
> +  fprintf (sched_dump, "\n");
> +}
>  }
>
>  /* Add or update backward dependence between INSN and ELEM
> @@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>  we create.  */
>   gcc_assert (mii->inc_insn == inc_cand);
>
> + int n_deps_created = 0;
>   if (backwards)
> {
>   FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> -   add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
> - REG_DEP_TRUE);
> +   {
> + add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
> +   REG_DEP_TRUE);
> + ++n_deps_created;
> +   }
> }
>   else
> {
>   FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep)
> -   add_dependence_1 (DEP_CON (dep), mii->mem_insn,
> - REG_DEP_ANTI);
> +   {
> + add_dependence_1 (DEP_CON (dep), mii->mem_insn,
> +   REG_DEP_ANTI);
> + ++n_deps_created;
> +   }
> }
> + if (sched_verbose >= 6)
> +   fprintf (sched_dump,
> +"created %d deps for mem_insn %d due to "
> +"inc_insn %d %s deps\n",
> +n_deps_created, INSN_UID (mii->mem_insn),
> +INSN_UID (mii->inc_insn),
> +backwards ? "backward" : "forward");
> +
>   return true;
> }
>  next:
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org


Re: [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts

2024-01-15 Thread Maxim Kuvyrkov
Dear scheduler maintainers,

Gentle ping.  This is a trivial cleanup.

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> gcc/ChangeLog:
>
> * sched-deps.cc (init_deps, init_deps_reg_last): Simplify.
> (free_deps): Remove useless code.
> ---
>  gcc/sched-deps.cc | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 2a87158ba4b..e0d3c97d935 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
>int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num ());
>
>deps->max_reg = max_reg;
> -  if (lazy_reg_last)
> -deps->reg_last = NULL;
> -  else
> -deps->reg_last = XCNEWVEC (struct deps_reg, max_reg);
> +  deps->reg_last = NULL;
> +  if (!lazy_reg_last)
> +init_deps_reg_last (deps);
>INIT_REG_SET (&deps->reg_last_in_use);
>
>deps->pending_read_insns = 0;
> @@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
>  void
>  init_deps_reg_last (class deps_desc *deps)
>  {
> -  gcc_assert (deps && deps->max_reg > 0);
> -  gcc_assert (deps->reg_last == NULL);
> -
> +  gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL);
>deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg);
>  }
>
> @@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps)
>   it at all.  */
>free (deps->reg_last);
>deps->reg_last = NULL;
> -
> -  deps = NULL;
>  }
>
>  /* Remove INSN from dependence contexts DEPS.  */
> --
> 2.34.1
>
>
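
After this cleanup, init_deps always starts with reg_last == NULL and calls init_deps_reg_last() unless allocation is lazy. Below is a self-contained sketch of that shape. `struct deps_ctx` and the ctx_* functions are hypothetical stand-ins, not GCC's types. The comment in ctx_free explains why the removed `deps = NULL;` line had no effect.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct deps_reg and class deps_desc.  */
struct reg_info
{
  int uses_length;  /* Placeholder field; zero-initialized by calloc.  */
};

struct deps_ctx
{
  int max_reg;
  struct reg_info *reg_last;
};

/* Counterpart of init_deps_reg_last: allocate the per-register array,
   checking the same precondition the patch asserts.  */
static void
ctx_init_reg_last (struct deps_ctx *d)
{
  if (!(d && d->max_reg > 0 && d->reg_last == NULL))
    abort ();
  d->reg_last = calloc ((size_t) d->max_reg, sizeof *d->reg_last);
  if (d->reg_last == NULL)
    abort ();
}

/* Counterpart of init_deps after the patch: always start from NULL and
   allocate eagerly only when LAZY is false.  */
static void
ctx_init (struct deps_ctx *d, int max_reg, bool lazy)
{
  d->max_reg = max_reg;
  d->reg_last = NULL;
  if (!lazy)
    ctx_init_reg_last (d);
}

/* Counterpart of free_deps.  A trailing "d = NULL;" would only change this
   function's own copy of the pointer, so it would have no effect.  */
static void
ctx_free (struct deps_ctx *d)
{
  free (d->reg_last);
  d->reg_last = NULL;
}
```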

-- 
Maxim Kuvyrkov
www.linaro.org


Re: [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis

2024-01-15 Thread Maxim Kuvyrkov
Dear scheduler maintainers,

Gentle ping.  This patch improves debugging output, it does not touch
scheduling logic.

On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov 
wrote:

> Scheduler dependency analysis uses two main data structures:
> 1. reg_pending_* data contains effects of INSN on the register file,
>which is then incorporated into ...
> 2. deps_desc object, which contains cumulative information about all
>instructions processed since the deps_desc object's initialization.
>
> This patch adds debug dumping of (1).
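
Below is a sketch of the "dump only non-empty data" approach of dump_reg_pending_data(), with regsets modeled as 32-bit masks. It is a hypothetical illustration, not GCC's regset or dump API. `append_regset`, `append_text` and the exact output format are invented here; they only mirror the idea of emitting ` sets(...)`-style groups for non-empty sets. The sketch assumes `unsigned` is at least 32 bits wide.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append TEXT to the NUL-terminated string in BUF (capacity SIZE),
   truncating if it does not fit.  */
static void
append_text (char *buf, size_t size, const char *text)
{
  size_t len = strlen (buf);
  if (len < size)
    snprintf (buf + len, size - len, "%s", text);
}

/* Append " NAME(r0 r1 ...)" for the registers set in MASK, or nothing
   when MASK is empty, so empty sets do not clutter the dump.  */
static void
append_regset (char *buf, size_t size, const char *name, unsigned mask)
{
  char num[16];
  bool first = true;

  if (mask == 0)
    return;
  append_text (buf, size, " ");
  append_text (buf, size, name);
  append_text (buf, size, "(");
  for (unsigned r = 0; r < 32; r++)
    if (mask & (1u << r))
      {
        snprintf (num, sizeof num, "%s%u", first ? "" : " ", r);
        append_text (buf, size, num);
        first = false;
      }
  append_text (buf, size, ")");
}
```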
>
> gcc/ChangeLog:
>
> * sched-deps.cc (print-rtl.h): Include for str_pattern_slim().
> (dump_reg_pending_data): New function.
> (sched_analyze_insn): Use it.
> ---
>  gcc/sched-deps.cc | 90 ++-
>  1 file changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index e0d3c97d935..f9290c82fd2 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sched-int.h"
>  #include "cselib.h"
>  #include "function-abi.h"
> +#include "print-rtl.h"
>
>  #ifdef INSN_SCHEDULING
>
> @@ -432,10 +433,24 @@ dep_spec_p (dep_t dep)
>return false;
>  }
>
> +/* These regsets describe how a single instruction affects registers.
> +   Their "life-time" is restricted to a single call of sched_analyze_insn().
> +   They are populated by sched_analyze_1() and sched_analyze_2(), and
> +   then sched_analyze_insn() transfers data from these into deps->reg_last[i].
> +   Near the end sched_analyze_insn() clears these regsets for the next
> +   insn.  */
>  static regset reg_pending_sets;
>  static regset reg_pending_clobbers;
>  static regset reg_pending_uses;
>  static regset reg_pending_control_uses;
> +
> +/* Similar to reg_pending_* regsets, this variable specifies whether
> +   the current insn analyzed by sched_analyze_insn() is a scheduling
> +   barrier that should "split" dependencies inside a block.  Internally
> +   sched-deps.cc does this by pretending that the barrier insn uses and
> +   sets all registers.
> +   Near the end sched_analyze_insn() transfers barrier info from this variable
> +   into deps->last_reg_pending_barrier.  */
>  static enum reg_pending_barrier_mode reg_pending_barrier;
>
>  /* Hard registers implicitly clobbered or used (or may be implicitly
> @@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET *temp, rtx_insn *insn)
>*temp &= ~ira_no_alloc_regs;
>  }
>
> -/* Analyze an INSN with pattern X to find all dependencies.  */
> +/* Dump state of reg_pending_* data for debug purposes.
> +   Dump only non-empty data to reduce log clutter.  */
> +static void
> +dump_reg_pending_data (FILE *file, rtx_insn *insn)
> +{
> +  fprintf (file, "\n");
> +  fprintf (file, ";; sched_analysis after insn %d: %s\n",
> +  INSN_UID (insn), str_pattern_slim (PATTERN (insn)));
> +
> +  if (!REG_SET_EMPTY_P (reg_pending_sets)
> +  || !REG_SET_EMPTY_P (reg_pending_clobbers)
> +  || !REG_SET_EMPTY_P (reg_pending_uses)
> +  || !REG_SET_EMPTY_P (reg_pending_control_uses))
> +{
> +  fprintf (file, ";; insn reg");
> +  if (!REG_SET_EMPTY_P (reg_pending_sets))
> +   {
> + fprintf (file, " sets(");
> + dump_regset (reg_pending_sets, file);
> + fprintf (file, ")");
> +   }
> +  if (!REG_SET_EMPTY_P (reg_pending_clobbers))
> +   {
> + fprintf (file, " clobbers(");
> + dump_regset (reg_pending_clobbers, file);
> + fprintf (file, ")");
> +   }
> +  if (!REG_SET_EMPTY_P (reg_pending_uses))
> +   {
> + fprintf (file, " uses(");
> + dump_regset (reg_pending_uses, file);
> + fprintf (file, ")");
> +   }
> +  if (!REG_SET_EMPTY_P (reg_pending_control_uses))
> +   {
> + fprintf (file, " control(");
> + dump_regset (reg_pending_control_uses, file);
> + fprintf (file, ")");
> +   }
> +  fprintf (file, "\n");
> +}
> +
> +  if (reg_pending_barrier)
> +fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier);
> +
> +  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)
> +  || !hard_reg_set_empty_p (implicit_reg_pending_uses))
> +{
> +  fprintf (file, ";; insn reg");
> +  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers))
> +   {
> + print_hard_reg_set (file, implicit_reg_pending_clobbers,
> + " implicit clobbers(", false);
> + fprintf (file, ")");
> +   }
> +  if (!hard_reg_set_empty_p (implicit_reg_pending_uses))
> +   {
> + print_hard_reg_set (file, implicit_reg_pending_uses,
> + " implicit uses(", false);
> + fprintf (file, ")");
> +   }
> +  fprintf (file, "\n");
> +}
> +}
> +
> +/* Analyze an INSN with pattern X to find all dependencies.
> +   This analysis uses two m
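
A minimal sketch of the idea behind dump_reg_pending_data (print a register set only when it is non-empty, so the log stays short), written as standalone C rather than GCC code. Everything here is hypothetical: a 64-bit mask (toy_regset) stands in for a regset, and append_regset/format_insn_regs build the line into a buffer instead of calling dump_regset on a FILE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a regset: bit N set means register N is a member.  */
typedef uint64_t toy_regset;

/* Append S to the NUL-terminated string in BUF without overflowing CAP.  */
static void
append (char *buf, size_t cap, const char *s)
{
  size_t len = strlen (buf);
  if (len + 1 < cap)
    strncat (buf, s, cap - len - 1);
}

/* Append " LABEL(rA rB ...)" when SET is non-empty; print nothing otherwise.  */
static void
append_regset (char *buf, size_t cap, const char *label, toy_regset set)
{
  char item[16];
  const char *sep = "";

  if (set == 0)
    return;                     /* Empty set: keep the log short.  */
  append (buf, cap, " ");
  append (buf, cap, label);
  append (buf, cap, "(");
  for (int regno = 0; regno < 64; regno++)
    if (set & ((toy_regset) 1 << regno))
      {
        snprintf (item, sizeof item, "%sr%d", sep, regno);
        append (buf, cap, item);
        sep = " ";
      }
  append (buf, cap, ")");
}

/* Build one ";; insn reg ..." line, or leave BUF empty if all sets are empty.  */
static void
format_insn_regs (char *buf, size_t cap, toy_regset sets,
                  toy_regset clobbers, toy_regset uses)
{
  assert (cap > 0);
  buf[0] = '\0';
  if (sets == 0 && clobbers == 0 && uses == 0)
    return;
  append (buf, cap, ";; insn reg");
  append_regset (buf, cap, "sets", sets);
  append_regset (buf, cap, "clobbers", clobbers);
  append_regset (buf, cap, "uses", uses);
}
```

For sets {r1} and uses {r4, r5} this yields ";; insn reg sets(r1) uses(r4 r5)", and an insn that touches no registers yields no line at all.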

Re: [PATCH v3 8/8] Improve logging of scheduler dependency analysis context

2024-01-15 Thread Maxim Kuvyrkov
Dear scheduler maintainers,

Gentle ping.  This patch improves debugging output; it does not touch
scheduling logic.

On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov wrote:

> Scheduler dependency analysis uses two main data structures:
> 1. reg_pending_* data contains effects of INSN on the register file,
>    which is then incorporated into ...
> 2. deps_desc object, which contains cumulative information about all
>    instructions processed since the deps_desc object was initialized.
>
> This patch adds debug dumping of (2).
>
> Dependency analysis contexts (aka deps_desc objects) are huge, but
> each instruction affects only a small amount of data in these objects.
> Therefore, it is most useful to dump differential information
> compared to the dependency state after the previous instruction.
>
> gcc/ChangeLog:
>
> * sched-deps.cc (reset_deps, dump_rtx_insn_list)
> (rtx_insn_list_same_p): New helper functions.
> (dump_deps_desc_diff): New function to dump dependency information.
> (sched_analysis_prev_deps): New static variable.
> (sched_analyze_insn): Dump dependency information.
> (init_deps_global, finish_deps_global): Handle
> sched_analysis_prev_deps.
> * sched-int.h (struct deps_reg): Update comments.
> * sched-rgn.cc (concat_insn_list, concat_expr_list): Update
> comments.
> ---
>  gcc/sched-deps.cc | 197 ++
>  gcc/sched-int.h   |   9 ++-
>  gcc/sched-rgn.cc  |   5 ++
>  3 files changed, 210 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index f9290c82fd2..edca9927e23 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn)
>  sd_delete_dep (sd_it);
>  }
>
> +/* Re-initialize existing dependency context DEPS to be a copy of FROM.  */
> +static void
> +reset_deps (class deps_desc *deps, class deps_desc *from)
> +{
> +  free_deps (deps);
> +  init_deps (deps, false);
> +  deps_join (deps, from);
> +}
> +
>  /* All insns in a scheduling group except the first should only have
> dependencies on the previous insn in the group.  So we find the
> first instruction in the scheduling group by walking the dependence
> @@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn)
>  }
>  }
>
> +/* Dump rtx_insn_list LIST.
> +   Consider moving to lists.cc if there are users outside of sched-deps.cc.  */
> +static void
> +dump_rtx_insn_list (FILE *file, rtx_insn_list *list)
> +{
> +  for (; list; list = list->next ())
> +fprintf (file, " %d", INSN_UID (list->insn ()));
> +}
> +
> +/* Return TRUE if lists A and B have same elements in the same order.  */
> +static bool
> +rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b)
> +{
> +  for (; a && b; a = a->next (), b = b->next ())
> +if (a->insn () != b->insn ())
> +  return false;
> +
> +  if (a || b)
> +return false;
> +
> +  return true;
> +}
> +
> +/* Dump parts of DEPS that are different from PREV.
> +   Dumping all information from dependency context produces huge
> +   hard-to-analyze logs; differential dumping is way more manageable.  */
> +static void
> +dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc *prev)
> +{
> +  /* Each "paragraph" is a single line of output.  */
> +
> +  /* Note on param_max_pending_list_length:
> + During normal dependency analysis various lists should not exceed this
> + limit.  Searching for "!!!" in scheduler logs can point to potential bugs
> + or poorly-handled corner-cases.  */
> +
> +  if (!rtx_insn_list_same_p (deps->pending_read_insns,
> +prev->pending_read_insns))
> +{
> +  fprintf (file, ";; deps pending mem reads length(%d):",
> +  deps->pending_read_list_length);
> +  if ((deps->pending_read_list_length + deps->pending_write_list_length)
> + >= param_max_pending_list_length)
> +   fprintf (file, "%d insns!!!", deps->pending_read_list_length);
> +  else
> +   dump_rtx_insn_list (file, deps->pending_read_insns);
> +  fprintf (file, "\n");
> +}
> +
> +  if (!rtx_insn_list_same_p (deps->pending_write_insns,
> +prev->pending_write_insns))
> +{
> +  fprintf (file, ";; deps pending mem writes length(%d):",
> +  deps->pending_write_list_length);
> +  if ((deps->pending_read_list_length + deps->pending_write_list_length)
> + >= param_max_pending_list_length)
> +   fprintf (file, "%d insns!!!", deps->pending_write_list_length);
> +  else
> +   dump_rtx_insn_list (file, deps->pending_write_insns);
> +  fprintf (file, "\n");
> +}
> +
> +  if (!rtx_insn_list_same_p (deps->pending_jump_insns,
> +prev->pending_jump_insns))
> +{
> +  fprintf (file, ";; deps pending jump length(%d):",
> +  deps->pending_flush_length);
> +  
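
The differential dumping described in this patch boils down to comparing each list with a snapshot taken after the previous insn and printing it only when it changed. A minimal standalone sketch, assuming a hypothetical uid_list type in place of rtx_insn_list (elements are compared by UID here, while the patch compares the insns themselves):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked list of insn UIDs, standing in for rtx_insn_list.  */
struct uid_list
{
  int uid;
  const struct uid_list *next;
};

/* True if A and B hold the same UIDs in the same order
   (the same walk as rtx_insn_list_same_p in the patch).  */
static bool
uid_list_same_p (const struct uid_list *a, const struct uid_list *b)
{
  for (; a && b; a = a->next, b = b->next)
    if (a->uid != b->uid)
      return false;
  /* Equal only if both lists ended at the same time.  */
  return a == NULL && b == NULL;
}

/* Dump a field for the current insn only when it differs from the
   snapshot PREV taken after the previous insn.  */
static bool
should_dump_field (const struct uid_list *cur, const struct uid_list *prev)
{
  return !uid_list_same_p (cur, prev);
}
```

The patch also has to refresh the snapshot after each insn (reset_deps, which re-initializes a context as a copy of another, looks intended for that); this sketch leaves that bookkeeping out.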

[PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-15 Thread Richard Biener
When the x86 backend generates code for cpymem with the rep_8byte
strategy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

  src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

8: r134:DI=const(`g'+0x44)
9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
  REG_UNUSED flags:CC
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x4c)&0xfff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
  REG_DEAD r134:DI
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
  REG_DEAD r133:DI
  REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS, which means it picks const(`g') rather
than the correct frame:DI.  This way the pointer analysis performed
by find_base_term (but also by the unfixed find_base_value used by
init_alias_analysis to initialize REG_BASE_VALUE) isn't sound.
The following restricts the handling of multi-operand operations
to the case where we know only one operand can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check, results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present;
the cc1plus .text size is only 230 bytes worse.  With the
less drastic patch below most scheduling changes are gone.

x86_64 might not be the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

Bootstrapped and tested on x86_64-unknown-linux-gnu (together
with 2/2).  Jeff, can you maybe throw this on your tester?
Jakub, you did the PR64025 fix which was for a similar issue.

OK for trunk?

Thanks,
Richard.

PR rtl-optimization/113255
* alias.cc (find_base_term): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

* gcc.dg/torture/pr113255.c: New testcase.
---
 gcc/alias.cc| 28 +
 gcc/testsuite/gcc.dg/torture/pr113255.c | 27 
 2 files changed, 32 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113255.c

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 99008b0390d..bdc119822b4 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -2077,31 +2077,13 @@ find_base_term (rtx x, vec= 0)
+{
+  r++;
+  e[1].y++;
+}
+  g[1] = e[1];
+  return r;
+}
+
+int
+main ()
+{
+  test (1);
+  if (g[1].y != 1)
+__builtin_abort ();
+  return 0;
+}
-- 
2.35.3



[PATCH 2/2] find_base_value part

2024-01-15 Thread Richard Biener
The following adjusts find_base_value similarly to how
find_base_term was adjusted for PR113255.

* alias.cc (known_base_value_p): Remove.
(find_base_value): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.
---
 gcc/alias.cc | 62 
 1 file changed, 4 insertions(+), 58 deletions(-)

diff --git a/gcc/alias.cc b/gcc/alias.cc
index bdc119822b4..29b3ba82dba 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -1400,26 +1400,6 @@ unique_base_value_p (rtx x)
   return GET_CODE (x) == ADDRESS && GET_MODE (x) == Pmode;
 }
 
-/* Return true if X is known to be a base value.  */
-
-static bool
-known_base_value_p (rtx x)
-{
-  switch (GET_CODE (x))
-{
-case LABEL_REF:
-case SYMBOL_REF:
-  return true;
-
-case ADDRESS:
-  /* Arguments may or may not be bases; we don't know for sure.  */
-  return GET_MODE (x) != VOIDmode;
-
-default:
-  return false;
-}
-}
-
 /* Inside SRC, the source of a SET, find a base address.  */
 
 static rtx
@@ -1490,46 +1470,12 @@ find_base_value (rtx src)
 case PLUS:
 case MINUS:
   {
-   rtx temp, src_0 = XEXP (src, 0), src_1 = XEXP (src, 1);
-
-   /* If either operand is a REG that is a known pointer, then it
-  is the base.  */
-   if (REG_P (src_0) && REG_POINTER (src_0))
- return find_base_value (src_0);
-   if (REG_P (src_1) && REG_POINTER (src_1))
- return find_base_value (src_1);
-
-   /* If either operand is a REG, then see if we already have
-  a known value for it.  */
-   if (REG_P (src_0))
- {
-   temp = find_base_value (src_0);
-   if (temp != 0)
- src_0 = temp;
- }
-
-   if (REG_P (src_1))
- {
-   temp = find_base_value (src_1);
-   if (temp!= 0)
- src_1 = temp;
- }
-
-   /* If either base is named object or a special address
-  (like an argument or stack reference), then use it for the
-  base term.  */
-   if (src_0 != 0 && known_base_value_p (src_0))
- return src_0;
-
-   if (src_1 != 0 && known_base_value_p (src_1))
- return src_1;
+   rtx src_0 = XEXP (src, 0), src_1 = XEXP (src, 1);
 
-   /* Guess which operand is the base address:
-  If either operand is a symbol, then it is the base.  If
-  either operand is a CONST_INT, then the other is the base.  */
-   if (CONST_INT_P (src_1) || CONSTANT_P (src_0))
+   /* If either operand is a CONST_INT, then the other is the base.  */
+   if (CONST_INT_P (src_1))
  return find_base_value (src_0);
-   else if (CONST_INT_P (src_0) || CONSTANT_P (src_1))
+   else if (CONST_INT_P (src_0))
  return find_base_value (src_1);
 
return 0;
-- 
2.35.3
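
To make the soundness argument concrete, here is a toy model (not GCC's rtx, find_base_term or find_base_value; all names are hypothetical) contrasting a "take whichever operand has a base" guess with the restricted rule from these two patches, where PLUS/MINUS is only looked through when the other operand is a CONST_INT. The expression (dest - orig_dest) + src mimics the adjusted source pointer from PR113255; its operand order is chosen so the naive guess goes wrong, and the real pre-patch selection logic differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression nodes; not GCC's rtx.  */
enum toy_code { TOY_SYM, TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MINUS };

struct toy_expr
{
  enum toy_code code;
  const char *base;             /* SYM: its name; REG: known base or NULL.  */
  long value;                   /* CONST_INT only.  */
  const struct toy_expr *op0, *op1;
};

/* Naive guess: return the first operand's base that is known.  */
static const char *
naive_base (const struct toy_expr *x)
{
  switch (x->code)
    {
    case TOY_SYM:
    case TOY_REG:
      return x->base;
    case TOY_PLUS:
    case TOY_MINUS:
      {
        const char *b = naive_base (x->op0);
        return b ? b : naive_base (x->op1);
      }
    default:
      return NULL;
    }
}

/* Restricted rule, mirroring the patched find_base_value: if one operand
   is a CONST_INT, the other is the base; otherwise the base is unknown.  */
static const char *
sound_base (const struct toy_expr *x)
{
  switch (x->code)
    {
    case TOY_SYM:
    case TOY_REG:
      return x->base;
    case TOY_PLUS:
    case TOY_MINUS:
      if (x->op1->code == TOY_CONST_INT)
        return sound_base (x->op0);
      if (x->op0->code == TOY_CONST_INT)
        return sound_base (x->op1);
      return NULL;              /* Both sides could be pointers.  */
    default:
      return NULL;
    }
}
```

For (dest - orig_dest) + src, with dest based on `g' and src on the frame, the naive guess reports `g' while the restricted rule returns "unknown"; g + 8 still resolves to `g'.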


[PATCH] aarch64: Don't record hazards against paired insns [PR113356]

2024-01-15 Thread Alex Coplan
Hi,

For the testcase in the PR, we try to pair insns where the first has
writeback and the second uses the updated base register.  This causes us
to record a hazard against the second insn, thus narrowing the move
range away from the end of the BB.

However, it isn't meaningful to record hazards against the other insn
in the pair, as this doesn't change which pairs can be formed, and also
doesn't change where the pair is formed (from the perspective of
nondebug insns).

To see why this is the case, consider the two cases:

 - Suppose we are finding hazards for insns[0].  If we record a hazard
   against insns[1], then range.last becomes
   insns[1]->prev_nondebug_insn (), but note that this is equivalent to
   inserting after insns[1] (since insns[1] is being changed).
 - Now consider finding hazards for insns[1].  Suppose we record
   insns[0] as a hazard.  Then we set range.first = insns[0], which is a
   no-op.

As such, it seems better to never record hazards against the other insn
in the pair, as we check whether the insns themselves are suitable for
combination separately (e.g. for ldp checking that they use distinct
transfer registers).  Avoiding unnecessarily narrowing the move range
avoids unnecessarily re-ordering over debug insns.

This should also mean that we can only narrow the move range away from
the end of the BB in the case that we record a hazard for insns[0]
against insns[1]->prev_nondebug_insn () or earlier.  This means that for
the non-call-exceptions case, either the move range includes insns[1],
or we reject the pair (thus the assert tripped in the PR should always
hold).
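
A toy model of the comparison change, assuming nondebug insns can be numbered by position (all names hypothetical; this is not the ldp_bb_info code). insns[0] sits at I0, insns[1] at I1, and [first, last] is the range of positions the fused insn may be placed after. With the old <=/>= tests a hazard that is the other insn of the pair still narrows the range; with the strict tests it is ignored, while a real hazard in between still narrows it.

```c
#include <assert.h>
#include <stdbool.h>

enum { NO_HAZARD = -1 };        /* No hazard found.  */

struct move_range
{
  int first;                    /* Earliest position to insert after.  */
  int last;                     /* Latest position to insert after.  */
};

/* HAZ0: first hazard after insns[0]; HAZ1: latest hazard before insns[1].
   STRICT selects the patched comparisons.  Returns false if the range
   becomes empty, i.e. the pair cannot be formed.  */
static bool
compute_range (int i0, int i1, int haz0, int haz1, bool strict,
               struct move_range *r)
{
  assert (i0 < i1);
  r->first = i0;
  r->last = i1;

  /* insns[0] may not move past HAZ0.  */
  if (haz0 != NO_HAZARD && (strict ? haz0 < i1 : haz0 <= i1))
    r->last = haz0 - 1;
  /* insns[1] may not move above HAZ1.  */
  if (haz1 != NO_HAZARD && (strict ? haz1 > i0 : haz1 >= i0))
    r->first = haz1;

  return r->first <= r->last;
}
```

When the only hazard for insns[0] is insns[1] itself, the old rule moves last from I1 to I1 - 1. As explained above, the two placements are equivalent for nondebug insns because insns[1] is being rewritten, so the narrowing only causes needless reordering over debug insns.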

Bootstrapped/regtested on aarch64-linux-gnu with/without ldp passes
enabled on top of the PR113070 fixes, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/113356
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::try_fuse_pair):
Don't record hazards against the opposite insn in the pair.

gcc/testsuite/ChangeLog:

PR target/113356
* gcc.target/aarch64/pr113356.C: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 703cfb1228c..6834560c5fb 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -2216,11 +2216,11 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
  ignore[j] = &XEXP (cand_mems[j], 0);
 
   insn_info *h = first_hazard_after (insns[0], ignore[0]);
-  if (h && *h <= *insns[1])
+  if (h && *h < *insns[1])
cand.hazards[0] = h;
 
   h = latest_hazard_before (insns[1], ignore[1]);
-  if (h && *h >= *insns[0])
+  if (h && *h > *insns[0])
cand.hazards[1] = h;
 
   if (!cand.viable ())
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113356.C b/gcc/testsuite/gcc.target/aarch64/pr113356.C
new file mode 100644
index 000..0de17a54a53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113356.C
@@ -0,0 +1,8 @@
+// { dg-do compile }
+// { dg-options "-Os -fnon-call-exceptions -mearly-ldp-fusion -fno-lifetime-dse -fno-forward-propagate" }
+struct Class1 {
+  virtual ~Class1() {}
+  unsigned Field1;
+};
+struct Class4 : virtual Class1 {};
+int main() { Class4 var1; }


Re: HELP: Questions on unshare_expr

2024-01-15 Thread Qing Zhao


> On Jan 15, 2024, at 4:31 AM, Richard Biener wrote:
> 
> On Fri, Jan 12, 2024 at 6:30 PM Qing Zhao  wrote:
>> 
>> Thanks a lot for the reply.
>> 
>>> On Jan 12, 2024, at 11:28 AM, Richard Biener wrote:
>>> 
>>> 
>>> 
 On 12.01.2024 at 16:55, Qing Zhao wrote:
 
 Hi,
 
 I have some questions on using the utility routine “unshare_expr”:
 
 From my understanding, there should be NO shared nodes in a GENERIC function.
 Otherwise, gimplification might fail.
>>> 
>>> There is sharing and this is why we unshare everything before gimplification.
>> 
>> Okay, so, the “unsharing everything” is done automatically by the compiler before gimplification?
>> I don’t need to worry about this?
>> 
>> I see many places in the FE where “unshare_expr” is used, for example,
>> “ubsan_instrument_division”, “ubsan_instrument_shift”, etc.
> 
> It's likely doing sth during gimplification.

So, before gimplification, when inserting tree nodes, we don’t need to manually
add unshare_expr since gimplification will automatically unshare nodes.

However, during or after gimplification, when inserting nodes, we should manually
add unshare_expr when we put the same “tree” into multiple operands.

Is this understanding correct?

>> So, usually, when should “unshare_expr” be used?
> 
> You should usually unshare when you are putting the same 'tree' into multiple
> operands.  

Okay, I see.

> Using a SAVE_EXPR avoids redundant code but it also requires
> that the SAVE_EXPR uses are ordered.

“Require the SAVE_EXPR uses are ordered”, does this mean that 
SAVE_EXPRs for the same node should be in a correct order? Or something else?


> 
 Therefore, when we insert new tree nodes manually into the GENERIC function,
 we should make sure no shared nodes are introduced.
 
 1. Is the above understanding correct?
>>> 
>>> No
>>> 
 2. Is there any tool to check that there are no shared nodes in the GENERIC function?
 3. Are there any tree nodes that are allowed to be shared in a GENERIC function?
    If so, what are they?
>>> 
>>> There’s some allowed sharing on GIMPLE and a verifier.
>> What’s the name of the verifier that I can search and check?
> 
> verify_node_sharing

Okay, thanks. 

> 
>>> 
 4. For the following:
 
 If both “op1” and “op2” are existing tree nodes in the current GENERIC function,
 and we will insert a new tree node:
 
 tree  new_tree = build2 (CODE, TYPE, op1, op2)
 
 
 Should we add “unshare_expr” on both “op1” and “op2” as:
 
 tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
 ?
>>> 
>>> Not necessarily but instead you have to watch for evaluating side-effects 
>>> only once.  See save_expr.
>> 
>> Okay.  I see.
>>> 
 
 If op2 is a node that is allowed to be shared, could the additional
 “unshare_expr” on it trigger any potential problem?
>>> 
>>> If you unshare side-effects that’s generating wrong-code.  Otherwise 
>>> unsharing is safe.
>> 
>> Okay.
>> Will unnecessary unsharing produce redundant IR?
> 
> Yes.
> 
>> All my questions about unshare_expr relate to an LTO bug that I am currently
>> stuck on when using .ACCESS_WITH_SIZE in the bounds sanitizer (only with -flto;
>> without -flto there is no issue):
>> 
>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>> during IPA pass: modref
>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in LTO streams
>> 0x14c3993 lto_write_tree
>>../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>> 0x14c3aeb lto_output_tree_1
>> 
>> And the value of the tree node that triggered the ICE is:
>> (gdb) call debug_tree(expr)
>> 
>>nothrow
>>def_stmt
>>version:13 in-free-list>
>> 
>> Is there any good way to debug LTO bug?
> 
> This happens usually when you have a VLA type and its type fields are not
> properly gimplified which usually happens because the frontend fails to
> insert a gimplification point for it (a DECL_EXPR).
Thanks for the info. 
This is happening for a structure TYPE with a FAM (I guess similar to a VLA?).
What is usually a good solution for it?

thanks.

Qing
> 
>> Thanks a lot for the help.
>> 
>> Qing
>> 
>> 
>>> 
>>> Richard
>>> 
 Thanks a lot for your help.
 
 Qing
>> 



Re: HELP: Questions on unshare_expr

2024-01-15 Thread Jakub Jelinek
On Mon, Jan 15, 2024 at 02:54:26PM +, Qing Zhao wrote:
> So, before gimplification, when inserting tree nodes, we don’t need to manually
> add unshare_expr since gimplification will automatically unshare nodes.

There are cases where unshare_expr is needed even then, such as the uses in
the sanitizer, because code is then modifying suboperands in place later on
and if things are shared bad things happen.  If trees can be shared until
they are unshared before gimplification, one doesn't need to worry about it,
sure.
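
To illustrate why in-place modification of suboperands and sharing do not mix, here is a toy tree in plain C (not GCC's tree type; make and unshare are hypothetical helpers). unshare here is a full deep copy; GCC's unshare_expr deliberately leaves some nodes, such as decls and constants, shared.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression tree; not GCC's tree type.  */
struct node
{
  int value;                    /* Leaf payload.  */
  struct node *op[2];           /* Operands; NULL for leaves.  */
};

static struct node *
make (int value, struct node *a, struct node *b)
{
  struct node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->value = value;
  n->op[0] = a;
  n->op[1] = b;
  return n;
}

/* Deep copy in the spirit of unshare_expr: the result shares no nodes
   with X, so later in-place edits of one copy cannot leak into the other.  */
static struct node *
unshare (const struct node *x)
{
  if (x == NULL)
    return NULL;
  return make (x->value, unshare (x->op[0]), unshare (x->op[1]));
}
```

Rewriting the leaf through one parent is visible through the other parent when the leaf is shared, but not when the second parent received its own copy.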

> However, during or after gimplification, when inserting nodes, we should manually
> add unshare_expr when we put the same “tree” into multiple operands.

Yes.

> > Using a SAVE_EXPR avoids redundant code but it also requires
> > that the SAVE_EXPR uses are ordered.
> 
> “Require the SAVE_EXPR uses are ordered”, does this mean that 
> SAVE_EXPRs for the same node should be in a correct order? Or something else?

The basic requirement is that the SAVE_EXPR is evaluated somewhere in code
which dominates all other uses of the SAVE_EXPR.
Say
SAVE_EXPR <expr>, if (x) use1 (SAVE_EXPR <expr>); else use2 (SAVE_EXPR <expr>);
is fine, but
if (x) use1 (SAVE_EXPR <expr>); else use2 (SAVE_EXPR <expr>);
is not.
is not.  Because in the latter case, it will be gimplified into evaluating
the complex expression in the conditional code guarded on if (x != 0), save
into some temporary variable and then in the else code just use that
temporary variable, except it is uninitialized then.
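
The two shapes can be written out as plain C to show what the gimplifier effectively produces. This is a hand-written sketch, not gimplifier output; use1/use2 are hypothetical consumers and i + j stands in for the saved expression.

```c
#include <assert.h>

/* Hypothetical consumers standing in for use1/use2 above.  */
static int use1 (int v) { return v + 100; }
static int use2 (int v) { return v + 200; }

/* Correct placement: the saved expression is evaluated in code that
   dominates both uses, so each arm only reads TMP.  */
static int
lowered_ok (int x, int i, int j)
{
  int tmp = i + j;              /* SAVE_EXPR evaluated once, before the branch.  */
  if (x)
    return use1 (tmp);
  else
    return use2 (tmp);
}

/* If the first occurrence sits inside the then-arm, the evaluation is
   emitted there and the else-arm reads TMP uninitialized:

     int tmp;
     if (x) { tmp = i + j; return use1 (tmp); }
     else   return use2 (tmp);      // tmp never assigned on this path

   That shape is kept as a comment because running it would be
   undefined behaviour in C.  */
```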

Jakub



Re: [PATCH v4 0/3] RISC-V: Add intrinsics for Bitmanip and Scalar Crypto extensions

2024-01-15 Thread Christoph Müllner
On Mon, Jan 15, 2024 at 9:35 AM Liao Shihua  wrote:
>
> Update v3 -> v4:
>   1. Typo fix.
>   2. Only test *intrinsic-32 on rv32 and *intrinsic-64 on rv64.
>   3. Update Copyright year to 2024.

Thanks for fixing the rv32/rv64 issues!
I've tested this series: no regressions and all new tests pass.
I've also reviewed this series again, and I think it is ready.
I can push once a maintainer approves (e.g. Kito or Jeff).

Thanks for working on this!

>
> Update v2 -> v3:
>   1. Change pattern mode from X to GPR in orcb, clmul, and brev8.
>   2. Add emulated testsuite.
>   3. Removed duplicate testsuite between built-in and intrinsic.
>   4. Typo fix.
>
> Update v1 -> v2:
>   1. Rename *_intrinsic-* to *_intrinsic-XLEN.
>   2. Typo fix.
>   3. Intrinsics with immediate arguments will use macros at O0.
>
> It's a small patch that just provides a mapping from the RV intrinsics to the
> builtin names within GCC.
>
> Liao Shihua (3):
>   RISC-V: Remove the Scalar Bitmanip and Crypto Built-In function testsuites
>   RISC-V: Add C intrinsic for Scalar Crypto Extension
>   RISC-V: Add C intrinsic for Scalar Bitmanip Extension
>
>  gcc/config.gcc|   2 +-
>  gcc/config/riscv/bitmanip.md  |  10 +-
>  gcc/config/riscv/crypto.md|   4 +-
>  gcc/config/riscv/riscv-builtins.cc|  22 ++
>  gcc/config/riscv/riscv-cmo.def|  12 +-
>  gcc/config/riscv/riscv-ftypes.def |   2 +
>  gcc/config/riscv/riscv-scalar-crypto.def  |  22 +-
>  gcc/config/riscv/riscv_bitmanip.h | 297 +
>  gcc/config/riscv/riscv_crypto.h   | 309 ++
>  .../riscv/scalar_bitmanip_intrinsic-32.c  |  97 ++
>  .../scalar_bitmanip_intrinsic-64-emulated.c   |  33 ++
>  .../riscv/scalar_bitmanip_intrinsic-64.c  | 115 +++
>  .../riscv/scalar_crypto_intrinsic-32.c| 115 +++
>  .../riscv/scalar_crypto_intrinsic-64.c| 123 +++
>  .../gcc.target/riscv/zbb_32_bswap-1.c |  11 -
>  gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c  |  11 -
>  gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c  |  12 -
>  .../riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} |   3 +-
>  gcc/testsuite/gcc.target/riscv/zbbw.c |  26 --
>  gcc/testsuite/gcc.target/riscv/zbc32.c|  23 --
>  gcc/testsuite/gcc.target/riscv/zbc64.c|  23 --
>  gcc/testsuite/gcc.target/riscv/zbkb32.c   |  18 -
>  gcc/testsuite/gcc.target/riscv/zbkb64.c   |   5 -
>  gcc/testsuite/gcc.target/riscv/zbkc32.c   |  17 -
>  gcc/testsuite/gcc.target/riscv/zbkc64.c   |  17 -
>  gcc/testsuite/gcc.target/riscv/zbkx32.c   |  18 -
>  gcc/testsuite/gcc.target/riscv/zbkx64.c   |  18 -
>  gcc/testsuite/gcc.target/riscv/zknd32-2.c |  28 --
>  gcc/testsuite/gcc.target/riscv/zknd64-2.c |  42 ---
>  gcc/testsuite/gcc.target/riscv/zkne32-2.c |  28 --
>  gcc/testsuite/gcc.target/riscv/zkne64-2.c |  34 --
>  .../gcc.target/riscv/zknh-sha256-32.c |  10 -
>  .../gcc.target/riscv/zknh-sha256-64.c |  28 --
>  .../gcc.target/riscv/zknh-sha512-32.c |  42 ---
>  .../gcc.target/riscv/zknh-sha512-64.c |  31 --
>  gcc/testsuite/gcc.target/riscv/zksed32-2.c|  29 --
>  gcc/testsuite/gcc.target/riscv/zksed64-2.c|  29 --
>  gcc/testsuite/gcc.target/riscv/zksh32.c   |  19 --
>  gcc/testsuite/gcc.target/riscv/zksh64.c   |  19 --
>  39 files changed, 1149 insertions(+), 555 deletions(-)
>  create mode 100644 gcc/config/riscv/riscv_bitmanip.h
>  create mode 100644 gcc/config/riscv/riscv_crypto.h
>  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-64.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c
>  rename gcc/testsuite/gcc.target/riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} (59%)
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbbw.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc32.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc64.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc32.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc64.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx32.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx64.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zknd32-2.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zknd64-2.c
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/zkne32-2.c
>  delete mode 100644 gcc/testsuit

Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].

2024-01-15 Thread Robin Dapp
I gave it another shot now by introducing a separate function as
Richard suggested.  It's probably not at the location he intended.

The way I read the discussion, there hasn't been any consensus
on how (or rather where) to properly tackle the problem.  Any
other ideas still?

Regards
 Robin


Found in PR112971, this patch adds folding support for bitwise operations
of const duplicate zero/one vectors with stepped vectors.
On riscv we have the situation that folding would perpetually continue
without simplifying because, e.g., {0, 0, 0, ...} & {7, 6, 5, ...} would
not be folded to {0, 0, 0, ...}.

gcc/ChangeLog:

PR middle-end/112971

* fold-const.cc (simplify_const_binop): New function for binop
simplification of two constant vectors when element-wise
handling is not necessary.
(const_binop): Call new function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112971.c: New test.
---
 gcc/fold-const.cc | 31 +++
 .../gcc.target/riscv/rvv/autovec/pr112971.c   | 18 +++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 385e4a69ab3..2ef425aec0f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1343,6 +1343,29 @@ distributes_over_addition_p (tree_code op, int opno)
 }
 }
 
+/* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
+   is the other operand.  Try to use the value of OP to simplify the
+   operation in one step, without having to process individual elements.  */
+static tree
+simplify_const_binop (tree_code code, tree op, tree other_op,
+ int index ATTRIBUTE_UNUSED)
+{
+  /* AND, IOR as well as XOR with a zerop can be simplified directly.  */
+  if (TREE_CODE (op) == VECTOR_CST && TREE_CODE (other_op) == VECTOR_CST)
+{
+  if (integer_zerop (other_op))
+   {
+ if (code == BIT_IOR_EXPR || code == BIT_XOR_EXPR)
+   return op;
+ else if (code == BIT_AND_EXPR)
+   return other_op;
+   }
+}
+
+  return NULL_TREE;
+}
+
+
 /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
constant.  We assume ARG1 and ARG2 have the same data type, or at least
are the same kind of constant and the same machine mode.  Return zero if
@@ -1646,6 +1669,14 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
return build_complex (type, real, imag);
 }
 
+  tree simplified;
+  if ((simplified = simplify_const_binop (code, arg1, arg2, 0)))
+return simplified;
+
+  if (commutative_tree_code (code)
+  && (simplified = simplify_const_binop (code, arg2, arg1, 1)))
+return simplified;
+
   if (TREE_CODE (arg1) == VECTOR_CST
   && TREE_CODE (arg2) == VECTOR_CST
   && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)),
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
new file mode 100644
index 000..816ebd3c493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
@@ -0,0 +1,18 @@
+/* { dg-do compile }  */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
+
+int a;
+short b[9];
+char c, d;
+void e() {
+  d = 0;
+  for (;; d++) {
+if (b[d])
+  break;
+a = 8;
+for (; a >= 0; a--) {
+  char *f = &c;
+  *f &= d == (a & d);
+}
+  }
+}
-- 
2.43.0
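
A standalone sketch of why the zero case can be folded without visiting elements, assuming a toy encoding (hypothetical, not GCC's VECTOR_CST) where a stepped vector is a linear series BASE + k * STEP of unknown length and a duplicate is a series with STEP 0. X & 0, X | 0 and X ^ 0 give back either the zero vector or X unchanged, so the result is again a valid series.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stepped vector: element k is BASE + k * STEP, length unknown.  */
struct series
{
  long base;
  long step;
};

enum bitop { OP_AND, OP_IOR, OP_XOR };

static bool
is_zero_dup (struct series s)
{
  return s.base == 0 && s.step == 0;
}

/* Fold A OP B when one operand is the all-zeros duplicate:
   X & 0 = 0, X | 0 = X, X ^ 0 = X.  Returns false otherwise.  */
static bool
fold_with_zero (enum bitop op, struct series a, struct series b,
                struct series *out)
{
  if (is_zero_dup (a))
    {
      /* All three ops are commutative, so make B the zero operand.  */
      struct series t = a;
      a = b;
      b = t;
    }
  if (!is_zero_dup (b))
    return false;
  *out = (op == OP_AND) ? b : a;
  return true;
}
```

By contrast, {7, 6, 5, ...} & {3, 3, 3, ...} gives {3, 2, 1, 0, 3, ...}, which is no longer linear, so in this toy encoding no such shortcut exists for general constants.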




[pushed][PR113354][LRA]: Fixing LRA failure on building MIPS GCC

2024-01-15 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113354

The patch was tested on building MIPS target.

The patch was successfully tested and bootstrapped on x86-64, ppc64le, aarch64.


commit 5f662bce28618ea5417f68a17d5c2d34b052ecb2
Author: Vladimir N. Makarov 
Date:   Mon Jan 15 10:19:39 2024 -0500

[PR113354][LRA]: Fixing LRA failure on building MIPS GCC

My recent patch for PR112918 triggered a hidden bug in LRA on MIPS.  A
pseudo is matched to a register constraint and assigned to a hard
register at the first constraint sub-pass, but later it is matched to
the X constraint.  Keeping this pseudo in the register (MD0) prevents
using the same register for another pseudo in the insn, and this results
in LRA failure.  The patch fixes this by spilling the pseudo at the
constraint subpass when the chosen alternative's constraint no longer
requires a hard register.

gcc/ChangeLog:

PR middle-end/113354
* lra-constraints.cc (curr_insn_transform): Spill pseudo only used
in the insn if the corresponding operand does not require hard
register anymore.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index dc41bc3d6c6..3379b88ff22 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4491,23 +4491,18 @@ curr_insn_transform (bool check_only_p)
 	{
 	  if (goal_alt[i] == NO_REGS
 	  && REG_P (op)
-	  /* When we assign NO_REGS it means that we will not
-		 assign a hard register to the scratch pseudo by
-		 assigment pass and the scratch pseudo will be
-		 spilled.  Spilled scratch pseudos are transformed
-		 back to scratches at the LRA end.  */
-	  && ira_former_scratch_operand_p (curr_insn, i)
-	  && ira_former_scratch_p (REGNO (op)))
+	  && (regno = REGNO (op)) >= FIRST_PSEUDO_REGISTER
+	  /* We assigned a hard register to the pseudo in the past but now
+		 decided to spill it for the insn.  If the pseudo is used only
+		 in this insn, it is better to spill it here as we free hard
+		 registers for other pseudos referenced in the insn.  The most
+		 common case of this is a scratch register which will be
+		 transformed to scratch back at the end of LRA.  */
+	  && lra_get_regno_hard_regno (regno) >= 0
+	  && bitmap_single_bit_set_p (&lra_reg_info[regno].insn_bitmap))
 	{
-	  int regno = REGNO (op);
 	  lra_change_class (regno, NO_REGS, "  Change to", true);
-	  if (lra_get_regno_hard_regno (regno) >= 0)
-		/* We don't have to mark all insn affected by the
-		   spilled pseudo as there is only one such insn, the
-		   current one.  */
-		reg_renumber[regno] = -1;
-	  lra_assert (bitmap_single_bit_set_p
-			  (&lra_reg_info[REGNO (op)].insn_bitmap));
+	  reg_renumber[regno] = -1;
 	}
 	  /* We can do an optional reload.  If the pseudo got a hard
 	 reg, we might improve the code through inheritance.  If

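As a reading aid, the new condition in the hunk above can be modelled as a standalone predicate. This is a hypothetical C++ sketch, not LRA code: `PseudoInfo`, `kFirstPseudoRegister` and `should_spill_in_insn` are invented stand-ins for `lra_reg_info`, `FIRST_PSEUDO_REGISTER` and the inline test in `curr_insn_transform`, and the value 64 is an arbitrary placeholder.

```cpp
#include <cassert>  // for the usage checks
#include <set>

// Hypothetical stand-in; regnos below this value model hard registers.
constexpr int kFirstPseudoRegister = 64;

// Hypothetical stand-in for the per-pseudo LRA information.
struct PseudoInfo
{
  int hard_regno;           // assigned hard register, or -1 if spilled
  std::set<int> insn_uids;  // insns that reference the pseudo
};

// Mirrors the patched test: spill the operand's pseudo right here when the
// chosen alternative needs no register class (NO_REGS), the operand is a
// pseudo that currently holds a hard register, and this insn is its only user.
bool should_spill_in_insn (bool goal_is_no_regs, int regno,
                           const PseudoInfo &info)
{
  return goal_is_no_regs
         && regno >= kFirstPseudoRegister
         && info.hard_regno >= 0
         && info.insn_uids.size () == 1;
}
```

When the predicate holds, the patch changes the pseudo's class to NO_REGS and resets `reg_renumber[regno]` to -1, which frees its hard register for the other pseudos referenced by the same insn.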

Re: [PATCH v4 0/3] RISC-V: Add intrinsics for Bitmanip and Scalar Crypto extensions

2024-01-15 Thread Kito Cheng
Ok :)


On Mon, Jan 15, 2024 at 23:17, Christoph Müllner wrote:

> On Mon, Jan 15, 2024 at 9:35 AM Liao Shihua  wrote:
> >
> > Update v3 -> v4:
> >   1.Typo fix.
> >   2.Only test *intrinsic-32 on rv32 and *intrinsic-64 on rv64.
> >   3.Update Copyright year to 2024.
>
> Thanks for fixing the rv32/rv64 issues!
> I've tested this series: no regressions and all new tests pass.
> I've also reviewed this series again, and I think it is ready.
> I can push once a maintainer approves (e.g. Kito or Jeff).
>
> Thanks for working on this!
>
> >
> > Update v2 -> v3:
> >   1. Change pattern mode from X to GPR in orcb, clmul, and brev8.
> >   2. Add emulated testsuite.
> >   3. Removed duplicate testsuite between built-in and intrinsic.
> >   4. Typo fix.
> >
> > Update v1 -> v2:
> >   1. Rename *_intrinsic-* to *_intrinsic-XLEN.
> >   2. Typo fix.
> >   3. Intrinsics with immediate arguments will use macros at -O0.
> >
> > It's a small patch that just provides a mapping from the RV intrinsics
> > to the builtin names within GCC.
> >
> > Liao Shihua (3):
> >   RISC-V: Remove the Scalar Bitmanip and Crypto Built-In function
> > testsuites
> >   RISC-V: Add C intrinsic for Scalar Crypto Extension
> >   RISC-V: Add C intrinsic for Scalar Bitmanip Extension
> >
> >  gcc/config.gcc|   2 +-
> >  gcc/config/riscv/bitmanip.md  |  10 +-
> >  gcc/config/riscv/crypto.md|   4 +-
> >  gcc/config/riscv/riscv-builtins.cc|  22 ++
> >  gcc/config/riscv/riscv-cmo.def|  12 +-
> >  gcc/config/riscv/riscv-ftypes.def |   2 +
> >  gcc/config/riscv/riscv-scalar-crypto.def  |  22 +-
> >  gcc/config/riscv/riscv_bitmanip.h | 297 +
> >  gcc/config/riscv/riscv_crypto.h   | 309 ++
> >  .../riscv/scalar_bitmanip_intrinsic-32.c  |  97 ++
> >  .../scalar_bitmanip_intrinsic-64-emulated.c   |  33 ++
> >  .../riscv/scalar_bitmanip_intrinsic-64.c  | 115 +++
> >  .../riscv/scalar_crypto_intrinsic-32.c| 115 +++
> >  .../riscv/scalar_crypto_intrinsic-64.c| 123 +++
> >  .../gcc.target/riscv/zbb_32_bswap-1.c |  11 -
> >  gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c  |  11 -
> >  gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c  |  12 -
> >  .../riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} |   3 +-
> >  gcc/testsuite/gcc.target/riscv/zbbw.c |  26 --
> >  gcc/testsuite/gcc.target/riscv/zbc32.c|  23 --
> >  gcc/testsuite/gcc.target/riscv/zbc64.c|  23 --
> >  gcc/testsuite/gcc.target/riscv/zbkb32.c   |  18 -
> >  gcc/testsuite/gcc.target/riscv/zbkb64.c   |   5 -
> >  gcc/testsuite/gcc.target/riscv/zbkc32.c   |  17 -
> >  gcc/testsuite/gcc.target/riscv/zbkc64.c   |  17 -
> >  gcc/testsuite/gcc.target/riscv/zbkx32.c   |  18 -
> >  gcc/testsuite/gcc.target/riscv/zbkx64.c   |  18 -
> >  gcc/testsuite/gcc.target/riscv/zknd32-2.c |  28 --
> >  gcc/testsuite/gcc.target/riscv/zknd64-2.c |  42 ---
> >  gcc/testsuite/gcc.target/riscv/zkne32-2.c |  28 --
> >  gcc/testsuite/gcc.target/riscv/zkne64-2.c |  34 --
> >  .../gcc.target/riscv/zknh-sha256-32.c |  10 -
> >  .../gcc.target/riscv/zknh-sha256-64.c |  28 --
> >  .../gcc.target/riscv/zknh-sha512-32.c |  42 ---
> >  .../gcc.target/riscv/zknh-sha512-64.c |  31 --
> >  gcc/testsuite/gcc.target/riscv/zksed32-2.c|  29 --
> >  gcc/testsuite/gcc.target/riscv/zksed64-2.c|  29 --
> >  gcc/testsuite/gcc.target/riscv/zksh32.c   |  19 --
> >  gcc/testsuite/gcc.target/riscv/zksh64.c   |  19 --
> >  39 files changed, 1149 insertions(+), 555 deletions(-)
> >  create mode 100644 gcc/config/riscv/riscv_bitmanip.h
> >  create mode 100644 gcc/config/riscv/riscv_crypto.h
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-32.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_bitmanip_intrinsic-64.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-32.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_crypto_intrinsic-64.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_32_bswap-1.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-1.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap-2.c
> >  rename gcc/testsuite/gcc.target/riscv/{zbb_32_bswap-2.c => zbb_bswap16.c} (59%)
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbbw.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc32.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbc64.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc32.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkc64.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx32.c
> >  delete mode 100644 gcc/testsuite/gcc.target/riscv/zbkx6
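The v3 notes mention the orc.b, clmul and brev8 patterns. For anyone cross-checking what the corresponding intrinsics should return, here is a portable C++ reference model of the 32-bit semantics as described in the RISC-V Bitmanip and Scalar Crypto specifications. It is only a sketch for host-side comparison; the `ref_*` function names are hypothetical and are not the names declared in riscv_bitmanip.h or riscv_crypto.h, and the RV64 variants (which operate on 64 bits) are not modelled.

```cpp
#include <cassert>  // for the usage checks
#include <cstdint>

// orc.b: each result byte is 0xFF if the source byte is non-zero, else 0x00.
std::uint32_t ref_orc_b_32 (std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int byte = 0; byte < 4; ++byte)
    if ((x >> (8 * byte)) & 0xFFu)
      r |= 0xFFu << (8 * byte);
  return r;
}

// brev8: reverse the order of the bits within each byte.
std::uint32_t ref_brev8_32 (std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int bit = 0; bit < 32; ++bit)
    if ((x >> bit) & 1u)
      {
        int byte = bit / 8, pos = bit % 8;
        r |= 1u << (8 * byte + (7 - pos));
      }
  return r;
}

// clmul: low 32 bits of the carry-less (XOR-accumulated) product of a and b.
std::uint32_t ref_clmul_32 (std::uint32_t a, std::uint32_t b)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      r ^= a << i;
  return r;
}
```

For example, the carry-less product of 3 and 3 is 3 XOR 6 = 5, whereas ordinary multiplication gives 9.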

Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-15 Thread Alex Coplan
On 13/01/2024 15:46, Alex Coplan wrote:
> As the PR shows (specifically #c7), we fail to update uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass.  This
> patch fixes that.
> 
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe.  If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
> 
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes.  This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour).  If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient.  The structure of the
> code would also be a little unwieldy.
> 
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order.  That state machine is
> responsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
> 
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
> 
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
> 
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/113070
>   * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
>   (decide_stp_strategy): Rename to ...
>   (try_repurpose_store): ... this.
>   (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
>   construct stp changes.  Fix up uses when inserting new stp insns.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
>  1 file changed, 194 insertions(+), 54 deletions(-)
> 

> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
>    return range;
>  }
>  
> +// Class that implements a state machine for building the changes needed to form
> +// a store pair instruction.  This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> +  enum class state
> +  {
> +    FIRST,
> +    INSERT,
> +    FIXUP_USE,
> +    LAST,
> +    DONE
> +  };
> +
> +  enum class action
> +  {
> +    TOMBSTONE,
> +    CHANGE,
> +    INSERT,
> +    FIXUP_USE
> +  };
> +
> +  struct change
> +  {
> +    action type;
> +    insn_info *insn;
> +  };
> +
> +  bool done () const { return m_state == state::DONE; }
> +
> +  stp_change_builder (insn_info *insns[2],
> +                      insn_info *repurpose,
> +                      insn_info *dest)
> +    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> +      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> +
> +  change get_change () const
> +  {
> +    switch (m_state)
> +      {
> +      case state::FIRST:
> +        return {
> +          m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +          m_insns[0]
> +        };
> +      case state::LAST:
> +        return {
> +          m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +          m_insns[1]
> +        };
> +      case state::INSERT:
> +        return { action::INSERT, m_dest };
> +      case state::FIXUP_USE:
> +        return { action::FIXUP_USE, m_use->insn () };
> +      case state::DONE:
> +        break;
> +      }
> +
> +    gcc_unreachable ();
> +  }
> +
> +  // Transition to the next state.
> +  void advance ()
> +  {
> +    switch (m_state)
> +      {
> +      case state::FIRST:
> +        if (m_repurpose)
> +          m_state = state::LAST;
> +        else
> +          m_state = state::INSERT;
> +        break;
> +      case state::INSERT:
> +      {
> +        def_info *def = memory_access (m_insns[0]->defs ());
> +        while (*def->next_def ()->insn () <= *m_dest)
> +          def = def->next_def (

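To make the order of the generated changes concrete, here is a simplified, self-contained C++ model of the walk that stp_change_builder performs. It is a sketch, not the patch's code: insns are plain ints, -1 means "no insn is repurposed", the uses needing a fix-up are passed in up front (the real builder finds them from the memory defs), and because the quoted advance () is cut off, the transitions out of INSERT and FIXUP_USE are an assumption based on the patch description. `plan_stp_changes` is a hypothetical name.

```cpp
#include <cassert>  // for the usage checks
#include <cstddef>
#include <vector>

enum class action { TOMBSTONE, CHANGE, INSERT, FIXUP_USE };

struct change { action type; int insn; };

// Emit the changes in program order: first candidate store, then (if no
// store is repurposed) the new stp followed by fix-ups of later mem uses,
// then the second candidate store.
std::vector<change> plan_stp_changes (int first, int second, int repurpose,
                                      int dest, const std::vector<int> &uses)
{
  enum class state { FIRST, INSERT, FIXUP_USE, LAST, DONE };
  std::vector<change> out;
  std::size_t next_use = 0;
  state s = state::FIRST;
  while (s != state::DONE)
    switch (s)
      {
      case state::FIRST:
        out.push_back ({first == repurpose ? action::CHANGE : action::TOMBSTONE,
                        first});
        s = repurpose >= 0 ? state::LAST : state::INSERT;
        break;
      case state::INSERT:
        out.push_back ({action::INSERT, dest});
        s = uses.empty () ? state::LAST : state::FIXUP_USE;
        break;
      case state::FIXUP_USE:
        out.push_back ({action::FIXUP_USE, uses[next_use++]});
        if (next_use == uses.size ())
          s = state::LAST;
        break;
      case state::LAST:
        out.push_back ({second == repurpose ? action::CHANGE : action::TOMBSTONE,
                        second});
        s = state::DONE;
        break;
      case state::DONE:
        break;
      }
  return out;
}
```

Building the vector front to back like this avoids the repeated element moves that inserting into the middle of the change vector would cause, which is the motivation given in the patch description.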
Re: [PATCH v4 0/3] RISC-V: Add intrinsics for Bitmanip and Scalar Crypto extensions

2024-01-15 Thread Christoph Müllner
On Mon, Jan 15, 2024 at 4:35 PM Kito Cheng  wrote:
>
> Ok :)

I've re-created the ChangeLog entries in the commit messages (the commit
hook rejected the commits) and pushed.

Thanks,
Christoph



Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-15 Thread Radek Barton
Hello Richard.

Thank you for your suggestion. I am sending an updated patch accordingly.

> How about avoiding the clash by using the names HIDDEN, SYMBOL_TYPE and
> SYMBOL_SIZE, with SYMBOL_TYPE taking the symbol type as argument?

> What's the practical effect of not marking the symbols as hidden on
> mingw32?  Will they still be local to the DLL/EXE, since they haven't
> been explicitly exported?  (Sorry for the probably dumb question.)

Yes, unless the symbol is explicitly exported using `__declspec(dllexport)`, it
will be effectively hidden.

Best regards,

Radek Bartoň

v4-0001-Ifdef-.hidden-.type-and-.size-pseudo-ops-for-aarc.patch
Description: v4-0001-Ifdef-.hidden-.type-and-.size-pseudo-ops-for-aarc.patch


[patch,avr,applied] Document -mskip-bug

2024-01-15 Thread Georg-Johann Lay

Option -mskip-bug is no longer missing from the documentation.

Johann

--

AVR: Document option -mskip-bug.

gcc/
* doc/invoke.texi (AVR Options) [-mskip-bug]: Add documentation.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1773f0d3f0c..01170c0ce5c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -883,12 +883,12 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args
 -mbranch-cost=@var{cost}
--mcall-prologues  -mgas-isr-prologues  -mint8 -mflmap
--mdouble=@var{bits} -mlong-double=@var{bits}
+-mcall-prologues  -mgas-isr-prologues  -mint8  -mflmap
+-mdouble=@var{bits}  -mlong-double=@var{bits}
 -mn_flash=@var{size}  -mno-interrupts
 -mmain-is-OS_task  -mrelax  -mrmw  -mstrict-X  -mtiny-stack
--mrodata-in-ram -mfract-convert-truncate
--mshort-calls  -nodevicelib  -nodevicespecs
+-mrodata-in-ram  -mfract-convert-truncate
+-mshort-calls  -mskip-bug  -nodevicelib  -nodevicespecs
 -Waddr-space-convert  -Wmisspelled-isr}

 @emph{Blackfin Options}
@@ -24213,6 +24213,12 @@ Assume that @code{RJMP} and @code{RCALL} can target the whole

 program memory. This option is used for multilib generation and selection
 for the devices from architecture @code{avrxmega3}.

+@opindex mskip-bug
+@item -mskip-bug
+
+Generate code without skips (@code{CPSE}, @code{SBRS},
+@code{SBRC}, @code{SBIS}, @code{SBIC}) over 32-bit instructions.
+
 @opindex msp8
 @item -msp8
 Treat the stack pointer register as an 8-bit register,


Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-15 Thread Patrick Palka
On Sat, 13 Jan 2024, Jonathan Wakely wrote:

> On Sat, 13 Jan 2024 at 00:06, Patrick Palka  wrote:
> >
> > On Fri, 12 Jan 2024, Jonathan Wakely wrote:
> >
> > > On Fri, 12 Jan 2024 at 18:33, Patrick Palka  wrote:
> > > >
> > > > On Fri, 12 Jan 2024, Jonathan Wakely wrote:
> > > >
> > > > > On Fri, 12 Jan 2024 at 17:55, Patrick Palka  wrote:
> > > > > >
> > > > > > On Thu, 11 Jan 2024, Jonathan Wakely wrote:
> > > > > >
> > > > > > > I'd like to commit this to trunk for GCC 14. Please take a look.
> > > > > > >
> > > > > > > -- >8 --
> > > > > > >
> > > > > > > This is the last part of PR libstdc++/108822 implementing P2255R2, which
> > > > > > > makes it ill-formed to create a std::tuple that would bind a reference
> > > > > > > to a temporary.
> > > > > > >
> > > > > > > The dangling checks are implemented as deleted constructors for C++20
> > > > > > > and higher, and as Debug Mode static assertions in the constructor body
> > > > > > > for older standards. This is similar to the r13-6084-g916ce577ad109b
> > > > > > > changes for std::pair.
> > > > > > >
> > > > > > > As part of this change, I've reimplemented most of std::tuple for C++20,
> > > > > > > making use of concepts to replace the enable_if constraints, and using
> > > > > > > conditional explicit to avoid duplicating most constructors. We could
> > > > > > > use conditional explicit for the C++11 implementation too (with pragmas
> > > > > > > to disable the -Wc++17-extensions warnings), but that should be done as
> > > > > > > a stage 1 change for GCC 15 rather than now.
> > > > > > >
> > > > > > > The partial specialization for std::tuple is no longer used for
> > > > > > > C++20 (or more precisely, for a C++20 compiler that supports concepts
> > > > > > > and conditional explicit). The additional constructors and assignment
> > > > > > > operators that take std::pair arguments have been added to the C++20
> > > > > > > implementation of the primary template, with sizeof...(_Elements)==2
> > > > > > > constraints. This avoids reimplementing all the other constructors in
> > > > > > > the std::tuple partial specialization to use concepts. This way
> > > > > > > we avoid four implementations of every constructor and only have three!
> > > > > > > (The primary template has an implementation of each constructor for
> > > > > > > C++11 and another for C++20, and the tuple specialization has an
> > > > > > > implementation of each for C++11, so that's three for each constructor.)
> > > > > > >
> > > > > > > In order to make the constraints more efficient on the C++20 version of
> > > > > > > the default constructor I've also added a variable template for the
> > > > > > > __is_implicitly_default_constructible trait, implemented using concepts.
> > > > > > >
> > > > > > > libstdc++-v3/ChangeLog:
> > > > > > >
> > > > > > >   PR libstdc++/108822
> > > > > > >   * include/std/tuple (tuple): Add checks for dangling references.
> > > > > > >   Reimplement constraints and constant expressions using C++20
> > > > > > >   features.
> > > > > > >   * include/std/type_traits [C++20]
> > > > > > >   (__is_implicitly_default_constructible_v): Define.
> > > > > > >   (__is_implicitly_default_constructible): Use variable template.
> > > > > > >   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> > > > > > > ---
> > > > > > >  libstdc++-v3/include/std/tuple| 1021 -
> > > > > > >  libstdc++-v3/include/std/type_traits  |   11 +
> > > > > > >  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
> > > > > > >  3 files changed, 841 insertions(+), 296 deletions(-)
> > > > > > >  create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> > > > > > >
> > > > > > > diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
> > > > > > > index 50e11843757..cd05b638923 100644
> > > > > > > --- a/libstdc++-v3/include/std/tuple
> > > > > > > +++ b/libstdc++-v3/include/std/tuple
> > > > > > > @@ -752,11 +752,467 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > > > >template
> > > > > > >  class tuple : public _Tuple_impl<0, _Elements...>
> > > > > > >  {
> > > > > > > -  typedef _Tuple_impl<0, _Elements...> _Inherited;
> > > > > > > +  using _Inherited = _Tuple_impl<0, _Elements...>;
> > > > > > >
> > > > > > >template
> > > > > > >   using _TCC = _TupleConstraints<_Cond, _Elements...>;
> > > > > >
> > > > > > I guess this should be moved into the #else branch if it's not used 
> > > > > > in
> > > > > > the ne

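The idea behind the dangling checks can be shown on a much smaller type. The sketch below is a hypothetical C++17 illustration (`StrRef` and the two constants are invented names, and it uses enable_if rather than the concepts the C++20 tuple implementation uses): a type holding `const std::string&` gets a deleted constructor template for any argument that would first have to be converted into a temporary `std::string`. P2255R2 itself specifies the check in terms of the `reference_constructs_from_temporary` trait it introduces; this hand-written constraint is only an approximation of it for one reference type.

```cpp
#include <cassert>  // for the usage checks
#include <string>
#include <type_traits>

struct StrRef
{
  const std::string &ref;

  explicit StrRef (const std::string &s) : ref (s) {}

  // An argument that is not a std::string (or a class derived from it) but
  // converts to one would bind 'ref' to a temporary that dies at the end of
  // the full-expression, so such constructions are made ill-formed.
  template<typename U,
           typename = std::enable_if_t<
             !std::is_base_of<std::string,
                              std::remove_cv_t<std::remove_reference_t<U>>>::value
             && std::is_convertible<U, std::string>::value>>
  StrRef (U &&) = delete;
};

constexpr bool from_lvalue_string
  = std::is_constructible<StrRef, std::string &>::value;
constexpr bool from_string_literal
  = std::is_constructible<StrRef, const char *>::value;
```

Here `from_lvalue_string` is true and `from_string_literal` is false: for a `const char *` argument the deleted template is an exact match and beats the user-defined conversion needed by the non-template constructor.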
[committed] testsuite: Add testcase for already fixed PR [PR113048]

2024-01-15 Thread Jakub Jelinek
Hi!

The ICE on this testcase was fixed by r14-7141.

Tested on x86_64-linux -m32/-m64 with current trunk as well as older
trunk which still ICEd, committed to trunk as obvious.

2024-01-15  Jakub Jelinek  

PR rtl-optimization/113048
* gcc.target/i386/pr113048.c: New test.

--- gcc/testsuite/gcc.target/i386/pr113048.c.jj	2024-01-15 17:25:50.186423929 +0100
+++ gcc/testsuite/gcc.target/i386/pr113048.c	2024-01-15 17:25:41.643539577 +0100
@@ -0,0 +1,26 @@
+/* PR rtl-optimization/113048 */
+/* { dg-do compile } */
+/* { dg-options "-O -march=cascadelake -fwrapv" } */
+
+signed char a, b, c;
+int d;
+const char *e, *q;
+short f;
+int g;
+
+void
+foo (int x, long long y, long long z)
+{
+  unsigned char h = x;
+  int i = __builtin_strncmp (q, e, 2);
+  h /= g;
+  unsigned long long j = (~z & (0xfb5856dd8a4d4702ULL & f) / 0) * h;   /* { dg-warning "division by zero" } */
+  b += __builtin_add_overflow_p (d, c, 0);
+  signed char k = y;
+  long l = -k & sizeof (0);
+  long long m = y + j + z + h + 3 + l;
+  int n = m + i;
+  short o = n + f;
+  signed char p = o + h + k;
+  a = p;
+}

Jakub



Re: HELP: Questions on unshare_expr

2024-01-15 Thread Qing Zhao


> On Jan 15, 2024, at 10:06 AM, Jakub Jelinek  wrote:
> 
> On Mon, Jan 15, 2024 at 02:54:26PM +, Qing Zhao wrote:
>> So, before gimplification, when inserting tree nodes, we don’t need to manually
>> add unshare_expr since gimplification will automatically unshare nodes.
> 
> There are cases where unshare_expr is needed even then, such as the uses in
> the sanitizer, because code is then modifying suboperands in place later on
> and if things are shared bad things happen.

For my case, it’s in the bounds sanitizer, and the instrumentation happens
during “c_genericize”, which seems to be before gimplification.

So, when adding instrumentation for the bounds sanitizer, do we still need to
manually unshare expressions even though it’s before gimplification?


> If trees can be shared until
> they are unshared before gimplification, one doesn't need to worry about it,
> sure.
> 
>> However, during or after gimplification, when inserting nodes, we should manually
>> add unshare_expr when we put the same “tree” into multiple operands.
> 
> Yes.
> 
>>> Using a SAVE_EXPR avoids redundant code but it also requires
>>> that the SAVE_EXPR uses are ordered.
>> 
>> “Requires that the SAVE_EXPR uses are ordered”: does this mean that
>> SAVE_EXPRs for the same node should be in a correct order, or something else?
> 
> The basic requirement is that SAVE_EXPR is evaluated somewhere in a code
> which dominates all other uses of the SAVE_EXPR.
> Say
> SAVE_EXPR , if (x) use1 (SAVE_EXPR ); else use2 (SAVE_EXPR );
> is fine, but
> if (x) use1 (SAVE_EXPR ); else use2 (SAVE_EXPR );
> is not.  Because in the latter case, it will be gimplified into evaluating
> the complex expression in the conditional code guarded on if (x != 0), save
> into some temporary variable and then in the else code just use that
> temporary variable, except it is uninitialized then.

Okay, I see.

Is there a utility to check for violations of this order, or do I have to
check the order manually?

Thanks a lot for the help.

Qing
> 
>   Jakub
> 
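Jakub's point about code that later modifies suboperands in place can be illustrated outside GCC. In the toy C++ model below, `Node`, `leaf`, `add`, `unshare` and `eval` are hypothetical names, and `std::shared_ptr` merely stands in for GCC's tree pointers. It shows that when one subexpression is reachable from two parents, rewriting it through one parent silently changes the other, while a deep copy taken beforehand keeps the two uses independent. It illustrates the hazard only; it is not how unshare_expr is implemented.

```cpp
#include <cassert>  // for the usage checks
#include <memory>

// Toy expression node: a leaf value ('v') or an addition ('+').
struct Node
{
  char op;
  int value;                        // used when op == 'v'
  std::shared_ptr<Node> lhs, rhs;   // used when op == '+'
};

using NodePtr = std::shared_ptr<Node>;

NodePtr leaf (int v) { return std::make_shared<Node> (Node{'v', v, nullptr, nullptr}); }

NodePtr add (NodePtr a, NodePtr b) { return std::make_shared<Node> (Node{'+', 0, a, b}); }

// Deep copy in the spirit of unshare_expr: the result shares no nodes
// with its input.
NodePtr unshare (const NodePtr &n)
{
  if (!n)
    return nullptr;
  return std::make_shared<Node> (Node{n->op, n->value,
                                      unshare (n->lhs), unshare (n->rhs)});
}

int eval (const NodePtr &n)
{
  return n->op == 'v' ? n->value : eval (n->lhs) + eval (n->rhs);
}
```

If `x = add (leaf (1), leaf (2))` is used directly as an operand of two different parents, changing `x`'s first leaf to 100 through one parent also changes what the other parent computes; passing `unshare (x)` to the second parent avoids that.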

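Jakub's dominance rule for SAVE_EXPR can be mimicked with a small C++ model of the lowering he describes: the first evaluation computes the operand and stores it in a temporary, and every later occurrence just reads that temporary. `SaveExpr`, `complex_expr`, `use1`, `use2`, `good` and `bad` are hypothetical names, and the model asserts where GCC would silently read an uninitialized temporary.

```cpp
#include <cassert>
#include <optional>

static int evaluations = 0;

int complex_expr (int a)
{
  ++evaluations;
  return a * a + 1;
}

struct SaveExpr
{
  int operand;
  std::optional<int> temp;   // empty until the SAVE_EXPR has been evaluated

  int evaluate ()            // first occurrence: compute and remember
  {
    temp = complex_expr (operand);
    return *temp;
  }
  int reuse () const         // later occurrence: read the temporary
  {
    assert (temp.has_value () && "SAVE_EXPR used before being evaluated");
    return *temp;
  }
};

int use1 (int v) { return v + 1; }
int use2 (int v) { return v - 1; }

// Valid shape: evaluated before the branch, so it dominates both uses.
int good (int x, int a)
{
  SaveExpr s{a, std::nullopt};
  s.evaluate ();
  return x ? use1 (s.reuse ()) : use2 (s.reuse ());
}

// Invalid shape: only the then-arm evaluates it, so on the else path the
// temporary was never set; bad (0, a) trips the assertion in reuse ().
int bad (int x, int a)
{
  SaveExpr s{a, std::nullopt};
  return x ? use1 (s.evaluate ()) : use2 (s.reuse ());
}
```

Only the path through the evaluating arm works for `bad`; `good` is correct on both paths and evaluates `complex_expr` exactly once per call.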


Re: HELP: Questions on unshare_expr

2024-01-15 Thread Qing Zhao


> On Jan 15, 2024, at 3:13 AM, Eric Botcazou  wrote:
> 
>> Okay, so, the "unsharing everything” is done automatically by the compiler
>> before gimplification? 
> 
> See the blurb at gimplify.cc:835 and below about this.

Thanks a lot for the info.  (I read this paragraph before sending the 
questions…)

Qing
> 
> -- 
> Eric Botcazou
> 
> 



Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-15 Thread Jonathan Wakely
On Mon, 15 Jan 2024 at 16:27, Patrick Palka  wrote:

Re: [PATCH v2] libstdc++: Update tzdata to 2023d

2024-01-15 Thread Jonathan Wakely
On Sat, 13 Jan 2024 at 11:18, Jonathan Wakely wrote:
>
> On Fri, 12 Jan 2024 at 22:59, Jonathan Wakely wrote:
> >
> > It would be good to update the bundled tzdata for GCC 14.1 and 13.3
>
> The expiry date for the hardcoded leapseconds list should be updated
> too, as there's a new date in the file in the tzdata distro. There are
> no new leap seconds though, just a new "this list is valid until ..."
> date.
>
> Tested x86_64-linux and aarch64-linux.

Pushed to trunk. GCC 13 backport to follow.



[committed] libstdc++: Use variable template to fix -fconcepts-ts error [PR113366]

2024-01-15 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

There's an error for -fconcepts-ts due to using a concept where a bool
NTTP is required, which is fixed by using the variable template that
already exists in the class scope.

This alone doesn't fix the problem with -fconcepts-ts, as changes to the
placement of attributes are also needed.

libstdc++-v3/ChangeLog:

PR testsuite/113366
* include/std/format (basic_format_arg): Use __formattable
variable template instead of __format::__formattable_with
concept.
---
 libstdc++-v3/include/std/format | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 540f8b805f8..efc4a17ba36 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3189,8 +3189,7 @@ namespace __format
// Format as const if possible, to reduce instantiations.
template
  using __maybe_const_t
-   = __conditional_t<__format::__formattable_with<_Tp, _Context>,
- const _Tp, _Tp>;
+   = __conditional_t<__formattable<_Tp>, const _Tp, _Tp>;
 
template
  static void
@@ -3208,7 +3207,7 @@ namespace __format
  explicit
  handle(_Tp& __val) noexcept
  {
-   if constexpr (!__format::__formattable_with)
+   if constexpr (!__formattable)
  static_assert(!is_const_v<_Tp>, "std::format argument must be "
  "non-const for this type");
 
-- 
2.43.0



Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-15 Thread Radek Barton
Wrong attachment, sorry.


v4-0001-Ifdef-.hidden-.type-and-.size-pseudo-ops-for-aarc.patch
Description: v4-0001-Ifdef-.hidden-.type-and-.size-pseudo-ops-for-aarc.patch


Re: [PATCH] libsupc++: Fix UB terminating on foreign exception

2024-01-15 Thread Julia DeMille

Some more info:
On 2024-01-14 21:39, Julia DeMille wrote:
I've gotten this to work, and run into an unexpected situation. 
Something about the personality routine is causing a SIGABRT. 
Investigating further.
This occurs due to an assertion in _Unwind_SetGR. Seemingly, the 
compiler intrinsic `__builtin_eh_return_data_regno` is doing something 
it *really* should not. I'm not a compiler developer, and have no clue 
how to investigate this.


This issue does not occur with Rust.

Additionally, LLVM's libc++abi manages not only to cleanly handle a Rust 
panic, but also, through some voodoo magic that took me by surprise, 
recognize Objective-C exceptions (and provide info on them) in its 
terminate handler. Perhaps due to Objective-C++? Hell if I know.


Thought it was worth mentioning that other implementations *have* gotten 
this working, though.

--
Thanks,
Julia DeMille
she/her



Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-15 Thread Vladimir Makarov



On 1/15/24 07:56, Maxim Kuvyrkov wrote:

Hi Vladimir,
Hi Jeff,

Richard and Alexander have reviewed this patch and [I assume] have no 
further comments.  OK to merge?



I trust Richard and Alexander, so I did not do an additional review
of the patches and have no comments.  Richard's or Alexander's
approval is enough for committing the patches.





[PATCH] libstdc++: Implement P2836R1 changes to const_iterator

2024-01-15 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (const_iterator): Define
conversion operators as per P2836R1.
* include/bits/version.def (ranges_as_const): Update value.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/const_iterator/1.cc (test04): New test.
* testsuite/std/ranges/adaptors/as_const/1.cc: Adjust expected
value of __cpp_lib_ranges_as_const.
* testsuite/std/ranges/version_c++23.cc: Likewise.
---
 libstdc++-v3/include/bits/stl_iterator.h  | 12 ++
 libstdc++-v3/include/bits/version.def |  2 +-
 libstdc++-v3/include/bits/version.h   |  4 ++--
 .../24_iterators/const_iterator/1.cc  | 22 +++
 .../std/ranges/adaptors/as_const/1.cc |  2 +-
 .../testsuite/std/ranges/version_c++23.cc |  2 +-
 6 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 6434ef64750..d71a793e10d 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -2775,6 +2775,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   noexcept(noexcept(_M_current == __s))
   { return _M_current == __s; }
 
+template<__detail::__not_a_const_iterator _CIt>
+  requires __detail::__constant_iterator<_CIt> && convertible_to<_It, _CIt>
+constexpr
+operator _CIt() const&
+{ return _M_current; }
+
+template<__detail::__not_a_const_iterator _CIt>
+  requires __detail::__constant_iterator<_CIt> && convertible_to<_It, _CIt>
+constexpr
+operator _CIt() &&
+{ return std::move(_M_current); }
+
 constexpr bool
 operator<(const basic_const_iterator& __y) const
 noexcept(noexcept(_M_current < __y._M_current))
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 21cdc65121b..afbec6c3e6a 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1548,7 +1548,7 @@ ftms = {
 ftms = {
   name = ranges_as_const;
   values = {
-v = 202207;
+v = 202311;
 cxxmin = 23;
   };
 };
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index f8dd16416a4..9688b246ef4 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1875,9 +1875,9 @@
 // from version.def line 1549
 #if !defined(__cpp_lib_ranges_as_const)
 # if (__cplusplus >= 202100L)
-#  define __glibcxx_ranges_as_const 202207L
+#  define __glibcxx_ranges_as_const 202311L
 #  if defined(__glibcxx_want_all) || defined(__glibcxx_want_ranges_as_const)
-#   define __cpp_lib_ranges_as_const 202207L
+#   define __cpp_lib_ranges_as_const 202311L
 #  endif
 # endif
 #endif /* !defined(__cpp_lib_ranges_as_const) && defined(__glibcxx_want_ranges_as_const) */
diff --git a/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc 
b/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
index 8b74d110fdf..fe952bfad14 100644
--- a/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
+++ b/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
@@ -1,6 +1,7 @@
 // { dg-do run { target c++23 } }
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -97,6 +98,26 @@ test03()
 std::unreachable_sentinel_t> );
 }
 
+void
+test04()
+{
+  // Example from P2836R1
+  auto f = [](std::vector::const_iterator i) {};
+
+  auto v = std::vector();
+  {
+auto i1 = ranges::cbegin(v); // returns vector::const_iterator
+f(i1); // okay
+  }
+
+  auto t = v | std::views::take_while([](int const x) { return x < 100; });
+  {
+auto i2 = ranges::cbegin(t); // returns basic_const_iterator::iterator>
+f(i2); // was an error in C++23 before P2836R1
+f(std::move(i2)); // same
+  }
+}
+
 int
 main()
 {
@@ -136,4 +157,5 @@ main()
   test02, true>();
 
   test03();
+  test04();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc
index 2d36e0a4712..c36786a8c5f 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc
@@ -3,7 +3,7 @@
 
 #include 
 
-#if __cpp_lib_ranges_as_const != 202207L
+#if __cpp_lib_ranges_as_const != 202311L
 # error "Feature-test macro __cpp_lib_ranges_as_const has wrong value in "
 #endif
 
diff --git a/libstdc++-v3/testsuite/std/ranges/version_c++23.cc 
b/libstdc++-v3/testsuite/std/ranges/version_c++23.cc
index 823264f32aa..d475d3dc114 100644
--- a/libstdc++-v3/testsuite/std/ranges/version_c++23.cc
+++ b/libstdc++-v3/testsuite/std/ranges/version_c++23.cc
@@ -45,7 +45,7 @@
 # error "Feature-test macro __cpp_lib_ranges_as_rvalue has wrong value in "
 #endif
 
-#if __cpp_lib_ranges_as_const != 202207L
+#if __cpp_lib_ranges_as_const != 202311L
 # error "Feature-te

Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-15 Thread Jonathan Wakely
On Mon, 15 Jan 2024 at 16:51, Jonathan Wakely  wrote:
>
> On Mon, 15 Jan 2024 at 16:27, Patrick Palka  wrote:
> >
> > On Sat, 13 Jan 2024, Jonathan Wakely wrote:
> >
> > > On Sat, 13 Jan 2024 at 00:06, Patrick Palka  wrote:
> > > >
> > > > On Fri, 12 Jan 2024, Jonathan Wakely wrote:
> > > >
> > > > > On Fri, 12 Jan 2024 at 18:33, Patrick Palka  wrote:
> > > > > >
> > > > > > On Fri, 12 Jan 2024, Jonathan Wakely wrote:
> > > > > >
> > > > > > > On Fri, 12 Jan 2024 at 17:55, Patrick Palka  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Thu, 11 Jan 2024, Jonathan Wakely wrote:
> > > > > > > >
> > > > > > > > > I'd like to commit this to trunk for GCC 14. Please take a 
> > > > > > > > > look.
> > > > > > > > >
> > > > > > > > > -- >8 --
> > > > > > > > >
> > > > > > > > > This is the last part of PR libstdc++/108822 implementing 
> > > > > > > > > P2255R2, which
> > > > > > > > > makes it ill-formed to create a std::tuple that would bind a 
> > > > > > > > > reference
> > > > > > > > > to a temporary.
> > > > > > > > >
> > > > > > > > > The dangling checks are implemented as deleted constructors 
> > > > > > > > > for C++20
> > > > > > > > > and higher, and as Debug Mode static assertions in the 
> > > > > > > > > constructor body
> > > > > > > > > for older standards. This is similar to the 
> > > > > > > > > r13-6084-g916ce577ad109b
> > > > > > > > > changes for std::pair.
> > > > > > > > >
> > > > > > > > > As part of this change, I've reimplemented most of std::tuple 
> > > > > > > > > for C++20,
> > > > > > > > > making use of concepts to replace the enable_if constraints, 
> > > > > > > > > and using
> > > > > > > > > conditional explicit to avoid duplicating most constructors. 
> > > > > > > > > We could
> > > > > > > > > use conditional explicit for the C++11 implementation too 
> > > > > > > > > (with pragmas
> > > > > > > > > to disable the -Wc++17-extensions warnings), but that should
> > > > > > > > > be done as
> > > > > > > > > a stage 1 change for GCC 15 rather than now.
> > > > > > > > >
> > > > > > > > > The partial specialization for std::tuple is no 
> > > > > > > > > longer used for
> > > > > > > > > C++20 (or more precisely, for a C++20 compiler that supports 
> > > > > > > > > concepts
> > > > > > > > > and conditional explicit). The additional constructors and 
> > > > > > > > > assignment
> > > > > > > > > operators that take std::pair arguments have been added to 
> > > > > > > > > the C++20
> > > > > > > > > implementation of the primary template, with 
> > > > > > > > > sizeof...(_Elements)==2
> > > > > > > > > constraints. This avoids reimplementing all the other 
> > > > > > > > > constructors in
> > > > > > > > > the std::tuple partial specialization to use 
> > > > > > > > > concepts. This way
> > > > > > > > > we avoid four implementations of every constructor and only 
> > > > > > > > > have three!
> > > > > > > > > (The primary template has an implementation of each 
> > > > > > > > > constructor for
> > > > > > > > > C++11 and another for C++20, and the tuple 
> > > > > > > > > specialization has an
> > > > > > > > > implementation of each for C++11, so that's three for each 
> > > > > > > > > constructor.)
> > > > > > > > >
> > > > > > > > > In order to make the constraints more efficient on the C++20 
> > > > > > > > > version of
> > > > > > > > > the default constructor I've also added a variable template 
> > > > > > > > > for the
> > > > > > > > > __is_implicitly_default_constructible trait, implemented 
> > > > > > > > > using concepts.
> > > > > > > > >
> > > > > > > > > libstdc++-v3/ChangeLog:
> > > > > > > > >
> > > > > > > > >   PR libstdc++/108822
> > > > > > > > >   * include/std/tuple (tuple): Add checks for dangling 
> > > > > > > > > references.
> > > > > > > > >   Reimplement constraints and constant expressions using 
> > > > > > > > > C++20
> > > > > > > > >   features.
> > > > > > > > >   * include/std/type_traits [C++20]
> > > > > > > > >   (__is_implicitly_default_constructible_v): Define.
> > > > > > > > >   (__is_implicitly_default_constructible): Use variable 
> > > > > > > > > template.
> > > > > > > > >   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> > > > > > > > > ---
> > > > > > > > >  libstdc++-v3/include/std/tuple| 1021 
> > > > > > > > > -
> > > > > > > > >  libstdc++-v3/include/std/type_traits  |   11 +
> > > > > > > > >  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
> > > > > > > > >  3 files changed, 841 insertions(+), 296 deletions(-)
> > > > > > > > >  create mode 100644 
> > > > > > > > > libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> > > > > > > > >
> > > > > > > > > diff --git a/libstdc++-v3/include/std/tuple 
> > > > > > > > > b/libstdc++-v3/include/std/tuple
> > > > > > > > > index 50e11843757..cd05b638923 100644
> > > > > > > > > --- a/libstdc++-v3/include/std/tuple
> > > > > > > > > +++ b/libstdc++-v3/include/std/tuple
> >
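The deleted-constructor technique can be sketched on a much simpler type. `ref_holder` is a hypothetical class, not part of libstdc++, and its check is cruder than the real one: P2255R2 specifies the `std::tuple` constraints in terms of the `reference_constructs_from_temporary` trait, whereas this sketch only adds a deleted overload that rvalue arguments prefer over the `const T&` constructor.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical holder of a const reference (not std::tuple).
template<typename T>
struct ref_holder
{
  const T& ref;

  // Binding to an lvalue is fine: the referent outlives the holder.
  constexpr ref_holder(const T& r) : ref(r) { }

  // An rvalue argument would leave ref dangling at the end of the
  // full-expression.  This overload is the better match for rvalues,
  // so such constructions become ill-formed instead of dangling.
  ref_holder(const T&&) = delete;
};

static_assert(std::is_constructible_v<ref_holder<int>, int&>);
static_assert(!std::is_constructible_v<ref_holder<int>, int>);
```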

Re: [PATCH] libstdc++: Implement P2836R1 changes to const_iterator

2024-01-15 Thread Jonathan Wakely
On Mon, 15 Jan 2024 at 18:50, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?

OK for both, thanks.


>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_iterator.h (const_iterator): Define
> conversion operators as per P2836R1.
> * include/bits/version.def (ranges_as_const): Update value.
> * include/bits/version.h: Regenerate.
> * testsuite/24_iterators/const_iterator/1.cc (test04): New test.
> * testsuite/std/ranges/adaptors/as_const/1.cc: Adjust expected
> value of __cpp_lib_ranges_as_const.
> * testsuite/std/ranges/version_c++23.cc: Likewise.
> ---
>  libstdc++-v3/include/bits/stl_iterator.h  | 12 ++
>  libstdc++-v3/include/bits/version.def |  2 +-
>  libstdc++-v3/include/bits/version.h   |  4 ++--
>  .../24_iterators/const_iterator/1.cc  | 22 +++
>  .../std/ranges/adaptors/as_const/1.cc |  2 +-
>  .../testsuite/std/ranges/version_c++23.cc |  2 +-
>  6 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
> b/libstdc++-v3/include/bits/stl_iterator.h
> index 6434ef64750..d71a793e10d 100644
> --- a/libstdc++-v3/include/bits/stl_iterator.h
> +++ b/libstdc++-v3/include/bits/stl_iterator.h
> @@ -2775,6 +2775,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>noexcept(noexcept(_M_current == __s))
>{ return _M_current == __s; }
>
> +template<__detail::__not_a_const_iterator _CIt>
> +  requires __detail::__constant_iterator<_CIt> && convertible_to<_It, _CIt>
> +constexpr
> +operator _CIt() const&
> +{ return _M_current; }
> +
> +template<__detail::__not_a_const_iterator _CIt>
> +  requires __detail::__constant_iterator<_CIt> && convertible_to<_It, _CIt>
> +constexpr
> +operator _CIt() &&
> +{ return std::move(_M_current); }
> +
>  constexpr bool
>  operator<(const basic_const_iterator& __y) const
>  noexcept(noexcept(_M_current < __y._M_current))
> diff --git a/libstdc++-v3/include/bits/version.def 
> b/libstdc++-v3/include/bits/version.def
> index 21cdc65121b..afbec6c3e6a 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -1548,7 +1548,7 @@ ftms = {
>  ftms = {
>name = ranges_as_const;
>values = {
> -v = 202207;
> +v = 202311;
>  cxxmin = 23;
>};
>  };
> diff --git a/libstdc++-v3/include/bits/version.h 
> b/libstdc++-v3/include/bits/version.h
> index f8dd16416a4..9688b246ef4 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -1875,9 +1875,9 @@
>  // from version.def line 1549
>  #if !defined(__cpp_lib_ranges_as_const)
>  # if (__cplusplus >= 202100L)
> -#  define __glibcxx_ranges_as_const 202207L
> +#  define __glibcxx_ranges_as_const 202311L
>  #  if defined(__glibcxx_want_all) || defined(__glibcxx_want_ranges_as_const)
> -#   define __cpp_lib_ranges_as_const 202207L
> +#   define __cpp_lib_ranges_as_const 202311L
>  #  endif
>  # endif
>  #endif /* !defined(__cpp_lib_ranges_as_const) && defined(__glibcxx_want_ranges_as_const) */
> diff --git a/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc 
> b/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
> index 8b74d110fdf..fe952bfad14 100644
> --- a/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
> +++ b/libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
> @@ -1,6 +1,7 @@
>  // { dg-do run { target c++23 } }
>
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -97,6 +98,26 @@ test03()
>  std::unreachable_sentinel_t> );
>  }
>
> +void
> +test04()
> +{
> +  // Example from P2836R1
> +  auto f = [](std::vector::const_iterator i) {};
> +
> +  auto v = std::vector();
> +  {
> +auto i1 = ranges::cbegin(v); // returns vector::const_iterator
> +f(i1); // okay
> +  }
> +
> +  auto t = v | std::views::take_while([](int const x) { return x < 100; });
> +  {
> +auto i2 = ranges::cbegin(t); // returns basic_const_iterator::iterator>
> +f(i2); // was an error in C++23 before P2836R1
> +f(std::move(i2)); // same
> +  }
> +}
> +
>  int
>  main()
>  {
> @@ -136,4 +157,5 @@ main()
>test02, true>();
>
>test03();
> +  test04();
>  }
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc 
> b/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc
> index 2d36e0a4712..c36786a8c5f 100644
> --- a/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc
> @@ -3,7 +3,7 @@
>
>  #include 
>
> -#if __cpp_lib_ranges_as_const != 202207L
> +#if __cpp_lib_ranges_as_const != 202311L
>  # error "Feature-test macro __cpp_lib_ranges_as_const has wrong value in "
>  #endif
>
> diff --git a/libstdc++-v3/testsuite/std/ranges/version_c++23.cc 
> b/libstdc++-v3/testsuite/std/ranges/version_c++23.cc
> index 82326
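Code that relies on the P2836R1 conversions can test the updated feature-test macro. A minimal sketch, assuming a `<version>` header is available; the helper names `ranges_as_const_version` and `has_p2836r1` are hypothetical:

```cpp
#include <version>

// Value of __cpp_lib_ranges_as_const, or 0 if the library does not
// define it (for example, when not compiling as C++23).
constexpr long ranges_as_const_version()
{
#ifdef __cpp_lib_ranges_as_const
  return __cpp_lib_ranges_as_const;
#else
  return 0;
#endif
}

// True if the library advertises the updated const_iterator conversions.
constexpr bool has_p2836r1 = ranges_as_const_version() >= 202311L;
```

Before this patch libstdc++ defines the macro as 202207L, so `has_p2836r1` is false there even in C++23 mode.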

[PATCH] Remove --save-temps from some compile tests

2024-01-15 Thread H.J. Lu
--save-temps is needed to scan the assembly output of assemble, link and
run tests.  Compile tests don't need --save-temps unless they used to
trigger GCC bugs with it.  Remove --save-temps from compile tests that
don't need it.
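For illustration, a hypothetical compile test in the same DejaGnu style (the file and function are invented for this sketch). A `dg-do compile` test already stops at the assembly stage, and `scan-assembler` reads that `.s` file, so no `--save-temps` is required:

```cpp
// { dg-do compile }
// { dg-options "-O2" }
// A compile test already produces the .s file that scan-assembler
// reads, so --save-temps is not needed here.
// { dg-final { scan-assembler "_Z3addii" } }

// Non-inline external function, so its mangled symbol is emitted.
int add (int a, int b) { return a + b; }
```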

PR testsuite/113369
* g++.dg/abi/ref-temp1.C: Remove --save-temps.
* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
* gcc.dg/debug/dwarf2/pr111080.c: Likewise.
* gcc.dg/debug/dwarf2/pr47939-1.c: Likewise.
* gcc.dg/debug/dwarf2/pr47939-2.c: Likewise.
* gcc.dg/debug/dwarf2/pr47939-3.c: Likewise.
* gcc.dg/debug/dwarf2/pr47939-4.c: Likewise.
---
 gcc/testsuite/g++.dg/abi/ref-temp1.C | 1 -
 gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C | 2 +-
 gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c | 2 +-
 gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-1.c| 2 +-
 gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-2.c| 2 +-
 gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-3.c| 2 +-
 gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-4.c| 2 +-
 7 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/g++.dg/abi/ref-temp1.C 
b/gcc/testsuite/g++.dg/abi/ref-temp1.C
index c9963ca62f9..70c9a7a431c 100644
--- a/gcc/testsuite/g++.dg/abi/ref-temp1.C
+++ b/gcc/testsuite/g++.dg/abi/ref-temp1.C
@@ -1,7 +1,6 @@
 // From ABI document
 // { dg-do compile { target c++14 } }
 // { dg-skip-if "No .weak" { { hppa*-*-hpux* } && { ! lp64 } } }
-// { dg-additional-options --save-temps }
 
 struct A { const int (&x)[3]; };
 struct B { const A (&x)[2]; };
diff --git a/gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C 
b/gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C
index 256712937d4..3a725f59a6d 100644
--- a/gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C
+++ b/gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-msse2 -O3 --save-temps" } */
+/* { dg-options "-msse2 -O3" } */
 
 void foo (void)
 {
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
index 3949d7e7c64..617e5e45f9b 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-save-temps -gdwarf-3 -dA" } */
+/* { dg-options "-gdwarf-3 -dA" } */
 
 struct foo {
 int field_number_1;
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-1.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-1.c
index 3dc8e6719bb..0777c1f3ad8 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-1.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-save-temps -gdwarf -dA" } */
+/* { dg-options "-gdwarf -dA" } */
 
 typedef struct _Harry { int dummy; } Harry_t;
 Harry_t harry;
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-2.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-2.c
index abc1dc1e6c1..932c070f162 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-2.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-save-temps -gdwarf -dA" } */
+/* { dg-options "-gdwarf -dA" } */
 
 typedef const struct _Harry { int dummy; } Harry_t;
 Harry_t harry;
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-3.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-3.c
index 78234e93d65..858432aab79 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-3.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-save-temps -gdwarf -dA" } */
+/* { dg-options "-gdwarf -dA" } */
 
 typedef struct _Harry { int dummy; } Harry_t;
 const Harry_t harry[5];
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-4.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-4.c
index 89a048df4a3..57b4c5c3a13 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-4.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr47939-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-save-temps -gdwarf -dA" } */
+/* { dg-options "-gdwarf -dA" } */
 
 typedef const struct _Harry { int dummy; } Harry_t;
 Harry_t harry[10];
-- 
2.43.0



Re: [PATCH] libstdc++: reduce std::variant template instantiation depth

2024-01-15 Thread Patrick Palka
On Sun, Jan 7, 2024 at 3:33 PM Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

Ping.

>
> -- >8 --
>
> The recursively defined constraints on _Variadic_union's user-defined
> destructor (necessary for maintaining trivial destructibility of the
> variant iff all of its alternatives are) effectively require a template
> instantiation depth of 3x the number of alternatives, with the instantiation
> stack looking like
>
>   ...
>   _Variadic_union
>   std::is_trivially_destructible_v<_Variadic_union>
>   _Variadic_union::~_Variadic_union()
>   _Variadic_union
>   ...
>
> Ideally the template depth should be ~equal to the number of alternatives
> (plus a constant).  Luckily it seems we don't need to compute trivial
> destructibility of the alternatives at all from _Variadic_union, since
> its only user _Variant_storage already has that information.  To that
> end this patch removes these recursive constraints and instead passes
> this information down from _Variant_storage.  After this patch, the
> template instantiation depth for 87619.cc is ~270 instead of ~780.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/variant (__detail::__variant::_Variadic_union):
> Add bool __trivially_destructible template parameter.
> (__detail::__variant::_Variadic_union::~_Variadic_union):
> Use __trivially_destructible in constraints instead.
> (_Variant_storage): Pass __trivially_destructible value to
> _Variadic_union.
> ---
>  libstdc++-v3/include/std/variant | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/variant 
> b/libstdc++-v3/include/std/variant
> index 20a76c8aa87..4b9002e0917 100644
> --- a/libstdc++-v3/include/std/variant
> +++ b/libstdc++-v3/include/std/variant
> @@ -392,7 +392,7 @@ namespace __variant
>  };
>
>// Defines members and ctors.
> -  template
> +  template
>  union _Variadic_union
>  {
>_Variadic_union() = default;
> @@ -401,8 +401,8 @@ namespace __variant
> _Variadic_union(in_place_index_t<_Np>, _Args&&...) = delete;
>  };
>
> -  template
> -union _Variadic_union<_First, _Rest...>
> +  template
> +union _Variadic_union<__trivially_destructible, _First, _Rest...>
>  {
>constexpr _Variadic_union() : _M_rest() { }
>
> @@ -427,13 +427,12 @@ namespace __variant
>~_Variadic_union() = default;
>
>constexpr ~_Variadic_union()
> -   requires (!is_trivially_destructible_v<_First>)
> - || (!is_trivially_destructible_v<_Variadic_union<_Rest...>>)
> +   requires (!__trivially_destructible)
>{ }
>  #endif
>
>_Uninitialized<_First> _M_first;
> -  _Variadic_union<_Rest...> _M_rest;
> +  _Variadic_union<__trivially_destructible, _Rest...> _M_rest;
>  };
>
>// _Never_valueless_alt is true for variant alternatives that can
> @@ -514,7 +513,7 @@ namespace __variant
> return this->_M_index != __index_type(variant_npos);
>}
>
> -  _Variadic_union<_Types...> _M_u;
> +  _Variadic_union _M_u;
>using __index_type = __select_index<_Types...>;
>__index_type _M_index;
>  };
> @@ -552,7 +551,7 @@ namespace __variant
> return this->_M_index != static_cast<__index_type>(variant_npos);
>}
>
> -  _Variadic_union<_Types...> _M_u;
> +  _Variadic_union _M_u;
>using __index_type = __select_index<_Types...>;
>__index_type _M_index;
>  };
> --
> 2.43.0.254.ga26002b628
>



Re: [PATCH] c++: non-dep array list-init w/ non-triv dtor [PR109899]

2024-01-15 Thread Patrick Palka
On Mon, Jan 8, 2024 at 1:40 PM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk/13/12?

Ping.

>
> -- >8 --
>
> The get_target_expr call added in r12-7069-g119cea98f66476 causes us
> for the below testcase to call build_vec_delete in a template context,
> which builds a templated destructor call and checks expr_noexcept_p for
> it, which ICEs because the call has templated form.  Much of the work
> of build_vec_delete however is code generation and thus will just get
> thrown away in a template context, including this expr_noexcept_p check
> and the code generation guarded by it.  So this patch narrowly fixes this
> ICE by assuming the expr_noexcept_p call returns true in a template

... returns false, rather.

> context.
>
> PR c++/109899
>
> gcc/cp/ChangeLog:
>
> * init.cc (build_vec_delete_1): Assume expr_noexcept_p is true

.. is false.

> in a template context.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp0x/initlist-array21.C: New test.
> ---
>  gcc/cp/init.cc|  3 ++-
>  gcc/testsuite/g++.dg/cpp0x/initlist-array21.C | 12 
>  2 files changed, 14 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
>
> diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> index 09584719ee6..aa0a35a3885 100644
> --- a/gcc/cp/init.cc
> +++ b/gcc/cp/init.cc
> @@ -4155,7 +4155,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
>
>/* If one destructor throws, keep trying to clean up the rest, unless we're
>   already in a build_vec_init cleanup.  */
> -  if (flag_exceptions && !in_cleanup && !expr_noexcept_p (tmp, tf_none))
> +  if (flag_exceptions && !in_cleanup && !processing_template_decl
> +  && !expr_noexcept_p (tmp, tf_none))
>  {
>loop = build2 (TRY_CATCH_EXPR, void_type_node, loop,
>  unshare_expr (loop));
> diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C 
> b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
> new file mode 100644
> index 000..5e37e3de62a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
> @@ -0,0 +1,12 @@
> +// PR c++/109899
> +// { dg-do compile { target c++11 } }
> +
> +struct A { A(); ~A(); };
> +
> +template 
> +using array = T[42];
> +
> +template
> +void f() {
> +  array{};
> +}
> --
> 2.43.0.254.ga26002b628
>



Re: [PATCH] c++: address of NTTP object as targ [PR113242]

2024-01-15 Thread Patrick Palka
On Fri, Jan 5, 2024 at 11:50 AM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> for trunk and perhaps 13?

Ping.

>
> -- >8 --
>
> invalid_tparm_referent_p was rejecting using the address of a class NTTP
> object as a template argument, but this should be fine.
>
> PR c++/113242
>
> gcc/cp/ChangeLog:
>
> * pt.cc (invalid_tparm_referent_p) : Suppress
> DECL_ARTIFICIAL rejection test for class NTTP objects.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp2a/nontype-class61.C: New test.
> ---
>  gcc/cp/pt.cc |  3 ++-
>  gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
>  2 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 154ac76cb65..8c7d178328d 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -7219,7 +7219,8 @@ invalid_tparm_referent_p (tree type, tree expr, tsubst_flags_t complain)
>* a string literal (5.13.5),
>* the result of a typeid expression (8.2.8), or
>* a predefined __func__ variable (11.4.1).  */
> -   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
> +   else if (VAR_P (decl) && !DECL_NTTP_OBJECT_P (decl)
> +&& DECL_ARTIFICIAL (decl))
>   {
> if (complain & tf_error)
>   error ("the address of %qD is not a valid template argument",
> diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C 
> b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
> new file mode 100644
> index 000..90805a05ecf
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
> @@ -0,0 +1,27 @@
> +// PR c++/113242
> +// { dg-do compile { target c++20 } }
> +
> +struct wrapper {
> +  int n;
> +};
> +
> +template
> +void f1() {
> +  static_assert(X.n == 42);
> +}
> +
> +template
> +void f2() {
> +  static_assert(X->n == 42);
> +}
> +
> +template
> +void g() {
> +  f1();
> +  f2<&X>();
> +}
> +
> +int main() {
> +  constexpr wrapper X = {42};
> +  g();
> +}
> --
> 2.43.0.254.ga26002b628
>



Re: [PATCH] c++: explicit inst w/ many constrained partial specs [PR104634]

2024-01-15 Thread Patrick Palka
On Wed, Jan 3, 2024 at 1:49 PM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> for trunk and perhaps 13?

Ping.

>
> -- >8 --
>
> Here we neglect to emit the definitions of A::f2 and A::f4
> despite the explicit instantiations ultimately because TREE_PUBLIC isn't
> set on the corresponding partial specializations, the declarations of which
> are created from maybe_new_partial_specialization which is responsible for
> disambiguating them from the first and third partial specializations (which
> have the same class-head but different constraints).  This makes grokfndecl
> in turn clear TREE_PUBLIC for f2 and f4 as if they have internal linkage.
>
> This patch fixes this by setting TREE_PUBLIC appropriately for such partial
> specializations.
>
> PR c++/104634
>
> gcc/cp/ChangeLog:
>
> * pt.cc (maybe_new_partial_specialization): Propagate TREE_PUBLIC
> to the newly created partial specialization.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp2a/concepts-explicit-inst6.C: New test.
> ---
>  gcc/cp/pt.cc  |  1 +
>  .../g++.dg/cpp2a/concepts-explicit-inst6.C| 35 +++
>  2 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index e38e7a773f0..154ac76cb65 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -980,6 +980,7 @@ maybe_new_partial_specialization (tree& type)
>DECL_SOURCE_LOCATION (d) = input_location;
>TREE_PRIVATE (d) = (current_access_specifier == access_private_node);
>TREE_PROTECTED (d) = (current_access_specifier == access_protected_node);
> +  TREE_PUBLIC (d) = TREE_PUBLIC (DECL_TEMPLATE_RESULT (tmpl));
>
>set_instantiating_module (d);
>DECL_MODULE_EXPORT_P (d) = DECL_MODULE_EXPORT_P (tmpl);
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C
> new file mode 100644
> index 000..4ac0c65c490
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C
> @@ -0,0 +1,35 @@
> +// PR c++/104634
> +// { dg-do compile { target c++20 } }
> +// { dg-final { scan-assembler "_ZN1AIiE2f1Ev" } }
> +// { dg-final { scan-assembler "_ZN1AIdE2f2Ev" } }
> +// { dg-final { scan-assembler "_ZN1AIPiE2f3Ev" } }
> +// { dg-final { scan-assembler "_ZN1AIPdE2f4Ev" } }
> +
> +template
> +struct A { };
> +
> +template requires __is_same(T, int)
> +struct A {
> +  void f1() { }
> +  static inline int m1;
> +};
> +
> +template requires __is_same(T, double)
> +struct A {
> +  void f2() { }
> +};
> +
> +template requires __is_same(T, int)
> +struct A {
> +  void f3() { }
> +};
> +
> +template requires __is_same(T, double)
> +struct A {
> +  void f4() { }
> +};
> +
> +template struct A;
> +template struct A;
> +template struct A;
> +template struct A;
> --
> 2.43.0.254.ga26002b628
>



Re: [PATCH 2/1] c++: access of class-scope partial tmpl spec

2024-01-15 Thread Patrick Palka
On Wed, Jan 3, 2024 at 3:06 PM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?

Ping.

>
> -- >8 --
>
> Since partial template specializations can't be named directly, access
> control (when declared at class scope) doesn't apply to them, so we
> shouldn't have to set their TREE_PRIVATE / TREE_PROTECTED.  This code was
> added by r10-4833-gcce3c9db9e6ffa for PR92078, but it seems better to
> just disable the relevant access consistency check for partial template
> specializations so that we also accept the below testcase.
>
> gcc/cp/ChangeLog:
>
> * parser.cc (cp_parser_check_access_in_redeclaration): Don't
> check access for a partial specialization.
> * pt.cc (maybe_new_partial_specialization): Don't set TREE_PRIVATE
> or TREE_PROTECTED on the newly created partial specialization.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/template/partial-specialization14.C: New test.
> ---
>  gcc/cp/parser.cc  |  3 ++-
>  gcc/cp/pt.cc  |  2 --
>  .../g++.dg/template/partial-specialization14.C| 15 +++
>  3 files changed, 17 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/partial-specialization14.C
>
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 37536faf2cf..85da15651b2 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -35062,7 +35062,8 @@ static void
>  cp_parser_check_access_in_redeclaration (tree decl, location_t location)
>  {
>if (!decl
> -  || (!CLASS_TYPE_P (TREE_TYPE (decl))
> +  || (!(CLASS_TYPE_P (TREE_TYPE (decl))
> +   && !CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (decl)))
>   && TREE_CODE (TREE_TYPE (decl)) != ENUMERAL_TYPE))
>  return;
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 154ac76cb65..afd1df4f3d7 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -978,8 +978,6 @@ maybe_new_partial_specialization (tree& type)
>tree d = create_implicit_typedef (DECL_NAME (tmpl), t);
>DECL_CONTEXT (d) = TYPE_CONTEXT (t);
>DECL_SOURCE_LOCATION (d) = input_location;
> -  TREE_PRIVATE (d) = (current_access_specifier == access_private_node);
> -  TREE_PROTECTED (d) = (current_access_specifier == access_protected_node);
>TREE_PUBLIC (d) = TREE_PUBLIC (DECL_TEMPLATE_RESULT (tmpl));
>
>set_instantiating_module (d);
> diff --git a/gcc/testsuite/g++.dg/template/partial-specialization14.C b/gcc/testsuite/g++.dg/template/partial-specialization14.C
> new file mode 100644
> index 000..ac7bc9ed7f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/partial-specialization14.C
> @@ -0,0 +1,15 @@
> +// Verify we don't care about the access specifier when declaring
> +// a partial template specialization of a member class template.
> +
> +struct A1 {
> +  template struct B { };
> +private:
> +  template struct B { }; // { dg-bogus "different access" }
> +};
> +
> +struct A2 {
> +  template struct B { };
> +  template struct B;
> +private:
> +  template struct B { }; // { dg-bogus "different access" }
> +};
> --
> 2.43.0.254.ga26002b628
>



Re: [PATCH] c++: explicit inst w/ many constrained partial specs [PR104634]

2024-01-15 Thread Jason Merrill

On 1/3/24 13:49, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk and perhaps 13?


OK for both.


-- >8 --

Here we neglect to emit the definitions of A::f2 and A::f4
despite the explicit instantiations ultimately because TREE_PUBLIC isn't
set on the corresponding partial specializations, the declarations of which
are created from maybe_new_partial_specialization which is responsible for
disambiguating them from the first and third partial specializations (which
have the same class-head but different constraints).  This makes grokfndecl
in turn clear TREE_PUBLIC for f2 and f4 as if they have internal linkage.

This patch fixes this by setting TREE_PUBLIC appropriately for such partial
specializations.

PR c++/104634

gcc/cp/ChangeLog:

* pt.cc (maybe_new_partial_specialization): Propagate TREE_PUBLIC
to the newly created partial specialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-explicit-inst6.C: New test.
---
  gcc/cp/pt.cc  |  1 +
  .../g++.dg/cpp2a/concepts-explicit-inst6.C| 35 +++
  2 files changed, 36 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e38e7a773f0..154ac76cb65 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -980,6 +980,7 @@ maybe_new_partial_specialization (tree& type)
DECL_SOURCE_LOCATION (d) = input_location;
TREE_PRIVATE (d) = (current_access_specifier == access_private_node);
TREE_PROTECTED (d) = (current_access_specifier == access_protected_node);
+  TREE_PUBLIC (d) = TREE_PUBLIC (DECL_TEMPLATE_RESULT (tmpl));
  
set_instantiating_module (d);
DECL_MODULE_EXPORT_P (d) = DECL_MODULE_EXPORT_P (tmpl);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C
new file mode 100644
index 000..4ac0c65c490
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst6.C
@@ -0,0 +1,35 @@
+// PR c++/104634
+// { dg-do compile { target c++20 } }
+// { dg-final { scan-assembler "_ZN1AIiE2f1Ev" } }
+// { dg-final { scan-assembler "_ZN1AIdE2f2Ev" } }
+// { dg-final { scan-assembler "_ZN1AIPiE2f3Ev" } }
+// { dg-final { scan-assembler "_ZN1AIPdE2f4Ev" } }
+
+template
+struct A { };
+
+template requires __is_same(T, int)
+struct A {
+  void f1() { }
+  static inline int m1;
+};
+
+template requires __is_same(T, double)
+struct A {
+  void f2() { }
+};
+
+template requires __is_same(T, int)
+struct A {
+  void f3() { }
+};
+
+template requires __is_same(T, double)
+struct A {
+  void f4() { }
+};
+
+template struct A;
+template struct A;
+template struct A;
+template struct A;




Re: [PATCH 2/1] c++: access of class-scope partial tmpl spec

2024-01-15 Thread Jason Merrill

On 1/3/24 15:06, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


-- >8 --

Since partial template specializations can't be named directly, access
control (when declared at class scope) doesn't apply to them, so we
shouldn't have to set their TREE_PRIVATE / TREE_PROTECTED.  This code was
added by r10-4833-gcce3c9db9e6ffa for PR92078, but it seems better to
just disable the relevant access consistency check for partial template
specializations so that we also accept the below testcase.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_check_access_in_redeclaration): Don't
check access for a partial specialization.
* pt.cc (maybe_new_partial_specialization): Don't set TREE_PRIVATE
or TREE_PROTECTED on the newly created partial specialization.

gcc/testsuite/ChangeLog:

* g++.dg/template/partial-specialization14.C: New test.
---
  gcc/cp/parser.cc  |  3 ++-
  gcc/cp/pt.cc  |  2 --
  .../g++.dg/template/partial-specialization14.C| 15 +++
  3 files changed, 17 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/partial-specialization14.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 37536faf2cf..85da15651b2 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -35062,7 +35062,8 @@ static void
  cp_parser_check_access_in_redeclaration (tree decl, location_t location)
  {
if (!decl
-  || (!CLASS_TYPE_P (TREE_TYPE (decl))
+  || (!(CLASS_TYPE_P (TREE_TYPE (decl))
+   && !CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (decl)))
  && TREE_CODE (TREE_TYPE (decl)) != ENUMERAL_TYPE))
  return;
  
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 154ac76cb65..afd1df4f3d7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -978,8 +978,6 @@ maybe_new_partial_specialization (tree& type)
tree d = create_implicit_typedef (DECL_NAME (tmpl), t);
DECL_CONTEXT (d) = TYPE_CONTEXT (t);
DECL_SOURCE_LOCATION (d) = input_location;
-  TREE_PRIVATE (d) = (current_access_specifier == access_private_node);
-  TREE_PROTECTED (d) = (current_access_specifier == access_protected_node);
TREE_PUBLIC (d) = TREE_PUBLIC (DECL_TEMPLATE_RESULT (tmpl));
  
set_instantiating_module (d);
diff --git a/gcc/testsuite/g++.dg/template/partial-specialization14.C b/gcc/testsuite/g++.dg/template/partial-specialization14.C
new file mode 100644
index 000..ac7bc9ed7f1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/partial-specialization14.C
@@ -0,0 +1,15 @@
+// Verify we don't care about the access specifier when declaring
+// a partial template specialization of a member class template.
+
+struct A1 {
+  template struct B { };
+private:
+  template struct B { }; // { dg-bogus "different access" }
+};
+
+struct A2 {
+  template struct B { };
+  template struct B;
+private:
+  template struct B { }; // { dg-bogus "different access" }
+};




Re: [PATCH] RISC-V: Document the list of supported extensions

2024-01-15 Thread Bernhard Reutner-Fischer
Hi Kito!

On Thu, 11 Jan 2024 17:06:09 +0800
Kito Cheng  wrote:

> Try to list all supported extensions: name, version and a short
> description for each extension.
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi (RISC-V Options): Add list of supported
>   extensions.
> ---
>  gcc/doc/invoke.texi | 463 
>  1 file changed, 463 insertions(+)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 68d1f364ac0..58271f2f28e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -30037,6 +30037,469 @@ Generate code for given RISC-V ISA (e.g.@: @samp{rv64im}).  ISA strings must be
>  lower-case.  Examples include @samp{rv64i}, @samp{rv32g}, @samp{rv32e}, and
>  @samp{rv32imaf}.
>  
> +Supported extension are list below:

are listed

> +@multitable @columnfractions .10 .10 .80
> +@headitem Extension Name @tab Supported Version @tab Description
> +@item i
> +@tab 2.0, 2.1
> +@tab Base integer extension.
> +
> +@item e
> +@tab 2.0
> +@tab Reduced base integer extension.
> +
> +@item g
> +@tab -
> +@tab General-purpose computing base extension, @samp{g} will expand to
> +@samp{i}, @samp{m}, @samp{a}, @samp{f}, @samp{d}, @samp{zicsr} and
> +@samp{zifencei}.
> +
> +@item m
> +@tab 2.0
> +@tab Integer multiplication and division extension.
> +
> +@item a
> +@tab 2.0, 2.1
> +@tab Atomic extension.
> +
> +@item f
> +@tab 2.0, 2.2
> +@tab Single-precision floating-point extension.
> +
> +@item d
> +@tab 2.0, 2.2
> +@tab Double-precision floating-point extension.
> +
> +@item c
> +@tab 2.0
> +@tab Compressed extension.
> +
> +@item h
> +@tab 1.0
> +@tab Hypervisor extension.
> +
> +@item v
> +@tab 1.0
> +@tab Vector extension.
> +
> +@item zicsr
> +@tab 2.0
> +@tab Control and status register access extension.
> +
> +@item zifencei
> +@tab 2.0
> +@tab Instruction-fetch fence extension.
> +
> +@item zicond
> +@tab 1.0
> +@tab Integer conditional operations extension.
> +
> +@item zawrs
> +@tab 1.0
> +@tab Wait-on-reservation-set extension.
> +
> +@item zba
> +@tab 1.0
> +@tab Address calculation extension.
> +
> +@item zbb
> +@tab 1.0
> +@tab Basic bit manipulation extension.
> +
> +@item zbc
> +@tab 1.0
> +@tab Carry-less multiplication extension.
> +
> +@item zbs
> +@tab 1.0
> +@tab Single-bit operation extension.
> +
> +@item zfinx
> +@tab 1.0
> +@tab Single-precision floating-ioint in integer registers extension.

s/ioint/point/g
above and below.

> +
> +@item zdinx
> +@tab 1.0
> +@tab Double-precision floating-ioint in integer registers extension.
> +
> +@item zhinx
> +@tab 1.0
> +@tab Half-precision floating-ioint in integer registers extension.
> +
> +@item zhinxmin
> +@tab 1.0
> +@tab Minimal half-precision floating-ioint in integer registers extension.
> +
> +@item zbkb
> +@tab 1.0
> +@tab Cryptography bit-manipulation extension.
> +
> +@item zbkc
> +@tab 1.0
> +@tab Cryptography carry-less multiply extension.
> +
> +@item zbkx
> +@tab 1.0
> +@tab Cryptography crossbar permutation extension.
> +
> +@item zkne
> +@tab 1.0
> +@tab AES Encryption extension.
> +
> +@item zknd
> +@tab 1.0
> +@tab AES Decryption extension.
> +
> +@item zknh
> +@tab 1.0
> +@tab Hash function extension.
> +
> +@item zkr
> +@tab 1.0
> +@tab Entropy source extension.
> +
> +@item zksed
> +@tab 1.0
> +@tab SM4 block cipher extension.
> +
> +@item zksh
> +@tab 1.0
> +@tab SM3 hash function extension.
> +
> +@item zkt
> +@tab 1.0
> +@tab Data independent execution latency extension.
> +
> +@item zk
> +@tab 1.0
> +@tab Standard scalar cryptography extension.
> +
> +@item zkn
> +@tab 1.0
> +@tab NIST algorithm suite extension.

For @item g you document which extensions this will expand to, do you
want to list the expansions here, too?

ISTM that
https://riscv.org/blog/2021/09/risc-v-cryptography-extensions-task-group-announces-public-review-of-the-scalar-cryptography-extensions/
lists
Zkn – NIST Algorithm Suite (shorthand for Zknd_Zkne_Zknh_Zbkb_Zbkc_Zbkx)
Zks – ShangMi Algorithm Suite  (shorthand for Zksed_Zksh_Zbkb_Zbkc_Zbkx)
Zk – Standard scalar cryptography extension (shorthand for Zkn_Zkt_Zkr)
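Purely to illustrate how these shorthands nest (zk pulls in zkn, which in turn pulls in zknd, zkne, ...), here is a hypothetical C++ sketch. The table contents come from the list above plus the @samp{g} entry in the patch; the names `Expansion`, `kExpansions`, `str_eq` and `implies` are invented for this example and are not how GCC itself records implied extensions.

```cpp
#include <cstddef>

// Hypothetical table of shorthand expansions (not GCC's data structures).
// Unused slots in `parts` are value-initialized to nullptr.
struct Expansion
{
  const char *shorthand;
  const char *parts[7];
};

constexpr Expansion kExpansions[] = {
  {"g",   {"i", "m", "a", "f", "d", "zicsr", "zifencei"}},
  {"zk",  {"zkn", "zkt", "zkr"}},
  {"zkn", {"zknd", "zkne", "zknh", "zbkb", "zbkc", "zbkx"}},
  {"zks", {"zksed", "zksh", "zbkb", "zbkc", "zbkx"}},
};

// constexpr string equality, since std::strcmp is not constexpr.
constexpr bool
str_eq (const char *a, const char *b)
{
  while (*a && *a == *b)
    {
      ++a;
      ++b;
    }
  return *a == *b;
}

// True if enabling EXT enables PART, directly or through nested shorthands.
constexpr bool
implies (const char *ext, const char *part)
{
  if (str_eq (ext, part))
    return true;
  for (const Expansion &e : kExpansions)
    if (str_eq (e.shorthand, ext))
      for (const char *p : e.parts)
        if (p != nullptr && implies (p, part))
          return true;
  return false;
}
```

For example, `implies ("zk", "zbkx")` holds via zkn, while `implies ("zkn", "zkt")` does not, which is the kind of detail a documented expansion would make visible to users.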

> +
> +@item zks
> +@tab 1.0
> +@tab ShangMi algorithm suite extension.
> +
> +@item zihintntl
> +@tab 1.0
> +@tab Non-temporal locality hints extension.
> +
> +@item zihintpause
> +@tab 1.0
> +@tab Pause hint extension.
> +
> +@item zicboz
> +@tab 1.0
> +@tab Cache-block zero extension.
> +
> +@item zicbom
> +@tab 1.0
> +@tab Cache-block management extension.
> +
> +@item zicbop
> +@tab 1.0
> +@tab Cache-block prefetch extension.
> +
> +@item ztso
> +@tab 1.0
> +@tab Total store ordering extension.
> +
> +@item zve32x
> +@tab 1.0
> +@tab Vector extensions for embedded processors.
> +
> +@item zve32f
> +@tab 1.0
> +@tab Vector extensions for embedded processors.
> +
> +@item zve64x
> +@tab 1.0
> +@tab Vector extensions for embedded processors.
> +
> +@item zve64f
> +@tab 1.0
> +@tab Vector extensions for embedded processors.
> +
> +@item zve64d
> +@tab 1.0
> +@tab Vector extension

Re: [PATCH] libstdc++: reduce std::variant template instantiation depth

2024-01-15 Thread Jonathan Wakely
On Mon, 15 Jan 2024 at 19:32, Patrick Palka  wrote:
>
> On Sun, Jan 7, 2024 at 3:33 PM Patrick Palka  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> Ping.

Huh, I thought I'd already approved this ... sorry.

OK for trunk, with the -ftemplate-depth test change too.


>
> >
> > -- >8 --
> >
> > The recursively defined constraints on _Variadic_union's user-defined
> > destructor (necessary for maintaining trivial destructibility of the
> > variant iff all of its alternatives are) effectively require a template
> > instantiation depth of 3x the number of variants, with the instantiation
> > stack looking like
> >
> >   ...
> >   _Variadic_union
> >   std::is_trivially_destructible_v<_Variadic_union>
> >   _Variadic_union::~_Variadic_union()
> >   _Variadic_union
> >   ...
> >
> > Ideally the template depth should be ~equal to the number of variants
> > (plus a constant).  Luckily it seems we don't need to compute trivial
> > destructibility of the alternatives at all from _Variadic_union, since
> > its only user _Variant_storage already has that information.  To that
> > end this patch removes these recursive constraints and instead passes
> > this information down from _Variant_storage.  After this patch, the
> > template instantiation depth for 87619.cc is ~270 instead of ~780.
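A minimal sketch of the idea, assuming C++17 and using invented names (`VarUnion`, `Storage`) rather than the libstdc++ internals: the triviality answer is computed once in the storage type and threaded through the recursion as a bool, so each level instantiates exactly one union and never asks whether the next level is trivially destructible. Unlike the patch, which keeps a single union and constrains its destructor with `requires (!__trivially_destructible)`, this sketch swaps in two partial specializations on the bool so that it also compiles without concepts.

```cpp
#include <string>
#include <type_traits>

// Empty pack: end of the recursion.
template<bool Trivial, typename... Ts>
union VarUnion { };

// All alternatives trivially destructible: keep the implicit destructor.
template<typename First, typename... Rest>
union VarUnion<true, First, Rest...>
{
  VarUnion() : rest() { }
  First first;
  VarUnion<true, Rest...> rest;
};

// Otherwise: a user-provided destructor that does nothing.  A real
// variant would destroy the active member from the enclosing storage
// (omitted in this sketch).
template<typename First, typename... Rest>
union VarUnion<false, First, Rest...>
{
  VarUnion() : rest() { }
  ~VarUnion() { }
  First first;
  VarUnion<false, Rest...> rest;
};

// Computes the answer once and passes it down.
template<typename... Ts>
struct Storage
{
  static constexpr bool trivial
    = (std::is_trivially_destructible_v<Ts> && ...);
  VarUnion<trivial, Ts...> u;
};
```

The specialization approach duplicates the member list; the requires-clause used in the patch avoids that duplication at the cost of needing C++20.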
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/variant (__detail::__variant::_Variadic_union):
> > Add bool __trivially_destructible template parameter.
> > (__detail::__variant::_Variadic_union::~_Variadic_union):
> > Use __trivially_destructible in constraints instead.
> > (_Variant_storage): Pass __trivially_destructible value to
> > _Variadic_union.
> > ---
> >  libstdc++-v3/include/std/variant | 15 +++
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
> > index 20a76c8aa87..4b9002e0917 100644
> > --- a/libstdc++-v3/include/std/variant
> > +++ b/libstdc++-v3/include/std/variant
> > @@ -392,7 +392,7 @@ namespace __variant
> >  };
> >
> >// Defines members and ctors.
> > -  template
> > +  template
> >  union _Variadic_union
> >  {
> >_Variadic_union() = default;
> > @@ -401,8 +401,8 @@ namespace __variant
> > _Variadic_union(in_place_index_t<_Np>, _Args&&...) = delete;
> >  };
> >
> > -  template
> > -union _Variadic_union<_First, _Rest...>
> > +  template > _Rest>
> > +union _Variadic_union<__trivially_destructible, _First, _Rest...>
> >  {
> >constexpr _Variadic_union() : _M_rest() { }
> >
> > @@ -427,13 +427,12 @@ namespace __variant
> >~_Variadic_union() = default;
> >
> >constexpr ~_Variadic_union()
> > -   requires (!is_trivially_destructible_v<_First>)
> > - || (!is_trivially_destructible_v<_Variadic_union<_Rest...>>)
> > +   requires (!__trivially_destructible)
> >{ }
> >  #endif
> >
> >_Uninitialized<_First> _M_first;
> > -  _Variadic_union<_Rest...> _M_rest;
> > +  _Variadic_union<__trivially_destructible, _Rest...> _M_rest;
> >  };
> >
> >// _Never_valueless_alt is true for variant alternatives that can
> > @@ -514,7 +513,7 @@ namespace __variant
> > return this->_M_index != __index_type(variant_npos);
> >}
> >
> > -  _Variadic_union<_Types...> _M_u;
> > +  _Variadic_union _M_u;
> >using __index_type = __select_index<_Types...>;
> >__index_type _M_index;
> >  };
> > @@ -552,7 +551,7 @@ namespace __variant
> > return this->_M_index != static_cast<__index_type>(variant_npos);
> >}
> >
> > -  _Variadic_union<_Types...> _M_u;
> > +  _Variadic_union _M_u;
> >using __index_type = __select_index<_Types...>;
> >__index_type _M_index;
> >  };
> > --
> > 2.43.0.254.ga26002b628
> >
>



[PATCH v3] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

2024-01-15 Thread Jonathan Wakely
I think I'm happy with this now. It has tests for all the new functions,
and the performance of the charset alias match algorithm is improved by
reusing part of .

Tested x86_64-linux.

-- >8 --

This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.

libstdc++-v3/ChangeLog:

PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
---
 libstdc++-v3/acinclude.m4 |  30 +-
 libstdc++-v3/config.h.in  |   3 +
 libstdc++-v3/configure|  70 +-
 libstdc++-v3/configure.ac |   3 +
 libstdc++-v3/include/Makefile.am  |   2 +
 libstdc++-v3/include/Makefile.in  |   2 +
 libstdc++-v3/include/bits/locale_classes.h|  14 +
 .../include/bits/text_encoding-data.h | 902 ++
 libstdc++-v3/include/bits/unicode.h   |  53 +-
 libstdc++-v3/include/bits/version.def |  10 +
 libstdc++-v3/include/bits/version.h   |  13 +-
 libstdc++-v3/include/std/text_encoding| 704 ++
 libstdc++-v3/python/libstdcxx/v6/printers.py  |  17 +
 .../scripts/gen_text_encoding_data.py |  70 ++
 libstdc++-v3/src/Makefile.am  |   3 +-
 libstdc++-v3/src/Makefile.in  |   7 +-
 libstdc++-v3/src/c++26/Makefile.am| 109 +++
 libstdc++-v3/src/c++26/Makefile.in| 747 +++
 libstdc++-v3/src/c++26/text_encoding.cc   |  91 ++
 libstdc++-v3/src/experimental/Makefile.am |   2 +
 libstdc++-v3/src/experimental/Makefile.in |   2 +
 .../testsuite/22_locale/locale/encoding.cc|  36 +
 .../ext/unicode/charset_alias_match.cc|  18 +
 .../testsuite/std/text_encoding/cons.cc   | 113 +++
 .../testsuite/std/text_encoding/members.cc|  41 +
 .../std/text_encoding/requirements.cc |  31 +
 26 files changed, 3083 insertions(+), 10 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/text_encoding-data.h
 create mode 100644 libstdc++-v3/include/std/text_encoding
 create mode 100755 libstdc++-v3/scripts/gen_text_encoding_data.py
 create mode 100644 libstdc++-v3/src/c++26/Makefile.am
 create mode 100644 libstdc++-v3/src/c++26/Makefile.in
 create mode 100644 libstdc++-v3/src/c++26/text_encoding.cc
 create mode 100644 libstdc++-v3/testsuite/22_locale/locale/encoding.cc
 create mode 100644 libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc
 create mode 100644 libstdc++-v3/testsuite/std/text_encoding/cons.cc
 create mode 100644 libstdc++-v3/testsuite/std/text_encoding/members.cc
 create mode 100644 libstdc++-v3/testsuite/std/text_encoding/requirements.cc

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e7cbf0fcf96..f9ba7ef744b 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
   # Keep these sync'd with the list in Makefile.am.  The first provides an
   # expandable list at autoconf time; the second provides an expandable list
   # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/c++23 src/filesystem src/libbacktrace src/experimental doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/c++23 src/c++26 src/filesystem src/

Re: [PATCH] c++: address of NTTP object as targ [PR113242]

2024-01-15 Thread Jason Merrill

On 1/5/24 11:50, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk and perhaps 13?

-- >8 --

invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.


Hmm, I suppose so; https://eel.is/c++draft/temp#param-8 saying "No two 
template parameter objects are template-argument-equivalent" suggests 
there can be only one.  And clang/msvc allow it.



PR c++/113242

gcc/cp/ChangeLog:

* pt.cc (invalid_tparm_referent_p) : Suppress
DECL_ARTIFICIAL rejection test for class NTTP objects.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class61.C: New test.
---
  gcc/cp/pt.cc |  3 ++-
  gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
  2 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 154ac76cb65..8c7d178328d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7219,7 +7219,8 @@ invalid_tparm_referent_p (tree type, tree expr, tsubst_flags_t complain)
   * a string literal (5.13.5),
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+   else if (VAR_P (decl) && !DECL_NTTP_OBJECT_P (decl)
+&& DECL_ARTIFICIAL (decl))


If now some artificial variables are OK and others are not, perhaps we 
should enumerate them either way and abort if it's one we haven't 
specifically considered.


Jason



Re: [PATCH] c++: non-dep array list-init w/ non-triv dtor [PR109899]

2024-01-15 Thread Jason Merrill

On 1/8/24 13:40, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13/12?


OK.


-- >8 --

The get_target_expr call added in r12-7069-g119cea98f66476 causes us
for the below testcase to call build_vec_delete in a template context,
which builds a templated destructor call and checks expr_noexcept_p for
it, which ICEs because the call has templated form.  Much of the work
of build_vec_delete however is code generation and thus will just get
thrown away in a template context, including this expr_noexcept_p check
and the code generation guarded by it.  So this patch narrowly fixes this
ICE by assuming the expr_noexcept_p call returns true in a template
context.

PR c++/109899

gcc/cp/ChangeLog:

* init.cc (build_vec_delete_1): Assume expr_noexcept_p is true
in a template context.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-array21.C: New test.
---
  gcc/cp/init.cc|  3 ++-
  gcc/testsuite/g++.dg/cpp0x/initlist-array21.C | 12 
  2 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-array21.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 09584719ee6..aa0a35a3885 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4155,7 +4155,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
  
/* If one destructor throws, keep trying to clean up the rest, unless we're
   already in a build_vec_init cleanup.  */
-  if (flag_exceptions && !in_cleanup && !expr_noexcept_p (tmp, tf_none))
+  if (flag_exceptions && !in_cleanup && !processing_template_decl
+  && !expr_noexcept_p (tmp, tf_none))
  {
loop = build2 (TRY_CATCH_EXPR, void_type_node, loop,
 unshare_expr (loop));
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
new file mode 100644
index 000..5e37e3de62a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
@@ -0,0 +1,12 @@
+// PR c++/109899
+// { dg-do compile { target c++11 } }
+
+struct A { A(); ~A(); };
+
+template 
+using array = T[42];
+
+template
+void f() {
+  array{};
+}




Re: [PATCH] c++: Fix ENABLE_SCOPE_CHECKING printing

2024-01-15 Thread Jason Merrill

On 1/15/24 04:41, Nathaniel Shead wrote:

While working on another bug, I noticed the ENABLE_SCOPE_CHECKING macro
and thought to try it out. It caused selftest to ICE. This patch is a
minimal fix to get it working again.

This should probably have a test to stop it regressing again the next
time new scope kinds are added, but given that it depends on an (almost
certainly rarely used) build-time macro, I'm not sure exactly how you
would do that.

Or alternatively I could add a `sk_count` to the end of the scope kind
list and `static_assert` that the size of the descriptor list matches?


That sounds good.
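A minimal sketch of the sk_count + static_assert idea, using a hypothetical, shortened enum and name table rather than GCC's real scope_kind and cp_binding_level_descriptor: the sentinel stays last, so adding a scope kind without a matching name (or vice versa) becomes a compile-time error instead of a selftest ICE.

```cpp
#include <cstddef>

// Hypothetical, shortened scope-kind enum; sk_count must stay last.
enum scope_kind
{
  sk_block,
  sk_cleanup,
  sk_try,
  sk_catch,
  sk_count
};

// One printable name per scope kind, in enum order.
static constexpr const char *scope_kind_names[] = {
  "block-scope",
  "cleanup-scope",
  "try-scope",
  "catch-scope",
};

// Breaks the build if the enum and the name table drift apart.
static_assert (sizeof scope_kind_names / sizeof scope_kind_names[0]
               == static_cast<std::size_t> (sk_count),
               "scope_kind_names is out of sync with scope_kind");

constexpr const char *
scope_kind_name (scope_kind kind)
{
  return scope_kind_names[kind];
}
```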


(Also not sure if this would be appropriate for stage 4 or if it should
wait till next stage 1. I suppose this fixes a regression but I suspect
this has been broken for a very long time.)


I think it's OK now since it doesn't affect the normal codepath.


-- >8 --

The lists of scope kinds used by ENABLE_SCOPE_CHECKING don't seem to
have been updated in a long while, causing ICEs and confusing output.
This patch brings the list into line.

gcc/cp/ChangeLog:

* name-lookup.cc (cp_binding_level_descriptor): Add missing
scope kinds.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index d827d337d3b..2e93ed183f1 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4464,11 +4464,16 @@ cp_binding_level_descriptor (cp_binding_level *scope)
  "try-scope",
  "catch-scope",
  "for-scope",
+"cond-init-scope",
+"stmt-expr-scope",
  "function-parameter-scope",
  "class-scope",
+"enum-scope",
  "namespace-scope",
  "template-parameter-scope",
-"template-explicit-spec-scope"
+"template-explicit-spec-scope",
+"transaction-scope",
+"openmp-scope"
};
const scope_kind kind = scope->explicit_spec_p
  ? sk_template_spec : scope->kind;




Re: [commit] MIPS: Add ATTRIBUTE_UNUSED to mips_start_function_definition

2024-01-15 Thread rep . dot . nop
On 11 January 2024 10:59:21 CET, YunQiang Su  wrote:
>Fix build warning:
>  mips.cc: warning: unused parameter 'decl'.
>
>gcc
>   * config/mips/mips.cc (mips_start_function_definition):
>   Add ATTRIBUTE_UNUSED.
>---
> gcc/config/mips/mips.cc | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
>index 60b336e43d0..e752019b5e2 100644
>--- a/gcc/config/mips/mips.cc
>+++ b/gcc/config/mips/mips.cc
>@@ -7330,7 +7330,8 @@ mips_start_unique_function (const char *name)
>function contains MIPS16 code.  */
> 
> static void
>-mips_start_function_definition (const char *name, bool mips16_p, tree decl)
>+mips_start_function_definition (const char *name, bool mips16_p,
>+  tree decl ATTRIBUTE_UNUSED)

Nowadays in C++ you can just remove the identifier name:

+mips_start_function_definition (const char *name, bool mips16_p, tree)

> {
>   if (mips16_p)
> fprintf (asm_out_file, "\t.set\tmips16\n");



Ping^2: [PATCH] toplevel: don't override gettext-runtime/configure-discovered build args

2024-01-15 Thread Arsen Arsenović
Evening folks,

Hope you had wonderful holidays.

Gentle ping on this patch.

Have a lovely night!
--
Arsen Arsenović




[PATCH] c++: ICE with auto in template arg [PR110065]

2024-01-15 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

Here we started crashing with r14-1659 because that removed the
auto checking in cp_parser_template_type_arg which seemed like
dead code.  But the attached test shows that the code can still
be reached because cp_parser_type_id_1 checks auto only when
auto_is_implicit_function_template_parm_p is on.

Then I noticed that we're still crashing in C++20, and that ICE
started with r12-4772.  So I changed the reemerged check to use
flag_concepts_ts rather than flag_concepts on the basis that
check_auto_in_tmpl_args also checks flag_concepts_ts.

PR c++/110065

gcc/cp/ChangeLog:

* parser.cc (cp_parser_template_type_arg): Add auto checking.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/auto8.C: New test.
* g++.dg/concepts/auto8a.C: New test.
---
 gcc/cp/parser.cc   | 12 ++--
 gcc/testsuite/g++.dg/concepts/auto8.C  | 17 +
 gcc/testsuite/g++.dg/concepts/auto8a.C | 18 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/auto8.C
 create mode 100644 gcc/testsuite/g++.dg/concepts/auto8a.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 8ab98cc0c23..e92309b8960 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25063,12 +25063,20 @@ cp_parser_type_id (cp_parser *parser, cp_parser_flags flags,
 static tree
 cp_parser_template_type_arg (cp_parser *parser)
 {
-  tree r;
   const char *saved_message = parser->type_definition_forbidden_message;
   parser->type_definition_forbidden_message
 = G_("types may not be defined in template arguments");
-  r = cp_parser_type_id_1 (parser, CP_PARSER_FLAGS_NONE, true, false, NULL);
+  tree r = cp_parser_type_id_1 (parser, CP_PARSER_FLAGS_NONE,
+   /*is_template_arg=*/true,
+   /*is_trailing_return=*/false, nullptr);
   parser->type_definition_forbidden_message = saved_message;
+  /* cp_parser_type_id_1 checks for auto, but only for
+ ->auto_is_implicit_function_template_parm_p.  */
+  if (cxx_dialect >= cxx14 && !flag_concepts_ts && type_uses_auto (r))
+{
+  error ("invalid use of % in template argument");
+  r = error_mark_node;
+}
   return r;
 }
 
diff --git a/gcc/testsuite/g++.dg/concepts/auto8.C b/gcc/testsuite/g++.dg/concepts/auto8.C
new file mode 100644
index 000..f9d98b2ec0f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/auto8.C
@@ -0,0 +1,17 @@
+// PR c++/110065
+// { dg-do compile { target c++17 } }
+
+template 
+inline constexpr bool t = false;
+
+int
+f ()
+{
+  return t auto&>; // { dg-error "template argument" }
+}
+
+void
+g ()
+{
+  t auto&>; // { dg-error "template argument" }
+}
diff --git a/gcc/testsuite/g++.dg/concepts/auto8a.C b/gcc/testsuite/g++.dg/concepts/auto8a.C
new file mode 100644
index 000..fc60dc871c2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/auto8a.C
@@ -0,0 +1,18 @@
+// PR c++/110065
+// { dg-do compile { target c++17 } }
+// { dg-additional-options -fconcepts-ts }
+
+template 
+inline constexpr bool t = false;
+
+int
+f ()
+{
+  return t auto&>; // { dg-error "template argument" }
+}
+
+void
+g ()
+{
+  t auto&>; // { dg-error "template argument" }
+}

base-commit: 731444b3c39e3dc3dd8778f430a38742861dcca1
-- 
2.43.0



Re: [PATCH] c++: ICE with auto in template arg [PR110065]

2024-01-15 Thread Jason Merrill

On 1/15/24 17:14, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.






[COMMITTED] Add myself to the DCO section

2024-01-15 Thread Andrew Pinski
It is time to add myself to the DCO section for my quicinc email account.

ChangeLog:

* MAINTAINERS (DCO): Add myself.

Signed-off-by: Andrew Pinski 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 882694cc47d..cb5a42501dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -766,6 +766,7 @@ Jeff Law
 Jeff Law   
 Immad Mir  
 Gaius Mulley   
+Andrew Pinski  
 Siddhesh Poyarekar 
 Navid Rahimi   
 Rishi Raj  

-- 
2.39.3



Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
Hello Richard:

On 15/01/24 6:25 pm, Ajit Agarwal wrote:
> 
> 
> On 15/01/24 6:14 pm, Ajit Agarwal wrote:
>> Hello Richard:
>>
>> On 15/01/24 3:03 pm, Richard Biener wrote:
>>> On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal  wrote:

>>>> Hello All:
>>>>
>>>> This patch adds the vecload pass to replace adjacent memory accesses
>>>> lxv with lxvp instructions.  This pass is added before the ira pass.
>>>>
>>>> The vecload pass removes one of the defined adjacent lxv (load)
>>>> instructions and replaces the pair with lxvp.  Due to the removal of
>>>> one of the defining loads, the allocno has only uses but no defs.
>>>>
>>>> Because of this, the IRA pass doesn't assign the pair sequential
>>>> registers.  Changes are made in the IRA register allocator to assign
>>>> sequential registers to adjacent loads.
>>>>
>>>> Some of the registers are cleared and are not set as profitable
>>>> registers because a zero cost is greater than negative costs, and
>>>> checks are added to compare positive costs.
>>>>
>>>> LRA is changed not to reassign them to different registers, so the
>>>> sequential register pairs stay intact.
>>>>
>>>>
>>>> contrib/check_GNU_style.sh run on patch looks good.
>>>>
>>>> Bootstrapped and regtested for powerpc64-linux-gnu.
>>>>
>>>> Spec2017 benchmarks are run and I get impressive benefits for some of
>>>> the FP benchmarks.
>>>
>>> I want to point out the aarch64 target recently got a ld/st fusion
>>> pass which sounds related.  It would be nice to have at least common
>>> infrastructure for this (the aarch64 one also looks quite more powerful)
>>
>> The load/store fusion pass in aarch64 is scheduled before the peephole2
>> pass and after the register allocator pass.  In our case, if we run it
>> after the register allocator, then we should keep the register assigned
>> to the lower-offset load, and the other load, which is adjacent to the
>> previous load with an offset difference of 16, is removed.
>>
>> Then we are left with one load with the lower offset, and the register
>> the register allocator assigned to that lower-offset load should be
>> lower than the one assigned to the other adjacent load.  If not, we
>> need to change it to the lower register and propagate it to all the
>> uses of the variable.  Similarly, for the other adjacent load that we
>> are removing, its register needs to be propagated to all the uses.
>>
>> In that case we are doing the work of the register allocator.  In most
>> of our example testcases, the register allocator assigns the
>> lower-offset load a greater register than the other adjacent load, so
>> we are always left propagating registers and almost redoing the
>> register allocator's work.
>>
>> Is it okay to use a load/store fusion pass like the one on aarch64 for
>> our cases, considering the above scenario?
>>
>> Please let me know what you think.

I have gone through the implementation of ld/st fusion in aarch64.

Here is my understanding:

First of all, my earlier mail was incomplete: I said that this pass
runs before peephole2, after the RA pass.

It also runs early, before the RA pass (before early-remat), in
addition to running after RA, before peephole2.

The pass fuses two ldr instructions with adjacent accesses into one
ldp instruction.

The assembly syntax of the ldp instruction is

ldp w3, w7, [x0]

It loads [x0] into w3 and [x0+4] into w7.

Both registers that form the pair are named in the ldp instruction,
so they need not be sequential: if the first register is w3, the
second does not have to be w3+1.

That's why running the pass before RA works: both defs are still
present, and the registers are not required to be first_reg and
first_reg+1.  Any valid registers will do.


But for the lxvp instruction:

lxv vs32, 0(r2)
lxv vs45, 16(r2)

When we combine the above lxv instructions into lxvp, the lxvp
instruction becomes

lxvp vs32, 0(r2)

where lxvp loads r2+0 into vs32 and r2+16 into vs33 (sequential
registers).  vs33 is implicit in the lxvp instruction.  This is a
mandatory requirement of lxvp and no other ordering is possible: the
two registers must differ by exactly 1.

All the uses of vs45 then have to be replaced with vs33.
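To make the contrast between the two pairing rules concrete, here is a minimal C++ sketch that models each load as (destination register, base register, offset). The `VecLoad` struct and both predicates are hypothetical illustrations of the constraints described in this mail, not code from the aarch64 pass or the proposed rs6000 pass, and the lxvp rule deliberately ignores any further ISA constraints on the target register.

```cpp
#include <cstdint>

// Hypothetical model of one vector load: REG <- mem[BASE + OFFSET].
// Register numbers are plain integers (32 for vs32, 2 for r2, ...).
struct VecLoad
{
  int reg;              // destination register
  int base;             // base address register
  std::int64_t offset;  // displacement in bytes
};

// ldp-style pairing: same base, offsets exactly ACCESS_SIZE apart.
// Both destination registers are encoded in the instruction, so any
// two registers are acceptable.
constexpr bool
can_pair_like_ldp (VecLoad lo, VecLoad hi, std::int64_t access_size)
{
  return lo.base == hi.base && hi.offset - lo.offset == access_size;
}

// lxvp-style pairing as described above: same base, offsets 16 bytes
// apart, and only the first register is encoded, so the load from the
// higher address must already target LO.reg + 1.
constexpr bool
can_pair_like_lxvp (VecLoad lo, VecLoad hi)
{
  return (lo.base == hi.base
          && hi.offset - lo.offset == 16
          && hi.reg == lo.reg + 1);
}

// The allocation from the example: lxv vs32, 0(r2) / lxv vs45, 16(r2).
constexpr VecLoad example_lo{32, 2, 0};
constexpr VecLoad example_hi{45, 2, 16};
static_assert (can_pair_like_ldp (example_lo, example_hi, 16),
               "an ldp-style rule would accept this pair");
static_assert (!can_pair_like_lxvp (example_lo, example_hi),
               "lxvp needs the second load in vs33");
```

The only difference between the two predicates is the final `hi.reg == lo.reg + 1` test, which is the constraint the register allocator has to satisfy for lxvp.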

The register allocator can also assign the two lxv instructions the
following registers:

lxv vs33, 0(r2)
lxv vs32, 16(r2)

To generate lxvp for the above lxv instructions,

lxvp vs32, 0(r2)

every use of vs33 has to be replaced with vs32 and every use of vs32
with vs33, if we run the vecload pass after the RA pass.
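The renaming needed in both examples can be sketched the same way. Assuming, as both examples do, that the lxvp target is the lower-numbered of the two registers, the hypothetical `plan_lxvp` below lists which registers would have to be rewritten after RA. It ignores the harder question of whether the target registers are actually free, which is precisely the register allocator's job.

```cpp
// Hypothetical sketch: registers that must be rewritten after RA so that
// one "lxvp P, lo_offset(base)" can replace two adjacent lxv loads.
// LO_REG is the destination of the lower-offset load, HI_REG that of the
// load 16 bytes higher.  P is the lower-numbered of the two registers.
struct Rename
{
  int from;
  int to;
};

struct PairPlan
{
  int pair_base;      // P, the register named in the lxvp
  Rename renames[2];  // required rewrites, lower-offset load first
  int count;          // number of valid entries in RENAMES
};

constexpr PairPlan
plan_lxvp (int lo_reg, int hi_reg)
{
  const int p = lo_reg < hi_reg ? lo_reg : hi_reg;
  PairPlan plan{p, {{0, 0}, {0, 0}}, 0};
  // The value loaded from the lower address must end up in P ...
  if (lo_reg != p)
    plan.renames[plan.count++] = Rename{lo_reg, p};
  // ... and the value from the higher address in P + 1.
  if (hi_reg != p + 1)
    plan.renames[plan.count++] = Rename{hi_reg, p + 1};
  return plan;
}
```

For the first allocation (vs32, vs45) this reports a single rename, 45 to 33; for the swapped allocation (vs33, vs32) it reports two renames, 33 to 32 and 32 to 33, i.e. the swap described above.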

If we run it before the RA pass, IRA and LRA do not necessarily assign
registers that differ by exactly 1; the difference can be any positive
value.

For example, IRA allocated one load to vs32 and the other to vs45.

In the vecload pass we remove one of the two lxv instructions: the 2nd
lxv, with offset 16, is removed, so the register it defined has uses
but no defs, and IRA cannot allocate the pair in order with a
difference of 1.

That's why we need to make changes in the IRA and LRA passes to assign
registers with a differen

Re: [PATCH v3] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

2024-01-15 Thread Ulrich Drepper
On Mon, Jan 15, 2024 at 9:45 PM Jonathan Wakely  wrote:
> I think I'm happy with this now. It has tests for all the new functions,
> and the performance of the charset alias match algorithm is improved by
> reusing part of .
>
> Tested x86_64-linux.

Looks good to me.  Good work, Jon.
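For readers unfamiliar with the "charset alias match algorithm" mentioned in the quoted text: as far as I know, P1885 compares encoding names using the loose matching rules of Unicode Technical Standard #22 (section 1.4). The following is a standalone C++17 sketch of those rules as I understand them; it is not libstdc++'s `__charset_alias_match`, and the names in namespace `sketch` are made up.

```cpp
#include <cstddef>
#include <string_view>

// Loose charset-name matching, as I understand UTS #22 section 1.4:
//   1. ignore every character other than A-Z, a-z, 0-9;
//   2. compare letters case-insensitively;
//   3. drop each '0' that is not preceded by a digit.
namespace sketch {

constexpr bool
is_digit (char c)
{
  return c >= '0' && c <= '9';
}

constexpr bool
is_alnum (char c)
{
  return is_digit (c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

constexpr char
to_lower (char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<char> (c - 'A' + 'a') : c;
}

// Return the next character of S (from index I) that survives the rules,
// lower-cased, or '\0' at the end of S.  PREV_DIGIT records whether the
// last character kept was a digit, which decides whether a '0' is dropped.
constexpr char
next_normalized (std::string_view s, std::size_t &i, bool &prev_digit)
{
  while (i < s.size ())
    {
      const char c = s[i++];
      if (!is_alnum (c))
        continue;                 // rule 1
      if (c == '0' && !prev_digit)
        continue;                 // rule 3
      prev_digit = is_digit (c);
      return to_lower (c);        // rule 2
    }
  return '\0';
}

// True if A and B name the same charset under the loose rules.
constexpr bool
alias_match (std::string_view a, std::string_view b)
{
  std::size_t i = 0, j = 0;
  bool a_digit = false, b_digit = false;
  for (;;)
    {
      const char ca = next_normalized (a, i, a_digit);
      const char cb = next_normalized (b, j, b_digit);
      if (ca != cb)
        return false;
      if (ca == '\0')
        return true;
    }
}

} // namespace sketch
```

Comparing the two names lazily, one normalized character at a time, avoids building normalized copies of either string.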


Re: [PATCH] c++/modules: Support thread_local statics in header modules [PR113292]

2024-01-15 Thread Jason Merrill

On 1/11/24 01:12, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk?

-- >8 --

Currently, thread_locals in header modules cause ICEs. This patch makes
the required changes for them to work successfully.

Functions exported by a module need DECL_CONTEXT to be set, so we
inherit it from the variable we're wrapping.

We additionally require writing the DECL_TLS_MODEL for thread-local
variables to the module interface, and the TLS wrapper function needs to
have its DECL_BEFRIENDING_CLASSES written too as this is used to
retrieve what VAR_DECL it's a wrapper for when emitting a definition at
end of TU processing.

PR c++/113292

gcc/cp/ChangeLog:
* decl2.cc (get_tls_wrapper_fn): Set DECL_CONTEXT.
(c_parse_final_cleanups): Suppress warning for no definition of
TLS wrapper functions in header modules.
* module.cc (trees_out::lang_decl_vals): Write wrapped variable
for TLS wrapper functions.
(trees_in::lang_decl_vals): Read it.
(trees_out::decl_value): Write TLS model for thread-local vars.
(trees_in::decl_value): Read it for new decls. Remember to emit
definitions of TLS wrapper functions later.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr113292_a.H: New test.
* g++.dg/modules/pr113292_b.C: New test.
* g++.dg/modules/pr113292_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl2.cc   | 10 ---
  gcc/cp/module.cc  | 22 +++
  gcc/testsuite/g++.dg/modules/pr113292_a.H | 34 +++
  gcc/testsuite/g++.dg/modules/pr113292_b.C | 13 +
  gcc/testsuite/g++.dg/modules/pr113292_c.C | 11 
  5 files changed, 86 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr113292_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr113292_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr113292_c.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index fb996561f1b..ab348f8ecb7 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -3860,6 +3860,7 @@ get_tls_wrapper_fn (tree var)
TREE_PUBLIC (fn) = TREE_PUBLIC (var);
DECL_ARTIFICIAL (fn) = true;
DECL_IGNORED_P (fn) = 1;
+  DECL_CONTEXT (fn) = DECL_CONTEXT (var);
/* The wrapper is inline and emitted everywhere var is used.  */
DECL_DECLARED_INLINE_P (fn) = true;
if (TREE_PUBLIC (var))
@@ -5289,10 +5290,11 @@ c_parse_final_cleanups (void)
 #pragma interface, etc.) we decided not to emit the
 definition here.  */
  && !DECL_INITIAL (decl)
- /* A defaulted fn in a header module can be synthesized on
-demand later.  (In non-header modules we should have
-synthesized it above.)  */
- && !(DECL_DEFAULTED_FN (decl) && header_module_p ())
+ /* A defaulted fn or TLS wrapper in a header module can be
+synthesized on demand later.  (In non-header modules we
+should have synthesized it above.)  */
+ && !(header_module_p ()


Hmm, should this be !module_attach_p instead of header_module_p?

The patch is OK, that can change separately if appropriate.

Jason



Re: [PATCH] c-family: copy attribute diagnostic fixes [PR113262]

2024-01-15 Thread Jason Merrill

On 1/9/24 03:52, Jakub Jelinek wrote:

Hi!

The copy attribute is allowed on decls as well as types and even has
checks whether decl (set to *node) is DECL_P or TYPE_P, but for diagnostics
it unconditionally uses DECL_SOURCE_LOCATION (decl), which obviously only
works if it applies to a decl.


In the C++ front end, location_of checks whether the type has a 
TYPE_MAIN_DECL to get a location from; you might do that if !DECL_P?  OK 
either way.
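A minimal model of the location choice, using hypothetical stand-ins for the tree accessors: the patch as posted falls back to input_location whenever the node is not a decl, while the suggestion above would first try the location of the type's TYPE_MAIN_DECL. `Node`, `diagnostic_location` and the integer `Location` type are illustrations only, not GCC API.

```cpp
#include <optional>

// Hypothetical stand-ins: an integer "location" and a node that is either
// a declaration or a type.  These are not GCC's tree accessors.
using Location = int;
constexpr Location input_location_stub = -1;  // plays input_location

struct Node
{
  bool is_decl;                           // DECL_P (node)
  Location decl_loc;                      // DECL_SOURCE_LOCATION, decls only
  std::optional<Location> main_decl_loc;  // TYPE_MAIN_DECL location, if any
};

// USE_MAIN_DECL == false models the patch as posted; true models the
// TYPE_MAIN_DECL fallback suggested in the review.
constexpr Location
diagnostic_location (const Node &n, bool use_main_decl)
{
  if (n.is_decl)
    return n.decl_loc;
  if (use_main_decl && n.main_decl_loc)
    return *n.main_decl_loc;
  return input_location_stub;
}
```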



The following patch fixes that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2024-01-09  Jakub Jelinek  

PR c/113262
* c-attribs.cc (handle_copy_attribute): Don't use
DECL_SOURCE_LOCATION (decl) if decl is not DECL_P, use input_location
instead.  Formatting fixes.

* gcc.dg/pr113262.c: New test.

--- gcc/c-family/c-attribs.cc.jj	2024-01-03 12:07:02.020736256 +0100
+++ gcc/c-family/c-attribs.cc   2024-01-08 22:10:04.789616664 +0100
@@ -3143,13 +3143,14 @@ handle_copy_attribute (tree *node, tree
if (ref == error_mark_node)
  return NULL_TREE;
  
+  location_t loc = input_location;
+  if (DECL_P (decl))
+    loc = DECL_SOURCE_LOCATION (decl);
if (TREE_CODE (ref) == STRING_CST)
  {
/* Explicitly handle this case since using a string literal
 as an argument is a likely mistake.  */
-  error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute argument cannot be a string",
-   name);
+  error_at (loc, "%qE attribute argument cannot be a string", name);
return NULL_TREE;
  }
  
@@ -3160,10 +3161,8 @@ handle_copy_attribute (tree *node, tree
/* Similar to the string case, since some function attributes
 accept literal numbers as arguments (e.g., alloc_size or
 nonnull) using one here is a likely mistake.  */
-  error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute argument cannot be a constant arithmetic "
-   "expression",
-   name);
+  error_at (loc, "%qE attribute argument cannot be a constant arithmetic "
+   "expression", name);
return NULL_TREE;
  }
  
@@ -3171,12 +3170,11 @@ handle_copy_attribute (tree *node, tree
  {
/* Another possible mistake (but indirect self-references aren't
 and diagnosed and shouldn't be).  */
-  if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
+  if (warning_at (loc, OPT_Wattributes,
  "%qE attribute ignored on a redeclaration "
- "of the referenced symbol",
- name))
-   inform (DECL_SOURCE_LOCATION (node[1]),
-   "previous declaration here");
+ "of the referenced symbol", name)
+ && DECL_P (node[1]))
+   inform (DECL_SOURCE_LOCATION (node[1]), "previous declaration here");
return NULL_TREE;
  }
  
@@ -3196,7 +3194,8 @@ handle_copy_attribute (tree *node, tree
ref = TREE_OPERAND (ref, 1);
else
break;
-} while (!DECL_P (ref));
+}
+  while (!DECL_P (ref));
  
/* For object pointer expressions, consider those to be requests
   to copy from their type, such as in:
@@ -3228,8 +3227,7 @@ handle_copy_attribute (tree *node, tree
 to a variable, or variable attributes to a function.  */
  if (warning (OPT_Wattributes,
   "%qE attribute ignored on a declaration of "
-  "a different kind than referenced symbol",
-  name)
+  "a different kind than referenced symbol", name)
  && DECL_P (ref))
inform (DECL_SOURCE_LOCATION (ref),
"symbol %qD referenced by %qD declared here", ref, decl);
@@ -3279,9 +3277,7 @@ handle_copy_attribute (tree *node, tree
  }
else if (!TYPE_P (decl))
  {
-  error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute must apply to a declaration",
-   name);
+  error_at (loc, "%qE attribute must apply to a declaration", name);
return NULL_TREE;
  }
  
--- gcc/testsuite/gcc.dg/pr113262.c.jj	2024-01-08 22:19:07.414588762 +0100
+++ gcc/testsuite/gcc.dg/pr113262.c 2024-01-08 22:18:51.327815573 +0100
@@ -0,0 +1,6 @@
+/* PR c/113262 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int [[gnu::copy ("")]] a;/* { dg-error "'copy' attribute argument cannot be a string" } */
+

Jakub





Re: [PATCH v2] c++/modules: Differentiate extern templates and TYPE_DECL_SUPPRESS_DEBUG [PR112820]

2024-01-15 Thread Jason Merrill

On 1/8/24 10:27, Patrick Palka wrote:

On Mon, 8 Jan 2024, Nathaniel Shead wrote:

On Thu, Jan 04, 2024 at 03:39:15PM -0500, Patrick Palka wrote:

On Sun, 3 Dec 2023, Nathaniel Shead wrote:


The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same
underlying bit. This is causing confusion when attempting to determine
the interface for a streamed-in class type, since the modules code
currently assumes that all DECL_EXTERNAL types are extern templates.
However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence
DECL_EXTERNAL) is marked on various other kinds of declarations, such as
vtables, which causes them to never be emitted.


But a vtable isn't a TYPE_DECL?

I suspect what you mean is that maybe_suppress_debug_info is setting 
TYPE_DECL_SUPPRESS_DEBUG to try to avoid duplication of debug info for 
classes with vtables, and then the modules code is wrongly assuming that 
you can check DECL_EXTERNAL for TYPE_DECL, and that it's set only if 
CLASSTYPE_INTERFACE_ONLY is also set, which is wrong in this case, so we 
avoid emitting the vtable or anything else for that class.


It seems unnecessary to start setting DECL_EXTERNAL on the TYPE_DECL to 
mean the exact same thing as CLASSTYPE_INTERFACE_ONLY.  Rather, the 
modules code should stop trying to check DECL_EXTERNAL on a TYPE_DECL.
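The hazard being discussed, two flag accessors that alias one storage bit, can be shown with a small C++ sketch. The struct and accessor names below are hypothetical stand-ins, not GCC's tree layout; the last function illustrates the suggestion of not consulting the shared bit for a TYPE_DECL, with a plain `interface_only` argument standing in for CLASSTYPE_INTERFACE_ONLY.

```cpp
#include <cstdint>

// Hypothetical sketch of two flag accessors sharing one storage bit, in
// the way described for TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL.
struct DeclFlags
{
  std::uint8_t bits = 0;
};

constexpr std::uint8_t shared_bit = 1u << 0;

constexpr bool
decl_external (const DeclFlags &f)
{
  return (f.bits & shared_bit) != 0;
}

constexpr bool
type_decl_suppress_debug (const DeclFlags &f)
{
  return (f.bits & shared_bit) != 0;  // same bit as decl_external
}

constexpr DeclFlags
with_suppress_debug (DeclFlags f)
{
  f.bits = static_cast<std::uint8_t> (f.bits | shared_bit);
  return f;
}

// Reading the shared bit on a TYPE_DECL mistakes "suppress debug info"
// for "extern template".  Consulting a separate flag for TYPE_DECLs
// (INTERFACE_ONLY stands in for CLASSTYPE_INTERFACE_ONLY) avoids that.
constexpr bool
looks_like_extern_template (bool is_type_decl, const DeclFlags &f,
                            bool interface_only)
{
  return is_type_decl ? interface_only : decl_external (f);
}
```

Setting only the "suppress debug" flag makes `decl_external` report true as well, which is the confusion described in the quoted text.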


Under what circumstances does it make sense for CLASSTYPE_INTERFACE_ONLY 
to be set in the context of modules, anyway?  We probably want to 
propagate it for things in the global module so that various libstdc++ 
explicit instantiations work the same with import std.


For a class imported from a named module, this ties into the earlier 
discussion about vtables and inlines that hasn't resolved yet in the ABI 
committee.  But it's certainly significantly interface-like.  And I 
would expect maybe_suppress_debug_info to suppress the debug info for 
such a class on the assumption that the module unit has the needed debug 
info.


Jason



[pushed] analyzer: casting all zeroes should give all zeroes [PR113333]

2024-01-15 Thread David Malcolm
In particular, accessing the result of *calloc (1, SZ) (if non-NULL)
should be known to be all zeroes.
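The new folding rule can be modelled in a few lines. This is a hypothetical, heavily simplified C++ sketch (none of these types exist in the analyzer): a symbolic value is a constant, an "all zeroes" value such as the contents of a fresh calloc buffer, or unknown, and a cast of an all-zeroes value to an integral or pointer type folds to the constant 0, mirroring the condition added to maybe_fold_unaryop in the hunk below.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, heavily simplified model of a symbolic value.
enum class Kind { Constant, AllZeroes, Unknown };

struct SymValue
{
  Kind kind;
  std::int64_t value;  // meaningful only when kind == Kind::Constant
};

enum class TargetType { Integral, Pointer, Other };

// Return the folded result of casting ARG to TYPE, or std::nullopt if
// this rule does not apply and the cast is left as it is.
constexpr std::optional<SymValue>
maybe_fold_cast (SymValue arg, TargetType type)
{
  if (arg.kind == Kind::AllZeroes
      && (type == TargetType::Integral || type == TargetType::Pointer))
    return SymValue{Kind::Constant, 0};
  return std::nullopt;
}
```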

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-7265-gd235bf2e807c5f.

gcc/analyzer/ChangeLog:
PR analyzer/11
* region-model-manager.cc
(region_model_manager::maybe_fold_unaryop): Casting all zeroes
should give all zeroes.

gcc/testsuite/ChangeLog:
PR analyzer/11
* c-c++-common/analyzer/calloc-1.c: Add tests.
* c-c++-common/analyzer/pr96639.c: Update expected results.
* gcc.dg/analyzer/data-model-9.c: Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-manager.cc  |  6 
 .../c-c++-common/analyzer/calloc-1.c  | 34 +++
 gcc/testsuite/c-c++-common/analyzer/pr96639.c |  2 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-9.c  |  6 ++--
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index fc3523f8815c..62f808a81c20 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -457,6 +457,12 @@ region_model_manager::maybe_fold_unaryop (tree type, enum tree_code op,
  && region_sval->get_type ()
  && POINTER_TYPE_P (region_sval->get_type ()))
return get_ptr_svalue (type, region_sval->get_pointee ());
+
+   /* Casting all zeroes should give all zeroes.  */
+   if (type
+   && arg->all_zeroes_p ()
+   && (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type)))
+ return get_or_create_int_cst (type, 0);
   }
   break;
 case TRUTH_NOT_EXPR:
diff --git a/gcc/testsuite/c-c++-common/analyzer/calloc-1.c b/gcc/testsuite/c-c++-common/analyzer/calloc-1.c
index 6bd658ec94a4..cb93fa8987f0 100644
--- a/gcc/testsuite/c-c++-common/analyzer/calloc-1.c
+++ b/gcc/testsuite/c-c++-common/analyzer/calloc-1.c
@@ -22,3 +22,37 @@ char *test_1 (size_t sz)
 
   return p;
 }
+
+char **
+test_pr11_1 (void)
+{
+  char **p = (char **)calloc (1, sizeof(char *));
+  if (p)
+{
+  __analyzer_eval (*p == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p[0] == 0); /* { dg-warning "TRUE" } */
+}
+  return p;
+}
+
+char **
+test_pr11_2 (void)
+{
+  char **p = (char **)calloc (2, sizeof(char *));
+  if (p)
+{
+  __analyzer_eval (*p == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p[0] == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p[1] == 0); /* { dg-warning "TRUE" } */
+}
+  return p;
+}
+
+char **
+test_pr11_3 (void)
+{
+  char **vec = (char **)calloc (1, sizeof(char *));
+  if (vec)
+    for (char **p=vec ; *p ; p++); /* { dg-bogus "heap-based buffer over-read" } */
+  return vec;
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/pr96639.c b/gcc/testsuite/c-c++-common/analyzer/pr96639.c
index b95217df6c41..2610ce8d602a 100644
--- a/gcc/testsuite/c-c++-common/analyzer/pr96639.c
+++ b/gcc/testsuite/c-c++-common/analyzer/pr96639.c
@@ -6,5 +6,5 @@ x7 (void)
   int **md = (int **) calloc (1, sizeof (void *));
 
   return md[0][0]; /* { dg-warning "possibly-NULL" "unchecked deref" } */
-  /* { dg-warning "leak of 'md'" "leak" { target *-*-* } .-1 } */
+  /* { dg-warning "Wanalyzer-null-dereference" "deref of NULL" { target *-*-* } .-1 } */
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/data-model-9.c b/gcc/testsuite/gcc.dg/analyzer/data-model-9.c
index 159bc612576c..2121f20c4f02 100644
--- a/gcc/testsuite/gcc.dg/analyzer/data-model-9.c
+++ b/gcc/testsuite/gcc.dg/analyzer/data-model-9.c
@@ -14,8 +14,7 @@ void test_1 (void)
   struct foo *f = calloc (1, sizeof (struct foo));
   if (f == NULL)
 return;
-  __analyzer_eval (f->i == 0); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-  /* { dg-bogus "UNKNOWN" "status quo" { xfail *-*-* } .-1 } */
+  __analyzer_eval (f->i == 0); /* { dg-warning "TRUE" } */
   free (f);
 }
 
@@ -27,7 +26,6 @@ void test_2 (void)
   if (f == NULL)
 return;
   memset (f, 0, sizeof (struct foo));
-  __analyzer_eval (f->i == 0); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-  /* { dg-bogus "UNKNOWN" "status quo" { xfail *-*-* } .-1 } */
+  __analyzer_eval (f->i == 0); /* { dg-warning "TRUE" } */
   free (f);
 }
-- 
2.26.3



[pushed] analyzer: fix false +ves from -Wanalyzer-tainted-array-index with unsigned char index [PR106229]

2024-01-15 Thread David Malcolm
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-7266-gce27b66d952127.

gcc/analyzer/ChangeLog:
PR analyzer/106229
* analyzer.h (compare_constants): New decl.
* constraint-manager.cc (compare_constants): Make non-static.
* sm-taint.cc: Add include "fold-const.h".
(class concrete_range): New.
(get_possible_range): New.
(index_can_be_out_of_bounds_p): New.
(region_model::check_region_for_taint): Reject
-Wanalyzer-tainted-array-index if the type of the value makes it
impossible for it to be out-of-bounds of the array.

gcc/testsuite/ChangeLog:
PR analyzer/106229
* c-c++-common/analyzer/taint-index-pr106229.c: New test.
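The idea behind index_can_be_out_of_bounds_p can be sketched independently of the analyzer: take the closed range an index can have purely from its type, and only warn if that range does not fit inside [0, number_of_elements - 1]. The C++ below is a hypothetical model (its `Range` plays the role of `concrete_range`). For a zero-extended unsigned char index the range is that of unsigned char, 0 to 255, so under this model the index is provably in bounds only when the array has at least 256 elements.

```cpp
#include <cstdint>
#include <limits>

// Closed range [min, max], playing the role of concrete_range.
struct Range
{
  std::int64_t min;
  std::int64_t max;

  // True iff this range lies entirely within OTHER.
  constexpr bool
  within_p (const Range &other) const
  {
    return min >= other.min && max <= other.max;
  }
};

// Every value representable in integral type T (T must fit in int64_t).
template <typename T>
constexpr Range
type_range ()
{
  return Range{static_cast<std::int64_t> (std::numeric_limits<T>::min ()),
               static_cast<std::int64_t> (std::numeric_limits<T>::max ())};
}

// An index with range INDEX into an array of NUM_ELEMENTS elements can be
// out of bounds unless INDEX lies within [0, NUM_ELEMENTS - 1].
constexpr bool
index_can_be_out_of_bounds (const Range &index, std::int64_t num_elements)
{
  return !index.within_p (Range{0, num_elements - 1});
}
```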

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |   3 +
 gcc/analyzer/constraint-manager.cc|   2 +-
 gcc/analyzer/sm-taint.cc  | 114 +-
 .../analyzer/taint-index-pr106229.c   | 109 +
 4 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/taint-index-pr106229.c

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 8dec9649f2fb..23e3f71df0af 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -427,6 +427,9 @@ bit_offset_to_json (const bit_offset_t &offset);
 extern json::value *
 byte_offset_to_json (const byte_offset_t &offset);
 
+extern tristate
+compare_constants (tree lhs_const, enum tree_code op, tree rhs_const);
+
 } // namespace ana
 
 extern bool is_special_named_call_p (const gcall *call, const char *funcname,
diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc
index 2db6c1734638..e8bcabeb0cd5 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -54,7 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
-static tristate
+tristate
 compare_constants (tree lhs_const, enum tree_code op, tree rhs_const)
 {
   tree comparison
diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index 3f7e5cd55837..dc4b078c411f 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "digraph.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "fold-const.h"
 #include "analyzer/supergraph.h"
 #include "analyzer/call-string.h"
 #include "analyzer/program-point.h"
@@ -1369,6 +1370,104 @@ make_taint_state_machine (logger *logger)
   return new taint_state_machine (logger);
 }
 
+/* A closed concrete range.  */
+
+class concrete_range
+{
+public:
+  /* Return true iff THIS is fully within OTHER
+ i.e.
+ - m_min must be >= OTHER.m_min
+ - m_max must be <= OTHER.m_max.  */
+  bool within_p (const concrete_range &other) const
+  {
+if (compare_constants (m_min, GE_EXPR, other.m_min).is_true ())
+  if (compare_constants (m_max, LE_EXPR, other.m_max).is_true ())
+   return true;
+return false;
+  }
+
+  tree m_min;
+  tree m_max;
+};
+
+/* Attempt to get a closed concrete range for SVAL based on types.
+   If found, write to *OUT and return true.
+   Otherwise return false.  */
+
+static bool
+get_possible_range (const svalue *sval, concrete_range *out)
+{
+  if (const svalue *inner = sval->maybe_undo_cast ())
+{
+  concrete_range inner_range;
+  if (!get_possible_range (inner, &inner_range))
+   return false;
+
+  if (sval->get_type ()
+ && inner->get_type ()
+ && INTEGRAL_TYPE_P (sval->get_type ())
+ && INTEGRAL_TYPE_P (inner->get_type ())
+ && TYPE_UNSIGNED (inner->get_type ())
+ && (TYPE_PRECISION (sval->get_type ())
+ > TYPE_PRECISION (inner->get_type ())))
+   {
+ /* We have a cast from an unsigned type to a wider integral type.
+Assuming this is zero-extension, we can inherit the range from
+the inner type.  */
+ enum tree_code op = ((const unaryop_svalue *)sval)->get_op ();
+ out->m_min = fold_unary (op, sval->get_type (), inner_range.m_min);
+ out->m_max = fold_unary (op, sval->get_type (), inner_range.m_max);
+ return true;
+   }
+}
+
+  if (sval->get_type ()
+  && INTEGRAL_TYPE_P (sval->get_type ()))
+{
+  out->m_min = TYPE_MIN_VALUE (sval->get_type ());
+  out->m_max = TYPE_MAX_VALUE (sval->get_type ());
+  return true;
+}
+
+  return false;
+}
+
+/* Determine if it's possible for tainted array access ELEMENT_REG to
+   actually be a problem.
+
+   Check here for index being from e.g. unsigned char when the array
+   contains >= 255 elements.
+
+   Return true if out-of-bounds is possible, false if it's impossible
+   (for suppressing false positives).  */
+
+static bool
+index_can_be_out_of_bounds_p (const element

Re: [PATCH v3] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

2024-01-15 Thread Patrick Palka
On Mon, 15 Jan 2024, Jonathan Wakely wrote:

> I think I'm happy with this now. It has tests for all the new functions,
> and the performance of the charset alias match algorithm is improved by
> reusing part of .
> 
> Tested x86_64-linux.
> 
> -- >8 --
> 
> This is another C++26 change, approved in Varna 2022. We require a new

2023?

> static array of data that is extracted from the IANA Character Sets
> database. A new Python script to generate a header from the IANA CSV
> file is added.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/113318
>   * acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
>   (GLIBCXX_CHECK_TEXT_ENCODING): Define.
>   * config.h.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
>   * include/Makefile.am: Add new headers.
>   * include/Makefile.in: Regenerate.
>   * include/bits/locale_classes.h (locale::encoding): Declare new
>   member function.
>   * include/bits/unicode.h (__charset_alias_match): New function.
>   * include/bits/text_encoding-data.h: New file.
>   * include/bits/version.def (text_encoding): Define.
>   * include/bits/version.h: Regenerate.
>   * include/std/text_encoding: New file.
>   * src/Makefile.am: Add new subdirectory.
>   * src/Makefile.in: Regenerate.
>   * src/c++26/Makefile.am: New file.
>   * src/c++26/Makefile.in: New file.
>   * src/c++26/text_encoding.cc: New file.
>   * src/experimental/Makefile.am: Include c++26 convenience
>   library.
>   * src/experimental/Makefile.in: Regenerate.
>   * python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
>   printer.
>   * scripts/gen_text_encoding_data.py: New file.
>   * testsuite/22_locale/locale/encoding.cc: New test.
>   * testsuite/ext/unicode/charset_alias_match.cc: New test.
>   * testsuite/std/text_encoding/cons.cc: New test.
>   * testsuite/std/text_encoding/members.cc: New test.
>   * testsuite/std/text_encoding/requirements.cc: New test.
> ---
>  libstdc++-v3/acinclude.m4 |  30 +-
>  libstdc++-v3/config.h.in  |   3 +
>  libstdc++-v3/configure|  70 +-
>  libstdc++-v3/configure.ac |   3 +
>  libstdc++-v3/include/Makefile.am  |   2 +
>  libstdc++-v3/include/Makefile.in  |   2 +
>  libstdc++-v3/include/bits/locale_classes.h|  14 +
>  .../include/bits/text_encoding-data.h | 902 ++
>  libstdc++-v3/include/bits/unicode.h   |  53 +-
>  libstdc++-v3/include/bits/version.def |  10 +
>  libstdc++-v3/include/bits/version.h   |  13 +-
>  libstdc++-v3/include/std/text_encoding| 704 ++
>  libstdc++-v3/python/libstdcxx/v6/printers.py  |  17 +
>  .../scripts/gen_text_encoding_data.py |  70 ++
>  libstdc++-v3/src/Makefile.am  |   3 +-
>  libstdc++-v3/src/Makefile.in  |   7 +-
>  libstdc++-v3/src/c++26/Makefile.am| 109 +++
>  libstdc++-v3/src/c++26/Makefile.in| 747 +++
>  libstdc++-v3/src/c++26/text_encoding.cc   |  91 ++
>  libstdc++-v3/src/experimental/Makefile.am |   2 +
>  libstdc++-v3/src/experimental/Makefile.in |   2 +
>  .../testsuite/22_locale/locale/encoding.cc|  36 +
>  .../ext/unicode/charset_alias_match.cc|  18 +
>  .../testsuite/std/text_encoding/cons.cc   | 113 +++
>  .../testsuite/std/text_encoding/members.cc|  41 +
>  .../std/text_encoding/requirements.cc |  31 +
>  26 files changed, 3083 insertions(+), 10 deletions(-)
>  create mode 100644 libstdc++-v3/include/bits/text_encoding-data.h
>  create mode 100644 libstdc++-v3/include/std/text_encoding
>  create mode 100755 libstdc++-v3/scripts/gen_text_encoding_data.py
>  create mode 100644 libstdc++-v3/src/c++26/Makefile.am
>  create mode 100644 libstdc++-v3/src/c++26/Makefile.in
>  create mode 100644 libstdc++-v3/src/c++26/text_encoding.cc
>  create mode 100644 libstdc++-v3/testsuite/22_locale/locale/encoding.cc
>  create mode 100644 libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc
>  create mode 100644 libstdc++-v3/testsuite/std/text_encoding/cons.cc
>  create mode 100644 libstdc++-v3/testsuite/std/text_encoding/members.cc
>  create mode 100644 libstdc++-v3/testsuite/std/text_encoding/requirements.cc
> 
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index e7cbf0fcf96..f9ba7ef744b 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
># Keep these sync'd with the list in Makefile.am.  The first provides an
># expandable list at autoconf time; the second provides an expandable list
># (i.e., shell variable) at configure time.
> -  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/c++23 src/filesystem src/libbackt

[PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-15 Thread Roger Sayle

This patch resolves PR rtl-optimization/111267 by improving RTL-level
forward propagation.  This x86_64 code quality regression was caused
(exposed) by my changes to improve how x86's (TImode) argument passing
is represented at the RTL-level (reducing the use of SUBREGs to catch
more optimization opportunities in combine).  The pitfall is that the
more complex RTL representations expose a limitation in RTL's fwprop
pass.

At the heart of fwprop, in try_fwprop_subst_pattern, the logic can
be summarized as three steps.  Step 1 is a heuristic that rejects the
propagation attempt if the expression is too complex, step 2 calls
the backend's recog to see if the propagated/simplified instruction
is recognizable/valid, and step 3 then calls src_cost to compare
the rtx costs of the replacement vs. the original, and accepts the
transformation if the final cost is the same or better.

The logic error (or missed optimization opportunity) is that the
step 1 heuristic that attempts to predict (second guess) the
process is flawed.  Ultimately the decision on whether to fwprop
or not should depend solely on actual improvement, as measured
by RTX costs.  Hence the prototype fix in the bugzilla PR removes
the heuristic of calling prop.profitable_p entirely, relying
solely on the cost comparison in step 3.

Unfortunately, things are a tiny bit more complicated.  The cost
comparison in fwprop uses the older set_src_cost API and not the
newer (preferred) insn_cost API as currently used in combine.
This means that the cost improvement comparisons are only done
for single_set instructions (more complex PARALLELs etc. aren't
supported).  Hence we can only rely on skipping step 1 for that
subset of instructions actually evaluated by step 3.

The other subtlety is that to avoid potential infinite loops
in fwprop we should only rely purely on rtx costs when the
transformation is obviously an improvement.  If the replacement
has the same cost as the original, we can use the prop.profitable_p
test to preserve the current behavior.
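The decision procedure described above can be modelled as a small C function. This is a hypothetical sketch, not GCC code: the struct `fwprop_attempt` and the function `accept_propagation` are invented for illustration, with the boolean fields standing in for `prop.profitable_p ()`, `prop.changed_mem_p ()`, `use_insn->is_asm ()`, `single_set (use_rtl)` and the recog result, and the integer fields standing in for the `set_src_cost` values before and after propagation. The "costed" condition is inferred from the first hunk of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched try_fwprop_subst_pattern logic.  */
struct fwprop_attempt
{
  bool profitable_p;  /* Step 1 heuristic (prop.profitable_p ()).  */
  bool changed_mem_p; /* Propagation changed a MEM.  */
  bool is_asm;        /* The use insn is an asm.  */
  bool single_set;    /* The use insn is a single_set.  */
  bool recog_ok;      /* Step 2: the backend recognizes the new insn.  */
  int old_cost;       /* Step 3: cost of the original SET_SRC.  */
  int new_cost;       /* Step 3: cost of the replacement SET_SRC.  */
};

/* Return true if the propagation would be kept.  */
static bool
accept_propagation (const struct fwprop_attempt *a)
{
  assert (a);

  /* Only insns that step 3 can cost are exempt from the heuristic.  */
  bool costed = !a->changed_mem_p && !a->is_asm && a->single_set;

  /* Step 1: the heuristic still vetoes insns that are not costed.  */
  if (!a->profitable_p && !costed)
    return false;

  /* Step 2: the result must be a valid instruction.  */
  if (!a->recog_ok)
    return false;

  /* Step 3: reject if more expensive; on a tie, defer to the heuristic
     so that equal-cost rewrites cannot cycle.  */
  if (costed
      && (a->new_cost > a->old_cost
          || (a->new_cost == a->old_cost && !a->profitable_p)))
    return false;

  return true;
}
```

With `profitable_p` false, a single-set propagation that lowers the cost from 10 to 8 is accepted in this model, while the same propagation at equal cost is still rejected.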

Finally, to answer Richard Biener's remaining question about this
approach: yes, there is an asymmetry between how patterns are
handled and how REG_EQUAL notes are handled.  For example, at
the moment propagation into notes doesn't use rtx costs at all,
and ultimately when fwprop is updated to use insn_cost, this
(and recog) obviously isn't applicable to notes.  There's no reason
the logic need be identical between patterns and notes, and during
stage 4 we only need to update propagation into patterns to fix this
P1 regression (notes and the use of insn_cost can be done for GCC 15).

For Jakub's reduced testcase:

struct S { float a, b, c, d; };
int bar (struct S x, struct S y) {
  return x.b <= y.d && x.c >= y.a;
}

On x86_64-pc-linux-gnu with -O2 gcc currently generates:

bar:    movq    %xmm2, %rdx
        movq    %xmm3, %rax
        movq    %xmm0, %rsi
        xchgq   %rdx, %rax
        movq    %rsi, %rcx
        movq    %rax, %rsi
        movq    %rdx, %rax
        shrq    $32, %rcx
        shrq    $32, %rax
        movd    %ecx, %xmm4
        movd    %eax, %xmm0
        comiss  %xmm4, %xmm0
        jb      .L6
        movd    %esi, %xmm0
        xorl    %eax, %eax
        comiss  %xmm0, %xmm1
        setnb   %al
        ret
.L6:    xorl    %eax, %eax
        ret

with this simple patch to fwprop, we now generate:

bar:    shufps  $85, %xmm0, %xmm0
        shufps  $85, %xmm3, %xmm3
        comiss  %xmm0, %xmm3
        jb      .L6
        xorl    %eax, %eax
        comiss  %xmm2, %xmm1
        setnb   %al
        ret
.L6:    xorl    %eax, %eax
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Additionally, it also resolves the FAIL for
gcc.target/i386/pr82580.c.  Ok for mainline?


2024-01-16  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/111267
* fwprop.cc (try_fwprop_subst_pattern): Only bail-out early when
!prop.profitable_p for instructions that are not single sets.
When comparing costs, bail-out if the cost is unchanged and
!prop.profitable_p.

gcc/testsuite/ChangeLog
PR rtl-optimization/111267
* gcc.target/i386/pr111267.c: New test case.


Thanks in advance (and to Jeff Law for his guidance/help),
Roger
--

diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0c588f8..f06225a 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -449,7 +449,10 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   if (prop.num_replacements == 0)
     return false;
 
-  if (!prop.profitable_p ())
+  if (!prop.profitable_p ()
+      && (prop.changed_mem_p ()
+          || use_insn->is_asm ()
+          || !single_set (use_rtl)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "cannot propagate from insn %d into"
@@ -481,7 +484,8 @@ try_fwprop_subst_pattern (obstack_watermark &attem

[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-01-15 Thread HAO CHEN GUI
Hi,
  This patch adds const0 move checking for CLEAR_BY_PIECES.  The
vec_duplicate optab is needed to duplicate non-constant inputs, but 0 is
a constant.  So even if a platform doesn't support vec_duplicate, it can
still clear by pieces if it supports a const0 move in that mode.
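The extra condition can be modelled as follows. This is a hypothetical sketch: the `mode_caps` struct and the `clear_by_pieces_ok` function are invented for illustration, with booleans standing in for the optab and `insn_operand_matches` queries made by the real `by_pieces_mode_supported_p`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a target supports for one mode.  */
struct mode_caps
{
  bool has_mov;           /* mov_optab handler exists.  */
  bool is_vector;         /* VECTOR_MODE_P (mode).  */
  bool has_vec_duplicate; /* vec_duplicate_optab handler exists.  */
  bool mov_accepts_zero;  /* Operand 1 of the mov pattern accepts CONST0_RTX.  */
};

/* Model of the CLEAR_BY_PIECES checks after the patch.  */
static bool
clear_by_pieces_ok (const struct mode_caps *m)
{
  assert (m);
  if (!m->has_mov)
    return false;
  /* A vector mode without vec_duplicate is still usable if zero can be
     moved into it directly.  */
  if (m->is_vector && !m->has_vec_duplicate && !m->mov_accepts_zero)
    return false;
  return true;
}
```

Before the patch, the model would have rejected every vector mode lacking vec_duplicate; the `mov_accepts_zero` test is what the new `insn_operand_matches` call adds.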

  The test cases will be added in subsequent target specific patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a
constant.  So even if a platform doesn't support vec_duplicate, it can
still clear by pieces if it supports a const0 move.  This patch adds
the check.

gcc/
* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
for CLEAR_BY_PIECES.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 34f5ff90a9f..cd960349a53 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;

+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;


[PATCH] LoongArch: Split vec_selects of bottom elements into simple move

2024-01-15 Thread Jiahao Xu
The pattern below can be treated as a simple move, because the
floating-point registers overlap the vector registers on loongarch64.

(set (reg/v:SF 32 $f0 [orig:93 res ] [93])
  (vec_select:SF (reg:V8SF 32 $f0 [115])
  (parallel [
  (const_int 0 [0])
  ])))
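As an illustration only, the C sketch below uses the GCC vector extension (an assumption: neither the type `v8sf` nor the two helper functions come from the patch) to show that selecting element 0 and copying the first `sizeof (float)` bytes of the vector yield the same value. The register-level claim in the patch, that the scalar FP register overlaps the low part of the vector register on loongarch64, is an architectural property that this portable code does not itself demonstrate.

```c
#include <assert.h>
#include <string.h>

/* Eight floats in one 256-bit vector, like V8SF in the RTL above.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Element 0 selected by index, the C analogue of the vec_select.  */
static float
extract0 (const v8sf *v)
{
  assert (v);
  return (*v)[0];
}

/* Element 0 obtained by copying the leading bytes of the vector,
   i.e. a plain move of the first sizeof (float) bytes.  */
static float
leading_bytes (const v8sf *v)
{
  float f;
  assert (v);
  memcpy (&f, v, sizeof f);
  return f;
}
```

The vectors are passed by pointer to avoid ABI warnings for 256-bit vector arguments on hosts without 256-bit SIMD enabled.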

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_extract_0):
New define_insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-extract.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 72f7161311c..90f66ee4d24 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -761,6 +761,21 @@ (define_expand "vec_extract"
   DONE;
 })
 
+(define_insn_and_split "vec_extract_0"
+  [(set (match_operand: 0 "register_operand" "=f")
+(vec_select:
+  (match_operand:FLASX 1 "register_operand" "f")
+  (parallel [(const_int 0)])))]
+  "ISA_HAS_LSX"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
+}
+  [(set_attr "move_type" "fmove")
+   (set_attr "mode" "")])
+
 (define_expand "vec_perm"
  [(match_operand:LASX 0 "register_operand")
   (match_operand:LASX 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-extract.c b/gcc/testsuite/gcc.target/loongarch/vect-extract.c
new file mode 100644
index 000..ce126e3a4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-extract.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mlasx -fno-vect-cost-model -fno-unroll-loops" } */
+/* { dg-final { scan-assembler-not "xvpickve.w" } } */
+/* { dg-final { scan-assembler-not "xvpickve.d" } } */
+
+float
+sum_float (float *a, int n) {
+  float res = 0.0;
+  for (int i = 0; i < n; i++)
+    res += a[i];
+  return res;
+}
+
+double
+sum_double (double *a, int n) {
+  double res = 0.0;
+  for (int i = 0; i < n; i++)
+    res += a[i];
+  return res;
+}
-- 
2.20.1



[PATCH] LoongArch: Fix pattern vec_concatz

2024-01-15 Thread Jiahao Xu
In r14-7022-34d339bbd0c1f5b4ad9587e7ae8387c912cb028b I implemented the
vec_concatz pattern, but it did not handle the reg+reg addressing mode.
This patch fixes that.
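The underlying issue is that the LSX `vld` instruction takes a base register plus an immediate offset, while a base-plus-index address needs the indexed form `vldx`, so hard-coding `vld` mishandles reg+reg addresses. The sketch below is a hypothetical illustration of that selection: the enum `src_kind` and the function `select_template` are invented here, the templates are assumptions modelled on the original pattern, and in the patch the choice is delegated to `loongarch_output_move`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the source operand of vec_concatz.  */
enum src_kind
{
  SRC_REG,         /* Vector register.  */
  SRC_MEM_REG_IMM, /* Memory at base register + immediate offset.  */
  SRC_MEM_REG_REG  /* Memory at base register + index register.  */
};

/* Return an output template for loading the low 128-bit half.  */
static const char *
select_template (enum src_kind kind)
{
  switch (kind)
    {
    case SRC_REG:
      return "vori.b\t%w0,%w1,0";
    case SRC_MEM_REG_IMM:
      return "vld\t%w0,%1";
    case SRC_MEM_REG_REG:
      /* The case the original pattern did not handle.  */
      return "vldx\t%w0,%1";
    }
  assert (0 && "unexpected operand kind");
  return NULL;
}
```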

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_concatz): Fix pattern to
support reg+reg addressing mode.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-concatz.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 90f66ee4d24..77ab754fa9e 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -589,10 +589,8 @@ (define_insn "@vec_concatz"
   (match_operand: 2 "const_0_operand")))]
   "ISA_HAS_LASX"
 {
-  if (MEM_P (operands[1]))
-    return "vld\t%w0,%1";
-  else
-    return "vori.b\t%w0,%w1,0";
+  return loongarch_output_move (gen_lowpart (mode,
+operands[0]), operands[1]);
 }
   [(set_attr "type" "simd_splat")
(set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-concatz.c b/gcc/testsuite/gcc.target/loongarch/vect-concatz.c
new file mode 100644
index 000..45aa776c11b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-concatz.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mlasx -fno-vect-cost-model" } */
+
+#include 
+
+typedef struct
+{
+  int *rect;
+  float *rect_float;
+  unsigned int x;
+  unsigned int y;
+} ImBuf;
+
+ImBuf *
+IMB_double_fast_x(ImBuf *ibuf1, ImBuf *ibuf2)
+{
+  int *p1, *dest, i, col, do_rect, do_float;
+  float *p1f, *destf;
+
+  if (ibuf1 == NULL) return (NULL);
+  if (ibuf1->rect == NULL && ibuf1->rect_float == NULL) return (NULL);
+
+  do_rect = (ibuf1->rect != NULL);
+  do_float = (ibuf1->rect_float != NULL);
+
+
+  p1 = (int *) ibuf1->rect;
+  dest = (int *) ibuf2->rect;
+  p1f = (float *)ibuf1->rect_float;
+  destf = (float *)ibuf2->rect_float;
+
+  for (i = ibuf1->y * ibuf1->x; i > 0; i--) {
+  if (do_rect) {
+ col = *p1++;
+ *dest++ = col;
+ *dest++ = col;
+  }
+  if (do_float) {
+ destf[0] = destf[4] = p1f[0];
+ destf[1] = destf[5] = p1f[1];
+ destf[2] = destf[6] = p1f[2];
+ destf[3] = destf[7] = p1f[3];
+ destf += 8;
+ p1f += 4;
+  }
+  }
+
+  return (ibuf2);
+}
+
+int
+main()
+{
+  return 0;
+}
-- 
2.20.1



[PATCH v3] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT

2024-01-15 Thread Jiahao Xu
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0, so that short-circuit logical
operations are expanded as short-circuit branches instead of being
combined into non-short-circuit operations.

SPEC2017 performance evaluation shows a 1% improvement in the fprate
GEOMEAN and no obvious regression elsewhere.  In particular,
526.blender_r improves by 10.6% on the 3A6000.
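For illustration, the two evaluation strategies the macro chooses between can be written out by hand in C, using the condition from the new test (with `t1x = a[0]`, `t2x = a[1]`, `t1y = a[2]`, `t2y = a[3]`, `t1z = a[4]`, `t2z = a[5]`). This is a sketch, not GCC output: `overlap_short_circuit` keeps `||`, so evaluation can branch out at the first true comparison, whereas `overlap_non_short_circuit` evaluates every comparison and combines the results with `|`, the kind of form GCC may build when LOGICAL_OP_NON_SHORT_CIRCUIT is nonzero. Because the comparisons have no side effects, both functions return the same value.

```c
#include <assert.h>
#include <stdbool.h>

/* Short-circuit form: evaluation stops at the first true comparison.  */
static int
overlap_short_circuit (const float *a)
{
  assert (a);
  if (a[0] > a[3] || a[1] < a[2] || a[0] > a[5]
      || a[1] < a[4] || a[2] > a[5] || a[3] < a[4])
    return 0;
  return 1;
}

/* Non-short-circuit form: all comparisons are evaluated and their
   results are combined with bitwise OR before a single test.  */
static int
overlap_non_short_circuit (const float *a)
{
  assert (a);
  bool any = (a[0] > a[3]) | (a[1] < a[2]) | (a[0] > a[5])
             | (a[1] < a[4]) | (a[2] > a[5]) | (a[3] < a[4]);
  return any ? 0 : 1;
}
```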

gcc/ChangeLog:

* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/short-circuit.c: New test.

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 4e6ede926d3..8b453ab3140 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -869,6 +869,7 @@ typedef struct {
1 is the default; other values are interpreted relative to that.  */
 
 #define BRANCH_COST(speed_p, predictable_p) la_branch_cost
+#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
 
 /* Return the asm template for a conditional branch instruction.
OPCODE is the opcode's mnemonic and OPERANDS is the asm template for
diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
new file mode 100644
index 000..bed585ee172
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-gimple" } */
+
+int
+short_circuit (float *a)
+{
+  float t1x = a[0];
+  float t2x = a[1];
+  float t1y = a[2];
+  float t2y = a[3];
+  float t1z = a[4];
+  float t2z = a[5];
+
+  if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z || t2y < t1z)
+    return 0;
+
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "if" 6 "gimple" } } */
-- 
2.20.1



[PATCH] testsuite: Fix vect_long_mult on Power [PR109705]

2024-01-15 Thread Kewen.Lin
Hi,

As pointed out in the discussion in PR109705, the current
vect_long_mult effective target check on Power is broken.
This patch fixes it accordingly.

With an additional change adding a vect_long_mult guard
in gcc.dg/vect/pr25413a.c, it has been tested on Power{8,9}
LE & BE (and on Power10 LE, as before).
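For context, the `vect_long_mult` effective target is meant to report whether loops that multiply `long` elements, such as the hypothetical `mul_long` below, can be vectorized. The new Power condition presumably reflects that a doubleword-element vector multiply (`vmulld`) is only available from Power10 (ISA 3.1); that rationale is an assumption, not something stated in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* The kind of loop a vect_long_mult-guarded test expects to be
   vectorized: an element-wise multiply of two arrays of long.  */
static void
mul_long (long *restrict c, const long *restrict a,
          const long *restrict b, size_t n)
{
  assert (n == 0 || (a && b && c));
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] * b[i];
}
```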

I'm going to push this soon.

BR,
Kewen
-
PR testsuite/109705

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vect_long_mult):
Fix powerpc*-*-* checks.
---
 gcc/testsuite/lib/target-supports.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 81ae92a0266..fac32fb3d0e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9073,9 +9073,9 @@ proc check_effective_target_vect_int_mult { } {

 proc check_effective_target_vect_long_mult { } {
 if { [istarget i?86-*-*] || [istarget x86_64-*-*]
-|| (([istarget powerpc*-*-*]
-  && ![istarget powerpc-*-linux*paired*])
-  && [check_effective_target_ilp32])
+|| ([istarget powerpc*-*-*]
+ && [check_effective_target_powerpc_vsx_ok]
+ && [check_effective_target_has_arch_pwr10])
 || [is-effective-target arm_neon]
 || ([istarget sparc*-*-*] && [check_effective_target_ilp32])
 || [istarget aarch64*-*-*]
--
2.39.1


Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-15 Thread chenxiaolong
On Mon, 2024-01-15 at 15:50 +0800, Xi Ruoyao wrote:
> On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote:
> > On Mon, 2024-01-15 at 14:42 +0800, Xi Ruoyao wrote:
> > > On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> > > > On Monday, January 15, 2024 at 12:11pm, Xi Ruoyao wrote:
> > > > > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > > > > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > > > > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > > 
> > > > > > > >* gcc.dg/pr104992.c: Added additional "-mlsx" compilation options.
> > > > > > > >* gcc.dg/signbit-2.c: Ditto.
> > > > > > > >* gcc.dg/tree-ssa/scev-16.c: Ditto.
> > > > > > > >* gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > > > > > > >* gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > > > > > > 
> > > > > > > I don't feel right about the changes to pr104992.c and
> > > > > > > scev-16.c, because no other architectures add special options
> > > > > > > there.  Why are we so special?
> > > > > > Because on the LoongArch architecture, GCC requires additional
> > > > > > vectorization options in order to generate vector code.  The
> > > > > > check_effective_target_vect_cmdline_needed procedure in the
> > > > > > lib/target-supports.exp file determines whether command-line
> > > > > > options are needed to enable vectorization.  For example, on the
> > > > > > ia64, x86, aarch64 and riscv architectures, vectorization is
> > > > > > enabled by default.
> > > > > 
> > > > > But no.  The default baseline of 32-bit x86 is i686, which is
> > > > > basically a Pentium III launched in 1999 without any vector
> > > > > instructions.
> > > > > 
> > > > > We are still missing something here.
> > > > > 
> > > > There is a line
> > > >   #define vector __attribute__((vector_size(4*sizeof(int
> > > > I guess it is the syntax that needs to be supported.
> > > 
> > > This is always supported.  If the target does not have vector
> > > instructions GCC will just expand vector arithmetic as a loop.
> > > 
> > > Maybe we should just move this test into gcc.dg/vect, where the
> > > framework automatically adds options like -mlsx or -msse2?
> > > 
> > 
> > The "-mlsx" option is enabled by default once vectorization testing
> > is enabled.  However, using dg-options in some files resets the
> > compilation options for that file.  Therefore, to test vectorization
> > on LoongArch, the "-mlsx" option must be added explicitly.
> 
> Then it should use dg-additional-options instead of dg-options.
> 
Following your advice, I tried the following two approaches:

(1) Replacing dg-options directly with dg-additional-options.  The
default "-ansi -pedantic-errors" options set in the dg.exp file are then
used, and the following problems occur:

gcc.dg/pr104992.c: ISO C90 does not support complex types.
gcc.dg/tree-ssa/scev-16.c: ‘for’ loop initial declarations are only
allowed in C99 or C11 mode

Note: the ISO C standard these tests require differs from the default
standard, which results in these errors.

(2) Moving pr104992.c and scev-16.c to the gcc.dg/vect directory and
replacing dg-options with dg-additional-options.  The problem is as
follows:

gcc.dg/vect/scev-16.c: vect.exp has no test rule matching files whose
names start with scev, so a new rule would have to be added, or the file
renamed, before the test is run.

Summary: it is more appropriate to add the "-mlsx" option directly to
the pr104992.c and scev-16.c files.  This enables vectorization testing
on the LoongArch architecture without changing the testing behavior of
other architectures.
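As noted earlier in the thread, GCC's generic vector extension is accepted even on targets without vector instructions, in which case GCC lowers the operations to scalar code. A minimal illustration, reusing the `vector_size (4 * sizeof (int))` attribute quoted above (the typedef `v4si` and the function `add4` are hypothetical names, not taken from the tests):

```c
#include <assert.h>

/* Four ints in one vector object, as in the quoted #define.  */
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Element-wise addition.  GCC can emit a SIMD add when the target has
   one enabled (e.g. -mlsx on LoongArch) and scalar code otherwise.  */
static void
add4 (v4si *out, const v4si *x, const v4si *y)
{
  assert (out && x && y);
  *out = *x + *y;
}
```

This is why the quoted syntax itself does not need a special option; only the expectation that vector instructions are generated does.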



[PATCH] RISC-V: Report Sorry when users enable RVV in big-endian mode [PR113404]

2024-01-15 Thread Juzhe-Zhong
As PR113404 mentioned: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113404

We get an ICE when RVV is enabled in big-endian mode:

during RTL pass: expand
a-float-point-dynamic-frm-66.i:2:14: internal compiler error: in to_constant, at poly-int.h:588
0xab4c2c poly_int<2u, unsigned short>::to_constant() const
/repo/gcc-trunk/gcc/poly-int.h:588
0xab4de1 poly_int<2u, unsigned short>::to_constant() const
/repo/gcc-trunk/gcc/tree.h:4055
0xab4de1 default_function_arg_padding(machine_mode, tree_node const*)
/repo/gcc-trunk/gcc/targhooks.cc:844
0x12e2327 locate_and_pad_parm(machine_mode, tree_node*, int, int, int, tree_node*, args_size*, locate_and_pad_arg_data*)
/repo/gcc-trunk/gcc/function.cc:4061
0x12e2aca assign_parm_find_entry_rtl
/repo/gcc-trunk/gcc/function.cc:2614
0x12e2c89 assign_parms
/repo/gcc-trunk/gcc/function.cc:3693
0x12e59df expand_function_start(tree_node*)
/repo/gcc-trunk/gcc/function.cc:5152
0x112fafb execute
/repo/gcc-trunk/gcc/cfgexpand.cc:6739

Report to users that RVV is not supported in big-endian mode, for the
following reasons:
1. Big-endian RISC-V is a rare configuration.
2. We have not tested RVV in big-endian mode, and there is not enough time
   to test it now that we are in stage 4.

So simply disallow RVV in big-endian mode for now.

Tested with no regressions.  OK for trunk?

PR target/113404

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_override_options_internal): Report sorry 
for RVV in big-endian mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/big_endian-1.c: New test.
* gcc.target/riscv/rvv/base/big_endian-2.c: New test.

---
 gcc/config/riscv/riscv.cc  | 5 +
 gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-1.c | 5 +
 gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-2.c | 5 +
 3 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 89caf156f03..41626fa34e4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8787,6 +8787,11 @@ riscv_override_options_internal (struct gcc_options *opts)
 sorry ("Current RISC-V GCC cannot support VLEN greater than 4096bit for "
   "'V' Extension");
 
+  /* FIXME: We don't support RVV in big-endian for now, we may enable RVV with
+     big-endian after finishing full coverage testing.  */
+  if (TARGET_VECTOR && TARGET_BIG_ENDIAN)
+    sorry ("Current RISC-V GCC cannot support RVV in big-endian mode");
+
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits (opts);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-1.c
new file mode 100644
index 000..9eaf7ad33b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-1.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -mbig-endian -O3" } */
+
+#pragma riscv intrinsic "vector"
+vfloat32m1_t foo (vfloat32m1_t) {} // { dg-excess-errors "sorry, unimplemented: Current RISC-V GCC cannot support RVV in big-endian mode" }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-2.c
new file mode 100644
index 000..86cf58370bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/big_endian-2.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zve32x -mabi=lp64d -mbig-endian -O3" } */
+
+#pragma riscv intrinsic "vector"
+vint32m1_t foo (vint32m1_t) {} // { dg-excess-errors "sorry, unimplemented: Current RISC-V GCC cannot support RVV in big-endian mode" }
-- 
2.36.3


