[PATCH v1] RISC-V: Fix ICE for incorrect mode attr in V_F2DI_CONVERT_BRIDGE

2023-12-08 Thread pan2.li
From: Pan Li 

The mode attr V_F2DI_CONVERT_BRIDGE converts the floating-point mode
to the widened floating-point mode by design. But we took (RVVM1HF "RVVM2SI")
by mistake.

This patch fixes it by replacing
(RVVM1HF "RVVM2SI") with (RVVM1HF "RVVM2SF") as designed.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Replace RVVM2SI with RVVM2SF
for mode attr V_F2DI_CONVERT_BRIDGE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c: New 
test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/vector-iterators.md   | 2 +-
 .../riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 56080ed1f5f..5f5f7b5b986 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3267,7 +3267,7 @@ (define_mode_attr v_f2di_convert [
 ])
 
 (define_mode_attr V_F2DI_CONVERT_BRIDGE [
-  (RVVM2HF "RVVM4SF") (RVVM1HF "RVVM2SI") (RVVMF2HF "RVVM1SF")
+  (RVVM2HF "RVVM4SF") (RVVM1HF "RVVM2SF") (RVVMF2HF "RVVM1SF")
   (RVVMF4HF "RVVMF2SF")
 
   (RVVM4SF "VOID") (RVVM2SF "VOID") (RVVM1SF "VOID")
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c
new file mode 100644
index 000..5fb61c7b44c
--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c
@@ -0,0 +1,7 @@
+/* Test that we do not have an ICE when compiling.  */
+/* { dg-do compile } */
+/* { dg-options "--param=riscv-autovec-lmul=m4 -march=rv64gcv_zvfh_zfh 
-mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math 
-fno-schedule-insns -fno-schedule-insns2" } */
+
+#include "test-math.h"
+
+TEST_UNARY_CALL_CVT (_Float16, long, __builtin_lroundf16)
-- 
2.34.1



Re: [PATCH] vr-values: Avoid ICEs on large _BitInt cast to floating point [PR112901]

2023-12-08 Thread Richard Biener
On Fri, 8 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> For casts from integers to floating point,
> simplify_float_conversion_using_ranges uses SCALAR_INT_TYPE_MODE
> and queries optabs on the optimization it wants to make.
> 
> That doesn't really work for large/huge BITINT_TYPE, those have BLKmode
> which is not scalar int mode.  Querying an optab is not useful for that
> either.
> 
> I think it is best to just skip this optimization for those bitints,
> after all, bitint lowering uses ranges already to determine minimum
> precision for bitint operands of the integer to float casts.
> 
> Bootstrapped/regrtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-12-08  Jakub Jelinek  
> 
>   PR tree-optimization/112901
>   * vr-values.cc
>   (simplify_using_ranges::simplify_float_conversion_using_ranges):
>   Return false if rhs1 has BITINT_TYPE type with BLKmode TYPE_MODE.
> 
>   * gcc.dg/bitint-51.c: New test.
> 
> --- gcc/vr-values.cc.jj   2023-09-06 17:28:24.240977329 +0200
> +++ gcc/vr-values.cc  2023-12-07 14:34:36.935121459 +0100
> @@ -1656,6 +1656,11 @@ simplify_using_ranges::simplify_float_co
>|| vr.undefined_p ())
>  return false;
>  
> +  /* The code below doesn't work for large/huge _BitInt, nor is really
> + needed for those, bitint lowering does use ranges already.  */
> +  if (TREE_CODE (TREE_TYPE (rhs1)) == BITINT_TYPE
> +  && TYPE_MODE (TREE_TYPE (rhs1)) == BLKmode)
> +return false;
>/* First check if we can use a signed type in place of an unsigned.  */
>scalar_int_mode rhs_mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs1));
>if (TYPE_UNSIGNED (TREE_TYPE (rhs1))
> --- gcc/testsuite/gcc.dg/bitint-51.c.jj   2023-12-07 15:10:20.500384705 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-51.c  2023-12-07 15:09:54.159750006 +0100
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/112901 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2" } */
> +
> +float f;
> +#if __BITINT_MAXWIDTH__ >= 256
> +_BitInt(256) i;
> +
> +void
> +foo (void)
> +{
> +  f *= 4 * i;
> +}
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-08 Thread Ajit Agarwal
Hello Kewen:

On 07/12/23 4:31 pm, Ajit Agarwal wrote:
> Hello Kewen:
> 
> On 06/12/23 7:52 am, Kewen.Lin wrote:
>> on 2023/12/6 02:01, Ajit Agarwal wrote:
>>> Hello Kewen:
>>>
>>>
>>> On 05/12/23 7:13 pm, Ajit Agarwal wrote:
 Hello Kewen:

 On 04/12/23 7:31 am, Kewen.Lin wrote:
> Hi Ajit,
>
> on 2023/12/1 17:10, Ajit Agarwal wrote:
>> Hello Kewen:
>>
>> On 24/11/23 3:01 pm, Kewen.Lin wrote:
>>> Hi Ajit,
>>>
>>> Don't forget to CC David (CC-ed) :), some comments are inlined below.
>>>
>>> on 2023/10/8 03:04, Ajit Agarwal wrote:
 Hello All:

This patch adds a new pass to replace contiguous-address vector loads
 (lxv) with the MMA instruction lxvp.
>>>
>>> IMHO the current binding lxvp (and lxvpx, stxvp{x,}) to MMA looks 
>>> wrong, it's only
>>> Power10 and VSX required, these instructions should perform well 
>>> without MMA support.
>>> So one patch to separate their support from MMA seems to go first.
>>>
>>
>> I will make the changes for Power10 and VSX.
>>
 This patch addresses one regression failure on the ARM architecture.
>>>
>>> Could you explain this?  I don't see any test case for this.
>>
>> I submitted v1 of the patch and there were regression failures reported by
>> Linaro. I have fixed them in version v2.
>
> OK, thanks for clarifying.  So some unexpected changes on generic code in 
> v1
> caused the failure exposed on arm.
>
>>
>>  
>>> Besides, it seems a bad idea to put this pass after reload: as register
>>> allocation has finished, this pairing has to be restricted by the reg No.
>>> (I didn't see any checking on the reg No. relationship for pairing, btw.)
>>>
>>
>> Adding the pass before reload deletes one of the lxv insns and replaces it
>> with lxvp. This fails in the reload pass while freeing reg_eqivs, as ira
>> populates them and then
>
> I can't find reg_eqivs, I guessed you meant reg_equivs and moved this 
> pass right before
> pass_reload (between pass_ira and pass_reload)?  IMHO it's unexpected as 
> those two passes
> are closely correlated.  I was expecting to put it somewhere before ira.

 Yes they are tied together and moving before reload will not work.

>
>> the vecload pass deletes some of the insns; the reload pass then segfaults
>> while freeing them, because the insns were already deleted in the vecload
>> pass.
>>
>> Moving the vecload pass before ira will not produce register pairs for lxvp
>> in ira, and that will be a problem.
>
> Could you elaborate on the obstacle to moving such a pass before pass_ira?
>
> Based on the status quo, lxvp is bundled with OOmode, so I'd expect
> we can generate an OOmode move (load) and use the components with unspec (or
> subreg with Peter's patch) to replace all the previous use places; it looks
> doable to me.

 Moving the pass before ira, we delete the offset lxv, generate lxvp, and
 replace all the uses; that is what I am doing. But the registers generated
 by ira for the lxvp are not a register pair (ira picks arbitrary registers),
 and hence we cannot generate lxvp.

 For example, if we move the pass before ira, one lxv is assigned register 32
 and the other half of the pair is assigned register 45 by ira.
>>>
>>> It generates the following.
>>> lxvp %vs32,0(%r4)
>>> xvf32ger 0,%vs34,%vs32
>>> xvf32gerpp 0,%vs34,%vs45
>>
>> What do the RTL insns for these insns look like?
>>
>> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of 
>> lxvp,
>> the current define_insn_and_split "*vsx_disassemble_pair" should be able to 
>> take
>> care of it further (eg: reg and regoff).
>>
> 
> Yes, with UNSPEC_MMA_EXTRACT, ira and the reload pass generate lxvp with a
> register pair instead of arbitrary registers. But an extra move gets
> generated.
> 

With UNSPEC_MMA_EXTRACT I could generate the register pair, but the code
below is functionally incorrect.

lxvp %vs0,0(%r4)
xxlor %vs32,%vs0,%vs0
xvf32ger 0,%vs34,%vs32
xvf32gerpp 0,%vs34,%vs33
xxmfacc 0
stxvp %vs2,0(%r3)
stxvp %vs0,32(%r3)
blr


Here is the RTL Code:

(insn 19 4 20 2 (set (reg:OO 124 [ *ptr_4(D) ])
(mem:OO (reg/v/f:DI 122 [ ptr ]) [0 *ptr_4(D)+0 S16 A128])) -1
 (nil))
(insn 20 19 9 2 (set (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
(subreg:V16QI (reg:OO 124 [ *ptr_4(D) ]) 0)) -1
 (nil))
(insn 9 20 11 2 (set (reg:XO 119 [ _7 ])
(unspec:XO [
(reg/v:V16QI 123 [ src ])
(reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
] UNSPEC_MMA_XVF32GER)) 2195 {mma_xvf32ger}
 (expr_list:REG_DEAD (reg:OO 124 [ *ptr_4(D) ])
(nil)))
(insn 11 9 12 2 (set (reg:XO 120 [

Re: [PATCH] lower-bitint: Avoid merging non-mergeable stmt with cast and mergeable stmt [PR112902]

2023-12-08 Thread Richard Biener
On Fri, 8 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> Before bitint lowering, the IL has:
>   b.0_1 = b;
>   _2 = -b.0_1;
>   _3 = (unsigned _BitInt(512)) _2;
>   a.1_4 = a;
>   a.2_5 = (unsigned _BitInt(512)) a.1_4;
>   _6 = _3 * a.2_5;
> on the first function.  Now, gimple_lower_bitint has an optimization
> (when not -O0) that it avoids assigning underlying VAR_DECLs for certain
> SSA_NAMEs where it is possible to lower it in a single loop (or straight
> line code) rather than in multiple loops.
> So, e.g. the multiplication above uses handle_operand_addr, which can deal
> with INTEGER_CST arguments, loads but also casts, so it is fine
> not to assign an underlying VAR_DECL for SSA_NAMEs a.1_4 and a.2_5, as
> the multiplication can handle it fine.
> The more problematic case is the other multiplication operand.
> It is again a result of a (in this case narrowing) cast, so it is fine
> not to assign VAR_DECL for _3.  Normally we can merge the load (b.0_1)
> with the negation (_2) and even with the following cast (_3).  If _3
> was used in a mergeable operation like addition, subtraction, negation,
> &|^ or equality comparison, all of b.0_1, _2 and _3 could be without
> underlying VAR_DECLs.
> The problem is that the current code does that even when the cast is used
> by a non-mergeable operation, and handle_operand_addr certainly can't handle
> the mergeable operations feeding the rhs1 of the cast, for multiplication
> we don't emit any loop in which it could appear, for other operations like
> shifts or non-equality comparisons we emit loops, but either in the reverse
> direction or with unpredictable indexes (for shifts).
> So, in order to lower the above correctly, we need to have an underlying
> VAR_DECL for either _2 or _3; if we choose _2, then the load and negation
> would be done in one loop and extension handled as part of the
> multiplication, if we choose _3, then the load, negation and cast are done
> in one loop and the multiplication just uses the underlying VAR_DECL
> computed by that.
> It is far easier to do this for _3, which is what the following patch
> implements.
> It actually already had code for most of it, just it did that for widening
> casts only (optimize unless the cast rhs1 is not SSA_NAME, or is SSA_NAME
> defined in some other bb, or with more than one use, etc.).
> This falls through into such code even for the narrowing or same precision
> casts, unless the cast is used in a mergeable operation.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-12-08  Jakub Jelinek  
> 
>   PR tree-optimization/112902
>   * gimple-lower-bitint.cc (gimple_lower_bitint): For a narrowing
>   or same precision cast don't set SSA_NAME_VERSION in m_names only
>   if use_stmt is mergeable_op or fall through into the check that
>   use is a store or rhs1 is not mergeable or other reasons prevent
>   merging.
> 
>   * gcc.dg/bitint-52.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2023-12-06 09:55:18.522993378 +0100
> +++ gcc/gimple-lower-bitint.cc	2023-12-07 18:05:17.183692049 +0100
> @@ -5989,10 +5989,11 @@ gimple_lower_bitint (void)
>   {
> if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
> || (bitint_precision_kind (TREE_TYPE (rhs1))
> -   < bitint_prec_large)
> -   || (TYPE_PRECISION (TREE_TYPE (rhs1))
> -   >= TYPE_PRECISION (TREE_TYPE (s)))
> -   || mergeable_op (SSA_NAME_DEF_STMT (s)))
> +   < bitint_prec_large))
> + continue;
> +   if ((TYPE_PRECISION (TREE_TYPE (rhs1))
> +>= TYPE_PRECISION (TREE_TYPE (s)))
> +   && mergeable_op (use_stmt))
>   continue;
> /* Prevent merging a widening non-mergeable cast
>on result of some narrower mergeable op
> @@ -6011,7 +6012,9 @@ gimple_lower_bitint (void)
> || !mergeable_op (SSA_NAME_DEF_STMT (rhs1))
> || gimple_store_p (use_stmt))
>   continue;
> -   if (gimple_assign_cast_p (SSA_NAME_DEF_STMT (rhs1)))
> +   if ((TYPE_PRECISION (TREE_TYPE (rhs1))
> +< TYPE_PRECISION (TREE_TYPE (s)))
> +   && gimple_assign_cast_p (SSA_NAME_DEF_STMT (rhs1)))
>   {
> /* Another exception is if the widening cast is
>from mergeable same precision cast from something
> --- gcc/testsuite/gcc.dg/bitint-52.c.jj   2023-12-08 00:35:39.970953164 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-52.c  2023-12-08 00:35:21.983205440 +0100
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/112902 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +

Re: [PATCH v1] RISC-V: Fix ICE for incorrect mode attr in V_F2DI_CONVERT_BRIDGE

2023-12-08 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-12-08 16:00
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Fix ICE for incorrect mode attr in 
V_F2DI_CONVERT_BRIDGE

Re: [pushed] [PATCH] LoongArch: Add support for xorsign.

2023-12-08 Thread chenglulu

Pushed to r14-6308.

在 2023/11/17 下午5:00, Jiahao Xu 写道:

This patch adds support for the xorsign pattern for scalar fp and vectors.
With the new expanders, xorsign is handled uniformly using vector bitwise
logical operations.

On LoongArch64, floating-point registers and vector registers share the same
register file, so this patch also allows conversion between LSX vector modes
and scalar fp modes to avoid unnecessary instruction generation.

gcc/ChangeLog:

* config/loongarch/lasx.md (xorsign3): New expander.
* config/loongarch/loongarch.cc (loongarch_can_change_mode_class): Allow
conversion between LSX vector mode and scalar fp mode.
* config/loongarch/loongarch.md (@xorsign3): New expander.
* config/loongarch/lsx.md (@xorsign3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-xorsign.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign.c: New test.
* gcc.target/loongarch/xorsign-run.c: New test.
* gcc.target/loongarch/xorsign.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index f0f2dd08dd8..5a4be588fb4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1120,10 +1120,10 @@ (define_insn "umod3"
 (set_attr "mode" "")])
  
  (define_insn "xor3"

-  [(set (match_operand:ILASX 0 "register_operand" "=f,f,f")
-   (xor:ILASX
- (match_operand:ILASX 1 "register_operand" "f,f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_val_operand" 
"f,YC,Urv8")))]
+  [(set (match_operand:LASX 0 "register_operand" "=f,f,f")
+   (xor:LASX
+ (match_operand:LASX 1 "register_operand" "f,f,f")
+ (match_operand:LASX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
"ISA_HAS_LASX"
"@
 xvxor.v\t%u0,%u1,%u2
@@ -3147,6 +3147,20 @@ (define_expand "copysign3"
operands[5] = gen_reg_rtx (mode);
  })
  
+(define_expand "xorsign3"

+  [(set (match_dup 4)
+(and:FLASX (match_dup 3)
+(match_operand:FLASX 2 "register_operand")))
+   (set (match_operand:FLASX 0 "register_operand")
+(xor:FLASX (match_dup 4)
+ (match_operand:FLASX 1 "register_operand")))]
+  "ISA_HAS_LASX"
+{
+  operands[3] = loongarch_build_signbit_mask (mode, 1, 0);
+
+  operands[4] = gen_reg_rtx (mode);
+})
+
  
  (define_insn "absv4df2"

[(set (match_operand:V4DF 0 "register_operand" "=f")
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index d05743bec87..e4cdbcf0f2d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6687,6 +6687,11 @@ loongarch_can_change_mode_class (machine_mode from, 
machine_mode to,
if (LSX_SUPPORTED_MODE_P (from) && LSX_SUPPORTED_MODE_P (to))
  return true;
  
+  /* Allow conversion between LSX vector mode and scalar fp mode. */

+  if ((LSX_SUPPORTED_MODE_P (from) && SCALAR_FLOAT_MODE_P (to))
+  || (SCALAR_FLOAT_MODE_P (from) && LSX_SUPPORTED_MODE_P (to)))
+return true;
+
return !reg_classes_intersect_p (FP_REGS, rclass);
  }
  
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md

index 22814a3679c..117c0924a85 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1146,6 +1146,23 @@ (define_insn "copysign3"
"fcopysign.\t%0,%1,%2"
[(set_attr "type" "fcopysign")
 (set_attr "mode" "")])
+
+(define_expand "@xorsign3"
+  [(match_operand:ANYF 0 "register_operand")
+   (match_operand:ANYF 1 "register_operand")
+   (match_operand:ANYF 2 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  machine_mode lsx_mode
+= mode == SFmode ? V4SFmode : V2DFmode;
+  rtx tmp = gen_reg_rtx (lsx_mode);
+  rtx op1 = lowpart_subreg (lsx_mode, operands[1], mode);
+  rtx op2 = lowpart_subreg (lsx_mode, operands[2], mode);
+  emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
+  emit_move_insn (operands[0],
+  lowpart_subreg (mode, tmp, lsx_mode));
+  DONE;
+})
  
  ;;
  ;;  
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 55c7d79a030..40500363dc0 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1027,10 +1027,10 @@ (define_insn "umod3"
 (set_attr "mode" "")])
  
  (define_insn "xor3"

-  [(set (match_operand:ILSX 0 "register_operand" "=f,f,f")
-   (xor:ILSX
- (match_operand:ILSX 1 "register_operand" "f,f,f")
- (match_operand:ILSX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
+  [(set (match_operand:LSX 0 "register_operand" "=f,f,f")
+   (xor:LSX
+ (match_operand:LSX 1 "register_operand" "f,f,f")
+ (match_operand:LSX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
"ISA_HAS_LSX"
"@
 vxor.v\t%w0,%w1,%w2
@@ -2884,6 +2884,21 @@ (define_expand "copysign3"
operands[5] = gen_reg_r

[PATCH] Shrink out-of-SSA dump

2023-12-08 Thread Richard Biener
The following removes the second GIMPLE function dump after
remove_ssa_form which used to rewrite the IL with the coalescing
result but doesn't do so since a long time now.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-outof-ssa.cc (rewrite_out_of_ssa): Dump GIMPLE once only,
after final IL adjustments.
---
 gcc/tree-outof-ssa.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
index 767623ab8ea..5dc58f1b808 100644
--- a/gcc/tree-outof-ssa.cc
+++ b/gcc/tree-outof-ssa.cc
@@ -1352,8 +1352,5 @@ rewrite_out_of_ssa (struct ssaexpand *sa)
 
   remove_ssa_form (flag_tree_ter, sa);
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
-gimple_dump_cfg (dump_file, dump_flags & ~TDF_DETAILS);
-
   return 0;
 }
-- 
2.35.3


RE: [PATCH v1] RISC-V: Fix ICE for incorrect mode attr in V_F2DI_CONVERT_BRIDGE

2023-12-08 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, December 8, 2023 4:03 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Fix ICE for incorrect mode attr in 
V_F2DI_CONVERT_BRIDGE

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-12-08 16:00
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Fix ICE for incorrect mode attr in 
V_F2DI_CONVERT_BRIDGE




Re: [PATCH] testsuite: add missing dg-require ifunc in pr105554.c

2023-12-08 Thread Marc Poulhiès


Jakub Jelinek  writes:

> On Thu, Dec 07, 2023 at 05:25:39PM +0100, Marc Poulhiès wrote:
>> The 'target_clones' attribute depends on the ifunc support.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/i386/pr105554.c: Add dg-require ifunc.
>> ---
>> Tested on x86_64-linux and x86_64-elf.
>>
>> Ok for master?
>>
>>  gcc/testsuite/gcc.target/i386/pr105554.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr105554.c 
>> b/gcc/testsuite/gcc.target/i386/pr105554.c
>> index e9ef494270a..420987e5df8 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr105554.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr105554.c
>> @@ -2,6 +2,7 @@
>>  /* { dg-do compile } */
>>  /* { dg-require-ifunc "" } */
>>  /* { dg-options "-O2 -Wno-psabi -mno-sse3" } */
>> +/* { dg-require-ifunc "" } */
>
> That is 2 lines above this already...

Oh right, sorry about that. I didn't catch this when rebasing.

Marc


Re: [pushed][PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

2023-12-08 Thread chenglulu

Pushed to r14-6311...r14-6315.

在 2023/12/6 下午3:04, Jiahao Xu 写道:

LoongArch V1.1 adds support for approximate instructions, which are used along
with additional Newton-Raphson steps to implement single-precision
floating-point division, square root, and reciprocal square root operations
for better throughput.

The patches are modifications made based on the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html

Jiahao Xu (5):
   LoongArch: Add support for LoongArch V1.1 approximate instructions.
   LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
 instructions.
   LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
   LoongArch: New options -mrecip and -mrecip= with ffast-math.
   LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
 when -mrecip is enabled.

  gcc/config/loongarch/genopts/isa-evolution.in |   1 +
  gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
  gcc/config/loongarch/larchintrin.h|  38 +++
  gcc/config/loongarch/lasx.md  |  89 ++-
  gcc/config/loongarch/lasxintrin.h |  34 +++
  gcc/config/loongarch/loongarch-builtins.cc|  66 +
  gcc/config/loongarch/loongarch-c.cc   |   3 +
  gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
  gcc/config/loongarch/loongarch-def.cc |   3 +-
  gcc/config/loongarch/loongarch-protos.h   |   2 +
  gcc/config/loongarch/loongarch-str.h  |   1 +
  gcc/config/loongarch/loongarch.cc | 252 +-
  gcc/config/loongarch/loongarch.h  |  18 ++
  gcc/config/loongarch/loongarch.md | 104 ++--
  gcc/config/loongarch/loongarch.opt|  15 ++
  gcc/config/loongarch/lsx.md   |  89 ++-
  gcc/config/loongarch/lsxintrin.h  |  34 +++
  gcc/config/loongarch/predicates.md|   8 +
  gcc/doc/extend.texi   |  35 +++
  gcc/doc/invoke.texi   |  54 
  gcc/testsuite/gcc.target/loongarch/divf.c |  10 +
  .../loongarch/larch-frecipe-builtin.c |  28 ++
  .../gcc.target/loongarch/recip-divf.c |   9 +
  .../gcc.target/loongarch/recip-sqrtf.c|  23 ++
  gcc/testsuite/gcc.target/loongarch/sqrtf.c|  24 ++
  .../loongarch/vector/lasx/lasx-divf.c |  13 +
  .../vector/lasx/lasx-frecipe-builtin.c|  30 +++
  .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
  .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
  .../loongarch/vector/lasx/lasx-recip.c|  24 ++
  .../loongarch/vector/lasx/lasx-rsqrt.c|  26 ++
  .../loongarch/vector/lasx/lasx-sqrtf.c|  29 ++
  .../loongarch/vector/lsx/lsx-divf.c   |  13 +
  .../vector/lsx/lsx-frecipe-builtin.c  |  30 +++
  .../loongarch/vector/lsx/lsx-recip-divf.c |  12 +
  .../loongarch/vector/lsx/lsx-recip-sqrtf.c|  28 ++
  .../loongarch/vector/lsx/lsx-recip.c  |  24 ++
  .../loongarch/vector/lsx/lsx-rsqrt.c  |  26 ++
  .../loongarch/vector/lsx/lsx-sqrtf.c  |  29 ++
  39 files changed, 1234 insertions(+), 42 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c





Re: [pushed] [PATCH] LoongArch: Fix lsx-vshuf.c and lasx-xvshuf_b.c tests fail on LA664 [PR112611]

2023-12-08 Thread chenglulu

Pushed to r14-6316.

在 2023/11/29 上午11:16, Jiahao Xu 写道:

For [x]vshuf instructions, if the index value in the selector exceeds 63, it 
triggers
undefined behavior on LA464, but not on LA664. To ensure compatibility of these 
two
tests on both LA464 and LA664, we have modified both tests to ensure that the 
index
value in the selector does not exceed 63.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Ensure the index is
less than 64.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c
index 641ea2315ff..03c479a085c 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c
@@ -44,9 +44,9 @@ main ()
*((unsigned long *)&__m256i_op1[1]) = 0xfefefefe;
*((unsigned long *)&__m256i_op1[0]) = 0xfefefefe;
*((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0xfff8fff8;
+  *((unsigned long *)&__m256i_op2[2]) = 0x3f3f3f383f3f3f38;
*((unsigned long *)&__m256i_op2[1]) = 0x;
-  *((unsigned long *)&__m256i_op2[0]) = 0xfff8fc00;
+  *((unsigned long *)&__m256i_op2[0]) = 0x3f3f3f383c00;
*((unsigned long *)&__m256i_result[3]) = 0xfafafafafafafafa;
*((unsigned long *)&__m256i_result[2]) = 0x;
*((unsigned long *)&__m256i_result[1]) = 0xfefefefefefefefe;
@@ -138,33 +138,14 @@ main ()
*((unsigned long *)&__m256i_op1[2]) = 0x;
*((unsigned long *)&__m256i_op1[1]) = 0x;
*((unsigned long *)&__m256i_op1[0]) = 0x;
-  *((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0x;
-  *((unsigned long *)&__m256i_op2[1]) = 0x;
-  *((unsigned long *)&__m256i_op2[0]) = 0x;
+  *((unsigned long *)&__m256i_op2[3]) = 0x;
+  *((unsigned long *)&__m256i_op2[2]) = 0x;
+  *((unsigned long *)&__m256i_op2[1]) = 0x;
+  *((unsigned long *)&__m256i_op2[0]) = 0x;
*((unsigned long *)&__m256i_result[3]) = 0x;
-  *((unsigned long *)&__m256i_result[2]) = 0x;
+  *((unsigned long *)&__m256i_result[2]) = 0x;
*((unsigned long *)&__m256i_result[1]) = 0x;
-  *((unsigned long *)&__m256i_result[0]) = 0x;
-  __m256i_out = __lasx_xvshuf_b (__m256i_op0, __m256i_op1, __m256i_op2);
-  ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
-
-  *((unsigned long *)&__m256i_op0[3]) = 0x;
-  *((unsigned long *)&__m256i_op0[2]) = 0x;
-  *((unsigned long *)&__m256i_op0[1]) = 0x;
-  *((unsigned long *)&__m256i_op0[0]) = 0x;
-  *((unsigned long *)&__m256i_op1[3]) = 0x;
-  *((unsigned long *)&__m256i_op1[2]) = 0x;
-  *((unsigned long *)&__m256i_op1[1]) = 0x;
-  *((unsigned long *)&__m256i_op1[0]) = 0x;
-  *((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0x;
-  *((unsigned long *)&__m256i_op2[1]) = 0x;
-  *((unsigned long *)&__m256i_op2[0]) = 0x;
-  *((unsigned long *)&__m256i_result[3]) = 0x;
-  *((unsigned long *)&__m256i_result[2]) = 0x;
-  *((unsigned long *)&__m256i_result[1]) = 0x;
-  *((unsigned long *)&__m256i_result[0]) = 0x;
+  *((unsigned long *)&__m256i_result[0]) = 0x;
__m256i_out = __lasx_xvshuf_b (__m256i_op0, __m256i_op1, __m256i_op2);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
@@ -177,7 +158,7 @@ main ()

*((unsigned long *)&__m256i_op1[1]) = 0x;
*((unsigned long *)&__m256i_op1[0]) = 0x;
*((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0x00077fff;
+  *((unsigned long *)&__m256i_op2[2]) = 0x00032f1f;
*((unsigned long *)&__m256i_op2[1]) = 0x;
*((unsigned long *)&__m256i_op2[0]) = 0x;
*((unsigned long *)&__m256i_result[3]) = 0x;
@@ -187,9 +168,9 @@ main ()
__m256i_out = __lasx_xvshuf_b (__m256i_op0, __m256i_op1, __m256i_op2);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
-  *((unsigned long *)&__m256i_op0[3]) = 0xfefe;

-  *((unsigned long *)&__m256i_op0[2]) = 0x0101;
-  *((unsigned long *)&__m256i_op0[1]) = 0xfefe;
+  *((unsigned long *)&__m256i_op0[3]) = 0x0011001100110011;
+  *((unsigned long *)&__m256i_op0[2]) = 0x0001;
+  *((unsigned long *)&__m256i_op0[1]) = 0x0011001100110011;
*((unsigned long *)&__m256i_o

Re: [pushed] [PATCH] LoongArch: Fix ICE and use simplify_gen_subreg instead of gen_rtx_SUBREG directly.

2023-12-08 Thread chenglulu

Pushed to r14-6317.

On 2023/11/29 11:18 AM, Jiahao Xu wrote:

loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which are
not supported in GCC; this causes an ICE:

ice.c:55:1: error: unrecognizable insn:
55 | }
   | ^
(insn 63 62 64 8 (set (reg:V4DI 278)
 (subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1
  (nil))
during RTL pass: vregs
ice.c:55:1: internal compiler error: in extract_insn, at recog.cc:2804

Last time, Ruoyao has fixed a similar ICE:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636156.html

This patch fixes the ICE and uses simplify_gen_subreg instead of gen_rtx_SUBREG
as much as possible to avoid the same ICE happening again.
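
The difference between the two interfaces can be sketched with a toy model (an illustration of the folding idea only -- GCC's real simplify_gen_subreg handles modes, byte offsets, and many more cases than this): gen_rtx_SUBREG wraps its operand unconditionally, so wrapping a subreg yields the nested subreg-of-subreg that no insn pattern recognizes, while the simplifying variant folds the nesting away.

```python
# Toy model of nested-subreg folding.  This illustrates the idea only;
# it is not GCC's actual RTL representation or simplification rules.
class Subreg:
    def __init__(self, mode, inner, byte=0):
        self.mode, self.inner, self.byte = mode, inner, byte
    def __repr__(self):
        return f"(subreg:{self.mode} {self.inner!r} {self.byte})"

class Reg:
    def __init__(self, mode, num):
        self.mode, self.num = mode, num
    def __repr__(self):
        return f"(reg:{self.mode} {self.num})"

def gen_rtx_subreg(mode, op, byte=0):
    return Subreg(mode, op, byte)          # wraps unconditionally

def simplify_gen_subreg(mode, op, byte=0):
    # Fold (subreg (subreg X)) at offset 0 down to X before rewrapping.
    while isinstance(op, Subreg) and byte == 0 and op.byte == 0:
        op = op.inner
    if isinstance(op, Reg) and op.mode == mode:
        return op                          # same-mode subreg is a no-op
    return Subreg(mode, op, byte)

r = Subreg("V4DF", Reg("V4DI", 273))       # as in the ICE dump above
print(gen_rtx_subreg("V4DI", r))           # nested subreg-of-subreg
print(simplify_gen_subreg("V4DI", r))      # folded: (reg:V4DI 273)
```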

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const): 
Use
simplify_gen_subreg instead of gen_rtx_SUBREG.
(loongarch_expand_vec_perm_const_2): Ditto.
(loongarch_expand_vec_cond_expr): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr112476-3.c: New test.
* gcc.target/loongarch/pr112476-4.c: New test.

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e8a2584ac97..69fcb0aa6fb 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8799,13 +8799,13 @@ loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
if (d->vmode == E_V2DFmode)
{
  sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
- tmp = gen_rtx_SUBREG (E_V2DImode, d->target, 0);
+ tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
  emit_move_insn (tmp, sel);
}
else if (d->vmode == E_V4SFmode)
{
  sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
- tmp = gen_rtx_SUBREG (E_V4SImode, d->target, 0);
+ tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
  emit_move_insn (tmp, sel);
}
else
@@ -9584,8 +9584,8 @@ loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d)
  /* Adjust op1 for selecting correct value in high 128bit of target
 register.
 op1: E_V4DImode, { 4, 5, 6, 7 } -> { 2, 3, 4, 5 }.  */
- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, op1_alt, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, d->op0, d->vmode, 0);
  emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1,
  conv_op0, GEN_INT (0x21)));
  
@@ -9614,8 +9614,8 @@ loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d)

  emit_move_insn (op0_alt, d->op0);
  
  	  /* Generate subreg for fitting into insn gen function.  */

- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, op1_alt, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, op0_alt, d->vmode, 0);
  
  	  /* Adjust op value in temp register.

 op0 = {0,1,2,3}, op1 = {4,5,0,1}  */
@@ -9661,9 +9661,10 @@ loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d)
  emit_move_insn (op1_alt, d->op1);
  emit_move_insn (op0_alt, d->op0);
  
-	  rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0);

- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0);
- rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, op1_alt, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, op0_alt, d->vmode, 0);
+ rtx conv_target = simplify_gen_subreg (E_V4DImode, d->target,
+d->vmode, 0);
  
  	  emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1,

  conv_op0, GEN_INT (0x02)));
@@ -9695,9 +9696,10 @@ loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d)
 Selector sample: E_V4DImode, { 0, 1, 4 ,5 }  */
if (!d->testing_p)
{
- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
- rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, d->op1, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, d->op0, d->vmode, 0);
+ rtx conv_target = simplify_gen_subreg (E_V4DImode, d->target,
+d->vmode, 0);
  
  	  /* We can achieve the expectation by using sinple xvpermi.q insn.  */

  emit_move_insn (conv_target, conv_op1);
@@ -9722,8 +9724,8 @@ loongarch_expand_vec_perm_const_2 (struct 
ex

RE: [PATCH 9/21] middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-08 Thread Tamar Christina
> --param vect-partial-vector-usage=2 would, no?
> 
I.. didn't even know it went to 2!

> > In principle I suppose I could mask the individual stmts, that should
> > handle the future case when this is relaxed to support non-fixed-length
> > buffers?
> 
> Well, it looks wrong - either put in an assert that we start with a
> single stmt or assert !masked_loop_p instead?  Better ICE than
> generate wrong code.
> 
> That said, I think you need to apply the masking on the original
> stmts[], before reducing them, no?

Yeah, I've done so now.  For simplicity I've kept the final masking in all
cases and just left it up to the optimizers to drop it when it's superfluous.

Simple testcase:

#ifndef N
#define N 837
#endif
float vect_a[N];
unsigned vect_b[N];

unsigned test4(double x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   if (vect_a[i] > x)
 break;
   vect_a[i] = x;

 }
 return ret;
}

Looks good now. After this one there's only one patch left, the dependency 
analysis.
I'm almost done with the cleanup/respin, but want to take the weekend to double 
check and will post it first thing Monday morning.

Did you want to see the testsuite changes as well again? I've basically just 
added the right dg-requires-effective and add-options etc.

Thanks for all the reviews!

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.


--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
 {
   gcc_assert (!vectype
+ || is_a  (pattern_stmt)
  || (VECTOR_BOOLEAN_TYPE_P (vectype)
  == vect_use_mask_type_p (orig_stmt_info)));
   STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
true if bool VAR can and should be optimized that way.  Assume it shouldn't
in case it's a result of a comparison which can be directly vectorized into
a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform any
+   codegen associated with the boolean condition.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set &stmts,
+   bool analyze_only)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !analyze_only)
 return false;
+  else if (!def_stmt_info)
+/* If we're only analyzing we won't be codegen-ing the statements and are
+   only after if the types match.  In that case we can accept loop invariant
+   values.  */
+def_stmt = dyn_cast  (SSA_NAME_DEF_STMT (var));
+  else
+def_stmt = dyn_cast  (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast  (def_stmt_info->stmt);
   if (!def_stmt)
 return false;
 
@@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set &stmts)
   switch (rhs_code)
 {
 case SSA_NAME:
-  if (! check_bool_pattern (rhs1, vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
return false;
   break;
 
 CASE_CONVERT:
   if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
return false;
-  if (! check_bool_pattern (rhs1, vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
return false;
   break;
 
 case BIT_NOT_EXPR:
-  if (! check_bool_pattern (rhs1, vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
return false;
   break;
 
 case BIT_AND_EXPR:
 case BIT_IOR_EXPR:
 case BIT_XOR_EXPR:
-  if (! check_bool_pattern (rhs1, vinfo, stmts)
- || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
+ || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+  analyze_only))
 

Re: [PATCH] testsuite: require avx_runtime for vect-simd-clone-17f

2023-12-08 Thread Marc Poulhiès


Jakub Jelinek  writes:

> This looks wrong, then it won't be tested at all on non-x86 targets.

Right, I'll look for a better fix.

Should I revert r14-6272 that has the same issue of disabling the
modified tests on non-x86?

Marc


Re: [PATCH] testsuite: require avx_runtime for some tests

2023-12-08 Thread Thomas Schwinge
Hi Marc!

On 2023-11-06T11:59:18+0100, Marc Poulhiès  wrote:
> These 3 tests fail parsing the 'vect' dump when not using -mavx. Make
> the dependency explicit.

But that means that the tests are now enabled *only* for
effective-target 'avx_runtime', so, for example, on GCN I see:

-PASS: gcc.dg/vect/vect-ifcvt-18.c (test for excess errors)
-PASS: gcc.dg/vect/vect-ifcvt-18.c execution test
+UNSUPPORTED: gcc.dg/vect/vect-ifcvt-18.c

-PASS: gcc.dg/vect/vect-simd-clone-16f.c (test for excess errors)
-PASS: gcc.dg/vect/vect-simd-clone-16f.c execution test
-PASS: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect 
"[\\n\\r] [^\\n]* = foo\\.simdclone" 2
+UNSUPPORTED: gcc.dg/vect/vect-simd-clone-16f.c

-PASS: gcc.dg/vect/vect-simd-clone-18f.c (test for excess errors)
-PASS: gcc.dg/vect/vect-simd-clone-18f.c execution test
-PASS: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect 
"[\\n\\r] [^\\n]* = foo\\.simdclone" 2
+UNSUPPORTED: gcc.dg/vect/vect-simd-clone-18f.c

..., which was not the intention, I suppose?


Regards
 Thomas


> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-ifcvt-18.c: Add dep on avx_runtime.
>   * gcc.dg/vect/vect-simd-clone-16f.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
> ---
> Tested on x86_64-linux and x86_64-elf.
>
> Ok for master?
>
>  gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c   | 3 ++-
>  gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c | 4 ++--
>  gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c | 4 ++--
>  3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c 
> b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> index c1d3c27d819..607194496e9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> @@ -1,6 +1,7 @@
>  /* { dg-require-effective-target vect_condition } */
>  /* { dg-require-effective-target vect_float } */
> -/* { dg-additional-options "-Ofast -mavx" { target avx_runtime } } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-Ofast -mavx" } */
>
>
>  int A0[4] = {36,39,42,45};
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c 
> b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
> index 7cd29e894d0..c6615dc626d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
> @@ -1,6 +1,6 @@
>  /* { dg-require-effective-target vect_simd_clones } */
> -/* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } 
> */
> -/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +/* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0 
> -mavx" } */
> +/* { dg-require-effective-target avx_runtime } */
>  /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* 
> } && { ! lp64 } } } } */
>
>  #define TYPE __INT64_TYPE__
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c 
> b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
> index 4dd51381d73..787b918d0c4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
> @@ -1,6 +1,6 @@
>  /* { dg-require-effective-target vect_simd_clones } */
> -/* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } 
> */
> -/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +/* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0 
> -mavx" } */
> +/* { dg-require-effective-target  avx_runtime } */
>  /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* 
> } && { ! lp64 } } } } */
>
>  #define TYPE __INT64_TYPE__
> --
> 2.42.0
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] testsuite: require avx_runtime for some tests

2023-12-08 Thread Marc Poulhiès


Thomas Schwinge  writes:

> Hi Marc!
>
> On 2023-11-06T11:59:18+0100, Marc Poulhiès  wrote:
>> These 3 tests fail parsing the 'vect' dump when not using -mavx. Make
>> the dependency explicit.
>
> But that means that the tests are now enabled *only* for
> effective-target 'avx_runtime', so, for example, on GCN I see:
>
> -PASS: gcc.dg/vect/vect-ifcvt-18.c (test for excess errors)
> -PASS: gcc.dg/vect/vect-ifcvt-18.c execution test
> +UNSUPPORTED: gcc.dg/vect/vect-ifcvt-18.c
>
> -PASS: gcc.dg/vect/vect-simd-clone-16f.c (test for excess errors)
> -PASS: gcc.dg/vect/vect-simd-clone-16f.c execution test
> -PASS: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect 
> "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
> +UNSUPPORTED: gcc.dg/vect/vect-simd-clone-16f.c
>
> -PASS: gcc.dg/vect/vect-simd-clone-18f.c (test for excess errors)
> -PASS: gcc.dg/vect/vect-simd-clone-18f.c execution test
> -PASS: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect 
> "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
> +UNSUPPORTED: gcc.dg/vect/vect-simd-clone-18f.c
>
> ..., which was not the intention, I suppose?

Hello Thomas,

No, that was an oversight, Jakub also spotted that in another patch.
I'll revert it now.

Sorry for the inconvenience,
Marc


Re: [V2 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-08 Thread Richard Biener
On Thu, Nov 16, 2023 at 11:49 AM liuhongt  wrote:
>
> Update in V2:
> 1) Add some comments before the pattern.
> 2) Remove ? from view_convert.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> While working on PR112443, I noticed some misoptimizations:
> after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend
> fails to combine it back to v{,p}blendv{b,ps,pd} since the pattern is
> too complicated, so I think maybe we should handle it at the gimple
> level.
>
> The dump is like
>
>   _1 = c_3(D) >= { 0, 0, 0, 0 };
>   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>   _7 = VIEW_CONVERT_EXPR(_2);
>   _8 = VIEW_CONVERT_EXPR(b_6(D));
>   _9 = VIEW_CONVERT_EXPR(a_5(D));
>   _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _11 = VEC_COND_EXPR <_10, _8, _9>;
>
> It can be optimized to
>
>   _1 = c_2(D) >= { 0, 0, 0, 0 };
>   _6 = VEC_COND_EXPR <_1, b_5(D), a_4(D)>;
>
> Since _7 is either -1 or 0, the selection _7 < 0 ? _8 : _9 should
> be equal to _1 ? b : a as long as the TYPE_PRECISION of the component type
> of the second VEC_COND_EXPR is less than or equal to that of the first one.
> The patch adds a gimple pattern to handle that.
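
The claimed equivalence -- that the -1/0 mask reinterpreted in a narrower element type still has the sign bit set in exactly the selected lanes -- can be checked with a small simulation (an illustration only, not the match.pd machinery; four 32-bit lanes viewed as sixteen signed bytes):

```python
import struct

# Simulate a view_convert of a 32-bit -1/0 mask to 8-bit lanes and
# check that 'lane < 0' selects the same elements in both widths.
# Illustrative only; the actual transform lives in GCC's match.pd.
def select(c, b, a):
    mask32 = [-1 if v >= 0 else 0 for v in c]          # a cmp b ? -1 : 0
    raw = struct.pack("<4i", *mask32)                  # reinterpret bits
    mask8 = struct.unpack("<16b", raw)                 # 16 signed bytes
    wide = [bi if m < 0 else ai for m, bi, ai in zip(mask32, b, a)]
    narrow_sel = [mask8[4 * i] < 0 for i in range(4)]  # one byte per lane
    assert narrow_sel == [m < 0 for m in mask32]       # same selection
    return wide

print(select([3, -1, 0, -7], [10, 11, 12, 13], [20, 21, 22, 23]))
```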
>
> gcc/ChangeLog:
>
> * match.pd (VCE (a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE ((a
> cmp b) ? (VCE:c) : (VCE:d))): New gimple simplication.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512vl-blendv-3.c: New test.
> * gcc.target/i386/blendv-3.c: New test.
> ---
>  gcc/match.pd  | 22 +
>  .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
>  gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
>  3 files changed, 74 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index dbc811b2b38..2a69622a300 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5170,6 +5170,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
>(vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
>
> +/*  ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d is just
> +(VCE ((a cmp b) ? (VCE c) : (VCE d))) when TYPE_PRECISION of the
> +component type of the outer vec_cond is greater equal the inner one.  */
> +(for cmp (simple_comparison)
> + (simplify
> +  (vec_cond
> +(lt (view_convert@5 (vec_cond@6 (cmp@4 @0 @1)
> +   integer_all_onesp
> +   integer_zerop))
> + integer_zerop) @2 @3)
> +  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
> +   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
> +   && !TYPE_UNSIGNED (TREE_TYPE (@5))
> +   && VECTOR_TYPE_P (TREE_TYPE (@6))
> +   && VECTOR_TYPE_P (type)

since you are looking at TYPE_PRECISION below you want
VECTOR_INTEGER_TYPE_P here as well?  The alternative
would be to compare TYPE_SIZE.

Some of the checks feel redundant but are probably good for
documentation purposes.

OK with using VECTOR_INTEGER_TYPE_P

Thanks,
Richard.

> +   && (TYPE_PRECISION (TREE_TYPE (type))
> + <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6
> +   && TYPE_SIZE (type) == TYPE_SIZE (TREE_TYPE (@6)))
> +   (with { tree vtype = TREE_TYPE (@6);}
> + (view_convert:type
> +   (vec_cond @4 (view_convert:vtype @2) (view_convert:vtype @3)))
> +
>  /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
>  (simplify
>   (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c 
> b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> new file mode 100644
> index 000..2777e72ab5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512vl -mavx512bw -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include "blendv-3.c"
> diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c 
> b/gcc/testsuite/gcc.target/i386/blendv-3.c
> new file mode 100644
> index 000..fa0fb067a73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include 
> +
> +__m256i
> +foo (__m256i a, __m256i b, __m256i c)
> +{
> +  return _mm256_blendv_epi8 (a, b, ~c < 0);
> +}
> +
> +__m256d
> +foo1 (__m256d a, __m256d b, __m256i c)
> +{
> +  __m256i d = ~c < 0;
> +  return _mm256_blendv_pd (a, b, (__m256d)d);
> +}
> +
> +__m256
> +foo2 (__m256 a, __m256 b, __m256i c)
> +{
> +  __m256i d = ~c < 0;
> +  return _mm256_blendv_ps (a, b, (__m256)d);

Re: [PATCH] testsuite: require avx_runtime for vect-simd-clone-17f

2023-12-08 Thread Marc Poulhiès


Marc Poulhiès  writes:

> Should I revert r14-6272 that has the same issue of disabling the
> modified tests on non-x86?

I've reverted the r14-6272.

Marc


Re: [PATCH] strub: skip emutls after strubm errors

2023-12-08 Thread Thomas Schwinge
Hi Alexandre!

On 2023-12-07T14:52:19-0300, Alexandre Oliva  wrote:
> On Dec  7, 2023, Thomas Schwinge  wrote:
>> during IPA pass: emutls
>> [...]/source-gcc/gcc/testsuite/c-c++-common/strub-unsupported-3.c:18:1: 
>> internal compiler error: in verify_curr_properties, at passes.cc:2198
>
> Aah, this smells a lot like the issue that François-Xavier reported,
> that the following patch is expected to fix.  I'm still regstrapping it
> on x86_64-linux-gnu, after checking that it addressed the symptom on a
> cross compiler to the target for which it had originally been reported.
> Ok to install, once you confirm that it cures these ICEs?

Yes, GCC/nvptx ICEs gone with that, thanks!


Regards
 Thomas


> strub: skip emutls after strubm errors
>
> The emutls pass requires PROP_ssa, but if the strubm pass (or any
> other pre-SSA pass) issues errors, all of the build_ssa_passes are
> skipped, so the property is not set, but emutls still attempts to run,
> on targets that use it, despite earlier errors, so it hits the
> unsatisfied requirement.
>
> Adjust emutls to be skipped in case of earlier errors.
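
The gating interaction can be modeled as a toy pass pipeline (an illustration only -- not GCC's actual pass manager): the passes that produce PROP_ssa are skipped after errors, so a later pass that requires the property must itself gate on seen_error.

```python
# Toy pass-manager model: build_ssa_passes produce PROP_ssa but are
# skipped after errors; emutls requires PROP_ssa, so it must also gate
# on seen_error.  Illustration only, not GCC's pass manager.
PROP_SSA = "ssa"

def run_pipeline(errors_seen, emutls_gates_on_errors):
    props = set()
    if not errors_seen:                   # build_ssa_passes run only sans errors
        props.add(PROP_SSA)
    gate = not errors_seen if emutls_gates_on_errors else True
    if gate:
        assert PROP_SSA in props, "ICE: unsatisfied pass property"
        return "emutls ran"
    return "emutls skipped"

print(run_pipeline(errors_seen=True, emutls_gates_on_errors=True))   # fixed
try:
    run_pipeline(errors_seen=True, emutls_gates_on_errors=False)     # buggy
except AssertionError as e:
    print(e)
```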
>
>
> for  gcc/ChangeLog
>
>   * tree-emutls.cc: Include diagnostic-core.h.
>   (pass_ipa_lower_emutls::gate): Skip if errors were seen.
> ---
>  gcc/tree-emutls.cc |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-emutls.cc b/gcc/tree-emutls.cc
> index 5dca5a8291356..38de202717a1a 100644
> --- a/gcc/tree-emutls.cc
> +++ b/gcc/tree-emutls.cc
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "langhooks.h"
>  #include "tree-iterator.h"
>  #include "gimplify.h"
> +#include "diagnostic-core.h" /* for seen_error */
>
>  /* Whenever a target does not support thread-local storage (TLS) natively,
> we can emulate it with some run-time support in libgcc.  This will in
> @@ -841,7 +842,7 @@ public:
>bool gate (function *) final override
>  {
>/* If the target supports TLS natively, we need do nothing here.  */
> -  return !targetm.have_tls;
> +  return !targetm.have_tls && !seen_error ();
>  }
>
>unsigned int execute (function *) final override


Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-08 Thread Kewen.Lin
Hi Ajit,

on 2023/12/8 16:01, Ajit Agarwal wrote:
> Hello Kewen:
> 
> On 07/12/23 4:31 pm, Ajit Agarwal wrote:
>> Hello Kewen:
>>
>> On 06/12/23 7:52 am, Kewen.Lin wrote:
>>> on 2023/12/6 02:01, Ajit Agarwal wrote:
 Hello Kewen:


 On 05/12/23 7:13 pm, Ajit Agarwal wrote:
> Hello Kewen:
>
> On 04/12/23 7:31 am, Kewen.Lin wrote:
>> Hi Ajit,
>>
>> on 2023/12/1 17:10, Ajit Agarwal wrote:
>>> Hello Kewen:
>>>
>>> On 24/11/23 3:01 pm, Kewen.Lin wrote:
 Hi Ajit,

 Don't forget to CC David (CC-ed) :), some comments are inlined below.

 on 2023/10/8 03:04, Ajit Agarwal wrote:
> Hello All:
>
> This patch add new pass to replace contiguous addresses vector load 
> lxv with mma instruction
> lxvp.

 IMHO the current binding lxvp (and lxvpx, stxvp{x,}) to MMA looks 
 wrong, it's only
 Power10 and VSX required, these instructions should perform well 
 without MMA support.
 So one patch to separate their support from MMA seems to go first.

>>>
>>> I will make the changes for Power10 and VSX.
>>>
> This patch addresses one regressions failure in ARM architecture.

 Could you explain this?  I don't see any test case for this.
>>>
>>> I have submitted v1 of the patch and there were regressions failure for 
>>> Linaro.
>>> I have fixed in version V2.
>>
>> OK, thanks for clarifying.  So some unexpected changes on generic code 
>> in v1
>> caused the failure exposed on arm.
>>
>>>
>>>  
 Besides, it seems a bad idea to put this pass after reload? as 
 register allocation
 finishes, this pairing has to be restricted by the reg No. (I didn't 
 see any
 checking on the reg No. relationship for paring btw.)

>>>
>>> Adding before reload pass deletes one of the lxv and replaced with 
>>> lxvp. This
>>> fails in reload pass while freeing reg_eqivs as ira populates them and 
>>> then
>>
>> I can't find reg_eqivs, I guessed you meant reg_equivs and moved this 
>> pass right before
>> pass_reload (between pass_ira and pass_reload)?  IMHO it's unexpected as 
>> those two passes
>> are closely correlated.  I was expecting to put it somewhere before ira.
>
> Yes they are tied together and moving before reload will not work.
>
>>
>>> vecload pass deletes some of insns and while freeing in reload pass as 
>>> insn
>>> is already deleted in vecload pass reload pass segfaults.
>>>
>>> Moving vecload pass before ira will not make register pairs with lxvp 
>>> and
>>> in ira and that will be a problem.
>>
>> Could you elaborate the obstacle for moving such pass before pass_ira?
>>
>> Basing on the status quo, the lxvp is bundled with OOmode, then I'd 
>> expect
>> we can generate OOmode move (load) and use the components with unspec (or
>> subreg with Peter's patch) to replace all the previous use places, it 
>> looks
>> doable to me.
>
> Moving before ira passes, we delete the offset lxv and generate lxvp and 
> replace all
> the uses, that I am doing. But the offset lxvp register generated by ira 
> are not
> register pair and generate random register and hence we cannot generate 
> lxvp.
>
> For example one lxv is generated with register 32 and other pair is 
> generated
> with register 45 by ira if we move it before ira passes.

 It generates the following.
lxvp %vs32,0(%r4)
 xvf32ger 0,%vs34,%vs32
 xvf32gerpp 0,%vs34,%vs45
>>>
>>> What do the RTL insns for these insns look like?
>>>
>>> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of 
>>> lxvp,
>>> the current define_insn_and_split "*vsx_disassemble_pair" should be able to 
>>> take
>>> care of it further (eg: reg and regoff).
>>>
>>
>> Yes with UNSPEC_MMA_EXTRACT it generates lxvp with register pair instead of 
>> random
>> register by ira and reload pass. But there is an extra moves that gets 
>> generated.
>>
> 
> With UNSPEC_MMA_EXTRACT I could generate the register pair but functionally 
> here is the
> below code which is incorrect.> 
>  lxvp %vs0,0(%r4)
> xxlor %vs32,%vs0,%vs0
> xvf32ger 0,%vs34,%vs32
> xvf32gerpp 0,%vs34,%vs33
> xxmfacc 0
> stxvp %vs2,0(%r3)
> stxvp %vs0,32(%r3)
> blr
> 
> 
> Here is the RTL Code:
> 
> (insn 19 4 20 2 (set (reg:OO 124 [ *ptr_4(D) ])
> (mem:OO (reg/v/f:DI 122 [ ptr ]) [0 *ptr_4(D)+0 S16 A128])) -1
>  (nil))
> (insn 20 19 9 2 (set (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
> (subreg:V16QI (reg:OO 124 [ *ptr_4(D) ]) 0)) -1
>  (nil))
> (insn 9 20 11 2 (set (reg:XO 119 [ _7 ])
> (unspec:XO [
> (reg/v:V1

[PATCH v4] LoongArch: Fix eh_return epilogue for normal returns

2023-12-08 Thread Yang Yujie
On LoongArch, the registers $r4 - $r7 (EH_RETURN_DATA_REGNO) will be saved
and restored in the function prologue and epilogue if the given function calls
__builtin_eh_return.  This causes the return value to be overwritten on normal
return paths and breaks a rare case of libgcc's _Unwind_RaiseException.
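
The clobbering scenario can be modeled abstractly (a toy register model, not the generated code; it relies on the fact, stated above, that $r4 is both an EH-return data register and the return-value register): if the epilogue unconditionally restores every saved register, a normal return sees its return value overwritten.

```python
# Toy model of the bug: the prologue saves $r4-$r7 because the function
# calls __builtin_eh_return somewhere; if the epilogue restores them on
# the normal return path too, the return value in $r4 is clobbered.
# Illustrative sketch only.
EH_DATA_REGS = ["r4", "r5", "r6", "r7"]

def run(eh_return_taken, restore_eh_regs_on_normal_return):
    regs = {r: f"caller-{r}" for r in EH_DATA_REGS}
    saved = dict(regs)                    # prologue: save $r4-$r7
    regs["r4"] = "return-value"           # body: compute the return value
    # Epilogue: restore the saved registers.
    for r in EH_DATA_REGS:
        if eh_return_taken or restore_eh_regs_on_normal_return:
            regs[r] = saved[r]
    return regs["r4"]

print(run(False, True))    # buggy epilogue: caller-r4 (value clobbered)
print(run(False, False))   # fixed epilogue: return-value
```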

gcc/ChangeLog:

* config/loongarch/loongarch.cc: Do not restore the saved eh_return
data registers ($r4-$r7) for a normal return of a function that calls
__builtin_eh_return elsewhere.
* config/loongarch/loongarch-protos.h: Same.
* config/loongarch/loongarch.md: Same.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/eh_return-normal-return.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  2 +-
 gcc/config/loongarch/loongarch.cc | 41 ---
 gcc/config/loongarch/loongarch.md | 18 +++-
 .../loongarch/eh_return-normal-return.c   | 32 +++
 4 files changed, 76 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/eh_return-normal-return.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index cb8fc36b086..af20b5d7132 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -60,7 +60,7 @@ enum loongarch_symbol_type {
 extern rtx loongarch_emit_move (rtx, rtx);
 extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
 extern void loongarch_expand_prologue (void);
-extern void loongarch_expand_epilogue (bool);
+extern void loongarch_expand_epilogue (int);
 extern bool loongarch_can_use_return_insn (void);
 
 extern bool loongarch_symbolic_constant_p (rtx, enum loongarch_symbol_type *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3545e66a10e..9c0e0dd1b73 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1015,20 +1015,30 @@ loongarch_save_restore_reg (machine_mode mode, int regno, HOST_WIDE_INT offset,
 
 static void
 loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
- loongarch_save_restore_fn fn)
+ loongarch_save_restore_fn fn,
+ bool skip_eh_data_regs_p)
 {
   HOST_WIDE_INT offset;
 
   /* Save the link register and s-registers.  */
   offset = cfun->machine->frame.gp_sp_offset - sp_offset;
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
-if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
-  {
-   if (!cfun->machine->reg_is_wrapped_separately[regno])
- loongarch_save_restore_reg (word_mode, regno, offset, fn);
+{
+  /* Special care needs to be taken for $r4-$r7 (EH_RETURN_DATA_REGNO)
+when returning normally from a function that calls __builtin_eh_return.
+In this case, these registers are saved but should not be restored,
+or the return value may be clobbered.  */
 
-   offset -= UNITS_PER_WORD;
-  }
+  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
+   {
+ if (!(cfun->machine->reg_is_wrapped_separately[regno]
+   || (skip_eh_data_regs_p
+   && GP_ARG_FIRST <= regno && regno < GP_ARG_FIRST + 4)))
+   loongarch_save_restore_reg (word_mode, regno, offset, fn);
+
+ offset -= UNITS_PER_WORD;
+   }
+}
 
   /* This loop must iterate over the same space as its companion in
  loongarch_compute_frame_info.  */
@@ -1297,7 +1307,7 @@ loongarch_expand_prologue (void)
GEN_INT (-step1));
   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
   size -= step1;
-  loongarch_for_each_saved_reg (size, loongarch_save_reg);
+  loongarch_for_each_saved_reg (size, loongarch_save_reg, false);
 }
 
   /* Set up the frame pointer, if we're using one.  */
@@ -1382,11 +1392,11 @@ loongarch_can_use_return_insn (void)
   return reload_completed && cfun->machine->frame.total_size == 0;
 }
 
-/* Expand an "epilogue" or "sibcall_epilogue" pattern; SIBCALL_P
-   says which.  */
+/* Expand function epilogue for the following insn patterns:
+   "epilogue" (style == 0) / "sibcall_epilogue" (1) / "eh_return" (2).  */
 
 void
-loongarch_expand_epilogue (bool sibcall_p)
+loongarch_expand_epilogue (int style)
 {
   /* Split the frame into two.  STEP1 is the amount of stack we should
  deallocate before restoring the registers.  STEP2 is the amount we
@@ -1403,7 +1413,8 @@ loongarch_expand_epilogue (bool sibcall_p)
   bool need_barrier_p
 = (get_frame_size () + cfun->machine->frame.arg_pointer_offset) != 0;
 
-  if (!sibcall_p && loongarch_can_use_return_insn ())
+  /* Handle simple returns.  */
+  if (style == 0 && loongarch_can_use_return_insn ())
 {
   emit_jump_insn (gen_return ());
   return;
@@ -1479,7 +1490,8 @@ loongarch_expand_epilogue (bool sibcall_p)
 
   /* Restore the register

Re: [PATCH v4] LoongArch: Fix eh_return epilogue for normal returns

2023-12-08 Thread Yang Yujie
Updates:
v1 -> v2: Add a test case.
v2 -> v3: Fix code format.
v3 -> v4: Fix code format.  Avoid unwanted optimization in the test.

On Fri, Dec 08, 2023 at 05:54:46PM +0800, Yang Yujie wrote:
> On LoongArch, the registers $r4 - $r7 (EH_RETURN_DATA_REGNO) will be saved
> and restored in the function prologue and epilogue if the given function calls
> __builtin_eh_return.  This causes the return value to be overwritten on normal
> return paths and breaks a rare case of libgcc's _Unwind_RaiseException.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.cc: Do not restore the saved eh_return
>   data registers ($r4-$r7) for a normal return of a function that calls
>   __builtin_eh_return elsewhere.
>   * config/loongarch/loongarch-protos.h: Same.
>   * config/loongarch/loongarch.md: Same.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/eh_return-normal-return.c: New test.
> ---
>  gcc/config/loongarch/loongarch-protos.h   |  2 +-
>  gcc/config/loongarch/loongarch.cc | 41 ---
>  gcc/config/loongarch/loongarch.md | 18 +++-
>  .../loongarch/eh_return-normal-return.c   | 32 +++
>  4 files changed, 76 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/eh_return-normal-return.c
> 
> diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
> index cb8fc36b086..af20b5d7132 100644
> --- a/gcc/config/loongarch/loongarch-protos.h
> +++ b/gcc/config/loongarch/loongarch-protos.h
> @@ -60,7 +60,7 @@ enum loongarch_symbol_type {
>  extern rtx loongarch_emit_move (rtx, rtx);
>  extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
>  extern void loongarch_expand_prologue (void);
> -extern void loongarch_expand_epilogue (bool);
> +extern void loongarch_expand_epilogue (int);
>  extern bool loongarch_can_use_return_insn (void);
>  
>  extern bool loongarch_symbolic_constant_p (rtx, enum loongarch_symbol_type *);
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index 3545e66a10e..9c0e0dd1b73 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -1015,20 +1015,30 @@ loongarch_save_restore_reg (machine_mode mode, int regno, HOST_WIDE_INT offset,
>  
>  static void
>  loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
> -   loongarch_save_restore_fn fn)
> +   loongarch_save_restore_fn fn,
> +   bool skip_eh_data_regs_p)
>  {
>HOST_WIDE_INT offset;
>  
>/* Save the link register and s-registers.  */
>offset = cfun->machine->frame.gp_sp_offset - sp_offset;
>for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> -if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> -  {
> - if (!cfun->machine->reg_is_wrapped_separately[regno])
> -   loongarch_save_restore_reg (word_mode, regno, offset, fn);
> +{
> +  /* Special care needs to be taken for $r4-$r7 (EH_RETURN_DATA_REGNO)
> +  when returning normally from a function that calls __builtin_eh_return.
> +  In this case, these registers are saved but should not be restored,
> +  or the return value may be clobbered.  */
>  
> - offset -= UNITS_PER_WORD;
> -  }
> +  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> + {
> +   if (!(cfun->machine->reg_is_wrapped_separately[regno]
> + || (skip_eh_data_regs_p
> + && GP_ARG_FIRST <= regno && regno < GP_ARG_FIRST + 4)))
> + loongarch_save_restore_reg (word_mode, regno, offset, fn);
> +
> +   offset -= UNITS_PER_WORD;
> + }
> +}
>  
>/* This loop must iterate over the same space as its companion in
>   loongarch_compute_frame_info.  */
> @@ -1297,7 +1307,7 @@ loongarch_expand_prologue (void)
>   GEN_INT (-step1));
>RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>size -= step1;
> -  loongarch_for_each_saved_reg (size, loongarch_save_reg);
> +  loongarch_for_each_saved_reg (size, loongarch_save_reg, false);
>  }
>  
>/* Set up the frame pointer, if we're using one.  */
> @@ -1382,11 +1392,11 @@ loongarch_can_use_return_insn (void)
>return reload_completed && cfun->machine->frame.total_size == 0;
>  }
>  
> -/* Expand an "epilogue" or "sibcall_epilogue" pattern; SIBCALL_P
> -   says which.  */
> +/* Expand function epilogue for the following insn patterns:
> +   "epilogue" (style == 0) / "sibcall_epilogue" (1) / "eh_return" (2).  */
>  
>  void
> -loongarch_expand_epilogue (bool sibcall_p)
> +loongarch_expand_epilogue (int style)
>  {
>/* Split the frame into two.  STEP1 is the amount of stack we should
>   deallocate before restoring the registers.  STEP2 is the amount we
> @@ -1403,7 +1413,8 @@ loongarch_expand_epilogue (bool sibcall_p)
>bool need_barrier_p
>

Re: [PATCH v4] LoongArch: Fix eh_return epilogue for normal returns

2023-12-08 Thread Yang Yujie
Sorry, this is the wrong patch.  I will post it again.

On Fri, Dec 08, 2023 at 05:57:12PM +0800, Yang Yujie wrote:
> Updates:
> v1 -> v2: Add a test case.
> v2 -> v3: Fix code format.
> v3 -> v4: Fix code format.  Avoid unwanted optimization in the test.
> 

[PATCH v5] LoongArch: Fix eh_return epilogue for normal returns.

2023-12-08 Thread Yang Yujie
On LoongArch, the registers $r4 - $r7 (EH_RETURN_DATA_REGNO) will be saved
and restored in the function prologue and epilogue if the given function calls
__builtin_eh_return.  This causes the return value to be overwritten on normal
return paths and breaks a rare case of libgcc's _Unwind_RaiseException.

gcc/ChangeLog:

* config/loongarch/loongarch.cc: Do not restore the saved eh_return
data registers ($r4-$r7) for a normal return of a function that calls
__builtin_eh_return elsewhere.
* config/loongarch/loongarch-protos.h: Same.
* config/loongarch/loongarch.md: Same.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/eh_return-normal-return.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  2 +-
 gcc/config/loongarch/loongarch.cc | 34 -
 gcc/config/loongarch/loongarch.md | 23 ++-
 .../loongarch/eh_return-normal-return.c   | 38 +++
 4 files changed, 84 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/eh_return-normal-return.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index cb8fc36b086..af20b5d7132 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -60,7 +60,7 @@ enum loongarch_symbol_type {
 extern rtx loongarch_emit_move (rtx, rtx);
 extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
 extern void loongarch_expand_prologue (void);
-extern void loongarch_expand_epilogue (bool);
+extern void loongarch_expand_epilogue (int);
 extern bool loongarch_can_use_return_insn (void);
 
 extern bool loongarch_symbolic_constant_p (rtx, enum loongarch_symbol_type *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3545e66a10e..1277c0e9f72 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1015,7 +1015,8 @@ loongarch_save_restore_reg (machine_mode mode, int regno, HOST_WIDE_INT offset,
 
 static void
 loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
- loongarch_save_restore_fn fn)
+ loongarch_save_restore_fn fn,
+ bool skip_eh_data_regs_p)
 {
   HOST_WIDE_INT offset;
 
@@ -1024,7 +1025,14 @@ loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
   {
-   if (!cfun->machine->reg_is_wrapped_separately[regno])
+   /* Special care needs to be taken for $r4-$r7 (EH_RETURN_DATA_REGNO)
+  when returning normally from a function that calls
+  __builtin_eh_return.  In this case, these registers are saved but
+  should not be restored, or the return value may be clobbered.  */
+
+   if (!(cfun->machine->reg_is_wrapped_separately[regno]
+ || (skip_eh_data_regs_p
+ && GP_ARG_FIRST <= regno && regno < GP_ARG_FIRST + 4)))
  loongarch_save_restore_reg (word_mode, regno, offset, fn);
 
offset -= UNITS_PER_WORD;
@@ -1297,7 +1305,7 @@ loongarch_expand_prologue (void)
GEN_INT (-step1));
   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
   size -= step1;
-  loongarch_for_each_saved_reg (size, loongarch_save_reg);
+  loongarch_for_each_saved_reg (size, loongarch_save_reg, false);
 }
 
   /* Set up the frame pointer, if we're using one.  */
@@ -1382,11 +1390,13 @@ loongarch_can_use_return_insn (void)
   return reload_completed && cfun->machine->frame.total_size == 0;
 }
 
-/* Expand an "epilogue" or "sibcall_epilogue" pattern; SIBCALL_P
-   says which.  */
+/* Expand function epilogue using the following insn patterns:
+   "epilogue"         (style == NORMAL_RETURN)
+   "sibcall_epilogue" (style == SIBCALL_RETURN)
+   "eh_return"        (style == EXCEPTION_RETURN)  */
 
 void
-loongarch_expand_epilogue (bool sibcall_p)
+loongarch_expand_epilogue (int style)
 {
   /* Split the frame into two.  STEP1 is the amount of stack we should
  deallocate before restoring the registers.  STEP2 is the amount we
@@ -1403,7 +1413,8 @@ loongarch_expand_epilogue (bool sibcall_p)
   bool need_barrier_p
 = (get_frame_size () + cfun->machine->frame.arg_pointer_offset) != 0;
 
-  if (!sibcall_p && loongarch_can_use_return_insn ())
+  /* Handle simple returns.  */
+  if (style == NORMAL_RETURN && loongarch_can_use_return_insn ())
 {
   emit_jump_insn (gen_return ());
   return;
@@ -1479,7 +1490,9 @@ loongarch_expand_epilogue (bool sibcall_p)
 
   /* Restore the registers.  */
   loongarch_for_each_saved_reg (frame->total_size - step2,
-   loongarch_restore_reg);
+   loongarch_restore_reg,
+   crtl->calls_eh_return
+   && styl

[PATCH v3 0/2] LoongArch D support

2023-12-08 Thread Yang Yujie
This patchset is based on Zixing Liu's initial support patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631260.html

Updates
v1 -> v2: Rebased onto the dmd/druntime upstream state.
v2 -> v3: Dropped unnecessary changes.

Regtested on loongarch64-linux-gnu with the following result:

=== libphobos Summary ===

FAIL: libphobos.config/test22523.d -- --DRT-testmode=run-main execution test
FAIL: libphobos.gc/precisegc.d execution test
FAIL: libphobos.phobos/std/datetime/systime.d (test for excess errors)
UNRESOLVED: libphobos.phobos/std/datetime/systime.d compilation failed to produce executable
UNSUPPORTED: libphobos.phobos/std/net/curl.d: skipped test
UNSUPPORTED: libphobos.phobos_shared/std/net/curl.d: skipped test
FAIL: libphobos.shared/loadDR.c -ldl -pthread -g execution test (out-of-tree testing)

# of expected passes		1024
# of unexpected failures	4
# of unresolved testcases	1
# of unsupported tests		2

=== gdc Summary ===

FAIL: gdc.test/runnable/testaa.d   execution test
FAIL: gdc.test/runnable/testaa.d -fPIC   execution test

# of expected passes		10353
# of unexpected failures	2
# of unsupported tests		631


Yang Yujie (2):
  libruntime: Add fiber context switch code for LoongArch.
  libphobos: Update build scripts for LoongArch64.

 libphobos/configure   |  21 ++-
 libphobos/libdruntime/Makefile.am |   3 +
 libphobos/libdruntime/Makefile.in |  98 -
 .../config/loongarch/switchcontext.S  | 133 ++
 libphobos/m4/druntime/cpu.m4  |   5 +
 5 files changed, 220 insertions(+), 40 deletions(-)
 create mode 100644 libphobos/libdruntime/config/loongarch/switchcontext.S

-- 
2.43.0



[PATCH v3 1/2] libruntime: Add fiber context switch code for LoongArch.

2023-12-08 Thread Yang Yujie
libphobos/ChangeLog:

* libdruntime/config/loongarch/switchcontext.S: New file.
---
 .../config/loongarch/switchcontext.S  | 133 ++
 1 file changed, 133 insertions(+)
 create mode 100644 libphobos/libdruntime/config/loongarch/switchcontext.S

diff --git a/libphobos/libdruntime/config/loongarch/switchcontext.S b/libphobos/libdruntime/config/loongarch/switchcontext.S
new file mode 100644
index 000..edfb9b67e8f
--- /dev/null
+++ b/libphobos/libdruntime/config/loongarch/switchcontext.S
@@ -0,0 +1,133 @@
+/* LoongArch support code for fibers and multithreading.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "../common/threadasm.S"
+
+/**
+ * Performs a context switch.
+ *
+ * $a0 - void** - ptr to old stack pointer
+ * $a1 - void*  - new stack pointer
+ *
+ */
+
+#if defined(__loongarch_lp64)
+#  define GPR_L ld.d
+#  define GPR_S st.d
+#  define SZ_GPR 8
+#  define ADDSP(si)   addi.d  $sp, $sp, si
+#elif defined(__loongarch64_ilp32)
+#  define GPR_L ld.w
+#  define GPR_S st.w
+#  define SZ_GPR 4
+#  define ADDSP(si)   addi.w  $sp, $sp, si
+#else
+#  error Unsupported GPR size (must be 64-bit or 32-bit).
+#endif
+
+#if defined(__loongarch_double_float)
+#  define FPR_L fld.d
+#  define FPR_S fst.d
+#  define SZ_FPR 8
+#elif defined(__loongarch_single_float)
+#  define FPR_L fld.s
+#  define FPR_S fst.s
+#  define SZ_FPR 4
+#else
+#  define SZ_FPR 0
+#endif
+
+.text
+.align 2
+.global fiber_switchContext
+.type   fiber_switchContext, @function
+fiber_switchContext:
+.cfi_startproc
+ADDSP(-11 * SZ_GPR)
+
+// fp regs and return address are stored below the stack
+// because we don't want the GC to scan them.
+
+// return address (r1)
+GPR_S  $r1, $sp, -SZ_GPR
+
+#if SZ_FPR != 0
+// callee-saved scratch FPRs (f24-f31)
+FPR_S  $f24, $sp, -SZ_GPR-1*SZ_FPR
+FPR_S  $f25, $sp, -SZ_GPR-2*SZ_FPR
+FPR_S  $f26, $sp, -SZ_GPR-3*SZ_FPR
+FPR_S  $f27, $sp, -SZ_GPR-4*SZ_FPR
+FPR_S  $f28, $sp, -SZ_GPR-5*SZ_FPR
+FPR_S  $f29, $sp, -SZ_GPR-6*SZ_FPR
+FPR_S  $f30, $sp, -SZ_GPR-7*SZ_FPR
+FPR_S  $f31, $sp, -SZ_GPR-8*SZ_FPR
+#endif
+
+// callee-saved GPRs (r21, fp (r22), r23-r31)
+GPR_S $r21, $sp, 0*SZ_GPR
+GPR_S  $fp, $sp, 1*SZ_GPR
+GPR_S  $s0, $sp, 2*SZ_GPR
+GPR_S  $s1, $sp, 3*SZ_GPR
+GPR_S  $s2, $sp, 4*SZ_GPR
+GPR_S  $s3, $sp, 5*SZ_GPR
+GPR_S  $s4, $sp, 6*SZ_GPR
+GPR_S  $s5, $sp, 7*SZ_GPR
+GPR_S  $s6, $sp, 8*SZ_GPR
+GPR_S  $s7, $sp, 9*SZ_GPR
+GPR_S  $s8, $sp, 10*SZ_GPR
+
+// swap stack pointer
+GPR_S $sp, $a0, 0
+move $sp, $a1
+
+GPR_L  $r1, $sp, -SZ_GPR
+
+#if SZ_FPR != 0
+FPR_L  $f24, $sp, -SZ_GPR-1*SZ_FPR
+FPR_L  $f25, $sp, -SZ_GPR-2*SZ_FPR
+FPR_L  $f26, $sp, -SZ_GPR-3*SZ_FPR
+FPR_L  $f27, $sp, -SZ_GPR-4*SZ_FPR
+FPR_L  $f28, $sp, -SZ_GPR-5*SZ_FPR
+FPR_L  $f29, $sp, -SZ_GPR-6*SZ_FPR
+FPR_L  $f30, $sp, -SZ_GPR-7*SZ_FPR
+FPR_L  $f31, $sp, -SZ_GPR-8*SZ_FPR
+#endif
+
+GPR_L $r21, $sp, 0*SZ_GPR
+GPR_L  $fp, $sp, 1*SZ_GPR
+GPR_L  $s0, $sp, 2*SZ_GPR
+GPR_L  $s1, $sp, 3*SZ_GPR
+GPR_L  $s2, $sp, 4*SZ_GPR
+GPR_L  $s3, $sp, 5*SZ_GPR
+GPR_L  $s4, $sp, 6*SZ_GPR
+GPR_L  $s5, $sp, 7*SZ_GPR
+GPR_L  $s6, $sp, 8*SZ_GPR
+GPR_L  $s7, $sp, 9*SZ_GPR
+GPR_L  $s8, $sp, 10*SZ_GPR
+
+ADDSP(11 * SZ_GPR)
+
+jr $r1 // return
+.cfi_endproc
+.size fiber_switchContext,.-fiber_switchContext
-- 
2.43.0



[PATCH v3 2/2] libphobos: Update build scripts for LoongArch64.

2023-12-08 Thread Yang Yujie
libphobos/ChangeLog:

* m4/druntime/cpu.m4: Support loongarch* targets.
* libdruntime/Makefile.am: Same.
* libdruntime/Makefile.in: Regenerate.
* configure: Regenerate.
---
 libphobos/configure   | 21 ++-
 libphobos/libdruntime/Makefile.am |  3 +
 libphobos/libdruntime/Makefile.in | 98 +++
 libphobos/m4/druntime/cpu.m4  |  5 ++
 4 files changed, 87 insertions(+), 40 deletions(-)

diff --git a/libphobos/configure b/libphobos/configure
index 25b13bdd93e..9a59bad34ac 100755
--- a/libphobos/configure
+++ b/libphobos/configure
@@ -696,6 +696,8 @@ DRUNTIME_CPU_POWERPC_FALSE
 DRUNTIME_CPU_POWERPC_TRUE
 DRUNTIME_CPU_MIPS_FALSE
 DRUNTIME_CPU_MIPS_TRUE
+DRUNTIME_CPU_LOONGARCH_FALSE
+DRUNTIME_CPU_LOONGARCH_TRUE
 DRUNTIME_CPU_ARM_FALSE
 DRUNTIME_CPU_ARM_TRUE
 DRUNTIME_CPU_AARCH64_FALSE
@@ -11865,7 +11867,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11868 "configure"
+#line 11870 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11971,7 +11973,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11974 "configure"
+#line 11976 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -14305,6 +14307,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
;;
   arm*)druntime_target_cpu_parsed="arm"
;;
+  loongarch*)
+   druntime_target_cpu_parsed="loongarch"
+   ;;
   mips*)   druntime_target_cpu_parsed="mips"
;;
   powerpc*)
@@ -14336,6 +14341,14 @@ else
   DRUNTIME_CPU_ARM_FALSE=
 fi
 
+   if test "$druntime_target_cpu_parsed" = "loongarch"; then
+  DRUNTIME_CPU_LOONGARCH_TRUE=
+  DRUNTIME_CPU_LOONGARCH_FALSE='#'
+else
+  DRUNTIME_CPU_LOONGARCH_TRUE='#'
+  DRUNTIME_CPU_LOONGARCH_FALSE=
+fi
+
if test "$druntime_target_cpu_parsed" = "mips"; then
   DRUNTIME_CPU_MIPS_TRUE=
   DRUNTIME_CPU_MIPS_FALSE='#'
@@ -15997,6 +16010,10 @@ if test -z "${DRUNTIME_CPU_ARM_TRUE}" && test -z 
"${DRUNTIME_CPU_ARM_FALSE}"; th
   as_fn_error $? "conditional \"DRUNTIME_CPU_ARM\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${DRUNTIME_CPU_LOONGARCH_TRUE}" && test -z "${DRUNTIME_CPU_LOONGARCH_FALSE}"; then
+  as_fn_error $? "conditional \"DRUNTIME_CPU_LOONGARCH\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
if test -z "${DRUNTIME_CPU_MIPS_TRUE}" && test -z "${DRUNTIME_CPU_MIPS_FALSE}"; then
   as_fn_error $? "conditional \"DRUNTIME_CPU_MIPS\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libphobos/libdruntime/Makefile.am b/libphobos/libdruntime/Makefile.am
index 23205fd3301..ca43a0753c4 100644
--- a/libphobos/libdruntime/Makefile.am
+++ b/libphobos/libdruntime/Makefile.am
@@ -83,6 +83,9 @@ endif
 if DRUNTIME_CPU_ARM
 DRUNTIME_SOURCES_CONFIGURED += config/arm/switchcontext.S
 endif
+if DRUNTIME_CPU_LOONGARCH
+DRUNTIME_SOURCES_CONFIGURED += config/loongarch/switchcontext.S
+endif
 if DRUNTIME_CPU_MIPS
 DRUNTIME_SOURCES_CONFIGURED += config/mips/switchcontext.S
 endif
diff --git a/libphobos/libdruntime/Makefile.in b/libphobos/libdruntime/Makefile.in
index 410245d71ca..f52bf36c282 100644
--- a/libphobos/libdruntime/Makefile.in
+++ b/libphobos/libdruntime/Makefile.in
@@ -124,12 +124,13 @@ target_triplet = @target@
 # CPU specific sources
 @DRUNTIME_CPU_AARCH64_TRUE@am__append_11 = config/aarch64/switchcontext.S
 @DRUNTIME_CPU_ARM_TRUE@am__append_12 = config/arm/switchcontext.S
-@DRUNTIME_CPU_MIPS_TRUE@am__append_13 = config/mips/switchcontext.S
-@DRUNTIME_CPU_POWERPC_TRUE@am__append_14 = config/powerpc/switchcontext.S
-@DRUNTIME_CPU_X86_TRUE@@DRUNTIME_OS_MINGW_TRUE@am__append_15 = config/mingw/switchcontext.S
-@DRUNTIME_CPU_X86_TRUE@@DRUNTIME_OS_MINGW_FALSE@am__append_16 = config/x86/switchcontext.S
-@DRUNTIME_CPU_SYSTEMZ_TRUE@am__append_17 = config/systemz/get_tls_offset.S
-@DRUNTIME_CPU_S390_TRUE@am__append_18 = config/s390/get_tls_offset.S
+@DRUNTIME_CPU_LOONGARCH_TRUE@am__append_13 = config/loongarch/switchcontext.S
+@DRUNTIME_CPU_MIPS_TRUE@am__append_14 = config/mips/switchcontext.S
+@DRUNTIME_CPU_POWERPC_TRUE@am__append_15 = config/powerpc/switchcontext.S
+@DRUNTIME_CPU_X86_TRUE@@DRUNTIME_OS_MINGW_TRUE@am__append_16 = config/mingw/switchcontext.S
+@DRUNTIME_CPU_X86_TRUE@@DRUNTIME_OS_MINGW_FALSE@am__append_17 = config/x86/switchcontext.S
+@DRUNTIME_CPU_SYSTEMZ_TRUE@am__append_18 = config/systemz/get_tls_offset.S
+@DRUNTIME_CPU_S390_TRUE@am__append_19 = config/s390/get_tls_offset.S
 subdir = libdruntime
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -485,46 +486,50 @@ am__objects_23 = core/sys/solaris/dlfcn.lo 
core/sys/solaris/elf.lo \
 @DRUNTIME_OS_SOLARIS

Re: [PATCH v6] libgfortran: Replace mutex with rwlock

2023-12-08 Thread Jakub Jelinek
On Fri, Aug 18, 2023 at 11:18:19AM +0800, Zhu, Lipeng wrote:
> From: Lipeng Zhu 
> 
> This patch tries to introduce an rwlock and split read/write access to
> the unit_root tree and unit_cache with the rwlock instead of the mutex
> to increase CPU efficiency.  In the get_gfc_unit function, around 30%
> of the calls step into the insert_unit function; in most instances we
> can find the unit while only reading the unit_cache or unit_root tree.
> So splitting the read and write phases with an rwlock is an approach
> to make the lookup more parallel.
> 
> As a side note, the IPC metric gains around 9x on our test server
> with 220 cores.  The benchmark we used is
> https://github.com/rwesson/NEAT
> 
> libgcc/ChangeLog:
> 
> * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro
> (__gthrw): New function
> (__gthread_rwlock_rdlock): New function
> (__gthread_rwlock_tryrdlock): New function
> (__gthread_rwlock_wrlock): New function
> (__gthread_rwlock_trywrlock): New function
> (__gthread_rwlock_unlock): New function
> 
> libgfortran/ChangeLog:
> 
> * io/async.c (DEBUG_LINE): New
> * io/async.h (RWLOCK_DEBUG_ADD): New macro
> (CHECK_RDLOCK): New macro
> (CHECK_WRLOCK): New macro
> (TAIL_RWLOCK_DEBUG_QUEUE): New macro
> (IN_RWLOCK_DEBUG_QUEUE): New macro
> (RDLOCK): New macro
> (WRLOCK): New macro
> (RWUNLOCK): New macro
> (RD_TO_WRLOCK): New macro
> (INTERN_RDLOCK): New macro
> (INTERN_WRLOCK): New macro
> (INTERN_RWUNLOCK): New macro
> * io/io.h (internal_proto): Define unit_rwlock
> * io/transfer.c (st_read_done_worker): Replace unit_lock with unit_rwlock
> (st_write_done_worker): Replace unit_lock with unit_rwlock
> * io/unit.c (get_gfc_unit): Replace unit_lock with unit_rwlock
> (if): Replace unit_lock with unit_rwlock
> (close_unit_1): Replace unit_lock with unit_rwlock
> (close_units): Replace unit_lock with unit_rwlock
> (newunit_alloc): Replace unit_lock with unit_rwlock
> * io/unix.c (flush_all_units): Replace unit_lock with unit_rwlock

The ChangeLog entries are all incorrect.
1) they should be indented by a tab, not 4 spaces, and should end with
   a dot
2) when several consecutive descriptions have the same text, especially
   when it is long, it should use Likewise. for the 2nd and following
3) (internal_proto) is certainly not what you've changed, from what I can
   see in io.h you've done:
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
   (if) is certainly not what you've changed either, always find what
   function or macro the change was in, or if you remove something, state
   it, if you add something, state it.
4) all the
   Replace unit_lock with unit_rwlock. descriptions only partially match
   reality, you've also changed the operations on those variables.

> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -207,9 +207,132 @@
>  INTERN_LOCK (&debug_queue_lock); \
>  MUTEX_DEBUG_ADD (mutex); \
>  INTERN_UNLOCK (&debug_queue_lock);   
> \
> -DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #mutex, mutex); \
> +DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #mutex, \
> +  mutex); \

Why are you changing this at all?

> +#define RD_TO_WRLOCK(rwlock) \
> +  RWUNLOCK (rwlock);\

At least a space before \ (or better, a tab).

> +#define RD_TO_WRLOCK(rwlock) \
> +  RWUNLOCK (rwlock);\

Likewise.

> +  WRLOCK (rwlock);
> +#endif
> +#endif
> +
> +#ifndef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) LOCK (rwlock)
> +#define WRLOCK(rwlock) LOCK (rwlock)
> +#define RWUNLOCK(rwlock) UNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) {}

do {} while (0)
instead of {}
?

>  #endif
>  
>  #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
>  
>  #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
>  
> +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock);
> +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock);
> +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock);

Admittedly a preexisting issue, but I wonder why the ; is at the end, since
all the uses of RDLOCK etc. I've seen were already followed by ;

> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  
>  
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master lock, protect

[PATCH] Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy [PR111503]

2023-12-08 Thread Harald Anlauf
Dear all,

here's another fix for the CONTIGUOUS attribute: NULL() should
derive its characteristics from its MOLD argument; otherwise it is
"determined by the entity with which the reference is associated".
(F2018:16.9.144).

The testcase is cross-checked with Intel.
NAG rejects cases where MOLD is a pointer; I think NAG is wrong here.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From c73b248ec16388ed1ce109fce8a468a87e367085 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 8 Dec 2023 11:11:08 +0100
Subject: [PATCH] Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy
 [PR111503]

gcc/fortran/ChangeLog:

	PR fortran/111503
	* expr.cc (gfc_is_simply_contiguous): Determine characteristics of
	NULL() from MOLD argument if present, otherwise treat as present.
	* primary.cc (gfc_variable_attr): Derive attributes of NULL(MOLD)
	from MOLD.

gcc/testsuite/ChangeLog:

	PR fortran/111503
	* gfortran.dg/contiguous_14.f90: New test.
---
 gcc/fortran/expr.cc | 14 
 gcc/fortran/primary.cc  |  4 ++-
 gcc/testsuite/gfortran.dg/contiguous_14.f90 | 39 +
 3 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/contiguous_14.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index c668baeef8c..709f3c3cbef 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -5958,6 +5958,20 @@ gfc_is_simply_contiguous (gfc_expr *expr, bool strict, bool permit_element)
   if (expr->expr_type == EXPR_ARRAY)
 return true;

+  if (expr->expr_type == EXPR_NULL)
+{
+  /* F2018:16.9.144  NULL ([MOLD]):
+	 "If MOLD is present, the characteristics are the same as MOLD."
+	 "If MOLD is absent, the characteristics of the result are
+	 determined by the entity with which the reference is associated."
+	 F2018:15.3.2.2 characteristics attributes include CONTIGUOUS.  */
+  if (expr->ts.type == BT_UNKNOWN)
+	return true;
+  else
+	return (gfc_variable_attr (expr, NULL).contiguous
+		|| gfc_variable_attr (expr, NULL).allocatable);
+}
+
   if (expr->expr_type == EXPR_FUNCTION)
 {
   if (expr->value.function.isym)
diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 7278932b634..f8a1c09d190 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2627,7 +2627,9 @@ gfc_variable_attr (gfc_expr *expr, gfc_typespec *ts)
   gfc_component *comp;
   bool has_inquiry_part;

-  if (expr->expr_type != EXPR_VARIABLE && expr->expr_type != EXPR_FUNCTION)
+  if (expr->expr_type != EXPR_VARIABLE
+  && expr->expr_type != EXPR_FUNCTION
+  && !(expr->expr_type == EXPR_NULL && expr->ts.type != BT_UNKNOWN))
 gfc_internal_error ("gfc_variable_attr(): Expression isn't a variable");

   sym = expr->symtree->n.sym;
diff --git a/gcc/testsuite/gfortran.dg/contiguous_14.f90 b/gcc/testsuite/gfortran.dg/contiguous_14.f90
new file mode 100644
index 000..21e42311e9c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/contiguous_14.f90
@@ -0,0 +1,39 @@
+! { dg-do compile }
+! PR fortran/111503 - passing NULL() to POINTER, OPTIONAL, CONTIGUOUS dummy
+
+program test
+  implicit none
+  integer, pointer, contiguous :: p(:) => null()
+  integer, allocatable, target :: a(:)
+  type t
+ integer, pointer, contiguous :: p(:) => null()
+ integer, allocatable :: a(:)
+  end type t
+  type(t),   target :: z
+  class(t), allocatable, target :: c
+  print *, is_contiguous (p)
+  allocate (t :: c)
+  call one (p)
+  call one ()
+  call one (null ())
+  call one (null (p))
+  call one (a)
+  call one (null (a))
+  call one (z% p)
+  call one (z% a)
+  call one (null (z% p))
+  call one (null (z% a))
+  call one (c% p)
+  call one (c% a)
+  call one (null (c% p))
+  call one (null (c% a))
+contains
+  subroutine one (x)
+integer, pointer, optional, contiguous, intent(in) :: x(:)
+print *, present (x)
+if (present (x)) then
+   print *, "->", associated (x)
+   if (associated (x)) stop 99
+end if
+  end subroutine one
+end
--
2.35.3



RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-08 Thread Richard Biener
On Fri, 8 Dec 2023, Tamar Christina wrote:

> > --param vect-partial-vector-usage=2 would, no?
> > 
> I.. didn't even know it went to 2!
> 
> > > In principal I suppose I could mask the individual stmts, that should 
> > > handle the
> > future case when
> > > This is relaxed to supposed non-fix length buffers?
> > 
> > Well, it looks wrong - either put in an assert that we start with a
> > single stmt or assert !masked_loop_p instead?  Better ICE than
> > generate wrong code.
> > 
> > That said, I think you need to apply the masking on the original
> > stmts[], before reducing them, no?
> 
> Yeah, I've done so now.  For simplicity I've just kept the final masking 
> always as well
> and just leave it up to the optimizers to drop it when it's superfluous.
> 
> Simple testcase:
> 
> #ifndef N
> #define N 837
> #endif
> float vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(double x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>if (vect_a[i] > x)
>  break;
>vect_a[i] = x;
> 
>  }
>  return ret;
> }
> 
> Looks good now. After this one there's only one patch left, the dependency 
> analysis.
> I'm almost done with the cleanup/respin, but want to take the weekend to 
> double check and will post it first thing Monday morning.
> 
> Did you want to see the testsuite changes as well again? I've basically just 
> added the right dg-requires-effective and add-options etc.

Yes please.

> Thanks for all the reviews!
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
>
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
>   (check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
>   vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
>   * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
>   lhs.
>   (vectorizable_early_exit): New.
>   (vect_analyze_stmt, vect_transform_stmt): Use it.
>   (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
> *pattern_stmt,
>if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>  {
>gcc_assert (!vectype
> +   || is_a <gcond *> (pattern_stmt)
> || (VECTOR_BOOLEAN_TYPE_P (vectype)
> == vect_use_mask_type_p (orig_stmt_info)));
>STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> true if bool VAR can and should be optimized that way.  Assume it 
> shouldn't
> in case it's a result of a comparison which can be directly vectorized 
> into
> a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform 
> any
> +   codegen associated with the boolean condition.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> + bool analyze_only)
>  {
>tree rhs1;
>enum tree_code rhs_code;
> +  gassign *def_stmt = NULL;
>  
>stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info && !analyze_only)
>  return false;
> +  else if (!def_stmt_info)
> +/* If we're only analyzing we won't be codegen-ing the statements and are
> +   only after if the types match.  In that case we can accept loop invariant
> +   values.  */
> +def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> +  else
> +def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>  

Hmm, but we're visiting them then?  I wonder how you get along
without doing adjustmens on the uses if you consider

_1 = a < b;
_2 = c != d;
_3 = _1 | _2;
if (_3 != 0)
  exit loop;

thus a combined condition like

if (a < b || c != d)

that we if-converted.  We need to recognize that _1, _2 and _3 have
mask uses and thus possibly adjust them.

What bad happens if you drop 'analyze_only'?  We're not really
rewriting anything there.

> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>if (!def_stmt)
>  return false;
>  
> @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>switch (rhs_code)
>  {
>  case SSA_NAME:
> -  if (! check_bool_pattern (rhs1, vinfo, stmts))
> +  if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>   return false;
>break;
>  
>  CASE_CONVERT:
>if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> 

[PATCH] tree-optimization/112909 - uninit diagnostic with abnormal copy

2023-12-08 Thread Richard Biener
The following avoids spurious uninit diagnostics for SSA name
copies which mostly appear when the source is marked as abnormal
which prevents copy propagation.

To prevent regressions I remove the bail out for anonymous SSA
names in the PHI arg place from warn_uninitialized_phi leaving
that to warn_uninit where I handle SSA copies from a SSA name
which isn't anonymous.  In theory this might cause more
valid and false positive diagnostics to pop up.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112909
* tree-ssa-uninit.cc (find_uninit_use): Look through a
single level of SSA name copies with single use.

* gcc.dg/uninit-pr112909.c: New testcase.
---
 gcc/testsuite/gcc.dg/uninit-pr112909.c | 28 +++
 gcc/tree-ssa-uninit.cc | 47 --
 2 files changed, 64 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/uninit-pr112909.c

diff --git a/gcc/testsuite/gcc.dg/uninit-pr112909.c 
b/gcc/testsuite/gcc.dg/uninit-pr112909.c
new file mode 100644
index 000..d2998f715aa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr112909.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wuninitialized" } */
+
+struct machine_thread_all_state {
+  int set;
+} _hurd_setup_sighandler_state;
+int _hurd_setup_sighandler_ss_0;
+struct {
+  int ctx;
+} *_hurd_setup_sighandler_stackframe;
+void _setjmp();
+void __thread_get_state();
+int machine_get_basic_state(struct machine_thread_all_state *state) {
+  if (state->set)
+__thread_get_state();
+  return 1;
+}
+int *_hurd_setup_sighandler() {
+  int *scp;/* { dg-bogus "used uninitialized" } */
+  if (_hurd_setup_sighandler_ss_0) {
+_setjmp();
+_hurd_setup_sighandler_state.set |= 5;
+  }
+  machine_get_basic_state(&_hurd_setup_sighandler_state);
+  scp = &_hurd_setup_sighandler_stackframe->ctx;
+  _setjmp();
+  return scp;
+}
diff --git a/gcc/tree-ssa-uninit.cc b/gcc/tree-ssa-uninit.cc
index f42f76cd5c6..9a7c7d12dd8 100644
--- a/gcc/tree-ssa-uninit.cc
+++ b/gcc/tree-ssa-uninit.cc
@@ -204,14 +204,29 @@ warn_uninit (opt_code opt, tree t, tree var, gimple 
*context,
 {
   var_def_stmt = SSA_NAME_DEF_STMT (t);
 
-  if (is_gimple_assign (var_def_stmt)
- && gimple_assign_rhs_code (var_def_stmt) == COMPLEX_EXPR)
+  if (gassign *ass = dyn_cast <gassign *> (var_def_stmt))
{
- tree v = gimple_assign_rhs1 (var_def_stmt);
- if (TREE_CODE (v) == SSA_NAME
- && has_undefined_value_p (v)
- && zerop (gimple_assign_rhs2 (var_def_stmt)))
-   var = SSA_NAME_VAR (v);
+ switch (gimple_assign_rhs_code (var_def_stmt))
+   {
+   case COMPLEX_EXPR:
+ {
+   tree v = gimple_assign_rhs1 (ass);
+   if (TREE_CODE (v) == SSA_NAME
+   && has_undefined_value_p (v)
+   && zerop (gimple_assign_rhs2 (ass)))
+ var = SSA_NAME_VAR (v);
+   break;
+ }
+   case SSA_NAME:
+ {
+   tree v = gimple_assign_rhs1 (ass);
+   if (TREE_CODE (v) == SSA_NAME
+   && SSA_NAME_VAR (v))
+ var = SSA_NAME_VAR (v);
+   break;
+ }
+   default:;
+   }
}
 
   if (gimple_call_internal_p (var_def_stmt, IFN_DEFERRED_INIT))
@@ -1229,6 +1244,18 @@ find_uninit_use (gphi *phi, unsigned uninit_opnds, int 
*bb_to_rpo)
   if (is_gimple_debug (use_stmt))
continue;
 
+  /* Look through a single level of SSA name copies.  This is
+important for copies involving abnormals which we can't always
+propagate out but which result in spurious unguarded uses.  */
+  use_operand_p use2_p;
+  gimple *use2_stmt;
+  if (gimple_assign_ssa_name_copy_p (use_stmt)
+ && single_imm_use (gimple_assign_lhs (use_stmt), &use2_p, &use2_stmt))
+   {
+ use_p = use2_p;
+ use_stmt = use2_stmt;
+   }
+
   if (gphi *use_phi = dyn_cast <gphi *> (use_stmt))
{
  unsigned idx = PHI_ARG_INDEX_FROM_USE (use_p);
@@ -1262,9 +1289,9 @@ find_uninit_use (gphi *phi, unsigned uninit_opnds, int 
*bb_to_rpo)
   e->src->index, e->dest->index);
  print_gimple_stmt (dump_file, use_stmt, 0);
}
- /* Found a phi use that is not guarded, mark the phi_result as
+ /* Found a phi use that is not guarded, mark the use as
 possibly undefined.  */
- possibly_undefined_names->add (phi_result);
+ possibly_undefined_names->add (USE_FROM_PTR (use_p));
}
   else
cands.safe_push (use_stmt);
@@ -1318,8 +1345,6 @@ warn_uninitialized_phi (gphi *phi, unsigned uninit_opnds, 
int *bb_to_rpo)
 
   unsigned phiarg_index = MASK_FIRST_SET_BIT (uninit_opnds);
   tree uninit_op = gimple_phi_arg_def (phi, phiarg_index);
-  if (

Re: [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)

2023-12-08 Thread Jakub Jelinek
On Tue, Oct 31, 2023 at 07:59:25AM +, Richard Biener wrote:
> On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:
> 
> > This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the
> > arguments and return types have been transformed into vector types.  It also
> > constructs the adjuments and retval modifications after this call, allowing
> > targets to alter the types of the arguments and return of the clone prior to
> > the modifications to the function definition.
> > 
> > Is this OK?
> 
> OK (I was hoping for Jakub to have a look).

Sorry for the delay, no objections from me there.

Jakub



[PATCH v2] libgcc: aarch64: Add SME runtime support

2023-12-08 Thread Szabolcs Nagy
The call ABI for SME (Scalable Matrix Extension) requires a number of
helper routines which are added to libgcc so they are tied to the
compiler version instead of the libc version. See
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines

The routines are in shared libgcc and static libgcc eh, even though
they are not related to exception handling.  This is to avoid linking
a copy of the routines into dynamic linked binaries, because TPIDR2_EL0
block can be extended in the future which is better to handle in a
single place per process.

The support routines have to decide if SME is accessible or not. Linux
tells userspace if SME is accessible via AT_HWCAP2, otherwise a new
__aarch64_sme_accessible symbol was introduced that a libc can define.
Due to libgcc and libc build order, the symbol availability cannot be
checked so for __aarch64_sme_accessible an unistd.h feature test macro
is used while such detection mechanism is not available for __getauxval
so we rely on configure checks based on the target triplet.

Asm helper code is added to make writing the routines easier.

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Add sources to the build.
* config/aarch64/__aarch64_have_sme.c: New file.
* config/aarch64/__arm_sme_state.S: New file.
* config/aarch64/__arm_tpidr2_restore.S: New file.
* config/aarch64/__arm_tpidr2_save.S: New file.
* config/aarch64/__arm_za_disable.S: New file.
* config/aarch64/aarch64-asm.h: New file.
* config/aarch64/libgcc-sme.ver: New file.
---
v2:
- do not include unistd.h when inhibit_libc is set.
- use msr tpidr2_el0,xzr in __arm_za_disable.

 libgcc/config/aarch64/__aarch64_have_sme.c   |  75 ++
 libgcc/config/aarch64/__arm_sme_state.S  |  55 ++
 libgcc/config/aarch64/__arm_tpidr2_restore.S |  89 
 libgcc/config/aarch64/__arm_tpidr2_save.S| 101 +++
 libgcc/config/aarch64/__arm_za_disable.S |  65 
 libgcc/config/aarch64/aarch64-asm.h  |  98 ++
 libgcc/config/aarch64/libgcc-sme.ver |  24 +
 libgcc/config/aarch64/t-aarch64  |  10 ++
 8 files changed, 517 insertions(+)
 create mode 100644 libgcc/config/aarch64/__aarch64_have_sme.c
 create mode 100644 libgcc/config/aarch64/__arm_sme_state.S
 create mode 100644 libgcc/config/aarch64/__arm_tpidr2_restore.S
 create mode 100644 libgcc/config/aarch64/__arm_tpidr2_save.S
 create mode 100644 libgcc/config/aarch64/__arm_za_disable.S
 create mode 100644 libgcc/config/aarch64/aarch64-asm.h
 create mode 100644 libgcc/config/aarch64/libgcc-sme.ver

diff --git a/libgcc/config/aarch64/__aarch64_have_sme.c 
b/libgcc/config/aarch64/__aarch64_have_sme.c
new file mode 100644
index 000..5e649246270
--- /dev/null
+++ b/libgcc/config/aarch64/__aarch64_have_sme.c
@@ -0,0 +1,75 @@
+/* Initializer for SME support.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "auto-target.h"
+
+#ifndef inhibit_libc
+/* For libc feature test macros.  */
+# include 
+#endif
+
+#if __ARM_FEATURE_SME
+/* Avoid runtime SME detection if libgcc is built with SME.  */
+# define HAVE_SME_CONST const
+# define HAVE_SME_VALUE 1
+#elif HAVE___GETAUXVAL
+/* SME access detection on Linux.  */
+# define HAVE_SME_CONST
+# define HAVE_SME_VALUE 0
+# define HAVE_SME_CTOR sme_accessible ()
+
+# define AT_HWCAP2 26
# define HWCAP2_SME (1 << 23)
+unsigned long int __getauxval (unsigned long int);
+
+static _Bool
+sme_accessible (void)
+{
+  unsigned long hwcap2 = __getauxval (AT_HWCAP2);
+  return (hwcap2 & HWCAP2_SME) != 0;
+}
+#elif __LIBC___AARCH64_SME_ACCESSIBLE
+/* Alternative SME access detection.  */
+# define HAVE_SME_CONST
+# define HAVE_SME_VALUE 0
+# define HAVE_SME_CTOR __aarch64_sme_accessible ()
+_Bool __aarch64_sme_accessible (void);
+#else
+# define HAVE_SME_CONST const
+# define HAVE_SME_VALUE 0
+#endif
+
+/* Define the symbol gating SME support in libgcc.  */
+HAVE_SME_CON

Re: [PATCH v2] libgcc: aarch64: Add SME runtime support

2023-12-08 Thread Richard Sandiford
Szabolcs Nagy  writes:
> The call ABI for SME (Scalable Matrix Extension) requires a number of
> helper routines which are added to libgcc so they are tied to the
> compiler version instead of the libc version. See
> https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines
>
> The routines are in shared libgcc and static libgcc eh, even though
> they are not related to exception handling.  This is to avoid linking
> a copy of the routines into dynamic linked binaries, because TPIDR2_EL0
> block can be extended in the future which is better to handle in a
> single place per process.
>
> The support routines have to decide if SME is accessible or not. Linux
> tells userspace if SME is accessible via AT_HWCAP2, otherwise a new
> __aarch64_sme_accessible symbol was introduced that a libc can define.
> Due to libgcc and libc build order, the symbol availability cannot be
> checked so for __aarch64_sme_accessible an unistd.h feature test macro
> is used while such detection mechanism is not available for __getauxval
> so we rely on configure checks based on the target triplet.
>
> Asm helper code is added to make writing the routines easier.
>
> libgcc/ChangeLog:
>
>   * config/aarch64/t-aarch64: Add sources to the build.
>   * config/aarch64/__aarch64_have_sme.c: New file.
>   * config/aarch64/__arm_sme_state.S: New file.
>   * config/aarch64/__arm_tpidr2_restore.S: New file.
>   * config/aarch64/__arm_tpidr2_save.S: New file.
>   * config/aarch64/__arm_za_disable.S: New file.
>   * config/aarch64/aarch64-asm.h: New file.
>   * config/aarch64/libgcc-sme.ver: New file.
> ---
> v2:
> - do not include unistd.h when inhibit_libc is set.
> - use msr tpidr2_el0,xzr in __arm_za_disable.

LGTM, thanks.

>  libgcc/config/aarch64/__aarch64_have_sme.c   |  75 ++
>  libgcc/config/aarch64/__arm_sme_state.S  |  55 ++
>  libgcc/config/aarch64/__arm_tpidr2_restore.S |  89 
>  libgcc/config/aarch64/__arm_tpidr2_save.S| 101 +++
>  libgcc/config/aarch64/__arm_za_disable.S |  65 
>  libgcc/config/aarch64/aarch64-asm.h  |  98 ++
>  libgcc/config/aarch64/libgcc-sme.ver |  24 +
>  libgcc/config/aarch64/t-aarch64  |  10 ++
>  8 files changed, 517 insertions(+)
>  create mode 100644 libgcc/config/aarch64/__aarch64_have_sme.c
>  create mode 100644 libgcc/config/aarch64/__arm_sme_state.S
>  create mode 100644 libgcc/config/aarch64/__arm_tpidr2_restore.S
>  create mode 100644 libgcc/config/aarch64/__arm_tpidr2_save.S
>  create mode 100644 libgcc/config/aarch64/__arm_za_disable.S
>  create mode 100644 libgcc/config/aarch64/aarch64-asm.h
>  create mode 100644 libgcc/config/aarch64/libgcc-sme.ver
>
> diff --git a/libgcc/config/aarch64/__aarch64_have_sme.c 
> b/libgcc/config/aarch64/__aarch64_have_sme.c
> new file mode 100644
> index 000..5e649246270
> --- /dev/null
> +++ b/libgcc/config/aarch64/__aarch64_have_sme.c
> @@ -0,0 +1,75 @@
> +/* Initializer for SME support.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +#include "auto-target.h"
> +
> +#ifndef inhibit_libc
> +/* For libc feature test macros.  */
> +# include 
> +#endif
> +
> +#if __ARM_FEATURE_SME
> +/* Avoid runtime SME detection if libgcc is built with SME.  */
> +# define HAVE_SME_CONST const
> +# define HAVE_SME_VALUE 1
> +#elif HAVE___GETAUXVAL
> +/* SME access detection on Linux.  */
> +# define HAVE_SME_CONST
> +# define HAVE_SME_VALUE 0
> +# define HAVE_SME_CTOR sme_accessible ()
> +
> +# define AT_HWCAP2   26
> +# define HWCAP2_SME  (1 << 23)
> +unsigned long int __getauxval (unsigned long int);
> +
> +static _Bool
> +sme_accessible (void)
> +{
> +  unsigned long hwcap2 = __getauxval (AT_HWCAP2);
> +  return (hwcap2 & HWCAP2_SME) != 0;
> +}
> +#elif __LIBC___AARCH64_SME_ACCESSIBLE
> +/* Alternative SME access detection.  */
> +# define HAVE_SME_CONST
> +# define HAVE_SME_VALU

Re: [PATCH] Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy [PR111503]

2023-12-08 Thread FX Coudert
Hi Harald,

> here's another fix for the CONTIGUOUS attribute: NULL() should
> derive its characteristics from its MOLD argument; otherwise it is
> "determined by the entity with which the reference is associated".
> (F2018:16.9.144).

Looking good to me, but leave 48 hours for someone else to object if they want.

Best,
FX

Re: [Patch] OpenMP: Add C++ support for 'omp allocate' with stack variables

2023-12-08 Thread Jakub Jelinek
On Fri, Oct 20, 2023 at 06:49:58PM +0200, Tobias Burnus wrote:
> +  if (!processing_template_decl)
> +   finish_omp_allocate (true, OMP_CLAUSE_LOCATION (nl), var);

The above should be called even if processing_template_decl, see below.
And pass DECL_ATTRIBUTES (var) to it (also see below).

> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7801,6 +7801,10 @@ extern tree finish_omp_for 
> (location_t, enum tree_code,
>tree, tree, tree, tree, tree,
>tree, tree, vec *, tree);
>  extern tree finish_omp_for_block (tree, tree);
> +extern void finish_omp_allocate  (bool, location_t, tree,
> +  tree = NULL_TREE,
> +  tsubst_flags_t = 
> tf_warning_or_error,
> +  tree = NULL_TREE);

For a function called in 2 spots adding default arguments is an overkill,
but see below.

> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -41864,6 +41864,67 @@ cp_parser_omp_structured_block (cp_parser *parser, 
> bool *if_p)
>return finish_omp_structured_block (stmt);
>  }
>  

Better also add a comment about this structure.

> +struct cp_omp_loc_tree
> +{
> +  location_t loc;
> +  tree var;
> +};
> +
> +/* Check whether the expression used in the allocator clause is declared or
> +   modified between the variable declaration and its allocate directive.  */
> +static tree
> +cp_check_omp_allocate_allocator_r (tree *tp, int *, void *data)
> +{
> +  tree var = ((struct cp_omp_loc_tree *) data)->var;
> +  location_t loc = ((struct cp_omp_loc_tree *) data)->loc;
> +  tree v = NULL_TREE;
> +  if (TREE_CODE (*tp) == VAR_DECL)

VAR_P ?

> +for (v = current_binding_level->names; v; v = TREE_CHAIN (v))
> +  if (v == var)
> + break;

> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 210c6cb9e4d..10603f4c39f 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -18379,6 +18379,10 @@ tsubst_stmt (tree t, tree args, tsubst_flags_t 
> complain, tree in_decl)
>  
>   cp_finish_decl (decl, init, const_init, asmspec_tree, 0,
>   decomp);
> + 
> + if (flag_openmp && VAR_P (decl))
> +   finish_omp_allocate (false, DECL_SOURCE_LOCATION (decl),
> +decl, args, complain, in_decl);

The normal C++ FE way of doing stuff is perform all the substitutions
here (in tsubst_stmt, tsubst_expr and functions it calls) and then call
the various semantics.cc etc. finalizers, with already tsubsted arguments.
So, this would be the first time to do it differently.
IMHO better lookup_attribute here, if found tsubst what is needed and
only then call finish_omp_allocate.  Ideally when you've done the work
already pass the attr to the function as well.

>   if (ndecl != error_mark_node)
> cp_finish_decomp (ndecl, decomp);
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index dc3c11461fb..30c70c0b13e 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -10987,6 +10987,86 @@ finish_omp_for_block (tree bind, tree omp_for)
>return bind;
>  }
>  

Please add a function comment.

> +void
> +finish_omp_allocate (bool in_parsing, location_t loc, tree decl, tree args,
> +  tsubst_flags_t complain, tree in_decl)

As mentioned above, please drop the in_parsing, args, complain and in_decl
arguments and add attr.

> +{
> +  location_t loc2;
> +  tree attr = lookup_attribute ("omp allocate", DECL_ATTRIBUTES (decl));
> +  if (attr == NULL_TREE)
> +return;
> +
> +  tree allocator = TREE_PURPOSE (TREE_VALUE (attr));
> +  tree alignment = TREE_VALUE (TREE_VALUE (attr));
> +
> +  if (alignment == error_mark_node)
> +TREE_VALUE (TREE_VALUE (attr)) = NULL_TREE;
> +  else if (alignment)
> +{
> +  location_t loc2 = EXPR_LOCATION (alignment);
> +  if (!in_parsing)
> + alignment = tsubst_expr (alignment, args, complain, in_decl);
> +  alignment = fold_non_dependent_expr (alignment);
> +
> +  if (TREE_CODE (alignment) != INTEGER_CST
> +   || !INTEGRAL_TYPE_P (TREE_TYPE (alignment))

Please see e.g. the r13-8124 and r14-6193 changes.  Unless we have (possibly
violating standard?) hard restriction like we have on the collapse and
ordered clauses where we want the argument to be INTEGER_CST with the right
value already during parsing because parsing depends on it, we want to
handle 3 different cases, the directive appearing in non-template code,
in that case the diagnostics should be done right away, or in template code
where the arguments of the clauses etc. are not dependent (type nor value),
in that case we want the diagnostics to be also done at parsing time; not
strictly required by the standard, but QoI, get diagnostics of something
clearly incorrect

Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-08 Thread Jakub Jelinek
On Wed, Nov 08, 2023 at 05:58:10PM +0100, Tobias Burnus wrote:
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -11739,6 +11739,7 @@ builtin_fnspec (tree callee)
>   return ".cO ";
>/* Realloc serves both as allocation point and deallocation point.  */
>case BUILT_IN_REALLOC:
> +  case BUILT_IN_GOMP_REALLOC:
>   return ".Cw ";

I must say I've never been sure if one needs to specify those ". " for
integral arguments for which nothing is known; if they would need to be,
then also all the BUILT_IN_GOMP_* other cases would be wrong, but not just
those, also BUILT_IN_*_CHK (which have extra size_t argument) or
here BUILT_IN_REALLOC itself.  So, let's hope it is ok as is.

Otherwise, the middle-end changes look just fine to me, and for Fortran
FE I'm afraid you know it much more than I do.

Jakub



Re: [PATCH] libgccjit: Make is_int return false on vector types

2023-12-08 Thread David Malcolm
On Thu, 2023-12-07 at 20:09 -0500, Antoni Boucher wrote:
> Can I merge this on master even though we're not in phase 1 anymore?

Yes, assuming it passes the regression testsuite.

> 
> On Thu, 2023-12-07 at 20:07 -0500, David Malcolm wrote:
> > On Thu, 2023-12-07 at 17:32 -0500, Antoni Boucher wrote:
> > > Hi.
> > > This patch changes the function is_int to return false on vector
> > > types.
> > > Thanks for the review.
> > 
> > Thanks; looks good to me
> > 
> > Dave
> > 
> 



Re: [PATCH] driver: Fix memory leak.

2023-12-08 Thread Costas Argyris
Does the simple XDELETEVEC patch need any more work?

I think it just fixes a leak for the JIT case where driver::finalize is
called.

On Thu, 7 Dec 2023 at 16:04, Jakub Jelinek  wrote:

> On Thu, Dec 07, 2023 at 04:01:11PM +, Costas Argyris wrote:
> > Thanks for all the explanations.
> >
> > In that case I restrict this patch to just freeing the buffer from
> > within driver::finalize only (I think it should be XDELETEVEC
> > instead of XDELETE, no?).
>
> Both macros are exactly the same, but XDELETEVEC is probably better
> counterpart to XNEWVEC.
>
> Jakub
>
>


[committed] libgcc: Fix config.in

2023-12-08 Thread Szabolcs Nagy
It was updated incorrectly in

  commit dbbfb52b0e9c66ee9d05b8fd17c4f44655e48463
  Author: Szabolcs Nagy 
  CommitDate: 2023-12-08 11:29:06 +

libgcc: aarch64: Configure check for __getauxval

so regenerate it.

libgcc/ChangeLog:

* config.in: Regenerate.
---
 libgcc/config.in | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/config.in b/libgcc/config.in
index 441d4d39b95..8f7dd437b0e 100644
--- a/libgcc/config.in
+++ b/libgcc/config.in
@@ -16,9 +16,6 @@
 /* Define to 1 if the assembler supports .variant_pcs. */
 #undef HAVE_AS_VARIANT_PCS
 
-/* Define to 1 if __getauxval is available. */
-#undef HAVE___GETAUXVAL
-
 /* Define to 1 if the target assembler supports thread-local storage. */
 #undef HAVE_CC_TLS
 
@@ -67,6 +64,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_UNISTD_H
 
+/* Define to 1 if __getauxval is available. */
+#undef HAVE___GETAUXVAL
+
 /* Define to the address where bug reports for this package should be sent. */
 #undef PACKAGE_BUGREPORT
 
-- 
2.25.1



Re: [patch] OpenMP: Add uses_allocators support

2023-12-08 Thread Jakub Jelinek
On Mon, Nov 20, 2023 at 11:42:02AM +0100, Tobias Burnus wrote:
> 2023-11-19  Tobias Burnus  
>   Chung-Lin Tang 
> 
> gcc/ChangeLog:
> 
>   * builtin-types.def (BT_FN_VOID_PTRMODE):
>   (BT_FN_PTRMODE_PTRMODE_INT_PTR): Add.
>   * gimplify.cc (gimplify_bind_expr): Diagnose missing
>   uses_allocators clause.
>   (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses,
>   gimplify_omp_workshare): Handle uses_allocators.
>   * omp-builtins.def (BUILT_IN_OMP_INIT_ALLOCATOR,
>   BUILT_IN_OMP_DESTROY_ALLOCATOR): Add.
>   * omp-low.cc (scan_sharing_clauses):

Missing description.

> +static tree
> +c_parser_omp_clause_uses_allocators (c_parser *parser, tree list)
> +{
> +  location_t clause_loc = c_parser_peek_token (parser)->location;
> +  tree t = NULL_TREE, nl = list;
> +  matching_parens parens;
> +  if (!parens.require_open (parser))
> +return list;
> +
> +  tree memspace_expr = NULL_TREE;
> +  tree traits_var = NULL_TREE;
> +
> +  struct item_tok
> +  {
> +location_t loc;
> +tree id;
> +item_tok (void) : loc (UNKNOWN_LOCATION), id (NULL_TREE) {}
> +  };
> +  struct item { item_tok name, arg; };
> +  auto_vec<item> *modifiers = NULL, *allocators = NULL;
> +  auto_vec<item> *cur_list = new auto_vec<item> (4);

This is certainly the first time I've seen pointers to auto_vec;
normally one uses just vec in such cases, since auto_vec is typically
used on automatic variables to make sure the destruction is done.

But I think first parsing it all as a token soup without checking
anything, and only actually checking it in a second round, is something
we've never done before in exactly these situations.

The usual way would be to quickly peek at tokens to see if there is
: ahead and decide based on that.

See e.g. the c_parser_omp_clause_allocate function.

That has_modifiers check could be basically copied over with
the names of modifiers changed, or could be done in a loop, or
could be moved into a helper function which could be used
by c_parser_omp_clause_allocate and this function and perhaps
others, pass it the list of modifiers and have it return whether
there are modifiers or not.
c_parser_omp_clause_linear does this too (though, that has one of
the modifiers without arguments).

I'm afraid that if the parsing of every clause is done so significantly
differently it will be a maintenance nightmare.

> @@ -23648,7 +23861,8 @@ c_parser_omp_target_exit_data (location_t loc, 
> c_parser *parser,
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION) \
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_THREAD_LIMIT) \
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR)\
> - | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR))
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR)\

Space before \ please (and also in the line above).

> +   if (strcmp (IDENTIFIER_POINTER (DECL_NAME (t)),
> +   "omp_null_allocator") == 0)
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + "% cannot be used in "
> + "% clause");
> +   break;
> + }
> +
> +   if (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c)
> +   || OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c))
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + "modifiers cannot be used with pre-defined "
> + "allocators");
> +   break;
> + }
> + }
> +   t = OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c);
> +   if (t != NULL_TREE
> +   && (TREE_CODE (t) != CONST_DECL
> +   || TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE
> +   || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (TREE_TYPE 
> (t))),
> +  "omp_memspace_handle_t") != 0))
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c), "memspace modifier must be "

Maybe % ?

> + "constant enum of % type");
> +   remove = true;
> +   break;
> + }
> +   t = OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c);
> +   if (t != NULL_TREE)
> + {
> +   bool type_err = false;
> +
> +   if (TREE_CODE (TREE_TYPE (t)) != ARRAY_TYPE
> +   || DECL_SIZE (t) == NULL_TREE)
> + type_err = true;
> +   else
> + {
> +   tree elem_t = TREE_TYPE (TREE_TYPE (t));
> +   if (TREE_CODE (elem_t) != RECORD_TYPE
> +   || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (elem_t)),
> +  "omp_alloctrait_t") != 0
> +   || !TYPE_READONLY (elem_t))
> + type_err = true;
> + }
> +   if (type_err)
> + {
> +   if (TREE_CODE (t) != ERROR_MARK)

t != error_mark_node

> + error_at (OMP_CLAUSE_LOCATION (c), "traits array %qE must "
> +  

Re: [PATCH] driver: Fix memory leak.

2023-12-08 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 12:18:50PM +, Costas Argyris wrote:
> Does the simple XDELETEVEC patch need any more work?

Well, it needs to be actually tested and posted and committed.
I can take care of it in my next bootstraps.

Jakub



Re: [PATCH] RISC-V: Add vectorized strlen.

2023-12-08 Thread Robin Dapp
After Juzhe's vsetvl fix earlier this week this seems safe to push.
Going to do so later.

I tested on rv64gcv_zvl128b with -minline-strlen and didn't see
regressions apart from zbb-strlen-disabled-2.c which will always
fail with -minline-strlen because it expects -mno-inline-strlen.

Regards
 Robin



Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-08 Thread Robin Dapp
Similar to strlen, this now seems safe to push.  Will do so
later.

I tested on rv64gcv_zvl128b with -minline-strlen and didn't see
regressions.

Regards
 Robin


RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-08 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 8, 2023 10:28 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Fri, 8 Dec 2023, Tamar Christina wrote:
> 
> > > --param vect-partial-vector-usage=2 would, no?
> > >
> > I.. didn't even know it went to 2!
> >
> > > > In principle I suppose I could mask the individual stmts, that should 
> > > > handle
> the
> > > future case when
> > > > this is relaxed to support non-fixed length buffers?
> > >
> > > Well, it looks wrong - either put in an assert that we start with a
> > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > generate wrong code.
> > >
> > > That said, I think you need to apply the masking on the original
> > > stmts[], before reducing them, no?
> >
> > Yeah, I've done so now.  For simplicity I've just kept the final masking 
> > always as
> well
> > and just leave it up to the optimizers to drop it when it's superfluous.
> >
> > Simple testcase:
> >
> > #ifndef N
> > #define N 837
> > #endif
> > float vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(double x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >if (vect_a[i] > x)
> >  break;
> >vect_a[i] = x;
> >
> >  }
> >  return ret;
> > }
> >
> > Looks good now. After this one there's only one patch left, the dependency
> analysis.
> > I'm almost done with the cleanup/respin, but want to take the weekend to
> double check and will post it first thing Monday morning.
> >
> > Did you want to see the testsuite changes as well again? I've basically 
> > just added
> the right dg-requires-effective and add-options etc.
> 
> Yes please.
> 
> > Thanks for all the reviews!
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > (check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > lhs.
> > (vectorizable_early_exit): New.
> > (vect_analyze_stmt, vect_transform_stmt): Use it.
> > (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> >
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index
> 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848
> ae12523576d29744d 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple
> *pattern_stmt,
> >if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> >  {
> >gcc_assert (!vectype
> > + || is_a <gcond *> (pattern_stmt)
> >   || (VECTOR_BOOLEAN_TYPE_P (vectype)
> >   == vect_use_mask_type_p (orig_stmt_info)));
> >STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info
> *vinfo,
> > true if bool VAR can and should be optimized that way.  Assume it 
> > shouldn't
> > in case it's a result of a comparison which can be directly vectorized 
> > into
> > a vector comparison.  Fills in STMTS with all stmts visited during the
> > -   walk.  */
> > +   walk.  If ANALYZE_ONLY then only analyze the booleans but do not perform
> any
> > +   codegen associated with the boolean condition.  */
> >
> >  static bool
> > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > +   bool analyze_only)
> >  {
> >tree rhs1;
> >enum tree_code rhs_code;
> > +  gassign *def_stmt = NULL;
> >
> >stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > -  if (!def_stmt_info)
> > +  if (!def_stmt_info && !analyze_only)
> >  return false;
> > +  else if (!def_stmt_info)
> > +/* If we're only analyzing we won't be codegen-ing the statements and
> > +   are only checking whether the types match.  In that case we can
> > +   accept loop invariant values.  */
> > +def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > +  else
> > +def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> >
> 
> Hmm, but we're visiting them then?  I wonder how you get along
> without doing adjustmens on the uses if you consider
> 
> _1 = a < b;
> _2 = c != d;
> _3 = _1 | _2;
> if (_3 != 0)
>   exit loop;
> 
> thus a combined condition like
> 
> if (a < b || c != d)
> 
> that we if-converted.  We need to recognize that _1, _2 and _3 have
> mask uses and thus possibly adjust them.
> 
> What bad happens if you drop 'analyze_only'?  We're not really
> rewriting anything there

Re: [Patch] OpenMP: Support acquires/release in 'omp require atomic_default_mem_order'

2023-12-08 Thread Jakub Jelinek
On Tue, Nov 28, 2023 at 12:28:05PM +0100, Tobias Burnus wrote:
> I stumbled over this omission when looking at Sandra's patch. It turned out 
> that this is
> a new OpenMP 5.2 feature - probably added to simplify/unify the syntax. I 
> guess the reason
> that release/acquire wasn't added before is that it cannot be universally be 
> used - read/write
> do only accept one of them.

I thought when this was discussed that it was meant to behave right (choose
some more appropriate memory model if one was not allowed), but reading 5.2
I think that is not what ended up in the spec, because [213:11-13] says that
atomic_default_mem_order is as if the argument appeared on any atomic
directive without explicit mem-order clause and atomic directive has the
[314:9-10] restrictions.

I'd bring this to omp-lang whether it was really meant that
#pragma omp requires atomic_default_mem_order (release)
int foo (int *p) {
  int t;
  #pragma omp atomic read
t = *p;
  return t;
}
and
#pragma omp requires atomic_default_mem_order (acquire)
void bar (int *p) {
  #pragma omp atomic write
*p = 0;
}
are meant to be invalid.

Another comment, atomic_default_mem_order arguments aren't handled
just in the requires parsing, but also in context selectors.
So, the additions would need to be reflected in
omp_check_context_selector and omp_context_selector_matches
as well.

Jakub



[PATCH 0/1] Detecting lifetime-dse issues via Valgrind [PR66487]

2023-12-08 Thread Alexander Monakov
I would like to propose Valgrind integration previously sent as RFC for trunk.

Arsen and Sam, since you commented on the RFC I wonder if you can have
a look at the proposed configure and documentation changes and let me
know if they look fine for you? For reference, gccinstall.info will say:

‘--enable-valgrind-interop’
 Provide wrappers for Valgrind client requests in libgcc, which are
 used for ‘-fvalgrind-annotations’.  Requires Valgrind header files
 for the target (in the build-time sysroot if building a
 cross-compiler).

and GCC manual will document the new option as:

 -fvalgrind-annotations
 Emit Valgrind client requests annotating object lifetime
 boundaries.  This allows detecting attempts to access fields of a
 C++ object after its destructor has completed (but storage was
 not deallocated yet), or to initialize it in advance from
 "operator new" rather than the constructor.

 This instrumentation relies on the presence of the
 "__gcc_vgmc_make_mem_undefined" function that wraps the
 corresponding Valgrind client request. It is provided by libgcc
 when it is configured with --enable-valgrind-interop.  Otherwise,
 you can implement it like this:

 #include 

 void
 __gcc_vgmc_make_mem_undefined (void *addr, size_t size)
 {
   VALGRIND_MAKE_MEM_UNDEFINED (addr, size);
 }

Changes since the RFC:

* Add documentation and tests.

* Drop 'emit-' from -fvalgrind-emit-annotations.

* Use --enable-valgrind-interop instead of overloading
  --enable-valgrind-annotations.

* Do not build the wrapper unless --enable-valgrind-interop is given and
  Valgrind headers are present.

* Clean up libgcc configure changes.
* Reword comments.

Daniil Frolov (1):
  object lifetime instrumentation for Valgrind [PR66487]

 gcc/Makefile.in   |   1 +
 gcc/builtins.def  |   3 +
 gcc/common.opt|   4 +
 gcc/doc/install.texi  |   5 +
 gcc/doc/invoke.texi   |  27 +
 gcc/gimple-valgrind-interop.cc| 112 ++
 gcc/passes.def|   1 +
 gcc/testsuite/g++.dg/valgrind-annotations-1.C |  22 
 gcc/testsuite/g++.dg/valgrind-annotations-2.C |  12 ++
 gcc/tree-pass.h   |   1 +
 libgcc/Makefile.in|   3 +
 libgcc/config.in  |   6 +
 libgcc/configure  |  22 +++-
 libgcc/configure.ac   |  15 ++-
 libgcc/libgcc2.h  |   2 +
 libgcc/valgrind-interop.c |  40 +++
 16 files changed, 274 insertions(+), 2 deletions(-)
 create mode 100644 gcc/gimple-valgrind-interop.cc
 create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-1.C
 create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-2.C
 create mode 100644 libgcc/valgrind-interop.c

-- 
2.39.2



[PATCH 1/1] object lifetime instrumentation for Valgrind [PR66487]

2023-12-08 Thread Alexander Monakov
From: Daniil Frolov 

PR 66487 is asking to provide sanitizer-like detection for C++ object
lifetime violations that are worked around with -fno-lifetime-dse or
-flifetime-dse=1 in Firefox, LLVM (PR 106943), OpenJade (PR 69534).

The discussion in the PR was centered around extending MSan, but MSan
was not ported to GCC (and requires rebuilding everything with
instrumentation).

Instead, allow Valgrind to see lifetime boundaries by emitting client
requests along *this = { CLOBBER }.  The client request marks the
"clobbered" memory as undefined for Valgrind; clobbering assignments
mark the beginning of ctor and end of dtor execution for C++ objects.
Hence, attempts to read object storage after the destructor, or
"pre-initialize" its fields prior to the constructor will be caught.

Valgrind client requests are offered as macros that emit inline asm.
For use in code generation, let's wrap them as libgcc builtins.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-valgrind-interop.o.
* builtins.def (BUILT_IN_VALGRIND_MAKE_UNDEFINED): New.
* common.opt (-fvalgrind-annotations): New option.
* doc/install.texi (--enable-valgrind-interop): Document.
* doc/invoke.texi (-fvalgrind-annotations): Document.
* passes.def (pass_instrument_valgrind): Add.
* tree-pass.h (make_pass_instrument_valgrind): Declare.
* gimple-valgrind-interop.cc: New file.

libgcc/ChangeLog:

* Makefile.in (LIB2ADD): Add valgrind-interop.c.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac (--enable-valgrind-interop): New flag.
* libgcc2.h (__gcc_vgmc_make_mem_undefined): Declare.
* valgrind-interop.c: New file.

gcc/testsuite/ChangeLog:

* g++.dg/valgrind-annotations-1.C: New test.
* g++.dg/valgrind-annotations-2.C: New test.

Co-authored-by: Alexander Monakov 
---
 gcc/Makefile.in   |   1 +
 gcc/builtins.def  |   3 +
 gcc/common.opt|   4 +
 gcc/doc/install.texi  |   5 +
 gcc/doc/invoke.texi   |  27 +
 gcc/gimple-valgrind-interop.cc| 112 ++
 gcc/passes.def|   1 +
 gcc/testsuite/g++.dg/valgrind-annotations-1.C |  22 
 gcc/testsuite/g++.dg/valgrind-annotations-2.C |  12 ++
 gcc/tree-pass.h   |   1 +
 libgcc/Makefile.in|   3 +
 libgcc/config.in  |   6 +
 libgcc/configure  |  22 +++-
 libgcc/configure.ac   |  15 ++-
 libgcc/libgcc2.h  |   2 +
 libgcc/valgrind-interop.c |  40 +++
 16 files changed, 274 insertions(+), 2 deletions(-)
 create mode 100644 gcc/gimple-valgrind-interop.cc
 create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-1.C
 create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-2.C
 create mode 100644 libgcc/valgrind-interop.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 68410a86af..4db18387c1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1506,6 +1506,7 @@ OBJS = \
gimple-ssa-warn-restrict.o \
gimple-streamer-in.o \
gimple-streamer-out.o \
+   gimple-valgrind-interop.o \
gimple-walk.o \
gimple-warn-recursion.o \
gimplify.o \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index f03df32f98..b05e20e062 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1194,6 +1194,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, 
ATTR_NOTHROW_LEAF_LIST)
 /* Control Flow Redundancy hardening out-of-line checker.  */
 DEF_BUILTIN_STUB (BUILT_IN___HARDCFR_CHECK, "__builtin___hardcfr_check")
 
+/* Wrappers for Valgrind client requests.  */
+DEF_EXT_LIB_BUILTIN (BUILT_IN_VALGRIND_MAKE_UNDEFINED, 
"__gcc_vgmc_make_mem_undefined", BT_FN_VOID_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
+
 /* Synchronization Primitives.  */
 #include "sync-builtins.def"
 
diff --git a/gcc/common.opt b/gcc/common.opt
index f070aff8cb..b53565fc1a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3372,6 +3372,10 @@ Enum(auto_init_type) String(pattern) 
Value(AUTO_INIT_PATTERN)
 EnumValue
 Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO)
 
+fvalgrind-annotations
+Common Var(flag_valgrind_annotations) Optimization
+Annotate lifetime boundaries with Valgrind client requests.
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable).  This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index c1128d9274..aaf0213bbf 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1567,6 +1567,11 @@ Disable TM clone registry in libgcc. It is enabled in 
libgcc by default.
 This option helps to reduce code siz

RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-08 Thread Richard Biener
On Fri, 8 Dec 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, December 8, 2023 10:28 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > codegen of exit code
> > 
> > On Fri, 8 Dec 2023, Tamar Christina wrote:
> > 
> > > > --param vect-partial-vector-usage=2 would, no?
> > > >
> > > I.. didn't even know it went to 2!
> > >
> > > > > In principle I suppose I could mask the individual stmts, that should 
> > > > > handle
> > the
> > > > future case when
> > > > > this is relaxed to support non-fixed length buffers?
> > > >
> > > > Well, it looks wrong - either put in an assert that we start with a
> > > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > > generate wrong code.
> > > >
> > > > That said, I think you need to apply the masking on the original
> > > > stmts[], before reducing them, no?
> > >
> > > Yeah, I've done so now.  For simplicity I've just kept the final masking 
> > > always as
> > well
> > > and just leave it up to the optimizers to drop it when it's superfluous.
> > >
> > > Simple testcase:
> > >
> > > #ifndef N
> > > #define N 837
> > > #endif
> > > float vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(double x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >if (vect_a[i] > x)
> > >  break;
> > >vect_a[i] = x;
> > >
> > >  }
> > >  return ret;
> > > }
> > >
> > > Looks good now. After this one there's only one patch left, the dependency
> > analysis.
> > > I'm almost done with the cleanup/respin, but want to take the weekend to
> > double check and will post it first thing Monday morning.
> > >
> > > Did you want to see the testsuite changes as well again? I've basically 
> > > just added
> > the right dg-requires-effective and add-options etc.
> > 
> > Yes please.
> > 
> > > Thanks for all the reviews!
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > >   (check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > >   vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > >   * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > >   lhs.
> > >   (vectorizable_early_exit): New.
> > >   (vect_analyze_stmt, vect_transform_stmt): Use it.
> > >   (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> > >
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index
> > 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848
> > ae12523576d29744d 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple
> > *pattern_stmt,
> > >if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> > >  {
> > >gcc_assert (!vectype
> > > +   || is_a <gcond *> (pattern_stmt)
> > > || (VECTOR_BOOLEAN_TYPE_P (vectype)
> > > == vect_use_mask_type_p (orig_stmt_info)));
> > >STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info
> > *vinfo,
> > > true if bool VAR can and should be optimized that way.  Assume it 
> > > shouldn't
> > > in case it's a result of a comparison which can be directly 
> > > vectorized into
> > > a vector comparison.  Fills in STMTS with all stmts visited during the
> > > -   walk.  */
> > > +   walk.  If ANALYZE_ONLY then only analyze the booleans but do not 
> > > perform
> > any
> > > +   codegen associated with the boolean condition.  */
> > >
> > >  static bool
> > > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > > + bool analyze_only)
> > >  {
> > >tree rhs1;
> > >enum tree_code rhs_code;
> > > +  gassign *def_stmt = NULL;
> > >
> > >stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > > -  if (!def_stmt_info)
> > > +  if (!def_stmt_info && !analyze_only)
> > >  return false;
> > > +  else if (!def_stmt_info)
> > > +/* If we're only analyzing we won't be codegen-ing the statements and
> > > +   are only checking whether the types match.  In that case we can
> > > +   accept loop invariant values.  */
> > > +def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > > +  else
> > > +def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > >
> > 
> > Hmm, but we're visiting them then?  I wonder how you get along
> > without doing adjustmens on the uses if you consider
> > 
> > _1 = a < b;
> > _2 = c != d;
> > _3 = _1 | _2

Re: [PATCH 0/1] Detecting lifetime-dse issues via Valgrind [PR66487]

2023-12-08 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 04:49:49PM +0300, Alexander Monakov wrote:
> I would like to propose Valgrind integration previously sent as RFC for trunk.
> 
> Arsen and Sam, since you commented on the RFC I wonder if you can have
> a look at the proposed configure and documentation changes and let me
> know if they look fine for you? For reference, gccinstall.info will say:

Does the VALGRIND_MAKE_MEM_UNDEFINED macro ever change on arches once
implemented there?  Wouldn't this be better done by emitting the sequence
inline?  Even if it is done in libgcc, it is part of the ABI.

So, basically add a new optab, valgrind_request, where each target would
define_insn whatever is needed (it will need to be a single pattern, it
can't be split among multiple) and sorry on -fvalgrind-annotations if the
optab is not defined.

The advantage would be that neither --enable-valgrind-interop nor building
against valgrind headers is needed.

In your version, did the new function go just to libgcc.a or to
libgcc_s.so.1?  Having a function in there or not dependent on
--enable-valgrind-interop would turn it into an ABI configure option.

Jakub



Re: [PATCH] libgcov: Call __builtin_fork instead of fork

2023-12-08 Thread Jakub Jelinek
On Sat, Dec 02, 2023 at 01:43:22PM +0100, Florian Weimer wrote:
> Some targets do not provide a prototype for fork, and compilation now
> fails with an implicit-function-declaration error.
> 
> libgcc/
> 
>   * libgcov-interface.c (__gcov_fork):

Description missing (Use __builtin_fork instead of fork.).

Ok with that change.
> 
> Generated code is the same on x86_64-linux-gnu.  Okay for trunk?
> 
> Thanks,
> Florian
> ---
>  libgcc/libgcov-interface.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libgcc/libgcov-interface.c b/libgcc/libgcov-interface.c
> index b2ee9308641..d166e98510d 100644
> --- a/libgcc/libgcov-interface.c
> +++ b/libgcc/libgcov-interface.c
> @@ -182,7 +182,7 @@ pid_t
>  __gcov_fork (void)
>  {
>pid_t pid;
> -  pid = fork ();
> +  pid = __builtin_fork ();
>if (pid == 0)
>  {
>__GTHREAD_MUTEX_INIT_FUNCTION (&__gcov_mx);
> 
> base-commit: 193ef02a7f4f3e5349ad9cf8d3d4df466a99b677

Jakub



Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-08 Thread Thomas Schwinge
Hi!

On 2023-12-07T13:43:17+, Andrew Stubbs  wrote:
> @Thomas, there are questions for you below

It's been a while that I've been working on this; I'll try to produce
some coherent answers now.

> On 22/11/2023 17:07, Tobias Burnus wrote:
>> Let's start with the patch itself:
>>> --- a/libgomp/target.c
>>> +++ b/libgomp/target.c
>>> ...
>>> +static struct gomp_device_descr *
>>> +get_device_for_page_locked (void)
>>> +{
>>> + gomp_debug (0, "%s\n",
>>> + __FUNCTION__);
>>> +
>>> + struct gomp_device_descr *device;
>>> +#ifdef HAVE_SYNC_BUILTINS
>>> + device
>>> +   = __atomic_load_n (&device_for_page_locked, MEMMODEL_RELAXED);
>>> + if (device == (void *) -1)
>>> +   {
>>> + gomp_debug (0, " init\n");
>>> +
>>> + gomp_init_targets_once ();
>>> +
>>> + device = NULL;
>>> + for (int i = 0; i < num_devices; ++i)
>>
>> Given that this function just sets a single variable based on whether the
>> page_locked_host_alloc_func function pointer exists, wouldn't it be much
>> simpler to just do all this handling in   gomp_target_init  ?
>
> @Thomas, care to comment on this?

From what I remember, we cannot assume that 'gomp_target_init' has
already been done when we get here; therefore 'gomp_init_targets_once' is
being called here.  We may get to 'get_device_for_page_locked' via
host-side OpenMP, in code that doesn't contain any OpenMP 'target'
offloading things.  Therefore, this was (a) necessary to make that work,
and (b) did seem to be a useful abstraction to me.

>>> + for (int i = 0; i < num_devices; ++i)
>>> ...
>>> +/* We consider only the first device of potentially several of the
>>> +   same type as this functionality is not specific to an individual
>>> +   offloading device, but instead relates to the host-side
>>> +   implementation of the respective offloading implementation. */
>>> +if (devices[i].target_id != 0)
>>> +  continue;
>>> +
>>> +if (!devices[i].page_locked_host_alloc_func)
>>> +  continue;
>>> ...
>>> +if (device)
>>> +  gomp_fatal ("Unclear how %s and %s libgomp plugins may"
>>> +  " simultaneously provide functionality"
>>> +  " for page-locked memory",
>>> +  device->name, devices[i].name);
>>> +else
>>> +  device = &devices[i];
>>
>> I find this a bit inconsistent: If - let's say - GCN does not
>> provide its
>> own pinning, the code assumes that CUDA pinning is just fine.  However,
>> if both
>> support it, CUDA pinning suddenly is not fine for GCN.
>
> I think it means that we need to revisit this code if that situation
> ever occurs. Again, @Thomas?

That's correct.  As you know, I don't like doing half-baked things.  ;-)
However, this did seem like a useful stepping-stone to me, to get such a
thing implemented at all; we do understand that this won't handle all
(future) cases, thus the 'gomp_fatal' to catch that loudly.

Once we are in the situation where we have multiple ways of allocating
large amounts of pinned memory, we'll have to see how to deal with that.
(May, of course, already now work out how conceptually that should be
done, possibly via OpenMP committee/specification work, if necessary?)
(As for the future implementation, maybe *allocate* via one device, and
then *register* the allocation with the other devices.)

>> Additionally, all wording suggests that it does not matter for CUDA for
>> which
>> device access we want to optimize the pinning. But the code above also
>> fails if
>> I have a system with two Nvidia cards.

Why/how does the code fail in that case?  Assuming I understood the
question correctly, the 'if (devices[i].target_id != 0) continue;' is
meant to handle that case.

>> From the wording, it sounds as
>> if just
>> checking whether the  device->type  is different would do.

Maybe, but I'm not sure I follow what exactly you mean.

>> But all in all, I wonder whether it wouldn't be much simpler to state
>> something
>> like the following (where applicable):
>>
>> The first device that provides pinning support is used; the assumption is
>> that
>> all other devices

"of the same kind" or also "of different kinds"?

>> and the host can access this memory without measurable
>> performance penalty compared to a normal page lock and that having multiple
>> device types or host/device NUMA aware pinning support in the plugin is not
>> available.

If I understood you correctly, that, however, is not correct: if you
(hypothetically) allocate pinned memory via GCN (or even the small amount
you get via the host), then yes, a nvptx device will be able to access
it, but it won't see the performance gains that you'd get if you had
allocated via nvptx.  (You'll need to register existing memory regions
with the nvptx device/CUDA, if I offhand remember correctly, which is
subject to later work.)


Hopefully that did help?


Grüße
 Thomas


>> NOTE: For OpenMP 6.0's OMP_AVAILABLE_DEVICES environment variable,
>> device-set
>> memory spa

[patch] OpenMP: Handle same-directive mapped vars with pointer predefined firstprivate [PR110639]

2023-12-08 Thread Tobias Burnus

This patch fixes the issue:

  int a[100];
  p = &a[0];

  #pragma omp target map(a)
p[0] = p[99] = 3;

where 'p' is predetermined firstprivate, i.e. it is firstprivatized
but its address gets updated to the device address of 'a' as there is
associated storage for the value of 'p', i.e. its pointee.


[This is a C/C++-only feature that cannot be replicated by using a single 
clause.
('target data map(a) use_device_ptr(p)' + 'target is_device_ptr(p)' would do
so in two steps. - or 'p2 = omp_get_mapped_ptr(p, devnum)' + 'target 
is_device_ptr(p2)'.)]

Before this only worked when that storage was mapped before and not on the same
directive.

The gimplify_scan_omp_clauses change was done when I saw some runtime fails; I 
think
those were due to a bug in libgomp (now fixed) and not due to having two pointer
privatisations in a now different order. Still, they at least prevent mapping
'this' multiple times when 'this' is not 'this' but __closure->this which is at 
least
a missed optimization.  And also for libgomp.c++/pr108286.C which has a normal
'this' and map(tofrom:*this [len: 16]).


Build and tested without offloading and with nvptx offloading.
Comments, remarks, suggestions?

* * *

(I wonder whether our current approach of removing explicit MAP if its
DECL is unused is the right one if there is any GOVD_MAP_0LEN_ARRAY around
- or even any OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION.)

(See new libgomp.c-c++-common/target-implicit-map-6.c; BTW, I tried:
before '(void) a;' but that only worked with C and not with C++.)

* * *

The other issue in the PR (still to be done) is for code like:

  int a[100];
  p = &a[0];

  #pragma omp target map(a[20:20])  // Map only a[20] to a[40], but p points to &a[0]
p[20] = p[30] = 3;

where 'p' points to the base address of 'a', but p[0] == a[0] is not actually
mapped. As we currently do not keep track of the base pointer, this won't work.
I have not (yet) explored how to best implement this.

* * *

OpenMP Spec:

The first feature is not new, but I have not checked the wording in 4.5 or 5.0;
it might be that older versions only required it to work for storage mapped
before the current target directive. But at least TR12 is very explicit in
permitting it and the (nonpublic) issue which led to the 5.1 change also uses
this. (See PR.)
(The second feature is definitely new in OpenMP 5.1.)

TR12 states in "14.8 target Construct" [379:8-10]:

"[C/C++] If a list item in a map clause has a base pointer that is predetermined
firstprivate (see Section 6.1.1) and on entry to the target region the list item
is mapped, the firstprivate pointer is updated via corresponding base pointer
initialization."

(For OpenMP 5.1, read its Section 2.21.7.2.)

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP: Handle same-directive mapped vars with pointer predefined firstprivate [PR110639]

Predefined 'firstprivate' for pointer variables firstprivatizes the pointer
but if it is associated with a mapped target, its address is updated to the
corresponding target. (If not, the host value remains.)

This commit extends this handling to also update the pointer address for
storage mapped on the same directive.

The 'gimplify_scan_omp_clauses' change avoids adding an additional
  map(alloc:this) (+ptr assignment)
when there is already a
  map(tofrom:*this) (+ptr assignment)
This shows up for libgomp.c++/pr108286.C and also when 'this' is
actually '__closure->this' (-> g++.dg/gomp/target-{this-{2,4},lambda-1}.C).

	PR middle-end/110639

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_adjust_omp_clauses_data): Add
	append_list.
	(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Add
	GOVD_MAP_0LEN_ARRAY clauses at the end.
	(gimplify_scan_omp_clauses): Mark also '*var' as found not only
	'var'.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_internal): Handle also variables
	mapped in the same directive for GOVD_MAP_0LEN_ARRAY.
	* testsuite/libgomp.c++/pr108286.C: Add gimple tree-scan test.
	* testsuite/libgomp.c-c++-common/target-implicit-map-6.c: New test.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-this-2.C: Remove 'this' pointer mapping already
	mapped via __closure->this.
* g++.dg/gomp/target-this-4.C: Likewise.
* g++.dg/gomp/target-lambda-1.C: Likewise. Move 'iptr' pointer
	mapping to the end in scan-tree-dump.

 gcc/gimplify.cc|  45 -
 gcc/testsuite/g++.dg/gomp/target-lambda-1.C|   4 +-
 gcc/testsuite/g++.dg/gomp/target-this-2.C  |   4 +-
 gcc/testsuite/g++.dg/gomp/target-this-4.C  |   6 +-
 libgomp/target.c   |  11 +-
 libgomp/testsuite/libgomp.c++/pr108286.C   |   4 +
 .../libgomp.c-c++-common/target-implicit-map-6.c  

Re: [patch] OpenMP: Handle same-directive mapped vars with pointer predefined firstprivate [PR110639]

2023-12-08 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 03:28:59PM +0100, Tobias Burnus wrote:
> This patch fixes the issue:
> 
>   int a[100];
>   p = &a[0];
> 
>   #pragma omp target map(a)
> p[0] = p[99] = 3;
> 
> where 'p' is predetermined firstprivate, i.e. it is firstprivatized
> but its address gets updated to the device address of 'a' as there is
> associated storage for the value of 'p', i.e. its pointee.

I think the above is invalid even in TR12.

> OpenMP Spec:
> 
> The first feature is not new, but I have not checked the wording in 4.5 or 
> 5.0;
> it might be that older versions only required it to work for storage mapped 
> before
> the current taget directive. But at least TR12 is very explicit in permitting 
> it
> and the (nonpublic) issue which lead to the 5.1 change also uses this. (See 
> PR.)
> (The second feature is definitely new in OpenMP 5.1.)
> 
> TR12 states in "14.8 target Construct" [379:8-10]:
> 
> "[C/C++] If a list item in a map clause has a base pointer that is 
> predetermined firstprivate
> (see Section 6.1.1) and on entry to the target region the list item is 
> mapped, the firstprivate
> pointer is updated via corresponding base pointer initialization."

The list item (a) in the above case doesn't have a base pointer, but base
array.  See the glossary.  So, the rule would be about something like
int *p = ...;
#pragma omp target map (p[20:100]) or similar, not about an array and an
unrelated pointer.

Jakub



Re: [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-12-08 Thread Szabolcs Nagy
The 11/29/2023 15:15, Richard Earnshaw wrote:
> On 13/11/2023 11:37, Victor Do Nascimento wrote:
> > +/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic,
> > +   bits[23:20].  The expected value is 0b0011.  Check that.  */
> > +#define HAS_LSE128() ({\
> > +  unsigned long val;   \
> > +  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (val));  \
> > +  (val & 0xf0) >= 0x30;\
> > +})
> > +
> 
> The pseudo-code for this register reads:
> 
> if PSTATE.EL == EL0 then
>   if IsFeatureImplemented(FEAT_IDST) then
> if EL2Enabled() && HCR_EL2.TGE == '1' then
>   AArch64.SystemAccessTrap(EL2, 0x18);
> else
>   AArch64.SystemAccessTrap(EL1, 0x18);
>   else
> UNDEFINED;
> ...
> 
> So this instruction may result in SIGILL if run on cores without FEAT_IDST.
> SystemAccessTrap just punts the problem up to the kernel or hypervisor as
> well.

yes, HWCAP_CPUID has to be checked to see if
linux traps and emulates the mrs for userspace.

> I think we need a hwcap bit to work this out, which is the preferred way on

yes, use hwcap instead of id reg (hwcap2 is
passed to aarch64 ifuncs or __getauxval works)

> Linux anyway.  Something like this? :) 
> https://lore.kernel.org/linux-arm-kernel/20231003124544.858804-2-joey.go...@arm.com/T/

note that there was no linux release since this
got added.

we can add the hwcap values tentatively, but
there is a risk of revert on the kernel side
(which means libatomic vs linux abi break) so
i would only commit the patch into gcc after
a linux release is tagged.
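
As a rough sketch of what the HWCAP-based check could look like (the
HWCAP2_LSE128 value and the helper name are assumptions taken from the
not-yet-released kernel patch and may need updating once a release is tagged):

```c
#include <stdbool.h>
#if defined (__linux__)
#include <sys/auxv.h>
#endif

/* Tentative: bit proposed by the kernel patch adding HWCAP2_LSE128;
   subject to change until it lands in a tagged Linux release.  */
#ifndef HWCAP2_LSE128
#define HWCAP2_LSE128 (1UL << 47)
#endif

/* Return true if the kernel reports FEAT_LSE128 via hwcaps, avoiding
   the mrs-based ID register probe (which may trap without FEAT_IDST).  */
static bool
has_lse128 (void)
{
#if defined (__linux__) && defined (__aarch64__)
  return (getauxval (AT_HWCAP2) & HWCAP2_LSE128) != 0;
#else
  return false;
#endif
}
```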


Re: [PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2023-12-08 Thread Szabolcs Nagy
The 11/13/2023 11:47, Victor Do Nascimento wrote:
> +/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic,
> +   bits[23:20].  The expected value is 0b0011.  Check that.  */
> +#define HAS_LRCPC3() ({  \
> +  unsigned long val; \
> +  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (val));\
> +  (val & 0xf0) >= 0x30;  \
> +})

same comment as for the lse128 patch: use hwcaps
(and wait for linux release before committing).


RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-08 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 8, 2023 2:00 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Fri, 8 Dec 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, December 8, 2023 10:28 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > > Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > > codegen of exit code
> > >
> > > On Fri, 8 Dec 2023, Tamar Christina wrote:
> > >
> > > > > --param vect-partial-vector-usage=2 would, no?
> > > > >
> > > > I.. didn't even know it went to 2!
> > > >
> > > > > > In principal I suppose I could mask the individual stmts, that 
> > > > > > should handle
> > > the
> > > > > future case when
> > > > > > This is relaxed to supposed non-fix length buffers?
> > > > >
> > > > > Well, it looks wrong - either put in an assert that we start with a
> > > > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > > > generate wrong code.
> > > > >
> > > > > That said, I think you need to apply the masking on the original
> > > > > stmts[], before reducing them, no?
> > > >
> > > > Yeah, I've done so now.  For simplicity I've just kept the final 
> > > > masking always
> as
> > > well
> > > > and just leave it up to the optimizers to drop it when it's superfluous.
> > > >
> > > > Simple testcase:
> > > >
> > > > #ifndef N
> > > > #define N 837
> > > > #endif
> > > > float vect_a[N];
> > > > unsigned vect_b[N];
> > > >
> > > > unsigned test4(double x)
> > > > {
> > > >  unsigned ret = 0;
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >if (vect_a[i] > x)
> > > >  break;
> > > >vect_a[i] = x;
> > > >
> > > >  }
> > > >  return ret;
> > > > }
> > > >
> > > > Looks good now. After this one there's only one patch left, the 
> > > > dependency
> > > analysis.
> > > > I'm almost done with the cleanup/respin, but want to take the weekend to
> > > double check and will post it first thing Monday morning.
> > > >
> > > > Did you want to see the testsuite changes as well again? I've basically 
> > > > just
> added
> > > the right dg-requires-effective and add-options etc.
> > >
> > > Yes please.
> > >
> > > > Thanks for all the reviews!
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-vect-patterns.cc (vect_init_pattern_stmt): Support 
> > > > gconds.
> > > > (check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > > > vect_recog_bool_pattern, sort_after_uid): Support gconds type 
> > > > analysis.
> > > > * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts 
> > > > without
> > > > lhs.
> > > > (vectorizable_early_exit): New.
> > > > (vect_analyze_stmt, vect_transform_stmt): Use it.
> > > > (vect_is_simple_use, vect_get_vector_types_for_stmt): Support 
> > > > gcond.
> > > >
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index
> > >
> 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848
> > > ae12523576d29744d 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple
> > > *pattern_stmt,
> > > >if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> > > >  {
> > > >gcc_assert (!vectype
> > > > + || is_a  (pattern_stmt)
> > > >   || (VECTOR_BOOLEAN_TYPE_P (vectype)
> > > >   == vect_use_mask_type_p (orig_stmt_info)));
> > > >STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > > > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info
> > > *vinfo,
> > > > true if bool VAR can and should be optimized that way.  Assume it 
> > > > shouldn't
> > > > in case it's a result of a comparison which can be directly 
> > > > vectorized into
> > > > a vector comparison.  Fills in STMTS with all stmts visited during 
> > > > the
> > > > -   walk.  */
> > > > +   walk.  if ANALYZE_ONLY then only analyze the booleans but do not 
> > > > perform
> > > any
> > > > +   codegen associated with the boolean condition.  */
> > > >
> > > >  static bool
> > > > -check_bool_pattern (tree var, vec_info *vinfo, hash_set 
> > > > &stmts)
> > > > +check_bool_pattern (tree var, vec_info *vinfo, hash_set 
> > > > &stmts,
> > > > +   bool analyze_only)
> > > >  {
> > > >tree rhs1;
> > > >enum tree_code rhs_code;
> > > > +  gassign *def_stmt = NULL;
> > > >
> > > >stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);

[PATCH v3] A new copy propagation and PHI elimination pass

2023-12-08 Thread Filip Kastl
> > Hi,
> > 
> > this is a patch that I submitted two months ago as an RFC. I added some 
> > polish
> > since.
> > 
> > It is a new lightweight pass that removes redundant PHI functions and as a
> > bonus does basic copy propagation. With Jan Hubička we measured that it is 
> > able
> > to remove usually more than 5% of all PHI functions when run among early 
> > passes
> > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > later optimized away but with this pass it is possible to remove them early
enough so that they don't get streamed when running LTO (and also potentially
> > inlined at multiple places). It is also able to remove some redundant PHIs
> > that otherwise would still be present during RTL expansion.
> > 
> > Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> > with and without this patch. These are the sizes of .debug_info and
> > .debug_loclists
> > 
> > .debug_info without patch 181694311
> > .debug_infowith patch 181692320
> > +0.0011% change
> > 
> > .debug_loclists without patch 47934753
> > .debug_loclistswith patch 47934966
> > -0.0004% change
> > 
> > I wanted to use dwlocstat to compare debug coverages but didn't manage to 
> > get
> > the program working on my machine sadly. Hope this suffices. Seems to me 
> > that
> > my patch doesn't have a significant impact on debug info.
> > 
> > Bootstraped and tested* on x86_64-pc-linux-gnu.
> > 
> > * One testcase (pr79691.c) did regress. However that is because the test is
> > dependent on a certain variable not being copy propagated. I will go into 
> > more
> > detail about this in a reply to this mail.
> > 
> > Ok to commit?
> 
> This is a second version of the patch.  In this version, I modified the
> pr79691.c testcase so that it works as intended with other changes from the
> patch.
> 
> The pr79691.c testcase checks that we get constants from snprintf calls and
> that they simplify into a single constant.  The testcase doesn't account for
> the fact that this constant may be further copy propagated which is exactly
> what happens with this patch applied.
> 
> Bootstrapped and tested on x86_64-pc-linux-gnu.
> 
> Ok to commit?

This is the third version of the patch. In this version, I addressed most of
Richard's remarks about the second version. Here is a summary of changes I made:

- Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
- Use simple_dce_from_worklist to remove propagated statements
- Use existing replace_uses API instead of reinventing it
  - This allowed me to get rid of some now redundant cleanup code
- Encapsulate the SCC finding into a class
- Rework stmt_may_generate_copy to get rid of redundant checks
- Add check that PHI doesn't contain two non-SSA-name values to
  stmt_may_generate_copy
- Regarding alignment and value ranges in stmt_may_generate_copy: For now use
  the conservative check that Richard suggested
- Index array of vertices that SCC discovery uses by SSA name version numbers
  instead of numbering statements myself.


I didn't make any changes based on these remarks:

1 It might be nice to optimize SCCs of size 1 somehow, not sure how
  many times these appear - possibly prevent them from even entering
  the SCC discovery?

It would be nice. But the only way to do this that I see right now is to first
propagate SCCs of size 1 and then the rest. This would mean adding a new copy
propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
pass relies on having SCCs topologically sorted so this procedure would have to
implement some topological sort algorithm.

This could be done. It could save allocating some vec<>s (right now, SCCs of
size 1 are represented by a vec<> with a single element). But is it worth it to
optimize the pass this way right now? If possible, I'd like to see that the
pass works and sort out any problems people encounter with it before I start
optimizing it.

2 Instead of collecting all stmts that may generate a copy at the beginning of
  the pass into a vec<>, let the SCC discovery check that statements may
  generate a copy on the fly.

This would be a big change to the pass, it would require a lot of reworking.
I'm also not sure if this would help reduce the number of allocated vec<>s that
much because I'll still want to represent SCCs by vec<>s.

Again, it's possible I'll want to rework the pass in this way in the future,
but I'd like to leave it as it is for now.

3 Add a comment saying that the pass is doing optimistic copy propagation

I don't think the pass works in an optimistic way. It doesn't assume that all
variables are copies of each other at any point. It instead identifies copy
statements (or PHI SCCs that act as copy statements) and then for each of these
it propagates: By that I mean if a copy statement says that _3 is a copy of _2,
then it replaces all uses of _3 by _2.
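
As an illustration of the kind of PHI SCC the pass targets (a made-up example,
not taken from the patch): in the function below, every PHI created for 'a' and
'b' at the loop header only ever merges copies of 'x', so the whole PHI cycle
acts as one copy statement and all uses collapse to 'x'.

```c
/* Illustrative only: 'a' and 'b' are always copies of 'x', so the PHIs
   the loop creates for them form a strongly connected component of
   copies, and every use of them can be replaced by 'x' directly.  */
int
phi_scc_copy (int x, int n)
{
  int a = x, b = x;
  for (int i = 0; i < n; i++)
    {
      int t = a;  /* Swapping two copies still yields copies of 'x'.  */
      a = b;
      b = t;
    }
  return a;  /* Propagates to plain 'x'.  */
}
```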

But it's possible that I still misinterpret what 'optimistic' means. If
mentioning that it works in a

Re: [PATCH] libgcov: Call __builtin_fork instead of fork

2023-12-08 Thread Florian Weimer
* Jakub Jelinek:

> On Sat, Dec 02, 2023 at 01:43:22PM +0100, Florian Weimer wrote:
>> Some targets do not provide a prototype for fork, and compilation now
>> fails with an implicit-function-declaration error.
>> 
>> libgcc/
>> 
>>  * libgcov-interface.c (__gcov_fork):
>
> Description missing (Use __builtin_fork instead of fork.).
>
> Ok with that change.

Fixed & pushed, thanks.

Florian



Re: [PATCH 0/1] Detecting lifetime-dse issues via Valgrind [PR66487]

2023-12-08 Thread Alexander Monakov


On Fri, 8 Dec 2023, Jakub Jelinek wrote:

> Does the VALGRIND_MAKE_MEM_UNDEFINED macro ever change on arches once
> implemented there?

It seems Valgrind folks take binary compatibility seriously, so that sounds
unlikely.

> Wouldn't this be better done by emitting the sequence inline?
> Even if it is done in libgcc, it is part of ABI.

I'd rather keep it as simple as possible. We could drop the libgcc parts,
users can drop in the wrapper as explained in the manual.
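
For reference, a minimal sketch of what such a drop-in wrapper could look like
(the function name is hypothetical; it degrades to a no-op when the Valgrind
headers are unavailable):

```c
#include <stddef.h>

#if defined (__has_include)
# if __has_include (<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
# endif
#endif

#ifndef VALGRIND_MAKE_MEM_UNDEFINED
/* No Valgrind headers installed: degrade to a no-op.  */
# define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) \
  ((void) (addr), (void) (len), 0)
#endif

/* Hypothetical wrapper that compiler-emitted annotations would call;
   it marks the object's storage as undefined at the start of its
   lifetime so Memcheck flags reads of "dead" values.  The request
   only changes Memcheck's tracking state, not the memory contents.  */
void
__gcc_vgmc_make_mem_undefined (void *addr, size_t len)
{
  VALGRIND_MAKE_MEM_UNDEFINED (addr, len);
}
```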

> So, basically add a new optab, valgrind_request, where each target would
> define_insn whatever is needed (it will need to be a single pattern, it
> can't be split among multiple) and sorry on -fvalgrind-annotations if the
> optab is not defined.

There are going to be complications since the request needs a descriptor
structure (on the stack), plus it needs more effort on the GCC side than
Valgrind side (when Valgrind is ported to a new target). I'd rather not
go that way.

> Advantage would be that --enable-valgrind-interop nor building against
> valgrind headers is not needed.

Alternatively, how about synthesizing an auxiliary translation unit with
the wrapper from the driver for -fvalgrind-annotations?

> In your version, did the new function go just to libgcc.a or to
> libgcc_s.so.1?

It participates in libgcc_s link, but it's not listed in the version script,
so it's not exported from libgcc_s (and -gc-sections should eliminate it).

Alexander


Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-08 Thread Manos Anagnostakis
So is it OK for trunk as is in v6 with the generic changes added in GCC-15?

Manos.

On Thu, Dec 7, 2023 at 16:10 Richard Biener <
richard.guent...@gmail.com> wrote:

> On Thu, Dec 7, 2023 at 1:20 PM Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich <
> philipp.toms...@vrull.eu> wrote:
> > >>
> > >> On Wed, 6 Dec 2023 at 23:32, Richard Biener <
> richard.guent...@gmail.com> wrote:
> > >> >
> > >> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
> > >> >  wrote:
> > >> > >
> > >> > > This is an RTL pass that detects store forwarding from stores to
> larger loads (load pairs).
> > >> > >
> > >> > > This optimization is SPEC2017-driven and was found to be
> beneficial for some benchmarks,
> > >> > > through testing on ampere1/ampere1a machines.
> > >> > >
> > >> > > For example, it can transform cases like
> > >> > >
> > >> > > str  d5, [sp, #320]
> > >> > > fmul d5, d31, d29
> > >> > > ldp  d31, d17, [sp, #312] # Large load from small store
> > >> > >
> > >> > > to
> > >> > >
> > >> > > str  d5, [sp, #320]
> > >> > > fmul d5, d31, d29
> > >> > > ldr  d31, [sp, #312]
> > >> > > ldr  d17, [sp, #320]
> > >> > >
> > >> > > Currently, the pass is disabled by default on all architectures
> and enabled by a target-specific option.
> > >> > >
> > >> > > If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a,
> > >> > > or other architectures as well, without needing to be turned on
> by this option.
> > >> >
> > >> > What is aarch64-specific about the pass?
> > >> >
> > >> > I see an increasingly large number of target specific passes pop up
> (probably
> > >> > for the excuse we can generalize them if necessary).  But GCC isn't
> LLVM
> > >> > and this feels like getting out of hand?
> > >>
> > >> We had an OK from Richard Sandiford on the earlier (v5) version with
> > >> v6 just fixing an obvious bug... so I was about to merge this earlier
> > >> just when you commented.
> > >>
> > >> Given that this had months of test exposure on our end, I would prefer
> > >> to move this forward for GCC14 in its current form.
> > >> The project of replacing architecture-specific store-forwarding passes
> > >> with a generalized infrastructure could then be addressed in the GCC15
> > >> timeframe (or beyond)?
> > >
> > > It's up to target maintainers, I just picked this pass (randomly) to
> make this
> > > comment (of course also knowing that STLF fails are a common issue on
> > > pipelined uarchs).
> >
> > I agree there's scope for making some of this target-independent.
> >
> > One vague thing I've been wondering about is whether, for some passes
> > like these, we should use inheritance rather than target hooks.  So in
> > this case, the target-independent code would provide a framework for
> > iterating over the function and testing for forwarding, but the target
> > would ultimately decide what to do with that information.  This would
> > also make it easier for targets to add genuinely target-specific
> > information to the bookkeeping structures.
> >
> > In case it sounds otherwise, that's supposed to be more than
> > just a structural C++-vs-C thing.  The idea is that we'd have
> > a pass for "resolving store forwarding-related problems",
> > but the specific goals would be mostly (or at least partially)
> > target-specific rather than target-independent.
>
> In some cases we've used target hooks for this, in this case it might
> work as well.
>
> > I'd wondered the same thing about the early-ra pass that we're
> > adding for SME.  Some of the framework could be generalised and
> > made target-independent, but the main purpose of the pass (using
> > strided registers with certain patterns and constraints) is highly
> > target-specific.
>
> .. not sure about this one though.
>
> Richard.
>
> > Thanks,
> > Richard
>


Re: [PATCH 0/1] Detecting lifetime-dse issues via Valgrind [PR66487]

2023-12-08 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 06:43:19PM +0300, Alexander Monakov wrote:
> On Fri, 8 Dec 2023, Jakub Jelinek wrote:
> > In your version, did the new function go just to libgcc.a or to
> > libgcc_s.so.1?
> 
> It participates in libgcc_s link, but it's not listed in the version script,
> so it's not exported from libgcc_s (and -gc-sections should eliminate it).

Then it at least should not participate in that link.
There are various other objects which are libgcc.a only (e.g. all of dfp
stuff, etc.).

Jakub



Re: [PATCH v2 0/3] [GCC] arm: vst1_types_xN ACLE intrinsics

2023-12-08 Thread Richard Earnshaw
Sorry, Ezra, but I've taken the decision to back out all 4 of the patch
series related to this.  I think the problems that the CI has shown up
need to be addressed first, and the fixes don't seem to be entirely trivial.


R.

On 07/12/2023 16:44, Richard Earnshaw wrote:

Pushed, thanks.

R.


On 07/12/2023 15:28, ezra.sito...@arm.com wrote:

Add xN variants of vst1_types intrinsic.




Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-08 Thread Robin Dapp
Ah, I forgot to attach the current v2 that also enables strncmp.
It was additionally tested with -minline-strncmp on rv64gcv.

Regards
 Robin

Subject: [PATCH v2] RISC-V: Add vectorized strcmp and strncmp.

This patch adds vectorized strcmp and strncmp implementations and
tests.  Similar to strlen, expansion is still guarded by
-minline-str(n)cmp.

gcc/ChangeLog:

PR target/112109

* config/riscv/riscv-protos.h (expand_strcmp): Declare.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
strategy handling and delegation to scalar and vector expanders.
(expand_strcmp): Vectorized implementation.
* config/riscv/riscv.md: Add TARGET_VECTOR to strcmp and strncmp
expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp.c: New test.
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-string.cc  | 161 +-
 gcc/config/riscv/riscv.md |   6 +-
 .../riscv/rvv/autovec/builtin/strcmp-run.c|  32 
 .../riscv/rvv/autovec/builtin/strcmp.c|  13 ++
 .../riscv/rvv/autovec/builtin/strncmp-run.c   | 136 +++
 .../riscv/rvv/autovec/builtin/strncmp.c   |  13 ++
 7 files changed, 357 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strncmp.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c7b5789a4b3..20bbb5b859c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -558,6 +558,7 @@ void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
 void expand_popcount (rtx *);
 void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, poly_int64);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6cde1bf89a0..11c1f74d0b3 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,12 +511,19 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
 return false;
   alignment = UINTVAL (align_rtx);
 
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
 {
-  return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
-ncompare);
+  bool ok = riscv_vector::expand_strcmp (result, src1, src2,
+bytes_rtx, alignment,
+ncompare);
+  if (ok)
+   return true;
 }
 
+  if ((TARGET_ZBB || TARGET_XTHEADBB) && stringop_strategy & STRATEGY_SCALAR)
+return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
+  ncompare);
+
   return false;
 }
 
@@ -1092,4 +1099,152 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx 
haystack, rtx needle,
 }
 }
 
+/* Implement cmpstr using vector instructions.  The ALIGNMENT and
+   NCOMPARE parameters are unused for now.  */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+  unsigned HOST_WIDE_INT, bool)
+{
+  gcc_assert (TARGET_VECTOR);
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  bool with_length = nbytes != NULL_RTX;
+
+  if (with_length
+  && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+return false;
+
+  if (with_length && CONST_INT_P (nbytes))
+nbytes = force_reg (Pmode, nbytes);
+
+  machine_mode mode = E_QImode;
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = TARGET_MAX_LMUL;
+  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+  machine_mode vmode;
+  if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode), nunits)
+.exists (&vmode))
+gcc_unreachable ();
+
+  machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+  /* Prepare addresses.  */
+  rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+  rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+  rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+  rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+  /* Set initial pointer bump to 0.  */
+  rtx cnt = gen_reg_rtx (Pmode);
+  emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+  rtx sub = gen_reg_rtx (Pmode);
+  emit_mo

[PATCH] aarch64: Some tweaks to the early-ra pass

2023-12-08 Thread Richard Sandiford
early-ra's likely_operand_match_p didn't handle relaxed and special
memory constraints, which meant that the pass wasn't able to match
LD1RQ instructions to their constraints, and so backed out of
trying to allocate.  This patch fixes that by switching the sense
of the match: does the rtx seem appropriate for the constraint?,
rather than: does the constraint seem appropriate for the rtx?

Also, I came across a case that needed more general equivalence
detection.  Previously we would only record equivalences after
the last definition of the source register, but it's worth trying
to handle cases where the destination register's live range is
restricted to a block, and the next definition of the source
occurs only after the end of the destination register's live range.

The patch also fixes a cut-&-pasto that Alex noticed (thanks).

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::chain_next):
Put into an enum with...
(allocno_info::last_def_point): ...new member variable.
(allocno_info::m_current_bb_point): New member variable.
(likely_operand_match_p): Switch based on get_constraint_type,
rather than based on rtx code.  Handle relaxed and special memory
constraints.
(early_ra::record_copy): Allow the source of an equivalence to be
assigned to more than once.
(early_ra::record_allocno_use): Invalidate any previous equivalence.
Initialize last_def_point.
(early_ra::record_allocno_def): Set last_def_point.
(early_ra::valid_equivalence_p): New function, split out from...
(early_ra::record_copy): ...here.  Use last_def_point to handle
source registers that have a later definition.
(make_pass_aarch64_early_ra): Fix comment.

gcc/testsuite/
* gcc.target/aarch64/sme/strided_2.c: New test.
---
 gcc/config/aarch64/aarch64-early-ra.cc|  89 +++---
 .../gcc.target/aarch64/sme/strided_2.c| 115 ++
 2 files changed, 184 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/strided_2.c

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc 
b/gcc/config/aarch64/aarch64-early-ra.cc
index c065416c5b9..f05869b5cf2 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -306,9 +306,18 @@ private:
 // equivalent to EQUIV_ALLOCNO for the whole of this allocno's lifetime.
 unsigned int equiv_allocno;
 
-// The next chained allocno in program order (i.e. at lower program
-// points), or INVALID_ALLOCNO if none.
-unsigned int chain_next;
+union
+{
+  // The program point at which the allocno was last defined,
+  // or START_OF_REGION if none.  This is only used temporarily
+  // while recording allocnos; after that, chain_next below is
+  // used instead.
+  unsigned int last_def_point;
+
+  // The next chained allocno in program order (i.e. at lower program
+  // points), or INVALID_ALLOCNO if none.
+  unsigned int chain_next;
+};
 
 // The previous chained allocno in program order (i.e. at higher
 // program points), or INVALID_ALLOCNO if none.
@@ -406,6 +415,7 @@ private:
   void record_fpr_def (unsigned int);
   void record_allocno_use (allocno_info *);
   void record_allocno_def (allocno_info *);
+  bool valid_equivalence_p (allocno_info *, allocno_info *);
   void record_copy (rtx, rtx, bool = false);
   void record_constraints (rtx_insn *);
   void record_artificial_refs (unsigned int);
@@ -479,6 +489,9 @@ private:
   // The basic block that we're currently processing.
   basic_block m_current_bb;
 
+  // The lowest-numbered program point in the current basic block.
+  unsigned int m_current_bb_point;
+
   // The program point that we're currently processing (described above).
   unsigned int m_current_point;
 
@@ -576,21 +589,26 @@ likely_operand_match_p (const operand_alternative 
&op_alt, rtx op)
return true;
 
   auto cn = lookup_constraint (constraint);
-  if (REG_P (op) || SUBREG_P (op))
+  switch (get_constraint_type (cn))
{
- if (insn_extra_register_constraint (cn))
+   case CT_REGISTER:
+ if (REG_P (op) || SUBREG_P (op))
return true;
-   }
-  else if (MEM_P (op))
-   {
- if (insn_extra_memory_constraint (cn))
+ break;
+
+   case CT_MEMORY:
+   case CT_SPECIAL_MEMORY:
+   case CT_RELAXED_MEMORY:
+ if (MEM_P (op))
return true;
-   }
-  else
-   {
- if (!insn_extra_memory_constraint (cn)
- && constraint_satisfied_p (op, cn))
+ break;
+
+   case CT_CONST_INT:
+   case CT_ADDRESS:
+   case CT_FIXED_FORM:
+ if (constraint_satisfied_p (op, cn))
return true;
+ break;
}
 
   constraint += len;
@@ -1407,10 +1425,14 @@ early_ra::record_allocno_us

Re: [PATCH 0/1] Detecting lifetime-dse issues via Valgrind [PR66487]

2023-12-08 Thread Alexander Monakov



On Fri, 8 Dec 2023, Jakub Jelinek wrote:

> On Fri, Dec 08, 2023 at 06:43:19PM +0300, Alexander Monakov wrote:
> > On Fri, 8 Dec 2023, Jakub Jelinek wrote:
> > > In your version, did the new function go just to libgcc.a or to
> > > libgcc_s.so.1?
> > 
> > It participates in libgcc_s link, but it's not listed in the version script,
> > so it's not exported from libgcc_s (and -gc-sections should eliminate it).
> 
> Then it at least should not participate in that link.
> There are various other objects which are libgcc.a only (e.g. all of dfp
> stuff, etc.).

Thanks, changing

LIB2ADD += $(srcdir)/valgrind-interop.c

to

LIB2ADD_ST += $(srcdir)/valgrind-interop.c

in my tree achieved that.

Alexander


[C PATCH] Fix regression causing ICE for structs with VLAs [PR 112488]

2023-12-08 Thread Martin Uecker


This fixes a regression caused by my previous VM fixes.


Fix regression causing ICE for structs with VLAs [PR 112488]

A previous patch that fixed several ICEs related to size expressions
of VM types (PR c/70418, ...) caused a regression for structs where
a DECL_EXPR is no longer generated although required.  We now call
add_decl_expr, introduced by the previous patch, from finish_struct.
The function gets a new argument so that it does not set the TYPE_NAME
of the type to the DECL_EXPR in this specific case.

PR c/112488

gcc/c
* c-decl.cc (add_decl_expr): Add argument.
(finish_struct): Create DECL_EXPR.
(c_simulate_record_decl): Adapt.
* c-parser.cc (c_parser_struct_or_union_specifier): Call
finish_struct with expression for VLA sizes.
* c-tree.h (finish_struct): Add argument.

gcc/testsuite
* gcc.dg/pr112488-1.c: New test.
* gcc.dg/pr112488-2.c: New test.
* gcc.dg/pr112898.c: New test.
* gcc.misc-tests/gcov-pr85350.c: Adapt.
---
 gcc/c/c-decl.cc | 22 +++--
 gcc/c/c-parser.cc   |  2 +-
 gcc/c/c-tree.h  |  3 ++-
 gcc/testsuite/gcc.dg/pr112488-1.c   | 14 +
 gcc/testsuite/gcc.dg/pr112488-2.c   | 13 
 gcc/testsuite/gcc.dg/pr112898.c |  9 +
 gcc/testsuite/gcc.misc-tests/gcov-pr85350.c |  2 +-
 7 files changed, 56 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112488-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr112488-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr112898.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 92c83e1bf10..0b500c19e70 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -6619,7 +6619,7 @@ smallest_type_quals_location (const location_t *locations,
use BIND_EXPRs in TYPENAME contexts too.  */
 static void
 add_decl_expr (location_t loc, enum decl_context decl_context, tree type,
-  tree *expr)
+  tree *expr, bool set_name_p)
 {
   tree bind = NULL_TREE;
   if (decl_context == TYPENAME || decl_context == PARM
@@ -6636,7 +6636,8 @@ add_decl_expr (location_t loc, enum decl_context 
decl_context, tree type,
   pushdecl (decl);
   DECL_ARTIFICIAL (decl) = 1;
   add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl));
-  TYPE_NAME (type) = decl;
+  if (set_name_p)
+TYPE_NAME (type) = decl;
 
   if (bind)
 {
@@ -7635,7 +7636,7 @@ grokdeclarator (const struct c_declarator *declarator,
   type has a name/declaration of it's own, but special attention
   is required if the type is anonymous. */
if (!TYPE_NAME (type) && c_type_variably_modified_p (type))
- add_decl_expr (loc, decl_context, type, expr);
+ add_decl_expr (loc, decl_context, type, expr, true);
 
type = c_build_pointer_type (type);
 
@@ -7900,7 +7901,7 @@ grokdeclarator (const struct c_declarator *declarator,
 
/* The pointed-to type may need a decl expr (see above).  */
if (!TYPE_NAME (type) && c_type_variably_modified_p (type))
- add_decl_expr (loc, decl_context, type, expr);
+ add_decl_expr (loc, decl_context, type, expr, true);
 
type = c_build_pointer_type (type);
type_quals = array_ptr_quals;
@@ -9257,7 +9258,8 @@ is_flexible_array_member_p (bool is_last_field,
 
 tree
 finish_struct (location_t loc, tree t, tree fieldlist, tree attributes,
-  class c_struct_parse_info *enclosing_struct_parse_info)
+  class c_struct_parse_info *enclosing_struct_parse_info,
+  tree *expr)
 {
   tree x;
   bool toplevel = file_scope == current_scope;
@@ -9595,6 +9597,13 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
 
   finish_incomplete_vars (incomplete_vars, toplevel);
 
+  /* Make sure a DECL_EXPR is created for structs with VLA members.
+ Because we do not know the context, we use decl_context TYPENAME
+ here to force creation of a BIND_EXPR which is required in some
+ contexts.  */
+  if (c_type_variably_modified_p (t))
+add_decl_expr (loc, TYPENAME, t, expr, false);
+
   if (warn_cxx_compat)
 warn_cxx_compat_finish_struct (fieldlist, TREE_CODE (t), loc);
 
@@ -10191,7 +10200,8 @@ c_simulate_record_decl (location_t loc, const char 
*name,
DECL_CHAIN (fields[i - 1]) = fields[i];
 }
 
-  finish_struct (loc, type, fields[0], NULL_TREE, struct_info);
+  tree expr = NULL_TREE;
+  finish_struct (loc, type, fields[0], NULL_TREE, struct_info, &expr);
 
   tree decl = build_decl (loc, TYPE_DECL, ident, type);
   set_underlying_type (decl);
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index df9a07928b5..dcb6c21da41 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -4087,7 +4087,7 @@ c_parser_struct_or_union_specifier (c_parser *parser)
   ret.spec = finish_struct (struct_loc, type, nrev

Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-08 Thread Tobias Burnus

On 08.12.23 15:09, Thomas Schwinge wrote:

On 22/11/2023 17:07, Tobias Burnus wrote:

Let's start with the patch itself:

--- a/libgomp/target.c
+++ b/libgomp/target.c
...
+static struct gomp_device_descr *
+get_device_for_page_locked (void)
+{
+ gomp_debug (0, "%s\n",
+ __FUNCTION__);
+
+ struct gomp_device_descr *device;
+#ifdef HAVE_SYNC_BUILTINS
+ device
+   = __atomic_load_n (&device_for_page_locked, MEMMODEL_RELAXED);
+ if (device == (void *) -1)
+   {
+ gomp_debug (0, " init\n");
+
+ gomp_init_targets_once ();
+
+ device = NULL;
+ for (int i = 0; i < num_devices; ++i)

Given that this function just sets a single variable based on whether the
page_locked_host_alloc_func function pointer exists, wouldn't it be much
simpler to just do all this handling in gomp_target_init?

@Thomas, care to comment on this?

 From what I remember, we cannot assume that 'gomp_target_init' has
already been done when we get here; therefore 'gomp_init_targets_once' is
being called here.  We may get to 'get_device_for_page_locked' via
host-side OpenMP, in code that doesn't contain any OpenMP 'target'
offloading things.  Therefore, this was (a) necessary to make that work,
and (b) did seem to be a useful abstraction to me.


I am not questioning the "gomp_init_targets_once ();" but I am wondering
whether only 'gomp_init_targets_once()' should remain without the
locking + loading dance - and then just set that single variable inside
gomp_target_init.

If you reach here w/o target set up, the "gomp_init_targets_once ();"
would ensure it gets initialized with all the other code inside
gomp_target_init.

And if gomp_target_init() was called before, gomp_init_targets_once()
will just return without doing anything and you are also fine.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]

2023-12-08 Thread Jason Merrill

On 12/7/23 04:28, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled because two ubsan instrumentations
run into each other.
The first one is the shift instrumentation.  Before the C++ FE calls
it, it wraps the 2 shift arguments with cp_save_expr, so that side-effects
in them aren't evaluated multiple times.  And, ubsan_instrument_shift
itself uses unshare_expr on any uses of the operands to make sure further
modifications in them don't affect other copies of them (the only not
unshared ones are the one the caller then uses for the actual operation
after the instrumentation, which means there is no tree sharing).

Now, if there are side-effects in the first operand like say function
call, cp_save_expr wraps it into a SAVE_EXPR, and ubsan_instrument_shift
in this mode emits something like
if (..., SAVE_EXPR , SAVE_EXPR  > const)
   __ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR , ...);
and caller adds
SAVE_EXPR  << SAVE_EXPR 
after it in a COMPOUND_EXPR.  So far so good.

If there are no side-effects and cp_save_expr doesn't create SAVE_EXPR,
everything is ok as well because of the unshare_expr.
We have
if (..., SAVE_EXPR  > const)
   __ubsan_handle_shift_out_of_bounds (..., ptr->something[i], ...);
and
ptr->something[i] << SAVE_EXPR 
where ptr->something[i] is unshared.

In the testcase below, the !x->s[j] ? 1 : 0 expression is wrapped initially
into a SAVE_EXPR though, and unshare_expr doesn't unshare SAVE_EXPRs nor
anything used in them for obvious reasons, so we end up with:
if (..., SAVE_EXPR (x)->s[j] ? 1 : 0>, 
SAVE_EXPR  > const)
   __ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR (x)->s[j] ? 1 : 0>, ...);
and
SAVE_EXPR (x)->s[j] ? 1 : 0> << SAVE_EXPR 

So far good as well.  But later during cp_fold of the SAVE_EXPR we find
out that VIEW_CONVERT_EXPR(x)->s[j] ? 0 : 1 is actually
invariant (has TREE_READONLY set) and so cp_fold simplifies the above to
if (..., SAVE_EXPR  > const)
   __ubsan_handle_shift_out_of_bounds (..., (bool) VIEW_CONVERT_EXPR(x)->s[j] ? 0 : 1, ...);
and
((bool) VIEW_CONVERT_EXPR(x)->s[j] ? 0 : 1) << SAVE_EXPR 
with the s[j] ARRAY_REFs and other expressions shared in between the two
uses (and obviously the expression is optimized away from the COMPOUND_EXPR in
the if condition).

Then comes another ubsan instrumentation at genericization time,
this time to instrument the ARRAY_REFs with strict bounds checking,
and replaces the s[j] in there with s[.UBSAN_BOUNDS (0B, SAVE_EXPR, 8), 
SAVE_EXPR]
As the trees are shared, it does that just once though.
And as the if body is gimplified first, the SAVE_EXPR is evaluated inside
of the if body and when it is used again after the if, it uses a potentially
uninitialized value of j.1 (always uninitialized if the shift count isn't
out of bounds).

The following patch fixes that by unshare_expr unsharing the folded argument
of a SAVE_EXPR if we've folded the SAVE_EXPR into an invariant and it is
used more than once.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Do we want to do the same for TARGET_EXPR, since those are handled like 
SAVE_EXPR in mostly_copy_tree_r?



2023-12-07  Jakub Jelinek  

PR sanitizer/112727
* cp-gimplify.cc (cp_fold): If SAVE_EXPR has been previously
folded, unshare_expr what is returned.

* c-c++-common/ubsan/pr112727.c: New test.

--- gcc/cp/cp-gimplify.cc.jj2023-12-05 09:06:06.112878408 +0100
+++ gcc/cp/cp-gimplify.cc   2023-12-06 22:32:46.379370223 +0100
@@ -2906,7 +2906,14 @@ cp_fold (tree x, fold_flags_t flags)
  fold_cache = hash_map::create_ggc (101);
  
if (tree *cached = fold_cache->get (x))

-return *cached;
+{
+  /* unshare_expr doesn't recurse into SAVE_EXPRs.  If SAVE_EXPR's
+argument has been folded into a tree invariant, make sure it is
+unshared.  See PR112727.  */
+  if (TREE_CODE (x) == SAVE_EXPR && *cached != x)
+   return unshare_expr (*cached);
+  return *cached;
+}
  
uid_sensitive_constexpr_evaluation_checker c;
  
--- gcc/testsuite/c-c++-common/ubsan/pr112727.c.jj	2023-12-06 22:35:46.012819991 +0100

+++ gcc/testsuite/c-c++-common/ubsan/pr112727.c 2023-12-06 22:35:16.708236026 
+0100
@@ -0,0 +1,17 @@
+/* PR sanitizer/112727 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=shift-exponent,bounds-strict -Wuninitialized" 
} */
+
+#ifndef __cplusplus
+#define bool _Bool
+#endif
+
+struct S { bool s[8]; };
+
+void
+foo (const struct S *x)
+{
+  unsigned n = 0;
+  for (unsigned j = 0; j < 8; j++)
+n |= ((!x->s[j]) ? 1 : 0) << (16 + j);
+}

Jakub





Re: [PATCH] c++: Don't diagnose ignoring of attributes if all ignored attributes are attribute_ignored_p

2023-12-08 Thread Jason Merrill

On 12/6/23 09:10, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 11:01:20AM -0500, Jason Merrill wrote:

And there is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or how the attribute chain var is called) is non-NULL,
but if it is non-NULL and contains at least one non-attribute_ignored_p
attribute?


Sounds good.


The following patch implements it.
I've kept warnings for cases where the C++ standard says explicitly any
attributes aren't ok -
"If an attribute-specifier-seq appertains to a friend declaration, that
declaration shall be a definition."

For some changes I haven't figured out how I could cover them in the
testsuite.

So far tested with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp=Wno-attributes* ubsan.exp=Wno-attributes*"
(which is all tests that use -Wno-attributes=), ok for trunk if it passes
full bootstrap/regtest?

Note, C uses a different strategy, it has c_warn_unused_attributes
function which warns about all the attributes one by one unless they
are ignored (or allowed in certain position).
Though that is just a single diagnostic wording, while C++ FE just warns
that there are some ignored attributes and doesn't name them individually
(except for namespace and using namespace) and uses different wordings in
different spots.

2023-12-06  Jakub Jelinek  

gcc/
* attribs.h (any_nonignored_attribute_p): Declare.
* attribs.cc (any_nonignored_attribute_p): New function.
gcc/cp/
* parser.cc (cp_parser_statement, cp_parser_expression_statement,
cp_parser_declaration, cp_parser_elaborated_type_specifier,
cp_parser_asm_definition): Don't diagnose ignored attributes
if !any_nonignored_attribute_p.
* decl.cc (grokdeclarator): Likewise.
* name-lookup.cc (handle_namespace_attrs, finish_using_directive):
Don't diagnose ignoring of attr_ignored_p attributes.
gcc/testsuite/
* g++.dg/warn/Wno-attributes-1.C: New test.

--- gcc/cp/parser.cc.jj 2023-12-06 12:03:27.502174967 +0100
+++ gcc/cp/parser.cc2023-12-06 12:36:55.704884514 +0100
@@ -21095,14 +21094,20 @@ cp_parser_elaborated_type_specifier (cp_
if (attributes)
  {
if (TREE_CODE (type) == TYPENAME_TYPE)
-   warning (OPT_Wattributes,
-"attributes ignored on uninstantiated type");
+   {
+ if (any_nonignored_attribute_p (attributes))
+   warning (OPT_Wattributes,
+"attributes ignored on uninstantiated type");
+   }
else if (tag_type != enum_type
   && TREE_CODE (type) != BOUND_TEMPLATE_TEMPLATE_PARM
   && CLASSTYPE_TEMPLATE_INSTANTIATION (type)
   && ! processing_explicit_instantiation)
-   warning (OPT_Wattributes,
-"attributes ignored on template instantiation");
+   {
+ if (any_nonignored_attribute_p (attributes))
+   warning (OPT_Wattributes,
+"attributes ignored on template instantiation");
+   }
else if (is_friend && cxx11_attribute_p (attributes))
{
  if (warning (OPT_Wattributes, "attribute ignored"))
@@ -2,7 +21116,7 @@ cp_parser_elaborated_type_specifier (cp_
}
else if (is_declaration && cp_parser_declares_only_class_p (parser))
cplus_decl_attributes (&type, attributes, (int) 
ATTR_FLAG_TYPE_IN_PLACE);
-  else
+  else if (any_nonignored_attribute_p (attributes))
warning (OPT_Wattributes,
 "attributes ignored on elaborated-type-specifier that is "
 "not a forward declaration");


I believe this is also prohibited by
https://eel.is/c++draft/dcl.type.elab#3

so I would leave all the warnings in this function alone.


@@ -22672,7 +22677,7 @@ cp_parser_asm_definition (cp_parser* par
symtab->finalize_toplevel_asm (string);
  }
  
-  if (std_attrs)

+  if (std_attrs && any_nonignored_attribute_p (std_attrs))
  warning_at (asm_loc, OPT_Wattributes,
"attributes ignored on %<asm%> declaration");
  }
--- gcc/cp/decl.cc.jj   2023-12-06 12:03:27.483175235 +0100
+++ gcc/cp/decl.cc  2023-12-06 12:36:55.698884598 +0100
@@ -13058,7 +13058,8 @@ grokdeclarator (const cp_declarator *dec
&& !diagnose_misapplied_contracts (declspecs->std_attributes))
  {
location_t attr_loc = declspecs->locations[ds_std_attribute];
-  if (warning_at (attr_loc, OPT_Wattributes, "attribute ignored"))
+  if (any_nonignored_attribute_p (declspecs->std_attributes)
+ && warning_at (attr_loc, OPT

Re: [PATCH] c++: fix ICE with sizeof in a template [PR112869]

2023-12-08 Thread Jason Merrill

On 12/5/23 15:31, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

   min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
 (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

   min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for unevaluated operands.


I agree that we want this change for in_immediate_context (), but I 
don't see why we want it for TYPE_P or unevaluated_p (code) or 
cp_unevaluated_operand?



gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
  gcc/cp/cp-gimplify.cc| 8 ++--
  gcc/testsuite/g++.dg/template/sizeof18.C | 8 
  2 files changed, 10 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 5abb91bbdd3..46c3eb91853 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1177,13 +1177,9 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
   ? tf_error : tf_none);
const tree_code code = TREE_CODE (stmt);
  
-  /* No need to look into types or unevaluated operands.

- NB: This affects cp_fold_r as well.  */
+  /* No need to look into types or unevaluated operands.  */
if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
-{
-  *walk_subtrees = 0;
-  return NULL_TREE;
-}
+return NULL_TREE;
  
tree decl = NULL_TREE;

bool call_p = false;
diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C 
b/gcc/testsuite/g++.dg/template/sizeof18.C
new file mode 100644
index 000..afba9946258
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof18.C
@@ -0,0 +1,8 @@
+// PR c++/112869
+// { dg-do compile }
+
+void min(long, long);
+template  void Binaryread(int &, T, unsigned long);
+template <> void Binaryread(int &, float, unsigned long bytecount) {
+  min(bytecount, sizeof(int));
+}

base-commit: 9c3a880feecf81c310b4ade210fbd7004c9aece7




Re: [PATCH] c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]

2023-12-08 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 11:51:19AM -0500, Jason Merrill wrote:
> Do we want to do the same for TARGET_EXPR, since those are handled like
> SAVE_EXPR in mostly_copy_tree_r?

In mostly_copy_tree_r yes, but I don't see cp_fold doing anything for
TARGET_EXPRs (like it does for SAVE_EXPRs), so I think TARGET_EXPRs stay
around until gimplification.

Jakub



Re: [PATCH] v2: Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]

2023-12-08 Thread Vladimir Makarov



On 12/7/23 03:39, Jakub Jelinek wrote:

On Thu, Dec 07, 2023 at 09:36:22AM +0100, Jakub Jelinek wrote:

So, one way to fix the LRA issue would be just to use
   lra_insn_recog_data_len = index * 3U / 2;
   if (lra_insn_recog_data_len <= index)
 lra_insn_recog_data_len = index + 1;
basically do what vec.cc does.  I thought we can do better for
both vec.cc and LRA on 64-bit hosts even without growing the allocated
counters, but now that I look at it again, perhaps we can't.
The above overflows already with original alloc or lra_insn_recog_data_len
0x55555556, where 0x55555555 * 3U / 2 is still 0x7fffffff
and so representable in 32 bits, but 0x55555556 * 3U / 2 is
1.  I thought (and the patch implements it) that we could use
alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
which is still ok in unsigned, but given that vec.h then stores the
counter into unsigned m_alloc:31; bit-field, it is too much.

The patch below is what I've actually bootstrapped/regtested on
x86_64-linux and i686-linux, but given the above I think I should drop
the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.

Here is so far untested adjusted patch, which does the computation
just in unsigned int rather than size_t, because doing it in size_t
wouldn't improve things.

2023-12-07  Jakub Jelinek  

PR middle-end/112411
* params.opt (-param=min-nondebug-insn-uid=): Add
IntegerRange(0, 1073741824).
* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
in * 3 / 2 computation and if the result is smaller or equal to
index, use index + 1.

* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
--param min-nondebug-insn-uid=1073741824.

Jakub, if you are still waiting for an approval, the LRA change is ok for
me with the given max param.


Thank you for fixing this.





[PATCH] c++, v2: Don't diagnose ignoring of attributes if all ignored attributes are attribute_ignored_p

2023-12-08 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 12:06:01PM -0500, Jason Merrill wrote:
> > @@ -2,7 +21116,7 @@ cp_parser_elaborated_type_specifier (cp_
> > }
> > else if (is_declaration && cp_parser_declares_only_class_p (parser))
> > cplus_decl_attributes (&type, attributes, (int) 
> > ATTR_FLAG_TYPE_IN_PLACE);
> > -  else
> > +  else if (any_nonignored_attribute_p (attributes))
> > warning (OPT_Wattributes,
> >  "attributes ignored on elaborated-type-specifier that is "
> >  "not a forward declaration");
> 
> I believe this is also prohibited by
> https://eel.is/c++draft/dcl.type.elab#3

You're right and there is also
https://eel.is/c++draft/temp.spec#temp.explicit-3
which prohibits it for explicit template instantiations.

> so I would leave all the warnings in this function alone.

Ok.

> > location_t attr_loc = declspecs->locations[ds_std_attribute];
> > -  if (warning_at (attr_loc, OPT_Wattributes, "attribute ignored"))
> > +  if (any_nonignored_attribute_p (declspecs->std_attributes)
> > + && warning_at (attr_loc, OPT_Wattributes, "attribute ignored"))
> > inform (attr_loc, "an attribute that appertains to a type-specifier "
> > "is ignored");
> >   }
> 
> This seems untested, e.g.
> 
> int [[foo::bar]] i;

Thanks.

Here is an updated patch, so far tested with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp='Wno-attributes*' ubsan.exp=Wno-attributes*"
Ok for trunk if it passes full bootstrap/regtest?

2023-12-08  Jakub Jelinek  

gcc/
* attribs.h (any_nonignored_attribute_p): Declare.
* attribs.cc (any_nonignored_attribute_p): New function.
gcc/cp/
* parser.cc (cp_parser_statement, cp_parser_expression_statement,
cp_parser_declaration, cp_parser_asm_definition): Don't diagnose
ignored attributes if !any_nonignored_attribute_p.
* decl.cc (grokdeclarator): Likewise.
* name-lookup.cc (handle_namespace_attrs, finish_using_directive):
Don't diagnose ignoring of attr_ignored_p attributes.
gcc/testsuite/
* g++.dg/warn/Wno-attributes-1.C: New test.

--- gcc/attribs.h.jj2023-12-06 12:03:27.421176109 +0100
+++ gcc/attribs.h   2023-12-06 12:36:55.704884514 +0100
@@ -48,6 +48,7 @@ extern void apply_tm_attr (tree, tree);
 extern tree make_attribute (const char *, const char *, tree);
 extern bool attribute_ignored_p (tree);
 extern bool attribute_ignored_p (const attribute_spec *const);
+extern bool any_nonignored_attribute_p (tree);
 
 extern struct scoped_attributes *
   register_scoped_attributes (const scoped_attribute_specs &, bool = false);
--- gcc/attribs.cc.jj   2023-12-06 12:03:27.386176602 +0100
+++ gcc/attribs.cc  2023-12-06 12:36:55.704884514 +0100
@@ -584,6 +584,19 @@ attribute_ignored_p (const attribute_spe
   return as->max_length == -2;
 }
 
+/* Return true if the ATTRS chain contains at least one attribute which
+   is not ignored.  */
+
+bool
+any_nonignored_attribute_p (tree attrs)
+{
+  for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
+if (!attribute_ignored_p (attr))
+  return true;
+
+  return false;
+}
+
 /* See whether LIST contains at least one instance of attribute ATTR
(possibly with different arguments).  Return the first such attribute
if so, otherwise return null.  */
--- gcc/cp/parser.cc.jj 2023-12-06 12:03:27.502174967 +0100
+++ gcc/cp/parser.cc2023-12-06 12:36:55.704884514 +0100
@@ -12778,9 +12778,8 @@ cp_parser_statement (cp_parser* parser,
 SET_EXPR_LOCATION (statement, statement_location);
 
   /* Allow "[[fallthrough]];" or "[[assume(cond)]];", but warn otherwise.  */
-  if (std_attrs != NULL_TREE)
-warning_at (attrs_loc,
-   OPT_Wattributes,
+  if (std_attrs != NULL_TREE && any_nonignored_attribute_p (std_attrs))
+warning_at (attrs_loc, OPT_Wattributes,
"attributes at the beginning of statement are ignored");
 }
 
@@ -12986,7 +12985,7 @@ cp_parser_expression_statement (cp_parse
 }
 
   /* Allow "[[fallthrough]];", but warn otherwise.  */
-  if (attr != NULL_TREE)
+  if (attr != NULL_TREE && any_nonignored_attribute_p (attr))
 warning_at (loc, OPT_Wattributes,
"attributes at the beginning of statement are ignored");
 
@@ -15191,7 +15190,7 @@ cp_parser_declaration (cp_parser* parser
}
}
 
-  if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
+  if (std_attrs != NULL_TREE && any_nonignored_attribute_p (std_attrs))
warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
OPT_Wattributes, "attribute ignored");
   if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
@@ -22672,7 +22671,7 @@ cp_parser_asm_definition (cp_parser* par
symtab->finalize_toplevel_asm (string);
 }
 
-  if (std_attrs)
+  if (std_attrs && any_nonignored_attribute_p (std_attrs))
 warning_at (asm_loc, OPT_Wattributes,
"att

Re: [PATCH v3 2/2] libphobos: Update build scripts for LoongArch64.

2023-12-08 Thread Iain Buclaw
Excerpts from Yang Yujie's message of December 8, 2023 11:09 am:
> libphobos/ChangeLog:
> 
>   * m4/druntime/cpu.m4: Support loongarch* targets.
>   * libdruntime/Makefile.am: Same.
>   * libdruntime/Makefile.in: Regenerate.
>   * configure: Regenerate.
> ---
>  libphobos/configure   | 21 ++-
>  libphobos/libdruntime/Makefile.am |  3 +
>  libphobos/libdruntime/Makefile.in | 98 +++
>  libphobos/m4/druntime/cpu.m4  |  5 ++
>  4 files changed, 87 insertions(+), 40 deletions(-)
> 

Both these patches by themselves are fine.

Thanks again!

Iain.


[pushed] c++: Add fixed test [PR88848]

2023-12-08 Thread Marek Polacek
Tested x86_64-pc-linux-gnu, applying to trunk.

-- >8 --
This one was fixed by r12-7714-g47da5198766256.

PR c++/88848

gcc/testsuite/ChangeLog:

* g++.dg/inherit/multiple2.C: New test.
---
 gcc/testsuite/g++.dg/inherit/multiple2.C | 35 
 1 file changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/inherit/multiple2.C

diff --git a/gcc/testsuite/g++.dg/inherit/multiple2.C 
b/gcc/testsuite/g++.dg/inherit/multiple2.C
new file mode 100644
index 000..dd3d0daf248
--- /dev/null
+++ b/gcc/testsuite/g++.dg/inherit/multiple2.C
@@ -0,0 +1,35 @@
+// PR c++/88848
+// { dg-do compile { target c++17 } }
+
+template
+struct True { static constexpr bool value{ true }; };
+
+template
+struct Integer { static constexpr int value{ VALUE }; };
+
+template
+struct Foo
+{
+  using Integer_t = Integer;
+
+  static TYPE get_type(Integer_t);
+};
+
+template
+struct Bar : ARGS...
+{
+  using ARGS::get_type...;
+
+  template
+  using Type_t = decltype(get_type(Integer{}));
+
+  Bar() { static_assert((True< Type_t >::value && 
...)); }
+
+  static_assert((True< Type_t >::value && ...));
+};
+
+int main()
+{
+  Bar, Foo<8, double>> obj;
+  return int{ sizeof(obj) };
+}

base-commit: 0c018a74eb1affe2a1fa385cdddaa93979683420
-- 
2.43.0



Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-08 Thread Jason Merrill

On 12/6/23 02:33, waffl3x wrote:

Here is the next version, it feels very close to finished. As before, I
haven't ran a bootstrap or the full testsuite yet but I did run the
explicit-obj tests which completed as expected.

There's a few test cases that still need to be written but more tests
can always be added. The behavior added by CWG2789 works in at least
one case, but I have not added tests for it yet. The test cases for
dependent lambda expressions need to be fleshed out more, but a few
temporary ones are included to demonstrate that they do work and that
the crash is fixed. Explicit object conversion functions work, but I
need to add fleshed out tests for them, explicit-obj-basic5.C has that
test.



@@ -6586,6 +6586,17 @@ add_candidates (tree fns, tree first_arg, const vec *args,
+   /* FIXME: I believe this will be bugged for xobj member functions,
+  leaving this comment here to make sure we look into it
+  at some point.
+  Seeing this makes me want correspondence checking to be unified
+  in one place though, not sure if this one needs to be different
+  from other ones though.
+  This function is only used here, but maybe we can use it in add
+  method and move some of the logic out of there?


fns_correspond absolutely needs updating to handle xobj fns, and doing 
that by unifying it with add_method's calculation would be good.



+  Side note: CWG2586 might be relevant for this area in
+  particular, perhaps we wait to see if it gets accepted first?  */


2586 was accepted last year.


@@ -12574,17 +12601,25 @@ cand_parms_match (z_candidate *c1, z_candidate *c2)
   fn1 = DECL_TEMPLATE_RESULT (t1);
   fn2 = DECL_TEMPLATE_RESULT (t2);
 }
+  /* The changes I made here might be stuff I was told not to worry about?
+ I'm not really sure so I'm going to leave it in.  */


Good choice, this comment can go.


   tree parms1 = TYPE_ARG_TYPES (TREE_TYPE (fn1));
   tree parms2 = TYPE_ARG_TYPES (TREE_TYPE (fn2));
   if (DECL_FUNCTION_MEMBER_P (fn1)
   && DECL_FUNCTION_MEMBER_P (fn2)
-  && (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn1)
- != DECL_NONSTATIC_MEMBER_FUNCTION_P (fn2)))
+  && (DECL_STATIC_FUNCTION_P (fn1)
+ != DECL_STATIC_FUNCTION_P (fn2)))
 {
   /* Ignore 'this' when comparing the parameters of a static member
 function with those of a non-static one.  */
-  parms1 = skip_artificial_parms_for (fn1, parms1);
-  parms2 = skip_artificial_parms_for (fn2, parms2);
+  auto skip_parms = [](tree fn, tree parms){
+ if (DECL_XOBJ_MEMBER_FUNCTION_P (fn))
+   return TREE_CHAIN (parms);
+ else
+   return skip_artificial_parms_for (fn, parms);
+   };
+  parms1 = skip_parms (fn1, parms1);
+  parms2 = skip_parms (fn2, parms2);
 }


https://cplusplus.github.io/CWG/issues/2789.html fixes the handling of 
xobj fns here.


Your change does the right thing for comparing static and xobj, but 
doesn't handle comparing iobj and xobj; I think we want to share 
parameter comparison code with fns_correspond/add_method.  Maybe 
parms_correspond?
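For reference, the shared comparison being suggested here can be modeled outside GCC's tree machinery. Below is a toy sketch (hypothetical names and types, not GCC's actual representation) of "skip the object parameter, then compare the rest":

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the cand_parms_match / fns_correspond skip logic.
enum Kind { STATIC_FN, IOBJ_FN, XOBJ_FN };

// iobj fns carry an implicit 'this' parm and xobj fns an explicit object
// parm in first position; static member functions carry neither.
std::vector<std::string> skip_object_parm (Kind k, std::vector<std::string> parms)
{
  if (k == IOBJ_FN || k == XOBJ_FN)
    parms.erase (parms.begin ());
  return parms;
}

bool parms_match (Kind k1, std::vector<std::string> p1,
		  Kind k2, std::vector<std::string> p2)
{
  return skip_object_parm (k1, p1) == skip_object_parm (k2, p2);
}
```

Under this model an iobj and an xobj overload correspond exactly when their parameter lists match after dropping the object parameter, which is the check a unified parms_correspond helper would centralize.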



@@ -8727,21 +8882,42 @@ resolve_address_of_overloaded_function (tree 
target_type,
   /* Good, exactly one match.  Now, convert it to the correct type.  */
   fn = TREE_PURPOSE (matches);
 
-  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
-  && !(complain & tf_ptrmem_ok) && !flag_ms_extensions)
+  if (DECL_OBJECT_MEMBER_FUNCTION_P (fn)
+  && !(complain & tf_ptrmem_ok))
 {
-  static int explained;
-
-  if (!(complain & tf_error))
+  /* For iobj member functions, if -fms-extensions was passed in, this
+is not an error, so we do nothing.  It is still an error regardless
+for xobj member functions, though; as it is a new feature, we
+(hopefully) don't need to support the behavior.  */


Unfortunately, it seems that MSVC extended their weirdness to xobj fns, 
so -fms-extensions should as well.

https://godbolt.org/z/nfvn64Kx5


+ /* I'm keeping it more basic for now.  */


OK, this comment can go.


@@ -15502,9 +15627,10 @@ void
 grok_special_member_properties (tree decl)
 {
   tree class_type;
-
+  /* I believe we have to make some changes in here depending on the outcome
+ of CWG2586.  */


As mentioned above, CWG2586 is resolved.  Be sure to scroll down to the 
approved resolution, or refer to the working draft.

https://cplusplus.github.io/CWG/issues/2586.html


@@ -11754,8 +11754,16 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tre
   else if (cxx_dialect < cxx23)
 omitted_parms_loc = cp_lexer_peek_token (parser->lexer)->location;
 
+  /* Review note: I figured I might as well update the comments since I'm here.
+ There are also some additions to the below.  */


Great, this comment can go.


+  /* [expr.prim.lambda.general-4]
+If the lambda-declarator cont

Re: [PATCH] c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]

2023-12-08 Thread Jason Merrill

On 12/8/23 12:35, Jakub Jelinek wrote:

On Fri, Dec 08, 2023 at 11:51:19AM -0500, Jason Merrill wrote:

Do we want to do the same for TARGET_EXPR, since those are handled like
SAVE_EXPR in mostly_copy_tree_r?


In mostly_copy_tree_r yes, but I don't see cp_fold doing anything for
TARGET_EXPRs (like it does for SAVE_EXPRs), so I think TARGET_EXPRs stay
around until gimplification.


Makes sense, the patch is OK
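The hazard being fixed here is tree sharing: once cp_fold hands back the same folded tree for two SAVE_EXPR sites, a later in-place modification through one reference is visible through the other. A minimal, GCC-free illustration (toy types, not GCC's trees) of why unsharing — deep-copying before reuse — matters:

```cpp
#include <cassert>
#include <memory>

// Toy expression node; 'val' stands in for foldable contents.
struct Node
{
  int val;
  std::shared_ptr<Node> sub;
};

// Deep copy, so later in-place edits cannot leak across users.
std::shared_ptr<Node> unshare (const std::shared_ptr<Node> &n)
{
  if (!n)
    return nullptr;
  return std::make_shared<Node> (Node{n->val, unshare (n->sub)});
}

// Build two "expressions" over one subtree, edit through the first,
// and report whether the edit is visible through the second.
bool edit_leaks (bool use_unshare)
{
  auto sub = std::make_shared<Node> (Node{42, nullptr});
  Node a{1, sub};
  Node b{2, use_unshare ? unshare (sub) : sub};
  a.sub->val = 7;		// simulate gimplifying 'a' in place
  return b.sub->val == 7;
}
```

With sharing the edit leaks into the second user; after unsharing it does not — which is why mostly_copy_tree_r treats SAVE_EXPR (and TARGET_EXPR) specially.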



Re: [PATCH] c++, v2: Don't diagnose ignoring of attributes if all ignored attributes are attribute_ignored_p

2023-12-08 Thread Jason Merrill

On 12/8/23 12:53, Jakub Jelinek wrote:

On Fri, Dec 08, 2023 at 12:06:01PM -0500, Jason Merrill wrote:

@@ -2,7 +21116,7 @@ cp_parser_elaborated_type_specifier (cp_
}
 else if (is_declaration && cp_parser_declares_only_class_p (parser))
cplus_decl_attributes (&type, attributes, (int) 
ATTR_FLAG_TYPE_IN_PLACE);
-  else
+  else if (any_nonignored_attribute_p (attributes))
warning (OPT_Wattributes,
 "attributes ignored on elaborated-type-specifier that is "
 "not a forward declaration");


I believe this is also prohibited by
https://eel.is/c++draft/dcl.type.elab#3


You're right and there is also
https://eel.is/c++draft/temp.spec#temp.explicit-3
which prohibits it for explicit template instantiations.


so I would leave all the warnings in this function alone.


Ok.


 location_t attr_loc = declspecs->locations[ds_std_attribute];
-  if (warning_at (attr_loc, OPT_Wattributes, "attribute ignored"))
+  if (any_nonignored_attribute_p (declspecs->std_attributes)
+ && warning_at (attr_loc, OPT_Wattributes, "attribute ignored"))
inform (attr_loc, "an attribute that appertains to a type-specifier "
"is ignored");
   }


This seems untested, e.g.

int [[foo::bar]] i;


Thanks.

Here is an updated patch, so far tested with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp='Wno-attributes*' ubsan.exp=Wno-attributes*"
Ok for trunk if it passes full bootstrap/regtest?


OK.


2023-12-08  Jakub Jelinek  

gcc/
* attribs.h (any_nonignored_attribute_p): Declare.
* attribs.cc (any_nonignored_attribute_p): New function.
gcc/cp/
* parser.cc (cp_parser_statement, cp_parser_expression_statement,
cp_parser_declaration, cp_parser_asm_definition): Don't diagnose
ignored attributes if !any_nonignored_attribute_p.
* decl.cc (grokdeclarator): Likewise.
* name-lookup.cc (handle_namespace_attrs, finish_using_directive):
Don't diagnose ignoring of attr_ignored_p attributes.
gcc/testsuite/
* g++.dg/warn/Wno-attributes-1.C: New test.

--- gcc/attribs.h.jj2023-12-06 12:03:27.421176109 +0100
+++ gcc/attribs.h   2023-12-06 12:36:55.704884514 +0100
@@ -48,6 +48,7 @@ extern void apply_tm_attr (tree, tree);
  extern tree make_attribute (const char *, const char *, tree);
  extern bool attribute_ignored_p (tree);
  extern bool attribute_ignored_p (const attribute_spec *const);
+extern bool any_nonignored_attribute_p (tree);
  
extern struct scoped_attributes *
register_scoped_attributes (const scoped_attribute_specs &, bool = false);
--- gcc/attribs.cc.jj   2023-12-06 12:03:27.386176602 +0100
+++ gcc/attribs.cc  2023-12-06 12:36:55.704884514 +0100
@@ -584,6 +584,19 @@ attribute_ignored_p (const attribute_spe
return as->max_length == -2;
  }
  
+/* Return true if the ATTRS chain contains at least one attribute which
+   is not ignored.  */
+
+bool
+any_nonignored_attribute_p (tree attrs)
+{
+  for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
+if (!attribute_ignored_p (attr))
+  return true;
+
+  return false;
+}
+
  /* See whether LIST contains at least one instance of attribute ATTR
 (possibly with different arguments).  Return the first such attribute
 if so, otherwise return null.  */
--- gcc/cp/parser.cc.jj 2023-12-06 12:03:27.502174967 +0100
+++ gcc/cp/parser.cc2023-12-06 12:36:55.704884514 +0100
@@ -12778,9 +12778,8 @@ cp_parser_statement (cp_parser* parser,
  SET_EXPR_LOCATION (statement, statement_location);
  
/* Allow "[[fallthrough]];" or "[[assume(cond)]];", but warn otherwise.  */

-  if (std_attrs != NULL_TREE)
-warning_at (attrs_loc,
-   OPT_Wattributes,
+  if (std_attrs != NULL_TREE && any_nonignored_attribute_p (std_attrs))
+warning_at (attrs_loc, OPT_Wattributes,
"attributes at the beginning of statement are ignored");
  }
  
@@ -12986,7 +12985,7 @@ cp_parser_expression_statement (cp_parse

  }
  
/* Allow "[[fallthrough]];", but warn otherwise.  */

-  if (attr != NULL_TREE)
+  if (attr != NULL_TREE && any_nonignored_attribute_p (attr))
  warning_at (loc, OPT_Wattributes,
"attributes at the beginning of statement are ignored");
  
@@ -15191,7 +15190,7 @@ cp_parser_declaration (cp_parser* parser

}
}
  
-  if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
+  if (std_attrs != NULL_TREE && any_nonignored_attribute_p (std_attrs))
warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
OPT_Wattributes, "attribute ignored");
if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
@@ -22672,7 +22671,7 @@ cp_parser_asm_definition (cp_parser* par
symtab->finalize_toplevel_asm (string);
  }
  
-  if (std_attrs)
+  if (std_attrs && any_nonignored_attribute_p (std_attrs))
  warning_at (asm_

Re: [PATCH] c++, v2: Fix parsing [[]][[]];

2023-12-08 Thread Jason Merrill

On 12/5/23 12:17, Marek Polacek wrote:

On Tue, Dec 05, 2023 at 06:00:31PM +0100, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 09:45:32AM -0500, Marek Polacek wrote:

When working on the previous patch I put [[]] [[]] asm (""); into a
testcase, but was surprised it wasn't parsed.


By wasn't parsed you mean we gave an error, right?  I only see an error
with block-scope [[]] [[]];.


Yeah.
The reason why [[]][[]]; works at namespace scope is that if
   else if (cp_lexer_nth_token_is (parser->lexer,
   cp_parser_skip_std_attribute_spec_seq 
(parser,
  1),
   CPP_SEMICOLON))
which is the case here, then even if, after parsing the attributes, the
next token isn't CPP_SEMICOLON (the case here without the patch), it
will just return, and another cp_parser_declaration will parse another
[[]], that time also with CPP_SEMICOLON.


It seems marginally better to me to use void_list_node so that we don't
need a new parm, like what we do when parsing parameters: ()/(void)/(...),
but I should let others decide.


Here is a modified version of the patch which does it like that.


Thanks, this looks good to me.


Agreed, OK.



[pushed][PR112875][LRA]: Fix an assert in lra elimination code

2023-12-08 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112875

The patch was successfully tested and bootstrapped on x86-64 and ppc64le.

commit 48cb51827c9eb991b92014a3f59d31eb237ce03f
Author: Vladimir N. Makarov 
Date:   Fri Dec 8 15:37:42 2023 -0500

[PR112875][LRA]: Fix an assert in lra elimination code

The PR112875 test ran into a wrong assert (gcc_unreachable) during
elimination in a debug insn.  The insn seems OK, so I changed the
assertion.  To be more accurate, I made it the same as the analogous
reload pass code.

gcc/ChangeLog:

PR rtl-optimization/112875
* lra-eliminations.cc (lra_eliminate_regs_1): Change an assert.
Add ASM_OPERANDS case.

gcc/testsuite/ChangeLog:

PR rtl-optimization/112875
* gcc.target/i386/pr112875.c: New test.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index f3b75e08390..cf229b402da 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -666,6 +666,10 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
   return x;
 
 case CLOBBER:
+case ASM_OPERANDS:
+  gcc_assert (insn && DEBUG_INSN_P (insn));
+  break;
+
 case SET:
   gcc_unreachable ();
 
diff --git a/gcc/testsuite/gcc.target/i386/pr112875.c b/gcc/testsuite/gcc.target/i386/pr112875.c
new file mode 100644
index 000..b704404b248
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112875.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Oz -frounding-math -fno-dce -fno-trapping-math -fno-tree-dce -fno-tree-dse -g" } */
+long a, f;
+int b, c, d, g, h, i, j;
+char e;
+void k(long, int l, char t) {
+  char m = b, n = g, o = 0;
+  int p, q, r = h;
+  long s = g;
+  if (f) {
+q = t + (float)16777217;
+o = ~0;
+  }
+  if (e) {
+d = g + a;
+if (d % (a % l)) {
+  p = d;
+  n = b;
+}
+if (l) {
+  i = b;
+  r = a;
+  p = h;
+}
+if (s)
+  s = q;
+c = f;
+e += t;
+a = p;
+  }
+  j = r % n;
+  s += g / 0xc000 + !o;
+}
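The effect of the assertion change can be summarized as a predicate over RTL codes: CLOBBER and ASM_OPERANDS are now accepted, but only inside debug insns, while SET remains unreachable in this function. A toy restatement (hypothetical; it mirrors only the dispatch shown in the lra-eliminations.cc hunk above):

```cpp
#include <cassert>

enum Code { REG, CLOBBER, ASM_OPERANDS, SET };

// Returns true if lra_eliminate_regs_1 would proceed past its assert
// for this code; models only the control flow of the patched switch.
bool elimination_accepts (Code code, bool in_debug_insn)
{
  switch (code)
    {
    case CLOBBER:
    case ASM_OPERANDS:
      // After the fix: gcc_assert (insn && DEBUG_INSN_P (insn)).
      return in_debug_insn;
    case SET:
      return false;		// gcc_unreachable () stays for SET.
    default:
      return true;
    }
}
```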


[pushed] analyzer: fix ICE on infoleak with poisoned size

2023-12-08 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-6348-g08262e78209ed4.

gcc/analyzer/ChangeLog:
* region-model.cc (contains_uninit_p): Only check for
svalues that the infoleak warning can handle.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/infoleak-uninit-size-1.c: New test.
* gcc.dg/plugin/infoleak-uninit-size-2.c: New test.
* gcc.dg/plugin/plugin.exp: Add the new tests.
---
 gcc/analyzer/region-model.cc  | 37 ---
 .../gcc.dg/plugin/infoleak-uninit-size-1.c| 20 ++
 .../gcc.dg/plugin/infoleak-uninit-size-2.c| 20 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|  2 +
 4 files changed, 66 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 2157ad2578b..9b970d7a3e3 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -6557,22 +6557,33 @@ private:
 static bool
 contains_uninit_p (const svalue *sval)
 {
-  struct uninit_finder : public visitor
-  {
-  public:
-uninit_finder () : m_found_uninit (false) {}
-void visit_poisoned_svalue (const poisoned_svalue *sval)
+  switch (sval->get_kind ())
 {
-  if (sval->get_poison_kind () == POISON_KIND_UNINIT)
-   m_found_uninit = true;
-}
-bool m_found_uninit;
-  };
+default:
+  return false;
+case SK_POISONED:
+  {
+   const poisoned_svalue *psval
+ = as_a <const poisoned_svalue *> (sval);
+   return psval->get_poison_kind () == POISON_KIND_UNINIT;
+  }
+case SK_COMPOUND:
+  {
+   const compound_svalue *compound_sval
+ = as_a <const compound_svalue *> (sval);
 
-  uninit_finder v;
-  sval->accept (&v);
+   for (auto iter : *compound_sval)
+ {
+   const svalue *sval = iter.second;
+   if (const poisoned_svalue *psval
+   = sval->dyn_cast_poisoned_svalue ())
+ if (psval->get_poison_kind () == POISON_KIND_UNINIT)
+   return true;
+ }
 
-  return v.m_found_uninit;
+   return false;
+  }
+}
 }
 
 /* Function for use by plugins when simulating writing data through a
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-1.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-1.c
new file mode 100644
index 000..7466112fe14
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-1.c
@@ -0,0 +1,20 @@
+/* Reduced from infoleak ICE seen on Linux kernel with
+   -Wno-analyzer-use-of-uninitialized-value.
+
+   Verify that we don't ICE when complaining about an infoleak
+   when the size is uninitialized.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fanalyzer -Wno-analyzer-use-of-uninitialized-value" } */
+/* { dg-require-effective-target analyzer } */
+
+extern unsigned long
+copy_to_user(void* to, const void* from, unsigned long n);
+
+unsigned long
+test_uninit_size (void *to, void *from)
+{
+  unsigned long n;
+  char buf[16];
+  return copy_to_user(to, from, n);
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-2.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-2.c
new file mode 100644
index 000..a8a383f4b2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-uninit-size-2.c
@@ -0,0 +1,20 @@
+/* Reduced from infoleak ICE seen on Linux kernel with
+   -Wno-analyzer-use-of-uninitialized-value.
+
+   Verify that we complain about the uninit value when
+   -Wno-analyzer-use-of-uninitialized-value isn't supplied.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-effective-target analyzer } */
+
+extern unsigned long
+copy_to_user(void* to, const void* from, unsigned long n);
+
+unsigned long
+test_uninit_size (void *to, void *from)
+{
+  unsigned long n;
+  char buf[16];
+  return copy_to_user(to, from, n); /* { dg-warning "use of uninitialized 
value 'n'" } */
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp 
b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index f0b4bb7a051..d6cccb269df 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -150,6 +150,8 @@ set plugin_test_list [list \
  infoleak-CVE-2017-18550-1.c \
  infoleak-antipatterns-1.c \
  infoleak-fixit-1.c \
+ infoleak-uninit-size-1.c \
+ infoleak-uninit-size-2.c \
  infoleak-net-ethtool-ioctl.c \
  infoleak-vfio_iommu_type1.c \
  taint-CVE-2011-0521-1-fixed.c \
-- 
2.26.3
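The design change in this patch is from "walk every subtree looking for uninit poison" to "inspect only the svalue kinds the infoleak warning can actually describe". A toy restatement of the narrowed check (hypothetical types; GCC's svalue hierarchy is much richer):

```cpp
#include <cassert>
#include <vector>

enum Kind { SK_CONSTANT, SK_POISONED, SK_COMPOUND };
enum Poison { POISON_NONE, POISON_KIND_UNINIT };

struct SVal
{
  Kind kind;
  Poison poison = POISON_NONE;	// meaningful for SK_POISONED
  std::vector<SVal> fields;	// meaningful for SK_COMPOUND
};

// Mirrors the new contains_uninit_p: only POISONED and COMPOUND (one
// level of POISONED bindings) are considered; everything else is "no".
bool contains_uninit_p (const SVal &sval)
{
  switch (sval.kind)
    {
    default:
      return false;
    case SK_POISONED:
      return sval.poison == POISON_KIND_UNINIT;
    case SK_COMPOUND:
      for (const SVal &field : sval.fields)
	if (field.kind == SK_POISONED
	    && field.poison == POISON_KIND_UNINIT)
	  return true;
      return false;
    }
}
```

The narrowing is deliberate: a poisoned value nested deeper than one compound level no longer counts, because the infoleak diagnostic could not describe it anyway — which is how the ICE on an uninitialized size is avoided.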



[pushed] analyzer: avoid taint for (TAINTED % NON_TAINTED)

2023-12-08 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-6349-g0bef72539e585d.

gcc/analyzer/ChangeLog:
* sm-taint.cc (taint_state_machine::alt_get_inherited_state): Fix
handling of TRUNC_MOD_EXPR.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/taint-modulus-1.c: New test.
---
 gcc/analyzer/sm-taint.cc  | 9 -
 gcc/testsuite/c-c++-common/analyzer/taint-modulus-1.c | 8 
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/taint-modulus-1.c

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index 6b5d51c62af..597e8e55609 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -891,7 +891,6 @@ taint_state_machine::alt_get_inherited_state (const 
sm_state_map &map,
  case MULT_EXPR:
  case POINTER_PLUS_EXPR:
  case TRUNC_DIV_EXPR:
- case TRUNC_MOD_EXPR:
{
  state_t arg0_state = map.get_state (arg0, ext_state);
  state_t arg1_state = map.get_state (arg1, ext_state);
@@ -899,6 +898,14 @@ taint_state_machine::alt_get_inherited_state (const 
sm_state_map &map,
}
break;
 
+ case TRUNC_MOD_EXPR:
+   {
+ /* The left-hand side of X % Y can be sanitized by
+the operation.  */
+ return map.get_state (arg1, ext_state);
+   }
+   break;
+
  case BIT_AND_EXPR:
  case RSHIFT_EXPR:
return NULL;
diff --git a/gcc/testsuite/c-c++-common/analyzer/taint-modulus-1.c 
b/gcc/testsuite/c-c++-common/analyzer/taint-modulus-1.c
new file mode 100644
index 000..ed286fa341c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/taint-modulus-1.c
@@ -0,0 +1,8 @@
+#define SIZE 16
+char buf[SIZE];
+
+__attribute__ ((tainted_args))
+char test_sanitized_by_modulus (int val)
+{
+  return buf[val % SIZE]; /* { dg-bogus "use of attacker-controlled value" } */
+}
-- 
2.26.3
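The fix changes TRUNC_MOD_EXPR from "tainted if either operand is tainted" to "taint comes from the divisor alone", since a trusted divisor bounds the result of x % y regardless of x. A toy taint join sketching the difference (hypothetical; the real code consults an sm_state_map):

```cpp
#include <cassert>

enum Taint { CLEAN, TAINTED };

// Stands in for the either-operand join still used by PLUS_EXPR,
// MULT_EXPR, TRUNC_DIV_EXPR, etc.
Taint combine_states (Taint a, Taint b)
{
  return (a == TAINTED || b == TAINTED) ? TAINTED : CLEAN;
}

// After the patch, x % y inherits only y's state: a clean y (e.g. a
// compile-time SIZE) sanitizes a tainted x, but a tainted y does not.
Taint trunc_mod_state (Taint /* lhs */, Taint rhs)
{
  return rhs;
}
```

This matches the new test: buf[val % SIZE] is no longer flagged because SIZE is clean, while an expression whose divisor is attacker-controlled would still propagate taint.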



[PATCH v2] c++: fix ICE with sizeof in a template [PR112869]

2023-12-08 Thread Marek Polacek
On Fri, Dec 08, 2023 at 12:09:18PM -0500, Jason Merrill wrote:
> On 12/5/23 15:31, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > This test shows that we cannot clear *walk_subtrees in
> > cp_fold_immediate_r when we're in_immediate_context, because that,
> > as the comment says, affects cp_fold_r as well.  Here we had an
> > expression with
> > 
> >min ((long int) VIEW_CONVERT_EXPR(bytecount), (long 
> > int) <<< Unknown tree: sizeof_expr
> >  (int) <<< error >>> >>>)
> > 
> > as its sub-expression, and we never evaluated that into
> > 
> >min ((long int) bytecount, 4)
> > 
> > so the SIZEOF_EXPR leaked into the middle end.
> > 
> > (There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
> > one should be OK.)
> > 
> > PR c++/112869
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
> > for unevaluated operands.
> 
> I agree that we want this change for in_immediate_context (), but I don't
> see why we want it for TYPE_P or unevaluated_p (code) or
> cp_unevaluated_operand?

No particular reason, just paranoia.  How's this?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

  min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
(int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

  min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for in_immediate_context.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
 gcc/cp/cp-gimplify.cc| 6 +-
 gcc/testsuite/g++.dg/template/sizeof18.C | 8 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 5abb91bbdd3..6af7c787372 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1179,11 +1179,15 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
 
   /* No need to look into types or unevaluated operands.
  NB: This affects cp_fold_r as well.  */
-  if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
+  if (TYPE_P (stmt) || unevaluated_p (code))
 {
   *walk_subtrees = 0;
   return NULL_TREE;
 }
+  else if (in_immediate_context ())
+/* Don't clear *walk_subtrees here: we still need to walk the subtrees
+   of SIZEOF_EXPR and similar.  */
+return NULL_TREE;
 
   tree decl = NULL_TREE;
   bool call_p = false;
diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C 
b/gcc/testsuite/g++.dg/template/sizeof18.C
new file mode 100644
index 000..afba9946258
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof18.C
@@ -0,0 +1,8 @@
+// PR c++/112869
+// { dg-do compile }
+
+void min(long, long);
+template  void Binaryread(int &, T, unsigned long);
+template <> void Binaryread(int &, float, unsigned long bytecount) {
+  min(bytecount, sizeof(int));
+}

base-commit: d468718c9a097aeb8794fb1a2df6db2c1064d7f7
-- 
2.43.0
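The bug class here is a shared pruning flag: cp_fold_immediate_r and cp_fold_r run over the same walk, so clearing *walk_subtrees for a node one callback doesn't care about also hides that node's children from the other callback. A toy walk illustrating the difference (hypothetical; not GCC's walk_tree):

```cpp
#include <cassert>
#include <vector>

// Toy tree node; immediate_ctx marks an immediate-context node and
// is_sizeof marks a node a later folding pass still needs to see.
struct Expr
{
  bool immediate_ctx;
  bool is_sizeof;
  std::vector<Expr> subs;
};

// Count nodes the folding callback gets to visit.  With prune_immediate
// set we model the old behavior: *walk_subtrees = 0 on an immediate
// context hides its children from BOTH callbacks sharing the walk.
int visible_sizeofs (const Expr &e, bool prune_immediate)
{
  int n = e.is_sizeof ? 1 : 0;
  if (prune_immediate && e.immediate_ctx)
    return n;
  for (const Expr &s : e.subs)
    n += visible_sizeofs (s, prune_immediate);
  return n;
}
```

Under the old behavior the SIZEOF_EXPR below an immediate context is never revisited, so it leaks into the middle end unevaluated — exactly the sizeof18.C failure mode.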



Re: [PATCH 0/4] v2 of Option handling: add documentation URLs

2023-12-08 Thread David Malcolm
On Tue, 2023-11-21 at 23:43 +, Joseph Myers wrote:
> On Tue, 21 Nov 2023, Tobias Burnus wrote:
> 
> > On 21.11.23 14:57, David Malcolm wrote:
> > > On Tue, 2023-11-21 at 02:09 +0100, Hans-Peter Nilsson wrote:
> > > > Sorry for barging in though I did try finding the relevant
> > > > discussion, but is committing this generated stuff necessary?
> > > > Is it because we don't want to depend on Python being
> > > > present at build time?
> > > Partly, yes, [...]
> > 
> > I wonder how to ensure that this remains up to date. Should there
> > be an
> > item at
> > 
> > https://gcc.gnu.org/branching.html and/or
> > https://gcc.gnu.org/releasing.html similar to the .pot generation?
> 
> My suggestion earlier in the discussion was that it should be added
> to the 
> post-commit CI discussed starting at 
>  (I
> think 
> that CI is now in operation).  These are generated files that ought
> to be 
> kept up to date with each commit that affects .opt files, unlike the
> .pot 
> files where the expectation is that they should be up to date for
> releases 
> and updated from time to time at other times for submission to the
> TP.

I had a go at scripting the testing of this, but I am terrible at shell
scripts (maybe I should use Python?).  Here's what I have so far:

$ cat contrib/regenerate-index-urls.sh

set -x

SRC_DIR=$1
BUILD_DIR=$2
NUM_JOBS=$3

# FIXME: error-checking!

mkdir -p $BUILD_DIR || exit 1
cd $BUILD_DIR
$SRC_DIR/configure --disable-bootstrap --enable-languages=c,d,fortran || exit 2
make html-gcc -j$NUM_JOBS || exit 3
cd gcc || exit 4
make regenerate-opt-urls || exit 5
cd $SRC_DIR
(git diff $1 > /dev/null ) && echo "regenerate-opt-urls needs to be run and the 
results committed" || exit 6

# e.g.
#  time bash contrib/regenerate-index-urls.sh $(pwd) $(pwd)/../build-ci 64

This takes about 100 seconds of wallclock time on my 64-core box (mostly
configuring gcc, which, in addition to the usual sequence of
unparallelized tests, seems to require building libiberty and
lto-plugin).  Is that something we want to do on every commit?  Is
implementing the CI a blocker for getting the patches in?  (If so, I'll
likely need some help.)

As it turned out, I hadn't regenerated the .opt.urls in my working copy
for a couple of weeks, leading to a correct-looking patch containing
things like:

@@ -154,8 +157,8 @@ 
UrlSuffix(gcc/Warning-Options.html#index-Wbuiltin-declaration-mismatch) LangUrlS
 Wbuiltin-macro-redefined
 UrlSuffix(gcc/Warning-Options.html#index-Wbuiltin-macro-redefined)
 
-Wc11-c2x-compat
-UrlSuffix(gcc/Warning-Options.html#index-Wc11-c2x-compat)
+Wc11-c23-compat
+UrlSuffix(gcc/Warning-Options.html#index-Wc11-c23-compat)
 
 Wc90-c99-compat
 UrlSuffix(gcc/Warning-Options.html#index-Wc90-c99-compat)

so I think the idea works; the only consequence of not regenerating was
some missing/out-of-date URLs.

Dave




[PATCH V2] RISC-V: XFAIL scan dump fails for autovec PR111311

2023-12-08 Thread Edwin Lu
Clean up scan-dump failures on Linux rv64 vector targets that Juzhe
mentioned could be ignored for now. This will help reduce noise and make
it more obvious if a bug or regression is introduced. The failures that
are still reported are either execution failures or failures that are
also present on armv8-a+sve.

gcc/testsuite/ChangeLog:

* c-c++-common/vector-subscript-4.c: xfail testcase
* g++.dg/tree-ssa/pr83518.C: ditto
* gcc.dg/attr-alloc_size-11.c: remove xfail
* gcc.dg/signbit-2.c: xfail testcase
* gcc.dg/signbit-5.c: ditto
* gcc.dg/tree-ssa/cunroll-16.c: ditto
* gcc.dg/tree-ssa/gen-vect-34.c: ditto
* gcc.dg/tree-ssa/loop-bound-1.c: ditto
* gcc.dg/tree-ssa/loop-bound-2.c: ditto
* gcc.dg/tree-ssa/pr84512.c: remove xfail
* gcc.dg/tree-ssa/predcom-4.c: xfail testcase
* gcc.dg/tree-ssa/predcom-5.c: ditto
* gcc.dg/tree-ssa/predcom-9.c: ditto
* gcc.dg/tree-ssa/reassoc-46.c: ditto
* gcc.dg/tree-ssa/scev-10.c: ditto
* gcc.dg/tree-ssa/scev-11.c: ditto
* gcc.dg/tree-ssa/scev-14.c: ditto
* gcc.dg/tree-ssa/scev-9.c: ditto
* gcc.dg/tree-ssa/split-path-11.c: ditto
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: ditto
* gcc.dg/tree-ssa/update-threading.c: ditto
* gcc.dg/unroll-8.c: ditto
* gcc.dg/var-expand1.c: ditto
* gcc.dg/vect/pr103116-1.c: ditto
* gcc.dg/vect/pr103116-2.c: ditto
* gcc.dg/vect/pr65310.c: ditto
* gfortran.dg/vect/vect-8.f90: ditto
* lib/target-supports.exp: ditto

Signed-off-by: Edwin Lu 
---
V2 changes:
- added attr-alloc_size-11.c and update-threading.c which were missed in
  previous patch
- remove pr83232.f90 xfail since it was fixed in a recent trunk update
- adjust xfail on split-path-11.c to only apply to rv64
---
 gcc/testsuite/c-c++-common/vector-subscript-4.c  | 3 ++-
 gcc/testsuite/g++.dg/tree-ssa/pr83518.C  | 2 +-
 gcc/testsuite/gcc.dg/attr-alloc_size-11.c| 4 ++--
 gcc/testsuite/gcc.dg/signbit-2.c | 5 +++--
 gcc/testsuite/gcc.dg/signbit-5.c | 1 +
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-16.c   | 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c  | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/loop-bound-2.c | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c  | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c| 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c| 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c| 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c   | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-10.c  | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-11.c  | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-14.c  | 4 +++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-9.c   | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-11.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/update-threading.c | 2 +-
 gcc/testsuite/gcc.dg/unroll-8.c  | 8 +---
 gcc/testsuite/gcc.dg/var-expand1.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr103116-1.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr103116-2.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr65310.c  | 4 ++--
 gcc/testsuite/gfortran.dg/vect/vect-8.f90| 3 ++-
 gcc/testsuite/lib/target-supports.exp| 3 +++
 28 files changed, 59 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/vector-subscript-4.c 
b/gcc/testsuite/c-c++-common/vector-subscript-4.c
index 2c2481f88b7..eb0bca1c19e 100644
--- a/gcc/testsuite/c-c++-common/vector-subscript-4.c
+++ b/gcc/testsuite/c-c++-common/vector-subscript-4.c
@@ -25,5 +25,6 @@ foobar(16)
 foobar(32)
 foobar(64)
 
+/* Xfail riscv PR112531.  */
 /* Verify we don't have any vector temporaries in the IL.  */
-/* { dg-final { scan-tree-dump-not "vector" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "vector" "optimized" { xfail { riscv_v && 
vect_variable_length } } } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
index b8a2bd1ebbd..6f2fc56c82c 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
@@ -24,4 +24,4 @@ unsigned test()
   return sum;
 }
 
-/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail 
vect_variable_length } } } */
+/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail { 
vect_variable_length && aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c 
b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
index a2efe128915..2828db12e05 100644
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
@@ -47,8 +47,8 @@ typedef __SIZE_TYPE__ size_t;
 
 /* The following tests fail because of missing range information.  The xfail
exclu

[gcc-wwwdocs COMMITTED] Disallow /cgit for web robots

2023-12-08 Thread Mark Wielaard
Although cgit is more efficient than gitweb, it is still not great
for bots to crawl it.
---
 htdocs/robots.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/htdocs/robots.txt b/htdocs/robots.txt
index c650057b..b9fc830d 100644
--- a/htdocs/robots.txt
+++ b/htdocs/robots.txt
@@ -6,6 +6,7 @@ User-agent: *
 Disallow: /viewvc/
 Disallow: /viewcvs
 Disallow: /git/
+Disallow: /cgit/
 Disallow: /svn
 Disallow: /cgi-bin/
 Disallow: /bugzilla/buglist.cgi
-- 
2.39.3



Re: Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-08 Thread 钟居哲
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-09 00:25
To: gcc-patches; palmer; kito.cheng; Jeff Law; 钟居哲
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.
Ah, I forgot to attach the current v2 that also enables strncmp.
It was additionally tested with -minline-strncmp on rv64gcv.
 
Regards
Robin
 
Subject: [PATCH v2] RISC-V: Add vectorized strcmp and strncmp.
 
This patch adds vectorized strcmp and strncmp implementations and
tests.  Similar to strlen, expansion is still guarded by
-minline-str(n)cmp.
 
gcc/ChangeLog:
 
PR target/112109
 
* config/riscv/riscv-protos.h (expand_strcmp): Declare.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
strategy handling and delegation to scalar and vector expanders.
(expand_strcmp): Vectorized implementation.
* config/riscv/riscv.md: Add TARGET_VECTOR to strcmp and strncmp
expander.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp.c: New test.
---
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-string.cc  | 161 +-
gcc/config/riscv/riscv.md |   6 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c|  32 
.../riscv/rvv/autovec/builtin/strcmp.c|  13 ++
.../riscv/rvv/autovec/builtin/strncmp-run.c   | 136 +++
.../riscv/rvv/autovec/builtin/strncmp.c   |  13 ++
7 files changed, 357 insertions(+), 5 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strncmp.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c7b5789a4b3..20bbb5b859c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -558,6 +558,7 @@ void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6cde1bf89a0..11c1f74d0b3 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,12 +511,19 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
 return false;
   alignment = UINTVAL (align_rtx);
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
 {
-  return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
- ncompare);
+  bool ok = riscv_vector::expand_strcmp (result, src1, src2,
+  bytes_rtx, alignment,
+  ncompare);
+  if (ok)
+ return true;
 }
+  if ((TARGET_ZBB || TARGET_XTHEADBB) && stringop_strategy & STRATEGY_SCALAR)
+return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
+ncompare);
+
   return false;
}
@@ -1092,4 +1099,152 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx 
haystack, rtx needle,
 }
}
+/* Implement cmpstr using vector instructions.  The ALIGNMENT and
+   NCOMPARE parameters are unused for now.  */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+unsigned HOST_WIDE_INT, bool)
+{
+  gcc_assert (TARGET_VECTOR);
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  bool with_length = nbytes != NULL_RTX;
+
+  if (with_length
+  && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+return false;
+
+  if (with_length && CONST_INT_P (nbytes))
+nbytes = force_reg (Pmode, nbytes);
+
+  machine_mode mode = E_QImode;
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = TARGET_MAX_LMUL;
+  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+  ma

[PATCH] strub: add note on attribute access

2023-12-08 Thread Alexandre Oliva
On Dec  7, 2023, Alexandre Oliva  wrote:

> Thanks for raising the issue.  Maybe there should be at least a comment
> there, and perhaps some asserts to check that pointer and reference
> types don't make it to indirect_parms.

Document why attribute access doesn't need the same treatment as fn
spec, and check that the assumption behind it holds.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* ipa-strub.cc (pass_ipa_strub::execute): Check that we don't
add indirection to pointer parameters, and document attribute
access non-interactions.
---
 gcc/ipa-strub.cc |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
index 2afb7a455751d..8ec6824e8a802 100644
--- a/gcc/ipa-strub.cc
+++ b/gcc/ipa-strub.cc
@@ -2889,6 +2889,13 @@ pass_ipa_strub::execute (function *)
&& (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (nparm)))
<= 4 * UNITS_PER_WORD
{
+ /* No point in indirecting pointer types.  Presumably they
+won't ever pass the size-based test above, but check the
+assumption here, because getting this wrong would mess
+with attribute access and possibly others.  We deal with
+fn spec below.  */
+ gcc_checking_assert (!POINTER_TYPE_P (TREE_TYPE (nparm)));
+
  indirect_nparms.add (nparm);
 
  /* ??? Is there any case in which it is not safe to suggest the parms
@@ -2976,7 +2983,9 @@ pass_ipa_strub::execute (function *)
}
}
 
-   /* ??? Maybe we could adjust it instead.  */
+   /* ??? Maybe we could adjust it instead.  Note we don't need
+  to mess with attribute access: pointer-typed parameters are
+  not modified, so they can remain unchanged.  */
if (drop_fnspec)
  remove_named_attribute_unsharing ("fn spec",
&TYPE_ATTRIBUTES (nftype));


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH 1/2] c-family: -Waddress-of-packed-member and casts

2023-12-08 Thread Alexandre Oliva
On Nov 22, 2023, Jason Merrill  wrote:

> Tested x86_64-pc-linux-gnu, OK for trunk?

FYI, Linaro's regression tester let me know that my patch reversal, that
expected this patch to go in instead, caused two "regressions".
https://linaro.atlassian.net/browse/GNU-1067



[PATCH] -finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784]

2023-12-08 Thread Alexandre Oliva


smallest_int_mode_for_size may abort when the requested mode is not
available.  Call int_mode_for_size instead, that signals the
unsatisfiable request in a more graceful way.
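The difference can be modeled outside GCC (names hypothetical, not the GCC API): a lookup that returns a sentinel for an unsatisfiable width, the way int_mode_for_size yields an empty opt_scalar_int_mode, instead of aborting the way smallest_int_mode_for_size does:

```c
/* Hypothetical model of integer modes as bit widths.  Returns the
   exactly matching width, or 0 when no such mode exists -- the
   graceful analogue of an empty opt_scalar_int_mode.  */
static int
model_int_mode_for_size (int bits)
{
  static const int widths[] = { 8, 16, 32, 64 };  /* QI/HI/SI/DImode */
  for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++)
    if (widths[i] == bits)
      return widths[i];
  return 0;  /* caller must check, rather than crashing here */
}
```

The PR112784 situation corresponds to asking for a 512-bit move increment (64-byte AVX512CD vectors) when no scalar integer mode that wide exists: the model returns 0 and the caller falls back, here to BLKmode.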

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR middle-end/112784
* expr.cc (emit_block_move_via_loop): Call int_mode_for_size
for maybe-too-wide sizes.
(emit_block_cmp_via_loop): Likewise.

for  gcc/testsuite/ChangeLog

PR middle-end/112784
* gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.
---
 gcc/expr.cc|   22 
 .../i386/avx512cd-inline-stringops-pr112784.c  |   12 +++
 2 files changed, 25 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 6da51f2aca296..178b3ec6d5adb 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2449,15 +2449,17 @@ emit_block_move_via_loop (rtx x, rtx y, rtx size,
 }
   emit_move_insn (iter, iter_init);
 
-  scalar_int_mode int_move_mode
-= smallest_int_mode_for_size (incr * BITS_PER_UNIT);
-  if (GET_MODE_BITSIZE (int_move_mode) != incr * BITS_PER_UNIT)
+  opt_scalar_int_mode int_move_mode
+= int_mode_for_size (incr * BITS_PER_UNIT, 1);
+  if (!int_move_mode.exists ()
+  || (GET_MODE_BITSIZE (as_a <scalar_int_mode> (int_move_mode))
+ != incr * BITS_PER_UNIT))
 {
   move_mode = BLKmode;
   gcc_checking_assert (can_move_by_pieces (incr, align));
 }
   else
-move_mode = int_move_mode;
+move_mode = as_a <scalar_int_mode> (int_move_mode);
 
   x_addr = force_operand (XEXP (x, 0), NULL_RTX);
   y_addr = force_operand (XEXP (y, 0), NULL_RTX);
@@ -2701,16 +2703,18 @@ emit_block_cmp_via_loop (rtx x, rtx y, rtx len, tree 
len_type, rtx target,
   iter = gen_reg_rtx (iter_mode);
   emit_move_insn (iter, iter_init);
 
-  scalar_int_mode int_cmp_mode
-= smallest_int_mode_for_size (incr * BITS_PER_UNIT);
-  if (GET_MODE_BITSIZE (int_cmp_mode) != incr * BITS_PER_UNIT
-  || !can_compare_p (NE, int_cmp_mode, ccp_jump))
+  opt_scalar_int_mode int_cmp_mode
+= int_mode_for_size (incr * BITS_PER_UNIT, 1);
+  if (!int_cmp_mode.exists ()
+  || (GET_MODE_BITSIZE (as_a <scalar_int_mode> (int_cmp_mode))
+ != incr * BITS_PER_UNIT)
+  || !can_compare_p (NE, as_a <scalar_int_mode> (int_cmp_mode), ccp_jump))
 {
   cmp_mode = BLKmode;
   gcc_checking_assert (incr != 1);
 }
   else
-cmp_mode = int_cmp_mode;
+cmp_mode = as_a <scalar_int_mode> (int_cmp_mode);
 
   /* Save the base addresses.  */
   x_addr = force_operand (XEXP (x, 0), NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c 
b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
new file mode 100644
index 0..c81f99c693c24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512cd -finline-stringops" } */
+
+struct S {
+  int e;
+} __attribute__((aligned(128)));
+
+int main() {
+  struct S s1;
+  struct S s2;
+  int v = __builtin_memcmp(&s1, &s2, sizeof(s1));
+}



[PATCH] -finline-stringops: don't assume ptr_mode ptr in memset [PR112804]

2023-12-08 Thread Alexandre Oliva


On aarch64 -milp32, and presumably on other such targets, ptr can be
in a different mode than ptr_mode in the testcase.  Cope with it.

Regstrapped on x86_64-linux-gnu, also tested the new test on
aarch64-elf.  Ok to install?


for  gcc/ChangeLog

PR target/112804
* builtins.cc (try_store_by_multiple_pieces): Use ptr's mode
for the increment.

for  gcc/testsuite/ChangeLog

PR target/112804
* gcc.target/aarch64/inline-mem-set-pr112804.c: New.
---
 gcc/builtins.cc|2 +-
 .../gcc.target/aarch64/inline-mem-set-pr112804.c   |7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 38b0acff13124..12a535d313f12 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -4519,7 +4519,7 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
  to = change_address (to, QImode, 0);
  emit_move_insn (to, val);
  if (update_needed)
-   next_ptr = plus_constant (ptr_mode, ptr, blksize);
+   next_ptr = plus_constant (GET_MODE (ptr), ptr, blksize);
}
   else
{
diff --git a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c 
b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
new file mode 100644
index 0..fe8414559864d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-finline-stringops -mabi=ilp32 -ftrivial-auto-var-init=zero" 
} */
+
+short m(unsigned k) {
+  const unsigned short *n[65];
+  return 0;
+}



[PATCH] -finline-stringops: check base blksize for memset [PR112778]

2023-12-08 Thread Alexandre Oliva


The recently-added logic for -finline-stringops=memset introduced an
assumption that doesn't necessarily hold, namely, that
can_store_by_pieces of a larger size implies can_store_by_pieces by
smaller sizes.  Check all the sizes the by-multiple-pieces machinery
might use before committing to an expansion pattern.
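The shape of the new check can be sketched outside GCC as follows; can_store() is a hypothetical stand-in for can_store_by_pieces, here capped at an arbitrary 64-byte piece size:

```c
/* Stand-in predicate: pretend pieces up to 64 bytes are supported.  */
static int
can_store (unsigned long size)
{
  return size > 0 && size <= 64;
}

/* Mirror of can_store_by_multiple_pieces: test BITS + LEN as a whole,
   then every power of two set in BITS individually.  LEN is assumed to
   have been tested by itself already.  */
static int
can_store_all_pieces (unsigned long bits, unsigned long len)
{
  if (bits && !can_store (bits + len))
    return 0;
  for (unsigned long rest = bits; rest; rest &= rest - 1)
    if (!can_store (rest & -rest))	/* lowest set bit of REST */
      return 0;
  return 1;
}
```

The point of the PR is the second loop: a target may handle one large power-of-two piece yet reject a smaller one, so each piece size the expansion may emit has to be validated on its own.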

Regstrapped (and slightly different version) and regstrapping this one
on x86_64-linux-gnu.  Ok to install?

(FWIW, for completeness, I've just launched bootstraps with
-finline-stringops on ppc64le-linux-gnu, and aarch64-linux-gnu, and will
do so on x86_64-linux-gnu as soon as my retesting completes.)


for  gcc/ChangeLog

PR target/112778
* builtins.cc (can_store_by_multiple_pieces): New.
(try_store_by_multiple_pieces): Call it.

for  gcc/testsuite/ChangeLog

PR target/112778
* gcc.dg/inline-mem-cmp-pr112778.c: New.
---
 gcc/builtins.cc|   57 
 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c |   10 
 2 files changed, 58 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 12a535d313f12..ad8497192a2dd 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -4284,6 +4284,40 @@ expand_builtin_memset (tree exp, rtx target, 
machine_mode mode)
   return expand_builtin_memset_args (dest, val, len, target, mode, exp);
 }
 
+/* Check that store_by_pieces allows BITS + LEN (so that we don't
+   expand something too unreasonably long), and every power of 2 in
+   BITS.  It is assumed that LEN has already been tested by
+   itself.  */
+static bool
+can_store_by_multiple_pieces (unsigned HOST_WIDE_INT bits,
+ by_pieces_constfn constfun,
+ void *constfundata, unsigned int align,
+ bool memsetp,
+ unsigned HOST_WIDE_INT len)
+{
+  if (bits
+  && !can_store_by_pieces (bits + len, constfun, constfundata,
+  align, memsetp))
+return false;
+
+  /* Avoid the loop if we're just going to repeat the same single
+ test.  */
+  if (!len && popcount_hwi (bits) == 1)
+return true;
+
+  for (int i = ctz_hwi (bits); i >= 0; i = ctz_hwi (bits))
+{
+  unsigned HOST_WIDE_INT bit = 1;
+  bit <<= i;
+  bits &= ~bit;
+  if (!can_store_by_pieces (bit, constfun, constfundata,
+   align, memsetp))
+   return false;
+}
+
+  return true;
+}
+
 /* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
Return TRUE if successful, FALSE otherwise.  TO is assumed to be
aligned at an ALIGN-bits boundary.  LEN must be a multiple of
@@ -4341,7 +4375,11 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
   else
 /* Huh, max_len < min_len?  Punt.  See pr100843.c.  */
 return false;
-  if (min_len >= blksize)
+  if (min_len >= blksize
+  /* ??? Maybe try smaller fixed-prefix blksizes before
+punting?  */
+  && can_store_by_pieces (blksize, builtin_memset_read_str,
+ &valc, align, true))
 {
   min_len -= blksize;
   min_bits = floor_log2 (min_len);
@@ -4367,8 +4405,9 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
  happen because of the way max_bits and blksize are related, but
  it doesn't hurt to test.  */
   if (blksize > xlenest
-  || !can_store_by_pieces (xlenest, builtin_memset_read_str,
-  &valc, align, true))
+  || !can_store_by_multiple_pieces (xlenest - blksize,
+   builtin_memset_read_str,
+   &valc, align, true, blksize))
 {
   if (!(flag_inline_stringops & ILSOP_MEMSET))
return false;
@@ -4386,17 +4425,17 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
 of overflow.  */
  if (max_bits < orig_max_bits
  && xlenest + blksize >= xlenest
- && can_store_by_pieces (xlenest + blksize,
- builtin_memset_read_str,
- &valc, align, true))
+ && can_store_by_multiple_pieces (xlenest,
+  builtin_memset_read_str,
+  &valc, align, true, blksize))
{
  max_loop = true;
  break;
}
  if (blksize
- && can_store_by_pieces (xlenest,
- builtin_memset_read_str,
- &valc, align, true))
+ && can_store_by_multiple_pieces (xlenest,
+  builtin_memset_read_str,
+  &valc, align, true, 0))
{
  max_len += b

[PATCH] multiflags: fix doc warning

2023-12-08 Thread Alexandre Oliva


Comply with dubious doc warning that after an @xref there must be a
comma or a period, not a close parentheses.

Build-testing on x86_64-linux-gnu now.  Ok to install?


for  gcc/ChangeLog

* doc/invoke.texi (multiflags): Add period after @xref to
silence warning.
---
 gcc/doc/invoke.texi |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d4e689b64c010..4e67c95dbf85a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20589,7 +20589,7 @@ allocation before or after interprocedural optimization.
 This option enables multilib-aware @code{TFLAGS} to be used to build
 target libraries with options different from those the compiler is
 configured to use by default, through the use of specs (@xref{Spec
-Files}) set up by compiler internals, by the target, or by builders at
+Files}.) set up by compiler internals, by the target, or by builders at
 configure time.
 
 Like @code{TFLAGS}, this allows the target libraries to be built for



[PATCH] RISC-V: Support highest overlap for wv instructions

2023-12-08 Thread Juzhe-Zhong
According to RVV ISA, we can allow vwadd.wv v2, v2, v3 overlap.

Before this patch:

nop
vsetivli  zero,4,e8,m4,tu,ma
vle16.v   v8,0(a0)
vmv8r.v   v0,v8
vwsub.wv  v0,v8,v12
nop
addi      a4,a0,100
vle16.v   v8,0(a4)
vmv8r.v   v24,v8
vwsub.wv  v24,v8,v12
nop
addi      a4,a0,200
vle16.v   v8,0(a4)
vmv8r.v   v16,v8
vwsub.wv  v16,v8,v12
nop

After this patch:

nop
vsetivli  zero,4,e8,m4,tu,ma
vle16.v   v0,0(a0)
vwsub.wv  v0,v0,v4
nop
addi      a4,a0,100
vle16.v   v24,0(a4)
vwsub.wv  v24,v24,v28
nop
addi      a4,a0,200
vle16.v   v16,0(a4)
vwsub.wv  v16,v16,v20

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highest overlap for wv instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-39.c: New test.
* gcc.target/riscv/rvv/base/pr112431-40.c: New test.
* gcc.target/riscv/rvv/base/pr112431-41.c: New test.

---
 gcc/config/riscv/vector.md|  88 +-
 .../gcc.target/riscv/rvv/base/pr112431-39.c   | 158 ++
 .../gcc.target/riscv/rvv/base/pr112431-40.c   |  94 +++
 .../gcc.target/riscv/rvv/base/pr112431-41.c   |  62 +++
 4 files changed, 360 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-40.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-41.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ba0714a9971..31c13a6dcca 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3795,46 +3795,48 @@
(set_attr "group_overlap" 
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])
 
 (define_insn "@pred_single_widen_sub"
-  [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd, vr, vd, 
vr, vd, vr, vd, vr, vd, vr, vd, vr, ?&vr, ?&vr")
(if_then_else:VWEXTI
	 (unspec:<VM>
-   [(match_operand:<VM> 1 "vector_mask_operand"   
"vmWc1,vmWc1")
-(match_operand 5 "vector_length_operand"  "   rK,   
rK")
-(match_operand 6 "const_int_operand"  "i,
i")
-(match_operand 7 "const_int_operand"  "i,
i")
-(match_operand 8 "const_int_operand"  "i,
i")
+   [(match_operand:<VM> 1 "vector_mask_operand"   " vm,Wc1, 
vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1")
+(match_operand 5 "vector_length_operand"  " rK, rK, 
rK, rK, rK, rK, rK, rK, rK, rK, rK, rK,   rK,   rK")
+(match_operand 6 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i,  i,  i,  i,  i,i,i")
+(match_operand 7 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i,  i,  i,  i,  i,i,i")
+(match_operand 8 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i,  i,  i,  i,  i,i,i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (minus:VWEXTI
-   (match_operand:VWEXTI 3 "register_operand" "   vr,   
vr")
+   (match_operand:VWEXTI 3 "register_operand" " vr, vr, 
vr, vr, vr, vr, vr, vr, vr, vr, vr, vr,   vr,   vr")
(any_extend:VWEXTI
- (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   
vr")))
- (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,
0")))]
+ (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" 
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,   vr,   vr")))
+ (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, vu,  
0,  0, vu, vu,  0,  0, vu, vu,  0,  0,   vu,0")))]
   "TARGET_VECTOR"
   "vwsub.wv\t%0,%3,%4%p1"
   [(set_attr "type" "viwalu")
-   (set_attr "mode" "<MODE>")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "group_overlap" 
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])
 
 (define_insn "@pred_single_widen_add"
-  [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd, vr, vd, 
vr, vd, vr, vd, vr, vd, vr, vd, vr, ?&vr, ?&vr")
(if_then_else:VWEXTI
	 (unspec:<VM>
-   [(match_operand:<VM> 1 "vector_mask_operand"   
"vmWc1,vmWc1")
-(match_operand 5 "vector_length_operand"  "   rK,   
rK")
-(match_operand 6 "const_int_operand"  "i,
i")
-(match_operand 7 "const_int_operand"  "i,
i")
-(match_operand 8 "const_int_operand" 

Re: [PATCH v2] c++: fix ICE with sizeof in a template [PR112869]

2023-12-08 Thread Jason Merrill

On 12/8/23 16:15, Marek Polacek wrote:

On Fri, Dec 08, 2023 at 12:09:18PM -0500, Jason Merrill wrote:

On 12/5/23 15:31, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
  (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for unevaluated operands.


I agree that we want this change for in_immediate_context (), but I don't
see why we want it for TYPE_P or unevaluated_p (code) or
cp_unevaluated_operand?


No particular reason, just paranoia.  How's this?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

   min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
 (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

   min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)
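The pruning contract at issue can be illustrated with a generic walker (a simplified model, not the actual walk_tree API): once a callback zeroes *walk_subtrees for a node, nothing below that node is visited, which is why clearing it for in_immediate_context also hid SIZEOF_EXPR operands from the folding walk:

```c
struct node { int kind; struct node *kids[2]; };

enum { PLAIN, IMMEDIATE_CTX, SIZEOF_EXPR };

/* Count SIZEOF_EXPR nodes in the tree.  PRUNE mimics clearing
   *walk_subtrees at "immediate context" nodes, cutting off the walk
   below them.  */
static int
count_sizeofs (struct node *n, int prune)
{
  if (!n)
    return 0;
  if (prune && n->kind == IMMEDIATE_CTX)
    return 0;		/* *walk_subtrees = 0: children never seen */
  int hits = (n->kind == SIZEOF_EXPR);
  for (int i = 0; i < 2; i++)
    hits += count_sizeofs (n->kids[i], prune);
  return hits;
}
```

With pruning, a SIZEOF_EXPR under an immediate-context node is never reached, matching the bug: the expression was never folded and leaked to the middle end.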

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for in_immediate_context.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
  gcc/cp/cp-gimplify.cc| 6 +-
  gcc/testsuite/g++.dg/template/sizeof18.C | 8 
  2 files changed, 13 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 5abb91bbdd3..6af7c787372 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1179,11 +1179,15 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
  
/* No need to look into types or unevaluated operands.

   NB: This affects cp_fold_r as well.  */
-  if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
+  if (TYPE_P (stmt) || unevaluated_p (code))
  {
*walk_subtrees = 0;
return NULL_TREE;
  }
+  else if (in_immediate_context ())
+/* Don't clear *walk_subtrees here: we still need to walk the subtrees
+   of SIZEOF_EXPR and similar.  */
+return NULL_TREE;
  
tree decl = NULL_TREE;

bool call_p = false;
diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C 
b/gcc/testsuite/g++.dg/template/sizeof18.C
new file mode 100644
index 000..afba9946258
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof18.C
@@ -0,0 +1,8 @@
+// PR c++/112869
+// { dg-do compile }
+
+void min(long, long);
template <typename T> void Binaryread(int &, T, unsigned long);
+template <> void Binaryread(int &, float, unsigned long bytecount) {
+  min(bytecount, sizeof(int));
+}


Hmm, actually, why does the above make a difference for this testcase?

...

It seems that in_immediate_context always returns true in 
cp_fold_function because current_binding_level->kind == 
sk_template_parms.  That seems like a problem.  Maybe for 
cp_fold_immediate_r we only want to check cp_unevaluated_operand or 
DECL_IMMEDIATE_CONTEXT (current_function_decl)?


Jason



[PATCH v2] -finline-stringops: check base blksize for memset [PR112778]

2023-12-08 Thread Alexandre Oliva
Scratch the previous one; the "slightly different version" I had before
it was not entirely broken due to unnecessary, suboptimal and incorrect
use of ctz.  Here I have yet another implementation of that loop that
should perform better and even work correctly ;-)


This one has so far regstrapped on x86_64-linux-gnu (v1 failed in
regression testing, sorry), and bootstrapped with -finline-stringops on
ppc64le-linux-gnu (still ongoing on x86-64-linux-gnu and
aarch64-linux-gnu).  Ok to install?


The recently-added logic for -finline-stringops=memset introduced an
assumption that doesn't necessarily hold, namely, that
can_store_by_pieces of a larger size implies can_store_by_pieces by
smaller sizes.  Check all the sizes the by-multiple-pieces machinery
might use before committing to an expansion pattern.
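The rewritten loop can be exercised in isolation.  A sketch of just the bit iteration (the termination conditions are the subtle part: BIT == BITS skips the single remaining test already covered by the BITS + LEN check, and BIT can shift out to zero after the MSB):

```c
/* Model of the v2 iteration: returns a mask of the piece sizes the
   loop would test, walking a single shifting bit as the patch does.  */
static unsigned long
pieces_tested (unsigned long bits)
{
  unsigned long tested = 0;
  for (unsigned long bit = 1; bit < bits && bit; bit <<= 1)
    if (bits & bit)
      tested |= bit;
  return tested;
}
```

For a single-bit BITS the loop tests nothing, since that size was already checked together with LEN; for multi-bit BITS every set bit, including the MSB, is visited.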


for  gcc/ChangeLog

PR target/112778
* builtins.cc (can_store_by_multiple_pieces): New.
(try_store_by_multiple_pieces): Call it.

for  gcc/testsuite/ChangeLog

PR target/112778
* gcc.dg/inline-mem-cmp-pr112778.c: New.
---
 gcc/builtins.cc|   57 
 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c |   10 
 2 files changed, 58 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 12a535d313f12..f6c96498f0783 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -4284,6 +4284,40 @@ expand_builtin_memset (tree exp, rtx target, 
machine_mode mode)
   return expand_builtin_memset_args (dest, val, len, target, mode, exp);
 }
 
+/* Check that store_by_pieces allows BITS + LEN (so that we don't
+   expand something too unreasonably long), and every power of 2 in
+   BITS.  It is assumed that LEN has already been tested by
+   itself.  */
+static bool
+can_store_by_multiple_pieces (unsigned HOST_WIDE_INT bits,
+ by_pieces_constfn constfun,
+ void *constfundata, unsigned int align,
+ bool memsetp,
+ unsigned HOST_WIDE_INT len)
+{
+  if (bits
+  && !can_store_by_pieces (bits + len, constfun, constfundata,
+  align, memsetp))
+return false;
+
+  /* BITS set are expected to be generally in the low range and
+ contiguous.  We do NOT want to repeat the test above in case BITS
+ has a single bit set, so we terminate the loop when BITS == BIT.
+ In the unlikely case that BITS has the MSB set, also terminate in
+ case BIT gets shifted out.  */
+  for (unsigned HOST_WIDE_INT bit = 1; bit < bits && bit; bit <<= 1)
+{
+  if ((bits & bit) == 0)
+   continue;
+
+  if (!can_store_by_pieces (bit, constfun, constfundata,
+   align, memsetp))
+   return false;
+}
+
+  return true;
+}
+
 /* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
Return TRUE if successful, FALSE otherwise.  TO is assumed to be
aligned at an ALIGN-bits boundary.  LEN must be a multiple of
@@ -4341,7 +4375,11 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
   else
 /* Huh, max_len < min_len?  Punt.  See pr100843.c.  */
 return false;
-  if (min_len >= blksize)
+  if (min_len >= blksize
+  /* ??? Maybe try smaller fixed-prefix blksizes before
+punting?  */
+  && can_store_by_pieces (blksize, builtin_memset_read_str,
+ &valc, align, true))
 {
   min_len -= blksize;
   min_bits = floor_log2 (min_len);
@@ -4367,8 +4405,9 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
  happen because of the way max_bits and blksize are related, but
  it doesn't hurt to test.  */
   if (blksize > xlenest
-  || !can_store_by_pieces (xlenest, builtin_memset_read_str,
-  &valc, align, true))
+  || !can_store_by_multiple_pieces (xlenest - blksize,
+   builtin_memset_read_str,
+   &valc, align, true, blksize))
 {
   if (!(flag_inline_stringops & ILSOP_MEMSET))
return false;
@@ -4386,17 +4425,17 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
int ctz_len,
 of overflow.  */
  if (max_bits < orig_max_bits
  && xlenest + blksize >= xlenest
- && can_store_by_pieces (xlenest + blksize,
- builtin_memset_read_str,
- &valc, align, true))
+ && can_store_by_multiple_pieces (xlenest,
+  builtin_memset_read_str,
+  &valc, align, true, blksize))
{
  max_loop = true;
  break;
}
  if (blksize
- && can_store_by_pieces (xlenest,
-