Re: [PATCH] Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]

2023-11-10 Thread Richard Biener
On Thu, 9 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds 6 new type-generic builtins,
> __builtin_clzg
> __builtin_ctzg
> __builtin_clrsbg
> __builtin_ffsg
> __builtin_parityg
> __builtin_popcountg
> The g at the end stands for generic because the unsuffixed variants
> of the builtins already have unsigned int or int arguments.
> 
> The main reason to add these is to support arbitrary unsigned (for
> clrsb/ffs signed) bit-precise integer types and also __int128 which
> wasn't supported by the existing builtins, so that e.g. 
> type-generic functions could then support not just bit-precise unsigned
> integer type whose width matches a standard or extended integer type,
> but others too.
> 
> None of these new builtins promote their first argument, so the argument
> can be e.g. unsigned char or unsigned short or unsigned __int20 etc.

But is that a good idea?  Is that how type generic functions work in C?
I think it introduces non-obvious/unexpected behavior in user code.

If people do not want to "compensate" for this, maybe instead also add
__builtin_*{8,16} variants (like we have for the bswap variants)?

Otherwise this looks reasonable.  I'm not sure why we need separate
CFN_CLZ and CFN_BUILT_IN_CLZG?  (why CFN_BUILT_IN_CLZG and not CFN_CLZG?)
That is, I'm confused about

 CASE_CFN_CLRSB:
+case CFN_BUILT_IN_CLRSBG:

why does CASE_CFN_CLRSB not include CLRSBG?  It includes IFN_CLRSB, no?
And IFN_CLRSB already has the two and one arg case and thus encompasses
some BUILT_IN_CLRSBG cases?

> The first 2 support either 1 or 2 arguments, if only 1 argument is supplied,
> the behavior is undefined for argument 0 like for other __builtin_c[lt]z*
> builtins, if 2 arguments are supplied, the second argument should be int
> that will be returned if the argument is 0.  All other builtins have
> just one argument.  For __builtin_clrsbg and __builtin_ffsg the argument
> shall be any signed standard/extended or bit-precise integer, for the others
> any unsigned standard/extended or bit-precise integer (bool not allowed).
> 
> One possibility would be to also allow signed integer types for
> the clz/ctz/parity/popcount ones (and just cast the argument to
> unsigned_type_for during folding) and similarly unsigned integer types
> for the clrsb/ffs ones, dunno what is better; for stdbit.h the current
> version is sufficient and diagnoses use of the inappropriate sign,
> though on the other side I wonder if users won't be confused by
> __builtin_clzg (1) being an error and having to write __builtin_clzg (1U).
> And I think we don't have anything in C that would allow casting to
> corresponding unsigned type (or vice versa) given arbitrary integral type,
> one could use _Generic for that for standard and extended types, but not
> for arbitrary _BitInt.  What do you think?
> 
> The new builtins are lowered to corresponding builtins with other suffixes
> or internal calls (plus casts and adjustments where needed) during FE
> folding or during gimplification at latest: the non-suffixed builtins
> handle precisions up to the precision of int, l up to the precision of
> long, ll up to the precision of long long; precisions up to __int128 are
> lowered to double-word expansion early, and the rest (which must be
> _BitInt) are lowered to internal fn calls - those are then lowered during
> the bitint lowering pass.
> 
> The patch also changes representation of IFN_CLZ and IFN_CTZ calls,
> previously they were in the IL only if they are directly supported optab
> and depending on C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 they had or didn't
> have defined behavior at 0, now they are in the IL either if directly
> supported optab, or for the large/huge BITINT_TYPEs and they have either
> 1 or 2 arguments.  If one, the behavior is undefined at zero, if 2, the
> second argument is an int constant that should be returned for 0.
> As there is no extra support during expansion, for directly supported optab
> the second argument if present should still match the
> C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 value, but for BITINT_TYPE arguments
> it can be arbitrary int INTEGER_CST.
> 
> The goal is e.g.
> #ifdef __has_builtin
> #if __has_builtin(__builtin_clzg) && __has_builtin(__builtin_popcountg)
> #define stdc_leading_zeros(x) \
>   __builtin_clzg (x, __builtin_popcountg ((__typeof (x)) -1))
> #endif
> #endif
> where __builtin_popcountg ((__typeof (x)) -1) computes the bit precision
> of x's type (kind of _Bitwidthof (x) alternative).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Besides the above question I'd say OK (I assume Joseph's reply is a
general ack from his side).

Thanks,
Richard.

> 2023-11-09  Jakub Jelinek  
> 
>   PR c/111309
> gcc/
>   * builtins.def (BUILT_IN_CLZG, BUILT_IN_CTZG, BUILT_IN_CLRSBG,
>   BUILT_IN_FFSG, BUILT_IN_PARITYG, BUILT_IN_POPCOUNTG): New
>   builtins.
>   * builtins.cc (fold_builtin_bit_query): New function.
>   (fold_builtin_1): Use it for
>   BUILT_IN_{CLZ,CTZ

Re: [PATCH] RISC-V: Robustify vec_init pattern[NFC]

2023-11-10 Thread Robin Dapp
Hi Juzhe,

yes, that's reasonable. OK.

Regards
 Robin


[PATCH v1] RISC-V: Add HFmode for l/ll round and rint autovec

2023-11-10 Thread pan2 . li
From: Pan Li 

The internal-fn framework has supported FLOATN already. This patch
would like to re-enable the vector HFmode autovec for the standard
name mode iterators below.

1. lrint
2. llround

For now the vector HFmodes are disabled to limit the impact, and the
underlying FP16 rint/round autovec support will enable them one by
one.

gcc/ChangeLog:

* config/riscv/autovec.md: Disable vector HFmode for
rint, round, ceil and floor.
* config/riscv/vector-iterators.md: Add vector HFmode
for rint, round, ceil and floor mode iterator.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md  | 26 +++-
 gcc/config/riscv/vector-iterators.md | 59 +++-
 2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 33722ea1139..a199caabf87 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2443,12 +2443,11 @@ (define_expand "roundeven2"
   }
 )
 
-;; Add mode_size equal check as we opened the modes for different sizes.
-;; The check will be removed soon after related codegen implemented
 (define_expand "lrint2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2458,7 +2457,8 @@ (define_expand "lrint2"
 (define_expand "lrint2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2468,7 +2468,8 @@ (define_expand "lrint2"
 (define_expand "lround2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2478,7 +2479,8 @@ (define_expand "lround2"
 (define_expand "lround2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2488,7 +2490,8 @@ (define_expand "lround2"
 (define_expand "lceil2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2498,7 +2501,8 @@ (define_expand "lceil2"
 (define_expand "lceil2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2508,7 +2512,8 @@ (define_expand "lceil2"
 (define_expand "lfloor2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
 DONE;
@@ -2518,7 +2523,8 @@ (define_expand "lfloor2"
 (define_expand "lfloor2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& GET_MODE_INNER (mode) != HFmode"
   {
 riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
 DONE;
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e80eaedc4b3..f2d9f60b631 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3221,15 +3221,20 @@ (define_mode_attr vnnconvert [
 ;; V_F2SI_CONVERT: (HF, SF, DF) => SI
 ;; V_F2DI_CONVERT: (HF, SF, DF) => DI
 ;;
-;; HF requires addit

Re: [PATCH v1] RISC-V: Add HFmode for l/ll round and rint autovec

2023-11-10 Thread juzhe.zh...@rivai.ai
No test?



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-11-10 16:14
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add HFmode for l/ll round and rint autovec

RE: [PATCH] RISC-V: Robustify vec_init pattern[NFC]

2023-11-10 Thread Li, Pan2
Committed, thanks Robin.

Pan



RE: [PATCH v1] RISC-V: Add HFmode for l/ll round and rint autovec

2023-11-10 Thread Li, Pan2
This patch only adds new modes to the iterator; I failed to find a way
to test it. Maybe I can add the underlying lrint autovec implementation
together with it, which would make it more straightforward to add test
cases here.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, November 10, 2023 4:16 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Add HFmode for l/ll round and rint autovec

No test?


juzhe.zh...@rivai.ai


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

Hi Dimitar,

Thanks for the tests.


This patch set breaks the build for at least three embedded targets. See
below.

For avr the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of overloaded 
‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
   149 | set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);


I think it's because `HARD_REG_SET` and `unsigned int` are the same
type on the avr target (avr has no more than 32 registers), so the two
function prototypes below conflict; I'll adjust it.


static void
set_subreg_conflict_hard_regs (ira_allocno_t a, HARD_REG_SET regs)

static void
set_subreg_conflict_hard_regs (ira_allocno_t a, unsigned int regno)


For arm-none-eabi the newlib build fails with:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal 
compiler error: Floating point exception
   279 | }
   | ^
0x1176e0f crash_signal
 /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0xf6008d get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0xf6008d get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
0xf60312 new_insn_reg
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0xf6064d add_regs_to_insn_regno_info
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
0xf62909 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0xf62e46 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
0xf62e46 lra_push_insn_1
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0xf62f2d lra_push_insn(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0xf62f2d push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0xf63302 push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
0xf63302 lra(_IO_FILE*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0xf0e399 do_reload
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xf0e399 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148

The divide by zero error above is interesting. I'm not sure why 
ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the following 
rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding 
])) -1
  (nil))


I just cross-compiled an arm-none-eabi compiler and didn't encounter
this error; can you give me a little more configuration info about your
build (for example, flags_for_target)? Thanks again.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [pushed][PATCH v1] LoongArch: Fix instruction name typo in lsx_vreplgr2vr_ template

2023-11-10 Thread chenglulu

Pushed to r14-5314.

在 2023/11/3 下午5:01, Chenghui Pan 写道:

gcc/ChangeLog:

* config/loongarch/lsx.md: Fix instruction name typo in
lsx_vreplgr2vr_ template.
---
  gcc/config/loongarch/lsx.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 4af32c8dfe1..55c7d79a030 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1523,7 +1523,7 @@ (define_insn "lsx_vreplgr2vr_"
"ISA_HAS_LSX"
  {
if (which_alternative == 1)
-return "ldi.\t%w0,0";
+return "vldi.\t%w0,0";
  
if (!TARGET_64BIT && (mode == V2DImode || mode == V2DFmode))

  return "#";




Re: [PATCH v4 2/2] c++: Diagnostics for P0847R7 (Deducing this) [PR102609]

2023-11-10 Thread waffl3x
> > I had wanted to write about some of my frustrations with trying to
> > write a test for virtual specifiers and errors/warnings for
> > shadowing/overloading virtual functions, but I am a bit too tired at
> > the moment and I don't want to delay getting this up for another night.
> > In short, the standard does not properly specify the criteria for
> > overriding functions, which leaves a lot of ambiguity in how exactly we
> > should be handling these cases.
> 
> 
> Agreed, this issue came up in the C++ committee meeting today. See
> 
> https://cplusplus.github.io/CWG/issues/2553.html
> https://cplusplus.github.io/CWG/issues/2554.html
> 
> for draft changes to clarify some of these issues.

Ah I guess I should have read all your responses before sending any
back instead of going one by one. I ended up writing about this again I
think.

struct B {
virtual void f() const {}
};

struct S0 : B {
void f() {}
};

struct S1 : B {
void f(this S1&) {}
};

I had a bit of a debate with a peer, I initially disagreed with the
standard resolution because it disallows S1's declaration of f, while
S0's is an overload. I won't bore you with all the details of going
back and forth about the proposed wording, my feelings are mostly that
being able to overload something with an iobj member function but not
being able to with an xobj member function was inconsistent. He argued
that keeping restrictions high at first and lowering them later is
better, and overloading virtual functions is already not something you
should really ever do, so he was in favor of the proposed wording.

In light of our debate, my stance is that we should implement things as
per the proposed wording. 

struct B {
  virtual void foo() const&;
};

struct D : B {
  void foo(this int);
};

This would be ill-formed now according to the change in wording. My
laymans interpretation of the semantics are that, the parameter list is
the same when the xobj parameter is ignored, so it overrides. And since
it overrides, and xobj member functions are not allowed to override, it
is an error.

To be honest, I still don't understand where the wording accounts for
the qualifiers of B::f, but my friend assured me that it does.

> > The standard also really poorly
> > specifies things related to the implicit object parameter and implicit
> > object argument which also causes some trouble. Anyhow, for the time
> > being I am not including my test for diagnostics related to a virtual
> > specifier on xobj member functions. I can't get it to a point I am
> > happy with it and I think there will need to be some discussion on how
> > exactly we want to handle that.
> 
> 
> The discussion might be easier with the testcase to refer to?

I'm pretty clear on how to proceed now. My biggest issue with the tests
is I didn't know what combination of errors to emit, whether to warn
for overriding, etc. There is still a bit of an issue on how to emit
errors for virtual specifiers on xobj member functions, it gets noisy
quickly, but maybe we do just emit all of them? This aside, what should
be done here is clear now.

Somewhat related is some warnings I wanted to implement for impossible
to call by-value xobj member functions. Ones that involve an unrelated
type get dicey because people could in theory have legitimate use cases
for that, P0847R7 includes an example of this combining recursive
lambdas and the overload pattern to create a visitor. However, I think
it would be reasonable to warn when a by-value xobj member function can
not be called due to the presence of overloads that take references.
Off the top of my head I don't recall how many overloads it would take
to prevent a by-value version from being called, nor have I looked into
how to implement this. But I do know that redeclarations of xobj member
functions as iobj member functions (and vice-versa) are not properly
identified when ref qualifiers are omitted from the corresponding (is
this the correct term here?) iobj member function.

> > I was fairly lazy with the changelog and commit message in this patch
> > as I expect to need to do another round on this patch before it can be
> > accepted. One specific question I have is whether I should be listing
> > out all the diagnostics that were added to a function. For the cases
> > where there were only one diagnostic added I stated it, but for
> > grokdeclarator which has the majority of the diagnostics I did not. I
> > welcome input here, really I request it, because the changelogs are
> > still fairly difficult for me to write. Hell, the commit messages are
> > hard to write, I feel I went overboard on the first patch but I guess
> > it's a fairly large patch so maybe it's alright? Again, I am looking
> > for feedback here if anyone is willing to provide it.
> 
> 
> ChangeLog entries are very brief summaries of the changes, there's
> absolutely no need to enumerate multiple diagnostics. If someone wants
> more detail they can look at the patch.

Noted, thank goo

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

I forgot: please also provide the version information of the newlib code.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



[RFC][V2] RISC-V: Support -mcmodel=large.

2023-11-10 Thread KuanLin Chen
gcc/ChangeLog:

* gcc/config/riscv/predicates.md(move_operand): Check SYMBOL_REF
and LABEL_REF type.
(call_insn_operand): Support for CM_Large.
(pcrel_symbol_operand): New.
* gcc/config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add builtin_define
"__riscv_cmodel_large".
* gcc/config/riscv/riscv-opts.h (riscv_code_model): Define CM_LARGE.
* gcc/config/riscv/riscv-protos.h (riscv_symbol_type): Define
SYMBOL_FORCE_TO_MEM.
* gcc/config/riscv/riscv.cc (riscv_classify_symbol): Support CM_LARGE model.
(riscv_symbol_insns): Add SYMBOL_FORCE_TO_MEM.
(riscv_cannot_force_const_mem): Ditto.
(riscv_split_symbol): Ditto.
(riscv_force_address): Check pseudo reg available before force_reg.
(riscv_can_use_per_function_literal_pools_p): New.
(riscv_elf_select_rtx_section): Literal pool stays with the function.
(riscv_output_mi_thunk): Add riscv_in_thunk_func.
(riscv_option_override): Support CM_LARGE model.
(riscv_function_ok_for_sibcall): Disable sibcalls in CM_LARGE model.
* gcc/config/riscv/riscv.h (ASM_OUTPUT_POOL_EPILOGUE): New hook.
* gcc/config/riscv/riscv.md (unspec): Define UNSPEC_FORCE_FOR_MEM.
(*large_load_address): New.
* gcc/config/riscv/riscv.opt (code_model): New.

gcc/testsuite/ChangeLog:


  * gcc/testsuite/gcc.target/riscv/large-model.c: New test.


Hi Jeff,

Thanks for your review.

> return (absolute_symbolic_operand (op, mode)
>         || plt_symbolic_operand (op, mode)
>         || register_operand (op, mode));
Sorry for the unformatted indent. Fixed it in the V2 patch.
>> @@ -1972,7 +1992,19 @@ static rtx
>>   riscv_force_address (rtx x, machine_mode mode)
>>   {
>> if (!riscv_legitimate_address_p (mode, x, false))
>>  -x = force_reg (Pmode, x);
>> +{
>> +  if (can_create_pseudo_p ())
>> + return force_reg (Pmode, x);
> Note that $ra is fixed now.  So if you need a scratch register, you can
> fall back to $ra.

> More importantly, what are the circumstances where you can be asked to
> force an address after the register allocation/reloading phase is
> complete?  Or does it happen within the register allocators (the latter
> would be an indicator we need a secondary reload).

This address forcing is from riscv_output_mi_thunk:

insn = emit_call_insn (gen_sibcall (fnaddr, const0_rtx, callee_cc)).

This hook is called after IRA/LRA so it cannot use pseudo registers.

When the compiler tries to expand 'sibcall', it calls
riscv_legitimize_call_address, and 'fnaddr' is not a legal
call_insn_operand. The address then goes on a long trip to be
legitimized.

Here is an example that uses output thunks:

===
class base
{
  virtual int foo(int a);
};

class derived : public virtual base
{
  virtual int foo(int a);
};

int base::foo(int a) { return a; }
int derived::foo(int a) { return a; }

base* make() { return new derived; }
===

>>   riscv_in_small_data_p (const_tree x)

> How does large code model impact our ability to access small data
> through $gp?  Aren't they independent?

I thought constant pool entries may be put into the small data section.

But it seems I was wrong. Removed this part in the V2 patch.


>> +  if ((offset & 3) && riscv_can_use_per_function_literal_pools_p ())
>> +ASM_OUTPUT_ALIGN (f, 2);
>> +}
> So the comment implies you're aligning the section.  If that were the
> case, then why doesn't the function alignment come from
> FUNCTION_BOUNDARY when we first start emitting the function?

> Or is it the case that the comment is incorrect and you've actually got
> mixed code/rodata?

I forgot there is an alignment from FUNCTION_BOUNDARY.  Removed this
part in the V2 patch.

>> +(define_insn "*large_load_address"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +(mem:DI (match_operand 1 "pcrel_symbol_operand" "")))]
>> +  "TARGET_64BIT && riscv_cmodel == CM_LARGE"
>> +  "ld\t%0,%1"
>> +  [(set_attr "type" "load")
>> +   (set (attr "length") (const_int 8))])
> So it would seem like you're relying on the assembler to expand the ld?
> Is there any reasonable way to expose this properly to the compiler?
> I'd start by emitting the right instructions in the template.  Once
> that's working, then we could look to split the components into distinct
> insns.

> I also worry that we've got a mem->reg move instruction that is not
> implemented in the standard movXX patterns.  Traditionally that's been a
> recipe for problems.  It was certainly a requirement for reload, but I
> don't know offhand if it's a hard requirement for LRA.

> Can you try to merge that in with the standard movdi pattern?

This is a trick for loading the constant pool anchor.

The idea comes from the pattern '*local_pic_load'.

If removing this rtl pattern, GCC will generate 'lla a5,.LC0 + ld
a0,0(a5)' to get the anchor address.

But with this pattern, GCC can generate 'ld a0,.LC0'.

And the code generation is easier for the linker to relax.


> Overall it looks pretty good.   Does Andestech have a copyright
> assignment in place?  Or are you contributing under the DCO rule?

As Kito 

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI  wrote:
>
> Hi Richard,
>   Thanks so much for your comments.
>
> 在 2023/11/9 19:41, Richard Biener 写道:
> > I'm not sure if the testcase is valid though?
> >
> > @defbuiltin{{void} __builtin_return (void *@var{result})}
> > This built-in function returns the value described by @var{result} from
> > the containing function.  You should specify, for @var{result}, a value
> > returned by @code{__builtin_apply}.
> > @enddefbuiltin
> >
> > I don't see __builtin_apply being used here?
>
> The prototype of the test case is from "__objc_block_forward" in
> libobjc/sendmsg.c.
>
>   void *args, *res;
>
>   args = __builtin_apply_args ();
>   res = __objc_forward (rcv, op, args);
>   if (res)
> __builtin_return (res);
>   else
> ...
>
> The __builtin_apply_args puts the return values on the stack with the
> proper alignment.
> But the forward function can do anything and return a void* pointer.
> IMHO the alignment might be broken. So I just simplified it to use a
> void* pointer as the input argument of "__builtin_return" and skipped
> "__builtin_apply_args".

But doesn't __objc_forward then break the contract between
__builtin_apply_args and __builtin_return?

That said, __builtin_return is a very special function, it's not supposed
to deal with what you are fixing.  At least I think so.

IMHO the bug is in __objc_block_forward.

Richard.

>
> Thanks
> Gui Haochen


[PATCH] vect: Look through pattern stmt in fold_left_reduction.

2023-11-10 Thread Robin Dapp
Hi,

more fallout from the COND_OP change was shown in PR112464.

It appears as if we "look through" a statement pattern in
vect_finish_replace_stmt but not before when we replace the newly
created vector statement's lhs.  Then the lhs is the statement pattern's
lhs while in vect_finish_replace_stmt we assert that it's from the
statement the pattern replaced.

This patch uses vect_orig_stmt on the scalar destination's definition
so the replaced statement is used everywhere.

Bootstrapped and regtested on x86 and aarch64, riscv testsuite
unchanged.

Regards
 Robin

gcc/ChangeLog:

PR tree-optimization/112464

* tree-vect-loop.cc (vectorize_fold_left_reduction): Use
vect_orig_stmt on scalar_dest_def_info.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112464.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr112464.c | 10 ++
 gcc/tree-vect-loop.cc|  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112464.c

diff --git a/gcc/testsuite/gcc.target/i386/pr112464.c 
b/gcc/testsuite/gcc.target/i386/pr112464.c
new file mode 100644
index 000..2290c714986
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112464.c
@@ -0,0 +1,10 @@
+/* { dg-do compile }  */
+/* { dg-options "-Ofast -mavx512dq -ftrapv" } */
+
+long *e;
+int n, i, err;
+void fn() {
+  for (; i < n; i++)
+if (e[i])
+  err++;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8abc1937d74..f61adad1ca3 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7087,7 +7087,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 opmask, &vec_opmask);
 }
 
-  gimple *sdef = scalar_dest_def_info->stmt;
+  gimple *sdef = vect_orig_stmt (scalar_dest_def_info)->stmt;
   tree scalar_dest = gimple_get_lhs (sdef);
   tree scalar_type = TREE_TYPE (scalar_dest);
   tree reduc_var = gimple_phi_result (reduc_def_stmt);
-- 
2.41.0


Re: [PATCH] Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]

2023-11-10 Thread Jakub Jelinek
On Fri, Nov 10, 2023 at 08:09:26AM +, Richard Biener wrote:
> > The following patch adds 6 new type-generic builtins,
> > __builtin_clzg
> > __builtin_ctzg
> > __builtin_clrsbg
> > __builtin_ffsg
> > __builtin_parityg
> > __builtin_popcountg
> > The g at the end stands for generic because the unsuffixed variant
> > of the builtins already have unsigned int or int arguments.
> > 
> > The main reason to add these is to support arbitrary unsigned (for
> > clrsb/ffs signed) bit-precise integer types and also __int128 which
> > wasn't supported by the existing builtins, so that e.g. 
> > type-generic functions could then support not just bit-precise unsigned
> > integer type whose width matches a standard or extended integer type,
> > but others too.
> > 
> > None of these new builtins promote their first argument, so the argument
> > can be e.g. unsigned char or unsigned short or unsigned __int20 etc.
> 
> But is that a good idea?  Is that how type generic functions work in C?

Most current type generic functions deal just with floating point args and
don't promote their float to double as the ... promotions would normally do:
__builtin_signbit
__builtin_fpclassify
__builtin_isfinite
__builtin_isinf_sign
__builtin_isinf
__builtin_isnan
__builtin_isnormal
__builtin_isgreater
__builtin_isgreaterequal
__builtin_isless
__builtin_islessequal
__builtin_islessgreater
__builtin_isunordered
__builtin_iseqsig
__builtin_issignaling

__builtin_clear_padding is uninteresting, because the argument must be a
pointer.

Then
__builtin_add_overflow
__builtin_sub_overflow
__builtin_mul_overflow
do promote the first 2 arguments, but that doesn't really matter, because
all we care about is the argument values, not their type.

And finally
__builtin_add_overflow_p
__builtin_sub_overflow_p
__builtin_mul_overflow_p
do promote the first 2 arguments, and don't promote the third one, which is
the only one where we care about the type, so that is the behavior that
I've used also for the new builtins.  I think if we added e.g.
__builtin_classify_type now and not more than 3 decades ago it would behave
like that too.
Only not promoting the argument will make it directly usable in the
stdc_leading_zeros, stdc_leading_ones, stdc_trailing_zeros, stdc_trailing_ones,
stdc_first_leading_zero, ..., stdc_count_zeros, stdc_count_ones, ...
C23 stdbit.h type-generic macros, otherwise one would need to play with
_Generic and special-case there unsigned char and unsigned short (which
normally promote to int), but e.g. unsigned _BitInt(8) doesn't.
I expect Joseph will have compatibility version of the macro for when these
builtins aren't supported, but given the standard says that {un,}signed _BitInt
with width matching standard/extended integer types other than bool needs to
be handled also, either it will not use _Generic at all and just go after
sizeof (argument), or maybe will use _Generic and for the default case will
go after sizeof.  Seems clang returns -1 for __builtin_classify_type (0uwb)
rather than 18 GCC returns, so one can't portably distinguish bitints.

> I think it introduces non-obvious/unexpected behavior in user code.

If we keep the patch behavior of requiring unsigned
standard/extended/bit-precise types other than bool for the
clz/ctz/parity/popcount cases, the choice is between always erroring on
__builtin_clzg ((unsigned char) 1) - where the promoted argument is signed,
and accepting it as unsigned char case without promotion, so I think users
would be more confused to see an error on that.
If we'd switch to accepting both signed and unsigned
standard/extended/bit-precise integer types other than bool for all the
builtins, whether we promote or not doesn't matter for ctz/parity/popcount
but does for clz.
The clrsb/ffs cases accept signed argument on the other side, so both
promoted and unpromoted argument would mean something and be accepted,
in the ffs case it again doesn't really matter for the result, but for clrsb
is significant.
Would it help to just document it that the argument isn't promoted?

We document that for __builtin_*_overflow_p:
"The value of the third argument is ignored, just the side effects in the third 
argument
are evaluated, and no integral argument promotions are performed on the last
argument."

> If people do not want to "compensate" for this maybe instead also add
> __builtin_*{8,16} (like we have for the bswap variants)?

Note, clang has __builtin_clzs and __builtin_ctzs for unsigned short (but
not the other 4), but nothing for the unsigned char cases.
I was just hoping we don't need to add further variants if we have
type-generic ones.

> Otherwise this looks reasonable.  I'm not sure why we need separate
> CFN_CLZ and CFN_BUILT_IN_CLZG?  (why CFN_BUILT_IN_CLZG and not CFN_CLZG?)
> That is, I'm confused about
> 
>  CASE_CFN_CLRSB:
> +case CFN_BUILT_IN_CLRSBG:
> 
> why does CASE_CFN_CLRSB not include CLRSBG?  It includes IFN_CLRSB, no?
> And IFN_CLRSB already has the two and one arg case and

Re: [RFC][V2] RISC-V: Support -mcmodel=large.

2023-11-10 Thread KuanLin Chen
Sorry. It missed a semicolon in the previous patch. Please find the new one
in the attachment. Thanks.


0001-RISC-V-Support-mcmodel-large.patch
Description: Binary data


Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-10 Thread Richard Biener
On Wed, Nov 8, 2023 at 9:22 AM Hongtao Liu  wrote:
>
> On Wed, Nov 8, 2023 at 3:53 PM Richard Biener
>  wrote:
> >
> > On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu  wrote:
> > >
> > > On Tue, Nov 7, 2023 at 10:34 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu  wrote:
> > > > >
> > > > > On Tue, Nov 7, 2023 at 4:10 PM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt  
> > > > > > wrote:
> > > > > > >
> > > > > > > analyze_and_compute_bitop_with_inv_effect assumes the first 
> > > > > > > operand is
> > > > > > > loop invariant which is not the case when it's INTEGER_CST.
> > > > > > >
> > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > So this addresses a missed optimization, right?  It seems to me that
> > > > > > even with two SSA names we are only "lucky" when rhs1 is the 
> > > > > > invariant
> > > > > > one.  So instead of swapping this way I'd do
> > > > > Yes, it's a miss optimization.
> > > > > And I think expr_invariant_in_loop_p (loop, match_op[1]) should be
> > > > > enough; if match_op[1] is a loop invariant, it must be false for the
> > > > > below conditions (there couldn't be any header_phi from its
> > > > > definition).
> > > >
> > > > Yes, all I said is that when you now care for op1 being INTEGER_CST
> > > > it could also be an invariant SSA name and thus only after swapping 
> > > > op0/op1
> > > > we could have a successful match, no?
> > > Sorry, the commit message is a little bit misleading.
> > > At first, I just wanted to handle the INTEGER_CST case (with TREE_CODE
> > > (match_op[1]) == INTEGER_CST), but then I realized that this could
> > > probably be extended to the normal SSA_NAME case as well, so I used
> > > expr_invariant_in_loop_p, which should theoretically be able to handle
> > > the SSA_NAME case as well.
> > >
> > > if (expr_invariant_in_loop_p (loop, match_op[1])) is true, w/o
> > > swapping it must return NULL_TREE for below conditions.
> > > if (expr_invariant_in_loop_p (loop, match_op[1])) is false, w/
> > > swapping it must return NULL_TREE too.
> > > So it can cover the both cases you mentioned, no need for a loop to
> > > iterate 2 match_ops for all conditions.
> >
> > Sorry if it appears we're going in circles ;)
> >
> > > 3692  if (TREE_CODE (match_op[1]) != SSA_NAME
> > > 3693  || !expr_invariant_in_loop_p (loop, match_op[0])
> > > 3694  || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT 
> > > (match_op[1])))
> >
> > but this only checks match_op[1] (an SSA name at this point) for being 
> > defined
> > by the header PHI.  What if expr_invariant_in_loop_p (loop, mach_op[1])
> > and header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[0]))
> > which I think can happen when both ops are SSA name?
> The whole condition is like
>
> 3692  if (TREE_CODE (match_op[1]) != SSA_NAME
> 3693  || !expr_invariant_in_loop_p (loop, match_op[0])
> 3694  || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT 
> (match_op[1])))
> 3695  || gimple_bb (header_phi) != loop->header  - This would
> be true if match_op[1] is SSA_NAME and expr_invariant_in_loop_p

But it could be expr_invariant_in_loop_p (match_op[1]) and
header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[0]))

all I say is that for two SSA names we could not match the condition
(aka not fail)
when we swap op0/op1.  Not only when op1 is INTEGER_CST.

> 3696  || gimple_phi_num_args (header_phi) != 2)
>
> If expr_invariant_in_loop_p (loop, mach_op[1]) is true and it's an SSA_NAME
> according to code in expr_invariant_in_loop_p, def_bb of gphi is
> either NULL or does not belong to this loop; either case will make
> gimple_bb (header_phi) != loop->header true.
>
> 1857  if (TREE_CODE (expr) == SSA_NAME)
> 1858{
> 1859  def_bb = gimple_bb (SSA_NAME_DEF_STMT (expr));
> 1860  if (def_bb
> 1861  && flow_bb_inside_loop_p (loop, def_bb))  -- def_bb is
> NULL or it doesn't belong to the loop
> 1862return false;
> 1863
> 1864  return true;
> 1865}
> 1866
> 1867  if (!EXPR_P (expr))
>
> >
> > The only canonicalization we have is that constant operands are put second 
> > so
> > it would have been more natural to write the matching with the other operand
> > order (but likely you'd have been unlucky for the existing testcases then).
> >
> > > 3695  || gimple_bb (header_phi) != loop->header
> > > 3696  || gimple_phi_num_args (header_phi) != 2)
> > > 3697return NULL_TREE;
> > > 3698
> > > 3699  if (PHI_ARG_DEF_FROM_EDGE (header_phi, loop_latch_edge (loop)) != 
> > > phidef)
> > > 3700return NULL_TREE;
> > >
> > >
> > > >
> > > > > >
> > > > > >  unsigned i;
> > > > > >  for (i = 0; i < 2; ++i)
> > > > > >if (TREE_CODE (match_op[i]) == SSA_NAME
> > > > > >&& ...)
> > > > > > break; /* found! */
> > > > > >
> > > > > >   if (i == 2)
> > > > > > return NULL_TREE;
> > > > > >   if (i == 0

Re: [PATCH] Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]

2023-11-10 Thread Richard Biener
On Fri, 10 Nov 2023, Jakub Jelinek wrote:

> On Fri, Nov 10, 2023 at 08:09:26AM +, Richard Biener wrote:
> > > The following patch adds 6 new type-generic builtins,
> > > __builtin_clzg
> > > __builtin_ctzg
> > > __builtin_clrsbg
> > > __builtin_ffsg
> > > __builtin_parityg
> > > __builtin_popcountg
> > > The g at the end stands for generic because the unsuffixed variant
> > > of the builtins already have unsigned int or int arguments.
> > > 
> > > The main reason to add these is to support arbitrary unsigned (for
> > > clrsb/ffs signed) bit-precise integer types and also __int128 which
> > > wasn't supported by the existing builtins, so that e.g. 
> > > type-generic functions could then support not just bit-precise unsigned
> > > integer type whose width matches a standard or extended integer type,
> > > but others too.
> > > 
> > > None of these new builtins promote their first argument, so the argument
> > > can be e.g. unsigned char or unsigned short or unsigned __int20 etc.
> > 
> > But is that a good idea?  Is that how type generic functions work in C?
> 
> Most current type generic functions deal just with floating point args and
> don't promote there float to double as the ... promotions would normally do:
> __builtin_signbit
> __builtin_fpclassify
> __builtin_isfinite
> __builtin_isinf_sign
> __builtin_isinf
> __builtin_isnan
> __builtin_isnormal
> __builtin_isgreater
> __builtin_isgreaterequal
> __builtin_isless
> __builtin_islessequal
> __builtin_islessgreater
> __builtin_isunordered
> __builtin_iseqsig
> __builtin_issignaling
> 
> __builtin_clear_padding is uninteresting, because the argument must be a
> pointer.
> 
> Then
> __builtin_add_overflow
> __builtin_sub_overflow
> __builtin_mul_overflow
> do promote the first 2 arguments, but that doesn't really matter, because
> all we care about is the argument values, not their type.
> 
> And finally
> __builtin_add_overflow_p
> __builtin_sub_overflow_p
> __builtin_mul_overflow_p
> do promote the first 2 arguments, and don't promote the third one, which is
> the only one where we care about the type, so that is the behavior that
> I've used also for the new builtins.  I think if we added e.g.
> __builtin_classify_type now and not more than 3 decades ago it would behave
> like that too.
> Only not promoting the argument will make it directly usable in the
> stdc_leading_zeros, stdc_leading_ones, stdc_trailing_zeros, 
> stdc_trailing_ones,
> stdc_first_leading_zero, ..., stdc_count_zeros, stdc_count_ones, ...
> C23 stdbit.h type-generic macros, otherwise one would need to play with
> _Generic and special-case there unsigned char and unsigned short (which
> normally promote to int), but e.g. unsigned _BitInt(8) doesn't.

googling doesn't find me stdc_leading_zeros - are those supposed to work
for non-_BitInt types as well and not promote the argument in that
case?

If we are specifically targeting those I wonder why we don't name
the builtins after those?  But yes, if promotion is undesirable
for implementing them then I agree.  IIRC _BitInt(n) is not subject
to integer promotions.

> I expect Joseph will have compatibility version of the macro for when these
> builtins aren't supported, but given the standard says that {un,}signed 
> _BitInt
> with width matching standard/extended integer types other than bool needs to
> be handled also, either it will not use _Generic at all and just go after
> sizeof (argument), or maybe will use _Generic and for the default case will
> go after sizeof.  Seems clang returns -1 for __builtin_classify_type (0uwb)
> rather than 18 GCC returns, so one can't portably distinguish bitints.
> 
> > I think it introduces non-obvious/unexpected behavior in user code.
> 
> If we keep the patch behavior of requiring unsigned
> standard/extended/bit-precise types other than bool for the
> clz/ctz/parity/popcount cases, the choice is between always erroring on
> __builtin_clzg ((unsigned char) 1) - where the promoted argument is signed,
> and accepting it as unsigned char case without promotion, so I think users
> would be more confused to see an error on that.
> If we'd switch to accepting both signed and unsigned
> standard/extended/bit-precise integer types other than bool for all the
> builtins, whether we promote or not doesn't matter for ctz/parity/popcount
> but does for clz.
> The clrsb/ffs cases accept signed argument on the other side, so both
> promoted and unpromoted argument would mean something and be accepted,
> in the ffs case it again doesn't really matter for the result, but for clrsb
> is significant.
> Would it help to just document it that the argument isn't promoted?
> 
> We document that for __builtin_*_overflow_p:
> "The value of the third argument is ignored, just the side effects in the 
> third argument
> are evaluated, and no integral argument promotions are performed on the last
> argument."
> 
> > If people do not want to "compensate" for this maybe insted also add
> > __builtin_*{8,16} (like w

[PATCH-3v4, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-10 Thread HAO CHEN GUI
Hi,
  Originally a 16-byte memory-to-memory move was expanded via a pattern.
expand_block_move does an optimization on P8 LE to leverage V2DI reversed
load/store for memory-to-memory moves. Now it's done by a 16-byte by-pieces
move and the optimization is lost. This patch adds an insn_and_split
pattern to retake the optimization.

  Compared to the previous version, the main change is to remove volatile
memory operands check from the insn condition as it's no need.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Fix regression cases caused by 16-byte by-pieces move

The previous patch enables 16-byte by-pieces moves.  Originally a 16-byte
move was implemented via a pattern.  expand_block_move does an optimization
on P8 LE to leverage V2DI reversed load/store for memory-to-memory moves.
Now a 16-byte move is implemented via a by-pieces move and finally split
into two DI load/stores.  This patch creates an insn_and_split pattern to
retake the optimization.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..26fa32829af 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN
+   && TARGET_VSX
+   && !TARGET_P9_VECTOR
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src =  adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

Hi Richard,

On 2023/11/8 17:40, Richard Sandiford wrote:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.


Yes, I think the init-regs should be enhanced to reduce unnecessary 
initialization. My previous internal patches did this in a separate 
patch. Maybe I should split the live_subreg problem out of the second 
patch and not couple it with these patches. That way it can be reviewed 
separately.



I agree with Richi of course that compile-time is a concern.
The patch seems to add quite a bit of new data to ira_allocno,
but perhaps that's OK.  ira_object + ira_allocno is already quite big.

However:

@@ -387,8 +398,8 @@ struct ira_allocno
/* An array of structures describing conflict information and live
   ranges for each object associated with the allocno.  There may be
   more than one such object in cases where the allocno represents a
- multi-word register.  */
-  ira_object_t objects[2];
+ multi-hardreg pseudo.  */
+  std::vector<ira_object_t> objects;
/* Registers clobbered by intersected calls.  */
 HARD_REG_SET crossed_calls_clobbered_regs;
/* Array of usage costs (accumulated and the one updated during

adds an extra level of indirection (and separate extra storage) for
every allocno, not just multi-hardreg ones.  It'd be worth optimising
the data structures' representation of single-hardreg pseudos even if
that slows down the multi-hardreg code, since single-hardreg pseudos are
so much more common.  And the different single-hardreg and multi-hardreg
representations could be hidden behind accessors, to make life easier
for consumers.  (Of course, performance of the accessors is also then
an issue. :))


Okay, I'll try. Thank you so much.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding




On 2023/11/8 11:55, juzhe.zh...@rivai.ai wrote:

Thanks Lehua.

Appreciate for supporting subreg liveness tracking with tons of work.

A nit comments, I think you should mention these following PRs:

106694
89967
106146
99161

No need send V2 now. You can send V2 after Richard and Vlad reviewed.

Okay, thanks :)

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] vect: Use statement vectype for conditional mask.

2023-11-10 Thread Richard Biener
On Wed, Nov 8, 2023 at 5:18 PM Robin Dapp  wrote:
>
> Hi,
>
> as Tamar reported in PR112406 we still ICE on aarch64 in SPEC2017
> when creating COND_OPs in ifcvt.
>
> The problem is that we fail to deduce the mask's type from the statement
> vectype and then end up with a non-matching mask in expand.  This patch
> checks if the current op is equal to the mask operand and, if so, uses
> the truth type from the stmt_vectype.  Is that a valid approach?
>
> Bootstrapped and regtested on aarch64, x86 is running.
>
> Besides, the testcase is Tamar's reduced example, originally from
> SPEC.  I hope it's ok to include it as is (as imagick is open source
> anyway).

For the fortran testcase we don't even run into this but hit an
internal def and assert on

  gcc_assert (STMT_VINFO_VEC_STMTS (def_stmt_info).length () == ncopies);

I think this shows missing handling of .COND_* in the bool pattern recognition
as we get the 'bool' condition as boolean data vector rather than a mask.  The
same is true for the testcase with the invariant condition.  This causes us to
select the wrong vector type here.  The "easiest" might be to look at
how COND_EXPR is handled in vect_recog_bool_pattern and friends and
handle .COND_* IFNs the same for the mask operand.

Richard.

> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR middle-end/112406
>
> * tree-vect-stmts.cc (vect_get_vec_defs_for_operand): Handle
> masks of conditional ops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr112406.c: New test.
> ---
>  gcc/testsuite/gcc.dg/pr112406.c | 37 +
>  gcc/tree-vect-stmts.cc  | 20 +-
>  2 files changed, 56 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr112406.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr112406.c b/gcc/testsuite/gcc.dg/pr112406.c
> new file mode 100644
> index 000..46459c68c4a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr112406.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-options "-march=armv8-a+sve -w -Ofast" } */
> +
> +typedef struct {
> +  int red;
> +} MagickPixelPacket;
> +
> +GetImageChannelMoments_image, GetImageChannelMoments_image_0,
> +GetImageChannelMoments___trans_tmp_1, GetImageChannelMoments_M11_0,
> +GetImageChannelMoments_pixel_3, GetImageChannelMoments_y,
> +GetImageChannelMoments_p;
> +
> +double GetImageChannelMoments_M00_0, GetImageChannelMoments_M00_1,
> +GetImageChannelMoments_M01_1;
> +
> +MagickPixelPacket GetImageChannelMoments_pixel;
> +
> +SetMagickPixelPacket(int color, MagickPixelPacket *pixel) {
> +  pixel->red = color;
> +}
> +
> +GetImageChannelMoments() {
> +  for (; GetImageChannelMoments_y; GetImageChannelMoments_y++) {
> +SetMagickPixelPacket(GetImageChannelMoments_p,
> + &GetImageChannelMoments_pixel);
> +GetImageChannelMoments_M00_1 += GetImageChannelMoments_pixel.red;
> +if (GetImageChannelMoments_image)
> +  GetImageChannelMoments_M00_1++;
> +GetImageChannelMoments_M01_1 +=
> +GetImageChannelMoments_y * GetImageChannelMoments_pixel_3;
> +if (GetImageChannelMoments_image_0)
> +  GetImageChannelMoments_M00_0++;
> +GetImageChannelMoments_M01_1 +=
> +GetImageChannelMoments_y * GetImageChannelMoments_p++;
> +  }
> +  GetImageChannelMoments___trans_tmp_1 = atan(GetImageChannelMoments_M11_0);
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 65883e04ad7..6793b01bf44 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1238,10 +1238,28 @@ vect_get_vec_defs_for_operand (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
>tree vector_type;
>
> +  /* For a COND_OP the mask operand's type must not be deduced from the
> +scalar type but from the statement's vectype.  */
> +  bool use_stmt_vectype = false;
> +  gcall *call;
> +  if ((call = dyn_cast <gcall *> (STMT_VINFO_STMT (stmt_vinfo)))
> + && gimple_call_internal_p (call))
> +   {
> + internal_fn ifn = gimple_call_internal_fn (call);
> + int mask_idx = -1;
> + if (ifn != IFN_LAST
> + && (mask_idx = internal_fn_mask_index (ifn)) != -1)
> +   {
> + tree maskop = gimple_call_arg (call, mask_idx);
> + if (op == maskop)
> +   use_stmt_vectype = true;
> +   }
> +   }
> +
>if (vectype)
> vector_type = vectype;
>else if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
> -  && VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
> +  && (use_stmt_vectype || VECTOR_BOOLEAN_TYPE_P (stmt_vectype)))
> vector_type = truth_type_for (stmt_vectype);
>else
> vector_type = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE 
> (op));
> --
> 2.41.0


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

Hi Jeff,

On 2023/11/9 3:13, Jeff Law wrote:
The other thing to ponder.  Jivan and I have been banging on Joern's 
sub-object tracking bits for a totally different problem in the RISC-V 
space.  But there may be some overlap.


Essentially Joern's code tracks liveness for a few chunks in registers: 
bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
propagating liveness from the destination through to the sources.  So 
for example if we have


(set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))

If we had previously determined that only bits 0..15 were live in DEST, 
then we'll propagate that into the source registers.


The goal is to ultimately transform something like

(set (dest:mode) (any_extend:mode (reg:narrower_mode)))

into

(set (dest:mode) (subreg:mode (reg:narrower_mode)))

Where the latter typically will get simplified and propagated away.
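A toy model may make the propagation concrete. The sketch below (all names invented for illustration) treats liveness as a mask of live bits per register and propagates it backwards through a plus, exploiting the fact that carries in addition only flow from low bits to high bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of sub-object liveness: a mask of live bits per
   register.  For (set dest (plus src1 src2)), a source bit can only
   influence destination bits at the same or a higher position, so the
   live mask of each source is the destination mask extended down to
   bit 0 (a prefix mask up to the highest live destination bit).  */
static uint64_t
propagate_plus_live_bits (uint64_t dest_live)
{
  if (dest_live == 0)
    return 0;
  int msb = 63;
  while (!(dest_live & (1ULL << msb)))
    msb--;
  return msb == 63 ? ~0ULL : ((1ULL << (msb + 1)) - 1);
}
```

Once such masks show that no bits above the narrow mode are live, the (any_extend ...) can be rewritten as a (subreg ...) as described above.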


Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
it from a correctness standpoint.  It'll also need the usual cleanups.


Anyway, point being I think it'll be worth looking at Lehua's bits and 
Joern's bits to see if there's anything that can and should be shared. 
Given I'm getting fairly familiar with Joern's bits, that likely falls 
to me.


Maybe the subreg live range tracking classes (in patch 2) could be shared,
including the range union, difference, and other operations, which should be
similar.  I'll see whether I can extract that part into a separate patch for
review.  What do you think?


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]

2023-11-10 Thread Jakub Jelinek
On Fri, Nov 10, 2023 at 09:19:14AM +, Richard Biener wrote:
> > Only not promoting the argument will make it directly usable in the
> > stdc_leading_zeros, stdc_leading_ones, stdc_trailing_zeros, 
> > stdc_trailing_ones,
> > stdc_first_leading_zero, ..., stdc_count_zeros, stdc_count_ones, ...
> > C23 stdbit.h type-generic macros, otherwise one would need to play with
> > _Generic and special-case there unsigned char and unsigned short (which
> > normally promote to int), but e.g. unsigned _BitInt(8) doesn't.
> 
> googling doesn't find me stdc_leading_zeros - are those supposed to work
> for non-_BitInt types as well and don't promote the argument in that
> case?

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
is the C23 draft the C23 wiki points at.

E.g.

#include <stdbit.h>
unsigned int stdc_leading_zeros_uc(unsigned char value);
unsigned int stdc_leading_zeros_us(unsigned short value);
unsigned int stdc_leading_zeros_ui(unsigned int value);
unsigned int stdc_leading_zeros_ul(unsigned long value);
unsigned int stdc_leading_zeros_ull(unsigned long long value);
generic_return_type stdc_leading_zeros(generic_value_type value);

Returns
Returns the number of consecutive 0 bits in value, starting from
the most significant bit.
The type-generic function (marked by its generic_value_type argument)
returns the appropriate value based on the type of the input value,
so long as it is a:
— standard unsigned integer type, excluding bool;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard
  or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large unsigned integer
type capable of representing the computed result.

My understanding is that because unsigned char and unsigned short
are standard unsigned integer types, it ought to support those too,
not diagnose them as invalid, and shall return the number of consecutive
0 bits in them (which differs between unsigned char and int unless
unsigned char has the same precision as int).
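The promotion difference is easy to demonstrate with a portable stand-in. The helper below (hypothetical name) counts leading zeros within an explicit width, which is what a non-promoting __builtin_clzg would compute for a narrow type; a promoting builtin would first turn (unsigned char)1 into int and yield 31 on a 32-bit-int target instead of 7:

```c
#include <assert.h>

/* Count leading zeros of VALUE within a WIDTH-bit field; a portable
   stand-in for what a non-promoting __builtin_clzg would compute on
   e.g. unsigned char (WIDTH 8) or unsigned _BitInt(8).  */
static unsigned int
leading_zeros_at_width (unsigned long long value, unsigned int width)
{
  unsigned int n = 0;
  for (unsigned int bit = width; bit-- > 0; n++)
    if (value & (1ULL << bit))
      break;
  return n;
}
```

For (unsigned char)1 the stdc_leading_zeros semantics quoted above require 7, not the 31 a promoting clz would give; that gap is exactly what a non-promoting __builtin_clzg closes.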

> If we are specifically targeting those I wonder why we don't name
> the builtins after those?  But yes, if promotion is undesirable
> for implementing them then I agree.  IIRC _BitInt(n) is not subject
> to integer promotions.

Because the builtins are just something matching the behavior of the existing
builtins, which can be used for those macros, not an exact implementation of
those.  E.g. while
#define stdc_leading_zeros(value) \
((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) 
~(__typeof (value)) 0)))
implements (I believe) stdc_leading_zeros above, the second argument to the
builtin could be something different needed for other cases, e.g. -1 if one
wants to implement ffs-like behavior on unsigned argument, and e.g.
stdc_leading_ones would be implemented probably like:
#define stdc_leading_ones(value) \
((unsigned int) __builtin_clzg ((__typeof (value)) ~(value), 
__builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
Or
#define stdc_first_trailing_one(value) \
((unsigned int) (__builtin_ctzg (value, -1) + 1))
vs.
#define stdc_trailing_zeros(value) \
((unsigned int) __builtin_ctzg (value, __builtin_popcountg ((__typeof (value)) 
~(__typeof (value)) 0)))
No need to add 14 new type-generic builtins, we can just add the building
blocks to implement those.
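A quick way to sanity-check the building-block approach is to model the builtins with plain loops at a fixed 8-bit width; __builtin_popcountg ((T)~(T)0) then plays the role of "width of T", and the two-argument ctz supplies the result for a zero input. A sketch (helper names invented):

```c
#include <assert.h>

/* Loop-based stand-in for __builtin_popcountg, fixed at 8 bits.  */
static unsigned int
popcount8 (unsigned char v)
{
  unsigned int n = 0;
  for (; v; v &= v - 1)
    n++;
  return n;
}

/* Two-argument ctz: FALLBACK is returned for a zero input, matching
   the proposed optional second argument of __builtin_ctzg.  */
static unsigned int
ctz8 (unsigned char v, unsigned int fallback)
{
  if (v == 0)
    return fallback;
  unsigned int n = 0;
  while (!(v & 1))
    {
      v >>= 1;
      n++;
    }
  return n;
}

/* stdc_trailing_zeros for unsigned char: the width is computed as
   popcount of ~0, mirroring the macro expansion in the mail above.  */
static unsigned int
trailing_zeros_uc (unsigned char v)
{
  return ctz8 (v, popcount8 ((unsigned char) ~(unsigned char) 0));
}
```

trailing_zeros_uc (0) yields 8, the type's width, as stdc_trailing_zeros requires, while an stdc_first_trailing_one-style macro would instead pass -1 as the fallback and add 1.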

Jakub



Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread Richard Biener
On Thu, 9 Nov 2023, 钟居哲 wrote:

> Hi, Richard.
> 
> >> I think it would be better to split out building a tree from VF from both
> >> arms and avoid using 'vf' when LOOP_VINFO_USING_SELECT_VL_P.
> 
> I am trying to split out building tree from both arms as you suggested..
> Could you take a look the following codes ?
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..24a86187d11 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10315,19 +10315,47 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>/* iv_loop is the loop to be vectorized. Generate:
> vec_step = [VF*S, VF*S, VF*S, VF*S]  */
>gimple_seq seq = NULL;
> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> +  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
>   {
> -   expr = build_int_cst (integer_type_node, vf);
> -   expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> +   /* When we're using loop_len produced by SELECT_VL, the non-final
> +  iterations are not always processing VF elements.  So vectorize
> +  induction variable instead of
> +
> +_21 = vect_vec_iv_.6_22 + { VF, ... };
> +
> +  We should generate:
> +
> +   _35 = .SELECT_VL (ivtmp_33, VF);
> +   vect_cst__22 = [vec_duplicate_expr] _35;
> +   _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
> +   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> +   tree len
> + = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
> +   expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
> +  unshare_expr (len)),
> +&seq, true, NULL_TREE);
>   }
>else
> - expr = build_int_cst (TREE_TYPE (step_expr), vf);
> + {
> +   bool float_p = SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr));
> +   expr = build_int_cst (float_p ? integer_type_node
> + : TREE_TYPE (step_expr),
> + vf);
> +   if (float_p)
> + expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> + }
> +

I meant you keep the existing flow in the function, specifically
I think you should handle SCALAR_FLOAT_TYPE_P like it was previously
handled, just build 'vf' in the dynamic way.

>new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
>  expr, step_expr);
>if (seq)
>   {
> -   new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> -   gcc_assert (!new_bb);
> +   if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> + gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
> +   else
> + {
> +   new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> +   gcc_assert (!new_bb);
> + }
>   }
>  }
>  
> @@ -10335,9 +10363,9 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>gcc_assert (CONSTANT_CLASS_P (new_name)
> || TREE_CODE (new_name) == SSA_NAME);
>new_vec = build_vector_from_val (step_vectype, t);
> -  vec_step = vect_init_vector (loop_vinfo, stmt_info,
> -new_vec, step_vectype, NULL);
> -
> +  vec_step
> += vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype,
> + LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? &si : NULL);

again this makes the flow hard to follow.  I suppose refactoring this
overall to

if (nested_in_vect_loop)
...
else if (LOOP_VINFO_USING_SELECT_VL_P (..))
...
else
...

and duplicate this tail into the cases makes it easier to follow.

For nested_in_vect_loop we never have LOOP_VINFO_USING_SELECT_VL_P?

Richard.


> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-09 20:16
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford; rguenther; kito.cheng; kito.cheng
> Subject: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization 
> for RVV
> On Wed, Nov 8, 2023 at 11:53 AM Juzhe-Zhong  wrote:
> >
> > PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
> >
> > SELECT_VL result is not necessary always VF in non-final iteration.
> >
> > Current GIMPLE IR is wrong:
> >
> > # vect_vec_iv_.21_25 = PHI <_24(4), { 0, 1, 2, ... }(3)>
> > ...
> > _24 = vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... };
> >
> > After this patch which is correct for SELECT_VL:
> >
> > # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
> > ...
> > _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
> > _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
> >
> > kito, could you give more explanation ?
> >
> > PR middle-end/112438
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vectorizable_induction): Fix bug.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/pr112438.c: New test.
> >
> > ---
> >  .../gcc.target/riscv/rvv/autovec/pr112438.c   | 35 +
> >  gcc/tree-vect-loop.cc | 39 +++
> >  2 files changed, 67 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c 
> > b/gcc/testsuite/gc

RE: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Kyrylo Tkachov
Hi Wilco,

> -Original Message-
> From: Wilco Dijkstra 
> Sent: Monday, November 6, 2023 12:12 PM
> To: GCC Patches 
> Cc: Richard Sandiford ; Richard Earnshaw
> 
> Subject: Re: [PATCH] AArch64: Cleanup memset expansion
> 
> ping
> 
> Cleanup memset implementation.  Similar to memcpy/memmove, use an
> offset and
> bytes throughout.  Simplify the complex calculations when optimizing for size
> by using a fixed limit.
> 
> Passes regress/bootstrap, OK for commit?
> 

This looks like a good cleanup but I have a question...

> gcc/ChangeLog:
>     * config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove
> function.
>     (aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
>     (aarch64_expand_setmem): Clean up implementation, use byte offsets,
>     simplify size calculation.
> 
> ---
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index
> e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8..578a253d6e0e133e1959255
> 3fc873b3e73f9f218 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25229,15 +25229,6 @@ aarch64_move_pointer (rtx pointer, poly_int64
> amount)
>  next, amount);
>  }
> 
> -/* Return a new RTX holding the result of moving POINTER forward by the
> -   size of the mode it points to.  */
> -
> -static rtx
> -aarch64_progress_pointer (rtx pointer)
> -{
> -  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE
> (pointer)));
> -}
> -
>  /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
> 
>  static void
> @@ -25393,46 +25384,22 @@ aarch64_expand_cpymem (rtx *operands,
> bool is_memmove)
>    return true;
>  }
> 
> -/* Like aarch64_copy_one_block_and_progress_pointers, except for memset
> where
> -   SRC is a register we have created with the duplicated value to be set.  */
> +/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
>  static void
> -aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
> -   machine_mode mode)
> -{
> -  /* If we are copying 128bits or 256bits, we can do that straight from
> - the SIMD register we prepared.  */
> -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> -    {
> -  mode = GET_MODE (src);
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, mode, 0);
> -  /* Emit the memset.  */
> -  emit_insn (aarch64_gen_store_pair (mode, *dst, src,
> -    aarch64_progress_pointer (*dst), 
> src));
> -
> -  /* Move the pointers forward.  */
> -  *dst = aarch64_move_pointer (*dst, 32);
> -  return;
> -    }
> -  if (known_eq (GET_MODE_BITSIZE (mode), 128))
> +aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
> +{
> +  /* Emit explicit store pair instructions for 32-byte writes.  */
> +  if (known_eq (GET_MODE_SIZE (mode), 32))
>  {
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, GET_MODE (src), 0);
> -  /* Emit the memset.  */
> -  emit_move_insn (*dst, src);
> -  /* Move the pointers forward.  */
> -  *dst = aarch64_move_pointer (*dst, 16);
> +  mode = V16QImode;
> +  rtx dst1 = adjust_address (dst, mode, offset);
> +  rtx dst2 = adjust_address (dst, mode, offset + 16);
> +  emit_insn (aarch64_gen_store_pair (mode, dst1, src, dst2, src));
>    return;
>  }
> -  /* For copying less, we have to extract the right amount from src.  */
> -  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
> -
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, mode, 0);
> -  /* Emit the memset.  */
> -  emit_move_insn (*dst, reg);
> -  /* Move the pointer forward.  */
> -  *dst = aarch64_progress_pointer (*dst);
> +  if (known_lt (GET_MODE_SIZE (mode), 16))
> +    src = lowpart_subreg (mode, src, GET_MODE (src));
> +  emit_move_insn (adjust_address (dst, mode, offset), src);
>  }
> 
>  /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
> @@ -25461,7 +25428,7 @@ aarch64_expand_setmem_mops (rtx *operands)
>  bool
>  aarch64_expand_setmem (rtx *operands)
>  {
> -  int n, mode_bits;
> +  int mode_bytes;
>    unsigned HOST_WIDE_INT len;
>    rtx dst = operands[0];
>    rtx val = operands[2], src;
> @@ -25474,104 +25441,70 @@ aarch64_expand_setmem (rtx *operands)
>    || (STRICT_ALIGNMENT && align < 16))
>  return aarch64_expand_setmem_mops (operands);
> 
> -  bool size_p = optimize_function_for_size_p (cfun);
> -
>    /* Default the maximum to 256-bytes when considering only libcall vs
>   SIMD broadcast sequence.  */
>    unsigned max_set_size = 256;
>    unsigned mops_threshold = aarch64_mops_memset_size_threshold;
> 
> +  /* Reduce the maximum size with -Os.  */
> +  if (optimize_function_for_size_p (cfun))
> +    max_set_size = 96;
> +

 This is a new "magic" number in this code. It lo

Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread juzhe.zh...@rivai.ai
Hi, Richard.

>> For nested_in_vect_loop we never have LOOP_VINFO_USING_SELECT_VL_P?
Could you give me an example of a nested loop?
For now, I can't produce a case.
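As a point of reference, the nested_in_vect_loop case corresponds to outer-loop vectorization, e.g. a shape like the following, where the induction lives in the inner loop of a vectorized outer loop (illustrative only; whether the outer loop actually vectorizes depends on flags and target):

```c
#include <assert.h>

#define N 8
#define M 5

/* Outer-loop vectorization candidate: the loop over i can be
   vectorized while j remains a scalar induction nested inside it,
   which is the nested_in_vect_loop situation in
   vectorizable_induction.  */
static void
nested_induction (int a[N])
{
  for (int i = 0; i < N; i++)
    {
      int s = 0;
      for (int j = 0; j < M; j++)   /* nested induction */
        s += j + i;
      a[i] = s;
    }
}
```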

Thanks a lot for the comments, I will try to refactor as you suggested.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-10 17:46
To: 钟居哲
CC: richard.guenther; gcc-patches; richard.sandiford; kito.cheng; kito.cheng
Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable 
vectorization for RVV
On Thu, 9 Nov 2023, 钟居哲 wrote:
 
> Hi, Richard.
> 
> >> I think it would be better to split out building a tree from VF from both
> >> arms and avoid using 'vf' when LOOP_VINFO_USING_SELECT_VL_P.
> 
> I am trying to split out building tree from both arms as you suggested..
> Could you take a look the following codes ?
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..24a86187d11 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10315,19 +10315,47 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>/* iv_loop is the loop to be vectorized. Generate:
> vec_step = [VF*S, VF*S, VF*S, VF*S]  */
>gimple_seq seq = NULL;
> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> +  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
>   {
> -   expr = build_int_cst (integer_type_node, vf);
> -   expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> +   /* When we're using loop_len produced by SELECT_VL, the non-final
> +  iterations are not always processing VF elements.  So vectorize
> +  induction variable instead of
> +
> +_21 = vect_vec_iv_.6_22 + { VF, ... };
> +
> +  We should generate:
> +
> +   _35 = .SELECT_VL (ivtmp_33, VF);
> +   vect_cst__22 = [vec_duplicate_expr] _35;
> +   _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
> +   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> +   tree len
> + = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
> +   expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
> +  unshare_expr (len)),
> +&seq, true, NULL_TREE);
>   }
>else
> - expr = build_int_cst (TREE_TYPE (step_expr), vf);
> + {
> +   bool float_p = SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr));
> +   expr = build_int_cst (float_p ? integer_type_node
> + : TREE_TYPE (step_expr),
> + vf);
> +   if (float_p)
> + expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> + }
> +
 
I meant you keep the existing flow in the function, specifically
I think you should handle SCALAR_FLOAT_TYPE_P like it was previously
handled, just build 'vf' in the dynamic way.
 
>new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
>  expr, step_expr);
>if (seq)
>   {
> -   new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> -   gcc_assert (!new_bb);
> +   if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> + gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
> +   else
> + {
> +   new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> +   gcc_assert (!new_bb);
> + }
>   }
>  }
>  
> @@ -10335,9 +10363,9 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>gcc_assert (CONSTANT_CLASS_P (new_name)
> || TREE_CODE (new_name) == SSA_NAME);
>new_vec = build_vector_from_val (step_vectype, t);
> -  vec_step = vect_init_vector (loop_vinfo, stmt_info,
> -new_vec, step_vectype, NULL);
> -
> +  vec_step
> += vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype,
> + LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? &si : NULL);
 
again this makes the flow hard to follow.  I suppose refactoring this
overall to
 
if (nested_in_vect_loop)
...
else if (LOOP_VINFO_USING_SELECT_VL_P (..))
...
else
...
 
and duplicate this tail into the cases makes it easier to follow.
 
For nested_in_vect_loop we never have LOOP_VINFO_USING_SELECT_VL_P?
 
Richard.
 
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-09 20:16
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford; rguenther; kito.cheng; kito.cheng
> Subject: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization 
> for RVV
> On Wed, Nov 8, 2023 at 11:53 AM Juzhe-Zhong  wrote:
> >
> > PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
> >
> > SELECT_VL result is not necessary always VF in non-final iteration.
> >
> > Current GIMPLE IR is wrong:
> >
> > # vect_vec_iv_.21_25 = PHI <_24(4), { 0, 1, 2, ... }(3)>
> > ...
> > _24 = vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... };
> >
> > After this patch which is correct for SELECT_VL:
> >
> > # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
> > ...
> > _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
> > _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
> >
> > kito, could you give more explanation ?
> >
> > PR middle-end/112438
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vectorizable_induction): Fix bug.
> >

RE: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-11-10 Thread Kyrylo Tkachov
Hi Wilco,

> -Original Message-
> From: Wilco Dijkstra 
> Sent: Monday, November 6, 2023 12:13 PM
> To: GCC Patches ; Richard Sandiford
> 
> Cc: Kyrylo Tkachov 
> Subject: Re: [PATCH] libatomic: Improve ifunc selection on AArch64
> 
> 
> 
> ping
> 
> 
> From: Wilco Dijkstra
> Sent: 04 August 2023 16:05
> To: GCC Patches ; Richard Sandiford
> 
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] libatomic: Improve ifunc selection on AArch64
> 
> 
> Add support for ifunc selection based on CPUID register.  Neoverse N1
> supports
> atomic 128-bit load/store, so use the FEAT_USCAT ifunc like newer Neoverse
> cores.
> 
> Passes regress, OK for commit?
> 
> libatomic/
> config/linux/aarch64/host-config.h (ifunc1): Use CPUID in ifunc
> selection.
> 
> ---
> 
> diff --git a/libatomic/config/linux/aarch64/host-config.h
> b/libatomic/config/linux/aarch64/host-config.h
> index
> 851c78c01cd643318aaa52929ce4550266238b79..e5dc33c030a4bab927874fa6
> c69425db463fdc4b 100644
> --- a/libatomic/config/linux/aarch64/host-config.h
> +++ b/libatomic/config/linux/aarch64/host-config.h
> @@ -26,7 +26,7 @@
> 
>  #ifdef HWCAP_USCAT
>  # if N == 16
> -#  define IFUNC_COND_1 (hwcap & HWCAP_USCAT)
> +#  define IFUNC_COND_1 ifunc1 (hwcap)
>  # else
>  #  define IFUNC_COND_1  (hwcap & HWCAP_ATOMICS)
>  # endif
> @@ -50,4 +50,28 @@
>  #undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
>  #define MAYBE_HAVE_ATOMIC_EXCHANGE_16   1
> 
> +#ifdef HWCAP_USCAT
> +
> +#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
> +#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
> +
> +static inline bool
> +ifunc1 (unsigned long hwcap)
> +{
> +  if (hwcap & HWCAP_USCAT)
> +return true;
> +  if (!(hwcap & HWCAP_CPUID))
> +return false;
> +
> +  unsigned long midr;
> +  asm volatile ("mrs %0, midr_el1" : "=r" (midr));

From what I recall that midr_el1 register is emulated by the kernel and so
userspace software has to check that the kernel supports that emulation
through hwcaps before reading it.
According to
https://www.kernel.org/doc/html/v5.8/arm64/cpu-feature-registers.html you need
to check (getauxval(AT_HWCAP) & HWCAP_CPUID) before doing that read.
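The field extraction itself can be checked against a synthetic register value; 0x413fd0c1 below is a made-up but plausible Neoverse N1 MIDR (implementer 0x41, i.e. 'A' for Arm, part number 0xd0c), used purely for illustration:

```c
#include <assert.h>

#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

/* Decode a MIDR_EL1 value the way the patch does: implementer byte in
   bits 31:24, part number in bits 15:4.  */
static int
is_neoverse_n1 (unsigned long midr)
{
  return MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM (midr) == 0xd0c;
}
```

In the patch itself this decode must stay behind the HWCAP_CPUID check described above, since userspace only sees an emulated MIDR_EL1 when the kernel advertises that capability.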

Thanks,
Kyrill

> +
> +  /* Neoverse N1 supports atomic 128-bit load/store.  */
> +  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
> +return true;
> +
> +  return false;
> +}
> +#endif
> +
>  #include_next 


Re: [PATCH] vect: Look through pattern stmt in fold_left_reduction.

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 10:06 AM Robin Dapp  wrote:
>
> Hi,
>
> more fallout from the COND_OP change was shown in PR112464.
>
> It appears as if we "look through" a statement pattern in
> vect_finish_replace_stmt but not before when we replace the newly
> created vector statement's lhs.  Then the lhs is the statement pattern's
> lhs while in vect_finish_replace_stmt we assert that it's from the
> statement the pattern replaced.
>
> This patch uses vect_orig_stmt on the scalar destination's definition
> so the replaced statement is used everywhere.
>
> Bootstrapped and regtested on x86 and aarch64, riscv testsuite
> unchanged.

OK

> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR tree-optimization/112464
>
> * tree-vect-loop.cc (vectorize_fold_left_reduction): Use
> vect_orig_stmt on scalar_dest_def_info.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr112464.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr112464.c | 10 ++
>  gcc/tree-vect-loop.cc|  2 +-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112464.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr112464.c 
> b/gcc/testsuite/gcc.target/i386/pr112464.c
> new file mode 100644
> index 000..2290c714986
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112464.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile }  */
> +/* { dg-options "-Ofast -mavx512dq -ftrapv" } */
> +
> +long *e;
> +int n, i, err;
> +void fn() {
> +  for (; i < n; i++)
> +if (e[i])
> +  err++;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..f61adad1ca3 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7087,7 +7087,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  opmask, &vec_opmask);
>  }
>
> -  gimple *sdef = scalar_dest_def_info->stmt;
> +  gimple *sdef = vect_orig_stmt (scalar_dest_def_info)->stmt;
>tree scalar_dest = gimple_get_lhs (sdef);
>tree scalar_type = TREE_TYPE (scalar_dest);
>tree reduc_var = gimple_phi_result (reduc_def_stmt);
> --
> 2.41.0


Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-10 Thread HAO CHEN GUI
Hi Richard,

On 2023/11/10 17:06, Richard Biener wrote:
> On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI  wrote:
>>
>> Hi Richard,
>>   Thanks so much for your comments.
>>
>> On 2023/11/9 19:41, Richard Biener wrote:
>>> I'm not sure if the testcase is valid though?
>>>
>>> @defbuiltin{{void} __builtin_return (void *@var{result})}
>>> This built-in function returns the value described by @var{result} from
>>> the containing function.  You should specify, for @var{result}, a value
>>> returned by @code{__builtin_apply}.
>>> @enddefbuiltin
>>>
>>> I don't see __builtin_apply being used here?
>>
>> The prototype of the test case is from "__objc_block_forward" in
>> libobjc/sendmsg.c.
>>
>>   void *args, *res;
>>
>>   args = __builtin_apply_args ();
>>   res = __objc_forward (rcv, op, args);
>>   if (res)
>> __builtin_return (res);
>>   else
>> ...
>>
>> The __builtin_apply_args puts the return values on stack by the alignment.
>> But the forward function can do anything and return a void* pointer.
>> IMHO the alignment might be broken. So I just simplified it to use a
>> void* pointer as the input argument of  "__builtin_return" and skip
>> "__builtin_apply_args".
> 
> But doesn't __objc_forward then break the contract between
> __builtin_apply_args and __builtin_return?
> 
> That said, __builtin_return is a very special function, it's not supposed
> to deal with what you are fixing.  At least I think so.
> 
> IMHO the bug is in __objc_block_forward.

If so, can we document that the memory objects pointed to by the input
argument of __builtin_return have to be aligned?  Then we can force the
alignment in __builtin_return.  The user's function can do anything if GCC
doesn't state that.
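For context, the documented contract ties the three builtins together roughly as below (GCC-specific and ABI-dependent; the 16-byte argument-block size is a guessed upper bound, not something the documentation prescribes):

```c
#include <assert.h>

static int add1 (int x) { return x + 1; }

/* Documented pattern: capture the incoming arguments, re-apply them to
   another function, and return whatever that function returned.
   __builtin_return is only specified for a value produced by
   __builtin_apply.  */
static int
forward (int x)
{
  void *args = __builtin_apply_args ();
  void *res = __builtin_apply ((void (*) ()) add1, args, 16);
  __builtin_return (res);
  return 0;  /* not reached; silences -Wreturn-type */
}
```

The discussion above is about what happens once a foreign pointer, like the result of __objc_forward, is passed in place of a value produced by __builtin_apply; at that point this contract no longer holds.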

Thanks
Gui Haochen

> 
> Richard.
> 
>>
>> Thanks
>> Gui Haochen


Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread juzhe.zh...@rivai.ai
Hi, Richard.

I am sorry for bothering you.  I am trying to understand what you mean.

Is the following code what you want?

  /* Create the vector that holds the step of the induction.  */
  if (nested_in_vect_loop)
{
  /* iv_loop is nested in the loop to be vectorized. Generate:
 vec_step = [S, S, S, S]  */
  new_name = step_expr;
  /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop.  */
  gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
  t = unshare_expr (new_name);
  gcc_assert (CONSTANT_CLASS_P (new_name)
  || TREE_CODE (new_name) == SSA_NAME);
  new_vec = build_vector_from_val (step_vectype, t);
  vec_step
= vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, NULL);
}
  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
{
  /* When we're using loop_len produced by SELECT_VL, the non-final
 iterations are not always processing VF elements.  So vectorize
 induction variable instead of

   _21 = vect_vec_iv_.6_22 + { VF, ... };

 We should generate:

   _35 = .SELECT_VL (ivtmp_33, VF);
   vect_cst__22 = [vec_duplicate_expr] _35;
   _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
  tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
  expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
 unshare_expr (len)),
   &seq, true, NULL_TREE);
  gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
  t = unshare_expr (new_name);
  gcc_assert (CONSTANT_CLASS_P (new_name)
  || TREE_CODE (new_name) == SSA_NAME);
  new_vec = build_vector_from_val (step_vectype, t);
  vec_step
= vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, &si);
}
  else
{
  /* iv_loop is the loop to be vectorized. Generate:
  vec_step = [VF*S, VF*S, VF*S, VF*S]  */
  gimple_seq seq = NULL;
  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
{
  expr = build_int_cst (integer_type_node, vf);
  expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
}
  else
expr = build_int_cst (TREE_TYPE (step_expr), vf);
  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
   expr, step_expr);
  if (seq)
{
  new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
  gcc_assert (!new_bb);
}
  t = unshare_expr (new_name);
  gcc_assert (CONSTANT_CLASS_P (new_name)
  || TREE_CODE (new_name) == SSA_NAME);
  new_vec = build_vector_from_val (step_vectype, t);
  vec_step
= vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, NULL);
}

It seems that the following code:

  t = unshare_expr (new_name);
  gcc_assert (CONSTANT_CLASS_P (new_name)
  || TREE_CODE (new_name) == SSA_NAME);
  new_vec = build_vector_from_val (step_vectype, t);
  vec_step
= vect_init_vector 

appears 3 times.  I am not sure whether this is what you want?

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-10 17:46
To: 钟居哲
CC: richard.guenther; gcc-patches; richard.sandiford; kito.cheng; kito.cheng
Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable 
vectorization for RVV
On Thu, 9 Nov 2023, 钟居哲 wrote:
 
> Hi, Richard.
> 
> >> I think it would be better to split out building a tree from VF from both
> >> arms and avoid using 'vf' when LOOP_VINFO_USING_SELECT_VL_P.
> 
> I am trying to split out building tree from both arms as you suggested..
> Could you take a look the following codes ?
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..24a86187d11 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10315,19 +10315,47 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>/* iv_loop is the loop to be vectorized. Generate:
> vec_step = [VF*S, VF*S, VF*S, VF*S]  */
>gimple_seq seq = NULL;
> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> +  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
>   {
> -   expr = build_int_cst (integer_type_node, vf);
> -   expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> +   /* When we're using loop_len produced by SELECT_VL, the non-final
> +  iterations are not always processing VF elements.  So vectorize
> +  induction variable instead of
> +
> +_21 = vect_vec_iv_.6_22 + { VF, ... };
> +
> +  We should generate:
> +
> +   _35 = .SELECT_VL (ivtmp_33, VF);
> +   vect_cst__22 = [vec_duplicate_expr] _35;
> +   _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
> +   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> +   tree len
> + = vect_get_loop_len

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This RFC patch aims to add AVX10.1 options. After we added -m[no-]evex512
> support, it makes a lot easier to add them comparing to the August version.
> Detail for AVX10 is shown below:
>
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> It describes the Intel Advanced Vector Extensions 10 Instruction Set
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
>
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
> It provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
>
> Our proposal is to take AVX10.1-256 and AVX10.1-512 as two "virtual" ISAs in
> the compiler. AVX10.1-512 will imply AVX10.1-256. They will not enable
> anything at first. At the end of the option handling, we will check whether
> the two bits are set. If AVX10.1-256 is set, we will set the AVX512 related
> ISA bits. AVX10.1-512 will further set EVEX512 ISA bit.
>
> It means that AVX10 options will be separated from the existing AVX512 and the
> newly added -m[no-]evex512 options. AVX10 and AVX512 options will control
> (enable/disable/set vector size) the AVX512 features underneath independently.
> If there’s potential overlap or conflict between AVX10 and AVX512 options,
> some rules are provided to define the behavior, which will be described below.
>
> avx10.1 option will be provided as an alias of avx10.1-256.
>
> In the future, the AVX10 options will imply like this:
>
> AVX10.1-256 < AVX10.1-512
>      ^             ^
>      |             |
>
> AVX10.2-256 < AVX10.2-512
>      ^             ^
>      |             |
>
> AVX10.3-256 < AVX10.3-512
>      ^             ^
>      |             |
>
> Each of them will have its own option to enable/disable the corresponding
> features. The alias avx10.x will also be provided.
>
> As mentioned in the August RFC, since we lean towards the adoption of
> AVX10 instead of AVX512 from now on, we don’t recommend combining the
> AVX10 and legacy AVX512 options.

I wonder whether adoption could be made easier by also providing a
-mavx10[.0] level that removes some of the more obscure sub-ISA requirements
to cover more existing implementations (I'd not add -mavx10.0-512 here).
I'd require only skylake-AVX512 features here, basically all non-KNL AVX512
CPUs should have a "virtual" AVX10 level that allows using that feature set,
restricted to 256 bits so future AVX10-256 implementations can handle it
as well as all existing (and relevant, which excludes KNL) AVX512
implementations.

Otherwise AVX10 is really a hard sell (as AVX512 was originally).

> However, we would like to introduce some
> simple rules for users when it comes to combining them.
>
> 1. Enabling AVX10 and AVX512 on the same command line with different vector
> sizes will lead to a warning message. The compiler will enable AVX10 with the
> longer, i.e., 512-bit, vector size.
>
> If the vector sizes are the same (e.g. -mavx10.1-256 -mavx512f -mno-evex512,
> -mavx10.1-512 -mavx512f), it will be valid with the corresponding vector size.
>
> 2. The -mno-avx10.1 option can’t disable any features enabled by AVX512
> options or impact the vector size, and vice versa. The compiler will emit
> warnings if necessary.
>
> For auto-dispatch support, including function multi-versioning and function
> attribute usage, the behavior will be identical to that of the compiler options.
>
> If you have any questions, feel free to ask in this thread.
>
> Thx,
> Haochen
>
>


[committed] amdgcn: Fix vector min/max ICE (pr112313)

2023-11-10 Thread Andrew Stubbs
I've just committed this patch to fix pr112313 (oops, I forgot to write 
the number in the commit message).


The problem was a missed case in the vector reduction expand code.

Andrew

amdgcn: Fix vector min/max ICE

The DImode min/max instructions need a clobber that SImode does not, so
add the special case to the reduction expand code.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Add clobber to DImode
min/max instructions.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6a2aaefceca..ac299259213 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5514,7 +5514,15 @@ gcn_expand_reduc_scalar (machine_mode mode, rtx src, int 
unspec)
{
  rtx tmp = gen_reg_rtx (mode);
  emit_insn (gen_dpp_move (mode, tmp, in, shift_val));
- emit_insn (gen_rtx_SET (out, gen_rtx_fmt_ee (code, mode, tmp, in)));
+ rtx insn = gen_rtx_SET (out, gen_rtx_fmt_ee (code, mode, tmp, in));
+ if (scalar_mode == DImode)
+   {
+ rtx clobber = gen_rtx_CLOBBER (VOIDmode,
+gen_rtx_REG (DImode, VCC_REG));
+ insn = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (2, insn, clobber));
+   }
+ emit_insn (insn);
}
   else
{


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Richard Sandiford
Lehua Ding  writes:
> Hi Richard,
>
> On 2023/11/8 17:40, Richard Sandiford wrote:
>> Tracking subreg liveness will sometimes expose dead code that
>> wasn't obvious without it.  PR89606 has an example of this.
>> There the dead code was introduced by init-regs, and there's a
>> debate about (a) whether init-regs should still be run and (b) if it
>> should still be run, whether it should use subreg liveness tracking too.
>> 
>> But I think such dead code is possible even without init-regs.
>> So for the purpose of this series, I think the init-regs behaviour
>> in that PR creates a helpful example.
>
> Yes, I think the init-regs should be enhanced to reduce unnecessary 
> initialization. My previous internal patchs did this in a separate 
> patch. Maybe I should split the live_subreg problem out of the second 
> patch and not couple it with these patches. That way it can be reviewed 
> separately.

But my point was that this kind of dead code is possible even without
init-regs.  So I think we should have something that removes the dead
code.  And we can try it on that PR (without changing init-regs).

Thanks,
Richard

>
>> I agree with Richi of course that compile-time is a concern.
>> The patch seems to add quite a bit of new data to ira_allocno,
>> but perhaps that's OK.  ira_object + ira_allocno is already quite big.
>> 
>> However:
>> 
>> @@ -387,8 +398,8 @@ struct ira_allocno
>> /* An array of structures describing conflict information and live
>>ranges for each object associated with the allocno.  There may be
>>more than one such object in cases where the allocno represents a
>> - multi-word register.  */
>> -  ira_object_t objects[2];
>> + multi-hardreg pseudo.  */
>> +  std::vector<ira_object_t> objects;
>> /* Registers clobbered by intersected calls.  */
>>  HARD_REG_SET crossed_calls_clobbered_regs;
>> /* Array of usage costs (accumulated and the one updated during
>> 
>> adds an extra level of indirection (and separate extra storage) for
>> every allocno, not just multi-hardreg ones.  It'd be worth optimising
>> the data structures' representation of single-hardreg pseudos even if
>> that slows down the multi-hardreg code, since single-hardreg pseudos are
>> so much more common.  And the different single-hardreg and multi-hardreg
>> representations could be hidden behind accessors, to make life easier
>> for consumers.  (Of course, performance of the accessors is also then
>> an issue. :))
>
> Okay, I'll try. Thank you so much.


Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Wilco Dijkstra
Hi Kyrill,

> +  /* Reduce the maximum size with -Os.  */
> +  if (optimize_function_for_size_p (cfun))
> +    max_set_size = 96;
> +

>  This is a new "magic" number in this code. It looks sensible, but how 
> did you arrive at it?

We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP
for every 32 bytes, so 96 means 4 instructions for typical sizes (sizes that
are not a multiple of 16 can add one extra instruction).

I checked code size on SPECINT2017, and 96 gave practically identical size.
Using 128 would also be a reasonable -Os value with a very slight size increase,
and 384 looks good for -O2 - however I didn't want to tune these values as this
is a cleanup patch.

Cheers,
Wilco

Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread Richard Biener
On Fri, 10 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richard.
> 
> I am sorry for bothering you. I am trying to understand what you mean.
> 
> Is this following codes that you want ?
> 
>   /* Create the vector that holds the step of the induction.  */
>   if (nested_in_vect_loop)
> {
>   /* iv_loop is nested in the loop to be vectorized. Generate:
>  vec_step = [S, S, S, S]  */
>   new_name = step_expr;
>   /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop.  
> */
>   gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTANT_CLASS_P (new_name)
>   || TREE_CODE (new_name) == SSA_NAME);
>   new_vec = build_vector_from_val (step_vectype, t);
>   vec_step
> = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, 
> NULL);
> }
>   else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> {
>   /* When we're using loop_len produced by SELECT_VL, the non-final
>  iterations are not always processing VF elements.  So vectorize
>  induction variable instead of
> 
>_21 = vect_vec_iv_.6_22 + { VF, ... };
> 
>  We should generate:
> 
>_35 = .SELECT_VL (ivtmp_33, VF);
>vect_cst__22 = [vec_duplicate_expr] _35;
>_21 = vect_vec_iv_.6_22 + vect_cst__22;  */
>   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>   tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
>   expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
>  unshare_expr (len)),
>&seq, true, NULL_TREE);
>   gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTANT_CLASS_P (new_name)
>   || TREE_CODE (new_name) == SSA_NAME);
>   new_vec = build_vector_from_val (step_vectype, t);
>   vec_step
> = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, 
> &si);
> }
>   else
> {
>   /* iv_loop is the loop to be vectorized. Generate:
>   vec_step = [VF*S, VF*S, VF*S, VF*S]  */
>   gimple_seq seq = NULL;
>   if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> {
>   expr = build_int_cst (integer_type_node, vf);
>   expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> }
>   else
> expr = build_int_cst (TREE_TYPE (step_expr), vf);
>   new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
>expr, step_expr);
>   if (seq)
> {
>   new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
>   gcc_assert (!new_bb);
> }
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTANT_CLASS_P (new_name)
>   || TREE_CODE (new_name) == SSA_NAME);
>   new_vec = build_vector_from_val (step_vectype, t);
>   vec_step
> = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, 
> NULL);
> }
> 
> It seems that this following codes:
> 
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTANT_CLASS_P (new_name)
>   || TREE_CODE (new_name) == SSA_NAME);
>   new_vec = build_vector_from_val (step_vectype, t);
>   vec_step
> = vect_init_vector 
> 
> appears 3 times.  I am not sure whether it is the way you want?

I'd avoid that particular bit by having

   gimple_stmt_iterator *si = NULL;

before the if () and set that accordingly only in the
LOOP_VINFO_USING_SELECT_VL_P path.  But otherwise yes.

Richard.

> 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-10 17:46
> To: ???
> CC: richard.guenther; gcc-patches; richard.sandiford; kito.cheng; kito.cheng
> Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable 
> vectorization for RVV
> On Thu, 9 Nov 2023, ??? wrote:
>  
> > Hi, Richard.
> > 
> > >> I think it would be better to split out building a tree from VF from both
> > >> arms and avoid using 'vf' when LOOP_VINFO_USING_SELECT_VL_P.
> > 
> > I am trying to split out building tree from both arms as you suggested..
> > Could you take a look the following codes ?
> > 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 8abc1937d74..24a86187d11 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10315,19 +10315,47 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >/* iv_loop is the loop to be vectorized. Generate:
> > vec_step = [VF*S, VF*S, VF*S, VF*S]  */
> >gimple_seq seq = NULL;
> > -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> > +  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> >   {
> > -   expr = build_int_cst (integer_type_node, vf);
> > -   expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> > +   /* When we're 

RE: [PATCH] tree-optimization/111950 - vectorizer loop copying

2023-11-10 Thread Richard Biener
On Thu, 9 Nov 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, November 9, 2023 11:54 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] tree-optimization/111950 - vectorizer loop copying
> > 
> > On Thu, 9 Nov 2023, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Thursday, November 9, 2023 9:24 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org
> > > > Subject: RE: [PATCH] tree-optimization/111950 - vectorizer loop
> > > > copying
> > > >
> > > > On Thu, 9 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > >   guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > >   edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> > > > > > - guard_to = split_edge (epilog_e);
> > > > > > + guard_to = epilog_e->dest;
> > > > > >   guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, 
> > > > > > guard_to,
> > > > > >skip_vector ? anchor : 
> > > > > > guard_bb,
> > > > > >prob_epilog.invert (),
> > > > > > @@ -3443,8 +3229,30 @@ vect_do_peeling (loop_vec_info
> > > > > > loop_vinfo, tree niters, tree nitersm1,
> > > > > >   if (vect_epilogues)
> > > > > > epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > > > >   edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > - slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv,
> > > > > > guard_e,
> > > > > > - epilog_e);
> > > > > > + gphi_iterator gsi2 = gsi_start_phis (main_iv->dest);
> > > > > > + for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > > > +  !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > +   {
> > > > > > + /* We are expecting all of the PHIs we have on epilog_e
> > > > > > +to be also on the main loop exit.  But sometimes
> > > > > > +a stray virtual definition can appear at epilog_e
> > > > > > +which we can then take as the same on all exits,
> > > > > > +we've removed the LC SSA PHI on the main exit before
> > > > > > +so we wouldn't need to create a loop PHI for it.  */
> > > > > > + if (virtual_operand_p (gimple_phi_result (*gsi))
> > > > > > + && (gsi_end_p (gsi2)
> > > > > > + || !virtual_operand_p (gimple_phi_result 
> > > > > > (*gsi2
> > > > > > +   add_phi_arg (*gsi,
> > > > > > +gimple_phi_arg_def_from_edge (*gsi,
> > epilog_e),
> > > > > > +guard_e, UNKNOWN_LOCATION);
> > > > > > + else
> > > > > > +   {
> > > > > > + add_phi_arg (*gsi, gimple_phi_result (*gsi2),
> > guard_e,
> > > > > > +  UNKNOWN_LOCATION);
> > > > > > + gsi_next (&gsi2);
> > > > > > +   }
> > > > > > +   }
> > > > > > +
> > > > >
> > > > > I've been having some trouble incorporating this change into the
> > > > > early break
> > > > work.
> > > > > My understanding is that here you've removed the lookup that
> > > > > find_guard did and are assuming that the order between the PHI
> > > > > nodes between loop->exit and epilog->exit are the same - sporadic
> > > > > virtual
> > > > operands.
> > > > >
> > > > > But the loop->exit for early break has to materialize all PHI
> > > > > nodes from the main loop into the epilog loop since we need them
> > > > > to restart the
> > > > scalar loop iteration.
> > > > >
> > > > > This means that the number of PHI nodes between the first loop and
> > > > > the second Loop are not the same, so we end up mis-linking phi nodes.
> > > > > i.e. consider this loop
> > > > >
> > > > >
> > https://gist.github.com/Mistuke/65d476b18f991772fdec159a09b81869
> > > >
> > > > I don't see any multi-exits here?  I think you need exactly the same
> > > > PHIs you need for the branch to the epilogue, no?
> > > >
> > >
> > > Ah it's a failing testcase but not one with an early break,
> > >
> > > > If you can point me to a testcase that fails on your branch I can
> > > > try to have a look.
> > >
> > > I've updated the branch refs/users/tnfchris/heads/gcc-14-early-break
> > >
> > > Quite a few tests fail, a simple one is vect-early-break_5.c and
> > > vect-early-break_20.c
> > >
> > > But what you just said above makes me wonder... at the moment we have
> > > differing amounts because we require the loop counters and IVs as PHI
> > > nodes such that vect_update_ivs_after_vectorizer can thread them through
> > > correctly as it searches for PHI nodes.  However for the epilog exit,
> > > those that are not live are not needed.  This is why we get different
> > > counts.
> > >
> > > Maybe.. the solution is that I need to do the same thing as
> > > vectorizable_live_operations in that when
> > > vect_update_ivs_after_vectorizer 

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-11-10 Thread Wilco Dijkstra
Hi Kyrill,

> +  if (!(hwcap & HWCAP_CPUID))
> +    return false;
> +
> +  unsigned long midr;
> +  asm volatile ("mrs %0, midr_el1" : "=r" (midr));

> From what I recall that midr_el1 register is emulated by the kernel and so 
> userspace software
> has to check that the kernel supports that emulation through hwcaps before 
> reading it.
> According to 
> https://www.kernel.org/doc/html/v5.8/arm64/cpu-feature-registers.html you
> need to check getauxval(AT_HWCAP) & HWCAP_CPUID) before doing that read.

That's why I do that immediately before reading midr_el1 - see above.

Cheers,
Wilco

Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread juzhe.zh...@rivai.ai
Thanks a lot. I think I finally understand what you mean now :).

Could you confirm this following codes:?

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8abc1937d74..5615b16bdcd 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10306,10 +10306,39 @@ vectorizable_induction (loop_vec_info loop_vinfo,


   /* Create the vector that holds the step of the induction.  */
+  gimple_stmt_iterator *step_iv_si = NULL;
   if (nested_in_vect_loop)
-/* iv_loop is nested in the loop to be vectorized. Generate:
-   vec_step = [S, S, S, S]  */
-new_name = step_expr;
+{
+  /* iv_loop is nested in the loop to be vectorized. Generate:
+vec_step = [S, S, S, S]  */
+  new_name = step_expr;
+  /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop.  */
+  gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
+}
+  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+{
+  /* When we're using loop_len produced by SELECT_VL, the non-final
+iterations are not always processing VF elements.  So vectorize
+induction variable instead of
+
+  _21 = vect_vec_iv_.6_22 + { VF, ... };
+
+We should generate:
+
+  _35 = .SELECT_VL (ivtmp_33, VF);
+  vect_cst__22 = [vec_duplicate_expr] _35;
+  _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
+  gimple_seq seq = NULL;
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+  tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
+  expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
+unshare_expr (len)),
+  &seq, true, NULL_TREE);
+  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
+  step_expr);
+  gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
+  step_iv_si = &si;
+}
   else
 {
   /* iv_loop is the loop to be vectorized. Generate:
@@ -10336,7 +10365,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
  || TREE_CODE (new_name) == SSA_NAME);
   new_vec = build_vector_from_val (step_vectype, t);
   vec_step = vect_init_vector (loop_vinfo, stmt_info,
-  new_vec, step_vectype, NULL);
+  new_vec, step_vectype, step_iv_si);


   /* Create the following def-use cycle:
@@ -10382,6 +10411,8 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   gimple_seq seq = NULL;
   /* FORNOW. This restriction should be relaxed.  */
   gcc_assert (!nested_in_vect_loop);
+  /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false if ncopies > 1.  */
+  gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));


If it is Ok for you. I am gonna testing it on X86 bootstrap + regtest.

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-10 18:19
To: juzhe.zh...@rivai.ai
CC: Richard Biener; gcc-patches; richard.sandiford; kito.cheng; Kito.cheng
Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable 
vectorization for RVV
On Fri, 10 Nov 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richard.
> 
> I am sorry for bothering you. I am trying to understand what you mean.
> 
> Is this following codes that you want ?
> 
>   /* Create the vector that holds the step of the induction.  */
>   if (nested_in_vect_loop)
> {
>   /* iv_loop is nested in the loop to be vectorized. Generate:
>  vec_step = [S, S, S, S]  */
>   new_name = step_expr;
>   /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop.  
> */
>   gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTANT_CLASS_P (new_name)
>   || TREE_CODE (new_name) == SSA_NAME);
>   new_vec = build_vector_from_val (step_vectype, t);
>   vec_step
> = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, 
> NULL);
> }
>   else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> {
>   /* When we're using loop_len produced by SELECT_VL, the non-final
>  iterations are not always processing VF elements.  So vectorize
>  induction variable instead of
> 
>_21 = vect_vec_iv_.6_22 + { VF, ... };
> 
>  We should generate:
> 
>_35 = .SELECT_VL (ivtmp_33, VF);
>vect_cst__22 = [vec_duplicate_expr] _35;
>_21 = vect_vec_iv_.6_22 + vect_cst__22;  */
>   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>   tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
>   expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
>  unshare_expr (len)),
>&seq, true, NULL_TREE);
>   gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTA

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

On 2023/11/10 18:16, Richard Sandiford wrote:

Lehua Ding  writes:

Hi Richard,

On 2023/11/8 17:40, Richard Sandiford wrote:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.


Yes, I think the init-regs should be enhanced to reduce unnecessary
initialization. My previous internal patchs did this in a separate
patch. Maybe I should split the live_subreg problem out of the second
patch and not couple it with these patches. That way it can be reviewed
separately.


But my point was that this kind of dead code is possible even without
init-regs.  So I think we should have something that removes the dead
code.  And we can try it on that PR (without changing init-regs).


Got it, so we should add a fast dead-code-removal pass after init-regs.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread Richard Biener
On Fri, 10 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Thanks a lot. I think I finally understand what you mean now :).
> 
> Could you confirm this following codes:?
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..5615b16bdcd 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10306,10 +10306,39 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> 
> 
>/* Create the vector that holds the step of the induction.  */
> +  gimple_stmt_iterator *step_iv_si = NULL;
>if (nested_in_vect_loop)
> -/* iv_loop is nested in the loop to be vectorized. Generate:
> -   vec_step = [S, S, S, S]  */
> -new_name = step_expr;
> +{
> +  /* iv_loop is nested in the loop to be vectorized. Generate:
> +vec_step = [S, S, S, S]  */
> +  new_name = step_expr;
> +  /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop.  
> */
> +  gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> +}
> +  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> +{
> +  /* When we're using loop_len produced by SELECT_VL, the non-final
> +iterations are not always processing VF elements.  So vectorize
> +induction variable instead of
> +
> +  _21 = vect_vec_iv_.6_22 + { VF, ... };
> +
> +We should generate:
> +
> +  _35 = .SELECT_VL (ivtmp_33, VF);
> +  vect_cst__22 = [vec_duplicate_expr] _35;
> +  _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
> +  gimple_seq seq = NULL;
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> +  tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 
> 0);
> +  expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
> +unshare_expr (len)),
> +  &seq, true, NULL_TREE);
> +  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
> +  step_expr);
> +  gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
> +  step_iv_si = &si;
> +}
>else
>  {
>/* iv_loop is the loop to be vectorized. Generate:
> @@ -10336,7 +10365,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>   || TREE_CODE (new_name) == SSA_NAME);
>new_vec = build_vector_from_val (step_vectype, t);
>vec_step = vect_init_vector (loop_vinfo, stmt_info,
> -  new_vec, step_vectype, NULL);
> +  new_vec, step_vectype, step_iv_si);
> 
> 
>/* Create the following def-use cycle:
> @@ -10382,6 +10411,8 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>gimple_seq seq = NULL;
>/* FORNOW. This restriction should be relaxed.  */
>gcc_assert (!nested_in_vect_loop);
> +  /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false if ncopies > 1.  
> */
> +  gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> 
> 
> If it is Ok for you. I am gonna testing it on X86 bootstrap + regtest.

Yep.

> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-10 18:19
> To: juzhe.zh...@rivai.ai
> CC: Richard Biener; gcc-patches; richard.sandiford; kito.cheng; Kito.cheng
> Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable 
> vectorization for RVV
> On Fri, 10 Nov 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richard.
> > 
> > I am sorry for bothering you. I am trying to understand what you mean.
> > 
> > Is this following codes that you want ?
> > 
> >   /* Create the vector that holds the step of the induction.  */
> >   if (nested_in_vect_loop)
> > {
> >   /* iv_loop is nested in the loop to be vectorized. Generate:
> >  vec_step = [S, S, S, S]  */
> >   new_name = step_expr;
> >   /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop. 
> >  */
> >   gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> >   t = unshare_expr (new_name);
> >   gcc_assert (CONSTANT_CLASS_P (new_name)
> >   || TREE_CODE (new_name) == SSA_NAME);
> >   new_vec = build_vector_from_val (step_vectype, t);
> >   vec_step
> > = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, 
> > NULL);
> > }
> >   else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> > {
> >   /* When we're using loop_len produced by SELECT_VL, the non-final
> >  iterations are not always processing VF elements.  So vectorize
> >  induction variable instead of
> > 
> >_21 = vect_vec_iv_.6_22 + { VF, ... };
> > 
> >  We should generate:
> > 
> >_35 = .SELECT_VL (ivtmp_33, VF);
> >vect_cst__22 = [vec_duplicate_expr] _35;
> >_21 = vect_vec_iv_.6_22 + vect_cst__22;  */
> >   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> >   tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 
> > 0);
> >   expr 

Re: [2/4] aarch64: Fix tme intrinsic availability

2023-11-10 Thread Richard Sandiford
Andrew Carlotti  writes:
> The availability of tme intrinsics was previously gated at both
> initialisation time (using global target options) and usage time
> (accounting for function-specific target options).  This patch removes
> the check at initialisation time, and also moves the intrinsics out of
> the header file to allow for better error messages (matching the
> existing error messages for SVE intrinsics).
>
> gcc/ChangeLog:
>
>   PR target/112108
>   * config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
>   (aarch64_general_init_builtins): Remove feature check.
>   (aarch64_check_general_builtin_call): New.
>   (aarch64_expand_builtin_tme): Check feature availability.
>   * config/aarch64/aarch64-c.cc (aarch64_check_builtin_call): Add
>   check for non-SVE builtins.
>   * config/aarch64/aarch64-protos.h (aarch64_check_general_builtin_call):
>   New prototype.
>   * config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
>   (__ttest): Remove.
>   (_TMFAILURE_*): Define unconditionally.

My main concern with this is that it makes the functions available
even without including the header file.  That's fine from a namespace
pollution PoV, since the functions are in the implementation namespace.
But it might reduce code portability if GCC allows these ACLE functions
to be used without including the header file, while other compilers
require the header file.

For LS64 we instead used a pragma to trigger the definition of the
functions (requiring aarch64_general_simulate_builtin rather than
aarch64_general_add_builtin).  I think it'd be better to do the same here.

> gcc/testsuite/ChangeLog:
>
>   PR target/112108
>   * gcc.target/aarch64/acle/tme_guard-1.c: New test.
>   * gcc.target/aarch64/acle/tme_guard-2.c: New test.
>   * gcc.target/aarch64/acle/tme_guard-3.c: New test.
>   * gcc.target/aarch64/acle/tme_guard-4.c: New test.
>
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> 11a9ba2256f105d8cb9cdc4d6decb5b2be3d69af..ac0259a892e16adb5b241032ac3df1e7ab5370ef
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1765,19 +1765,19 @@ aarch64_init_tme_builtins (void)
>  = build_function_type_list (void_type_node, uint64_type_node, NULL);
>  
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
> -= aarch64_general_add_builtin ("__builtin_aarch64_tstart",
> += aarch64_general_add_builtin ("__tstart",
>  ftype_uint64_void,
>  AARCH64_TME_BUILTIN_TSTART);
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
> -= aarch64_general_add_builtin ("__builtin_aarch64_ttest",
> += aarch64_general_add_builtin ("__ttest",
>  ftype_uint64_void,
>  AARCH64_TME_BUILTIN_TTEST);
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
> -= aarch64_general_add_builtin ("__builtin_aarch64_tcommit",
> += aarch64_general_add_builtin ("__tcommit",
>  ftype_void_void,
>  AARCH64_TME_BUILTIN_TCOMMIT);
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
> -= aarch64_general_add_builtin ("__builtin_aarch64_tcancel",
> += aarch64_general_add_builtin ("__tcancel",
>  ftype_void_uint64,
>  AARCH64_TME_BUILTIN_TCANCEL);
>  }
> @@ -2034,8 +2034,7 @@ aarch64_general_init_builtins (void)
>if (!TARGET_ILP32)
>  aarch64_init_pauth_hint_builtins ();
>  
> -  if (TARGET_TME)
> -aarch64_init_tme_builtins ();
> +  aarch64_init_tme_builtins ();
>  
>if (TARGET_MEMTAG)
>  aarch64_init_memtag_builtins ();
> @@ -2137,6 +2136,24 @@ aarch64_check_required_extensions (location_t 
> location, tree fndecl,
>gcc_unreachable ();
>  }
>  
> +bool aarch64_check_general_builtin_call (location_t location,
> +  unsigned int fcode)

Formatting trivia: should be a line break after the "bool".  Would be
worth having a comment like:

/* Implement TARGET_CHECK_BUILTIN_CALL for the AARCH64_BUILTIN_GENERAL
   group.  */

"aarch64_general_check_builtin_call" would avoid splitting the name
of the target hook.

Thanks,
Richard

> +{
> +  tree fndecl = aarch64_builtin_decls[fcode];
> +  switch (fcode)
> +{
> +  case AARCH64_TME_BUILTIN_TSTART:
> +  case AARCH64_TME_BUILTIN_TCOMMIT:
> +  case AARCH64_TME_BUILTIN_TTEST:
> +  case AARCH64_TME_BUILTIN_TCANCEL:
> + return aarch64_check_required_extensions (location, fndecl,
> +   AARCH64_FL_TME, false);
> +
> +  default:
> + break;
> +}
> +  return true;
> +}
>  
>  typedef enum
>  {
> @@ -2559,6 +2576,11 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, 
> int fcode)
>  static rtx
>  aarch64_e

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-10 Thread Richard Biener
On Thu, Nov 2, 2023 at 9:43 AM Arsen Arsenović  wrote:
>
> Morning!
>
> This patch is a rebase and slight wording tweak of
> https://inbox.sourceware.org/20231006140501.3370874-1-ar...@aarsen.me
>
> Changes since v2:
> - Elaborate on the libintl requirement on non-glibc hosts, per Andrew's
>   request
>
> Range diff since v2 (since it seems sufficiently readable here):
> @@ gcc/doc/install.texi: which lets GCC output diagnostics in languages
>   English.  Native Language Support is enabled by default if not doing a
>   canadian cross build.  The @option{--disable-nls} option disables NLS@.
>
> ++Note that this functionality requires either libintl (provided by GNU
> ++gettext) or a C standard library that contains support for gettext (such
> ++as the GNU C Library).
> ++@xref{with-included-gettext,,--with-included-gettext} for more
> ++information on the conditions required to get gettext support.
> ++
>  +@item --with-libintl-prefix=@var{dir}
>  +@itemx --without-libintl-prefix
>  +Searches for libintl in @file{@var{dir}/include} and
> @@ gcc/doc/install.texi: which lets GCC output diagnostics in languages
>  +Specifies the type of library to search for when looking for libintl.
>  +@var{type} can be one of @code{auto}, @code{static} or @code{shared}.
>  +
> ++@anchor{with-included-gettext}
>   @item --with-included-gettext
>
>
> OK for trunk?  (granted that a regstrap + hand-test for working
> localization passes - build's ongoing asynchronously)

OK on Monday if there are no objections until then.  Please also
sync with the binutils tree then.

Thanks,
Richard.

> Thanks in advance, have a lovely day.
>
> Arsen Arsenović (2):
>   intl: remove, in favor of out-of-tree gettext
>   *: add modern gettext
>
>  .gitignore |1 +
>  Makefile.def   |   72 +-
>  Makefile.in| 1612 +++
>  config/gettext-sister.m4   |   35 +-
>  config/gettext.m4  |  357 +-
>  config/iconv.m4|  313 +-
>  config/intlmacosx.m4   |   69 +
>  configure  |   44 +-
>  configure.ac   |   44 +-
>  contrib/download_prerequisites |2 +
>  contrib/prerequisites.md5  |1 +
>  contrib/prerequisites.sha512   |1 +
>  gcc/Makefile.in|8 +-
>  gcc/aclocal.m4 |4 +
>  gcc/configure  | 2001 +++-
>  gcc/doc/install.texi   |   72 +-
>  intl/ChangeLog |  306 --
>  intl/Makefile.in   |  264 -
>  intl/README|   21 -
>  intl/VERSION   |1 -
>  intl/aclocal.m4|   33 -
>  intl/bindtextdom.c |  374 --
>  intl/config.h.in   |  280 --
>  intl/config.intl.in|   12 -
>  intl/configure | 8288 
>  intl/configure.ac  |  108 -
>  intl/dcgettext.c   |   59 -
>  intl/dcigettext.c  | 1238 -
>  intl/dcngettext.c  |   60 -
>  intl/dgettext.c|   60 -
>  intl/dngettext.c   |   62 -
>  intl/eval-plural.h |  114 -
>  intl/explodename.c |  192 -
>  intl/finddomain.c  |  195 -
>  intl/gettext.c |   64 -
>  intl/gettextP.h|  224 -
>  intl/gmo.h |  148 -
>  intl/hash-string.h |   59 -
>  intl/intl-compat.c |  151 -
>  intl/l10nflist.c   |  453 --
>  intl/libgnuintl.h  |  341 --
>  intl/loadinfo.h|  156 -
>  intl/loadmsgcat.c  | 1322 -
>  intl/localcharset.c|  398 --
>  intl/localcharset.h|   42 -
>  intl/locale.alias  |   78 -
>  intl/localealias.c |  419 --
>  intl/localename.c  |  772 ---
>  intl/log.c |  104 -
>  intl/ngettext.c|   68 -
>  intl/osdep.c   |   24 -
>  intl/plural-config.h   |1 -
>  intl/plural-exp.c  |  156 -
>  intl/plural-exp.h  |  132 -
>  intl/plural.c  | 1540 --
>  intl/plural.y  |  434 --
>  intl/relocatable.c |  439 --
>  intl/relocatable.h |   67 -
>  intl/textdomain.c  |  142 -
>  libcpp/aclocal.m4  |5 +
>  libcpp/configure   | 2139 -
>  libstdc++-v3/configure |  727 +--
>  62 files changed, 5467 insertions(+), 21441 deletions(-)
>  create mode 100644 config/intlmacosx.m4
>  delete mode 100644 intl/ChangeLog
>  delete mode 100644 intl/Makefile.in
>  delete mode 100644 intl/README
>  delete mode 100644 intl/VERSION
>  delete mode 100644 intl/aclocal.m4
>  delete mode 100644 intl/bindtextdom.c
>  delete mode 100644 intl/config.h.in
>  delete mode 100644 intl/config.intl.in
>  delete mode 100755 intl/configure
>  delete mode 100644 intl/configure.ac
>  delete mode 100644 intl/dcg

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Richard Sandiford
Lehua Ding  writes:
> On 2023/11/10 18:16, Richard Sandiford wrote:
>> Lehua Ding  writes:
>>> Hi Richard,
>>>
>>> On 2023/11/8 17:40, Richard Sandiford wrote:
 Tracking subreg liveness will sometimes expose dead code that
 wasn't obvious without it.  PR89606 has an example of this.
 There the dead code was introduced by init-regs, and there's a
 debate about (a) whether init-regs should still be run and (b) if it
 should still be run, whether it should use subreg liveness tracking too.

 But I think such dead code is possible even without init-regs.
 So for the purpose of this series, I think the init-regs behaviour
 in that PR creates a helpful example.
>>>
>>> Yes, I think the init-regs should be enhanced to reduce unnecessary
>>> initialization. My previous internal patchs did this in a separate
>>> patch. Maybe I should split the live_subreg problem out of the second
>>> patch and not couple it with these patches. That way it can be reviewed
>>> separately.
>> 
>> But my point was that this kind of dead code is possible even without
>> init-regs.  So I think we should have something that removes the dead
>> code.  And we can try it on that PR (without changing init-regs).
>
> Got it, so we should add a fast dead-code removal pass after the init-regs pass.

I'm just not sure how fast it would be, given that it needs the subreg
liveness info.  Could it be done during RA itself, during one of the existing
instruction walks?  E.g. if IRA sees a dead instruction, it could remove it
rather than recording conflict information for it.

Thanks,
Richard


[PATCH] middle-end/112469 - fix missing converts in vec_cond_expr simplification

2023-11-10 Thread Richard Biener
The following avoids type inconsistencies in .COND_op generated by
simplifications of VEC_COND_EXPRs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/112469
* match.pd (cond ? op a : b -> .COND_op (cond, a, b)): Add
missing view_converts.

* gcc.dg/torture/pr112469.c: New testcase.
---
 gcc/match.pd|  8 
 gcc/testsuite/gcc.dg/torture/pr112469.c | 12 
 2 files changed, 16 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112469.c

diff --git a/gcc/match.pd b/gcc/match.pd
index f559bfa4f2b..281c6c087e6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8959,13 +8959,13 @@ and,
(with { tree op_type = TREE_TYPE (@3); }
 (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
 && is_truth_type_for (op_type, TREE_TYPE (@0)))
- (cond_op @0 @1 @2
+ (cond_op @0 (view_convert @1) @2
  (simplify
   (vec_cond @0 @1 (view_convert? (uncond_op@3 @2)))
(with { tree op_type = TREE_TYPE (@3); }
 (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
 && is_truth_type_for (op_type, TREE_TYPE (@0)))
- (cond_op (bit_not @0) @2 @1)
+ (cond_op (bit_not @0) (view_convert @2) @1)
 
 (for uncond_op (UNCOND_UNARY)
  cond_op (COND_LEN_UNARY)
@@ -8974,13 +8974,13 @@ and,
(with { tree op_type = TREE_TYPE (@3); }
 (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
 && is_truth_type_for (op_type, TREE_TYPE (@0)))
- (cond_op @0 @1 @2 @4 @5
+ (cond_op @0 (view_convert @1) @2 @4 @5
  (simplify
   (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
(with { tree op_type = TREE_TYPE (@3); }
 (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
 && is_truth_type_for (op_type, TREE_TYPE (@0)))
- (cond_op (bit_not @0) @2 @1 @4 @5)
+ (cond_op (bit_not @0) (view_convert @2) @1 @4 @5)
 
 /* `(a ? -1 : 0) ^ b` can be converted into a conditional not.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/torture/pr112469.c 
b/gcc/testsuite/gcc.dg/torture/pr112469.c
new file mode 100644
index 000..9978bcd4560
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112469.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+int a, b, c;
+static int *d = &a;
+int e(int f) { return f == 0 ? 1 : f; }
+void g() {
+  a = 1;
+  for (; a <= 8; a++) {
+b = e(*d);
+c = -b;
+  }
+}
-- 
2.35.3


Re: [PATCH v3] libiberty: Use posix_spawn in pex-unix when available.

2023-11-10 Thread Prathamesh Kulkarni
On Thu, 5 Oct 2023 at 00:00, Brendan Shanks  wrote:
>
> Hi,
>
> This patch implements pex_unix_exec_child using posix_spawn when
> available.
>
> This should especially benefit recent macOS (where vfork just calls
> fork), but should have equivalent or faster performance on all
> platforms.
> In addition, the implementation is substantially simpler than the
> vfork+exec code path.
>
> Tested on x86_64-linux.
Hi Brendan,
It seems this patch caused the following regressions on aarch64:

FAIL: g++.dg/modules/bad-mapper-1.C -std=c++17  at line 3 (test for
errors, line )
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2a  at line 3 (test for
errors, line )
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2b  at line 3 (test for
errors, line )
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2b (test for excess errors)

Looking at g++.log:
/home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/modules/bad-mapper-1.C:
error: failed posix_spawnp mapper 'this-will-not-work'
In module imported at
/home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/modules/bad-mapper-1.C:2:1:
unique1.bob: error: failed to read compiled module: No such file or directory
unique1.bob: note: compiled module file is 'gcm.cache/unique1.bob.gcm'
unique1.bob: note: imports must be built before being imported
unique1.bob: fatal error: returning to the gate for a mechanical issue
compilation terminated.

Link to log files:
https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/1159/artifact/artifacts/00-sumfiles/
Could you please investigate ?

Thanks,
Prathamesh
>
> v2: Fix error handling (previously the function would be run twice in
> case of error), and don't use a macro that changes control flow.
>
> v3: Match file style for error-handling blocks, don't close
> in/out/errdes on error, and check close() for errors.
>
> libiberty/
> * configure.ac (AC_CHECK_HEADERS): Add spawn.h.
> (checkfuncs): Add posix_spawn, posix_spawnp.
> (AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
> * configure, config.in: Rebuild.
> * pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.
>
> Signed-off-by: Brendan Shanks 
> ---
>  libiberty/configure.ac |   8 +-
>  libiberty/pex-unix.c   | 168 +
>  2 files changed, 173 insertions(+), 3 deletions(-)
>
> diff --git a/libiberty/configure.ac b/libiberty/configure.ac
> index 0748c592704..2488b031bc8 100644
> --- a/libiberty/configure.ac
> +++ b/libiberty/configure.ac
> @@ -289,7 +289,7 @@ AC_SUBST_FILE(host_makefile_frag)
>  # It's OK to check for header files.  Although the compiler may not be
>  # able to link anything, it had better be able to at least compile
>  # something.
> -AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
> unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
> fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
> sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
> sys/prctl.h)
> +AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
> unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
> fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
> sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
> sys/prctl.h spawn.h)
>  AC_HEADER_SYS_WAIT
>  AC_HEADER_TIME
>
> @@ -412,7 +412,8 @@ funcs="$funcs setproctitle"
>  vars="sys_errlist sys_nerr sys_siglist"
>
>  checkfuncs="__fsetlocking canonicalize_file_name dup3 getrlimit getrusage \
> - getsysinfo gettimeofday on_exit pipe2 psignal pstat_getdynamic 
> pstat_getstatic \
> + getsysinfo gettimeofday on_exit pipe2 posix_spawn posix_spawnp psignal \
> + pstat_getdynamic pstat_getstatic \
>   realpath setrlimit spawnve spawnvpe strerror strsignal sysconf sysctl \
>   sysmp table times wait3 wait4"
>
> @@ -435,7 +436,8 @@ if test "x" = "y"; then
>  index insque \
>  memchr memcmp memcpy memmem memmove memset mkstemps \
>  on_exit \
> -pipe2 psignal pstat_getdynamic pstat_getstatic putenv \
> +pipe2 posix_spawn posix_spawnp psignal \
> +pstat_getdynamic pstat_getstatic putenv \
>  random realpath rename rindex \
>  sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \
>   stpcpy stpncpy strcasecmp strchr strdup \
> diff --git a/libiberty/pex-unix.c b/libiberty/pex-unix.c
> index 33b5bce31c2..336799d1125 100644
> --- a/libiberty/pex-unix.c
> +++ b/libiberty/pex-unix.c
> @@ -58,6 +58,9 @@ extern int errno;
>  #ifdef HAVE_PROCESS_H
> #include <process.h>
>  #endif
> +#ifdef HAVE_SPAWN_H
> +#include <spawn.h>
> +#endif
>
>  #ifdef vfork /* Autoconf may define this to fork for us. */
>  # define VFORK_STRING "fork"
> @@ -559,6 +562

[PATCH] Handle constant CONSTRUCTORs in operand_compare

2023-11-10 Thread Eric Botcazou
Hi,

this teaches operand_compare to compare constant CONSTRUCTORs, which is quite
helpful for so-called fat pointers in Ada, i.e. objects that are semantically
pointers but are represented by structures made up of two pointers.  This is
modeled on the implementation present in the ICF pass.

Bootstrapped/regtested on x86-64/Linux, OK for the mainline?


2023-11-10  Eric Botcazou  

* fold-const.cc (operand_compare::operand_equal_p) :
Deal with nonempty constant CONSTRUCTORs.
(operand_compare::hash_operand) : Hash DECL_FIELD_OFFSET
and DECL_FIELD_BIT_OFFSET for FIELD_DECLs.


2023-11-10  Eric Botcazou  

* gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.

-- 
Eric Botcazou

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 40767736389..332bc8aead2 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -3315,9 +3315,65 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
 flags | OEP_ADDRESS_OF
 | OEP_MATCH_SIDE_EFFECTS);
   case CONSTRUCTOR:
-	/* In GIMPLE empty constructors are allowed in initializers of
-	   aggregates.  */
-	return !CONSTRUCTOR_NELTS (arg0) && !CONSTRUCTOR_NELTS (arg1);
+	{
+	  /* In GIMPLE empty constructors are allowed in initializers of
+	 aggregates.  */
+	  if (!CONSTRUCTOR_NELTS (arg0) && !CONSTRUCTOR_NELTS (arg1))
+	return true;
+
+	  /* See sem_variable::equals in ipa-icf for a similar approach.  */
+	  tree typ0 = TREE_TYPE (arg0);
+	  tree typ1 = TREE_TYPE (arg1);
+
+	  if (TREE_CODE (typ0) != TREE_CODE (typ1))
+	return false;
+	  else if (TREE_CODE (typ0) == ARRAY_TYPE)
+	{
+	  /* For arrays, check that the sizes all match.  */
+	  const HOST_WIDE_INT siz0 = int_size_in_bytes (typ0);
+	  if (TYPE_MODE (typ0) != TYPE_MODE (typ1)
+		  || siz0 < 0
+		  || siz0 != int_size_in_bytes (typ1))
+		return false;
+	}
+	  else if (!types_compatible_p (typ0, typ1))
+	return false;
+
+	  vec<constructor_elt, va_gc> *v0 = CONSTRUCTOR_ELTS (arg0);
+	  vec<constructor_elt, va_gc> *v1 = CONSTRUCTOR_ELTS (arg1);
+	  if (vec_safe_length (v0) != vec_safe_length (v1))
+	return false;
+
+	  /* Address of CONSTRUCTOR is defined in GENERIC to mean the value
+	 of the CONSTRUCTOR referenced indirectly.  */
+	  flags &= ~OEP_ADDRESS_OF;
+
+	  for (unsigned idx = 0; idx < vec_safe_length (v0); ++idx)
+	{
+	  constructor_elt *c0 = &(*v0)[idx];
+	  constructor_elt *c1 = &(*v1)[idx];
+
+	  /* Check that the values are the same...  */
+	  if (c0->value != c1->value
+		  && !operand_equal_p (c0->value, c1->value, flags))
+		return false;
+
+	  /* ... and that they apply to the same field!  */
+	  if (c0->index != c1->index
+		  && (TREE_CODE (typ0) == ARRAY_TYPE
+		  ? !operand_equal_p (c0->index, c1->index, flags)
+		  : !operand_equal_p (DECL_FIELD_OFFSET (c0->index),
+	  DECL_FIELD_OFFSET (c1->index),
+	  flags)
+			|| !operand_equal_p (DECL_FIELD_BIT_OFFSET (c0->index),
+	 DECL_FIELD_BIT_OFFSET (c1->index),
+	 flags)))
+		return false;
+	}
+
+	  return true;
+	}
+
   default:
 	break;
   }
@@ -3703,9 +3759,7 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
 	 elements.  Individual elements in the constructor must be
 	 indexed in increasing order and form an initial sequence.
 
-	 We make no effort to compare constructors in generic.
-	 (see sem_variable::equals in ipa-icf which can do so for
-	  constants).  */
+	 We make no effort to compare nonconstant ones in GENERIC.  */
 	  if (!VECTOR_TYPE_P (TREE_TYPE (arg0))
 	  || !VECTOR_TYPE_P (TREE_TYPE (arg1)))
 	return false;
@@ -3887,7 +3941,13 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
 	/* In GIMPLE the indexes can be either NULL or matching i.  */
 	if (field == NULL_TREE)
 	  field = bitsize_int (idx);
-	hash_operand (field, hstate, flags);
+	if (TREE_CODE (field) == FIELD_DECL)
+	  {
+		hash_operand (DECL_FIELD_OFFSET (field), hstate, flags);
+		hash_operand (DECL_FIELD_BIT_OFFSET (field), hstate, flags);
+	  }
+	else
+	  hash_operand (field, hstate, flags);
 	hash_operand (value, hstate, flags);
 	  }
 	return;
-- { dg-do compile }
-- { dg-options "-O -gnatn -fdump-tree-optimized" }

package body Opt103 is

  function Read return Mode is
S : String := Get;
M : Mode;

  begin
--  There should be a single call to Value_Enumeration_Pos after inlining

if Mode'Valid_Value (S) then
  M := Mode'Value (S);
else
  raise Program_Error;
end if;

return M;
  end;

  function Translate (S : String) return Mode is
M : Mode;

  begin
--  There should be a single call to Value_Enumeration_Pos after inlining

if Mode'Valid_Value (S) then
  M := Mode'Value (S);
else
  raise Program_Error;
end if;

return M;
  end;

end Opt103;

-- { dg-final { scan-tree-dump-times ".value_enumeration_pos" 2 "optimized"  } }
package Opt103 is

  type Mo

[PATCH] aarch64: Call named function in gcc.target/aarch64/aapcs64/ice_1.c

2023-11-10 Thread Florian Weimer
This test looks like it intends to pass a small struct argument
as both a non-variadic and a variadic argument, but due to
the typo, it does not achieve that.

gcc/testsuite/

* gcc.target/aarch64/aapcs64/ice_1.c (foo): Call named.

---
 gcc/testsuite/gcc.target/aarch64/aapcs64/ice_1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/ice_1.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/ice_1.c
index 906ccebf616..edc35db2f6e 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/ice_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/ice_1.c
@@ -16,6 +16,6 @@ void unnamed (int, ...);
 
 void foo ()
 {
-  name (0, );
+  named (0, );
   unnamed (0, );
 }

base-commit: 5f6c5fe078c45bc32c8d21da6b14c27c0ed7be6e



Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Alexander Monakov


On Thu, 9 Nov 2023, Jeff Law wrote:

> > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal
> > operations.  When I started to work on this issue, initially I wanted to try
> > something similar to your idea #2, but when checking the APIs, I realized
> > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as
> > well.  IMHO there is no value to try to schedule this kind of BB (to be
> > scheduled range), skipping it can save some resource allocation (like block
> > dependencies) and make it more efficient (not enter function schedule_block
> > etc.), from this perspective it seems an enhancement.  Does it sound
> > reasonable to you?
> It sounds reasonable, but only if doing so doesn't add significant
> implementation complexity.  ie, the gains from doing less work here are likely
> to be very marginal, so I'm more interested in clean, easy to maintain code.

I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
DEBUG_INSNs participate in dependency graph so that schedulers can remove or
mutate them as needed when moving real insns across them.

Cc'ing Alexandre Oliva who can correct me on that if necessary.

Alexander


Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Richard Earnshaw

On 10/11/2023 10:17, Wilco Dijkstra wrote:

Hi Kyrill,


+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_set_size = 96;
+



 This is a new "magic" number in this code. It looks sensible, but how did 
you arrive at it?


We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP
for every 32 bytes, so the 96 means 4 instructions for typical sizes
(sizes not a multiple of 16 can add one extra instruction).

I checked codesize on SPECINT2017, and 96 had practically identical size.
Using 128 would also be a reasonable Os value with a very slight size increase,
and 384 looks good for O2 - however I didn't want to tune these values as this
is a cleanup patch.

Cheers,
Wilco


Shouldn't this be a param then?  Also, manifest constants in the middle 
of code are a potential nightmare, please move it to a #define (even if 
that's then used as the default value for the param).


Re: [PATCH] vect: Don't set excess bits in unform masks

2023-11-10 Thread Andrew Stubbs

On 23/10/2023 11:43, Richard Biener wrote:

On Fri, 20 Oct 2023, Andrew Stubbs wrote:


This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the
mask enable extra lanes that were supposed to be unused and are therefore
undefined.

Richi suggested an alternative approach involving narrower types and then a
zero-extend to the actual mask type.  This solved the problem for the specific
test case that I had, but I'm not sure if it would work with V2 and V4 modes
(not that I've observed bad behaviour from them anyway, but still).  There
were some other caveats involving "two-lane division" that I don't fully
understand, so I went with the simpler implementation.

This patch does have the disadvantage of an additional "and" instruction in
the non-constant case even for machines that don't need it. I'm not sure how
to fix that without an additional target hook. (If GCC could use the 64-lane
vectors more effectively without the assistance of artificially reduced sizes
then this problem wouldn't exist.)

OK to commit?


-   convert_move (target, op0, 0);
+   rtx tmp = gen_reg_rtx (mode);
+   convert_move (tmp, op0, 0);
+
+   if (known_ne (TYPE_VECTOR_SUBPARTS (type),
+ GET_MODE_PRECISION (mode)))

Usually this would be maybe_ne, but then ...

+ {
+   /* Ensure no excess bits are set.
+  GCN needs this, AVX does not.  */
+   expand_binop (mode, and_optab, tmp,
+ GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
+ .to_constant())) - 1),
+ target, true, OPTAB_DIRECT);

here you have .to_constant ().  I think with having an integer mode
we know subparts is constant so I'd prefer

 auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
 if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
...

+ }
+   else
+ emit_move_insn (target, tmp);

note you need the emit_move_insn also for the expand_binop
path since it's not guaranteed that 'target' is used there.  Thus

   tmp = expand_binop (...)
   if (tmp != target)
 emit_move_insn (...)

Otherwise looks good to me.  The and is needed on x86 for
two and four bit masks, it would be more efficient to use
smaller modes for the sign-extension I guess.


I think this patch addresses these issues. I've confirmed it does the 
right thing on amdgcn.


OK?

Andrew

vect: Don't set excess bits in unform masks

AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.

This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.

gcc/ChangeLog:

* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ed4dbb13d89..3e2a678710d 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7470,7 +7470,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
break;
  }
/* Use sign-extension for uniform boolean vectors with
-  integer modes.  */
+  integer modes.  Effectively "vec_duplicate" for bitmasks.  */
if (!TREE_SIDE_EFFECTS (exp)
&& VECTOR_BOOLEAN_TYPE_P (type)
&& SCALAR_INT_MODE_P (mode)
@@ -7479,7 +7479,19 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
  {
rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
 expand_normal (elt));
-   convert_move (target, op0, 0);
+   rtx tmp = gen_reg_rtx (mode);
+   convert_move (tmp, op0, 0);
+
+   /* Ensure no excess bits are set.
+  GCN needs this for nunits < 64.
+  x86 needs this for nunits < 8.  */
+   auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+   if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+ tmp = expand_binop (mode, and_optab, tmp,
+ GEN_INT ((1 << nunits) - 1), target,
+ true, OPTAB_DIRECT);
+   if (tmp != target)
+ emit_move_insn (target, tmp);
break;
  }
 


Re: [PATCH] vect: Don't set excess bits in unform masks

2023-11-10 Thread Richard Biener
On Fri, 10 Nov 2023, Andrew Stubbs wrote:

> On 23/10/2023 11:43, Richard Biener wrote:
> > On Fri, 20 Oct 2023, Andrew Stubbs wrote:
> > 
> >> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in
> >> the
> >> mask enable extra lanes that were supposed to be unused and are therefore
> >> undefined.
> >>
> >> Richi suggested an alternative approach involving narrower types and then a
> >> zero-extend to the actual mask type.  This solved the problem for the
> >> specific
> >> test case that I had, but I'm not sure if it would work with V2 and V4
> >> modes
> >> (not that I've observed bad behaviour from them anyway, but still).  There
> >> were some other caveats involving "two-lane division" that I don't fully
> >> understand, so I went with the simpler implementation.
> >>
> >> This patch does have the disadvantage of an additional "and" instruction in
> >> the non-constant case even for machines that don't need it. I'm not sure
> >> how
> >> to fix that without an additional target hook. (If GCC could use the
> >> 64-lane
> >> vectors more effectively without the assistance of artificially reduced
> >> sizes
> >> then this problem wouldn't exist.)
> >>
> >> OK to commit?
> > 
> > -   convert_move (target, op0, 0);
> > +   rtx tmp = gen_reg_rtx (mode);
> > +   convert_move (tmp, op0, 0);
> > +
> > +   if (known_ne (TYPE_VECTOR_SUBPARTS (type),
> > + GET_MODE_PRECISION (mode)))
> > 
> > Usually this would be maybe_ne, but then ...
> > 
> > + {
> > +   /* Ensure no excess bits are set.
> > +  GCN needs this, AVX does not.  */
> > +   expand_binop (mode, and_optab, tmp,
> > + GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
> > + .to_constant())) - 1),
> > + target, true, OPTAB_DIRECT);
> > 
> > here you have .to_constant ().  I think with having an integer mode
> > we know subparts is constant so I'd prefer
> > 
> >  auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >  if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
> > ...
> > 
> > + }
> > +   else
> > + emit_move_insn (target, tmp);
> > 
> > note you need the emit_move_insn also for the expand_binop
> > path since it's not guaranteed that 'target' is used there.  Thus
> > 
> >tmp = expand_binop (...)
> >if (tmp != target)
> >  emit_move_insn (...)
> > 
> > Otherwise looks good to me.  The and is needed on x86 for
> > two and four bit masks, it would be more efficient to use
> > smaller modes for the sign-extension I guess.
> 
> I think this patch addresses these issues. I've confirmed it does the right
> thing on amdgcn.
> 
> OK?

OK.

thanks,
Richard.

> Andrew
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [2/4] aarch64: Fix tme intrinsic availability

2023-11-10 Thread Andrew Carlotti
On Fri, Nov 10, 2023 at 10:34:29AM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > The availability of tme intrinsics was previously gated at both
> > initialisation time (using global target options) and usage time
> > (accounting for function-specific target options).  This patch removes
> > the check at initialisation time, and also moves the intrinsics out of
> > the header file to allow for better error messages (matching the
> > existing error messages for SVE intrinsics).
> >
> > gcc/ChangeLog:
> >
> > PR target/112108
> > * config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
> > (aarch64_general_init_builtins): Remove feature check.
> > (aarch64_check_general_builtin_call): New.
> > (aarch64_expand_builtin_tme): Check feature availability.
> > * config/aarch64/aarch64-c.cc (aarch64_check_builtin_call): Add
> > check for non-SVE builtins.
> > * config/aarch64/aarch64-protos.h (aarch64_check_general_builtin_call):
> > New prototype.
> > * config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
> > (__ttest): Remove.
> > (_TMFAILURE_*): Define unconditionally.
> 
> My main concern with this is that it makes the functions available
> even without including the header file.  That's fine from a namespace
> pollution PoV, since the functions are in the implementation namespace.
> But it might reduce code portability if GCC allows these ACLE functions
> to be used without including the header file, while other compilers
> require the header file.
> 
> For LS64 we instead used a pragma to trigger the definition of the
> functions (requiring aarch64_general_simulate_builtin rather than
> aarch64_general_add_builtin).  I think it'd be better to do the same here.

Good point - this is also the same as some simd intrinsic stuff I changed last
year.  I'll fix this in an updated patch, which will then also need a slightly
different version for backporting.

> > gcc/testsuite/ChangeLog:
> >
> > PR target/112108
> > * gcc.target/aarch64/acle/tme_guard-1.c: New test.
> > * gcc.target/aarch64/acle/tme_guard-2.c: New test.
> > * gcc.target/aarch64/acle/tme_guard-3.c: New test.
> > * gcc.target/aarch64/acle/tme_guard-4.c: New test.
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index 
> > 11a9ba2256f105d8cb9cdc4d6decb5b2be3d69af..ac0259a892e16adb5b241032ac3df1e7ab5370ef
> >  100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -1765,19 +1765,19 @@ aarch64_init_tme_builtins (void)
> >  = build_function_type_list (void_type_node, uint64_type_node, NULL);
> >  
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_tstart",
> > += aarch64_general_add_builtin ("__tstart",
> >ftype_uint64_void,
> >AARCH64_TME_BUILTIN_TSTART);
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_ttest",
> > += aarch64_general_add_builtin ("__ttest",
> >ftype_uint64_void,
> >AARCH64_TME_BUILTIN_TTEST);
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_tcommit",
> > += aarch64_general_add_builtin ("__tcommit",
> >ftype_void_void,
> >AARCH64_TME_BUILTIN_TCOMMIT);
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_tcancel",
> > += aarch64_general_add_builtin ("__tcancel",
> >ftype_void_uint64,
> >AARCH64_TME_BUILTIN_TCANCEL);
> >  }
> > @@ -2034,8 +2034,7 @@ aarch64_general_init_builtins (void)
> >if (!TARGET_ILP32)
> >  aarch64_init_pauth_hint_builtins ();
> >  
> > -  if (TARGET_TME)
> > -aarch64_init_tme_builtins ();
> > +  aarch64_init_tme_builtins ();
> >  
> >if (TARGET_MEMTAG)
> >  aarch64_init_memtag_builtins ();
> > @@ -2137,6 +2136,24 @@ aarch64_check_required_extensions (location_t 
> > location, tree fndecl,
> >gcc_unreachable ();
> >  }
> >  
> > +bool aarch64_check_general_builtin_call (location_t location,
> > +unsigned int fcode)
> 
> Formatting trivia: should be a line break after the "bool".  Would be
> worth having a comment like:
> 
> /* Implement TARGET_CHECK_BUILTIN_CALL for the AARCH64_BUILTIN_GENERAL
>group.  */
> 
> "aarch64_general_check_builtin_call" would avoid splitting the name
> of the target hook.
> 
> Thanks,
> Richard
> 
> > +{
> > +  tree fndecl = aarch64_builtin_decls[fcode];
> > +  switch (fcode)
> > +{
> > +  case AARCH64_TME_BUILTIN_TSTART:
> > +  case AARCH64_TME_BUILTIN_TCOMMIT:
> > + 

[PATCH V2] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread Juzhe-Zhong
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

1. The SELECT_VL result is not necessarily VF in non-final iterations.

Current GIMPLE IR is wrong:

# vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
...
_35 = .SELECT_VL (ivtmp_33, VF);
_21 = vect_vec_iv_.8_22 + { VF, ... };

E.g. consider total iterations N = 6 and VF = 4.
The SELECT_VL output is not required to be VF in non-final iterations;
it depends on the hardware implementation.

Suppose we have an RVV CPU core whose vsetvl performs even-distribution
workload optimization: it may process 3 elements in the first iteration and
3 elements in the last iteration.
Then the induction variable update here, _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4,
4], ... };, is wrong: it adds VF, which is 4, but we didn't actually process 4 elements.

It should add 3 elements, the result of SELECT_VL.
So, here the correct IR should be:

  _36 = .SELECT_VL (ivtmp_34, VF);
  _22 = (int) _36;
  vect_cst__21 = [vec_duplicate_expr] _22;
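
The semantics can be modelled in scalar C: each iteration processes however
many elements SELECT_VL hands back, so the correct IV update adds that value
rather than VF.  A minimal sketch (the select_vl policy below is a
hypothetical stand-in for a hardware vsetvl implementation):

```c
#include <assert.h>

#define VF 4

/* Hypothetical "even distribution" policy: 6 remaining elements may be
   split 3 + 3 instead of 4 + 2.  Any value in [1, VF] is legal.  */
static int
select_vl (int remaining)
{
  if (remaining > VF && remaining < 2 * VF)
    return (remaining + 1) / 2;
  return remaining < VF ? remaining : VF;
}

/* Process n elements; return the final value of the element-index IV.  */
static int
run (int n)
{
  int processed = 0;
  int iv = 0;
  while (processed < n)
    {
      int vl = select_vl (n - processed);
      iv += vl;		/* Correct: advance by SELECT_VL's result.  */
      /* iv += VF would be the PR112438 bug: for n == 6 it would leave
	 iv == 8 after two 3-element iterations.  */
      processed += vl;
    }
  return iv;
}
```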

2. This issue only happens with non-SLP vectorization and a single rgroup, since:
   
 if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
{
  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
  if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
  OPTIMIZE_FOR_SPEED)
  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
  && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
  && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
}

3. This issue doesn't appear in nested loops, no matter whether
LOOP_VINFO_USING_SELECT_VL_P is true or false.

Since:

  # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
  # vect_diff_15.7_20 = PHI 
  _19 = vect_vec_iv_.6_5 + { 1, ... };
  vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, 
vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
  ivtmp_1 = ivtmp_4 + 4294967295;
  
   [local count: 6549826]:
  # vect_diff_18.5_11 = PHI 
  # ivtmp_26 = PHI 
  _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
  goto ; [100.00%]

Note that the induction variable IR _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4,
4], ... }; updates the induction variable
independently of VF (it doesn't care how many elements are processed in the
iteration).

The update is loop invariant, so it won't be a problem even if
LOOP_VINFO_USING_SELECT_VL_P is true.
   
Testing passed.  Ok for trunk?

PR tree-optimization/112438

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_induction):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112438.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/pr112438.c   | 33 +++
 gcc/tree-vect-loop.cc | 30 -
 2 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
new file mode 100644
index 000..51f90df38a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model 
-ffast-math -fdump-tree-optimized-details" } */
+
+void
+foo (int n, int *__restrict in, int *__restrict out)
+{
+  for (int i = 0; i < n; i += 1)
+{
+  out[i] = in[i] + i;
+}
+}
+
+void
+foo2 (int n, float * __restrict in, 
+float * __restrict out)
+{
+  for (int i = 0; i < n; i += 1)
+{
+  out[i] = in[i] + i;
+}
+}
+
+void
+foo3 (int n, float * __restrict in, 
+float * __restrict out, float x)
+{
+  for (int i = 0; i < n; i += 1)
+{
+  out[i] = in[i] + i* i;
+}
+}
+
+/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8abc1937d74..b152072c969 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10306,10 +10306,36 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 
 
   /* Create the vector that holds the step of the induction.  */
+  gimple_stmt_iterator *step_iv_si = NULL;
   if (nested_in_vect_loop)
 /* iv_loop is nested in the loop to be vectorized. Generate:
vec_step = [S, S, S, S]  */
 new_name = step_expr;
+  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+{
+  /* When we're using loop_len produced by SELECT_VL, the non-final
+iterations are not always processing VF elements.  So vectorize
+induction variable instead of
+
+  _21 = vect_vec_iv_.6_22 + { VF, ... };
+
+We should generate:
+
+  _35 = .SELECT_VL (ivtmp_33, VF);
+  vect_cst__22 = [vec_duplicate_expr] _35;
+  _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
+  gcc_assert (!slp_node);
+  gimp

Re: [PATCH v3 2/2]middle-end match.pd: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]

2023-11-10 Thread Prathamesh Kulkarni
On Mon, 6 Nov 2023 at 15:50, Tamar Christina  wrote:
>
> Hi All,
>
> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> canonical and allows a target to expand this sequence efficiently.  Such
> sequences are common in scientific code working with gradients.
>
> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
> which I remove since this is a less efficient form.  The testsuite is also
> updated in light of this.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Hi Tamar,
It seems the patch caused following regressions on arm:

Running gcc:gcc.dg/dg.exp ...
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized "ABS_EXPR" 1

Running gcc:gcc.dg/tree-ssa/tree-ssa.exp ...
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= -" 1
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= .COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= ABS_EXPR" 1
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = -" 4
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = ABS_EXPR <" 1
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = \\.COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized "ABS" 1
FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple ".COPYSIGN" 4
FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple "ABS" 4
FAIL: gcc.dg/tree-ssa/phi-opt-24.c scan-tree-dump-not phiopt2 "if"
Link to log files:
https://ci.linaro.org/job/tcwg_gcc_check--master-arm-build/1240/artifact/artifacts/00-sumfiles/

Even for following test-case:
double g (double a)
{
  double t1 = fabs (a);
  double t2 = -t1;
  return t2;
}

It seems, the pattern gets applied but doesn't get eventually
simplified to copysign(a, -1).
forwprop dump shows:
Applying pattern match.pd:1131, gimple-match-4.cc:4134
double g (double a)
{
  double t2;
  double t1;

   :
  t1_2 = ABS_EXPR ;
  t2_3 = -t1_2;
  return t2_3;

}

while on x86_64:
Applying pattern match.pd:1131, gimple-match-4.cc:4134
gimple_simplified to t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
Removing dead stmt:t1_2 = ABS_EXPR ;
double g (double a)
{
  double t2;
  double t1;

   :
  t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
  return t2_3;

}

Thanks,
Prathamesh
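
The identity behind the new match.pd rule can be checked independently of
the vectorizer: -fabs (x) and copysign (x, -1.0) agree for all inputs,
including signed zeros and infinities.  A quick plain-C check (not part of
the patch):

```c
#include <assert.h>
#include <math.h>

/* The form the old canonicalization produced.  */
static double
via_neg_abs (double x)
{
  return -fabs (x);
}

/* The form the patch now produces.  */
static double
via_copysign (double x)
{
  return copysign (x, -1.0);
}
```

Both functions unconditionally set the sign bit; NaN is the only case where
the equivalence can't be asserted with ==, since NaNs compare unequal.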


>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/109154
> * match.pd: Add new neg+abs rule, remove inverse copysign rule.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/109154
> * gcc.dg/fold-copysign-1.c: Updated.
> * gcc.dg/pr55152-2.c: Updated.
> * gcc.dg/tree-ssa/abs-4.c: Updated.
> * gcc.dg/tree-ssa/backprop-6.c: Updated.
> * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> * gcc.target/aarch64/fneg-abs_1.c: New test.
> * gcc.target/aarch64/fneg-abs_2.c: New test.
> * gcc.target/aarch64/fneg-abs_3.c: New test.
> * gcc.target/aarch64/fneg-abs_4.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> db95931df0672cf4ef08cca36085c3aa6831519e..7a023d510c283c43a87b1795a74761b8af979b53
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1106,13 +1106,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (hypots @0 (copysigns @1 @2))
> (hypots @0 @1
>
> -/* copysign(x, CST) -> [-]abs (x).  */
> -(for copysigns (COPYSIGN_ALL)
> - (simplify
> -  (copysigns @0 REAL_CST@1)
> -  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (negate (abs @0))
> -   (abs @0
> +/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
> +
> +(simplify
> + (negate (abs @0))
> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>
>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> index 
> f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6
>  100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> @@ -12,5 +12,5 @@ double bar (double x)
>return __builtin_copysign (x, minuszero);
>  }
>
> -/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
> diff --git a/gcc/testsuite/gcc

RE: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-11-10 Thread Kyrylo Tkachov



> -Original Message-
> From: Wilco Dijkstra 
> Sent: Friday, November 10, 2023 10:23 AM
> To: Kyrylo Tkachov ; GCC Patches  patc...@gcc.gnu.org>; Richard Sandiford 
> Subject: Re: [PATCH] libatomic: Improve ifunc selection on AArch64
> 
> Hi Kyrill,
> 
> > +  if (!(hwcap & HWCAP_CPUID))
> > +return false;
> > +
> > +  unsigned long midr;
> > +  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
> 
> > From what I recall that midr_el1 register is emulated by the kernel and so
> userspace software
> > has to check that the kernel supports that emulation through hwcaps before
> reading it.
> > According to https://www.kernel.org/doc/html/v5.8/arm64/cpu-feature-
> registers.html you
> > need to check getauxval(AT_HWCAP) & HWCAP_CPUID) before doing that
> read.
> 
> That's why I do that immediately before reading midr_el1 - see above.

Errr, yes. Obviously I wasn't fully awake when I looked at it!
Sorry for the noise.
Ok for trunk then.
Kyrill

> 
> Cheers,
> Wilco
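
For reference, the guard under discussion can be sketched in isolation:
midr_el1 is only readable from userspace when the kernel advertises MRS
emulation through HWCAP_CPUID, so the hwcap check must precede the read.
A standalone sketch (the fallback HWCAP_CPUID value is an assumption for
non-aarch64 hosts, and the mrs itself is omitted so the snippet compiles
anywhere):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>

#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11)	/* aarch64 value; assumed on other hosts.  */
#endif

/* Return true iff reading midr_el1 via "mrs %0, midr_el1" is safe from
   userspace; only then would the ifunc selector issue the read.  */
static bool
midr_is_readable (void)
{
  return (getauxval (AT_HWCAP) & HWCAP_CPUID) != 0;
}
```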


Re: [PATCH v3 2/2]middle-end match.pd: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]

2023-11-10 Thread Tamar Christina

Hi Prathamesh,

Yes Arm requires SIMD for copysign. The testcases fail because they don't turn 
on Neon.

I'll update them.

Regards,
Tamar

From: Prathamesh Kulkarni 
Sent: Friday, November 10, 2023 12:24 PM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org ; nd ; 
rguent...@suse.de ; j...@ventanamicro.com 

Subject: Re: [PATCH v3 2/2]middle-end match.pd: optimize fneg (fabs (x)) to 
copysign (x, -1) [PR109154]

On Mon, 6 Nov 2023 at 15:50, Tamar Christina  wrote:
>
> Hi All,
>
> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> canonical and allows a target to expand this sequence efficiently.  Such
> sequences are common in scientific code working with gradients.
>
> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
> which I remove since this is a less efficient form.  The testsuite is also
> updated in light of this.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Hi Tamar,
It seems the patch caused following regressions on arm:

Running gcc:gcc.dg/dg.exp ...
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized "ABS_EXPR" 1

Running gcc:gcc.dg/tree-ssa/tree-ssa.exp ...
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= -" 1
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= .COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= ABS_EXPR" 1
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = -" 4
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = ABS_EXPR <" 1
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = \\.COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized "ABS" 1
FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple ".COPYSIGN" 4
FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple "ABS" 4
FAIL: gcc.dg/tree-ssa/phi-opt-24.c scan-tree-dump-not phiopt2 "if"
Link to log files:
https://ci.linaro.org/job/tcwg_gcc_check--master-arm-build/1240/artifact/artifacts/00-sumfiles/

Even for following test-case:
double g (double a)
{
  double t1 = fabs (a);
  double t2 = -t1;
  return t2;
}

It seems, the pattern gets applied but doesn't get eventually
simplified to copysign(a, -1).
forwprop dump shows:
Applying pattern match.pd:1131, gimple-match-4.cc:4134
double g (double a)
{
  double t2;
  double t1;

   :
  t1_2 = ABS_EXPR ;
  t2_3 = -t1_2;
  return t2_3;

}

while on x86_64:
Applying pattern match.pd:1131, gimple-match-4.cc:4134
gimple_simplified to t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
Removing dead stmt:t1_2 = ABS_EXPR ;
double g (double a)
{
  double t2;
  double t1;

   :
  t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
  return t2_3;

}

Thanks,
Prathamesh


>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/109154
> * match.pd: Add new neg+abs rule, remove inverse copysign rule.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/109154
> * gcc.dg/fold-copysign-1.c: Updated.
> * gcc.dg/pr55152-2.c: Updated.
> * gcc.dg/tree-ssa/abs-4.c: Updated.
> * gcc.dg/tree-ssa/backprop-6.c: Updated.
> * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> * gcc.target/aarch64/fneg-abs_1.c: New test.
> * gcc.target/aarch64/fneg-abs_2.c: New test.
> * gcc.target/aarch64/fneg-abs_3.c: New test.
> * gcc.target/aarch64/fneg-abs_4.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> db95931df0672cf4ef08cca36085c3aa6831519e..7a023d510c283c43a87b1795a74761b8af979b53
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1106,13 +1106,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (hypots @0 (copysigns @1 @2))
> (hypots @0 @1
>
> -/* copysign(x, CST) -> [-]abs (x).  */
> -(for copysigns (COPYSIGN_ALL)
> - (simplify
> -  (copysigns @0 REAL_CST@1)
> -  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (negate (abs @0))
> -   (abs @0
> +/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
> +
> +(simplify
> + (negate (abs @0))
> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>
>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> index 
> f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6
>  100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc

Re: [PATCH] RISC-V: Add combine optimization by slideup for vec_init vectorization

2023-11-10 Thread Robin Dapp
Hi Juzhe,

LGTM.  The test patterns are a bit unwieldy but not a blocker
IMHO.  Could probably be done shorter using macro magic?

Regards
 Robin


[PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-10 Thread Robin Dapp
Hi,

this patch fixes several more FAILs that would only show up in 32-bit runs.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: Adjust.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c:
Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
Ditto.
---
 .../riscv/rvv/autovec/binop/vmul-zvfh-run.c   | 34 -
 .../riscv/rvv/autovec/binop/vsub-zvfh-run.c   | 72 +--
 .../autovec/cond/cond_narrow_shift_run-3.c|  2 +-
 .../riscv/rvv/autovec/cond/pr111401.c |  2 +-
 .../autovec/conversions/vfcvt-itof-zvfh-run.c |  4 +-
 .../autovec/conversions/vfcvt_rtz-zvfh-run.c  |  4 +-
 .../conversions/vfncvt-ftoi-zvfh-run.c| 18 ++---
 .../conversions/vfncvt-itof-template.h| 36 ++
 .../conversions/vfncvt-itof-zvfh-run.c| 31 
 .../rvv/autovec/conversions/vfncvt-zvfh-run.c |  4 +-
 .../conversions/vfwcvt-ftoi-zvfh-run.c| 10 +--
 .../conversions/vfwcvt-itof-zvfh-run.c|  4 +-
 .../rvv/autovec/conversions/vfwcvt-zvfh-run.c | 40 +--
 .../riscv/rvv/autovec/slp-mask-run-1.c|  2 +-
 .../rvv/autovec/ternop/ternop_run_zvfh-1.c|  4 +-
 .../rvv/autovec/ternop/ternop_run_zvfh-10.c   |  4 +-
 .../rvv/autovec/ternop/ternop_run_zvfh-11.c   | 50 ++---
 .../rvv/autovec/ternop/ternop_run_zvfh-12.c   | 49 ++---
 .../rvv/autovec/ternop/ternop_run_zvfh-2.c| 24 ---
 .../rvv/autovec/ternop/ternop_run_zvfh-3.c| 21 +++---
 .../rvv/autovec/ternop/ternop_run_zvfh-4.c|  4 +-
 .../rvv/autovec/ternop/ternop_run_zvfh-5.c| 50 ++---
 .../rvv/autovec/ternop/ternop_run_zvfh-6.c| 50 ++---
 .../rvv/autovec/ternop/ternop_run_zvfh-7.c|  4 +-
 .../rvv/autovec/ternop/ternop_run_zvfh-8.c| 21 +++---
 .../rvv/autovec/ternop/ternop_run_zvfh-9.c| 22 +++---
 .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 30 
 .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   |  2 +-
 .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   |  2 +-
 .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 24 ++-
 .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 34 -
 .../autovec/vls-vlmax/vec_extract-zvfh-run.c  |  4 +-
 .../rvv/autovec/vls-vlmax/vec_set-zvfh-run.c  |  4 +-
 33 files changed, 335 insertions(+), 331 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c
index a4271810e58..1082695c5de 100644
--- a/gcc/testsuite/gcc.target/risc

Re: [PATCH] RISC-V: Fix bug that XTheadMemPair extension caused fcsr not to be saved and restored before and after interrupt.

2023-11-10 Thread Christoph Müllner
On Fri, Nov 10, 2023 at 8:14 AM Jin Ma  wrote:
>
> The t0 register is used as a temporary register for interrupts, so it needs
> special treatment. It is necessary to avoid using "th.ldd" in the interrupt
> program to stop the subsequent operation of the t0 register, so they need to
> exchange positions in the function "riscv_for_each_saved_reg".

RISCV_PROLOGUE_TEMP_REGNUM does indeed need to be treated specially
in the case of ISRs and fcsr. This patch just moves the TARGET_XTHEADMEMPAIR
block after the ISR/fcsr block.

Reviewed-by: Christoph Müllner 

>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the 
> interrupt
> operation before the XTheadMemPair.
> ---
>  gcc/config/riscv/riscv.cc | 56 +--
>  .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
>  2 files changed, 46 insertions(+), 28 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index e25692b86fc..fa2d4d4b779 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
> riscv_save_restore_fn fn,
>   && riscv_is_eh_return_data_register (regno))
> continue;
>
> +  /* In an interrupt function, save and restore some necessary CSRs in 
> the stack
> +to avoid changes in CSRs.  */
> +  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
> + && cfun->machine->interrupt_handler_p
> + && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
> + || (TARGET_ZFINX
> + && (cfun->machine->frame.mask & ~(1 << 
> RISCV_PROLOGUE_TEMP_REGNUM)
> +   {
> + unsigned int fcsr_size = GET_MODE_SIZE (SImode);
> + if (!epilogue)
> +   {
> + riscv_save_restore_reg (word_mode, regno, offset, fn);
> + offset -= fcsr_size;
> + emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
> + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> + offset, riscv_save_reg);
> +   }
> + else
> +   {
> + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> + offset - fcsr_size, riscv_restore_reg);
> + emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
> + riscv_save_restore_reg (word_mode, regno, offset, fn);
> + offset -= fcsr_size;
> +   }
> + continue;
> +   }
> +
>if (TARGET_XTHEADMEMPAIR)
> {
>   /* Get the next reg/offset pair.  */
> @@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
> riscv_save_restore_fn fn,
> }
> }
>
> -  /* In an interrupt function, save and restore some necessary CSRs in 
> the stack
> -to avoid changes in CSRs.  */
> -  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
> - && cfun->machine->interrupt_handler_p
> - && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
> - || (TARGET_ZFINX
> - && (cfun->machine->frame.mask & ~(1 << 
> RISCV_PROLOGUE_TEMP_REGNUM)
> -   {
> - unsigned int fcsr_size = GET_MODE_SIZE (SImode);
> - if (!epilogue)
> -   {
> - riscv_save_restore_reg (word_mode, regno, offset, fn);
> - offset -= fcsr_size;
> - emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
> - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> - offset, riscv_save_reg);
> -   }
> - else
> -   {
> - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> - offset - fcsr_size, riscv_restore_reg);
> - emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
> - riscv_save_restore_reg (word_mode, regno, offset, fn);
> - offset -= fcsr_size;
> -   }
> - continue;
> -   }
> -
>riscv_save_restore_reg (word_mode, regno, offset, fn);
>  }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> new file mode 100644
> index 000..d06f05f5c7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> @@ -0,0 +1,18 @@
> +/* Verify that fcsr instructions emitted.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
> +/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906 
> -funwind-tables" { target { rv64 } } } */
> +/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906 
> -funwind-tables" { target { rv32 } } } */

Re: [PATCH v3 2/2]middle-end match.pd: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]

2023-11-10 Thread Richard Biener
On Fri, 10 Nov 2023, Tamar Christina wrote:

> 
> Hi Prathamesh,
> 
> Yes Arm requires SIMD for copysign. The testcases fail because they don't 
> turn on Neon.
> 
> I'll update them.

On x86_64 with -m32 I see

FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized "ABS_EXPR" 1
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= ABS_EXPR" 
1
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= -" 1
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= .COPYSIGN" 
2
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop 
"Deleting[^n]* = -" 4
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop 
"Deleting[^n]* = .COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop 
"Deleting[^n]* = ABS_EXPR <" 1
FAIL: gcc.dg/tree-ssa/phi-opt-24.c scan-tree-dump-not phiopt2 "if"

maybe add a copysign effective target?

> Regards,
> Tamar
> 
> From: Prathamesh Kulkarni 
> Sent: Friday, November 10, 2023 12:24 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org ; nd ; 
> rguent...@suse.de ; j...@ventanamicro.com 
> 
> Subject: Re: [PATCH v3 2/2]middle-end match.pd: optimize fneg (fabs (x)) to 
> copysign (x, -1) [PR109154]
> 
> On Mon, 6 Nov 2023 at 15:50, Tamar Christina  wrote:
> >
> > Hi All,
> >
> > This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> > canonical and allows a target to expand this sequence efficiently.  Such
> > sequences are common in scientific code working with gradients.
> >
> > There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
> > which I remove since this is a less efficient form.  The testsuite is also
> > updated in light of this.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Hi Tamar,
> It seems the patch caused following regressions on arm:
> 
> Running gcc:gcc.dg/dg.exp ...
> FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
> FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized "ABS_EXPR" 1
> 
> Running gcc:gcc.dg/tree-ssa/tree-ssa.exp ...
> FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= -" 1
> FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= .COPYSIGN" 2
> FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= ABS_EXPR" 1
> FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
> "Deleting[^\\n]* = -" 4
> FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
> "Deleting[^\\n]* = ABS_EXPR <" 1
> FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
> "Deleting[^\\n]* = \\.COPYSIGN" 2
> FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized 
> ".COPYSIGN" 1
> FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized "ABS" 1
> FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple ".COPYSIGN" 4
> FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple "ABS" 4
> FAIL: gcc.dg/tree-ssa/phi-opt-24.c scan-tree-dump-not phiopt2 "if"
> Link to log files:
> https://ci.linaro.org/job/tcwg_gcc_check--master-arm-build/1240/artifact/artifacts/00-sumfiles/
> 
> Even for following test-case:
> double g (double a)
> {
>   double t1 = fabs (a);
>   double t2 = -t1;
>   return t2;
> }
> 
> It seems, the pattern gets applied but doesn't get eventually
> simplified to copysign(a, -1).
> forwprop dump shows:
> Applying pattern match.pd:1131, gimple-match-4.cc:4134
> double g (double a)
> {
>   double t2;
>   double t1;
> 
>:
>   t1_2 = ABS_EXPR ;
>   t2_3 = -t1_2;
>   return t2_3;
> 
> }
> 
> while on x86_64:
> Applying pattern match.pd:1131, gimple-match-4.cc:4134
> gimple_simplified to t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
> Removing dead stmt:t1_2 = ABS_EXPR ;
> double g (double a)
> {
>   double t2;
>   double t1;
> 
>:
>   t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
>   return t2_3;
> 
> }
> 
> Thanks,
> Prathamesh
> 
> 
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/109154
> > * match.pd: Add new neg+abs rule, remove inverse copysign rule.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/109154
> > * gcc.dg/fold-copysign-1.c: Updated.
> > * gcc.dg/pr55152-2.c: Updated.
> > * gcc.dg/tree-ssa/abs-4.c: Updated.
> > * gcc.dg/tree-ssa/backprop-6.c: Updated.
> > * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> > * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> > * gcc.target/aarch64/fneg-abs_1.c: New test.
> > * gcc.target/aarch64/fneg-abs_2.c: New test.
> > * gcc.target/aarch64/fneg-abs_3.c: New test.
> > * gcc.target/aarch64/fneg-abs_4.c: New test.
> > * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> > * gcc.target/

[PATCH] tree-optimization/110221 - SLP and loop mask/len

2023-11-10 Thread Richard Biener
The following fixes the issue that when SLP stmts are internal defs
but appear invariant because they end up only using invariant defs
then they get scheduled outside of the loop.  This nice optimization
breaks down when loop masks or lens are applied since those are not
explicitly tracked as dependences.  The following makes sure to never
schedule internal defs outside of the vectorized loop when the
loop uses masks/lens.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110221
* tree-vect-slp.cc (vect_schedule_slp_node): When loop
masking / len is applied make sure to not schedule
internal defs outside of the loop.

* gfortran.dg/pr110221.f: New testcase.
---
 gcc/testsuite/gfortran.dg/pr110221.f | 17 +
 gcc/tree-vect-slp.cc | 10 ++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f

diff --git a/gcc/testsuite/gfortran.dg/pr110221.f 
b/gcc/testsuite/gfortran.dg/pr110221.f
new file mode 100644
index 000..8b57384313a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr110221.f
@@ -0,0 +1,17 @@
+C PR middle-end/68146
+C { dg-do compile }
+C { dg-options "-O2 -w" }
+C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { 
target avx512f } }
+  SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
+  IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
+  IMPLICIT COMPLEX*16 (C,Z)
+  DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
+  N=INT(V)
+  CALL GAMMA2(VG,GA)
+  DO 65 K=1,N
+CBY(K)=CYY
+65CONTINUE
+  CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
+  DO 70 K=1,N
+70  CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
+  END
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3e5814c3a31..80e279d8f50 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
   /* Emit other stmts after the children vectorized defs which is
 earliest possible.  */
   gimple *last_stmt = NULL;
+  if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+   /* But avoid scheduling internal defs outside of the loop when
+  we might have only implicitly tracked loop mask/len defs.  */
+   gimple_stmt_iterator si
+ = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
+   last_stmt = *si;
+ }
   bool seen_vector_def = false;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
-- 
2.35.3


Re: [PATCH] RISC-V: Fix bug that XTheadMemPair extension caused fcsr not to be saved and restored before and after interrupt.

2023-11-10 Thread Kito Cheng
LGTM

Christoph Müllner wrote on Fri, Nov 10, 2023, 20:55:

> On Fri, Nov 10, 2023 at 8:14 AM Jin Ma  wrote:
> >
> > The t0 register is used as a temporary register for interrupts, so it
> needs
> > special treatment. It is necessary to avoid using "th.ldd" in the
> interrupt
> > program to stop the subsequent operation of the t0 register, so they
> need to
> > exchange positions in the function "riscv_for_each_saved_reg".
>
> RISCV_PROLOGUE_TEMP_REGNUM indeed needs to be treated specially
> in case of ISRs and fcsr. This patch just moves the TARGET_XTHEADMEMPAIR
> block after the ISR/fcsr block.
>
> Reviewed-by: Christoph Müllner 
>
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the
> interrupt
> > operation before the XTheadMemPair.
> > ---
> >  gcc/config/riscv/riscv.cc | 56 +--
> >  .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
> >  2 files changed, 46 insertions(+), 28 deletions(-)
> >  create mode 100644
> gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index e25692b86fc..fa2d4d4b779 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
> riscv_save_restore_fn fn,
> >   && riscv_is_eh_return_data_register (regno))
> > continue;
> >
> > +  /* In an interrupt function, save and restore some necessary CSRs
> in the stack
> > +to avoid changes in CSRs.  */
> > +  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
> > + && cfun->machine->interrupt_handler_p
> > + && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
> > + || (TARGET_ZFINX
> > + && (cfun->machine->frame.mask & ~(1 <<
> RISCV_PROLOGUE_TEMP_REGNUM)
> > +   {
> > + unsigned int fcsr_size = GET_MODE_SIZE (SImode);
> > + if (!epilogue)
> > +   {
> > + riscv_save_restore_reg (word_mode, regno, offset, fn);
> > + offset -= fcsr_size;
> > + emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
> > + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> > + offset, riscv_save_reg);
> > +   }
> > + else
> > +   {
> > + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> > + offset - fcsr_size,
> riscv_restore_reg);
> > + emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
> > + riscv_save_restore_reg (word_mode, regno, offset, fn);
> > + offset -= fcsr_size;
> > +   }
> > + continue;
> > +   }
> > +
> >if (TARGET_XTHEADMEMPAIR)
> > {
> >   /* Get the next reg/offset pair.  */
> > @@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
> riscv_save_restore_fn fn,
> > }
> > }
> >
> > -  /* In an interrupt function, save and restore some necessary CSRs
> in the stack
> > -to avoid changes in CSRs.  */
> > -  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
> > - && cfun->machine->interrupt_handler_p
> > - && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
> > - || (TARGET_ZFINX
> > - && (cfun->machine->frame.mask & ~(1 <<
> RISCV_PROLOGUE_TEMP_REGNUM)
> > -   {
> > - unsigned int fcsr_size = GET_MODE_SIZE (SImode);
> > - if (!epilogue)
> > -   {
> > - riscv_save_restore_reg (word_mode, regno, offset, fn);
> > - offset -= fcsr_size;
> > - emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
> > - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> > - offset, riscv_save_reg);
> > -   }
> > - else
> > -   {
> > - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> > - offset - fcsr_size,
> riscv_restore_reg);
> > - emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
> > - riscv_save_restore_reg (word_mode, regno, offset, fn);
> > - offset -= fcsr_size;
> > -   }
> > - continue;
> > -   }
> > -
> >riscv_save_restore_reg (word_mode, regno, offset, fn);
> >  }
> >
> > diff --git
> a/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> > new file mode 100644
> > index 000..d06f05f5c7c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> > @@ -0,0 +1,18 @@
> > +/* Verify that fcsr instructions are emitted.  */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target hard_float } */
> > +/* { dg-skip-if "" { *-*-* } {

Re: Re: [PATCH] RISC-V: Add combine optimization by slideup for vec_init vectorization

2023-11-10 Thread 钟居哲
Thanks. Robin. Committed.

>> The test patterns are a bit unwieldy but not a blocker
>> IMHO.  Could probably be done shorter using macro magic?
I have no idea. But I think we can revisit it and refine tests when we have 
time.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-10 20:47
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add combine optimization by slideup for vec_init 
vectorization
Hi Juzhe,
 
LGTM.  The test patterns are a bit unwieldy but not a blocker
IMHO.  Could probably be done shorter using macro magic?
 
Regards
Robin
 


Re: [PATCH v3] libiberty: Use posix_spawn in pex-unix when available.

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 12:01 PM Prathamesh Kulkarni
 wrote:
>
> On Thu, 5 Oct 2023 at 00:00, Brendan Shanks  wrote:
> >
> > Hi,
> >
> > This patch implements pex_unix_exec_child using posix_spawn when
> > available.
> >
> > This should especially benefit recent macOS (where vfork just calls
> > fork), but should have equivalent or faster performance on all
> > platforms.
> > In addition, the implementation is substantially simpler than the
> > vfork+exec code path.
> >
> > Tested on x86_64-linux.
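As a rough sketch of the approach being adopted (not the libiberty implementation itself; the helper name is illustrative), spawning a child via PATH lookup with posix_spawnp and reaping it looks like this:

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/wait.h>

extern char **environ;

/* Illustrative helper (not part of libiberty): spawn ARGV[0] via
   PATH lookup and return the child's exit status, or -1 on error.  */
int
spawn_and_wait (char *const argv[])
{
  pid_t pid;
  int status;

  if (posix_spawnp (&pid, argv[0], NULL, NULL, argv, environ) != 0)
    return -1;
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Unlike the vfork+exec path, errors such as a missing executable are reported directly through posix_spawnp's return value rather than via a pipe from the child.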
> Hi Brendan,
> It seems this patch caused the following regressions on aarch64:
>
> FAIL: g++.dg/modules/bad-mapper-1.C -std=c++17  at line 3 (test for
> errors, line )
> FAIL: g++.dg/modules/bad-mapper-1.C -std=c++17 (test for excess errors)
> FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2a  at line 3 (test for
> errors, line )
> FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2a (test for excess errors)
> FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2b  at line 3 (test for
> errors, line )
> FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2b (test for excess errors)
>
> Looking at g++.log:
> /home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/modules/bad-mapper-1.C:
> error: failed posix_spawnp mapper 'this-will-not-work'
> In module imported at
> /home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/modules/bad-mapper-1.C:2:1:
> unique1.bob: error: failed to read compiled module: No such file or directory
> unique1.bob: note: compiled module file is 'gcm.cache/unique1.bob.gcm'
> unique1.bob: note: imports must be built before being imported
> unique1.bob: fatal error: returning to the gate for a mechanical issue
> compilation terminated.
>
> Link to log files:
> https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/1159/artifact/artifacts/00-sumfiles/
> Could you please investigate ?

The testcase needs adjustment, it looks for

// { dg-error "-:failed (exec|CreateProcess).*mapper.*
.*this-will-not-work" "" { target { ! { *-*-darwin[89]* *-*-darwin10*
} } } 0 }

adding |posix_spawnp probably works
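For concreteness, the adjusted directive might look like this (untested sketch; the middle of the regex is kept as quoted above):

```
// { dg-error "-:failed (exec|CreateProcess|posix_spawnp).*mapper.*
.*this-will-not-work" "" { target { ! { *-*-darwin[89]* *-*-darwin10*
} } } 0 }
```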

>
> Thanks,
> Prathamesh
> >
> > v2: Fix error handling (previously the function would be run twice in
> > case of error), and don't use a macro that changes control flow.
> >
> > v3: Match file style for error-handling blocks, don't close
> > in/out/errdes on error, and check close() for errors.
> >
> > libiberty/
> > * configure.ac (AC_CHECK_HEADERS): Add spawn.h.
> > (checkfuncs): Add posix_spawn, posix_spawnp.
> > (AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
> > * configure, config.in: Rebuild.
> > * pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.
> >
> > Signed-off-by: Brendan Shanks 
> > ---
> >  libiberty/configure.ac |   8 +-
> >  libiberty/pex-unix.c   | 168 +
> >  2 files changed, 173 insertions(+), 3 deletions(-)
> >
> > diff --git a/libiberty/configure.ac b/libiberty/configure.ac
> > index 0748c592704..2488b031bc8 100644
> > --- a/libiberty/configure.ac
> > +++ b/libiberty/configure.ac
> > @@ -289,7 +289,7 @@ AC_SUBST_FILE(host_makefile_frag)
> >  # It's OK to check for header files.  Although the compiler may not be
> >  # able to link anything, it had better be able to at least compile
> >  # something.
> > -AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h 
> > string.h unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h 
> > sys/mman.h fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h 
> > machine/hal_sysinfo.h sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h 
> > stdio_ext.h process.h sys/prctl.h)
> > +AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h 
> > string.h unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h 
> > sys/mman.h fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h 
> > machine/hal_sysinfo.h sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h 
> > stdio_ext.h process.h sys/prctl.h spawn.h)
> >  AC_HEADER_SYS_WAIT
> >  AC_HEADER_TIME
> >
> > @@ -412,7 +412,8 @@ funcs="$funcs setproctitle"
> >  vars="sys_errlist sys_nerr sys_siglist"
> >
> >  checkfuncs="__fsetlocking canonicalize_file_name dup3 getrlimit getrusage \
> > - getsysinfo gettimeofday on_exit pipe2 psignal pstat_getdynamic 
> > pstat_getstatic \
> > + getsysinfo gettimeofday on_exit pipe2 posix_spawn posix_spawnp psignal \
> > + pstat_getdynamic pstat_getstatic \
> >   realpath setrlimit spawnve spawnvpe strerror strsignal sysconf sysctl \
> >   sysmp table times wait3 wait4"
> >
> > @@ -435,7 +436,8 @@ if test "x" = "y"; then
> >  index insque \
> >  memchr memcmp memcpy memmem memmove memset mkstemps \
> >  on_exit \
> > -pipe2 psignal pstat_getdynamic pstat_getstatic putenv \
> > +pipe2 posix_spawn posix_spawnp psignal \
> > +pstat_getdynamic pstat_getstatic putenv \
> >  random realpath rename rindex \
> >  sbrk setenv setproctitle setrlimit sig

Re: [PATCH] RISC-V: Fix bug that XTheadMemPair extension caused fcsr not to be saved and restored before and after interrupt.

2023-11-10 Thread Christoph Müllner
On Fri, Nov 10, 2023 at 2:20 PM Kito Cheng  wrote:
>
> LGTM

Committed after shortening the commit message's heading.

>
> Christoph Müllner wrote on Fri, Nov 10, 2023, 20:55:
>>
>> On Fri, Nov 10, 2023 at 8:14 AM Jin Ma  wrote:
>> >
>> > The t0 register is used as a temporary register for interrupts, so it needs
>> > special treatment. It is necessary to avoid using "th.ldd" in the interrupt
>> > program to stop the subsequent operation of the t0 register, so they need 
>> > to
>> > exchange positions in the function "riscv_for_each_saved_reg".
>>
>> RISCV_PROLOGUE_TEMP_REGNUM indeed needs to be treated specially
>> in case of ISRs and fcsr. This patch just moves the TARGET_XTHEADMEMPAIR
>> block after the ISR/fcsr block.
>>
>> Reviewed-by: Christoph Müllner 
>>
>> >
>> > gcc/ChangeLog:
>> >
>> > * config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the 
>> > interrupt
>> > operation before the XTheadMemPair.
>> > ---
>> >  gcc/config/riscv/riscv.cc | 56 +--
>> >  .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
>> >  2 files changed, 46 insertions(+), 28 deletions(-)
>> >  create mode 100644 
>> > gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
>> >
>> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> > index e25692b86fc..fa2d4d4b779 100644
>> > --- a/gcc/config/riscv/riscv.cc
>> > +++ b/gcc/config/riscv/riscv.cc
>> > @@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
>> > riscv_save_restore_fn fn,
>> >   && riscv_is_eh_return_data_register (regno))
>> > continue;
>> >
>> > +  /* In an interrupt function, save and restore some necessary CSRs 
>> > in the stack
>> > +to avoid changes in CSRs.  */
>> > +  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
>> > + && cfun->machine->interrupt_handler_p
>> > + && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
>> > + || (TARGET_ZFINX
>> > + && (cfun->machine->frame.mask & ~(1 << 
>> > RISCV_PROLOGUE_TEMP_REGNUM)
>> > +   {
>> > + unsigned int fcsr_size = GET_MODE_SIZE (SImode);
>> > + if (!epilogue)
>> > +   {
>> > + riscv_save_restore_reg (word_mode, regno, offset, fn);
>> > + offset -= fcsr_size;
>> > + emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
>> > + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
>> > + offset, riscv_save_reg);
>> > +   }
>> > + else
>> > +   {
>> > + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
>> > + offset - fcsr_size, 
>> > riscv_restore_reg);
>> > + emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
>> > + riscv_save_restore_reg (word_mode, regno, offset, fn);
>> > + offset -= fcsr_size;
>> > +   }
>> > + continue;
>> > +   }
>> > +
>> >if (TARGET_XTHEADMEMPAIR)
>> > {
>> >   /* Get the next reg/offset pair.  */
>> > @@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
>> > riscv_save_restore_fn fn,
>> > }
>> > }
>> >
>> > -  /* In an interrupt function, save and restore some necessary CSRs 
>> > in the stack
>> > -to avoid changes in CSRs.  */
>> > -  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
>> > - && cfun->machine->interrupt_handler_p
>> > - && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
>> > - || (TARGET_ZFINX
>> > - && (cfun->machine->frame.mask & ~(1 << 
>> > RISCV_PROLOGUE_TEMP_REGNUM)
>> > -   {
>> > - unsigned int fcsr_size = GET_MODE_SIZE (SImode);
>> > - if (!epilogue)
>> > -   {
>> > - riscv_save_restore_reg (word_mode, regno, offset, fn);
>> > - offset -= fcsr_size;
>> > - emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
>> > - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
>> > - offset, riscv_save_reg);
>> > -   }
>> > - else
>> > -   {
>> > - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
>> > - offset - fcsr_size, 
>> > riscv_restore_reg);
>> > - emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
>> > - riscv_save_restore_reg (word_mode, regno, offset, fn);
>> > - offset -= fcsr_size;
>> > -   }
>> > - continue;
>> > -   }
>> > -
>> >riscv_save_restore_reg (word_mode, regno, offset, fn);
>> >  }
>> >
>> > diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c 
>> > b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
>> > new file mode 100644
>> > index 000..d06f05f5c7c
>> > --- /dev/null

Re: [PATCH] Handle constant CONSTRUCTORs in operand_compare

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 12:17 PM Eric Botcazou  wrote:
>
> Hi,
>
> this teaches operand_compare to compare constant CONSTRUCTORs, which is quite
> helpful for so-called fat pointers in Ada, i.e. objects that are semantically
> pointers but are represented by structures made up of two pointers.  This is
> modeled on the implementation present in the ICF pass.
>
> Bootstrapped/regtested on x86-64/Linux, OK for the mainline?

OK.

>
> 2023-11-10  Eric Botcazou  
>
> * fold-const.cc (operand_compare::operand_equal_p) <CONSTRUCTOR>:
> Deal with nonempty constant CONSTRUCTORs.
> (operand_compare::hash_operand) <CONSTRUCTOR>: Hash DECL_FIELD_OFFSET
> and DECL_FIELD_BIT_OFFSET for FIELD_DECLs.
>
>
> 2023-11-10  Eric Botcazou  
>
> * gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.
>
> --
> Eric Botcazou


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 12:25 PM Alexander Monakov  wrote:
>
>
> On Thu, 9 Nov 2023, Jeff Law wrote:
>
> > > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal
> > > operations.  When I started to work on this issue, initially I wanted to 
> > > try
> > > something similar to your idea #2, but when checking the APIs, I realized
> > > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as
> > > well.  IMHO there is no value to try to schedule this kind of BB (to be
> > > scheduled range), skipping it can save some resource allocation (like 
> > > block
> > > dependencies) and make it more efficient (not enter function 
> > > schedule_block
> > > etc.), from this perspective it seems an enhancement.  Does it sound
> > > reasonable to you?
> > It sounds reasonable, but only if doing so doesn't add significant
> > implementation complexity.  ie, the gains from doing less work here are 
> > likely
> > to be very marginal, so I'm more interested in clean, easy to maintain code.
>
> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> design:
> DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> mutate them as needed when moving real insns across them.

Note that debug-only BBs do not exist - the BB would be there even without debug
insns!  So instead you have to handle BBs with just debug insns the same you
handle a completely empty BB.

> Cc'ing Alexandre Oliva who can correct me on that if necessary.
>
> Alexander


Re: [PATCH V2] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread Richard Biener
On Fri, 10 Nov 2023, Juzhe-Zhong wrote:

> PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
> 
> 1. The SELECT_VL result is not necessarily always VF in a non-final iteration.
> 
> Current GIMPLE IR is wrong:
> 
> # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
> ...
> _35 = .SELECT_VL (ivtmp_33, VF);
> _21 = vect_vec_iv_.8_22 + { VF, ... };
> 
> E.g. Consider the total iterations N = 6, the VF = 4.
> The SELECT_VL output is defined as not necessarily being VF in non-final
> iterations; it depends on the hardware implementation.
> 
> Suppose we have a RVV CPU core with vsetvl doing even distribution workload 
> optimization.
> It may process 3 elements at the 1st iteration and 3 elements at the last 
> iteration.
> Then the induction variable update here: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST
> [4, 4], ... };
> is wrong: it adds VF, which is 4, even though we didn't actually process 4
> elements.
> 
> It should be adding 3 elements which is the result of SELECT_VL.
> So, here the correct IR should be:
> 
>   _36 = .SELECT_VL (ivtmp_34, VF);
>   _22 = (int) _36;
>   vect_cst__21 = [vec_duplicate_expr] _22;
> 
> 2. This issue only happens with non-SLP vectorization with a single rgroup, since:
>
>  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> {
>   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>   if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
> OPTIMIZE_FOR_SPEED)
> && LOOP_VINFO_LENS (loop_vinfo).length () == 1
> && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
> && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
>   LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
> }
> 
> 3. This issue doesn't appear in nested loops, no matter whether
> LOOP_VINFO_USING_SELECT_VL_P is true or false.
> 
> Since:
> 
>   # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
>   # vect_diff_15.7_20 = PHI 
>   _19 = vect_vec_iv_.6_5 + { 1, ... };
>   vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, 
> vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
>   ivtmp_1 = ivtmp_4 + 4294967295;
>   
>[local count: 6549826]:
>   # vect_diff_18.5_11 = PHI 
>   # ivtmp_26 = PHI 
>   _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
>   goto ; [100.00%]
> 
> Note the induction variable IR: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4,
> 4], ... }; updates the induction variable independently of VF (it doesn't
> care how many elements are processed in the iteration).
> 
> The update is loop invariant. So it won't be the problem even if 
> LOOP_VINFO_USING_SELECT_VL_P is true.
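The need to advance the IV by the SELECT_VL result rather than by VF can be modeled in scalar C; the even-distribution select_vl policy below is hypothetical, purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SELECT_VL: may return fewer than VF elements even in a
   non-final iteration, e.g. distributing N = 6 as 3 + 3 with VF = 4.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  if (remaining > vf && remaining < 2 * vf)
    return (remaining + 1) / 2;  /* even-distribution policy */
  return remaining < vf ? remaining : vf;
}

/* Drive the loop, advancing the induction variable by the actual
   length VL.  Advancing by VF instead would over-count whenever
   select_vl returns less than VF.  */
size_t
run (size_t n, size_t vf)
{
  size_t iv = 0, remaining = n;
  while (remaining)
    {
      size_t vl = select_vl (remaining, vf);
      iv += vl;        /* correct: add VL */
      /* iv += vf;        wrong: VL may be < VF */
      remaining -= vl;
    }
  return iv;
}
```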
>
> Testing passed, Ok for trunk ?

OK.

Richard.

>   PR tree-optimization/112438
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vectorizable_induction):
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/pr112438.c: New test.
> 
> ---
>  .../gcc.target/riscv/rvv/autovec/pr112438.c   | 33 +++
>  gcc/tree-vect-loop.cc | 30 -
>  2 files changed, 62 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> new file mode 100644
> index 000..51f90df38a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model 
> -ffast-math -fdump-tree-optimized-details" } */
> +
> +void
> +foo (int n, int *__restrict in, int *__restrict out)
> +{
> +  for (int i = 0; i < n; i += 1)
> +{
> +  out[i] = in[i] + i;
> +}
> +}
> +
> +void
> +foo2 (int n, float * __restrict in, 
> +float * __restrict out)
> +{
> +  for (int i = 0; i < n; i += 1)
> +{
> +  out[i] = in[i] + i;
> +}
> +}
> +
> +void
> +foo3 (int n, float * __restrict in, 
> +float * __restrict out, float x)
> +{
> +  for (int i = 0; i < n; i += 1)
> +{
> +  out[i] = in[i] + i* i;
> +}
> +}
> +
> +/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }.  
> */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..b152072c969 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10306,10 +10306,36 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>  
>  
>/* Create the vector that holds the step of the induction.  */
> +  gimple_stmt_iterator *step_iv_si = NULL;
>if (nested_in_vect_loop)
>  /* iv_loop is nested in the loop to be vectorized. Generate:
> vec_step = [S, S, S, S]  */
>  new_name = step_expr;
> +  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> +{
> +  /* When we're using loop_len produced by SELEC_VL, the non-final
> +  iterations are not always processing VF elemen

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 11:10 AM HAO CHEN GUI  wrote:
>
> Hi Richard,
>
> 在 2023/11/10 17:06, Richard Biener 写道:
> > On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI  wrote:
> >>
> >> Hi Richard,
> >>   Thanks so much for your comments.
> >>
> >> 在 2023/11/9 19:41, Richard Biener 写道:
> >>> I'm not sure if the testcase is valid though?
> >>>
> >>> @defbuiltin{{void} __builtin_return (void *@var{result})}
> >>> This built-in function returns the value described by @var{result} from
> >>> the containing function.  You should specify, for @var{result}, a value
> >>> returned by @code{__builtin_apply}.
> >>> @enddefbuiltin
> >>>
> >>> I don't see __builtin_apply being used here?
> >>
> >> The prototype of the test case is from "__objc_block_forward" in
> >> libobjc/sendmsg.c.
> >>
> >>   void *args, *res;
> >>
> >>   args = __builtin_apply_args ();
> >>   res = __objc_forward (rcv, op, args);
> >>   if (res)
> >> __builtin_return (res);
> >>   else
> >> ...
> >>
> >> The __builtin_apply_args builtin puts the return values on the stack with
> >> the required alignment. But the forward function can do anything and
> >> return a void* pointer. IMHO the alignment might be broken. So I just
> >> simplified it to use a void* pointer as the input argument of
> >> "__builtin_return" and skip "__builtin_apply_args".
> >
> > But doesn't __objc_forward then break the contract between
> > __builtin_apply_args and __builtin_return?
> >
> > That said, __builtin_return is a very special function, it's not supposed
> > to deal with what you are fixing.  At least I think so.
> >
> > IMHO the bug is in __objc_block_forward.
>
> If so, can we document that the memory objects pointed to by the input
> argument of __builtin_return have to be aligned? Then we can force the
> alignment in __builtin_return. The custom function can do anything if gcc
> doesn't state that.

I don't think they have to be aligned - they have to adhere to the ABI
which __builtin_apply_args ensures.  But others might know more details
here.
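For reference, the documented pairing can be sketched as follows (GCC-specific builtins; the 64-byte argument-block size is an assumed upper bound for this small signature, not a value from the discussion):

```c
#include <assert.h>

/* GCC-specific sketch: forward all arguments of `forward' to `target'
   and return whatever it returns, per the __builtin_apply_args /
   __builtin_apply / __builtin_return contract.  */
double
target (int a, double b)
{
  return a + b;
}

double
forward (int a, double b)
{
  void *args = __builtin_apply_args ();
  /* 64 bytes of stack arguments is an assumed upper bound here.  */
  void *res = __builtin_apply ((void (*) ()) target, args, 64);
  __builtin_return (res);
}
```

The point of contention above is what happens when the pointer handed to __builtin_return did not come from this pairing.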

> Thanks
> Gui Haochen
>
> >
> > Richard.
> >
> >>
> >> Thanks
> >> Gui Haochen


Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE:a cmp VCE:b) ? c : d.

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 2:52 AM liuhongt  wrote:
>
> While working on PR112443, I noticed some misoptimizations: after we
> fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend fails to combine it
> back to v{,p}blendv{v,ps,pd} since the pattern is too complicated, so I think
> maybe we should handle it at the gimple level.
>
> The dump is like
>
>   _1 = c_3(D) >= { 0, 0, 0, 0 };
>   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>   _7 = VIEW_CONVERT_EXPR(_2);
>   _8 = VIEW_CONVERT_EXPR(b_6(D));
>   _9 = VIEW_CONVERT_EXPR(a_5(D));
>   _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _11 = VEC_COND_EXPR <_10, _8, _9>;
>
>
> It can be optimized to
>
>   _6 = VIEW_CONVERT_EXPR(b_4(D));
>   _7 = VIEW_CONVERT_EXPR(a_3(D));
>   _10 = VIEW_CONVERT_EXPR(c_1(D));
>   _5 = _10 >= { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _8 = VEC_COND_EXPR <_5, _6, _7>;
>   _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
>
> since _7 is either -1 or 0, _7 < 0 is equal to _1 = c_3(D) >= { 0, 0,
> 0, 0 };
> The patch adds a gimple pattern to handle that.
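A quick scalar sanity check of that identity (one vector lane modeled in plain C; this ignores the VIEW_CONVERT_EXPR re-interpretation across element widths, which the pattern must additionally validate):

```c
#include <assert.h>

/* Scalar model of one lane: VEC_COND_EXPR selects B when the mask is
   set, A otherwise.  */
static int
blend (int mask_is_set, int b_elt, int a_elt)
{
  return mask_is_set ? b_elt : a_elt;
}

/* _2 = (c >= 0) ? -1 : 0;  result = (_2 < 0) ? b : a  */
int
original_form (int c, int a, int b)
{
  int m = (c >= 0) ? -1 : 0;
  return blend (m < 0, b, a);
}

/* Folded form: result = (c >= 0) ? b : a  */
int
simplified_form (int c, int a, int b)
{
  return blend (c >= 0, b, a);
}
```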
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * match.pd (VCE:(a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE:a cmp
> VCE:b) ? c : d): New gimple simplication.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512vl-blendv-3.c: New test.
> * gcc.target/i386/blendv-3.c: New test.
> ---
>  gcc/match.pd  | 17 +++
>  .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
>  gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
>  3 files changed, 69 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index dbc811b2b38..e6f9c4fa1fd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5170,6 +5170,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
>(vec_cond (bit_and @0 (bit_not @3)) @2 @1)))

Would be nice to have a comment here.

> +(for cmp (simple_comparison)
> + (simplify
> +  (vec_cond
> +(lt@4 (view_convert?@5 (vec_cond (cmp @0 @1)
> +integer_all_onesp
> +integer_zerop))
> + integer_zerop) @2 @3)
> +  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
> +   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
> +   && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@5))
> +   && VECTOR_TYPE_P (type))
> +   (with {
> +  tree itype = TREE_TYPE (@5);
> +  tree vbtype = TREE_TYPE (@4);}
> + (vec_cond (cmp:vbtype (view_convert:itype @0)
> +  (view_convert:itype @1)) @2 @3)

It looks like the outer vec_cond isn't actually relevant to the simplification?

 (lt (view_convert? (vec_cond (cmp @0 @1) integer_all_onesp
integer_zerop)) integer_zerop)

is the relevant part?  I wonder what canonicalizes the inner vec_cond?
Did you ever see the (view_convert ... missing?

> +
>  /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
>  (simplify
>   (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c 
> b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> new file mode 100644
> index 000..2777e72ab5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512vl -mavx512bw -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include "blendv-3.c"
> diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c 
> b/gcc/testsuite/gcc.target/i386/blendv-3.c
> new file mode 100644
> index 000..fa0fb067a73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include 
> +
> +__m256i
> +foo (__m256i a, __m256i b, __m256i c)
> +{
> +  return _mm256_blendv_epi8 (a, b, ~c < 0);
> +}
> +
> +__m256d
> +foo1 (__m256d a, __m256d b, __m256i c)
> +{
> +  __m256i d = ~c < 0;
> +  return _mm256_blendv_pd (a, b, (__m256d)d);
> +}
> +
> +__m256
> +foo2 (__m256 a, __m256 b, __m256i c)
> +{
> +  __m256i d = ~c < 0;
> +  return _mm256_blendv_ps (a, b, (__m256)d);
> +}
> +
> +__m128i
> +foo4 (__m128i a, __m128i b, __m128i c)
> +{
> +  return _mm_blendv_epi8 (a, b, ~c < 0);
> +}
> +
> +__m128d
> +foo5 (__m128d a, __m128d b, __m128i c)
> +{
> +  __m128i d = ~c < 0;
> +  return _mm_blendv_pd (a, b, (__m128d)d);
> +}
> +
> +__m128
> +foo6 (_

[PATCH v2] RISC-V: Fix bug that XTheadMemPair causes interrupts to fail.

2023-11-10 Thread Jin Ma
The t0 register is used as a temporary register for interrupts, so it needs
special treatment. It is necessary to avoid using "th.ldd" in the interrupt
program to stop the subsequent operation of the t0 register, so they need to
exchange positions in the function "riscv_for_each_saved_reg".

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt
operation before the XTheadMemPair.
---
 gcc/config/riscv/riscv.cc | 56 +--
 .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
 2 files changed, 46 insertions(+), 28 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e25692b86fc..fa2d4d4b779 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
riscv_save_restore_fn fn,
  && riscv_is_eh_return_data_register (regno))
continue;
 
+  /* In an interrupt function, save and restore some necessary CSRs in the 
stack
+to avoid changes in CSRs.  */
+  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
+ && cfun->machine->interrupt_handler_p
+ && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
+ || (TARGET_ZFINX
+ && (cfun->machine->frame.mask & ~(1 << 
RISCV_PROLOGUE_TEMP_REGNUM)
+   {
+ unsigned int fcsr_size = GET_MODE_SIZE (SImode);
+ if (!epilogue)
+   {
+ riscv_save_restore_reg (word_mode, regno, offset, fn);
+ offset -= fcsr_size;
+ emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
+ riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
+ offset, riscv_save_reg);
+   }
+ else
+   {
+ riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
+ offset - fcsr_size, riscv_restore_reg);
+ emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
+ riscv_save_restore_reg (word_mode, regno, offset, fn);
+ offset -= fcsr_size;
+   }
+ continue;
+   }
+
   if (TARGET_XTHEADMEMPAIR)
{
  /* Get the next reg/offset pair.  */
@@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
riscv_save_restore_fn fn,
}
}
 
-  /* In an interrupt function, save and restore some necessary CSRs in the 
stack
-to avoid changes in CSRs.  */
-  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
- && cfun->machine->interrupt_handler_p
- && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
- || (TARGET_ZFINX
- && (cfun->machine->frame.mask & ~(1 << 
RISCV_PROLOGUE_TEMP_REGNUM)
-   {
- unsigned int fcsr_size = GET_MODE_SIZE (SImode);
- if (!epilogue)
-   {
- riscv_save_restore_reg (word_mode, regno, offset, fn);
- offset -= fcsr_size;
- emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
- riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
- offset, riscv_save_reg);
-   }
- else
-   {
- riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
- offset - fcsr_size, riscv_restore_reg);
- emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
- riscv_save_restore_reg (word_mode, regno, offset, fn);
- offset -= fcsr_size;
-   }
- continue;
-   }
-
   riscv_save_restore_reg (word_mode, regno, offset, fn);
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c 
b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
new file mode 100644
index 000..d06f05f5c7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
@@ -0,0 +1,18 @@
+/* Verify that fcsr instructions emitted.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906 
-funwind-tables" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906 
-funwind-tables" { target { rv32 } } } */
+
+
+extern int foo (void);
+
+void __attribute__ ((interrupt))
+sub (void)
+{
+  foo ();
+}
+
+/* { dg-final { scan-assembler-times "frcsr\t" 1 } } */
+/* { dg-final { scan-assembler-times "fscsr\t" 1 } } */

base-commit: e7f4040d9d6ec40c48ada940168885d7dde03af9
-- 
2.17.1



Re: [PATCH] tree-ssa-loop-ivopts : Add live analysis in regs used in decision making

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 7:42 AM Ajit Agarwal  wrote:
>
> Hello Richard:
>
>
> On 09/11/23 6:21 pm, Richard Biener wrote:
> > On Wed, Nov 8, 2023 at 4:00 PM Ajit Agarwal  wrote:
> >>
> >> tree-ssa-loop-ivopts : Add live analysis in regs used in decision making.
> >>
> >> Add live analysis in regs used calculation in decision making of
> >> selecting ivopts candidates.
> >>
> >> 2023-11-08  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-ssa-loop-ivopts.cc (get_regs_used): New function.
> >> (determine_set_costs): Call to get_regs_used to use live
> >> analysis.
> >> ---
> >>  gcc/tree-ssa-loop-ivopts.cc | 73 +++--
> >>  1 file changed, 70 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> >> index c3336603778..e02fe7d434b 100644
> >> --- a/gcc/tree-ssa-loop-ivopts.cc
> >> +++ b/gcc/tree-ssa-loop-ivopts.cc
> >> @@ -6160,6 +6160,68 @@ ivopts_estimate_reg_pressure (struct ivopts_data 
> >> *data, unsigned n_invs,
> >>return cost + n_cands;
> >>  }
> >>
> >> +/* Return regs used based on live-in and liveout of given ssa variables.  
> >> */
> >
> > Please explain how the following code relates to anything like "live
> > analysis" and
> > where it uses live-in and live-out.  And what "live-in/out of a given
> > SSA variable"
> > should be.
> >
> > Also explain why you are doing this at all.  The patch doesn't come
> > with a testcase
> > or with any other hint that motivated you.
> >
> > Richard.
> >
>
> The function get_regs_used increments the regs_used based on live-in
> and live-out analysis of given ssa name. Instead of setting live-in and
> live-out bitmap I increment the regs_used.
>
> Below is how I identify live-in and live-out and increments the regs_used
> variable:
>
> a) For a given def_bb of gimple statement of ssa name there should be
> live-out and increments the regs_used.
>
> b) Visit each use of SSA_NAME and if it isn't in the same block as the def,
>  we identify live on entry blocks and increments regs_used.
>
> The below function is the modification of set_var_live_on_entry of 
> tree-ssa-live.cc
> Where we set the bitmap of liveout and livein of basic block. Instead of 
> setting bitmap, regs_used is incremented.

It clearly doesn't work that way, and the number doesn't in any way relate to
the number of registers used or register pressure.

> I identify regs_used as the number of live-in and liveout of given ssa name 
> variable.
>
> For each iv candidate ssa variable I identify regs_used and take maximum of 
> regs
> used for all the iv candidates that will be used in 
> ivopts_estimate_register_pressure
> cost analysis.
>
> Motivation behind doing this optimization is I get good performance 
> improvement
> for several spec cpu 2017 benchmarks for FP and INT around 2% to 7%.

An interesting GIGO effect.

> >> Also setting regs_used as number of iv candidates, which is not
> optimized and robust way of decision making for ivopts optimization I decide
> on live-in and live-out analysis which is more correct and appropriate way of
> identifying regs_used.
>
> And also there are no regressions in bootstrapped/regtested on 
> powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> >> +static unsigned
> >> +get_regs_used (tree ssa_name)
> >> +{
> >> +  unsigned regs_used = 0;
> >> +  gimple *stmt;
> >> +  use_operand_p use;
> >> +  basic_block def_bb = NULL;
> >> +  imm_use_iterator imm_iter;
> >> +
> >> +  stmt = SSA_NAME_DEF_STMT (ssa_name);
> >> +  if (stmt)
> >> +{
> >> +  def_bb = gimple_bb (stmt);
> >> +  /* Mark defs in liveout bitmap temporarily.  */
> >> +  if (def_bb)
> >> +   regs_used++;
> >> +}
> >> +  else
> >> +def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
> >> +
> >> +  /* An undefined local variable does not need to be very alive.  */
> >> +  if (virtual_operand_p (ssa_name)
> >> +  || ssa_undefined_value_p (ssa_name, false))
> >> +return 0;
> >> +
> >> +  /* Visit each use of SSA_NAME and if it isn't in the same block as the 
> >> def,
> >> + add it to the list of live on entry blocks.  */
> >> +  FOR_EACH_IMM_USE_FAST (use, imm_iter, ssa_name)
> >> +{
> >> +  gimple *use_stmt = USE_STMT (use);
> >> +  basic_block add_block = NULL;
> >> +
> >> +  if (gimple_code (use_stmt) == GIMPLE_PHI)
> >> +   {
> >> + /* Uses in PHI's are considered to be live at exit of the SRC 
> >> block
> >> +as this is where a copy would be inserted.  Check to see if 
> >> it is
> >> +defined in that block, or whether its live on entry.  */
> >> + int index = PHI_ARG_INDEX_FROM_USE (use);
> >> + edge e = gimple_phi_arg_edge (as_a  (use_stmt), index);
> >> + if (e->src != def_bb)
> >> +   add_block = e->src;
> >> +   }
> >> +  else if (is_gimple_debug (use_stmt))
> >> +   continue;
> >> +  else
> >> +   {
> >> + /* If its not defi

Re: [PATCH v2] RISC-V: Fixbug for that XTheadMemPair causes interrupt to fail.

2023-11-10 Thread Kito Cheng
I thought Christoph was already committed? Do you mind describing the
difference between v1 and v2?

On Fri, Nov 10, 2023 at 9:55 PM Jin Ma  wrote:

> The t0 register is used as a temporary register for interrupts, so it needs
> special treatment. It is necessary to avoid using "th.ldd" in the interrupt
> program to stop the subsequent operation of the t0 register, so they need
> to
> exchange positions in the function "riscv_for_each_saved_reg".
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the
> interrupt
> operation before the XTheadMemPair.
> ---
>  gcc/config/riscv/riscv.cc | 56 +--
>  .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
>  2 files changed, 46 insertions(+), 28 deletions(-)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index e25692b86fc..fa2d4d4b779 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
> riscv_save_restore_fn fn,
>   && riscv_is_eh_return_data_register (regno))
> continue;
>
> +  /* In an interrupt function, save and restore some necessary CSRs
> in the stack
> +to avoid changes in CSRs.  */
> +  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
> + && cfun->machine->interrupt_handler_p
> + && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
> + || (TARGET_ZFINX
> + && (cfun->machine->frame.mask & ~(1 <<
> RISCV_PROLOGUE_TEMP_REGNUM)
> +   {
> + unsigned int fcsr_size = GET_MODE_SIZE (SImode);
> + if (!epilogue)
> +   {
> + riscv_save_restore_reg (word_mode, regno, offset, fn);
> + offset -= fcsr_size;
> + emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
> + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> + offset, riscv_save_reg);
> +   }
> + else
> +   {
> + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> + offset - fcsr_size,
> riscv_restore_reg);
> + emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
> + riscv_save_restore_reg (word_mode, regno, offset, fn);
> + offset -= fcsr_size;
> +   }
> + continue;
> +   }
> +
>if (TARGET_XTHEADMEMPAIR)
> {
>   /* Get the next reg/offset pair.  */
> @@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
> riscv_save_restore_fn fn,
> }
> }
>
> -  /* In an interrupt function, save and restore some necessary CSRs
> in the stack
> -to avoid changes in CSRs.  */
> -  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
> - && cfun->machine->interrupt_handler_p
> - && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
> - || (TARGET_ZFINX
> - && (cfun->machine->frame.mask & ~(1 <<
> RISCV_PROLOGUE_TEMP_REGNUM)
> -   {
> - unsigned int fcsr_size = GET_MODE_SIZE (SImode);
> - if (!epilogue)
> -   {
> - riscv_save_restore_reg (word_mode, regno, offset, fn);
> - offset -= fcsr_size;
> - emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
> - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> - offset, riscv_save_reg);
> -   }
> - else
> -   {
> - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
> - offset - fcsr_size,
> riscv_restore_reg);
> - emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
> - riscv_save_restore_reg (word_mode, regno, offset, fn);
> - offset -= fcsr_size;
> -   }
> - continue;
> -   }
> -
>riscv_save_restore_reg (word_mode, regno, offset, fn);
>  }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> new file mode 100644
> index 000..d06f05f5c7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
> @@ -0,0 +1,18 @@
> +/* Verify that fcsr instructions emitted.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
> +/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906
> -funwind-tables" { target { rv64 } } } */
> +/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906
> -funwind-tables" { target { rv32 } } } */
> +
> +
> +extern int foo (void);
> +
> +void __attribute__ ((interrupt))
> +sub (void)
> +{
> +  foo ();
>

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Alexander Monakov


On Fri, 10 Nov 2023, Richard Biener wrote:

> > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> > design:
> > DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> > mutate them as needed when moving real insns across them.
> 
> Note that debug-only BBs do not exist - the BB would be there even without 
> debug
> insns!

Yep, sorry, I misspoke when I earlier said

>> and cause divergence when passing through a debug-only BB which would not be
>> present at all without -g.

They are present in the region, but skipped via no_real_insns_p.

> So instead you have to handle BBs with just debug insns the same you
> handle a completely empty BB.

Yeah. There would be no problem if the scheduler never used no_real_insns_p
and handled empty and non-empty BBs the same way.

Alexander


Re: [PATCH v2] RISC-V: Fixbug for that XTheadMemPair causes interrupt to fail.

2023-11-10 Thread 马进(方耀)
I'm very sorry, I misunderstood. There's no difference between them, please 
ignore it.






--
From: Kito Cheng
Date: 2023-11-10 22:04:26
To: Jin Ma
Cc: ; ; 
Subject: Re: [PATCH v2] RISC-V: Fixbug for that XTheadMemPair causes interrupt to 
fail.

I thought Christoph was already committed? Do you mind describing the 
difference between v1 and v2?
On Fri, Nov 10, 2023 at 9:55 PM Jin Ma  wrote:
The t0 register is used as a temporary register for interrupts, so it needs
 special treatment. It is necessary to avoid using "th.ldd" in the interrupt
 program to stop the subsequent operation of the t0 register, so they need to
 exchange positions in the function "riscv_for_each_saved_reg".

 gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt
 operation before the XTheadMemPair.
 ---
  gcc/config/riscv/riscv.cc | 56 +--
  .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
  2 files changed, 46 insertions(+), 28 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c

 diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
 index e25692b86fc..fa2d4d4b779 100644
 --- a/gcc/config/riscv/riscv.cc
 +++ b/gcc/config/riscv/riscv.cc
 @@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
riscv_save_restore_fn fn,
   && riscv_is_eh_return_data_register (regno))
 continue;

 +  /* In an interrupt function, save and restore some necessary CSRs in 
the stack
 +to avoid changes in CSRs.  */
 +  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
 + && cfun->machine->interrupt_handler_p
 + && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
 + || (TARGET_ZFINX
 + && (cfun->machine->frame.mask & ~(1 << 
RISCV_PROLOGUE_TEMP_REGNUM)
 +   {
 + unsigned int fcsr_size = GET_MODE_SIZE (SImode);
 + if (!epilogue)
 +   {
 + riscv_save_restore_reg (word_mode, regno, offset, fn);
 + offset -= fcsr_size;
 + emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
 + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
 + offset, riscv_save_reg);
 +   }
 + else
 +   {
 + riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
 + offset - fcsr_size, riscv_restore_reg);
 + emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
 + riscv_save_restore_reg (word_mode, regno, offset, fn);
 + offset -= fcsr_size;
 +   }
 + continue;
 +   }
 +
if (TARGET_XTHEADMEMPAIR)
 {
   /* Get the next reg/offset pair.  */
 @@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
riscv_save_restore_fn fn,
 }
 }

 -  /* In an interrupt function, save and restore some necessary CSRs in 
the stack
 -to avoid changes in CSRs.  */
 -  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
 - && cfun->machine->interrupt_handler_p
 - && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
 - || (TARGET_ZFINX
 - && (cfun->machine->frame.mask & ~(1 << 
RISCV_PROLOGUE_TEMP_REGNUM)
 -   {
 - unsigned int fcsr_size = GET_MODE_SIZE (SImode);
 - if (!epilogue)
 -   {
 - riscv_save_restore_reg (word_mode, regno, offset, fn);
 - offset -= fcsr_size;
 - emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
 - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
 - offset, riscv_save_reg);
 -   }
 - else
 -   {
 - riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
 - offset - fcsr_size, riscv_restore_reg);
 - emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
 - riscv_save_restore_reg (word_mode, regno, offset, fn);
 - offset -= fcsr_size;
 -   }
 - continue;
 -   }
 -
riscv_save_restore_reg (word_mode, regno, offset, fn);
  }

 diff --git a/gcc/testsuite/gcc.target

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov  wrote:
>
>
> On Fri, 10 Nov 2023, Richard Biener wrote:
>
> > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> > > design:
> > > DEBUG_INSNs participate in dependency graph so that schedulers can remove 
> > > or
> > > mutate them as needed when moving real insns across them.
> >
> > Note that debug-only BBs do not exist - the BB would be there even without 
> > debug
> > insns!
>
> Yep, sorry, I misspoke when I earlier said
>
> >> and cause divergence when passing through a debug-only BB which would not 
> >> be
> >> present at all without -g.
>
> They are present in the region, but skipped via no_real_insns_p.
>
> > So instead you have to handle BBs with just debug insns the same you
> > handle a completely empty BB.
>
> Yeah. There would be no problem if the scheduler never used no_real_insns_p
> and handled empty and non-empty BBs the same way.

And I suppose it would be OK to do that.  Empty BBs are usually removed by
CFG cleanup so the situation should only happen in rare corner cases where
the fix would be to actually run CFG cleanup ...

Richard.

> Alexander


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Jeff Law




On 11/10/23 03:39, Richard Sandiford wrote:

Lehua Ding  writes:

On 2023/11/10 18:16, Richard Sandiford wrote:

Lehua Ding  writes:

Hi Richard,

On 2023/11/8 17:40, Richard Sandiford wrote:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.


Yes, I think the init-regs should be enhanced to reduce unnecessary
initialization. My previous internal patchs did this in a separate
patch. Maybe I should split the live_subreg problem out of the second
patch and not couple it with these patches. That way it can be reviewed
separately.


But my point was that this kind of dead code is possible even without
init-regs.  So I think we should have something that removes the dead
code.  And we can try it on that PR (without changing init-regs).


Got it, so we should add a fast dead code removal pass after the init-regs pass.


I'm just not sure how fast it would be, given that it needs the subreg
liveness info.  Could it be done during RA itself, during one of the existing
instruction walks?  E.g. if IRA sees a dead instruction, it could remove it
rather than recording conflict information for it.

Yea, it's a real concern.  I haven't done the analysis yet, but I have a 
 sense that Joern's ext-dce work which Jivan and I are working on 
(which does sub-object liveness tracking) is having a compile-time 
impact as well.


Jeff


Re: [PATCH] tree-ssa-loop-ivopts : Add live analysis in regs used in decision making

2023-11-10 Thread Ajit Agarwal
Hello Richard:

On 10/11/23 7:29 pm, Richard Biener wrote:
> On Fri, Nov 10, 2023 at 7:42 AM Ajit Agarwal  wrote:
>>
>> Hello Richard:
>>
>>
>> On 09/11/23 6:21 pm, Richard Biener wrote:
>>> On Wed, Nov 8, 2023 at 4:00 PM Ajit Agarwal  wrote:

 tree-ssa-loop-ivopts : Add live analysis in regs used in decision making.

 Add live analysis in regs used calculation in decision making of
 selecting ivopts candidates.

 2023-11-08  Ajit Kumar Agarwal  

 gcc/ChangeLog:

 * tree-ssa-loop-ivopts.cc (get_regs_used): New function.
 (determine_set_costs): Call to get_regs_used to use live
 analysis.
 ---
  gcc/tree-ssa-loop-ivopts.cc | 73 +++--
  1 file changed, 70 insertions(+), 3 deletions(-)

 diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
 index c3336603778..e02fe7d434b 100644
 --- a/gcc/tree-ssa-loop-ivopts.cc
 +++ b/gcc/tree-ssa-loop-ivopts.cc
 @@ -6160,6 +6160,68 @@ ivopts_estimate_reg_pressure (struct ivopts_data 
 *data, unsigned n_invs,
return cost + n_cands;
  }

 +/* Return regs used based on live-in and liveout of given ssa variables.  
 */
>>>
>>> Please explain how the following code relates to anything like "live
>>> analysis" and
>>> where it uses live-in and live-out.  And what "live-in/out of a given
>>> SSA variable"
>>> should be.
>>>
>>> Also explain why you are doing this at all.  The patch doesn't come
>>> with a testcase
>>> or with any other hint that motivated you.
>>>
>>> Richard.
>>>
>>
>> The function get_regs_used increments the regs_used based on live-in
>> and live-out analysis of given ssa name. Instead of setting live-in and
>> live-out bitmap I increment the regs_used.
>>
>> Below is how I identify live-in and live-out and increments the regs_used
>> variable:
>>
>> a) For a given def_bb of gimple statement of ssa name there should be
>> live-out and increments the regs_used.
>>
>> b) Visit each use of SSA_NAME and if it isn't in the same block as the def,
>>  we identify live on entry blocks and increments regs_used.
>>
>> The below function is the modification of set_var_live_on_entry of 
>> tree-ssa-live.cc
>> Where we set the bitmap of liveout and livein of basic block. Instead of 
>> setting bitmap, regs_used is incremented.
> 
> It clearly doesn't work that way, and the number doesn't in any way relate to
> the number of registers used or register pressure.
> 

I agree with you that regs_used here is not the actual number of registers used 
as calculated from live-in and live-out.

The decision making above uses the variable regs_used, which is not directly 
related to the registers used or to register pressure.

My decision making is based on livein and liveout instead of actual registers 
used.

I tried to keep the variable names the same as those used in 
ivopts_estimate_register_pressure.

My logic changes the implementation of ivopts_estimate_register_pressure to 
consider live-in and live-out instead of the actual registers used. The idea is 
to use live-in and live-out over the affected regions to decide whether doing 
ivopts increases or decreases register pressure. The register pressure 
calculation should therefore be based on live-in and live-out across the 
region, instead of deriving the registers used from the number of iv 
candidates.

This is my notion of register pressure.

I can change the code to give the variables meaningful names, as stated in the 
decision making above.

>> I identify regs_used as the number of live-in and liveout of given ssa name 
>> variable.
>>
>> For each iv candidate ssa variable I identify regs_used and take maximum of 
>> regs
>> used for all the iv candidates that will be used in 
>> ivopts_estimate_register_pressure
>> cost analysis.
>>
>> Motivation behind doing this optimization is I get good performance 
>> improvement for several spec cpu 2017 benchmarks for FP and INT around 2% 
>> to 7%.
> 
> An interesting GIGO effect.

Why do you think it is a GIGO effect? The gains are happening because of the 
decision making on register pressure stated above.

Please elaborate if you think otherwise.

Thanks & Regards
Ajit
> 
>> Also setting regs_used as number of iv candidates, which is not
>> optimized and robust way of decision making for ivopts optimization I decide
>> on live-in and live-out analysis which is more correct and appropriate way of
>> identifying regs_used.
>>
>> And also there are no regressions in bootstrapped/regtested on 
>> powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
 +static unsigned
 +get_regs_used (tree ssa_name)
 +{
 +  unsigned regs_used = 0;
 +  gimple *stmt;
 +  use_operand_p use;
 +  basic_block def_bb = NULL;
 +  imm_use_iterator imm_iter;
 +
 +  stmt = SSA_NAME_DEF_STMT (ssa_name);
 +  if (stmt)
 +{
 +  def_bb = gimple_bb (stmt);
 +  /* Mark defs in liveout bitmap temporarily.  */
 +  if (def_bb)
 +   regs_used++;

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Alexander Monakov

On Fri, 10 Nov 2023, Richard Biener wrote:

> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov  wrote:
> >
> >
> > On Fri, 10 Nov 2023, Richard Biener wrote:
> >
> > > > I'm afraid ignoring debug-only BBs goes contrary to overall 
> > > > var-tracking design:
> > > > DEBUG_INSNs participate in dependency graph so that schedulers can 
> > > > remove or
> > > > mutate them as needed when moving real insns across them.
> > >
> > > Note that debug-only BBs do not exist - the BB would be there even 
> > > without debug
> > > insns!
> >
> > Yep, sorry, I misspoke when I earlier said
> >
> > >> and cause divergence when passing through a debug-only BB which would 
> > >> not be
> > >> present at all without -g.
> >
> > They are present in the region, but skipped via no_real_insns_p.
> >
> > > So instead you have to handle BBs with just debug insns the same you
> > > handle a completely empty BB.
> >
> > Yeah. There would be no problem if the scheduler never used no_real_insns_p
> > and handled empty and non-empty BBs the same way.
> 
> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> CFG cleanup so the situation should only happen in rare corner cases where
> the fix would be to actually run CFG cleanup ...

Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
may be a preferable compromise for sched-rgn as well.

I'm afraid one does not simply remove all uses of no_real_insns_p from
sched-rgn, but would be happy to be wrong about that.

Alexander

RE: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Kyrylo Tkachov


> -Original Message-
> From: Richard Earnshaw 
> Sent: Friday, November 10, 2023 11:31 AM
> To: Wilco Dijkstra ; Kyrylo Tkachov
> ; GCC Patches 
> Cc: Richard Sandiford ; Richard Earnshaw
> 
> Subject: Re: [PATCH] AArch64: Cleanup memset expansion
> 
> 
> 
> On 10/11/2023 10:17, Wilco Dijkstra wrote:
> > Hi Kyrill,
> >
> >> +  /* Reduce the maximum size with -Os.  */
> >> +  if (optimize_function_for_size_p (cfun))
> >> +    max_set_size = 96;
> >> +
> >
> >>  This is a new "magic" number in this code. It looks sensible, but how
> did you arrive at it?
> >
> > We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP
> > for every 32 bytes, so the 96 means 4 instructions for typical sizes
> > (sizes not
> > a multiple of 16 can add one extra instruction).

It would be useful to have that reasoning in the comment.

> >
> > I checked codesize on SPECINT2017, and 96 had practically identical size.
> > Using 128 would also be a reasonable Os value with a very slight size
> > increase,
> > and 384 looks good for O2 - however I didn't want to tune these values
> > as this
> > is a cleanup patch.
> >
> > Cheers,
> > Wilco
> 
> Shouldn't this be a param then?  Also, manifest constants in the middle
> of code are a potential nightmare, please move it to a #define (even if
> that's then used as the default value for the param).

I agree on making this a #define but I wouldn't insist on a param.
Code size IMO has a much more consistent right or wrong answer as it's 
statically determinable.
If this was a speed-related param then I'd expect the flexibility for the power 
user to override such heuristics would be more widely useful.
But for code size the compiler should always be able to get it right.

If Richard would still like the param then I'm fine with having the param, but 
I'd be okay with the comment above and making this a #define.
Thanks,
Kyrill


Re: [PATCH] c++: constantness of local var in constexpr fn [PR111703, PR112269]

2023-11-10 Thread Patrick Palka
On Wed, 1 Nov 2023, Patrick Palka wrote:

> On Tue, 31 Oct 2023, Patrick Palka wrote:
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?  Does it look OK for release branches as well for sake of PR111703?

Ping.

> > 
> > -- >8 --
> > 
> > potential_constant_expression was incorrectly treating most local
> > variables from a constexpr function as (potentially) constant because it
> > wasn't considering the 'now' parameter.  This patch fixes this by
> > relaxing some var_in_maybe_constexpr_fn checks accordingly, which turns
> > out to partially fix two recently reported regressions:
> > 
> > PR111703 is a regression caused by r11-550-gf65a3299a521a4 for
> > restricting constexpr evaluation during warning-dependent folding.
> > The mechanism is intended to restrict only constant evaluation of the
> > instantiated non-dependent expression, but it also ends up restricting
> > constant evaluation (as part of satisfaction) during instantiation of
> > the expression, in particular when resolving the ck_rvalue conversion of
> > the 'x' argument into a copy constructor call.
> 
> Oops, this analysis is inaccurate for this specific testcase (although
> the general idea is the same)...  We don't call fold_for_warn on 'f(x)'
> but rather on its 'x' argument that has been processed by
> convert_arguments into an IMPLICIT_CONV_EXPR.  And it's the
> instantiation of this IMPLICIT_CONV_EXPR that turns it into a copy
> constructor call.  There is no ck_rvalue conversion at all here since
> 'f' is a function pointer, not an actual function, and so ICSes don't
> get computed (IIUC).  If 'f' is changed to be an actual function then
> there's no issue since build_over_call doesn't perform argument
> conversions when in a template context and therefore doesn't call
> check_function_arguments on the converted arguments (from which the
> problematic fold_for_warn call occurs).
> 
> > This seems like a bug in
> > the mechanism[1], though I don't know if we want to refine the mechanism
> > or get rid of it completely since the original testcases which motivated
> > the mechanism are fixed more simply by r13-1225-gb00b95198e6720.  In any
> > case, this patch partially fixes this by making us correctly treat 'x'
> > and therefore 'f(x)' in the below testcase as non-constant, which
> > prevents the problematic warning-dependent folding from occurring at
> > all.  If this bug crops up again then I figure we could decide what to
> > do with the mechanism then.
> > 
> > PR112269 is caused by r14-4796-g3e3d73ed5e85e7 for merging tsubst_copy
> > into tsubst_copy_and_build.  tsubst_copy used to exit early when 'args'
> > was empty, behavior which that commit deliberately didn't preserve.
> > This early exit masked the fact that COMPLEX_EXPR wasn't handled by
> > tsubst at all, and is a tree code that apparently we could see during
> > warning-dependent folding on some targets.  A complete fix is to add
> > handling for this tree code in tsubst_expr, but this patch should fix
> > the reported testsuite failures since the situations where COMPLEX_EXPR
> > crops up in  turn out to not be constant expressions in the
> > first place after this patch.

N.B. adding COMPLEX_EXPR handling to tsubst_expr is complicated by the
fact that these COMPLEX_EXPRs are created by convert_to_complex (a
middle-end routine) which occasionally creates SAVE_EXPR sub trees which
we don't expect to see inside templated trees...

> > 
> > [1]: The mechanism incorrectly assumes that instantiation of the
> > non-dependent expression shouldn't induce any template instantiation
> > since ahead of time checking of the expression should've already induced
> > whatever template instantiation was needed, but in this case although
> > overload resolution was performed ahead of time, a ck_rvalue conversion
> > gets resolved to a copy constructor call only at instantiation time.
> > 
> > PR c++/111703
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (potential_constant_expression_1) :
> > Only consider var_in_maybe_constexpr_fn if 'now' is false.
> > : Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-fn8.C: New test.
> > ---
> >  gcc/cp/constexpr.cc   |  4 ++--
> >  gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C | 24 +++
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
> > 
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index c05760e6789..8a6b210144a 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -9623,7 +9623,7 @@ potential_constant_expression_1 (tree t, bool 
> > want_rval, bool strict, bool now,
> >   return RECUR (DECL_VALUE_EXPR (t), rval);
> > }
> >if (want_rval
> > - && !var_in_maybe_constexpr_fn (t)
> > + && (now || !var_in_maybe_constexpr_fn (t))
> >   && !type_dependent_expression_p (t)
> >   && !decl_
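The footnote's scenario (overload resolution done ahead of time, but the ck_rvalue conversion resolved to a copy constructor only at instantiation) can be sketched with a hypothetical reduction; all names below are invented to illustrate the shape of the problem and this is not the committed testcase:

```cpp
#include <cassert>

// Hypothetical reduction (all names invented): 'f' takes 'A' by value,
// so the call f(x) needs a ck_rvalue conversion, i.e. a copy of 'x'.
struct A {};

bool f(A) { return true; }

template <class T>
int g() {
  A x;        // 'x' is not a constant expression inside the template...
  if (f(x))   // ...so this non-dependent call must not be treated as one
    return 1;
  return 0;
}

int h() { return g<int>(); }
```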

Re: [PATCH] riscv: thead: Add support for the XTheadInt ISA extension

2023-11-10 Thread Christoph Müllner
On Tue, Nov 7, 2023 at 4:04 AM Jin Ma  wrote:
>
> The XTheadInt ISA extension provides acceleration interruption
> instructions as defined in T-Head-specific:
>
> * th.ipush
> * th.ipop

Overall, it looks ok to me.
There are just a few small issues to clean up (see below).


>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (th_int_get_mask): New prototype.
> (th_int_get_save_adjustment): Likewise.
> (th_int_adjust_cfi_prologue): Likewise.
> * config/riscv/riscv.cc (TH_INT_INTERRUPT): New macro.
> (riscv_expand_prologue): Add the processing of XTheadInt.
> (riscv_expand_epilogue): Likewise.
> * config/riscv/riscv.md: New unspec.
> * config/riscv/thead.cc (BITSET_P): New macro.
> * config/riscv/thead.md (th_int_push): New pattern.
> (th_int_pop): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadint-push-pop.c: New test.
> ---
>  gcc/config/riscv/riscv-protos.h   |  3 +
>  gcc/config/riscv/riscv.cc | 58 +-
>  gcc/config/riscv/riscv.md |  4 +
>  gcc/config/riscv/thead.cc | 78 +++
>  gcc/config/riscv/thead.md | 67 
>  .../gcc.target/riscv/xtheadint-push-pop.c | 36 +
>  6 files changed, 245 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadint-push-pop.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 85d4f6ed9ea..05d1fc2b3a0 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -627,6 +627,9 @@ extern void th_mempair_prepare_save_restore_operands 
> (rtx[4], bool,
>   int, HOST_WIDE_INT,
>   int, HOST_WIDE_INT);
>  extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode);
> +extern unsigned int th_int_get_mask(unsigned int);

Space between function name and parenthesis.

> +extern unsigned int th_int_get_save_adjustment();

Space between function name and parenthesis.
An empty parameter list should be written as "(void)".

> +extern rtx th_int_adjust_cfi_prologue (unsigned int);
>  #ifdef RTX_CODE
>  extern const char*
>  th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 08ff05dcc3f..c623101b05e 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -101,6 +101,16 @@ along with GCC; see the file COPYING3.  If not see
>  /* True the mode switching has static frm, or false.  */
>  #define STATIC_FRM_P(c) ((c)->machine->mode_sw_info.static_frm_p)
>
> +/* True if we can use the instructions in the XTheadInt extension
> +   to handle interrupts, or false.  */
> +#define TH_INT_INTERRUPT(c)\
> +  (TARGET_XTHEADINT\
> +   /* The XTheadInt extension only supports rv32.  */  \
> +   && !TARGET_64BIT\
> +   && (c)->machine->interrupt_handler_p\
> +   /* This instruction can be executed in M-mode only.*/   \

Dot, space, space, end of comment.

Maybe better:
/* The XTheadInt instructions can only be executed in M-mode.  */

> +   && (c)->machine->interrupt_mode == MACHINE_MODE)
> +
>  /* Information about a function's frame layout.  */
>  struct GTY(())  riscv_frame_info {
>/* The size of the frame in bytes.  */
> @@ -6703,6 +6713,7 @@ riscv_expand_prologue (void)
>unsigned fmask = frame->fmask;
>int spimm, multi_push_additional, stack_adj;
>rtx insn, dwarf = NULL_RTX;
> +  unsigned th_int_mask = 0;
>
>if (flag_stack_usage_info)
>  current_function_static_stack_size = constant_lower_bound 
> (remaining_size);
> @@ -6771,6 +6782,28 @@ riscv_expand_prologue (void)
>REG_NOTES (insn) = dwarf;
>  }
>
> +  th_int_mask = th_int_get_mask(frame->mask);

There should be exactly one space between function name and parenthesis.

> +  if (th_int_mask && TH_INT_INTERRUPT (cfun))
> +{
> +  frame->mask &= ~th_int_mask;
> +
> +  /* RISCV_PROLOGUE_TEMP may be used to handle some CSR for
> +interrupts, such as fcsr. */

Dot, space, space, end of comment.

> +  if ((TARGET_HARD_FLOAT  && frame->fmask)
> + || (TARGET_ZFINX && frame->mask))
> +   frame->mask |= (1 << RISCV_PROLOGUE_TEMP_REGNUM);
> +
> +  unsigned save_adjustment = th_int_get_save_adjustment ();
> +  frame->gp_sp_offset -= save_adjustment;
> +  remaining_size -= save_adjustment;
> +
> +  insn = emit_insn (gen_th_int_push ());
> +
> +  rtx dwarf = th_int_adjust_cfi_prologue (th_int_mask);
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +  REG_NOTES (insn) = dwarf;
> +}
> +
>/* Save the GP, FP registers.  */
>if 

Re: [PATCH] c++: non-dependent .* folding [PR112427]

2023-11-10 Thread Patrick Palka
On Thu, 9 Nov 2023, Jason Merrill wrote:

> On 11/8/23 16:59, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > -- >8 --
> > 
> > Here when building up the non-dependent .* expression, we crash from
> > fold_convert on 'b.a' due to this (templated) COMPONENT_REF having an
> > IDENTIFIER_NODE instead of FIELD_DECL operand that middle-end routines
> > expect.  Like in r14-4899-gd80a26cca02587, this patch fixes this by
> > replacing the problematic piecemeal folding with a single call to
> > cp_fully_fold.
> > 
> > PR c++/112427
> > 
> > gcc/cp/ChangeLog:
> > 
> > * typeck2.cc (build_m_component_ref): Use cp_convert, build2 and
> > cp_fully_fold instead of fold_build_pointer_plus and fold_convert.
> 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/template/non-dependent29.C: New test.
> > ---
> >   gcc/cp/typeck2.cc   |  5 -
> >   gcc/testsuite/g++.dg/template/non-dependent29.C | 13 +
> >   2 files changed, 17 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent29.C
> > 
> > diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
> > index 309903afed8..208004221da 100644
> > --- a/gcc/cp/typeck2.cc
> > +++ b/gcc/cp/typeck2.cc
> > @@ -2378,7 +2378,10 @@ build_m_component_ref (tree datum, tree component,
> > tsubst_flags_t complain)
> > /* Build an expression for "object + offset" where offset is the
> >  value stored in the pointer-to-data-member.  */
> > ptype = build_pointer_type (type);
> > -  datum = fold_build_pointer_plus (fold_convert (ptype, datum),
> > component);
> > +  datum = cp_convert (ptype, datum, complain);
> > +  datum = build2 (POINTER_PLUS_EXPR, ptype,
> > + datum, convert_to_ptrofftype (component));
> 
> We shouldn't need to build the POINTER_PLUS_EXPR at all in template context.
> OK with that change.

Hmm, that seems harmless at first glance, but I noticed
build_min_non_dep (called from build_x_binary_op in this case) is
careful to propagate TREE_SIDE_EFFECTS of the given tree, and so eliding
POINTER_PLUS_EXPR here could potentially mean that the tree we
ultimately return from build_x_binary_op when in a template context has
TREE_SIDE_EFFECTS not set when it used to.  Shall we still elide the
POINTER_PLUS_EXPR in a template context despite this?

(The TREE_SIDE_EFFECTS propagation in build_min_non_dep was added in
r71108 to avoid bogus ahead of time -Wunused-value warnings.  But then
r105273 later made us stop issuing -Wunused-value warnings ahead of time
altogether.  So perhaps we don't need to maintain the TREE_SIDE_EFFECTS
flag on templated trees at all anymore?)

> 
> Jason
> 
> 



Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Richard Earnshaw

On 10/11/2023 14:46, Kyrylo Tkachov wrote:
-Original Message-
From: Richard Earnshaw 
Sent: Friday, November 10, 2023 11:31 AM
To: Wilco Dijkstra ; Kyrylo Tkachov
; GCC Patches 
Cc: Richard Sandiford ; Richard Earnshaw

Subject: Re: [PATCH] AArch64: Cleanup memset expansion

On 10/11/2023 10:17, Wilco Dijkstra wrote:

Hi Kyrill,


+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_set_size = 96;
+



 This is a new "magic" number in this code. It looks sensible, but how
did you arrive at it?


We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP
for every 32 bytes, so the 96 means 4 instructions for typical sizes (sizes
not a multiple of 16 can add one extra instruction).


It would be useful to have that reasoning in the comment.



I checked codesize on SPECINT2017, and 96 had practically identical size.
Using 128 would also be a reasonable Os value with a very slight size
increase, and 384 looks good for O2 - however I didn't want to tune these
values as this is a cleanup patch.

Cheers,
Wilco


Shouldn't this be a param then?  Also, manifest constants in the middle
of code are a potential nightmare, please move it to a #define (even if
that's then used as the default value for the param).


I agree on making this a #define but I wouldn't insist on a param.
Code size IMO has a much more consistent right or wrong answer as it's 
statically determinable.
If this was a speed-related param then I'd expect the flexibility for the power 
user to override such heuristics would be more widely useful.
But for code size the compiler should always be able to get it right.

If Richard would still like the param then I'm fine with having the param, but 
I'd be okay with the comment above and making this a #define.


I don't immediately have a feel for how sensitive code would be to the 
precise value here.  Is this value something that might affect 
individual benchmarks in different ways?  Or something where a future 
architecture might want a different value?  For either of those reasons 
a param might be useful, but if this is primarily a code size trade off 
and the variation in performance is small, then it's probably not 
worthwhile having an additional hook.


R.
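Wilco's instruction-count reasoning above can be turned into a quick back-of-the-envelope check. The cost model here is an assumption reconstructed from his description (one DUP/MOVI plus one STP per 32 bytes, with a possible extra instruction for sizes that are not a multiple of 16), not GCC's actual cost code:

```cpp
#include <cassert>

// Rough cost model from the discussion above: 1 insn to materialize the
// value (DUP or MOVI) + 1 STP per 32 bytes, plus possibly one extra
// insn when the size is not a multiple of 16.
constexpr int inline_memset_insns(int size)
{
  int insns = 1 + (size + 31) / 32;   // value setup + STPs
  if (size % 16 != 0)
    ++insns;                          // trailing partial store
  return insns;
}

// The proposed -Os limit of 96 bytes keeps typical expansions at 4 insns.
static_assert(inline_memset_insns(96) == 4, "96 bytes -> 4 instructions");
static_assert(inline_memset_insns(128) == 5, "128 would cost one more");
```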


Re: [PATCH] c++: non-dependent .* folding [PR112427]

2023-11-10 Thread Patrick Palka
On Fri, 10 Nov 2023, Patrick Palka wrote:

> On Thu, 9 Nov 2023, Jason Merrill wrote:
> 
> > On 11/8/23 16:59, Patrick Palka wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > trunk?
> > > 
> > > -- >8 --
> > > 
> > > Here when building up the non-dependent .* expression, we crash from
> > > fold_convert on 'b.a' due to this (templated) COMPONENT_REF having an
> > > IDENTIFIER_NODE instead of FIELD_DECL operand that middle-end routines
> > > expect.  Like in r14-4899-gd80a26cca02587, this patch fixes this by
> > > replacing the problematic piecemeal folding with a single call to
> > > cp_fully_fold.
> > > 
> > >   PR c++/112427
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * typeck2.cc (build_m_component_ref): Use cp_convert, build2 and
> > >   cp_fully_fold instead of fold_build_pointer_plus and fold_convert.
> > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/template/non-dependent29.C: New test.
> > > ---
> > >   gcc/cp/typeck2.cc   |  5 -
> > >   gcc/testsuite/g++.dg/template/non-dependent29.C | 13 +
> > >   2 files changed, 17 insertions(+), 1 deletion(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent29.C
> > > 
> > > diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
> > > index 309903afed8..208004221da 100644
> > > --- a/gcc/cp/typeck2.cc
> > > +++ b/gcc/cp/typeck2.cc
> > > @@ -2378,7 +2378,10 @@ build_m_component_ref (tree datum, tree component,
> > > tsubst_flags_t complain)
> > > /* Build an expression for "object + offset" where offset is the
> > >value stored in the pointer-to-data-member.  */
> > > ptype = build_pointer_type (type);
> > > -  datum = fold_build_pointer_plus (fold_convert (ptype, datum),
> > > component);
> > > +  datum = cp_convert (ptype, datum, complain);
> > > +  datum = build2 (POINTER_PLUS_EXPR, ptype,
> > > +   datum, convert_to_ptrofftype (component));
> > 
> > We shouldn't need to build the POINTER_PLUS_EXPR at all in template context.
> > OK with that change.
> 
> Hmm, that seems harmless at first glance, but I noticed
> build_min_non_dep (called from build_x_binary_op in this case) is
> careful to propagate TREE_SIDE_EFFECTS of the given tree, and so eliding
> POINTER_PLUS_EXPR here could potentially mean that the tree we
> ultimately return from build_x_binary_op when in a template context has
> TREE_SIDE_EFFECTS not set when it used to.  Shall we still elide the
> POINTER_PLUS_EXPR in a template context despite this?
> 
> (The TREE_SIDE_EFFECTS propagation in build_min_non_dep was added in
> r71108 to avoid bogus ahead of time -Wunused-value warnings.  But then
> r105273 later made us stop issuing -Wunused-value warnings ahead of time
> altogether.  So perhaps we don't need to maintain the TREE_SIDE_EFFECTS
> flag on templated trees at all anymore?)

IMO it'd be nice to restore ahead of time -Wunused-value warnings;
it seems the original motivation for r105273 / PR8057 was to avoid
redundantly issuing a warning twice, once ahead of time and once at
instantiation time, which we now could do in a better way with
warning_suppressed_p etc.  If so, then IIUC eliding the POINTER_PLUS_EXPR
could mean we'd incorrectly issue a -Wunused-value warning for e.g.
'a.*f()' in a template context?
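For concreteness, a hypothetical example (names invented) of the kind of expression at issue: in 'a.*f()', the call to f() has side effects even though the overall value is discarded, which is why losing TREE_SIDE_EFFECTS could make a restored ahead-of-time -Wunused-value fire incorrectly:

```cpp
#include <cassert>

struct A { int m = 7; };

int calls = 0;
int A::* f() { ++calls; return &A::m; }   // the call is the side effect

template <class T>
int use(A a) {
  a.*f();          // discarded value, but f() must still run; a warning
                   // about the unused value here would be bogus
  return a.*f();
}

int run() { A a; return use<int>(a); }
```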

> 
> > 
> > Jason
> > 
> > 
> 



[committed] i386: Clear stack protector scratch with zero/sign-extend instruction

2023-11-10 Thread Uros Bizjak
Use unrelated register initializations with zero/sign-extend instructions
to clear the stack protector scratch register.

Handle only SI -> DImode extensions for 64-bit targets, as this is the
only extension that triggers the peephole in a non-negligible number of
cases.

Also use an explicit check for word_mode instead of a mode iterator in the
peephole2 patterns to avoid pattern explosion.

gcc/ChangeLog:

* config/i386/i386.md (stack_protect_set_1 peephole2):
Explicitly check operand 2 for word_mode.
(stack_protect_set_1 peephole2 #2): Ditto.
(stack_protect_set_2 peephole2): Ditto.
(stack_protect_set_3 peephole2): Ditto.
(*stack_protect_set_4z__di): New insn pattern.
(*stack_protect_set_4s__di): Ditto.
(stack_protect_set_4 peephole2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization involving zero/sign-extend instruction.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 046b6b7919e..01fc6ecc351 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -24335,11 +24335,12 @@ (define_peephole2
   [(parallel [(set (match_operand:PTR 0 "memory_operand")
   (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
   UNSPEC_SP_SET))
- (set (match_operand:W 2 "general_reg_operand") (const_int 0))
+ (set (match_operand 2 "general_reg_operand") (const_int 0))
  (clobber (reg:CC FLAGS_REG))])
(set (match_operand 3 "general_reg_operand")
(match_operand 4 "const0_operand"))]
-  "GET_MODE_SIZE (GET_MODE (operands[3])) <= UNITS_PER_WORD
+  "GET_MODE (operands[2]) == word_mode
+   && GET_MODE_SIZE (GET_MODE (operands[3])) <= UNITS_PER_WORD
&& peep2_reg_dead_p (0, operands[3])
&& peep2_reg_dead_p (1, operands[2])"
   [(parallel [(set (match_dup 0)
@@ -24395,11 +24396,12 @@ (define_peephole2
   [(parallel [(set (match_operand:PTR 0 "memory_operand")
   (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
   UNSPEC_SP_SET))
- (set (match_operand:W 2 "general_reg_operand") (const_int 0))
+ (set (match_operand 2 "general_reg_operand") (const_int 0))
  (clobber (reg:CC FLAGS_REG))])
(set (match_operand:SWI48 3 "general_reg_operand")
(match_operand:SWI48 4 "general_gr_operand"))]
-  "peep2_reg_dead_p (0, operands[3])
+  "GET_MODE (operands[2]) == word_mode
+   && peep2_reg_dead_p (0, operands[3])
&& peep2_reg_dead_p (1, operands[2])"
   [(parallel [(set (match_dup 0)
   (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
@@ -24411,9 +24413,10 @@ (define_peephole2
(parallel [(set (match_operand:PTR 0 "memory_operand")
   (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
   UNSPEC_SP_SET))
- (set (match_operand:W 2 "general_reg_operand") (const_int 0))
+ (set (match_operand 2 "general_reg_operand") (const_int 0))
  (clobber (reg:CC FLAGS_REG))])]
-  "peep2_reg_dead_p (0, operands[3])
+  "GET_MODE (operands[2]) == word_mode
+   && peep2_reg_dead_p (0, operands[3])
&& peep2_reg_dead_p (2, operands[2])
&& !reg_mentioned_p (operands[3], operands[0])
&& !reg_mentioned_p (operands[3], operands[1])"
@@ -24448,16 +24451,71 @@ (define_peephole2
   [(parallel [(set (match_operand:PTR 0 "memory_operand")
   (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
   UNSPEC_SP_SET))
- (set (match_operand:W 2 "general_reg_operand") (const_int 0))
+ (set (match_operand 2 "general_reg_operand") (const_int 0))
  (clobber (reg:CC FLAGS_REG))])
(set (match_operand:SWI48 3 "general_reg_operand")
(match_operand:SWI48 4 "address_no_seg_operand"))]
-  "peep2_reg_dead_p (0, operands[3])
+  "GET_MODE (operands[2]) == word_mode
+   && peep2_reg_dead_p (0, operands[3])
&& peep2_reg_dead_p (1, operands[2])"
   [(parallel [(set (match_dup 0)
   (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
  (set (match_dup 3) (match_dup 4))])])
 
+(define_insn "*stack_protect_set_4z__di"
+  [(set (match_operand:PTR 0 "memory_operand" "=m")
+   (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
+   UNSPEC_SP_SET))
+   (set (match_operand:DI 1 "register_operand" "=&r")
+   (zero_extend:DI (match_operand:SI 2 "nonimmediate_operand" "rm")))]
+  "TARGET_64BIT && reload_completed"
+{
+  output_asm_insn ("mov{}\t{%3, %1|%1, %3}", operands);
+  output_asm_insn ("mov{}\t{%1, %0|%0, %1}", operands);
+  if (ix86_use_lea_for_mov (insn, operands + 1))
+return "lea{l}\t{%E2, %k1|%k1, %E2}";
+  else
+return "mov{l}\t{%2, %k1|%k1, %2}";
+}
+  [(set_attr "type" "multi")
+   (set_attr "length" "24")])
+
+(define_insn "*stack_protect_set_4s__di"
+  [(set (ma

[pushed] Allow md iterators to include other iterators

2023-11-10 Thread Richard Sandiford
This patch allows an .md iterator to include the contents of
previous iterators, possibly with an extra condition attached.

Too much indirection might become hard to follow, so for the
AArch64 changes I tried to stick to things that seemed likely
to be uncontroversial:

(a) structure iterators that combine modes for different sizes
and vector counts

(b) iterators that explicitly duplicate another iterator
(for iterating over the cross product)
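As a sketch of the new syntax: the iterator names below are invented, and the spelling of the conditional include is inferred from the commit message ("possibly with an extra condition attached"), so treat it as an assumption rather than a reference:

```
;; Plain include: WIDE reuses all of SMALL's modes and adds DI.
(define_mode_iterator SMALL [QI HI SI])
(define_mode_iterator WIDE  [SMALL DI])

;; Include with an extra condition attached to the included iterator.
(define_mode_iterator WIDE64 [(SMALL "TARGET_64BIT") DI])
```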

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* read-rtl.cc (md_reader::read_mapping): Allow iterators to
include other iterators.
* doc/md.texi: Document the change.
* config/aarch64/iterators.md (DREG2, VQ2, TX2, DX2, SX2): Include
the iterator that is being duplicated, rather than reproducing it.
(VSTRUCT_D): Redefine using VSTRUCT_[234]D.
(VSTRUCT_Q): Likewise VSTRUCT_[234]Q.
(VSTRUCT_2QD, VSTRUCT_3QD, VSTRUCT_4QD, VSTRUCT_QD): Redefine using
the individual D and Q iterators.
---
 gcc/config/aarch64/iterators.md | 60 +
 gcc/doc/md.texi | 13 +++
 gcc/read-rtl.cc | 21 ++--
 3 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 1593a8fd04f..a920de99ffc 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -106,7 +106,7 @@ (define_mode_iterator VDZ [V8QI V4HI V4HF V4BF V2SI V2SF DI 
DF])
 (define_mode_iterator DREG [V8QI V4HI V4HF V2SI V2SF DF])
 
 ;; Copy of the above.
-(define_mode_iterator DREG2 [V8QI V4HI V4HF V2SI V2SF DF])
+(define_mode_iterator DREG2 [DREG])
 
 ;; Advanced SIMD modes for integer divides.
 (define_mode_iterator VQDIV [V4SI V2DI])
@@ -124,7 +124,7 @@ (define_mode_iterator VDQ_BHSI [V8QI V16QI V4HI V8HI V2SI 
V4SI])
 (define_mode_iterator VQ [V16QI V8HI V4SI V2DI V8HF V4SF V2DF V8BF])
 
 ;; Copy of the above.
-(define_mode_iterator VQ2 [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
+(define_mode_iterator VQ2 [VQ])
 
 ;; Quad vector modes suitable for moving.  Includes BFmode.
 (define_mode_iterator VQMOV [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
@@ -320,21 +320,13 @@ (define_mode_iterator VS [V2SI V4SI])
 (define_mode_iterator TX [TI TF TD])
 
 ;; Duplicate of the above
-(define_mode_iterator TX2 [TI TF TD])
+(define_mode_iterator TX2 [TX])
 
 (define_mode_iterator VTX [TI TF TD V16QI V8HI V4SI V2DI V8HF V4SF V2DF V8BF])
 
 ;; Advanced SIMD opaque structure modes.
 (define_mode_iterator VSTRUCT [OI CI XI])
 
-;; Advanced SIMD 64-bit vector structure modes.
-(define_mode_iterator VSTRUCT_D [V2x8QI V2x4HI V2x2SI V2x1DI
-V2x4HF V2x2SF V2x1DF V2x4BF
-V3x8QI V3x4HI V3x2SI V3x1DI
-V3x4HF V3x2SF V3x1DF V3x4BF
-V4x8QI V4x4HI V4x2SI V4x1DI
-V4x4HF V4x2SF V4x1DF V4x4BF])
-
 ;; Advanced SIMD 64-bit 2-vector structure modes.
 (define_mode_iterator VSTRUCT_2D [V2x8QI V2x4HI V2x2SI V2x1DI
  V2x4HF V2x2SF V2x1DF V2x4BF])
@@ -347,6 +339,9 @@ (define_mode_iterator VSTRUCT_3D [V3x8QI V3x4HI V3x2SI 
V3x1DI
 (define_mode_iterator VSTRUCT_4D [V4x8QI V4x4HI V4x2SI V4x1DI
  V4x4HF V4x2SF V4x1DF V4x4BF])
 
+;; Advanced SIMD 64-bit vector structure modes.
+(define_mode_iterator VSTRUCT_D [VSTRUCT_2D VSTRUCT_3D VSTRUCT_4D])
+
 ;; Advanced SIMD 64-bit 2-vector structure modes minus V2x1DI and V2x1DF.
 (define_mode_iterator VSTRUCT_2DNX [V2x8QI V2x4HI V2x2SI V2x4HF
V2x2SF V2x4BF])
@@ -371,14 +366,6 @@ (define_mode_iterator VSTRUCT_3DX [V3x1DI V3x1DF])
 ;; Advanced SIMD 64-bit 4-vector structure modes with 64-bit elements.
 (define_mode_iterator VSTRUCT_4DX [V4x1DI V4x1DF])
 
-;; Advanced SIMD 128-bit vector structure modes.
-(define_mode_iterator VSTRUCT_Q [V2x16QI V2x8HI V2x4SI V2x2DI
-V2x8HF V2x4SF V2x2DF V2x8BF
-V3x16QI V3x8HI V3x4SI V3x2DI
-V3x8HF V3x4SF V3x2DF V3x8BF
-V4x16QI V4x8HI V4x4SI V4x2DI
-V4x8HF V4x4SF V4x2DF V4x8BF])
-
 ;; Advanced SIMD 128-bit 2-vector structure modes.
 (define_mode_iterator VSTRUCT_2Q [V2x16QI V2x8HI V2x4SI V2x2DI
  V2x8HF V2x4SF V2x2DF V2x8BF])
@@ -391,49 +378,32 @@ (define_mode_iterator VSTRUCT_3Q [V3x16QI V3x8HI V3x4SI 
V3x2DI
 (define_mode_iterator VSTRUCT_4Q [V4x16QI V4x8HI V4x4SI V4x2DI
  V4x8HF V4x4SF V4x2DF V4x8BF])
 
+;; Advanced SIMD 128-bit vector structure modes.
+(define_mode_iterator VSTRUCT_Q [VSTRUCT_2Q VSTRUCT_3Q VSTRUCT_4Q])
+
 ;; Advanced SIMD 2-vector structure modes.
-(define_mode_iterator VSTRUCT_2QD [V2x8QI V2x4HI V2x2SI V2x1DI
-  V2x4H

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Dimitar Dimitrov
On Fri, Nov 10, 2023 at 04:53:57PM +0800, Lehua Ding wrote:
> > > The divide by zero error above is interesting. I'm not sure why
> > > ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in
> > > the following rtx:
> > > (debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [
> > > encoding ])) -1
> > >   (nil))
> > 
> > I just cross compiled an arm-none-eabi compiler and didn't encounter
> > this error, can you give me a little more config info about build? For
> > example, flags_for_target, etc. Thanks again.
> > 
> 
> Forgot, please also provide the version information of newlib code.
> 

These are the GIT commit hashes which I tested:
  gcc 39d81b667373b0033f44702a4b532a4618dde9ff
  binutils c96ceed9dce7617f270aa4742645706e535f74b7
  newlib 39f734a857e2692224715b03b99fc7bd83e94a0f

This is the script I'm using to build arm-none-eabi:
   https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-arm.sh
The build steps and config parameters are easily seen there.

Note that the Linaro CI is also detecting issues. It hits ICEs when
building libgcc:
  
https://patchwork.sourceware.org/project/gcc/patch/20231108034740.834590-8-lehua.d...@rivai.ai/

Regards,
Dimitar



[PATCH] libgccjit: Fix GGC segfault when using -flto

2023-11-10 Thread Antoni Boucher
Hi.
This patch fixes the segfault when using -flto with libgccjit (bug
111396).

You mentioned in bugzilla that this didn't fix the reproducer for you,
but it does for me.
At first, the test case would not pass, but running "make install" made
it pass.
Not sure if this is normal.

Could you please check if this fixes the issue on your side as well?
Since this patch changes files outside of gcc/jit, what tests should I
run to make sure it didn't break anything?

Thanks for the review.
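The *_cc_finalize pattern used throughout this patch can be sketched generically. The code below is a hypothetical stand-in (not GCC's actual data structures): each pass's file-scope state must be returned to its pristine value so a second in-process compilation, as libgccjit performs, starts clean:

```cpp
#include <cassert>

// Stand-in for a pass's file-scope cache (like ipa-profile's call_sums).
static int *call_sums = nullptr;

int &get_call_sums() {
  if (!call_sums)
    call_sums = new int(0);   // lazily created on first use
  return *call_sums;
}

// The equivalent of an ipa_*_cc_finalize function, invoked from
// toplev::finalize so the next run does not see stale state.
void pass_cc_finalize() {
  delete call_sums;
  call_sums = nullptr;   // crucial: a rerun must re-create, not reuse
}
```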
From f26d0f37e8d83bce1f5aa53c393961a8bd518d16 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Fri, 10 Nov 2023 09:52:32 -0500
Subject: [PATCH] libgccjit: Fix GGC segfault when using -flto

gcc/ChangeLog:
	PR jit/111396
	* ipa-fnsummary.cc (ipa_fnsummary_cc_finalize): Call
	ipa_free_size_summary.
	* ipa-icf.cc (ipa_icf_cc_finalize): New function.
	* ipa-profile.cc (ipa_profile_cc_finalize): New function.
	* ipa-prop.cc (ipa_prop_cc_finalize): New function.
	* ipa-prop.h (ipa_prop_cc_finalize): New function.
	* ipa-sra.cc (ipa_sra_cc_finalize): New function.
	* ipa-utils.h (ipa_profile_cc_finalize, ipa_icf_cc_finalize,
	ipa_sra_cc_finalize): New functions.
	* toplev.cc (toplev::finalize): Call ipa_icf_cc_finalize,
	ipa_prop_cc_finalize, ipa_profile_cc_finalize and
	ipa_sra_cc_finalize
	Include ipa-utils.h.

gcc/testsuite/ChangeLog:
	PR jit/111396
	* jit.dg/all-non-failing-tests.h: Add new test-ggc-bugfix.
	* jit.dg/test-ggc-bugfix.c: New test.
---
 gcc/ipa-fnsummary.cc |  1 +
 gcc/ipa-icf.cc   |  9 ++
 gcc/ipa-profile.cc   | 10 ++
 gcc/ipa-prop.cc  | 18 +++
 gcc/ipa-prop.h   |  2 ++
 gcc/ipa-sra.cc   | 12 +++
 gcc/ipa-utils.h  |  7 
 gcc/testsuite/jit.dg/all-non-failing-tests.h | 12 ++-
 gcc/testsuite/jit.dg/test-ggc-bugfix.c   | 34 
 gcc/toplev.cc|  5 +++
 10 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/jit.dg/test-ggc-bugfix.c

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index a2495ffe63e..34e011c4b50 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -5090,4 +5090,5 @@ void
 ipa_fnsummary_cc_finalize (void)
 {
   ipa_free_fn_summary ();
+  ipa_free_size_summary ();
 }
diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index bbdfd445397..ba6c6899ce6 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -3657,3 +3657,12 @@ make_pass_ipa_icf (gcc::context *ctxt)
 {
   return new ipa_icf::pass_ipa_icf (ctxt);
 }
+
+/* Reset all state within ipa-icf.cc so that we can rerun the compiler
+   within the same process.  For use by toplev::finalize.  */
+
+void
+ipa_icf_cc_finalize (void)
+{
+  ipa_icf::optimizer = NULL;
+}
diff --git a/gcc/ipa-profile.cc b/gcc/ipa-profile.cc
index 78a40a118bc..8083b8195a8 100644
--- a/gcc/ipa-profile.cc
+++ b/gcc/ipa-profile.cc
@@ -1065,3 +1065,13 @@ make_pass_ipa_profile (gcc::context *ctxt)
 {
   return new pass_ipa_profile (ctxt);
 }
+
+/* Reset all state within ipa-profile.cc so that we can rerun the compiler
+   within the same process.  For use by toplev::finalize.  */
+
+void
+ipa_profile_cc_finalize (void)
+{
+  delete call_sums;
+  call_sums = NULL;
+}
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 827bdb691ba..32cfb7754be 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5904,5 +5904,23 @@ ipcp_transform_function (struct cgraph_node *node)
   return modified_mem_access ? TODO_update_ssa_only_virtuals : 0;
 }
 
+/* Reset all state within ipa-prop.cc so that we can rerun the compiler
+   within the same process.  For use by toplev::finalize.  */
+
+void
+ipa_prop_cc_finalize (void)
+{
+  if (function_insertion_hook_holder)
+symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
+  function_insertion_hook_holder = NULL;
+
+  if (ipa_edge_args_sum)
+ggc_delete (ipa_edge_args_sum);
+  ipa_edge_args_sum = NULL;
+
+  if (ipa_node_params_sum)
+ggc_delete (ipa_node_params_sum);
+  ipa_node_params_sum = NULL;
+}
 
 #include "gt-ipa-prop.h"
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index fcd0e5c638f..4409c4afee9 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -1255,6 +1255,8 @@ tree ipcp_get_aggregate_const (struct function *func, tree parm, bool by_ref,
 bool unadjusted_ptr_and_unit_offset (tree op, tree *ret,
  poly_int64 *offset_ret);
 
+void ipa_prop_cc_finalize (void);
+
 /* From tree-sra.cc:  */
 tree build_ref_for_offset (location_t, tree, poly_int64, bool, tree,
 			   gimple_stmt_iterator *, bool);
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 6ffad335db4..2ac6fee14c4 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4707,5 +4707,17 @@ make_pass_ipa_sra (gcc::context *ctxt)
   return new pass_ipa_sra (ctxt);
 }
 
+/* Reset all state within ipa-sra.cc so that we can rerun the compiler
+   within the same proc

Re: [PATCH][Ada] Fix syntax errors in expect.c

2023-11-10 Thread Marc Poulhiès


Andris Pavēnis  writes:

> Fixing these errors (attached patch for master branch) was not sufficient for
> building Ada cross-compiler, but it fixed compiler errors.
>
> This would perhaps qualify for trivial change, but it seems that I no more 
> have
> write access (I got it in 2015, but have not used it for a long time. Perhaps 
> I
> do not really need it)

Hello,

I've merged your patch as r14-5332.

Thanks!
Marc


[pushed][PR112337][IRA]: Check autoinc and memory address after temporary equivalence substitution

2023-11-10 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

The patch was successfully bootstrapped and tested on x86-64, ppc64le, 
and aarch64.
commit b3d1d30eeed67c78e223c146a464d2fdd1dde894
Author: Vladimir N. Makarov 
Date:   Fri Nov 10 11:14:46 2023 -0500

[IRA]: Check autoinc and memory address after temporary equivalence substitution

My previous RA patches to take register equivalence into account do
temporary register equivalence substitution to find out that the
equivalence can be consumed by insns.  The insn with the substitution is
checked on validity using target-depended code.  This code expects that
autoinc operations work on register but this register can be substituted
by equivalent memory.  The patch fixes this problem.  The patch also adds
checking that the substitution can be consumed in memory address too.

gcc/ChangeLog:

PR target/112337
* ira-costs.cc (validate_autoinc_and_mem_addr_p): New function.
(equiv_can_be_consumed_p): Use it.

gcc/testsuite/ChangeLog:

PR target/112337
* gcc.target/arm/pr112337.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 50f80779025..e0528e76a64 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1758,13 +1758,46 @@ process_bb_node_for_costs (ira_loop_tree_node_t loop_tree_node)
     process_bb_for_costs (bb);
 }
 
+/* Return true if all autoinc rtx in X change only a register and memory is
+   valid.  */
+static bool
+validate_autoinc_and_mem_addr_p (rtx x)
+{
+  enum rtx_code code = GET_CODE (x);
+  if (GET_RTX_CLASS (code) == RTX_AUTOINC)
+    return REG_P (XEXP (x, 0));
+  const char *fmt = GET_RTX_FORMAT (code);
+  for (int i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    if (fmt[i] == 'e')
+      {
+	if (!validate_autoinc_and_mem_addr_p (XEXP (x, i)))
+	  return false;
+      }
+    else if (fmt[i] == 'E')
+      {
+	for (int j = 0; j < XVECLEN (x, i); j++)
+	  if (!validate_autoinc_and_mem_addr_p (XVECEXP (x, i, j)))
+	    return false;
+      }
+  /* Check memory after checking autoinc to guarantee that autoinc is already
+     valid for machine-dependent code checking memory address.  */
+  return (!MEM_P (x)
+	  || memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0),
+					  MEM_ADDR_SPACE (x)));
+}
+
 /* Check that reg REGNO can be changed by TO in INSN.  Return true in case the
    result insn would be valid one.  */
 static bool
 equiv_can_be_consumed_p (int regno, rtx to, rtx_insn *insn)
 {
   validate_replace_src_group (regno_reg_rtx[regno], to, insn);
-  bool res = verify_changes (0);
+  /* We can change register to equivalent memory in autoinc rtl.  Some code
+     including verify_changes assumes that autoinc contains only a register.
+     So check this first.  */
+  bool res = validate_autoinc_and_mem_addr_p (PATTERN (insn));
+  if (res)
+    res = verify_changes (0);
   cancel_changes (0);
   return res;
 }
diff --git a/gcc/testsuite/gcc.target/arm/pr112337.c b/gcc/testsuite/gcc.target/arm/pr112337.c
new file mode 100644
index 000..5dacf0aa4f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr112337.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp -mfloat-abi=hard" } */
+
+#pragma GCC arm "arm_mve_types.h"
+int32x4_t h(void *p) { return __builtin_mve_vldrwq_sv4si(p); }
+void g(int32x4_t);
+void f(int, int, int, short, int *p) {
+  int *bias = p;
+  for (;;) {
+    int32x4_t d = h(bias);
+    bias += 4;
+    g(d);
+  }
+}


Re: [PATCH] c++: fix tf_decltype manipulation for COMPOUND_EXPR

2023-11-10 Thread Patrick Palka
On Thu, 9 Nov 2023, Jason Merrill wrote:

> On 11/7/23 10:08, Patrick Palka wrote:
> > bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > -- >8 --
> > 
> > In the COMPOUND_EXPR case of tsubst_expr, we were redundantly clearing
> > the tf_decltype flag when substituting the LHS and also neglecting to
> > propagate it when substituting the RHS.  This patch corrects this flag
> > manipulation, which allows us to accept the below testcase.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (tsubst_expr) <case COMPOUND_EXPR>: Don't redundantly
> > clear tf_decltype when substituting the LHS.  Propagate
> > tf_decltype when substituting the RHS.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/decltype-call7.C: New test.
> > ---
> >   gcc/cp/pt.cc| 9 -
> >   gcc/testsuite/g++.dg/cpp0x/decltype-call7.C | 9 +
> >   2 files changed, 13 insertions(+), 5 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-call7.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 521749df525..5f879287a58 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -20382,11 +20382,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t
> > complain, tree in_decl)
> > case COMPOUND_EXPR:
> > {
> > -   tree op0 = tsubst_expr (TREE_OPERAND (t, 0), args,
> > -   complain & ~tf_decltype, in_decl);
> > -   RETURN (build_x_compound_expr (EXPR_LOCATION (t),
> > -  op0,
> > -  RECUR (TREE_OPERAND (t, 1)),
> > +   tree op0 = RECUR (TREE_OPERAND (t, 0));
> > +   tree op1 = tsubst_expr (TREE_OPERAND (t, 1), args,
> > +   complain|decltype_flag, in_decl);
> > +   RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
> >templated_operator_saved_lookups (t),
> >complain|decltype_flag));
> 
> Hmm, passing decltype_flag to both op1 and the , is concerning.  Can you add a
> test with overloaded operator, where the RHS is a class with a destructor?

I'm not sure if this is what you had in mind, but indeed with this patch
we reject the following with an error outside the immediate context:

struct B { ~B() = delete; };
template<class T> B f();

void operator,(int, const B&);

template<class T> decltype(42, f<T>()) g(int) = delete; // #1
template<class T> void g(...); // #2

int main() {
  g<B>(0); // should select #2
}

gcc/testsuite/g++.dg/cpp0x/decltype-call8.C: In substitution of ‘template<class T> decltype ((42, f<T>())) g(int) [with T = B]’:
gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:12:7:   required from here
   12 |   g<B>(0);
      |   ^~~~~~~
gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:8:30: error: use of deleted
function ‘B::~B()’
    8 | template<class T> decltype(42, f<T>()) g(int) = delete; // #1
      |                   ~~~~~~~~~~~~~~~^~~~
gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:3:12: note: declared here
    3 | struct B { ~B() = delete; };
      |            ^

Ultimately because unary_complex_lvalue isn't SFINAE-enabled.  If we
fix that with the following then we accept the testcase as before.

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 9d4d95f85bf..58c45542793 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -556,7 +556,7 @@ build_simple_base_path (tree expr, tree binfo)
 	 into `(*(a ?  &b : &c)).x', and so on.  A COND_EXPR is only
 	 an lvalue in the front end; only _DECLs and _REFs are lvalues
 	 in the back end.  */
-      temp = unary_complex_lvalue (ADDR_EXPR, expr);
+      temp = unary_complex_lvalue (ADDR_EXPR, expr, tf_warning_or_error);
       if (temp)
 	expr = cp_build_fold_indirect_ref (temp);
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 1fa710d7154..d826afcdb5c 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8130,7 +8130,7 @@ extern tree cp_build_addr_expr		(tree, tsubst_flags_t);
 extern tree cp_build_unary_op			(enum tree_code, tree, bool,
 						 tsubst_flags_t);
 extern tree genericize_compound_lvalue		(tree);
-extern tree unary_complex_lvalue		(enum tree_code, tree);
+extern tree unary_complex_lvalue		(enum tree_code, tree, tsubst_flags_t);
 extern tree build_x_conditional_expr		(location_t, tree, tree, tree,
 						 tsubst_flags_t);
 extern tree build_x_compound_expr_from_list	(tree, expr_list_kind,
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 4f2cb2cd402..277c81412b9 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20386,7 +20386,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 					complain|decltype_flag, in_decl);
 	RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
 				       templated_operator_saved_lookups (t),
-				       complain|decltype_flag));
+				       complain));
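
For reference, the mechanism at play here — substitution failure in the immediate context removing #1 from the overload set instead of producing a hard error — can be seen in a standalone analogue that avoids temporary materialization entirely (hypothetical names; a deleted function stands in for the deleted destructor):

```cpp
#include <cassert>

int h(int) = delete;  // naming this overload is ill-formed, even unevaluated
double h(double);     // declaration only; never called at run time

// #1: viable only when h(t) is well-formed in the immediate context.
template <class T> auto g(T t) -> decltype(h(t), 1) { return 1; }
// #2: fallback.
int g(...) { return 2; }
```

Calling `g(0)` selects the deleted `h(int)` during substitution of #1, so #1 is silently discarded and #2 is chosen; `g(1.5)` substitutes fine and picks #1.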

Re: [PATCH] g++: Add require-effective-target to multi-input file testcase pr95401.cc

2023-11-10 Thread Patrick O'Neill


On 11/9/23 17:34, Jeff Law wrote:



On 11/3/23 00:18, Patrick O'Neill wrote:

On non-vector targets dejagnu attempts dg-do compile for pr95401.cc.
This produces a command like this:
g++ pr95401.cc pr95401a.cc -S -o pr95401.s

which isn't valid (gcc does not accept multiple input files when using
-S with -o).

This patch adds require-effective-target vect_int to avoid the case
where the testcase is invoked with dg-do compile.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr95401.cc: Add require-effective-target vect_int.
Sorry, I must be missing something here.  I fail to see how adding an 
effective target check would/should impact the problem you've 
described above with the dg-additional-sources interaction with -S.


It's not intuitive (& probably not the cleanest way of solving it).

pr95401.cc is an invalid testcase when run with dg-do compile (for the 
reasons above).


pr95401.cc does not define a dg-do directive, which means the testcase
uses dg-do-what-default to determine what to do.
dg-do-what-default is set by target-supports.exp.


The two options here are dg-do-what-default run or compile.
On non-vector targets pr95401's default is set to compile (which is invalid).

Ideally we would say if dg-do-what-default == compile don't run, but 
AFAIK that isn't possible.
I didn't want to duplicate the check_vect_support_and_set_flags logic to 
return true/false since that'll probably get out of sync.


I used require-effective-target vect_int as a proxy for 
check_vect_support_and_set_flags (also since the testcase only contains 
integer arrays).


That way we do this now:
dg-do-what-default run -> run
dg-do-what-default compile -> skip test

If there's a cleaner/better approach I'm happy to revise.

Patrick



Jeff


[PATCH] aarch64: Avoid -Wincompatible-pointer-types warning in Linux unwinder

2023-11-10 Thread Florian Weimer
* config/aarch64/linux-unwind.h
(aarch64_fallback_frame_state): Add cast to the expected type
in sc assignment.

(Almost a v2, but the other issue was already fixed in r14-4183.)

---
 libgcc/config/aarch64/linux-unwind.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/aarch64/linux-unwind.h b/libgcc/config/aarch64/linux-unwind.h
index 00eba866049..18b3df71e7b 100644
--- a/libgcc/config/aarch64/linux-unwind.h
+++ b/libgcc/config/aarch64/linux-unwind.h
@@ -77,7 +77,10 @@ aarch64_fallback_frame_state (struct _Unwind_Context *context,
 }
 
   rt_ = context->cfa;
-  sc = &rt_->uc.uc_mcontext;
+  /* Historically, the uc_mcontext member was of type struct sigcontext, but
+ glibc uses a different type now with member names in the implementation
+ namespace.  */
+  sc = (struct sigcontext *) &rt_->uc.uc_mcontext;
 
 /* This define duplicates the definition in aarch64.md */
 #define SP_REGNUM 31

base-commit: 3a6df3281a525ae6113f50d7b38b09fcd803801e



[Committed] g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targets

2023-11-10 Thread Patrick O'Neill



On 11/9/23 17:20, Jeff Law wrote:



On 11/2/23 17:45, Patrick O'Neill wrote:

Testcases in g++.dg/vect rely on check_vect_support_and_set_flags
to set dg-do-what-default and avoid running vector tests on non-vector
targets. The three testcases in this patch overwrite the default with
dg-do run.

Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr102788.cc: Remove dg-do run directive.
OK.  I'll note your patch has just one file patched, but your comment 
indicates three testcases have this problem.  Did you forget to 
include a couple changes?


If so, those are pre-approved as well.  Just post them for the 
archiver and commit.


Thanks,
jeff

Committed

The comment was mistakenly copy/pasted from 
https://inbox.sourceware.org/gcc-patches/20231102190911.66763-1-patr...@rivosinc.com/T/#u

Revised commit message to only mention the one testcase.

Thanks,
Patrick


Re: [PATCH] c++: fix tf_decltype manipulation for COMPOUND_EXPR

2023-11-10 Thread Jason Merrill

On 11/10/23 12:25, Patrick Palka wrote:

On Thu, 9 Nov 2023, Jason Merrill wrote:


On 11/7/23 10:08, Patrick Palka wrote:

bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

In the COMPOUND_EXPR case of tsubst_expr, we were redundantly clearing
the tf_decltype flag when substituting the LHS and also neglecting to
propagate it when substituting the RHS.  This patch corrects this flag
manipulation, which allows us to accept the below testcase.

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case COMPOUND_EXPR>: Don't redundantly
clear tf_decltype when substituting the LHS.  Propagate
tf_decltype when substituting the RHS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-call7.C: New test.
---
   gcc/cp/pt.cc| 9 -
   gcc/testsuite/g++.dg/cpp0x/decltype-call7.C | 9 +
   2 files changed, 13 insertions(+), 5 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-call7.C
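
(decltype-call7.C only appears in the diffstat above; the following is merely a guess at the kind of code it covers, with hypothetical names: the operand of decltype is unevaluated and no temporary is materialized for a prvalue there, so even a class with a deleted destructor is acceptable.)

```cpp
#include <type_traits>

// Hypothetical sketch, not the actual decltype-call7.C.
struct A { ~A() = delete; };
A f();  // declaration only; never called

// The decltype operand is unevaluated: no temporary is materialized for
// the prvalue f(), so the deleted ~A() is never required.
using T = decltype(42, f());

static_assert(std::is_same<T, A>::value, "the comma yields the type of f()");
```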

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 521749df525..5f879287a58 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20382,11 +20382,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t
complain, tree in_decl)
 case COMPOUND_EXPR:
 {
-   tree op0 = tsubst_expr (TREE_OPERAND (t, 0), args,
-   complain & ~tf_decltype, in_decl);
-   RETURN (build_x_compound_expr (EXPR_LOCATION (t),
-  op0,
-  RECUR (TREE_OPERAND (t, 1)),
+   tree op0 = RECUR (TREE_OPERAND (t, 0));
+   tree op1 = tsubst_expr (TREE_OPERAND (t, 1), args,
+   complain|decltype_flag, in_decl);
+   RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
   templated_operator_saved_lookups (t),
   complain|decltype_flag));


Hmm, passing decltype_flag to both op1 and the , is concerning.  Can you add a
test with overloaded operator, where the RHS is a class with a destructor?


I'm not sure if this is what you had in mind, but indeed with this patch
we reject the following with an error outside the immediate context:

 struct B { ~B() = delete; };
 template<class T> B f();

 void operator,(int, const B&);

 template<class T> decltype(42, f<T>()) g(int) = delete; // #1
 template<class T> void g(...); // #2

 int main() {
   g<B>(0); // should select #2
 }

gcc/testsuite/g++.dg/cpp0x/decltype-call8.C: In substitution of
‘template<class T> decltype ((42, f<T>())) g(int) [with T = B]’:
gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:12:7:   required from here
   12 |   g<B>(0);
      |   ^~~~~~~
gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:8:30: error: use of deleted
function ‘B::~B()’
    8 | template<class T> decltype(42, f<T>()) g(int) = delete; // #1
      |                   ~~~~~~~~~~~~~~~^~~~
gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:3:12: note: declared here
    3 | struct B { ~B() = delete; };
      |            ^

Ultimately because unary_complex_lvalue isn't SFINAE-enabled.


Please elaborate; my understanding is that unary_complex_lvalue is 
supposed to be a semantically neutral transformation.



diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 4f2cb2cd402..277c81412b9 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20386,7 +20386,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
complain|decltype_flag, in_decl);
RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
   templated_operator_saved_lookups (t),
-  complain|decltype_flag));
+  complain));


This looks like it will break if the operator, returns a class with a 
deleted destructor.


Jason



Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-10 Thread Jakub Jelinek
On Thu, Nov 09, 2023 at 03:27:11PM +0800, Hongtao Liu wrote:
> On Thu, Nov 9, 2023 at 3:15 PM Hu, Lin1  wrote:
> >
> > This patch aims to avoid generating vblendps with ymm16+, and has been
> > bootstrapped and tested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/112435
> > * config/i386/sse.md: Adding constraints to restrict the generation
> > of vblendps.
> It should be "Don't output vblendps when evex sse reg or gpr32 is involved."
> Others LGTM.

I've missed this patch, so I wrote my own today, and am wondering

1) if it isn't better to use a separate alternative instead of
   x86_evex_reg_mentioned_p, like in the patch below
2) why do you need the last two hunks in sse.md, both avx2_permv2ti and
   *avx_vperm2f128_nozero insns only use x in constraints, never v,
   so x86_evex_reg_mentioned_p ought to be always false there

Here is the untested patch, of course you have more testcases (though, I
think it is better to test dg-do assemble with avx512vl target rather than
dg-do compile and scan the assembler, after all, the problem was that it
didn't assemble).

2023-11-10  Jakub Jelinek  

PR target/112435
* config/i386/sse.md (avx512vl_shuf_32x4_1,
avx512dq_shuf_64x2_1): Add
alternative with just x instead of v constraints and use vblendps
as optimization only with that alternative.

* gcc.target/i386/avx512vl-pr112435.c: New test.

--- gcc/config/i386/sse.md.jj   2023-11-09 09:04:18.616543403 +0100
+++ gcc/config/i386/sse.md  2023-11-10 15:56:44.138499931 +0100
@@ -19235,11 +19235,11 @@ (define_expand "avx512dq_shuf_avx512dq_shuf_64x2_1"
-  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
(vec_select:VI8F_256
  (vec_concat:
-   (match_operand:VI8F_256 1 "register_operand" "v")
-   (match_operand:VI8F_256 2 "nonimmediate_operand" "vm"))
+   (match_operand:VI8F_256 1 "register_operand" "x,v")
+   (match_operand:VI8F_256 2 "nonimmediate_operand" "xm,vm"))
  (parallel [(match_operand 3 "const_0_to_3_operand")
 (match_operand 4 "const_0_to_3_operand")
 (match_operand 5 "const_4_to_7_operand")
@@ -19254,7 +19254,7 @@ (define_insn "avx512dq_shu
   mask = INTVAL (operands[3]) / 2;
   mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
   operands[3] = GEN_INT (mask);
-  if (INTVAL (operands[3]) == 2 && !<mask_applied>)
+  if (INTVAL (operands[3]) == 2 && !<mask_applied> && which_alternative == 0)
 return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
   return "vshuf64x2\t{%3, %2, %1, 
%0|%0, %1, %2, %3}";
 }
@@ -19386,11 +19386,11 @@ (define_expand "avx512vl_shuf_32x4_1"
-  [(set (match_operand:VI4F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
(vec_select:VI4F_256
  (vec_concat:
-   (match_operand:VI4F_256 1 "register_operand" "v")
-   (match_operand:VI4F_256 2 "nonimmediate_operand" "vm"))
+   (match_operand:VI4F_256 1 "register_operand" "x,v")
+   (match_operand:VI4F_256 2 "nonimmediate_operand" "xm,vm"))
  (parallel [(match_operand 3 "const_0_to_7_operand")
 (match_operand 4 "const_0_to_7_operand")
 (match_operand 5 "const_0_to_7_operand")
@@ -19414,7 +19414,7 @@ (define_insn "avx512vl_shuf_
-  if (INTVAL (operands[3]) == 2 && !<mask_applied>)
+  if (INTVAL (operands[3]) == 2 && !<mask_applied> && which_alternative == 0)
 return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
 
   return "vshuf32x4\t{%3, %2, %1, 
%0|%0, %1, %2, %3}";
--- gcc/testsuite/gcc.target/i386/avx512vl-pr112435.c.jj	2023-11-10 16:04:21.708046771 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr112435.c	2023-11-10 16:03:51.053479094 +0100
@@ -0,0 +1,13 @@
+/* PR target/112435 */
+/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
+/* { dg-options "-mavx512vl -O2" } */
+
+#include <x86intrin.h>
+
+__m256i
+foo (__m256i a, __m256i b)
+{
+  register __m256i c __asm__("ymm16") = a;
+  asm ("" : "+v" (c));
+  return _mm256_shuffle_i32x4 (c, b, 2);
+}

Jakub



Re: [PATCH] c++: fix tf_decltype manipulation for COMPOUND_EXPR

2023-11-10 Thread Patrick Palka
On Fri, 10 Nov 2023, Jason Merrill wrote:

> On 11/10/23 12:25, Patrick Palka wrote:
> > On Thu, 9 Nov 2023, Jason Merrill wrote:
> > 
> > > On 11/7/23 10:08, Patrick Palka wrote:
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > > trunk?
> > > > 
> > > > -- >8 --
> > > > 
> > > > In the COMPOUND_EXPR case of tsubst_expr, we were redundantly clearing
> > > > the tf_decltype flag when substituting the LHS and also neglecting to
> > > > propagate it when substituting the RHS.  This patch corrects this flag
> > > > manipulation, which allows us to accept the below testcase.
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * pt.cc (tsubst_expr) <case COMPOUND_EXPR>: Don't redundantly
> > > > clear tf_decltype when substituting the LHS.  Propagate
> > > > tf_decltype when substituting the RHS.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp0x/decltype-call7.C: New test.
> > > > ---
> > > >gcc/cp/pt.cc| 9 -
> > > >gcc/testsuite/g++.dg/cpp0x/decltype-call7.C | 9 +
> > > >2 files changed, 13 insertions(+), 5 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-call7.C
> > > > 
> > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > index 521749df525..5f879287a58 100644
> > > > --- a/gcc/cp/pt.cc
> > > > +++ b/gcc/cp/pt.cc
> > > > @@ -20382,11 +20382,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t
> > > > complain, tree in_decl)
> > > >  case COMPOUND_EXPR:
> > > >  {
> > > > -   tree op0 = tsubst_expr (TREE_OPERAND (t, 0), args,
> > > > -   complain & ~tf_decltype, in_decl);
> > > > -   RETURN (build_x_compound_expr (EXPR_LOCATION (t),
> > > > -  op0,
> > > > -  RECUR (TREE_OPERAND (t, 1)),
> > > > +   tree op0 = RECUR (TREE_OPERAND (t, 0));
> > > > +   tree op1 = tsubst_expr (TREE_OPERAND (t, 1), args,
> > > > +   complain|decltype_flag, in_decl);
> > > > +   RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
> > > >templated_operator_saved_lookups
> > > > (t),
> > > >complain|decltype_flag));
> > > 
> > > Hmm, passing decltype_flag to both op1 and the , is concerning.  Can you
> > > add a
> > > test with overloaded operator, where the RHS is a class with a destructor?
> > 
> > I'm not sure if this is what you had in mind, but indeed with this patch
> > we reject the following with an error outside the immediate context:
> > 
> >  struct B { ~B() = delete; };
> >  template<class T> B f();
> > 
> >  void operator,(int, const B&);
> > 
> >  template<class T> decltype(42, f<T>()) g(int) = delete; // #1
> >  template<class T> void g(...); // #2
> > 
> >  int main() {
> >    g<B>(0); // should select #2
> >  }
> > 
> > gcc/testsuite/g++.dg/cpp0x/decltype-call8.C: In substitution of
> > ‘template<class T> decltype ((42, f<T>())) g(int) [with T = B]’:
> > gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:12:7:   required from here
> > 12 |   g<B>(0);
> >    |   ^~~~~~~
> > gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:8:30: error: use of deleted
> > function ‘B::~B()’
> >  8 | template<class T> decltype(42, f<T>()) g(int) = delete; // #1
> >    |                   ~~~~~~~~~~~~~~~^~~~
> > gcc/testsuite/g++.dg/cpp0x/decltype-call8.C:3:12: note: declared here
> >  3 | struct B { ~B() = delete; };
> >    |            ^
> > 
> > Ultimately because unary_complex_lvalue isn't SFINAE-enabled.
> 
> Please elaborate; my understanding is that unary_complex_lvalue is supposed to
> be a semantically neutral transformation.

Since tf_decltype is now also set when substituting op1 i.e. f(),
substitution yields a bare CALL_EXPR with no temporary materialization.
The problematic unary_complex_lvalue call happens when binding
the reference parameter 'const B&' to this bare CALL_EXPR.  We
take its address via cp_build_addr_expr, which tries
unary_complex_lvalue.  The CALL_EXPR handling in unary_complex_lvalue
in turn materializes a temporary for the call, which fails as expected
due to the destructor but also issues the unexpected error since
unary_complex_lvalue unconditionally uses tf_warning_or_error.

So in short the CALL_EXPR handling in unary_complex_lvalue seems to
assume the requirements of temporary materialization have already
been checked.
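
The materialization step described above can be observed in a small standalone sketch (hypothetical types; a counting destructor stands in for the deleted one): binding a const reference to a prvalue call materializes a temporary whose destructor must be usable, and which is destroyed at the end of the full-expression.

```cpp
#include <cassert>

static int dtor_count = 0;

struct C {
  ~C() { ++dtor_count; }  // stands in for the deleted ~B(); here it just counts
};

C make() { return C{}; }  // a prvalue, like the bare CALL_EXPR after substitution
void take(const C&) {}    // reference binding forces temporary materialization

int run() {
  take(make());           // temporary materialized here...
  return dtor_count;      // ...and destroyed at the end of that full-expression
}
```

With a deleted destructor this materialization is exactly the point at which the hard error above is issued.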

> 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 4f2cb2cd402..277c81412b9 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -20386,7 +20386,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
> >  					complain|decltype_flag, in_decl);
> >  	RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
> >  				       templated_operator_saved_lookups (t),
> > -				       complain|decltype_flag));
> > +				       complain));

Re: [PATCH 3/3] RISC-V: Add support for XCVbi extension in CV32E40P

2023-11-10 Thread Jeff Law




On 11/8/23 04:09, Mary Bennett wrote:

Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 


gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Create XCVbi extension
  support.
* config/riscv/riscv.opt: Likewise.
* config/riscv/corev.md: Implement cv_branch pattern
  for cv.beqimm and cv.bneimm.
* config/riscv/riscv.md: Change pattern priority so corev.md
  patterns run before riscv.md patterns.
* config/riscv/constraints.md: Implement constraints
  cv_bi_s5 - signed 5-bit immediate.
* config/riscv/predicates.md: Implement predicate
  const_int5s_operand - signed 5 bit immediate.
* doc/sourcebuild.texi: Add XCVbi documentation.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
* lib/target-supports.exp: Add proc for XCVbi.
---




diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index 0109e1836cf..7d7b952d817 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -706,3 +706,17 @@
   [(set_attr "type" "load")
    (set_attr "mode" "SI")])
+
+;; XCVBI Builtins
+(define_insn "cv_branch<mode>"
+  [(set (pc)
+	(if_then_else
+	 (match_operator 1 "equality_operator"
+			 [(match_operand:X 2 "register_operand" "r")
+			  (match_operand:X 3 "const_int5s_operand" "CV_bi_sign5")])
+	 (label_ref (match_operand 0 "" ""))
+	 (pc)))]
+  "TARGET_XCVBI"
+  "cv.b%C1imm\t%2,%3,%0"
+  [(set_attr "type" "branch")
+   (set_attr "mode" "none")])
Note that technically you could use "i" or "n" for the constraint of 
operand 3.  This works because the predicate has priority and it only 
allows -16..15.




diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ae2217d0907..168c8665a7a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -579,6 +579,14 @@
 (define_asm_attributes
   [(set_attr "type" "multi")])
 
+;; ....................
+;;
+;; Machine Description Patterns
+;;
+;; ....................
+
+(include "corev.md")
I would put a comment here indicating why a subtarget might want to 
include its patterns before the standard patterns in riscv.md.



OK with the comment added.  Your decision on whether or not to drop the 
CV_bi_sign5 constraint and replace it with "n".


Jeff


Re: [PATCH 3/3] attribs: Namespace-aware lookup_attribute_spec

2023-11-10 Thread Jeff Law




On 11/6/23 05:24, Richard Sandiford wrote:

attribute_ignored_p already used a namespace-aware query
to find the attribute_spec for an existing attribute:

   const attribute_spec *as = lookup_attribute_spec (TREE_PURPOSE (attr));

This patch does the same for other callers in the file.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* attribs.cc (comp_type_attributes): Pass the full TREE_PURPOSE
to lookup_attribute_spec, rather than just the name.
(remove_attributes_matching): Likewise.

OK
jeff

