date:20230621

RE: Re: [PATCH] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-06-21 Thread Wang, Yanzhang via Gcc-patches

Of course, I'd like to make it generic. Thanks for Robin's advice! He's right,
there are many similar situations.

But I'm not sure how to distinguish the different operations. Currently, the
pattern is hard-wired to VMULH, as below.


+   (unspec:VI_QHS
+ [(vec_duplicate:VI_QHS
+(match_operand: 4 "reg_or_0_operand"))
+   (match_operand:VI_QHS 3 "register_operand")] VMULH)

Do we need to define another UNSPEC? And do we have any API to query the
operation, e.g. whether it's VMULH or POW?

Thanks,
Yanzhang
From: juzhe.zh...@rivai.ai 
Sent: Wednesday, June 21, 2023 2:33 PM
To: Robin Dapp ; Wang, Yanzhang ; 
gcc-patches 
Cc: Robin Dapp ; Kito.cheng ; Li, 
Pan2 
Subject: Re: Re: [PATCH] RISC-V: convert the mulh with 0 to mov 0 to the reg.

Oh, yes. Thanks to Robin for pointing this out.

@yanzhang, could you refine this patch further to cover more optimizations?

Thanks.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-06-21 14:27
To: yanzhang.wang; 
gcc-patches
CC: rdapp.gcc; 
juzhe.zhong; 
kito.cheng; pan2.li
Subject: Re: [PATCH] RISC-V: convert the mulh with 0 to mov 0 to the reg.
Hi Yanzhang,

while I appreciate the optimization, I'm a bit wary about just adding a special
case for "0".  Is that so common? Wouldn't we also like to have
  * pow2_p (val) == << val and others?

* 1 should also be covered.

Regards
Robin
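
To make the special cases under discussion concrete, here is a minimal scalar
sketch in C of what a signed high multiply computes for one 32-bit lane. It is
an illustration only, not GCC code: mulh32 and mulh32_pow2 are hypothetical
helper names, and the model assumes that >> on a negative signed value is an
arithmetic shift (implementation-defined in ISO C, but what GCC does). It
suggests that multiplying by 0 always yields 0, that multiplying by 1 yields
only the sign fill of the other operand, and that multiplying by 1 << k
reduces to an arithmetic right shift by 32 - k.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of a signed high multiply: the upper
   32 bits of the 64-bit product.  Assumes >> on a negative int64_t is an
   arithmetic shift (implementation-defined in ISO C, true for GCC).  */
static int32_t
mulh32 (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
}

/* Hypothetical strength-reduced form for b == 1 << k, valid only for
   1 <= k <= 30 so that the shift count stays within 2..31.  */
static int32_t
mulh32_pow2 (int32_t a, unsigned k)
{
  return a >> (32 - k);
}
```

Note that the "* 1" case does not return the operand itself: the high half of
x * 1 is 0 for non-negative x and -1 for negative x.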

Re: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-21 Thread Jakub Jelinek via Gcc-patches

On Wed, Jun 21, 2023 at 06:59:08AM +, Li, Pan2 wrote:
>  inline machine_mode
>  bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
> -  return (machine_mode)
> -((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
> +  lto_input_block *input_block =  (class lto_input_block *) bp->stream;

Still 2 spaces instead of 1 here, otherwise it LGTM, but the important
question is if you have actually tested it with offloading, because only
that will verify it works correctly.
See https://gcc.gnu.org/wiki/Offloading for details.

Jakub
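
The packing limit in the quoted hunk can be sketched in isolation. The
following is a rough model in C, not the GCC implementation: ceil_log2_model,
packing_limit and checked_lookup are hypothetical names, and the mode counts
used with them are made-up values. It shows why a fixed 1 << 8 limit stops
being sufficient once the number of modes exceeds 256, and how a bounds check
would reject an out-of-range index.

```c
#include <stddef.h>

/* Smallest n such that (1 << n) >= x; a stand-in for GCC's ceil_log2.
   Intended for small inputs only.  */
static unsigned
ceil_log2_model (unsigned long x)
{
  unsigned n = 0;
  while ((1UL << n) < x)
    n++;
  return n;
}

/* Power-of-two bound that covers every value below MAX_MODES.  */
static unsigned long
packing_limit (unsigned long max_modes)
{
  return 1UL << ceil_log2_model (max_modes);
}

/* Table lookup that refuses indexes outside the table; returns -1 for an
   out-of-range index instead of reading past the end.  */
static int
checked_lookup (const unsigned char *table, size_t size, size_t index)
{
  return index < size ? (int) table[index] : -1;
}
```

With 256 modes the limit is 256, but with a hypothetical 300 modes it becomes
512, which a table of 256 entries cannot serve.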

RE: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-21 Thread Li, Pan2 via Gcc-patches

Thanks Jakub, I will fix the formatting issue and send a V3 patch, and will
also try to validate it with offloading.

Pan


Re: RE: [PATCH] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-06-21 Thread juzhe.zh...@rivai.ai

No, I don't think we need another UNSPEC.
You just need to modify the predicate of (match_operand: 4 "reg_or_0_operand")

juzhe.zh...@rivai.ai


Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-21 Thread Hongtao Liu via Gcc-patches

On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
 wrote:
>
> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> possible need for VEX3 in the broadcast ones. When EVEX encoding is
> required the broadcast insns are always shorter.
>
> Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.
>
> gcc/
>
> * config/i386/sse.md (vec_dupv4sf): Make first alternative use
> vbroadcastss for AVX2. New AVX512F alternative.
> (*vec_dupv4si): New AVX2 and AVX512F alternatives using
> vpbroadcastd.
> ---
> Especially with the added "enabled" attribute I didn't really see how to
> (further) fold alternatives 0 and 1. Instead *vec_dupv4si might benefit
> from using sse2_noavx2 instead of sse2 for alternative 2, except that
> there is no sse2_noavx2, only sse2_noavx.
>
> Is there a reason why vec_dupv4sf uses sseshuf1 for its shuffle
> alternatives, but *vec_dupv4si uses sselog1? I'd be happy to correct
> this in whichever is the appropriate direction, while touching this
> anyway.
It should be sseshuf1 (or sseshuf, depending on the number of input operands
in the pattern) for shufps; sselog means logical instructions.
I think the naming comes from

Intel SDM Volume 1:
5.6.1
Intel® SSE2 Packed and Scalar Double Precision Floating-Point Instructions

where there are descriptions of the different kinds of instructions.


>
> I'm working from the assumption that the isa attributes to the original
> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> operand forms already when at least AVX2 is enabled.
>
> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
According to the comments, yes: no extra prefix is needed.

;; There are also additional prefixes in 3DNOW, SSSE3.
;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.

> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_
> and elsewhere.)
> ---
> v2: Correct operand constraints. Respect -mprefer-vector-width=. Fold
> two alternatives of vec_dupv4sf.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -26141,41 +26141,64 @@
> (const_int 1)))])
>
>  (define_insn "vec_dupv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SF 0 "register_operand" "=v,v,v,x")
> (vec_duplicate:V4SF
> - (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SF 1 "nonimmediate_operand" "Yv,v,m,0")))]
>"TARGET_SSE"
>"@
> -   vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
> +   * return TARGET_AVX2 ? \"vbroadcastss\t{%1, %0|%0, %1}\" : 
> \"vshufps\t{$0, %d1, %0|%0, %d1, 0}\";
> +   vbroadcastss\t{%1, %g0|%g0, %1}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "avx,avx,noavx")
> -   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_evex,maybe_evex,orig")
> -   (set_attr "mode" "V4SF")])
> +  [(set_attr "isa" "avx,*,avx,noavx")
> +   (set (attr "type")
> +   (cond [(and (eq_attr "alternative" "0")
> +   (match_test "!TARGET_AVX2"))
> +(const_string "sseshuf1")
> +  (eq_attr "alternative" "3")
> +(const_string "sseshuf1")
> + ]
> + (const_string "ssemov")))
> +   (set (attr "length_immediate")
> +   (if_then_else (eq_attr "type" "sseshuf1")
> + (const_string "1")
> + (const_string "0")))
> +   (set_attr "prefix_extra" "0,0,1,*")
> +   (set_attr "prefix" "maybe_evex,evex,maybe_evex,orig")
> +   (set_attr "mode" "V4SF,V16SF,V4SF,V4SF")
> +   (set (attr "enabled")
> +   (if_then_else (eq_attr "alternative" "1")
> + (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
> +  && !TARGET_PREFER_AVX256")
> + (const_string "*")))])
>
>  (define_insn "*vec_dupv4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=v,v,v,v,x")
> (vec_duplicate:V4SI
> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SI 1 "nonimmediate_operand" "Yvm,v,Yv,m,0")))]
>"TARGET_SSE"
>"@
> +   vpbroadcastd\t{%1, %0|%0, %1}
> +   vpbroadcastd\t{%1, %g0|%g0, %1}
> %vpshufd\t{$0, %1, %0|%0, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "sse2,avx,noavx")
> -   (set_attr "type" "sselog1,ssemov,sselog1")
> -   (set_attr "length_immediate" "1,0,1")
> -
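
The size argument in the quoted commit message can be checked with a little
byte counting. The sketch below is a simplified model in C that covers only
the register-to-register forms; the function names are hypothetical, and it
assumes the encodings documented in the Intel SDM: VSHUFPS lives in opcode
map 0F and can use the 2-byte VEX prefix unless the VEX.X, VEX.B or VEX.W
bits are needed, whereas VBROADCASTSS lives in map 0F38 and therefore always
needs the 3-byte VEX prefix but carries no immediate.

```c
/* Length in bytes of a VEX prefix: the 2-byte form only covers opcode
   map 0F and cannot encode the X, B or W bits.  */
static int
vex_prefix_len (int map_is_0f, int needs_xbw)
{
  return (map_is_0f && !needs_xbw) ? 2 : 3;
}

/* vshufps with an immediate, register form:
   VEX prefix + opcode byte + ModRM byte + imm8.  */
static int
vshufps_len (int needs_xbw)
{
  return vex_prefix_len (1, needs_xbw) + 1 + 1 + 1;
}

/* vbroadcastss, AVX2 register form:
   3-byte VEX prefix + opcode byte + ModRM byte, no immediate.  */
static int
vbroadcastss_len (void)
{
  return vex_prefix_len (0, 0) + 1 + 1;
}
```

With low registers both forms come out at 5 bytes. When the shuffle's source
is xmm8 to xmm15 it needs VEX.B and grows to 6 bytes, while the broadcast
stays at 5. The EVEX case is not modelled here.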

[PATCH V1] RISC-V:Add float16 tuple type abi

2023-06-21 Thread shiyulong

From: yulong 

 gcc/ChangeLog:

* config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: New test.
* gcc.target/riscv/rvv/base/abi-18.c: New test.

---
 gcc/config/riscv/vector.md|  31 ++-
 .../gcc.target/riscv/rvv/base/abi-10.c|  25 ++
 .../gcc.target/riscv/rvv/base/abi-11.c|  27 ++-
 .../gcc.target/riscv/rvv/base/abi-12.c|  27 ++-
 .../gcc.target/riscv/rvv/base/abi-15.c|  27 ++-
 .../gcc.target/riscv/rvv/base/abi-17.c| 229 ++
 .../gcc.target/riscv/rvv/base/abi-18.c| 229 ++
 .../gcc.target/riscv/rvv/base/abi-8.c |  27 ++-
 .../gcc.target/riscv/rvv/base/abi-9.c |  25 ++
 9 files changed, 630 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-18.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 884e7435cc2..cd87989b536 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -98,7 +98,12 @@
   VNx2x8HI,VNx3x8HI,VNx4x8HI,VNx5x8HI,VNx6x8HI,VNx7x8HI,VNx8x8HI,\
   VNx2x4HI,VNx3x4HI,VNx4x4HI,VNx5x4HI,VNx6x4HI,VNx7x4HI,VNx8x4HI,\
   VNx2x2HI,VNx3x2HI,VNx4x2HI,VNx5x2HI,VNx6x2HI,VNx7x2HI,VNx8x2HI,\
-   VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI")
+   VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI,\
+   VNx2x32HF,VNx2x16HF,VNx3x16HF,VNx4x16HF,\
+   VNx2x8HF,VNx3x8HF,VNx4x8HF,VNx5x8HF,VNx6x8HF,VNx7x8HF,VNx8x8HF,\
+   VNx2x4HF,VNx3x4HF,VNx4x4HF,VNx5x4HF,VNx6x4HF,VNx7x4HF,VNx8x4HF,\
+   VNx2x2HF,VNx3x2HF,VNx4x2HF,VNx5x2HF,VNx6x2HF,VNx7x2HF,VNx8x2HF,\
+   VNx2x1HF,VNx3x1HF,VNx4x1HF,VNx5x1HF,VNx6x1HF,VNx7x1HF,VNx8x1HF")
 (const_int 16)
 (eq_attr "mode" "VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,\
  VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,\
@@ -156,17 +161,17 @@
   (symbol_ref "riscv_vector::get_vlmul(E_VNx64HImode)")
 
 ; Half float point
-(eq_attr "mode" "VNx1HF")
+(eq_attr "mode" 
"VNx1HF,VNx2x1HF,VNx3x1HF,VNx4x1HF,VNx5x1HF,VNx6x1HF,VNx7x1HF,VNx8x1HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx1HFmode)")
-(eq_attr "mode" "VNx2HF")
+(eq_attr "mode" 
"VNx2HF,VNx2x2HF,VNx3x2HF,VNx4x2HF,VNx5x2HF,VNx6x2HF,VNx7x2HF,VNx8x2HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx2HFmode)")
-(eq_attr "mode" "VNx4HF")
+(eq_attr "mode" 
"VNx4HF,VNx2x4HF,VNx3x4HF,VNx4x4HF,VNx5x4HF,VNx6x4HF,VNx7x4HF,VNx8x4HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx4HFmode)")
-(eq_attr "mode" "VNx8HF")
+(eq_attr "mode" 
"VNx8HF,VNx2x8HF,VNx3x8HF,VNx4x8HF,VNx5x8HF,VNx6x8HF,VNx7x8HF,VNx8x8HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx8HFmode)")
-(eq_attr "mode" "VNx16HF")
+(eq_attr "mode" "VNx16HF,VNx2x16HF,VNx3x16HF,VNx4x16HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx16HFmode)")
-(eq_attr "mode" "VNx32HF")
+(eq_attr "mode" "VNx32HF,VNx2x32HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx32HFmode)")
 (eq_attr "mode" "VNx64HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx64HFmode)")
@@ -249,17 +254,17 @@
   (symbol_ref "riscv_vector::get_ratio(E_VNx64HImode)")
 
 ; Half float point.
-(eq_attr "mode" "VNx1HF")
+(eq_attr "mode" 
"VNx1HF,VNx2x1HF,VNx3x1HF,VNx4x1HF,VNx5x1HF,VNx6x1HF,VNx7x1HF,VNx8x1HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx1HFmode)")
-(eq_attr "mode" "VNx2HF")
+(eq_attr "mode" 
"VNx2HF,VNx2x2HF,VNx3x2HF,VNx4x2HF,VNx5x2HF,VNx6x2HF,VNx7x2HF,VNx8x2HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx2HFmode)")
-(eq_attr "mode" "VNx4HF")
+(eq_attr "mode" 
"VNx4HF,VNx2x4HF,VNx3x4HF,VNx4x4HF,VNx5x4HF,VNx6x4HF,VNx7x4HF,VNx8x4HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx4HFmode)")
-(eq_attr "mode" "VNx8HF")
+(eq_attr "mode" 
"VNx8HF,VNx2x8HF,VNx3x8HF,VNx4x8HF,VNx5x8HF,VNx6x8HF,VNx7x8HF,VNx8x8HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx8HFmode)")
-(eq_attr "mode" "VNx16HF")
+(eq_attr "mode" "VNx16HF,VNx2x16HF,VNx3x16HF,VNx4x16HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx16HFmode)")
-(eq_a

Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-21 Thread Jan Beulich via Gcc-patches

On 21.06.2023 09:37, Hongtao Liu wrote:
> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
>  wrote:
>>
>> Is there a reason why vec_dupv4sf uses sseshuf1 for its shuffle
>> alternatives, but *vec_dupv4si uses sselog1? I'd be happy to correct
>> this in whichever is the appropriate direction, while touching this
>> anyway.
> It should be sseshuf1(or sseshuf depending on input operands number in
> the pattern) for shufps, sselog means logical instructions.

Would you be okay for me to fold in that adjustment, or do you
insist on a separate patch?

>> I'm working from the assumption that the isa attributes to the original
>> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
>> or avx_noavx2 as applicable), as the new earlier alternatives cover all
>> operand forms already when at least AVX2 is enabled.
>>
>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> According to comments, yes, no extra prefix is needed.
> 
> ;; There are also additional prefixes in 3DNOW, SSSE3.
> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
> ;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.

Right, that's what triggered my question. I guess dropping these
"prefix_extra" really wants to be a separate patch (or maybe even
multiple, but it's hard to see how to split), dealing with all of the
instances which likely have accumulated simply via copy-and-paste.

>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -26141,41 +26141,64 @@
>> (const_int 1)))])
>>
>>  (define_insn "vec_dupv4sf"
>> -  [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
>> +  [(set (match_operand:V4SF 0 "register_operand" "=v,v,v,x")
>> (vec_duplicate:V4SF
>> - (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
>> + (match_operand:SF 1 "nonimmediate_operand" "Yv,v,m,0")))]
>>"TARGET_SSE"
>>"@
>> -   vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
>> +   * return TARGET_AVX2 ? \"vbroadcastss\t{%1, %0|%0, %1}\" : 
>> \"vshufps\t{$0, %d1, %0|%0, %d1, 0}\";
>> +   vbroadcastss\t{%1, %g0|%g0, %1}
>> vbroadcastss\t{%1, %0|%0, %1}
>> shufps\t{$0, %0, %0|%0, %0, 0}"
>> -  [(set_attr "isa" "avx,avx,noavx")
>> -   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
>> -   (set_attr "length_immediate" "1,0,1")
>> -   (set_attr "prefix_extra" "0,1,*")
>> -   (set_attr "prefix" "maybe_evex,maybe_evex,orig")
>> -   (set_attr "mode" "V4SF")])
>> +  [(set_attr "isa" "avx,*,avx,noavx")
>> +   (set (attr "type")
>> +   (cond [(and (eq_attr "alternative" "0")
>> +   (match_test "!TARGET_AVX2"))
>> +(const_string "sseshuf1")
>> +  (eq_attr "alternative" "3")
>> +(const_string "sseshuf1")
>> + ]
>> + (const_string "ssemov")))
>> +   (set (attr "length_immediate")
>> +   (if_then_else (eq_attr "type" "sseshuf1")
>> + (const_string "1")
>> + (const_string "0")))
>> +   (set_attr "prefix_extra" "0,0,1,*")
>> +   (set_attr "prefix" "maybe_evex,evex,maybe_evex,orig")
>> +   (set_attr "mode" "V4SF,V16SF,V4SF,V4SF")
>> +   (set (attr "enabled")
>> +   (if_then_else (eq_attr "alternative" "1")
>> + (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
>> +  && !TARGET_PREFER_AVX256")
>> + (const_string "*")))])
>>
>>  (define_insn "*vec_dupv4si"
>> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
>> +  [(set (match_operand:V4SI 0 "register_operand" "=v,v,v,v,x")
>> (vec_duplicate:V4SI
>> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
>> + (match_operand:SI 1 "nonimmediate_operand" "Yvm,v,Yv,m,0")))]
>>"TARGET_SSE"
>>"@
>> +   vpbroadcastd\t{%1, %0|%0, %1}
>> +   vpbroadcastd\t{%1, %g0|%g0, %1}
>> %vpshufd\t{$0, %1, %0|%0, %1, 0}
>> vbroadcastss\t{%1, %0|%0, %1}
>> shufps\t{$0, %0, %0|%0, %0, 0}"
>> -  [(set_attr "isa" "sse2,avx,noavx")
>> -   (set_attr "type" "sselog1,ssemov,sselog1")
>> -   (set_attr "length_immediate" "1,0,1")
>> -   (set_attr "prefix_extra" "0,1,*")
>> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
>> -   (set_attr "mode" "TI,V4SF,V4SF")
>> +  [(set_attr "isa" "avx2,*,sse2,avx,noavx")
>> +   (set_attr "type" "ssemov,ssemov,sselog1,ssemov,sselog1")
>> +   (set_attr "length_immediate" "0,0,1,0,1")
>> +   (set_attr "prefix_extra" "0,0,0,1,*")
>> +   (set_attr "prefix" "maybe_evex,evex,maybe_vex,maybe_evex,orig")
>> +   (set_attr "mode" "TI,XI,TI,V4SF,V4SF")
>> (set (attr "preferred_for_speed")
>> - (cond [(eq_attr "alternative" "1")
>> + (cond [(eq_attr "alternative" "3")
>>   (symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC")
>>]
>> -  (symbol_ref "true")))])
>> +  (symbol_ref "true")))
>> +   (set (attr "enabled")
>> +   (if_then_else (eq_attr "alternative" "1")

Re: [PATCH V1] RISC-V:Add float16 tuple type abi

2023-06-21 Thread juzhe.zh...@rivai.ai

LGTM. Thanks.



juzhe.zh...@rivai.ai
 
From: shiyulong
Date: 2023-06-21 15:39
To: gcc-patches
CC: palmer; kito.cheng; jeffreyalaw; juzhe.zhong; pan2.li; wuwei2016; jiawei; 
shihua; dje.gcc; pinskia; yulong
Subject: [PATCH V1] RISC-V:Add float16 tuple type abi
From: yulong 
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: New test.
* gcc.target/riscv/rvv/base/abi-18.c: New test.
 
---
gcc/config/riscv/vector.md|  31 ++-
.../gcc.target/riscv/rvv/base/abi-10.c|  25 ++
.../gcc.target/riscv/rvv/base/abi-11.c|  27 ++-
.../gcc.target/riscv/rvv/base/abi-12.c|  27 ++-
.../gcc.target/riscv/rvv/base/abi-15.c|  27 ++-
.../gcc.target/riscv/rvv/base/abi-17.c| 229 ++
.../gcc.target/riscv/rvv/base/abi-18.c| 229 ++
.../gcc.target/riscv/rvv/base/abi-8.c |  27 ++-
.../gcc.target/riscv/rvv/base/abi-9.c |  25 ++
9 files changed, 630 insertions(+), 17 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-17.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-18.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 884e7435cc2..cd87989b536 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -98,7 +98,12 @@
  VNx2x8HI,VNx3x8HI,VNx4x8HI,VNx5x8HI,VNx6x8HI,VNx7x8HI,VNx8x8HI,\
  VNx2x4HI,VNx3x4HI,VNx4x4HI,VNx5x4HI,VNx6x4HI,VNx7x4HI,VNx8x4HI,\
  VNx2x2HI,VNx3x2HI,VNx4x2HI,VNx5x2HI,VNx6x2HI,VNx7x2HI,VNx8x2HI,\
-   VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI")
+   VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI,\
+ VNx2x32HF,VNx2x16HF,VNx3x16HF,VNx4x16HF,\
+   VNx2x8HF,VNx3x8HF,VNx4x8HF,VNx5x8HF,VNx6x8HF,VNx7x8HF,VNx8x8HF,\
+   VNx2x4HF,VNx3x4HF,VNx4x4HF,VNx5x4HF,VNx6x4HF,VNx7x4HF,VNx8x4HF,\
+   VNx2x2HF,VNx3x2HF,VNx4x2HF,VNx5x2HF,VNx6x2HF,VNx7x2HF,VNx8x2HF,\
+   VNx2x1HF,VNx3x1HF,VNx4x1HF,VNx5x1HF,VNx6x1HF,VNx7x1HF,VNx8x1HF")
(const_int 16)
(eq_attr "mode" "VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,\
  VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,\
@@ -156,17 +161,17 @@
   (symbol_ref "riscv_vector::get_vlmul(E_VNx64HImode)")
; Half float point
- (eq_attr "mode" "VNx1HF")
+ (eq_attr "mode" 
"VNx1HF,VNx2x1HF,VNx3x1HF,VNx4x1HF,VNx5x1HF,VNx6x1HF,VNx7x1HF,VNx8x1HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx1HFmode)")
- (eq_attr "mode" "VNx2HF")
+ (eq_attr "mode" 
"VNx2HF,VNx2x2HF,VNx3x2HF,VNx4x2HF,VNx5x2HF,VNx6x2HF,VNx7x2HF,VNx8x2HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx2HFmode)")
- (eq_attr "mode" "VNx4HF")
+ (eq_attr "mode" 
"VNx4HF,VNx2x4HF,VNx3x4HF,VNx4x4HF,VNx5x4HF,VNx6x4HF,VNx7x4HF,VNx8x4HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx4HFmode)")
- (eq_attr "mode" "VNx8HF")
+ (eq_attr "mode" 
"VNx8HF,VNx2x8HF,VNx3x8HF,VNx4x8HF,VNx5x8HF,VNx6x8HF,VNx7x8HF,VNx8x8HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx8HFmode)")
- (eq_attr "mode" "VNx16HF")
+ (eq_attr "mode" "VNx16HF,VNx2x16HF,VNx3x16HF,VNx4x16HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx16HFmode)")
- (eq_attr "mode" "VNx32HF")
+ (eq_attr "mode" "VNx32HF,VNx2x32HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx32HFmode)")
(eq_attr "mode" "VNx64HF")
   (symbol_ref "riscv_vector::get_vlmul(E_VNx64HFmode)")
@@ -249,17 +254,17 @@
   (symbol_ref "riscv_vector::get_ratio(E_VNx64HImode)")
; Half float point.
- (eq_attr "mode" "VNx1HF")
+ (eq_attr "mode" 
"VNx1HF,VNx2x1HF,VNx3x1HF,VNx4x1HF,VNx5x1HF,VNx6x1HF,VNx7x1HF,VNx8x1HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx1HFmode)")
- (eq_attr "mode" "VNx2HF")
+ (eq_attr "mode" 
"VNx2HF,VNx2x2HF,VNx3x2HF,VNx4x2HF,VNx5x2HF,VNx6x2HF,VNx7x2HF,VNx8x2HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx2HFmode)")
- (eq_attr "mode" "VNx4HF")
+ (eq_attr "mode" 
"VNx4HF,VNx2x4HF,VNx3x4HF,VNx4x4HF,VNx5x4HF,VNx6x4HF,VNx7x4HF,VNx8x4HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx4HFmode)")
- (eq_attr "mode" "VNx8HF")
+ (eq_attr "mode" 
"VNx8HF,VNx2x8HF,VNx3x8HF,VNx4x8HF,VNx5x8HF,VNx6x8HF,VNx7x8HF,VNx8x8HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx8HFmode)")
- (eq_attr "mode" "VNx16HF")
+ (eq_attr "mode" "VNx16HF,VNx2x16HF,VNx3x16HF,VNx4x16HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx16HFmode)")
- (eq_attr "mode" "VNx32HF")
+ (eq_attr "mode" "VNx32HF,VNx2x32HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx32HFmode)")
(eq_attr "mode" "VNx64HF")
   (symbol_ref "riscv_vector::get_ratio(E_VNx64HFmode)")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/abi-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/abi-10.c
index 62dd7c8663c..

[PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-06-21 Thread Richard Biener via Gcc-patches

The following addresses a miscompilation by RTL scheduling related
to the representation of masked stores.  For that we have

(insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
(const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  )
(const_int -4 [0xfffc] [1 MEM 
 [(int *)vectp_b.12_28]+0 S64 A32])
(vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
(mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
(const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  
)
(const_int -4 [0xfffc] [1 MEM 
 [(int *)vectp_b.12_28]+0 S64 A32])

and specifically the memory attributes

  [1 MEM  [(int *)vectp_b.12_28]+0 S64 A32]

are problematic.  They tell us the instruction stores and reads a full
vector, which it of course does not.  There isn't any good MEM_EXPR
we can use here (we lack a way to just specify a pointer and restrict
info for example), and since the MEMs have a vector mode it's
difficult in general as passes do not need to look at the memory
attributes at all.

The easiest way to avoid running into the alias analysis problem is
to scrap the MEM_EXPR when we expand the internal functions for
partial loads/stores.  That avoids the disambiguation we run into,
which is realizing that we store to an object smaller than
the size of the mode we appear to store.

After the patch we see just

  [1  S64 A32]

so we preserve the alias set, the alignment and the size (the size
is redundant if the MEM isn't BLKmode).  That's still not good
in case the RTL alias oracle would implement the same
disambiguation, but it fends off the gimple one.

This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
and --param=vect-partial-vector-usage=1.

On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
side we might instead want to use a BLKmode MEM?  Any better
ideas here?

Thanks,
Richard.

PR middle-end/110237
* internal-fn.cc (expand_partial_load_optab_fn): Clear
MEM_EXPR and MEM_OFFSET.
(expand_partial_store_optab_fn): Likewise.
---
 gcc/internal-fn.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..2dc685e7d85 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
 
   mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   gcc_assert (MEM_P (mem));
+  /* The built MEM_REF does not accurately reflect that the load
+ is only partial.  Clear it.  */
+  set_mem_expr (mem, NULL_TREE);
+  clear_mem_offset (mem);
   mask = expand_normal (maskt);
   target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_output_operand (&ops[0], target, TYPE_MODE (type));
@@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
 
   mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   gcc_assert (MEM_P (mem));
+  /* The built MEM_REF does not accurately reflect that the store
+ is only partial.  Clear it.  */
+  set_mem_expr (mem, NULL_TREE);
+  clear_mem_offset (mem);
   mask = expand_normal (maskt);
   reg = expand_normal (rhs);
   create_fixed_operand (&ops[0], mem);
-- 
2.35.3
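
To illustrate why a full-vector size in the memory attributes is misleading
for a masked store, here is a small scalar model in C. It is an illustration
only and unrelated to how GCC expands these operations; masked_store_i32 and
bytes_touched are hypothetical helpers, with a 16-lane vector of 32-bit ints
standing in for V16SI.

```c
#include <stddef.h>
#include <stdint.h>

/* Store only the lanes of SRC whose bit is set in MASK; the other lanes
   of DST keep their previous contents.  LANES must be at most 16.  */
static void
masked_store_i32 (int32_t *dst, const int32_t *src, uint16_t mask,
                  size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    if (mask & (1u << i))
      dst[i] = src[i];
}

/* Number of bytes the masked store actually writes.  */
static size_t
bytes_touched (uint16_t mask, size_t lanes)
{
  size_t n = 0;
  for (size_t i = 0; i < lanes; i++)
    if (mask & (1u << i))
      n += sizeof (int32_t);
  return n;
}
```

A store with mask 0x0007 touches 12 bytes rather than the 64 that S64 claims,
so it can legitimately target an object smaller than the vector mode.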

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Uros Bizjak via Gcc-patches

On Tue, Jun 20, 2023 at 6:11 PM liuhongt via Gcc-patches
 wrote:
>
> I noticed there has been some refactoring in vectorizable_conversion
> for code_helper, so I've adjusted my patch to that.
> Here's the patch I'm going to commit.
>
> We already use an intermediate type in the WIDEN case, but not for NONE;
> this patch extends that.
>
> gcc/ChangeLog:
>
> PR target/110018
> * tree-vect-stmts.cc (vectorizable_conversion): Use
> intermiediate integer type for float_expr/fix_trunc_expr when
> direct optab is not existed.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr110018-1.c: New test.

> +
> +  /* For conversions between float and smaller integer types try whether we
> +     can use intermediate signed integer types to support the
> +     conversion.  */

I'm trying to enhance testcase coverage with explicit signed/unsigned
types (patch attached), and I have noticed that zero-extension is used
for unsigned types. So, the above comment that mentions only signed
integer types is not entirely correct.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c 
b/gcc/testsuite/gcc.target/i386/pr110018-1.c
index b1baffd7af1..b6a3be7b7a2 100644
--- a/gcc/testsuite/gcc.target/i386/pr110018-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
@@ -4,14 +4,14 @@
 /* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
 
 void
-foo (double* __restrict a, char* b)
+foo (double* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo1 (float* __restrict a, char* b)
+foo1 (float* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -20,7 +20,7 @@ foo1 (float* __restrict a, char* b)
 }
 
 void
-foo2 (_Float16* __restrict a, char* b)
+foo2 (_Float16* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -33,14 +33,14 @@ foo2 (_Float16* __restrict a, char* b)
 }
 
 void
-foo3 (double* __restrict a, short* b)
+foo3 (double* __restrict a, signed short* b)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo4 (float* __restrict a, char* b)
+foo4 (float* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -49,14 +49,14 @@ foo4 (float* __restrict a, char* b)
 }
 
 void
-foo5 (double* __restrict b, char* a)
+foo5 (double* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo6 (float* __restrict b, char* a)
+foo6 (float* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -65,7 +65,7 @@ foo6 (float* __restrict b, char* a)
 }
 
 void
-foo7 (_Float16* __restrict b, char* a)
+foo7 (_Float16* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -78,14 +78,14 @@ foo7 (_Float16* __restrict b, char* a)
 }
 
 void
-foo8 (double* __restrict b, short* a)
+foo8 (double* __restrict b, signed short* a)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo9 (float* __restrict b, char* a)
+foo9 (float* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
diff --git a/gcc/testsuite/gcc.target/i386/pr110018-2.c 
b/gcc/testsuite/gcc.target/i386/pr110018-2.c
new file mode 100644
index 000..a663e074698
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110018-2.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
+/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
+/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
+
+void
+foo (double* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo1 (float* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo2 (_Float16* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo3 (double* __restrict a, unsigned short* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo4 (float* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo5 (double* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo6 (float* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo7 (_Float16* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo8 (double* __restrict b, unsigned short* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo9 (float* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
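
The signed versus unsigned point can be sketched in scalar C. This is only a
model of the idea (widen the narrow integer to an intermediate 32-bit integer,
then convert), not the vectorizer's code, and the helper names are
hypothetical. For unsigned sources the widening step is a zero-extension, for
signed sources a sign-extension.

```c
#include <stdint.h>

/* unsigned char -> double through a 32-bit intermediate: the widening
   step is a zero-extension.  */
static double
u8_to_f64_via_int (uint8_t x)
{
  int32_t wide = x;
  return (double) wide;
}

/* signed char -> double through a 32-bit intermediate: the widening
   step is a sign-extension.  */
static double
s8_to_f64_via_int (int8_t x)
{
  int32_t wide = x;
  return (double) wide;
}

/* float -> unsigned char through a 32-bit intermediate; only meaningful
   for values that fit in the narrow type after truncation.  */
static uint8_t
f32_to_u8_via_int (float x)
{
  int32_t wide = (int32_t) x;
  return (uint8_t) wide;
}
```

The same byte 0xFF therefore converts to 255.0 when treated as unsigned and
to -1.0 when treated as signed.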

[PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-21 Thread Pan Li via Gcc-patches

From: Pan Li 

We have already extended the machine mode from 8 to 16 bits, but one
place in the streamer was missed: it still has a hard-coded array of
size 256 for the machine mode table.

In the LTO pass, we memset the array by MAX_MACHINE_MODE entries, but the
value of MAX_MACHINE_MODE grows as more and more modes are added, while
the machine mode array in tree-streamer still has a fixed size of 256.

Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table unexpectedly touches memory out of range.

This patch takes MAX_MACHINE_MODE as the size of the array in the
streamer, to make sure there is no unexpected out-of-range memory
access in the future. It also adjusts the places that assume
MAX_MACHINE_MODE <= 256.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
bits for machine mode table.
* lto-streamer-out.cc (lto_write_mode_table): Stream out the
HOST machine mode bits.
* lto-streamer.h (struct lto_file_decl_data): New field mode_bits.
* tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
as the table size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
as the packing limit.
(bp_unpack_machine_mode): Ditto.
---
 gcc/lto-streamer-in.cc  | 12 
 gcc/lto-streamer-out.cc | 11 ---
 gcc/lto-streamer.h  |  2 ++
 gcc/tree-streamer.cc|  2 +-
 gcc/tree-streamer.h | 14 +-
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2cb83406db5..2a0720b4e6f 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
 internal_error ("cannot read LTO mode table from %s",
file_data->file_name);
 
-  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
-  file_data->mode_table = table;
   const struct lto_simple_header_with_strings *header
 = (const struct lto_simple_header_with_strings *) data;
   int string_offset;
@@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
header->string_size, vNULL);
   bitpack_d bp = streamer_read_bitpack (&ib);
 
+  unsigned mode_bits = bp_unpack_value (&bp, 5);
+  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
+
+  file_data->mode_table = table;
+  file_data->mode_bits = mode_bits;
+
   table[VOIDmode] = VOIDmode;
   table[BLKmode] = BLKmode;
   unsigned int m;
-  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
+  while ((m = bp_unpack_value (&bp, mode_bits)) != VOIDmode)
 {
   enum mode_class mclass
= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
   poly_uint16 size = bp_unpack_poly_value (&bp, 16);
   poly_uint16 prec = bp_unpack_poly_value (&bp, 16);
-  machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
+  machine_mode inner = (machine_mode) bp_unpack_value (&bp, mode_bits);
   poly_uint16 nunits = bp_unpack_poly_value (&bp, 16);
   unsigned int ibit = 0, fbit = 0;
   unsigned int real_fmt_len = 0;
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ab2eb4301e..36899283ded 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3196,6 +3196,11 @@ lto_write_mode_table (void)
if (inner_m != m)
  streamer_mode_table[(int) inner_m] = 1;
   }
+
+  /* Pack the mode_bits value within 5 bits (up to 31) in the beginning.  */
+  unsigned mode_bits = ceil_log2 (MAX_MACHINE_MODE);
+  bp_pack_value (&bp, mode_bits, 5);
+
   /* First stream modes that have GET_MODE_INNER (m) == m,
  so that we can refer to them afterwards.  */
   for (int pass = 0; pass < 2; pass++)
@@ -3205,11 +3210,11 @@ lto_write_mode_table (void)
  machine_mode m = (machine_mode) i;
  if ((GET_MODE_INNER (m) == m) ^ (pass == 0))
continue;
- bp_pack_value (&bp, m, 8);
+ bp_pack_value (&bp, m, mode_bits);
  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
  bp_pack_poly_value (&bp, GET_MODE_SIZE (m), 16);
  bp_pack_poly_value (&bp, GET_MODE_PRECISION (m), 16);
- bp_pack_value (&bp, GET_MODE_INNER (m), 8);
+ bp_pack_value (&bp, GET_MODE_INNER (m), mode_bits);
  bp_pack_poly_value (&bp, GET_MODE_NUNITS (m), 16);
  switch (GET_MODE_CLASS (m))
{
@@ -3229,7 +3234,7 @@ lto_write_mode_table (void)
}
  bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
}
-  bp_pack_value (&bp, VOIDmode, 8);
+  bp_pack_value (&bp, VOIDmode, mode_bits);
 
   streamer_write_bitpack (&bp);
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index fc7133d07ba..443f0cd616e 100644
--- a/gcc/lto-stream
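
A minimal sketch of the idea behind the patch above, not GCC's actual
streamer code: the writer first records how many bits a mode index needs,
and the reader sizes its table from that value instead of assuming 256
entries. The names ceil_log2_u, MAX_MODE_DEMO, table_entries and
table_covers_all_modes are hypothetical stand-ins chosen for illustration;
only ceil_log2 and MAX_MACHINE_MODE appear in the patch itself.

```c
#include <stddef.h>

/* Hypothetical mode count, chosen to exceed the old 256-entry limit.  */
enum { MAX_MODE_DEMO = 300 };

/* Smallest n such that (1u << n) >= x.  Stand-in for ceil_log2;
   assumes 1 <= x <= 2^31 so the shift stays in range.  */
unsigned
ceil_log2_u (unsigned x)
{
  unsigned bits = 0;
  while ((1u << bits) < x)
    bits++;
  return bits;
}

/* Number of table entries a reader must allocate when mode indices
   are streamed with MODE_BITS bits each.  */
size_t
table_entries (unsigned mode_bits)
{
  return (size_t) 1 << mode_bits;
}

/* Nonzero if every mode index below MAX_MODE fits a table of ENTRIES
   slots, i.e. initialising MAX_MODE entries stays in bounds.  */
int
table_covers_all_modes (size_t entries, unsigned max_mode)
{
  return entries >= max_mode;
}
```

With 300 modes a fixed 256-entry table is too small, while a table sized
from the streamed bit count (9 bits, 512 entries) covers every index.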

RE: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2023-06-21 Thread Wang, Yanzhang via Gcc-patches

Hi Jeff, sorry for the late reply.

> The long branch handling is done at the assembler level.  So the clobbering
> of $ra isn't visible to the compiler.  Thus the compiler has to be
> extremely careful to not hold values in $ra because the assembler may
> clobber $ra.

If the assembler can clobber $ra, it seems the rules we define in
riscv.cc will be ignored. For example, the $ra save generated by this
patch may be changed by the assembler, and everything that depends on it
will be wrong. So implementing the long jump in the compiler is better.

Do I understand it correctly?

> If you're not going to use dwarf, then my recommendation is to ensure that
> the data you need is *always* available in the stack at known
> offsets.   That will mean your code isn't optimized as well.  It means
> hand written assembly code has to follow the conventions, you can't link
> against libraries that do not follow those conventions, etc etc.  But
> that's the price you pay for not using dwarf (or presumably ORC/SFRAME
> which I haven't studied in detail).

Yes. That's right. All the libraries need to follow the same logic. But as
you said, this is the price if we choose this solution. And fortunately,
this will only be used in special scenarios.

---

And Jeff, do you have any other comments about this patch? Should we add
some descriptions somewhere in the doc?

Thanks,
Yanzhang

> -Original Message-
> From: Jeff Law 
> Sent: Thursday, June 8, 2023 11:05 PM
> To: Wang, Yanzhang ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2
> 
> Subject: Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
> 
> 
> 
> On 6/6/23 21:50, Wang, Yanzhang wrote:
> > Hi Jeff,
> >
> > Thanks for your comments. I have a few questions that I don't quite understand.
> >
> >> One of the things that needs to be upstreamed is long jump support
> >> within a function.  Essentially once a function reaches 1M in size we
> >> have the real possibility that a direct jump may not reach its target.
> >>
> >> To support this I expect that $ra is going to become a fixed register
> >> (ie, not available to the register allocator as a temporary).  It'll
> >> be used as a scratch register for long jump sequences.
> >>
> >> One of the consequences of this is $ra will need to be saved in leaf
> >> functions that are near or over 1M in size.
> >>
> >> Note that at the time when we have to lay out the stack, we do not
> >> know the precise length of the function.  So there's a degree of
> >> "fuzz" in the decision whether or not to save $ra in a function that
> >> is close to the 1M limit.
> >
> > Do you mean that, long jump to more than 1M offset will need multiple
> > jal and each jal will save the $ra ?
> Long jumps are implemented as an indirect jump which needs a scratch
> register to hold the high part of the jump target address.
> 
> >
> > If yes, I'm confused about what's the influence of the $ra saving for
> > function prologue. We will save the fp+ra at the prologue, the next
> > $ra saving seems will not modify the $ra already saved.
> The long branch handling is done at the assembler level.  So the clobbering
> of $ra isn't visible to the compiler.  Thus the compiler has to be
> extremely careful to not hold values in $ra because the assembler may
> clobber $ra.
> 
> This ultimately comes back to the phase ordering problem.  At register
> allocation time we don't know if we need long jumps or not.  So we don't
> know if $ra is potentially clobbered by the assembler.   A similar phase
> ordering problems exists in the prologue/epilogue generation.
> 
> The other approach to long branch handling would be to do it all in the
> compiler.  I would actually prefer this approach, but it's not likely to
> land in the near term.
> 
> 
> >
> > I think it's yes (not valid) when we want to get the return address to
> > parent function from $ra directly in the function body. But we can get
> > the right return address from fp with offset if we save them at prologue,
> > is it right ?
> Right.  You'll be able to get the value of $ra out of the stack.
> 
> 
> 
> >
> >> Meaning that what you really want is to be using
> >> -fno-omit-frame-pointer and for $ra to always be saved in the stack,
> >> even in a leaf function.
> >
> > This is also another solution but will change the default behavior of
> > -fno-omit-frame-pointer.
> That's OK.  While -f options are target independent options, targets are
> allowed to adjust certain behaviors based on those options.
> 
> If you're not going to use dwarf, then my recommendation is to ensure that
> the data you need is *always* available in the stack at known
> offsets.   That will mean your code isn't optimized as well.  It means
> hand written assembly code has to follow the conventions, you can't link
> against libraries that do not follow those conventions, etc etc.  But
> that's the price you pay for not using dwarf (or presumably ORC/SFRAME
> which I haven't studied in
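
For context on the 1M figure in the quoted discussion: a RISC-V jal encodes
a signed, 2-byte-aligned PC-relative offset of 21 bits, so it reaches about
+/-1 MiB. Farther targets need an indirect jump whose upper address bits are
first placed in a scratch register, which is the role discussed for $ra
above. The sketch below only does the arithmetic. The helper names are
hypothetical, the auipc/jalr pairing is my assumption about the usual
sequence, and none of this is how GCC or the assembler implement it.

```c
#include <stdint.h>

/* Nonzero if OFFSET is reachable by a single jal: a 2-byte-aligned
   value in [-2^20, 2^20 - 2].  */
int
fits_jal (int64_t offset)
{
  return offset >= -(INT64_C (1) << 20)
	 && offset < (INT64_C (1) << 20)
	 && (offset & 1) == 0;
}

/* Split OFFSET into an upper part HI20 (for the scratch register) and a
   signed 12-bit remainder LO12 so that OFFSET == HI20 * 4096 + LO12.  */
void
split_hi_lo (int32_t offset, int32_t *hi20, int32_t *lo12)
{
  /* Bias by half a page so the remainder lands in [-2048, 2047].  */
  int64_t biased = (int64_t) offset + 0x800;
  int64_t hi = biased / 4096;
  if (biased % 4096 < 0)
    hi--;			/* Round toward negative infinity.  */
  *hi20 = (int32_t) hi;
  *lo12 = (int32_t) (offset - hi * 4096);
}
```

Because the split depends on the final distance to the target, it can only
be decided once the function's length is known, which is the phase ordering
problem mentioned in the quoted text.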

Re: [PATCH] Update array address space in c_build_qualified_type

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, Jun 21, 2023 at 7:57 AM  wrote:
>
> Hi,
>
>   When c-typeck.cc:c_build_qualified_type builds an array type
>   from its element type, it does not copy the address space of
>   the element type to the array type itself. This is unlike
>   tree.cc:build_array_type_1, which explicitly does
>
> TYPE_ADDR_SPACE (t) = TYPE_ADDR_SPACE (elt_type);
>
>   This causes the ICE described in
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86869.
>
> struct S {
>   char y[2];
> };
>
> extern const __memx  struct S *s;
>
> extern void bar(const __memx void*);
>
> void foo(void) {
>   bar(&s->y);
> }
>
>   build_component_ref calls c_build_qualified_type, passing in the
>   array type and quals including the address space (ADDR_SPACE_MEMX
>   in this case). Because of this missing address space copy, the
>   returned array type remains in the generic address space.  Later
>   down the line, expand_expr_addr_expr detects the mismatch in
>   address space/mode and tries to convert, and that leads to the
>   ICE described in the bug.
>
>   This patch sets the address space of the array type to that of the
>   element type.
>
>   Regression tests for avr look ok. Ok for trunk?

The patch looks OK to me but please let a C frontend maintainer
double-check (I've CCed Joseph for this).

Thanks,
Richard.

> Regards
> Senthil
>
> PR 86869
>
> gcc/c/ChangeLog:
>
> * c-typeck.cc (c_build_qualified_type): Set
> TYPE_ADDR_SPACE for ARRAY_TYPE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/avr/pr86869.c: New test.
>
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 22e240a3c2a..d4ab1d1bd46 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -16284,6 +16284,7 @@ c_build_qualified_type (tree type, int type_quals, tree orig_qual_type,
>
>   t = build_variant_type_copy (type);
>   TREE_TYPE (t) = element_type;
> + TYPE_ADDR_SPACE (t) = TYPE_ADDR_SPACE (element_type);
>
>if (TYPE_STRUCTURAL_EQUALITY_P (element_type)
>|| (domain && TYPE_STRUCTURAL_EQUALITY_P (domain)))
> diff --git a/gcc/testsuite/gcc.target/avr/pr86869.c b/gcc/testsuite/gcc.target/avr/pr86869.c
> new file mode 100644
> index 000..54cd984276e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/avr/pr86869.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu99" } */
> +
> +extern void bar(const __memx void* p);
> +
> +struct S {
> +  char y[2];
> +};
> +extern const __memx struct S *s;
> +
> +void foo(void) {
> +  bar(&s->y);
> +}

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, Jun 21, 2023 at 9:50 AM Uros Bizjak via Gcc-patches
 wrote:
>
> On Tue, Jun 20, 2023 at 6:11 PM liuhongt via Gcc-patches
>  wrote:
> >
> > I noticed there was some refactoring in vectorizable_conversion
> > for code_helper, so I've adjusted my patch to that.
> > Here's the patch I'm going to commit.
> >
> > We already use an intermediate type in case WIDEN, but not for NONE;
> > this patch extends that.
> >
> > gcc/ChangeLog:
> >
> > PR target/110018
> > * tree-vect-stmts.cc (vectorizable_conversion): Use
> > intermiediate integer type for float_expr/fix_trunc_expr when
> > direct optab is not existed.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr110018-1.c: New test.
>
> > +
> > +  /* For conversions between float and smaller integer types try whether we
> > +can use intermediate signed integer types to support the
> > +conversion.  */
>
> I'm trying to enhance testcase coverage with explicit signed/unsigned
> types (patch attached), and I have noticed that zero-extension is used
> for unsigned types. So, the above comment that mentions only signed
> integer types is not entirely correct.

The comment says the intermediate sized vector types are always
signed (because float conversions to/from unsigned are always somewhat
awkward), but yes, if the original type was unsigned zero-extension is
used and if it was signed sign-extension.
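
The point above can be illustrated in scalar C, as a sketch rather than
what the vectorizer emits: the intermediate type is a signed 32-bit integer
in both cases, and only the widening step differs with the signedness of
the source. The function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned source: zero-extend to the signed intermediate type, then
   do a signed integer to float conversion.  */
float
u8_to_float_via_s32 (uint8_t v)
{
  int32_t wide = (int32_t) v;	/* Zero-extension: 200 stays 200.  */
  return (float) wide;
}

/* Signed source: sign-extend to the signed intermediate type, then
   do the same signed integer to float conversion.  */
float
s8_to_float_via_s32 (int8_t v)
{
  int32_t wide = (int32_t) v;	/* Sign-extension: -56 stays -56.  */
  return (float) wide;
}
```

Every 8-bit value, signed or unsigned, is representable in the wider signed
type, so the final conversion never needs an unsigned float conversion.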

The testcase adjustments / additions look good to me btw.

Thanks,
Richard.

>
> Uros.

Re: [PATCH V3] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-21 Thread Richard Biener via Gcc-patches

On Tue, 20 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Add 
> LEN_MASK_{LOAD,STORE} vectorizer support.
> (internal_load_fn_p): Ditto.
> (internal_store_fn_p): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> (internal_len_load_store_bias): Ditto.
> * optabs-query.cc (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
> (vectorizable_load): Ditto.
> D
> ---
>  gcc/internal-fn.cc |  35 +-
>  gcc/optabs-query.cc|  25 -
>  gcc/tree-vect-stmts.cc | 234 ++---
>  3 files changed, 227 insertions(+), 67 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..e10c21de5f1 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_STORE_LANES:
>return 2;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_MASK_SCATTER_STORE:
>return 4;
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_argno = 3;
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_argno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
>   return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
>   return -1;
>  }
>  
> diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> index 276f8408dd7..4394d391200 100644
> --- a/gcc/optabs-query.cc
> +++ b/gcc/optabs-query.cc
> @@ -566,11 +566,14 @@ can_vec_mask_load_store_p (machine_mode mode,
>  bool is_load)
>  {
>optab op = is_load ? maskload_optab : maskstore_optab;
> +  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
>machine_mode vmode;
>  
>/* If mode is vector mode, check it directly.  */
>if (VECTOR_MODE_P (mode))
> -return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
> +return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing
> +|| convert_optab_handler (len_op, mode, mask_mode)
> + != CODE_FOR_nothing;
>  
>/* Otherwise, return true if there is some vector mode with
>   the mask load/store supported.  */
> @@ -584,7 +587,9 @@ can_vec_mask_load_store_p (machine_m
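
As a reading aid for the patch above, here is a scalar model of a
length-and-mask controlled load. It rests on an assumption that the quoted
text does not spell out: a lane is loaded only if its index is below
len + bias and its mask bit is set, with bias being 0 or -1 as in
internal_len_load_store_bias. The function name and signature are
hypothetical and this is not GCC code.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model: copy lane I only when I < LEN + BIAS and MASK[I] is
   nonzero.  Inactive lanes of DST are left untouched here; a real
   target may leave them undefined instead.  */
void
len_mask_load_model (int32_t *dst, const int32_t *src,
		     const uint8_t *mask, size_t nlanes,
		     size_t len, int bias)
{
  /* Signed arithmetic so that LEN == 0 with BIAS == -1 yields no lanes.  */
  long long limit = (long long) len + bias;
  for (size_t i = 0; i < nlanes && (long long) i < limit; i++)
    if (mask[i])
      dst[i] = src[i];
}
```

In this model the length bounds the active lanes first and the mask then
filters within that prefix.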

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Richard Sandiford via Gcc-patches

Richard Biener via Gcc-patches  writes:
> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>  wrote:
>>
>> We already use an intermediate type in case WIDEN, but not for NONE;
>> this patch extends that.
>>
>> I didn't do that in pattern recog since we need to know whether the
>> stmt belongs to any slp_node to decide the vectype; the related optabs
>> are checked according to vectype_in and vectype_out. For the non-slp case,
>> vec_pack/unpack are always used when lhs has a different size from rhs;
>> for the slp case, sometimes vec_pack/unpack is used, sometimes
>> direct conversion is used.
>>
>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> Ok for trunk?
>>
>> gcc/ChangeLog:
>>
>> PR target/110018
>> * tree-vect-stmts.cc (vectorizable_conversion): Use
>> intermiediate integer type for float_expr/fix_trunc_expr when
>> direct optab is not existed.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/i386/pr110018-1.c: New test.
>> ---
>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++
>>  gcc/tree-vect-stmts.cc | 56 -
>>  2 files changed, 149 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> new file mode 100644
>> index 000..b1baffd7af1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> @@ -0,0 +1,94 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
>> +
>> +void
>> +foo (double* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo1 (float* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> +
>> +void
>> +foo2 (_Float16* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +  a[4] = b[4];
>> +  a[5] = b[5];
>> +  a[6] = b[6];
>> +  a[7] = b[7];
>> +}
>> +
>> +void
>> +foo3 (double* __restrict a, short* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo4 (float* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> +
>> +void
>> +foo5 (double* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo6 (float* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> +
>> +void
>> +foo7 (_Float16* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +  a[4] = b[4];
>> +  a[5] = b[5];
>> +  a[6] = b[6];
>> +  a[7] = b[7];
>> +}
>> +
>> +void
>> +foo8 (double* __restrict b, short* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo9 (float* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index bd3b07a3aa1..1118c89686d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>> return false;
>>if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
>> break;
>
> A comment would be nice here.  Like
>
>    /* For conversions between float and smaller integer types try whether we
>   can use intermediate signed integer types to support the conversion.  */
>
>> +  if ((code == FLOAT_EXPR
>> +  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
>> + || (code == FIX_TRUNC_EXPR
>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))

Is the FIX_TRUNC_EXPR case safe without some flag?

#include <stdint.h>
int32_t x = (int32_t)0x1.0p32;
int32_t y = (int32_t)(int64_t)0x1.0p32;

sets x to 2147483647 and y to 0.
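
Converting an out-of-range floating-point value to an integer type is
undefined in ISO C, so the sketch below spells out the two outcomes from
the example with well-defined operations instead of relying on the casts.
It is only an illustration of why inserting a wider intermediate integer
type can change the result. The function names are hypothetical and this
is not what the vectorizer generates.

```c
#include <stdint.h>

/* Direct double -> int32 conversion that saturates on overflow,
   mirroring the value reported for x above.  Assumes D is not NaN.  */
int32_t
fix_trunc_s32_saturating (double d)
{
  if (d >= 2147483648.0)
    return INT32_MAX;
  if (d <= -2147483649.0)
    return INT32_MIN;
  return (int32_t) d;		/* In range, so the cast is well defined.  */
}

/* double -> int64 -> int32, keeping only the low 32 bits of the wide
   value, mirroring the value reported for y above.  Assumes D fits in
   int64_t.  */
int32_t
fix_trunc_s32_via_s64 (double d)
{
  int64_t wide = (int64_t) d;
  uint32_t low = (uint32_t) (uint64_t) wide;
  if (low <= (uint32_t) INT32_MAX)
    return (int32_t) low;
  /* Map [2^31, 2^32 - 1] onto [INT32_MIN, -1] without overflow.  */
  return (int32_t) (low - (uint32_t) INT32_MAX - 1u) + INT32_MIN;
}
```

The two helpers agree for every input that fits in 32 bits and differ only
for out-of-range inputs such as 0x1.0p32.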

Thanks,
Richard

>> +   {
>> + bool float_expr_p = code == FLOAT_EXPR;
>> + scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
>> + fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
>> + code1 = float_expr_p ? code : NOP_EXPR;
>> + codecvt1 = float_expr_p ? NOP_EXPR : code;
>> + FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
>> +   {
>> + imode = rhs_mode_iter.require ();
>> + if (GET_MODE_SIZE (imode) > fltsz)
>> +   break;
>> +
>> + cvt_type
>> +   = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
>> + 0);
>> + cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
>> + slp_

Re: Re: [PATCH V3] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-21 Thread juzhe.zh...@rivai.ai

Hi, Richi. Thanks so much for the review and comments.

>> Can you instead adjust get_len_load_store_mode and
>>can_vec_mask_load_store_p to provide the optab they matched on
>>via the corresponding IFN code as additional output (add a
>>pointer argument, you can default it to nullptr and only
>>fill in the detail in the context that need it)?
>>Like above the _len case can then simply
>>take precedence.

Do you mean I should remove partial_or_mask_vector_ifn and use
can_vec_mask_load_store_p or get_len_load_store_mode to compute the IFN
instead?

I tried to add an internal_fn * pointer argument to get_len_load_store_mode,
but it fails to compile:

../../../riscv-gcc/gcc/optabs-query.h:191:63: error: ‘internal_fn’ has not been declared
 opt_machine_mode get_len_load_store_mode (machine_mode, bool, internal_fn* = nullptr);

I am not sure whether I am on the same page as you.
Could you help me with that?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-21 16:53
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
On Tue, 20 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Add 
> LEN_MASK_{LOAD,STORE} vectorizer support.
> (internal_load_fn_p): Ditto.
> (internal_store_fn_p): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> (internal_len_load_store_bias): Ditto.
> * optabs-query.cc (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
> (vectorizable_load): Ditto.
> D
> ---
>  gcc/internal-fn.cc |  35 +-
>  gcc/optabs-query.cc|  25 -
>  gcc/tree-vect-stmts.cc | 234 ++---
>  3 files changed, 227 insertions(+), 67 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..e10c21de5f1 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_STORE_LANES:
>return 2;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_MASK_SCATTER_STORE:
>return 4;
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_argno = 3;
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_argno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
>  return 0;
> -  if (insn_operand_ma

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-21 Thread Richard Biener via Gcc-patches

On Tue, 20 Jun 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Mon, 19 Jun 2023, Richard Sandiford wrote:
> >
> >> Jeff Law  writes:
> >> > On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
> >> >> IVOPTs has strip_offset which suffers from the same issues regarding
> >> >> integer overflow that split_constant_offset did but the latter was
> >> >> fixed quite some time ago.  The following implements strip_offset
> >> >> in terms of split_constant_offset, removing the redundant and
> >> >> incorrect implementation.
> >> >> 
> >> >> The implementations are not exactly the same, strip_offset relies
> >> >> on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
> >> >> simply assumes those do not happen and truncates them.  By
> >> >> the same means strip_offset also handles POLY_INT_CSTs but
> >> >> split_constant_offset does not.  Massaging the latter to
> >> >> behave like strip_offset in those cases might be the way to go?
> >> >> 
> >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >> 
> >> >> Comments?
> >> >> 
> >> >> Thanks,
> >> >> Richard.
> >> >> 
> >> >> PR tree-optimization/110243
> >> >> * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
> >> >> (strip_offset): Make it a wrapper around split_constant_offset.
> >> >> 
> >> >> * gcc.dg/torture/pr110243.c: New testcase.
> >> > Your call -- IMHO you know this code far better than I.
> >> 
> >> +1, but LGTM FWIW.  I couldn't see anything obvious (and valid)
> >> that split_offset_1 handles and split_constant_offset doesn't.
> >
> > I think it's only the INTEGER_CST vs. ptrdiff_tree_p where the
> > latter (used in split_offset_1) handles POLY_INT_CSTs.  split_offset
> > also computes the offset in poly_int64 and checks it fits
> > (to some extent) while split_constant_offset simply converts all
> > INTEGER_CSTs to ssizetype because it knows it starts from addresses
> > only.
> >
> > An alternative fix would have been to rewrite signed arithmetic
> > to unsigned in strip_offset_1.
> >
> > I wonder if we want to change split_constant_offset to record the
> > offset in a poly_int64 and have a wrapper converting it back to
> > a tree for data-ref analysis.
> 
> Sounds a good idea if it's easily doable.
> 
> > Then we can at least put cst_and_fits_in_hwi checks in the code?
> 
> What would they be protecting against, if we're dealing with
> address arithmetic?

While tree-data-ref.cc deals with address arithmetic only, IVOPTs
at least also runs it on general IVs, for example for uses
in the exit condition.

Adding the following to strip_offset

  gcc_assert (POINTER_TYPE_P (TREE_TYPE (expr))
              || (INTEGRAL_TYPE_P (TREE_TYPE (expr))
                  && TYPE_PRECISION (TREE_TYPE (expr))
                     <= TYPE_PRECISION (sizetype)));

runs into ICEs when testing a 32bit target.

But IVOPTs only makes use of the computed offset when it strips
it off address uses.  But what I only now realized is that
IVOPTs strip_offset is also used by loop distribution.

I'm going to split the patch in two at least to make these things
more obvious before changing the implementation.

> > The code also tracks a range so it doesn't look like handling
> > POLY_INT_CSTs is easy there - do you remember whether that was
> > important for IVOPTs?
> 
> Got to admit that:
> 
> tree
> strip_offset (tree expr, poly_uint64_pod *offset)
> {
>   poly_int64 off;
>   tree core = strip_offset_1 (expr, false, false, &off);
>   if (!off.is_constant ())
> {
>   core = expr;
>   off = 0;
> }
>   *offset = off;
>   return core;
> }
> 
> doesn't seem to trigger any testsuite failures from a quick test
> (but not a full regtest).

I see.

Thanks,
Richard.
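
The alternative mentioned in the quoted discussion, rewriting signed
arithmetic to unsigned, can be illustrated outside of GCC. In this sketch
(hypothetical helper names, not IVOPTs code) offsets are accumulated in
uint64_t, where wraparound is well defined, and a separate helper reports
whether the exact sum would still fit in a signed 64-bit value, loosely in
the spirit of the "fits" checks discussed above.

```c
#include <stdint.h>

/* Accumulate DELTA into ACC with well-defined modulo-2^64 wraparound
   instead of signed overflow.  */
uint64_t
add_offset_wrapping (uint64_t acc, int64_t delta)
{
  return acc + (uint64_t) delta;
}

/* Return 1 and store A + B in *SUM if the exact result fits in
   int64_t; return 0 otherwise and leave *SUM unchanged.  */
int
add_offset_checked (int64_t a, int64_t b, int64_t *sum)
{
  if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b))
    return 0;
  *sum = a + b;
  return 1;
}
```

The wrapping form never invokes undefined behaviour, while the checked form
lets a caller give up on an offset that no longer fits rather than silently
truncating it.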

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Richard Sandiford via Gcc-patches

Richard Sandiford  writes:
> Richard Biener via Gcc-patches  writes:
>> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>>  wrote:
>>>
>>> We already use an intermediate type in the WIDEN case, but not for NONE;
>>> this patch extends that.
>>>
>>> I didn't do that in pattern recog since we need to know whether the
>>> stmt belongs to any slp_node to decide the vectype; the related optabs
>>> are checked according to vectype_in and vectype_out.  For the non-SLP case,
>>> vec_pack/unpack are always used when lhs has a different size from rhs;
>>> for the SLP case, sometimes vec_pack/unpack is used and sometimes
>>> direct conversion is used.
>>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>> Ok for trunk?
>>>
>>> gcc/ChangeLog:
>>>
>>> PR target/110018
>>> * tree-vect-stmts.cc (vectorizable_conversion): Use
>>> intermediate integer type for float_expr/fix_trunc_expr when
>>> a direct optab does not exist.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/i386/pr110018-1.c: New test.
>>> ---
>>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++
>>>  gcc/tree-vect-stmts.cc | 56 -
>>>  2 files changed, 149 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>>>
>>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>>> new file mode 100644
>>> index 000..b1baffd7af1
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>>> @@ -0,0 +1,94 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
>>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
>>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
>>> +
>>> +void
>>> +foo (double* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo1 (float* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> +
>>> +void
>>> +foo2 (_Float16* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +  a[4] = b[4];
>>> +  a[5] = b[5];
>>> +  a[6] = b[6];
>>> +  a[7] = b[7];
>>> +}
>>> +
>>> +void
>>> +foo3 (double* __restrict a, short* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo4 (float* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> +
>>> +void
>>> +foo5 (double* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo6 (float* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> +
>>> +void
>>> +foo7 (_Float16* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +  a[4] = b[4];
>>> +  a[5] = b[5];
>>> +  a[6] = b[6];
>>> +  a[7] = b[7];
>>> +}
>>> +
>>> +void
>>> +foo8 (double* __restrict b, short* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo9 (float* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index bd3b07a3aa1..1118c89686d 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>>> return false;
>>>if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
>>> break;
>>
>> A comment would be nice here.  Like
>>
>>/* For conversions between float and smaller integer types try whether we can
>>   use intermediate signed integer types to support the conversion.  */
>>
>>> +  if ((code == FLOAT_EXPR
>>> +  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
>>> + || (code == FIX_TRUNC_EXPR
>>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
>
> Is the FIX_TRUNC_EXPR case safe without some flag?
>
> #include <stdint.h>
> int32_t x = (int32_t)0x1.0p32;
> int32_t y = (int32_t)(int64_t)0x1.0p32;
>
> sets x to 2147483647 and y to 0.

Also, I think multi_step_cvt should influence the costs, since at
the moment we cost one statement but generate two.  This makes a
difference for SVE with VECT_COMPARE_COSTS.  Would changing it to:

  vect_model_simple_cost (vinfo, stmt_info,
  ncopies * (multi_step_cvt + 1),
  dt, ndts, slp_node,
  cost_vec);

be OK?
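
As a rough illustration of what that cost change amounts to, here is a minimal sketch.  It assumes multi_step_cvt counts the extra intermediate conversion steps per copy; conversion_stmt_count is a hypothetical helper, not a GCC function.

```cpp
#include <cassert>

// Hypothetical helper (not part of GCC): number of vector statements to
// cost for a conversion that is emitted NCOPIES times and needs
// MULTI_STEP_CVT extra intermediate steps per copy (assumed meaning).
constexpr unsigned
conversion_stmt_count (unsigned ncopies, unsigned multi_step_cvt)
{
  return ncopies * (multi_step_cvt + 1);
}

// A direct conversion costs one statement per copy; going through one
// intermediate integer type costs two.
static_assert (conversion_stmt_count (2, 0) == 2, "direct conversion");
static_assert (conversion_stmt_count (2, 1) == 4, "one intermediate step");
```

With this counting, a two-step conversion is no longer costed as if it were a single statement.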

There again, I wonder if we should handle this using patterns instead.
That makes both conversions explicit and therefore easier to cost.

E.g. for SVE, an integer extension is free if the source is a load,
and we do try

[PATCH][committed][docs]: replace backslashchar [PR 110329].

2023-06-21 Thread Tamar Christina via Gcc-patches

Hi All,

It seems like @backslashchar{} is a relatively new addition
to texinfo.  Other parts of the docs use @samp{\}, so use it
here too so that older distros work.

Bootstrapped on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/ChangeLog:

PR other/110329
* doc/md.texi: Replace backslashchar.

--- inline copy of patch -- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 052375b1a31b829303e75417c400024f084aef44..9648fdc846abf1700effe3272d5523538ce9b50f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -837,7 +837,7 @@ blocks on the line.
 @item
 Within an @samp{@{@@} block, any iterators that do not get expanded will result
 in an error.  If for some reason it is required to have @code{<} or @code{>} in
-the output then these must be escaped using @backslashchar{}.
+the output then these must be escaped using @samp{\}.
 
 @item
 It is possible to use the @samp{attrs} list to specify some attributes and to





Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-21 Thread Jan Hubicka via Gcc-patches

> 
> If I manually add a __builtin_unreachable () to the above case
> I see the *(int *)0 = 0; store DSEd.  Maybe we should avoid
> removing stores that might trap here?  POSIX wise such a trap
> could be a way to jump out of the path leading to unreachable ()
> via siglongjmp ...

I am not sure how much POSIX actually promises here.
I don't think we are supposed to keep such undefined behaviours in
original order.  We compile:

int test (int *a, int *b, int c)
{
int res = *a;
return res + *b / c;
}

to:

test:
.LFB0:
.cfi_startproc
movl(%rsi), %eax
movl%edx, %ecx
cltd
idivl   %ecx
addl(%rdi), %eax
ret

So we read *b before *a.  Passing a==NULL, b non-null and c==0, and
using a SIGSEGV handler to recover the program before the division by 0,
will not work with optimization.

Reaching unreachable is always undefined behaviour so I think we are
safe to reorder it with a load.
Honza


> 
> Thanks,
> Richard.

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, 21 Jun 2023, Richard Biener wrote:

> On Tue, 20 Jun 2023, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Mon, 19 Jun 2023, Richard Sandiford wrote:
> > >
> > >> Jeff Law  writes:
> > >> > On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
> > >> >> IVOPTs has strip_offset which suffers from the same issues regarding
> > >> >> integer overflow that split_constant_offset did but the latter was
> > >> >> fixed quite some time ago.  The following implements strip_offset
> > >> >> in terms of split_constant_offset, removing the redundant and
> > >> >> incorrect implementation.
> > >> >> 
> > >> >> The implementations are not exactly the same, strip_offset relies
> > >> >> on ptrdiff_tree_p to fend off too large offsets while
> > >> >> split_constant_offset simply assumes those do not happen and
> > >> >> truncates them.  By the same means strip_offset also handles
> > >> >> POLY_INT_CSTs but split_constant_offset does not.  Massaging the
> > >> >> latter to behave like strip_offset in those cases might be the
> > >> >> way to go?
> > >> >> 
> > >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >> >> 
> > >> >> Comments?
> > >> >> 
> > >> >> Thanks,
> > >> >> Richard.
> > >> >> 
> > >> >>   PR tree-optimization/110243
> > >> >>   * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
> > >> >>   (strip_offset): Make it a wrapper around split_constant_offset.
> > >> >> 
> > >> >>   * gcc.dg/torture/pr110243.c: New testcase.
> > >> > Your call -- IMHO you know this code far better than I.
> > >> 
> > >> +1, but LGTM FWIW.  I couldn't see anything obvious (and valid)
> > >> that split_offset_1 handles and split_constant_offset doesn't.
> > >
> > > I think it's only the INTEGER_CST vs. ptrdiff_tree_p where the
> > > latter (used in split_offset_1) handles POLY_INT_CSTs.  split_offset
> > > also computes the offset in poly_int64 and checks it fits
> > > (to some extent) while split_constant_offset simply converts all
> > > INTEGER_CSTs to ssizetype because it knows it starts from addresses
> > > only.
> > >
> > > An alternative fix would have been to rewrite signed arithmetic
> > > to unsigned in strip_offset_1.
> > >
> > > I wonder if we want to change split_constant_offset to record the
> > > offset in a poly_int64 and have a wrapper converting it back to
> > > a tree for data-ref analysis.
> > 
> > Sounds a good idea if it's easily doable.
> > 
> > > Then we can at least put cst_and_fits_in_hwi checks in the code?
> > 
> > What would they be protecting against, if we're dealing with
> > address arithmetic?
> 
> While tree-data-ref.cc deals with address arithmetic only IVOPTs
> at least also runs it on general IVs, for example for uses
> in the exit condition.
> 
> Adding the following to strip_offset
> 
>   gcc_assert (POINTER_TYPE_P (TREE_TYPE (expr))
>               || (INTEGRAL_TYPE_P (TREE_TYPE (expr))
>                   && TYPE_PRECISION (TREE_TYPE (expr))
>                      <= TYPE_PRECISION (sizetype)));
> 
> runs into ICEs when testing a 32bit target.
> 
> But IVOPTs only makes use of the computed offset when it strips
> it off address uses.  But what I only now realized is that
> IVOPTs strip_offset is also used by loop distribution.
> 
> I'm going to split the patch in two at least to make these things
> more obvious before changing the implementation.

Hmm, but still split_constant_offset for example does

    case MULT_EXPR:
      if (TREE_CODE (op1) != INTEGER_CST)
        return false;

      split_constant_offset (op0, &var0, &off0, &op0_range, cache, limit);
      op1_range.set (TREE_TYPE (op1), wi::to_wide (op1), wi::to_wide (op1));
      *off = size_binop (MULT_EXPR, off0, fold_convert (ssizetype, op1));
      if (!compute_distributive_range (type, op0_range, code, op1_range,
                                       off, result_range))
        return false;
      *var = fold_build2 (MULT_EXPR, sizetype, var0,
                          fold_convert (sizetype, op1));

so *var is affected as well since we might truncate op1 here.  In
fact we at the end do

  if (INTEGRAL_TYPE_P (type))
    *var = fold_convert (sizetype, *var);

so truncate things (the API is documented to do that).

The issue in the PR the change is fixing is that we end up with
an expression that overflows but uses signed arithmetic and so
we miscompile it later.  IIRC the fixes to split_constant_offset
always were that the sum of the base + offset wasn't equal to
the original expression, right?

Richard.

Re: Re: [PATCH V3] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-21 Thread juzhe.zh...@rivai.ai

Hi, Richi.

I again received this bounce:

"host smtp-in1.suse.de[195.135.220.23] said: 452 4.3.1 Insufficient system
storage (in reply to MAIL FROM command)"

so my last email did not reach you.

I am CCing you again; here is that email:

Hi, Richi. Thanks so much for the review and comments.
>> Can you instead adjust get_len_load_store_mode and
>>can_vec_mask_load_store_p to provide the optab they matched on
>>via the corresponding IFN code as additional output (add a
>>pointer argument, you can default it to nullptr and only
>>fill in the detail in the context that need it)?
>>Like above the _len case can then simply
>>take precedence.

Do you mean that I should remove partial_or_mask_vector_ifn and instead use
can_vec_mask_load_store_p or get_len_load_store_mode to compute the IFN?
I tried adding an internal_fn * pointer argument to get_len_load_store_mode,
but it fails to compile:

../../../riscv-gcc/gcc/optabs-query.h:191:63: error: ‘internal_fn’ has not been declared
 opt_machine_mode get_len_load_store_mode (machine_mode, bool, internal_fn* = nullptr);

I am not sure whether I am on the same page as you.
Could you help me with that?
Thanks.
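
For reference, the shape of the suggestion as I understand it might look roughly like the sketch below.  Everything in it is a stand-in: sketch_internal_fn and sketch_len_load_supported are hypothetical names, not the real GCC declarations.  The one thing it relies on is plain C++: a type has to be declared before a prototype can mention it, which is presumably why optabs-query.h rejects internal_fn unless the header declaring it is visible there.

```cpp
#include <cassert>

// Stand-in for GCC's internal_fn enum (hypothetical names).  The real
// type must be declared before any prototype that mentions it.
enum sketch_internal_fn
{
  SKETCH_IFN_LAST,
  SKETCH_IFN_LEN_LOAD,
  SKETCH_IFN_LEN_MASK_LOAD
};

// Hypothetical query: returns true if some length-controlled load is
// available, and reports which internal function matched through IFN
// when the caller passes a non-null pointer.
bool
sketch_len_load_supported (bool has_len_load, bool has_len_mask_load,
                           sketch_internal_fn *ifn = nullptr)
{
  sketch_internal_fn matched = SKETCH_IFN_LAST;
  if (has_len_load)
    matched = SKETCH_IFN_LEN_LOAD;        // the plain _len case takes precedence
  else if (has_len_mask_load)
    matched = SKETCH_IFN_LEN_MASK_LOAD;

  if (ifn)
    *ifn = matched;
  return matched != SKETCH_IFN_LAST;
}
```

Callers that do not care which IFN matched can keep calling the query with the old argument list, since the pointer defaults to nullptr.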


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-21 16:53
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
On Tue, 20 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Add 
> LEN_MASK_{LOAD,STORE} vectorizer support.
> (internal_load_fn_p): Ditto.
> (internal_store_fn_p): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> (internal_len_load_store_bias): Ditto.
> * optabs-query.cc (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
> (vectorizable_load): Ditto.
> D
> ---
>  gcc/internal-fn.cc |  35 +-
>  gcc/optabs-query.cc|  25 -
>  gcc/tree-vect-stmts.cc | 234 ++---
>  3 files changed, 227 insertions(+), 67 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..e10c21de5f1 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_STORE_LANES:
>return 2;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_MASK_SCATTER_STORE:
>return 4;
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_argno = 3;
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_argno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For n

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Richard Biener via Gcc-patches  writes:
> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
> >>  wrote:
> >>>
> >>> We already use an intermediate type in the WIDEN case, but not for NONE;
> >>> this patch extends that.
> >>>
> >>> I didn't do that in pattern recog since we need to know whether the
> >>> stmt belongs to any slp_node to decide the vectype; the related optabs
> >>> are checked according to vectype_in and vectype_out.  For the non-SLP case,
> >>> vec_pack/unpack are always used when lhs has a different size from rhs;
> >>> for the SLP case, sometimes vec_pack/unpack is used and sometimes
> >>> direct conversion is used.
> >>>
> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>> Ok for trunk?
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> PR target/110018
> >>> * tree-vect-stmts.cc (vectorizable_conversion): Use
> >>> intermediate integer type for float_expr/fix_trunc_expr when
> >>> a direct optab does not exist.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> * gcc.target/i386/pr110018-1.c: New test.
> >>> ---
> >>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++
> >>>  gcc/tree-vect-stmts.cc | 56 -
> >>>  2 files changed, 149 insertions(+), 1 deletion(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
> >>>
> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> >>> new file mode 100644
> >>> index 000..b1baffd7af1
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> >>> @@ -0,0 +1,94 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
> >>> +
> >>> +void
> >>> +foo (double* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo1 (float* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> +
> >>> +void
> >>> +foo2 (_Float16* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +  a[4] = b[4];
> >>> +  a[5] = b[5];
> >>> +  a[6] = b[6];
> >>> +  a[7] = b[7];
> >>> +}
> >>> +
> >>> +void
> >>> +foo3 (double* __restrict a, short* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo4 (float* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> +
> >>> +void
> >>> +foo5 (double* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo6 (float* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> +
> >>> +void
> >>> +foo7 (_Float16* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +  a[4] = b[4];
> >>> +  a[5] = b[5];
> >>> +  a[6] = b[6];
> >>> +  a[7] = b[7];
> >>> +}
> >>> +
> >>> +void
> >>> +foo8 (double* __restrict b, short* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo9 (float* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>> index bd3b07a3aa1..1118c89686d 100644
> >>> --- a/gcc/tree-vect-stmts.cc
> >>> +++ b/gcc/tree-vect-stmts.cc
> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
> >>> return false;
> >>>if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
> >>> break;
> >>
> >> A comment would be nice here.  Like
> >>
> >>/* For conversions between float and smaller integer types try whether we can
> >>   use intermediate signed integer types to support the conversion.  */
> >>
> >>> +  if ((code == FLOAT_EXPR
> >>> +  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
> >>> + || (code == FIX_TRUNC_EXPR
> >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
> >
> > Is the FIX_TRUNC_EXPR case safe without some flag?
> >
> > #include <stdint.h>
> > int32_t x = (int32_t)0x1.0p32;
> > int32_t y = (int32_t)(int64_t)0x1.0p32;
> >
> > sets x to 2147483647 and y to 0.

Hmm, good question.  GENERIC has a direct truncation to unsigned char
for example, the C standard generally says if the integral part cannot
be represented then the behavior is undefined.  So I think we should be
safe here (0x1.0p32 doesn't fit an int).
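
A minimal sketch of why the intermediate type is harmless for inputs whose integral part is representable (the function names are made up for illustration; this is not the vectorizer code):

```cpp
#include <cassert>
#include <cstdint>

// float -> signed char in one step.
std::int8_t
fix_trunc_direct (float f)
{
  return static_cast<std::int8_t> (f);
}

// float -> int -> signed char, i.e. via an intermediate integer type.
std::int8_t
fix_trunc_two_step (float f)
{
  std::int32_t wide = static_cast<std::int32_t> (f);
  return static_cast<std::int8_t> (wide);
}

// signed char -> int -> float; the widening step is always exact.
float
float_two_step (std::int8_t c)
{
  std::int32_t wide = c;
  return static_cast<float> (wide);
}
```

Both fix_trunc variants agree whenever the truncated value fits in the narrow type.  They can only differ for inputs like 300.0f, where the direct conversion is undefined anyway, so a program cannot rely on either result.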

> A

[PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread Juzhe-Zhong

This patch adds RVV floating-point ternary auto-vectorization.
It also fixes an attribute bug in the floating-point ternary operations in vector.md.

gcc/ChangeLog:

* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
(fnma4): Ditto.
(*fnma): Ditto.
(fms4): Ditto.
(*fms): Ditto.
(fnms4): Ditto.
(*fnms): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
* config/riscv/vector.md: Fix attribute bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Add floating-point
ternary tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.

---
 gcc/config/riscv/autovec.md   | 180 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  21 ++
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/ternop/ternop-1.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-10.c  |  23 +++
 .../riscv/rvv/autovec/ternop/ternop-11.c  |  29 +++
 .../riscv/rvv/autovec/ternop/ternop-12.c  |  28 +++
 .../riscv/rvv/autovec/ternop/ternop-2.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-3.c   |   9 +-
 .../riscv/rvv/autovec/ternop/ternop-4.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-5.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-6.c   |   9 +-
 .../riscv/rvv/autovec/ternop/ternop-7.c   |  23 +++
 .../riscv/rvv/autovec/ternop/ternop-8.c   |  29 +++
 .../riscv/rvv/autovec/ternop/ternop-9.c   |  28 +++
 .../riscv/rvv/autovec/ternop/ternop_run-1.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-10.c  |  40 
 .../riscv/rvv/autovec/ternop/ternop_run-11.c  |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-12.c  |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-2.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-3.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-4.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-5.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-6.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-7.c   |  40 
 .../riscv/rvv/autovec/ternop/ternop_run-8.c   |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-9.c   |  60 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-1.c|  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-10.c   |  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-11.c   |  55 ++
 .

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-21 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> The issue in the PR the change is fixing is that we end up with
> an expression that overflows but uses signed arithmetic and so
> we miscompile it later.  IIRC the fixes to split_constant_offset
> always were that the sum of the base + offset wasn't equal to
> the original expression, right?

Yeah, that's right.  (sizetype)(INT_MIN - foo) was split into
(sizetype)INT_MIN + (sizetype)(-foo), and so we ended up with
((sizetype)INT_MIN)*2 rather than 0 for foo==INT_MIN.
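
The arithmetic can be checked with a small sketch.  It assumes a 32-bit int and models sizetype as a 64-bit unsigned type; the helper names are made up, and the wrapping negation is spelled with unsigned arithmetic so the sketch itself has no signed overflow.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Wrapping 32-bit negation, modelling what the split expression ends up
// computing for -foo.
std::int32_t
wrapping_neg (std::int32_t x)
{
  return static_cast<std::int32_t> (0u - static_cast<std::uint32_t> (x));
}

// (sizetype)(INT_MIN - foo), evaluated in 64 bits.  Only meaningful when
// INT_MIN - foo does not overflow int, i.e. for foo <= 0.
std::uint64_t
whole_expr (std::int32_t foo)
{
  return static_cast<std::uint64_t> (static_cast<std::int64_t> (INT_MIN) - foo);
}

// (sizetype)INT_MIN + (sizetype)(-foo), the split form.
std::uint64_t
split_expr (std::int32_t foo)
{
  std::uint64_t base
    = static_cast<std::uint64_t> (static_cast<std::int64_t> (INT_MIN));
  std::uint64_t off
    = static_cast<std::uint64_t> (static_cast<std::int64_t> (wrapping_neg (foo)));
  return base + off;
}
```

For foo == INT_MIN the whole expression is 0, while the split form adds (sizetype)INT_MIN to itself; for values such as foo == -5 the two forms agree.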

Unfortunately, it looks like a lot of the discussion happened on
irc (my fault) and I didn't keep logs.

Thanks,
Richard

[PATCH][committed] aarch64: Convert SVE gather patterns to compact syntax

2023-06-21 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

This patch converts the SVE load gather patterns to the new compact syntax
that Tamar introduced. This allows for a future patch I want to contribute
to add more alternatives that are better viewed in the more compact form.

The lines in some patterns are >80 long now, but I think that's unavoidable
and those patterns already had overly long constraint strings.

No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md 
(mask_gather_load):
Convert to compact alternatives syntax.
(mask_gather_load): Likewise.
(*mask_gather_load_xtw_unpacked): Likewise.
(*mask_gather_load_sxtw): Likewise.
(*mask_gather_load_uxtw): Likewise.
(@aarch64_gather_load_):
Likewise.

(@aarch64_gather_load_):
Likewise.
(*aarch64_gather_load_
_xtw_unpacked): Likewise.
(*aarch64_gather_load_
_sxtw): Likewise.
(*aarch64_gather_load_
_uxtw): Likewise.
(@aarch64_ldff1_gather): Likewise.
(@aarch64_ldff1_gather): Likewise.
(*aarch64_ldff1_gather_sxtw): Likewise.
(*aarch64_ldff1_gather_uxtw): Likewise.
(@aarch64_ldff1_gather_
): Likewise.
(@aarch64_ldff1_gather_
): Likewise.
(*aarch64_ldff1_gather_
_sxtw): Likewise.
(*aarch64_ldff1_gather_
_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt): Likewise.
(@aarch64_gather_ldnt_
): Likewise.


gather-compact.patch
Description: gather-compact.patch

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford
>  wrote:
>>
>> Richard Sandiford  writes:
>> > Richard Biener via Gcc-patches  writes:
>> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>> >>  wrote:
>> >>>
>> >>> We already use an intermediate type in the WIDEN case, but not for NONE;
>> >>> this patch extends that.
>> >>>
>> >>> I didn't do that in pattern recog since we need to know whether the
>> >>> stmt belongs to any slp_node to decide the vectype; the related optabs
>> >>> are checked according to vectype_in and vectype_out.  For the non-SLP case,
>> >>> vec_pack/unpack are always used when lhs has a different size from rhs;
>> >>> for the SLP case, sometimes vec_pack/unpack is used and sometimes
>> >>> direct conversion is used.
>> >>>
>> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> >>> Ok for trunk?
>> >>>
>> >>> gcc/ChangeLog:
>> >>>
>> >>> PR target/110018
>> >>> * tree-vect-stmts.cc (vectorizable_conversion): Use
>> >>> intermediate integer type for float_expr/fix_trunc_expr when
>> >>> a direct optab does not exist.
>> >>>
>> >>> gcc/testsuite/ChangeLog:
>> >>>
>> >>> * gcc.target/i386/pr110018-1.c: New test.
>> >>> ---
>> >>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++
>> >>>  gcc/tree-vect-stmts.cc | 56 -
>> >>>  2 files changed, 149 insertions(+), 1 deletion(-)
>> >>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>> >>>
>> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> >>> new file mode 100644
>> >>> index 000..b1baffd7af1
>> >>> --- /dev/null
>> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> >>> @@ -0,0 +1,94 @@
>> >>> +/* { dg-do compile } */
>> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
>> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
>> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
>> >>> +
>> >>> +void
>> >>> +foo (double* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo1 (float* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo2 (_Float16* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +  a[4] = b[4];
>> >>> +  a[5] = b[5];
>> >>> +  a[6] = b[6];
>> >>> +  a[7] = b[7];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo3 (double* __restrict a, short* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo4 (float* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo5 (double* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo6 (float* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo7 (_Float16* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +  a[4] = b[4];
>> >>> +  a[5] = b[5];
>> >>> +  a[6] = b[6];
>> >>> +  a[7] = b[7];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo8 (double* __restrict b, short* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo9 (float* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> >>> index bd3b07a3aa1..1118c89686d 100644
>> >>> --- a/gcc/tree-vect-stmts.cc
>> >>> +++ b/gcc/tree-vect-stmts.cc
>> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>> >>> return false;
>> >>>if (supportable_convert_operation (code, vectype_out, vectype_in, 
>> >>> &code1))
>> >>> break;
>> >>
>> >> A comment would be nice here.  Like
>> >>
>> >>/* For conversions between float and smaller integer types try whether
>> >>   we can use intermediate signed integer types to support the
>> >>   conversion.  */
>> >>
>> >>> +  if ((code == FLOAT_EXPR
>> >>> +  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
>> >>> + || (code == FIX_TRUNC_EXPR
>> >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
>> >
>> > Is the FIX_TRUNC_EXPR case safe without some flag?
>> >
>> > #include <stdint.h>
>> > int32_t x = (int32_t)0x1.0p32;
>> > int32_t y = (int32_t)(int64_t)0x1.0p32;
>> >
>> > sets x to 2147483647 and y to 0.
>
> Hmm, good question.  GENERIC has a direct truncation to unsigne
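
The two directions discussed in the quoted review can be illustrated with a small standalone sketch. It is not taken from the patch; the helper names float_via_int and fix_trunc_via_int64 are made up for illustration. The direct (int32_t)0x1.0p32 conversion is out of range and therefore undefined in ISO C and C++, so the sketch deliberately does not execute it; the 2147483647 mentioned above is what the poster reports for that case.

```cpp
#include <cassert>
#include <cstdint>

// FLOAT_EXPR direction: widen the small integer to int first, then convert.
// Every signed char value is exactly representable in int and in double, so
// going through the intermediate type cannot change the result.
static double
float_via_int (signed char c)
{
  int wide = c;
  return static_cast<double> (wide);
}

// FIX_TRUNC_EXPR direction: truncate to a wider integer first, then narrow.
// The narrowing goes through uint32_t so that dropping the high bits is well
// defined; the final cast is value-preserving whenever the low 32 bits fit
// in int32_t, which is the case for the value discussed above.
static int32_t
fix_trunc_via_int64 (double d)
{
  int64_t wide = static_cast<int64_t> (d);   // caller keeps |d| below 2^63
  uint32_t low = static_cast<uint32_t> (wide);
  return static_cast<int32_t> (low);
}
```

With these helpers, fix_trunc_via_int64 (4294967296.0) yields 0 because the low 32 bits of 2^32 are all zero, which is the "y" case in the quoted example and the reason the two-step truncation is not obviously equivalent to a direct one.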

[PATCH 1/3] Hide and refactor IVOPTs strip_offset

2023-06-21 Thread Richard Biener via Gcc-patches

PR110243 shows strip_offset has some correctness issues.  The following
avoids using it from loop distribution, which can use the more correct
split_constant_offset from data-ref analysis instead.  The patch then
un-exports the function and refactors it to make it obvious the
actual constant offset is only interesting in address-cases.

The series is split, removing one strip_offset at a time for easier
bisecting of code generation quality issues.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-loop-distribution.cc (classify_builtin_st): Use
split_constant_offset.
* tree-ssa-loop-ivopts.h (strip_offset): Remove.
* tree-ssa-loop-ivopts.cc (strip_offset): Make static.
Refactor to make the offset operand optional and assert
we are dealing with addresses if it's required.
---
 gcc/tree-loop-distribution.cc |  9 +
 gcc/tree-ssa-loop-ivopts.cc   | 22 ++
 gcc/tree-ssa-loop-ivopts.h|  1 -
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 6291f941a21..cf7c197aaf7 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -1756,11 +1756,12 @@ classify_builtin_st (loop_p loop, partition *partition, data_reference_p dr)
   return;
 }
 
-  poly_uint64 base_offset;
-  unsigned HOST_WIDE_INT const_base_offset;
-  tree base_base = strip_offset (base, &base_offset);
-  if (!base_offset.is_constant (&const_base_offset))
+  tree base_offset;
+  tree base_base;
+  split_constant_offset (base, &base_base, &base_offset);
+  if (!cst_and_fits_in_hwi (base_offset))
 return;
+  unsigned HOST_WIDE_INT const_base_offset = int_cst_value (base_offset);
 
   struct builtin_info *builtin;
   builtin = alloc_builtin (dr, NULL, base, NULL_TREE, size);
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6fbd2d59318..7978c80b39e 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -1175,6 +1175,9 @@ contain_complex_addr_expr (tree expr)
   return res;
 }
 
+static tree
+strip_offset (tree expr, poly_uint64_pod *offset = nullptr);
+
 /* Allocates an induction variable with given initial value BASE and step STEP
for loop LOOP.  NO_OVERFLOW implies the iv doesn't overflow.  */
 
@@ -2942,12 +2945,16 @@ strip_offset_1 (tree expr, bool inside_addr, bool top_compref,
 
 /* Strips constant offsets from EXPR and stores them to OFFSET.  */
 
-tree
+static tree
 strip_offset (tree expr, poly_uint64_pod *offset)
 {
   poly_int64 off;
   tree core = strip_offset_1 (expr, false, false, &off);
-  *offset = off;
+  if (offset)
+{
+  gcc_assert (POINTER_TYPE_P (expr));
+  *offset = off;
+}
   return core;
 }
 
@@ -3512,7 +3519,6 @@ add_iv_candidate_derived_from_uses (struct ivopts_data *data)
 static void
 add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 {
-  poly_uint64 offset;
   tree base;
   struct iv *iv = use->iv;
   tree basetype = TREE_TYPE (iv->base);
@@ -3563,8 +3569,8 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 
   /* Record common candidate with constant offset stripped in base.
  Like the use itself, we also add candidate directly for it.  */
-  base = strip_offset (iv->base, &offset);
-  if (maybe_ne (offset, 0U) || base != iv->base)
+  base = strip_offset (iv->base);
+  if (base != iv->base)
 {
   record_common_cand (data, base, iv->step, use);
   add_candidate (data, base, iv->step, false, use);
@@ -3582,9 +3588,9 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
   step = fold_convert (sizetype, step);
   record_common_cand (data, base, step, use);
   /* Also record common candidate with offset stripped.  */
-  base = strip_offset (base, &offset);
-  if (maybe_ne (offset, 0U))
-   record_common_cand (data, base, step, use);
+  tree alt_base = strip_offset (base);
+  if (alt_base != base)
+   record_common_cand (data, alt_base, step, use);
 }
 
   /* At last, add auto-incremental candidates.  Make such variables
diff --git a/gcc/tree-ssa-loop-ivopts.h b/gcc/tree-ssa-loop-ivopts.h
index 95148616e70..7a53ce47f10 100644
--- a/gcc/tree-ssa-loop-ivopts.h
+++ b/gcc/tree-ssa-loop-ivopts.h
@@ -28,7 +28,6 @@ extern void dump_cand (FILE *, struct iv_cand *);
 extern bool contains_abnormal_ssa_name_p (tree);
 extern class loop *outermost_invariant_loop_for_expr (class loop *, tree);
 extern bool expr_invariant_in_loop_p (class loop *, tree);
-extern tree strip_offset (tree, poly_uint64_pod *);
 bool may_be_nonaddressable_p (tree expr);
 void tree_ssa_iv_optimize (void);
 
-- 
2.35.3
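
The "optional offset operand" idea in the patch above can be sketched outside of GCC as follows. This is only a toy model under stated assumptions: Expr and strip_offset_sketch are hypothetical names, the expression is a simple chain of "operand + constant" nodes, and none of it reflects the real tree representation or strip_offset_1.

```cpp
#include <cassert>
#include <cstdint>

// Toy expression: either a leaf (operand == nullptr) or "operand + constant".
struct Expr
{
  const Expr *operand;   // nullptr for a leaf
  int64_t constant;      // only meaningful when operand != nullptr
};

// Peel all "+ constant" layers off EXPR and return the remaining core.
// The accumulated constant is stored to *OFFSET only if the caller asked
// for it, mirroring the defaulted nullptr argument in the patch.
static const Expr *
strip_offset_sketch (const Expr *expr, int64_t *offset = nullptr)
{
  int64_t off = 0;
  const Expr *core = expr;
  while (core->operand != nullptr)
    {
      off += core->constant;
      core = core->operand;
    }
  if (offset)
    *offset = off;
  return core;
}
```

Callers that only care whether anything was stripped can compare the returned core with the input, which is how the patch rewrites the "maybe_ne (offset, 0U)" checks into "base != iv->base" and "alt_base != base".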

[PATCH 2/3] Less strip_offset in IVOPTs

2023-06-21 Thread Richard Biener via Gcc-patches

This avoids a strip_offset use in record_group_use where we know
it operates on addresses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-loop-ivopts.cc (record_group_use): Use
split_constant_offset.
---
 gcc/tree-ssa-loop-ivopts.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7978c80b39e..65caf382bba 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -1175,9 +1175,6 @@ contain_complex_addr_expr (tree expr)
   return res;
 }
 
-static tree
-strip_offset (tree expr, poly_uint64_pod *offset = nullptr);
-
 /* Allocates an induction variable with given initial value BASE and step STEP
for loop LOOP.  NO_OVERFLOW implies the iv doesn't overflow.  */
 
@@ -1609,7 +1606,10 @@ record_group_use (struct ivopts_data *data, tree *use_p,
 {
   unsigned int i;
 
-  addr_base = strip_offset (iv->base, &addr_offset);
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (iv->base)));
+  tree addr_toffset;
+  split_constant_offset (iv->base, &addr_base, &addr_toffset);
+  addr_offset = int_cst_value (addr_toffset);
   for (i = 0; i < data->vgroups.length (); i++)
{
  struct iv_use *use;
-- 
2.35.3

[PATCH 3/3] Less strip_offset in IVOPTs

2023-06-21 Thread Richard Biener via Gcc-patches

This avoids one strip_offset use in add_iv_candidate_for_use where
we know it operates on a sizetype quantity.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-loop-ivopts.cc (add_iv_candidate_for_use): Use
split_constant_offset for the POINTER_PLUS_EXPR case.
---
 gcc/tree-ssa-loop-ivopts.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 65caf382bba..03161813e0a 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -3588,8 +3588,9 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
   step = fold_convert (sizetype, step);
   record_common_cand (data, base, step, use);
   /* Also record common candidate with offset stripped.  */
-  tree alt_base = strip_offset (base);
-  if (alt_base != base)
+  tree alt_base, alt_offset;
+  split_constant_offset (base, &alt_base, &alt_offset);
+  if (!integer_zerop (alt_offset))
record_common_cand (data, alt_base, step, use);
 }
 
-- 
2.35.3

[PATCH]: [NFC] Move can_vec_mask_load_store_p and get_len_load_store_mode from "optabs-query" into "optabs-tree"

2023-06-21 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Since we want both can_vec_mask_load_store_p and get_len_load_store_mode
to be able to see "internal_fn", move these two functions into optabs-tree.

gcc/ChangeLog:

* optabs-query.cc (can_vec_mask_load_store_p): Move to optabs-tree.cc.
(get_len_load_store_mode): Ditto.
* optabs-query.h (can_vec_mask_load_store_p): Move to optabs-tree.h.
(get_len_load_store_mode): Ditto.
* optabs-tree.cc (can_vec_mask_load_store_p): New function.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-if-conv.cc: include optabs-tree instead of optabs-query

---
 gcc/optabs-query.cc | 66 -
 gcc/optabs-query.h  |  2 --
 gcc/optabs-tree.cc  | 65 
 gcc/optabs-tree.h   |  2 ++
 gcc/tree-if-conv.cc |  2 +-
 5 files changed, 68 insertions(+), 69 deletions(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 276f8408dd7..2fdd0d34354 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -558,72 +558,6 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
   return 0;
 }
 
-/* Return true if target supports vector masked load/store for mode.  */
-
-bool
-can_vec_mask_load_store_p (machine_mode mode,
-  machine_mode mask_mode,
-  bool is_load)
-{
-  optab op = is_load ? maskload_optab : maskstore_optab;
-  machine_mode vmode;
-
-  /* If mode is vector mode, check it directly.  */
-  if (VECTOR_MODE_P (mode))
-return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
-
-  /* Otherwise, return true if there is some vector mode with
- the mask load/store supported.  */
-
-  /* See if there is any chance the mask load or store might be
- vectorized.  If not, punt.  */
-  scalar_mode smode;
-  if (!is_a <scalar_mode> (mode, &smode))
-return false;
-
-  vmode = targetm.vectorize.preferred_simd_mode (smode);
-  if (VECTOR_MODE_P (vmode)
-  && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
-  && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
-return true;
-
-  auto_vector_modes vector_modes;
-  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
-  for (machine_mode base_mode : vector_modes)
-if (related_vector_mode (base_mode, smode).exists (&vmode)
-   && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
-   && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
-  return true;
-  return false;
-}
-
-/* If target supports vector load/store with length for vector mode MODE,
-   return the corresponding vector mode, otherwise return opt_machine_mode ().
-   There are two flavors for vector load/store with length, one is to measure
-   length with bytes, the other is to measure length with lanes.
-   As len_{load,store} optabs point out, for the flavor with bytes, we use
-   VnQI to wrap the other supportable same size vector modes.  */
-
-opt_machine_mode
-get_len_load_store_mode (machine_mode mode, bool is_load)
-{
-  optab op = is_load ? len_load_optab : len_store_optab;
-  gcc_assert (VECTOR_MODE_P (mode));
-
-  /* Check if length in lanes supported for this mode directly.  */
-  if (direct_optab_handler (op, mode))
-return mode;
-
-  /* Check if length in bytes supported for same vector size VnQI.  */
-  machine_mode vmode;
-  poly_uint64 nunits = GET_MODE_SIZE (mode);
-  if (related_vector_mode (mode, QImode, nunits).exists (&vmode)
-  && direct_optab_handler (op, vmode))
-return vmode;
-
-  return opt_machine_mode ();
-}
-
 /* Return true if there is a compare_and_swap pattern.  */
 
 bool
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index b266d2fe990..043e9791bc1 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -187,8 +187,6 @@ enum insn_code find_widening_optab_handler_and_mode (optab, machine_mode,
 machine_mode,
 machine_mode *);
 int can_mult_highpart_p (machine_mode, bool);
-bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
-opt_machine_mode get_len_load_store_mode (machine_mode, bool);
 bool can_compare_and_swap_p (machine_mode, bool);
 bool can_atomic_exchange_p (machine_mode, bool);
 bool can_atomic_load_p (machine_mode);
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 33ca529094d..77bf745ae40 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -543,3 +543,68 @@ target_supports_op_p (tree type, enum tree_code code,
  && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
 }
 
+/* Return true if target supports vector masked load/store for mode.  */
+
+bool
+can_vec_mask_load_store_p (machine_mode mode,
+  machine_mode mask_mode,
+  bool is_load)
+{
+  optab op = is_load ? masklo
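
The lookup order described in the comment on get_len_load_store_mode (first try the mode itself with the length in lanes, then fall back to a same-size byte vector with the length in bytes) can be modelled with a small standalone sketch. Everything here is an assumption made for illustration: ToyMode, SupportedModes and toy_len_load_store_mode are invented names, and the std::set stands in for the target's optab handlers.

```cpp
#include <cassert>
#include <set>

// Toy stand-in for a vector mode: element width in bytes and lane count.
struct ToyMode
{
  unsigned elem_bytes;
  unsigned lanes;

  bool operator< (const ToyMode &o) const
  {
    return elem_bytes != o.elem_bytes ? elem_bytes < o.elem_bytes
                                      : lanes < o.lanes;
  }
  bool operator== (const ToyMode &o) const
  {
    return elem_bytes == o.elem_bytes && lanes == o.lanes;
  }
};

// Hypothetical target description: modes that have a len_load/len_store
// pattern.
using SupportedModes = std::set<ToyMode>;

// Return true and set *RESULT if some usable mode exists for MODE.
static bool
toy_len_load_store_mode (const SupportedModes &supported, ToyMode mode,
                         ToyMode *result)
{
  // Length measured in lanes: the mode is supported directly.
  if (supported.count (mode))
    {
      *result = mode;
      return true;
    }

  // Length measured in bytes: same total size, one-byte elements (the
  // "VnQI" wrapping mentioned in the comment).
  ToyMode bytes = { 1, mode.elem_bytes * mode.lanes };
  if (supported.count (bytes))
    {
      *result = bytes;
      return true;
    }

  return false;
}
```

For a target that only supports the 16-byte byte vector, a query for four 4-byte lanes returns the byte vector, so the caller has to express the length in bytes rather than lanes.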

Re: [PATCH]: [NFC] Move can_vec_mask_load_store_p and get_len_load_store_mode from "optabs-query" into "optabs-tree"

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, 21 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Since we want both can_vec_mask_load_store_p and get_len_load_store_mode
> can see "internal_fn", move these 2 functions into optabs-tree.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * optabs-query.cc (can_vec_mask_load_store_p): Move to optabs-tree.cc.
> (get_len_load_store_mode): Ditto.
> * optabs-query.h (can_vec_mask_load_store_p): Move to optabs-tree.h.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.cc (can_vec_mask_load_store_p): New function.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-if-conv.cc: include optabs-tree instead of optabs-query
> 
> ---
>  gcc/optabs-query.cc | 66 -
>  gcc/optabs-query.h  |  2 --
>  gcc/optabs-tree.cc  | 65 
>  gcc/optabs-tree.h   |  2 ++
>  gcc/tree-if-conv.cc |  2 +-
>  5 files changed, 68 insertions(+), 69 deletions(-)
> 
> diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> index 276f8408dd7..2fdd0d34354 100644
> --- a/gcc/optabs-query.cc
> +++ b/gcc/optabs-query.cc
> @@ -558,72 +558,6 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
>return 0;
>  }
>  
> -/* Return true if target supports vector masked load/store for mode.  */
> -
> -bool
> -can_vec_mask_load_store_p (machine_mode mode,
> -machine_mode mask_mode,
> -bool is_load)
> -{
> -  optab op = is_load ? maskload_optab : maskstore_optab;
> -  machine_mode vmode;
> -
> -  /* If mode is vector mode, check it directly.  */
> -  if (VECTOR_MODE_P (mode))
> -return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
> -
> -  /* Otherwise, return true if there is some vector mode with
> - the mask load/store supported.  */
> -
> -  /* See if there is any chance the mask load or store might be
> - vectorized.  If not, punt.  */
> -  scalar_mode smode;
> -  if (!is_a <scalar_mode> (mode, &smode))
> -return false;
> -
> -  vmode = targetm.vectorize.preferred_simd_mode (smode);
> -  if (VECTOR_MODE_P (vmode)
> -  && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> -  && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
> -return true;
> -
> -  auto_vector_modes vector_modes;
> -  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
> -  for (machine_mode base_mode : vector_modes)
> -if (related_vector_mode (base_mode, smode).exists (&vmode)
> - && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> - && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
> -  return true;
> -  return false;
> -}
> -
> -/* If target supports vector load/store with length for vector mode MODE,
> -   return the corresponding vector mode, otherwise return opt_machine_mode 
> ().
> -   There are two flavors for vector load/store with length, one is to measure
> -   length with bytes, the other is to measure length with lanes.
> -   As len_{load,store} optabs point out, for the flavor with bytes, we use
> -   VnQI to wrap the other supportable same size vector modes.  */
> -
> -opt_machine_mode
> -get_len_load_store_mode (machine_mode mode, bool is_load)
> -{
> -  optab op = is_load ? len_load_optab : len_store_optab;
> -  gcc_assert (VECTOR_MODE_P (mode));
> -
> -  /* Check if length in lanes supported for this mode directly.  */
> -  if (direct_optab_handler (op, mode))
> -return mode;
> -
> -  /* Check if length in bytes supported for same vector size VnQI.  */
> -  machine_mode vmode;
> -  poly_uint64 nunits = GET_MODE_SIZE (mode);
> -  if (related_vector_mode (mode, QImode, nunits).exists (&vmode)
> -  && direct_optab_handler (op, vmode))
> -return vmode;
> -
> -  return opt_machine_mode ();
> -}
> -
>  /* Return true if there is a compare_and_swap pattern.  */
>  
>  bool
> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index b266d2fe990..043e9791bc1 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -187,8 +187,6 @@ enum insn_code find_widening_optab_handler_and_mode 
> (optab, machine_mode,
>machine_mode,
>machine_mode *);
>  int can_mult_highpart_p (machine_mode, bool);
> -bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
> -opt_machine_mode get_len_load_store_mode (machine_mode, bool);
>  bool can_compare_and_swap_p (machine_mode, bool);
>  bool can_atomic_exchange_p (machine_mode, bool);
>  bool can_atomic_load_p (machine_mode);
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 33ca529094d..77bf745ae40 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -543,3 +543,68 @@ target_supports_op_p (tree type, enum tree_code code,
> && optab_handler

Re: [PATCH 1/3] Hide and refactor IVOPTs strip_offset

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, 21 Jun 2023, Richard Biener wrote:

> PR110243 shows strip_offset has some correctness issues, the following
> avoids using it from loop distribution which can use the more correct
> split_constant_offset from data-ref analysis instead.  The patch then
> un-exports the function and refactors it to make it obvious the
> actual constant offset is only interesting in address-cases.
> 
> The series is split, removing one strip_offset at a time for easier
> bisecting of code generation quality issues.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> 
>   * tree-loop-distribution.cc (classify_builtin_st): Use
>   split_constant_offset.
>   * tree-ssa-loop-ivopts.h (strip_offset): Remove.
>   * tree-ssa-loop-ivopts.cc (strip_offset): Make static.
>   Refactor to make the offset operand optional and assert
>   we are dealing with addresses if its required.

Just noticed this refactoring is a left-over and I edited it out of
this series, will push after re-bootstrapping.

Richard.

> ---
>  gcc/tree-loop-distribution.cc |  9 +
>  gcc/tree-ssa-loop-ivopts.cc   | 22 ++
>  gcc/tree-ssa-loop-ivopts.h|  1 -
>  3 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> index 6291f941a21..cf7c197aaf7 100644
> --- a/gcc/tree-loop-distribution.cc
> +++ b/gcc/tree-loop-distribution.cc
> @@ -1756,11 +1756,12 @@ classify_builtin_st (loop_p loop, partition 
> *partition, data_reference_p dr)
>return;
>  }
>  
> -  poly_uint64 base_offset;
> -  unsigned HOST_WIDE_INT const_base_offset;
> -  tree base_base = strip_offset (base, &base_offset);
> -  if (!base_offset.is_constant (&const_base_offset))
> +  tree base_offset;
> +  tree base_base;
> +  split_constant_offset (base, &base_base, &base_offset);
> +  if (!cst_and_fits_in_hwi (base_offset))
>  return;
> +  unsigned HOST_WIDE_INT const_base_offset = int_cst_value (base_offset);
>  
>struct builtin_info *builtin;
>builtin = alloc_builtin (dr, NULL, base, NULL_TREE, size);
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 6fbd2d59318..7978c80b39e 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -1175,6 +1175,9 @@ contain_complex_addr_expr (tree expr)
>return res;
>  }
>  
> +static tree
> +strip_offset (tree expr, poly_uint64_pod *offset = nullptr);
> +
>  /* Allocates an induction variable with given initial value BASE and step 
> STEP
> for loop LOOP.  NO_OVERFLOW implies the iv doesn't overflow.  */
>  
> @@ -2942,12 +2945,16 @@ strip_offset_1 (tree expr, bool inside_addr, bool 
> top_compref,
>  
>  /* Strips constant offsets from EXPR and stores them to OFFSET.  */
>  
> -tree
> +static tree
>  strip_offset (tree expr, poly_uint64_pod *offset)
>  {
>poly_int64 off;
>tree core = strip_offset_1 (expr, false, false, &off);
> -  *offset = off;
> +  if (offset)
> +{
> +  gcc_assert (POINTER_TYPE_P (expr));
> +  *offset = off;
> +}
>return core;
>  }
>  
> @@ -3512,7 +3519,6 @@ add_iv_candidate_derived_from_uses (struct ivopts_data 
> *data)
>  static void
>  add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
>  {
> -  poly_uint64 offset;
>tree base;
>struct iv *iv = use->iv;
>tree basetype = TREE_TYPE (iv->base);
> @@ -3563,8 +3569,8 @@ add_iv_candidate_for_use (struct ivopts_data *data, 
> struct iv_use *use)
>  
>/* Record common candidate with constant offset stripped in base.
>   Like the use itself, we also add candidate directly for it.  */
> -  base = strip_offset (iv->base, &offset);
> -  if (maybe_ne (offset, 0U) || base != iv->base)
> +  base = strip_offset (iv->base);
> +  if (base != iv->base)
>  {
>record_common_cand (data, base, iv->step, use);
>add_candidate (data, base, iv->step, false, use);
> @@ -3582,9 +3588,9 @@ add_iv_candidate_for_use (struct ivopts_data *data, 
> struct iv_use *use)
>step = fold_convert (sizetype, step);
>record_common_cand (data, base, step, use);
>/* Also record common candidate with offset stripped.  */
> -  base = strip_offset (base, &offset);
> -  if (maybe_ne (offset, 0U))
> - record_common_cand (data, base, step, use);
> +  tree alt_base = strip_offset (base);
> +  if (alt_base != base)
> + record_common_cand (data, alt_base, step, use);
>  }
>  
>/* At last, add auto-incremental candidates.  Make such variables
> diff --git a/gcc/tree-ssa-loop-ivopts.h b/gcc/tree-ssa-loop-ivopts.h
> index 95148616e70..7a53ce47f10 100644
> --- a/gcc/tree-ssa-loop-ivopts.h
> +++ b/gcc/tree-ssa-loop-ivopts.h
> @@ -28,7 +28,6 @@ extern void dump_cand (FILE *, struct iv_cand *);
>  extern bool contains_abnormal_ssa_name_p (tree);
>  extern class loop *outermost_invariant_loop_for_expr (class loop *, tree);
>  extern bool expr_invariant_in_loop_p (class loop *,

[committed] [gcc-12] libstdc++: avoid bogus -Wrestrict [PR105651]

2023-06-21 Thread Jonathan Wakely via Gcc-patches

I've pushed Jason's patch from https://gcc.gnu.org/PR105651#c17 to the
gcc-12 branch, because Jakub's fix on gcc-13 isn't possible to backport.

Tested x86_64-linux, pushed to gcc-12.

-- >8 --

PR tree-optimization/105651

libstdc++-v3/ChangeLog:

* include/bits/basic_string.tcc (_M_replace): Add an assert
to avoid -Wrestrict false positive.
---
 libstdc++-v3/include/bits/basic_string.tcc | 4 
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index 0696b96604c..48fa28e6466 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -529,6 +529,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  const size_type __nleft = (__p + __len1) - __s;
  this->_S_move(__p, __s, __nleft);
+ // Tell the middle-end that the copy can't overlap
+ // (PR105651).
+ if (__len2 < __nleft)
+   __builtin_unreachable();
  this->_S_copy(__p + __nleft, __p + __len2,
__len2 - __nleft);
}
-- 
2.40.1
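
The technique in the patch above, stating an invariant to the optimizer with __builtin_unreachable, can be shown in isolation. This is a minimal sketch rather than the libstdc++ code: copy_tail is a hypothetical helper, and it assumes a compiler that provides the GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy LEN2 - NLEFT bytes from P + LEN2 down to P + NLEFT.
// Precondition: LEN2 >= NLEFT.  Under that precondition the destination
// ends exactly where the source begins, so the two ranges cannot overlap.
static void
copy_tail (char *p, std::size_t len2, std::size_t nleft)
{
  // Tell the compiler the precondition holds.  Reaching this call would be
  // undefined behaviour, so the optimizer may assume len2 >= nleft below.
  if (len2 < nleft)
    __builtin_unreachable ();
  std::memcpy (p + nleft, p + len2, len2 - nleft);
}
```

The hint adds no run-time check; it only gives the middle-end enough range information to see that the length is not a huge wrapped-around value, which is what the -Wrestrict false positive in PR105651 was about.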

RE: [PATCH]: [NFC] Move can_vec_mask_load_store_p and get_len_load_store_mode from "optabs-query" into "optabs-tree"

2023-06-21 Thread Li, Pan2 via Gcc-patches

Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Wednesday, June 21, 2023 7:42 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH]: [NFC] Move can_vec_mask_load_store_p and 
get_len_load_store_mode from "optabs-query" into "optabs-tree"

On Wed, 21 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Since we want both can_vec_mask_load_store_p and get_len_load_store_mode
> can see "internal_fn", move these 2 functions into optabs-tree.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * optabs-query.cc (can_vec_mask_load_store_p): Move to optabs-tree.cc.
> (get_len_load_store_mode): Ditto.
> * optabs-query.h (can_vec_mask_load_store_p): Move to optabs-tree.h.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.cc (can_vec_mask_load_store_p): New function.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-if-conv.cc: include optabs-tree instead of optabs-query
> 
> ---
>  gcc/optabs-query.cc | 66 -
>  gcc/optabs-query.h  |  2 --
>  gcc/optabs-tree.cc  | 65 
>  gcc/optabs-tree.h   |  2 ++
>  gcc/tree-if-conv.cc |  2 +-
>  5 files changed, 68 insertions(+), 69 deletions(-)
> 
> diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> index 276f8408dd7..2fdd0d34354 100644
> --- a/gcc/optabs-query.cc
> +++ b/gcc/optabs-query.cc
> @@ -558,72 +558,6 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
>return 0;
>  }
>  
> -/* Return true if target supports vector masked load/store for mode.  */
> -
> -bool
> -can_vec_mask_load_store_p (machine_mode mode,
> -machine_mode mask_mode,
> -bool is_load)
> -{
> -  optab op = is_load ? maskload_optab : maskstore_optab;
> -  machine_mode vmode;
> -
> -  /* If mode is vector mode, check it directly.  */
> -  if (VECTOR_MODE_P (mode))
> -return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
> -
> -  /* Otherwise, return true if there is some vector mode with
> - the mask load/store supported.  */
> -
> -  /* See if there is any chance the mask load or store might be
> - vectorized.  If not, punt.  */
> -  scalar_mode smode;
> -  if (!is_a <scalar_mode> (mode, &smode))
> -return false;
> -
> -  vmode = targetm.vectorize.preferred_simd_mode (smode);
> -  if (VECTOR_MODE_P (vmode)
> -  && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> -  && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
> -return true;
> -
> -  auto_vector_modes vector_modes;
> -  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
> -  for (machine_mode base_mode : vector_modes)
> -if (related_vector_mode (base_mode, smode).exists (&vmode)
> - && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> - && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
> -  return true;
> -  return false;
> -}
> -
> -/* If target supports vector load/store with length for vector mode MODE,
> -   return the corresponding vector mode, otherwise return opt_machine_mode 
> ().
> -   There are two flavors for vector load/store with length, one is to measure
> -   length with bytes, the other is to measure length with lanes.
> -   As len_{load,store} optabs point out, for the flavor with bytes, we use
> -   VnQI to wrap the other supportable same size vector modes.  */
> -
> -opt_machine_mode
> -get_len_load_store_mode (machine_mode mode, bool is_load)
> -{
> -  optab op = is_load ? len_load_optab : len_store_optab;
> -  gcc_assert (VECTOR_MODE_P (mode));
> -
> -  /* Check if length in lanes supported for this mode directly.  */
> -  if (direct_optab_handler (op, mode))
> -return mode;
> -
> -  /* Check if length in bytes supported for same vector size VnQI.  */
> -  machine_mode vmode;
> -  poly_uint64 nunits = GET_MODE_SIZE (mode);
> -  if (related_vector_mode (mode, QImode, nunits).exists (&vmode)
> -  && direct_optab_handler (op, vmode))
> -return vmode;
> -
> -  return opt_machine_mode ();
> -}
> -
>  /* Return true if there is a compare_and_swap pattern.  */
>  
>  bool
> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index b266d2fe990..043e9791bc1 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -187,8 +187,6 @@ enum insn_code find_widening_optab_handler_and_mode 
> (optab, machine_mode,
>machine_mode,
>machine_mode *);
>  int can_mult_highpart_p (machine_mode, bool);
> -bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
> -opt_machine_mode get_len_load_store_mode (machine_mode, bool);
>  bool can_compare_and_swap_p (machine_mode, bool);
>  bool ca

Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-21 Thread Jan Beulich via Gcc-patches

On 21.06.2023 09:44, Jan Beulich wrote:
> On 21.06.2023 09:37, Hongtao Liu wrote:
>> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
>>  wrote:
>>>
>>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
>> According to comments, yes, no extra prefix is needed.
>>
>> ;; There are also additional prefixes in 3DNOW, SSSE3.
>> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
>> ;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
>> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
> 
> Right, that's what triggered my question. I guess dropping these
> "prefix_extra" really wants to be a separate patch (or maybe even
> multiple, but it's hard to see how to split), dealing with all of the
> instances which likely have accumulated simply via copy-and-paste.

Or wait - I'm altering those lines anyway, so I could as well drop
them right away (and slightly shrink patch size), if that's okay with
you. Of course I should then not forget to also mention this in the
changelog entry.

Jan

[PATCH][committed] aarch64: Avoid same input and output Z register for gather loads

2023-06-21 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

The architecture recommends that load-gather instructions avoid using the same
Z register for the load address and the destination, and the Software
Optimization Guides for Arm cores recommend that as well.
This means that for code like:
#include <arm_sve.h>

svuint64_t
food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
{
  return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets));
}

we'll want to avoid generating the current:
food:
        ld1d    z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
        add     z0.d, z1.d, z0.d
        ret

However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.

This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* tying the input Z register operand in the original alternatives to 0
* penalising the original alternatives with '?'

This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable and regular change.

The benchmark numbers on a Neoverse V1 on fprate look okay:
                  diff
503.bwaves_r      0.00%
507.cactuBSSN_r   0.00%
508.namd_r        0.00%
510.parest_r      0.55%
511.povray_r      0.22%
519.lbm_r         0.00%
521.wrf_r         0.00%
526.blender_r     0.00%
527.cam4_r        0.56%
538.imagick_r     0.00%
544.nab_r         0.00%
549.fotonik3d_r   0.00%
554.roms_r        0.00%
fprate            0.10%

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

P.S. I had messed up my previous commit of 
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622456.html
by squashing the config/aarch64 changes with this patch.
I have reverted that one commit and reapplied it properly (as it should have 
been a no-op) and am pushing this commit on top of that.
Sorry for the churn in the repo.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md 
(mask_gather_load):
Add alternatives to prefer to avoid same input and output Z register.
(mask_gather_load): Likewise.
(*mask_gather_load_xtw_unpacked): Likewise.
(*mask_gather_load_sxtw): Likewise.
(*mask_gather_load_uxtw): Likewise.
(@aarch64_gather_load_):
Likewise.

(@aarch64_gather_load_):
Likewise.
(*aarch64_gather_load_
_xtw_unpacked): Likewise.
(*aarch64_gather_load_
_sxtw): Likewise.
(*aarch64_gather_load_
_uxtw): Likewise.
(@aarch64_ldff1_gather): Likewise.
(@aarch64_ldff1_gather): Likewise.
(*aarch64_ldff1_gather_sxtw): Likewise.
(*aarch64_ldff1_gather_uxtw): Likewise.
(@aarch64_ldff1_gather_
): Likewise.
(@aarch64_ldff1_gather_
): Likewise.
(*aarch64_ldff1_gather_
_sxtw): Likewise.
(*aarch64_ldff1_gather_
_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt): Likewise.
(@aarch64_gather_ldnt_
): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/gather_earlyclobber.c: New test.
* gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.


gather-earlyclobber.patch
Description: gather-earlyclobber.patch

Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread Robin Dapp via Gcc-patches

Hi Juzhe,

LGTM apart from a tiny nit:

> +  /* We have a maximum of 11 operands for RVV instruction patterns according to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,

Seems like you copied this from the non-fp ternary part but the
rest of the file uses RVV_INSN_OPERANDS_MAX already.  Could you adjust this
as well as emit_vlmax_ternary_insn?

Thanks.

Regards
 Robin

[PATCH v2] RISC-V: Implement autovec copysign.

2023-06-21 Thread Robin Dapp via Gcc-patches

Hi,

changes from v1:
 - Removed UNSPEC_VNCOPYSIGN
 - Adjusted xorsign test expectation.

Regards
 Robin



This adds vector copysign, ncopysign and xorsign as well as the
accompanying tests.
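
For readers less familiar with the three operations, the following scalar C sketch models them on the bit level. It assumes IEEE binary64 `double`; the `model_*` names and helpers are hypothetical and are not taken from the patch. The vector instructions named in the comments apply the same idea per element.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C (1) << 63)

/* Reinterpret a double as its bit pattern and back (assumes binary64).  */
static uint64_t
bits_of (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static double
double_of (uint64_t u)
{
  double x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* copysign: magnitude of A, sign of B (what vfsgnj does per element).  */
double
model_copysign (double a, double b)
{
  return double_of ((bits_of (a) & ~SIGN_BIT) | (bits_of (b) & SIGN_BIT));
}

/* ncopysign: magnitude of A, inverted sign of B (vfsgnjn).  */
double
model_ncopysign (double a, double b)
{
  return double_of ((bits_of (a) & ~SIGN_BIT) | (~bits_of (b) & SIGN_BIT));
}

/* xorsign: A with its sign flipped when B is negative (vfsgnjx).  */
double
model_xorsign (double a, double b)
{
  return double_of (bits_of (a) ^ (bits_of (b) & SIGN_BIT));
}
```

For example, `model_xorsign (-3.0, -2.0)` yields `3.0` because the two sign bits cancel.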

gcc/ChangeLog:

* config/riscv/autovec.md (copysign3): Add expander.
(xorsign3): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc (class vfsgnjn):
New class.
* config/riscv/vector-iterators.md (copysign): Remove ncopysign.
(xorsign): Ditto.
(n): Ditto.
(x): Ditto.
* config/riscv/vector.md (@pred_ncopysign): Split off.
(@pred_ncopysign_scalar): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/copysign-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-template.h: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 43 +
 .../riscv/riscv-vector-builtins-bases.cc  | 18 +++-
 gcc/config/riscv/vector-iterators.md  | 10 +--
 gcc/config/riscv/vector.md| 43 +
 .../riscv/rvv/autovec/binop/copysign-run.c| 89 +++
 .../rvv/autovec/binop/copysign-rv32gcv.c  | 11 +++
 .../rvv/autovec/binop/copysign-rv64gcv.c  | 11 +++
 .../rvv/autovec/binop/copysign-template.h | 78 
 .../rvv/autovec/binop/copysign-zvfh-run.c | 83 +
 9 files changed, 377 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/copysign-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/copysign-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/copysign-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/copysign-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/copysign-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1641d7e1ea..f2e69aaf102 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -804,3 +804,46 @@ (define_expand "3"
 riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
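+;; -------------------------------------------------------------------------
+;;  [FP] Sign copying
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfsgnj.vv/vfsgnjn.vv
+;; - vfsgnj.vf/vfsgnjn.vf
+;; -------------------------------------------------------------------------
+
+;; Leave the pattern like this as to still allow combine to match
+;; a negated copysign (see vector.md) before adding the UNSPEC_VPREDICATE later.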
+(define_insn_and_split "copysign3"
+  [(set (match_operand:VF 0 "register_operand"  "=vd, vd, vr, vr")
+(unspec:VF
+ [(match_operand:VF 1 "register_operand"" vr, vr, vr, vr")
+ (match_operand:VF 2 "register_operand" " vr, vr, vr, vr")] UNSPEC_VCOPYSIGN))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  riscv_vector::emit_vlmax_insn (code_for_pred (UNSPEC_VCOPYSIGN, mode),
+riscv_vector::RVV_BINOP, operands);
+  DONE;
+}
+  [(set_attr "type" "vfsgnj")
+   (set_attr "mode" "")])
+
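+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfsgnjx.vv
+;; - vfsgnjx.vf
+;; -------------------------------------------------------------------------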
+(define_expand "xorsign3"
+  [(match_operand:VF_AUTO 0 "register_operand")
+(match_operand:VF_AUTO 1 "register_operand")
+(match_operand:VF_AUTO 2 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_vlmax_insn (code_for_pred (UNSPEC_VXORSIGN, mode),
+riscv_vector::RVV_BINOP, operands);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index c6c53dc13a5..6b8fec47ac9 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1212,7 +1212,7 @@ public:
   }
 };
 
-/* Implements vfsqrt7/vfrec7/vfclass/vfsgnj/vfsgnjn/vfsgnjx.  */
+/* Implements vfsqrt7/vfrec7/vfclass/vfsgnj/vfsgnjx.  */
 template
 class float_misc : public function_base
 {
@@ -1227,6 +1227,20 @@ public:
   }
 };
 
+/* Implements vfsgnjn.  */
+class vfsgnjn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_exact_insn (code_for_pred_ncopysign_scalar (e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_exact_insn (code_for_pred_ncopysign (e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
 /* Implements vmfeq/v

Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-21 Thread Jeff Law via Gcc-patches





On 6/21/23 00:41, Richard Biener wrote:
>> I thought during the introduction of erroneous path isolation that we
>> concluded stores, calls and such had observable side effects that must be
>> preserved, even when we hit a block that leads to __builtin_unreachable.
>
> Indeed, I remember we repeatedly hit this in the past.  But
> double-checking I see that we instrument
>
>    if (x)
>      *(int *)0 = 0;
>
> as
>
> [local count: 1073741824]:
>    if (x_2(D) != 0)
>      goto ; [50.00%]
>    else
>      goto ; [50.00%]
>
> [local count: 536870913]:
>    MEM[(int *)0B] ={v} 0;
>    __builtin_trap ();
>
> path isolation doesn't seem to use __builtin_unreachable ().  I did
> not add __builtin_trap () as possible sink (but I did want to treat
> __builtin_unreachable () and __builtin_unreachable_trap () the same
> way).  The pass also marks the offending store as volatile.

Nuts.  I mixed up trap vs unreachable in my own head.  Though I think
for the purposes of this issue they should be treated the same.  The
only difference is one actively halts the code, the other silently lets
it keep going.

> So yes, I think preserving the original trap kind (if there is any)
> is important and it still seems to work.  I don't remember whether
> we have any test coverage for that though.  I'll also note that
> __builtin_trap () has virtual operands (def and use) while
> __builtin_unreachable[_trap] () are 'const'.  Honza correctly
> says they should probably be novops instead of 'const' preserving
> the fact that they have side-effects.

If we have test coverage it's probably minimal -- a few things to
validate proper behavior around builtin_trap plus whatever regressions
came up.

> I think it's desirable for assertions.  Since we elide plain
> __builtin_unreachable () and fall thru wherever it leads that
> shouldn't be an issue.
>
> If I manually add a __builtin_unreachable () to the above case
> I see the *(int *)0 = 0; store DSEd.  Maybe we should avoid
> removing stores that might trap here?  POSIX wise such a trap
> could be a way to jump out of the path leading to unreachable ()
> via siglongjmp ...

Yea, removing the store seems wrong.  As you note, someone could have a
handler to catch the store, then longjump elsewhere to continue some
kind of sensible execution.
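
A minimal sketch of the scenario described above, assuming a POSIX system where a store through a null pointer raises SIGSEGV. The names `store_then_recover`, `on_segv` and `bad_pointer` are hypothetical. The `__builtin_unreachable ()` that would follow the store in the case under discussion is deliberately left out, since with it the behavior would depend on exactly the optimization being debated.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover_point;

/* Read through a volatile pointer object so the compiler cannot assume
   anything about the address being stored to.  */
static int *volatile bad_pointer = NULL;

static void
on_segv (int sig)
{
  (void) sig;
  /* Leave the faulting path and resume at the sigsetjmp below.  */
  siglongjmp (recover_point, 1);
}

/* Returns 1 if the faulting store was caught and execution resumed,
   0 if the handler could not be installed or the store did not fault.  */
int
store_then_recover (void)
{
  struct sigaction sa, old;
  volatile int recovered = 0;

  sa.sa_handler = on_segv;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction (SIGSEGV, &sa, &old) != 0)
    return 0;

  if (sigsetjmp (recover_point, 1) == 0)
    *bad_pointer = 0; /* The store that traps; removing it changes behavior.  */
  else
    recovered = 1;    /* Reached only via siglongjmp from the handler.  */

  sigaction (SIGSEGV, &old, NULL);
  return recovered;
}
```

If the store were deleted as dead, the handler would never run and the function would return 0 instead of 1.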


The erroneous path isolation bits have some code to try and clean up the
bogus path a bit, ideally leaving a single statement with undefined
behavior and the trap.  I wonder if there's any code you could re-use
from that effort.

Essentially a mini pass that cleans up paths post-dominated by a
builtin_unreachable or builtin_trap.




jeff

Re: [PATCH v2] RISC-V: Implement autovec copysign.

2023-06-21 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-21 20:57
To: juzhe.zh...@rivai.ai; gcc-patches; palmer; kito.cheng; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH v2] RISC-V: Implement autovec copysign.

[PATCH V2] RISC-V: Support RVV floating-point auto-vectorization

2023-06-21 Thread Juzhe-Zhong

This patch adds RVV floating-point auto-vectorization.
It also fixes an attribute bug in the floating-point ternary operations in vector.md.
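
To illustrate the four ternary forms named in the ChangeLog, here are scalar C loops of the kind an auto-vectorizer could map to them. The function names are hypothetical and the loops are not copied from the ternop tests. The arithmetic follows GCC's standard pattern names (fma: a*b+c, fms: a*b-c, fnma: -(a*b)+c, fnms: -(a*b)-c); whether a given loop is actually vectorized or contracted depends on the compiler flags in use.

```c
#include <stddef.h>

/* fma: dst = a * b + c  */
void
fma_loop (float *dst, const float *a, const float *b, const float *c,
          size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] * b[i] + c[i];
}

/* fms: dst = a * b - c  */
void
fms_loop (float *dst, const float *a, const float *b, const float *c,
          size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] * b[i] - c[i];
}

/* fnma: dst = -(a * b) + c  */
void
fnma_loop (float *dst, const float *a, const float *b, const float *c,
           size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = -(a[i] * b[i]) + c[i];
}

/* fnms: dst = -(a * b) - c  */
void
fnms_loop (float *dst, const float *a, const float *b, const float *c,
           size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = -(a[i] * b[i]) - c[i];
}
```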

gcc/ChangeLog:

* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
(fnma4): Ditto.
(*fnma): Ditto.
(fms4): Ditto.
(*fms): Ditto.
(fnms4): Ditto.
(*fnms): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
* config/riscv/vector.md: Fix attribute bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Add floating-point ternary tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.

---
 gcc/config/riscv/autovec.md   | 180 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  49 +++--
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/ternop/ternop-1.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-10.c  |  23 +++
 .../riscv/rvv/autovec/ternop/ternop-11.c  |  29 +++
 .../riscv/rvv/autovec/ternop/ternop-12.c  |  28 +++
 .../riscv/rvv/autovec/ternop/ternop-2.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-3.c   |   9 +-
 .../riscv/rvv/autovec/ternop/ternop-4.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-5.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-6.c   |   9 +-
 .../riscv/rvv/autovec/ternop/ternop-7.c   |  23 +++
 .../riscv/rvv/autovec/ternop/ternop-8.c   |  29 +++
 .../riscv/rvv/autovec/ternop/ternop-9.c   |  28 +++
 .../riscv/rvv/autovec/ternop/ternop_run-1.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-10.c  |  40 
 .../riscv/rvv/autovec/ternop/ternop_run-11.c  |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-12.c  |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-2.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-3.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-4.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-5.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-6.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-7.c   |  40 
 .../riscv/rvv/autovec/ternop/ternop_run-8.c   |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-9.c   |  60 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-1.c|  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-10.c   |  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-11.c   |  55 ++

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Uros Bizjak via Gcc-patches

On Wed, Jun 21, 2023 at 10:22 AM Richard Biener
 wrote:

> > > +  /* For conversions between float and smaller integer types try whether we
> > > +can use intermediate signed integer types to support the
> > > +conversion.  */
> >
> > I'm trying to enhance testcase coverage with explicit signed/unsigned
> > types (patch attached), and I have noticed that zero-extension is used
> > for unsigned types. So, the above comment that mentions only signed
> > integer types is not entirely correct.
>
> The comment says the intermediate sized vector types are always
> signed (because float conversions to/from unsigned are always somewhat
> awkward), but yes, if the original type was unsigned zero-extension is
> used and if it was signed sign-extension.
>
> The testcase adjustments / additions look good to me btw.
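
The point about extension can be sketched in scalar C as follows. This is only an illustration under the assumption that the intermediate type is a signed 32-bit integer; the function names are hypothetical and the code is not taken from pr110018-1.c or pr110018-2.c. The intermediate is signed in every case; only the widening step differs with the signedness of the narrow type.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned narrow source: zero-extend to the signed intermediate,
   then do a signed int -> float conversion.  */
void
u8_to_float (float *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int32_t wide = src[i]; /* Zero-extension: 255 stays 255.  */
      dst[i] = (float) wide;
    }
}

/* Signed narrow source: sign-extend to the same intermediate type.  */
void
s8_to_float (float *dst, const int8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int32_t wide = src[i]; /* Sign-extension: -1 stays -1.  */
      dst[i] = (float) wide;
    }
}

/* Opposite direction: float -> signed intermediate -> narrow truncation.
   Assumes the input values fit in the destination type.  */
void
float_to_u8 (uint8_t *dst, const float *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int32_t wide = (int32_t) src[i];
      dst[i] = (uint8_t) wide;
    }
}
```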

Thanks, pushed with the following ChangeLog:

vect: Add testcases for unsigned conversions [PR110018]

Also test conversions with unsigned types.

PR target/110018

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110018-1.c: Use explicit signed types.
* gcc.target/i386/pr110018-2.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.

Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread Jeff Law via Gcc-patches





On 6/21/23 05:12, Juzhe-Zhong wrote:

This patch adds RVV floating-point auto-vectorization.
Also, fix attribute bug of floating-point ternary operations in vector.md.

gcc/ChangeLog:

 * config/riscv/autovec.md (fma4): New pattern.
 (*fma): Ditto.
 (fnma4): Ditto.
 (*fnma): Ditto.
 (fms4): Ditto.
 (*fms): Ditto.
 (fnms4): Ditto.
 (*fnms): Ditto.
 * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New 
function.
 * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
 * config/riscv/vector.md: Fix attribute bug.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Add floating-point 
teranary tests.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test.
 * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.

---




+
+(define_insn_and_split "*fma"
+  [(set (match_operand:VF_AUTO 0 "register_operand"   "=vr, vr, ?&vr")
+   (fma:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand" " %0, vr,   vr")
+ (match_operand:VF_AUTO 2 "register_operand" " vr, vr,   vr")
+ (match_operand:VF_AUTO 3 "register_operand" " vr,  0,   vr")))
+   (clobber (match_scratch:SI 4 "=r,r,r"))]
+  "TARGET_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+PUT_MODE (operands[4], Pmode);

Maybe this has already been answered, but why not get the mode right in
the expander & pattern as opposed to blindly changing it in the C fragment?

It's probably technically safe to do what you've done, mostly because by
the time the C code runs, we've turned the scratch into a hard register
and I suspect the code to allocate scratches probably constructs a new
hard reg for each scratch rather than using a shared object.

But if we can, let's get the mode right from the beginning.

I'll note the existing define_insn_and_splits for integer FMAs have this
same wart.



jeff

Re: Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread 钟居哲

I failed to give the operand Pmode.

I have tried the following:

(clobber (match_dup 4))

But it causes too many issues.  After many attempts, it turns out that only the
current solution works.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-21 23:15
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support RVV floating-point ternary 
auto-vectorization
 
 

Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread Jeff Law via Gcc-patches





On 6/21/23 09:20, 钟居哲 wrote:

> I failed to make Pmode of the of operand.
>
> I have tried the following
>
> clobber (match_dup_4)
> But it causes to many issues. I do many tries turns out only the current
> solution can work.

Can you describe more concretely what failed?

Offhand I can't think of a reason why this shouldn't work and what 
you're doing seems long term fragile/wrong.


jeff

Re: Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread 钟居哲

I have tried:
(define_expand "fms4"
  [(parallel
    [(set (match_operand:VF_AUTO 0 "register_operand")
          (fma:VF_AUTO
            (match_operand:VF_AUTO 1 "register_operand")
            (match_operand:VF_AUTO 2 "register_operand")
            (neg:VF_AUTO
              (match_operand:VF_AUTO 3 "register_operand"))))
     (clobber (match_dup 4))])]
  "TARGET_VECTOR"
  {
    operands[4] = gen_reg_rtx (Pmode);
  })

(define_insn_and_split "*fms"
  [(set (match_operand:VF_AUTO 0 "register_operand" "=vr, vr, ?&vr")
        (fma:VF_AUTO
          (match_operand:VF_AUTO 1 "register_operand"   " %0, vr,   vr")
          (match_operand:VF_AUTO 2 "register_operand"   " vr, vr,   vr")
          (neg:VF_AUTO
            (match_operand:VF_AUTO 3 "register_operand" " vr,  0,   vr"))))
   (clobber (match_operand:P 4 "=r,r,r"))]
  "TARGET_VECTOR"
  "#"
  "&& reload_completed"
  [(const_int 0)]
  {
    riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
    if (which_alternative == 2)
      emit_insn (gen_rtx_SET (operands[0], operands[3]));
    rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
    riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, mode),
                                              riscv_vector::RVV_TERNOP, ops, operands[4]);
    DONE;
  }
  [(set_attr "type" "vfmuladd")
   (set_attr "mode" "")])

It fails with an ICE.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-21 23:21
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support RVV floating-point ternary 
auto-vectorization
 
 

Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-06-21 Thread Jeff Law via Gcc-patches





On 6/21/23 01:49, Richard Biener via Gcc-patches wrote:

The following addresses a miscompilation by RTL scheduling related
to the representation of masked stores.  For that we have

(insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
 (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  )
 (const_int -4 [0xfffc] [1 MEM  [(int *)vectp_b.12_28]+0 S64 A32])
 (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
 (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
 (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  
)
 (const_int -4 [0xfffc] [1 MEM 
 [(int *)vectp_b.12_28]+0 S64 A32])

and specifically the memory attributes

   [1 MEM  [(int *)vectp_b.12_28]+0 S64 A32]

are problematic.  They tell us the instruction stores and reads a full
vector, which it of course does not.  There isn't any good MEM_EXPR
we can use here (we lack a way to just specify a pointer and restrict
info, for example), and since the MEMs have a vector mode it's
difficult in general, as passes do not need to look at the memory
attributes at all.
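
To make the mismatch concrete, here is a scalar C model of a 16-lane masked store. It is only a sketch: `masked_store_model` is a hypothetical name and the code is not taken from the testcase. The memory attributes above claim a 64-byte access (S64), but only the lanes whose mask bit is set are written.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a V16SI masked store: lanes whose mask bit is set take the
   value from SRC, all other lanes keep the old contents of DST.  */
void
masked_store_model (int32_t *dst, const int32_t *src, uint16_t mask)
{
  for (size_t i = 0; i < 16; i++)
    if ((mask >> i) & 1u)
      dst[i] = src[i];
}
```

With a mask of 0x0007 only the first 12 bytes are modified, so an alias query that assumes all 64 bytes are written can reach the wrong conclusion about a neighbouring object.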

The easiest way to avoid running into the alias analysis problem is
to scrap the MEM_EXPR when we expand the internal functions for
partial loads/stores.  That avoids the disambiguation we run into,
which is realizing that we store to an object smaller than
the size of the mode we appear to store.

After the patch we see just

   [1  S64 A32]

so we preserve the alias set, the alignment and the size (the size
is redundant if the MEM isn't BLKmode).  That's still not good
in case the RTL alias oracle would implement the same
disambiguation, but it fends off the gimple one.

This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
and --param=vect-partial-vector-usage=1.

On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
side we might instead want to use a BLKmode MEM?  Any better
ideas here?

I'd expect that using BLKmode will fend off the RTL aliasing code.

jeff

Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread Jeff Law via Gcc-patches





On 6/21/23 09:28, 钟居哲 wrote:

I have tried:
(define_expand "fms4"
   [(parallel
     [(set (match_operand:VF_AUTO 0 "register_operand")
     (fma:VF_AUTO
       (match_operand:VF_AUTO 1 "register_operand")
       (match_operand:VF_AUTO 2 "register_operand")
       (neg:VF_AUTO
         (match_operand:VF_AUTO 3 "register_operand"
      (clobber (match_dup4))])]
"TARGET_VECTOR"
   {
operands[4] = gen_reg_rtx (Pmode);
})

(define_insn_and_split "*fms"
   [(set (match_operand:VF_AUTO 0 "register_operand" "=vr, vr, ?&vr")
   (fma:VF_AUTO
     (match_operand:VF_AUTO 1 "register_operand" " %0, vr,   vr")
     (match_operand:VF_AUTO 2 "register_operand" " vr, vr,   vr")
     (neg:VF_AUTO
       (match_operand:VF_AUTO 3 "register_operand" " vr,  0,   vr"
    (clobber (match_operand:P 4 "=r,r,r"))]
"TARGET_VECTOR"
"#"
"&& reload_completed"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
     if (which_alternative == 2)
       emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
     riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
mode),

                 riscv_vector::RVV_TERNOP, ops, operands[4]);
     DONE;
   }
   [(set_attr "type" "vfmuladd")
    (set_attr "mode" "")])

It fails and creates an ICE.

But where did it ICE, and *why*?  That's really the question.  Changing the 
mode just looks like papering over a more serious issue.


jeff

[PATCH V4] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-21 Thread juzhe . zhong

From: Ju-Zhe Zhong 

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
LEN_MASK_STORE.
(internal_load_fn_p): Add LEN_MASK_LOAD.
(internal_store_fn_p): Add LEN_MASK_STORE.
(internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
(internal_fn_stored_value_index): Add LEN_MASK_STORE.
(internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
* optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
LEN_MASK_{LOAD,STORE}.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
(vectorizable_load): Ditto.

---
 gcc/internal-fn.cc |  36 ++-
 gcc/optabs-tree.cc |  60 +--
 gcc/optabs-tree.h  |   6 +-
 gcc/tree-vect-stmts.cc | 220 +
 4 files changed, 243 insertions(+), 79 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..b90bd85df2c 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
 
 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 
 default:
@@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_SCATTER_STORE:
   return 4;
 
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 default:
   return (conditional_internal_fn_code (fn) != ERROR_MARK
  || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
 
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_opno = 3;
+
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+   = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+   {
+ /* Try LEN_MASK_LOAD.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+   }
+  else
+   {
+ /* Try LEN_MASK_STORE.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+   }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_opno = 4;
+}
 
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
return -1;
 }
 
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 77bf745ae40..e90e1b62ebc 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -548,14 +548,29 @@ target_supports_op_p (tree type, enum tree_code code,
 bool
 can_vec_mask_load_store_p (machine_mode mode,
   machine_mode mask_mode,
-  bool is_load)
+  bool is_load,
+  internal_fn *ifn)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
+  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
   machine_mode vmode;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
+{
+  if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
+   {
+ if (ifn)
+   *ifn = is_load ? IFN_MASK_LOAD

[PATCH V3] RISC-V: Support RVV floating-point auto-vectorization

2023-06-21 Thread Juzhe-Zhong

This patch adds RVV floating-point auto-vectorization.
It also fixes an attribute bug in the floating-point ternary operations in vector.md.

gcc/ChangeLog:

* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
(fnma4): Ditto.
(*fnma): Ditto.
(fms4): Ditto.
(*fms): Ditto.
(fnms4): Ditto.
(*fnms): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New 
function.
* config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
* config/riscv/vector.md: Fix attribute bug.

---
 gcc/config/riscv/autovec.md   | 184 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  49 +++--
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/ternop/ternop-1.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-10.c  |  23 +++
 .../riscv/rvv/autovec/ternop/ternop-11.c  |  29 +++
 .../riscv/rvv/autovec/ternop/ternop-12.c  |  28 +++
 .../riscv/rvv/autovec/ternop/ternop-2.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-3.c   |   9 +-
 .../riscv/rvv/autovec/ternop/ternop-4.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-5.c   |   8 +-
 .../riscv/rvv/autovec/ternop/ternop-6.c   |   9 +-
 .../riscv/rvv/autovec/ternop/ternop-7.c   |  23 +++
 .../riscv/rvv/autovec/ternop/ternop-8.c   |  29 +++
 .../riscv/rvv/autovec/ternop/ternop-9.c   |  28 +++
 .../riscv/rvv/autovec/ternop/ternop_run-1.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-10.c  |  40 
 .../riscv/rvv/autovec/ternop/ternop_run-11.c  |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-12.c  |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-2.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-3.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-4.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-5.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-6.c   |  12 +-
 .../riscv/rvv/autovec/ternop/ternop_run-7.c   |  40 
 .../riscv/rvv/autovec/ternop/ternop_run-8.c   |  60 ++
 .../riscv/rvv/autovec/ternop/ternop_run-9.c   |  60 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-1.c|  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-10.c   |  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-11.c   |  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-12.c   |  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-2.c|  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-3.c|  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-4.c|  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-5.c|  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-6.c|  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-7.c|  35 
 .../rvv/autovec/ternop/ternop_run_zvfh-8.c|  55 ++
 .../rvv/autovec/ternop/ternop_run_zvfh-9.c|  55 ++
 40 files changed, 1386 insertions(+), 34 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c
 create mode 100644 
gcc/t

Re: Re: [PATCH] RISC-V: Support RVV floating-point ternary auto-vectorization

2023-06-21 Thread 钟居哲

Hi, Jeff.

I tried again:
+(define_expand "fma4"
+  [(parallel
+[(set (match_operand:VF_AUTO 0 "register_operand")
+ (fma:VF_AUTO
+   (match_operand:VF_AUTO 1 "register_operand")
+   (match_operand:VF_AUTO 2 "register_operand")
+   (match_operand:VF_AUTO 3 "register_operand")))
+ (clobber (match_dup 4))])]
+  "TARGET_VECTOR"
+  {
+operands[4] = gen_reg_rtx (Pmode);
+  })
+
+(define_insn_and_split "*fma"
+  [(set (match_operand:VF_AUTO 0 "register_operand"   "=vr, vr, ?&vr")
+   (fma:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand" " %0, vr,   vr")
+ (match_operand:VF_AUTO 2 "register_operand" " vr, vr,   vr")
+ (match_operand:VF_AUTO 3 "register_operand" " vr,  0,   vr")))
+   (clobber (match_operand:P 4 "register_operand" "=r,r,r"))]
+  "TARGET_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+if (which_alternative == 2)
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
+riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
mode),
+ riscv_vector::RVV_TERNOP, ops, 
operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vfmuladd")
+   (set_attr "mode" "")])
It seems to work, and all tests have passed.
Thanks for pointing this out.

I have sent V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622481.html 

If this implementation is more reasonable, I will send a separate patch to 
adjust integer ternary autovec patterns.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-21 23:32
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support RVV floating-point ternary 
auto-vectorization
 
 
On 6/21/23 09:28, 钟居哲 wrote:
> I have tried:
> (define_expand "fms4"
>[(parallel
>  [(set (match_operand:VF_AUTO 0 "register_operand")
>  (fma:VF_AUTO
>(match_operand:VF_AUTO 1 "register_operand")
>(match_operand:VF_AUTO 2 "register_operand")
>(neg:VF_AUTO
>  (match_operand:VF_AUTO 3 "register_operand"
>   (clobber (match_dup4))])]
> "TARGET_VECTOR"
>{
> operands[4] = gen_reg_rtx (Pmode);
> })
> 
> (define_insn_and_split "*fms"
>[(set (match_operand:VF_AUTO 0 "register_operand" "=vr, vr, ?&vr")
>(fma:VF_AUTO
>  (match_operand:VF_AUTO 1 "register_operand" " %0, vr,   vr")
>  (match_operand:VF_AUTO 2 "register_operand" " vr, vr,   vr")
>  (neg:VF_AUTO
>(match_operand:VF_AUTO 3 "register_operand" " vr,  0,   vr"
> (clobber (match_operand:P 4 "=r,r,r"))]
> "TARGET_VECTOR"
> "#"
> "&& reload_completed"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>  if (which_alternative == 2)
>emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
> mode),
>  riscv_vector::RVV_TERNOP, ops, operands[4]);
>  DONE;
>}
>[(set_attr "type" "vfmuladd")
> (set_attr "mode" "")])
> 
> It fails and creates an ICE.
But where did it ICE, and *why*?  That's really the question.  Changing the 
mode just looks like papering over a more serious issue.
 
jeff

Re: [PATCH] Introduce hardbool attribute for C

2023-06-21 Thread Qing Zhao via Gcc-patches

Hi, Alexandre,

> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 22e240a3c2a55..f9cc609b54d94 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -2226,6 +2226,35 @@ convert_lvalue_to_rvalue (location_t loc, struct 
> c_expr exp,
> exp.value = convert (build_qualified_type (TREE_TYPE (exp.value), 
> TYPE_UNQUALIFIED), exp.value);
>   if (force_non_npc)
> exp.value = build1 (NOP_EXPR, TREE_TYPE (exp.value), exp.value);
> +
> +  {
> +tree false_value, true_value;
> +if (convert_p && !error_operand_p (exp.value)
> + && c_hardbool_type_attr (TREE_TYPE (exp.value),
> +  &false_value, &true_value))
> +  {
> + tree t = save_expr (exp.value);
> +
> + mark_exp_read (exp.value);
> +
> + tree trapfn = builtin_decl_explicit (BUILT_IN_TRAP);
> + tree expr = build_call_expr_loc (loc, trapfn, 0);
> + expr = build_compound_expr (loc, expr, boolean_true_node);
> + expr = fold_build3_loc (loc, COND_EXPR, boolean_type_node,
> + fold_build2_loc (loc, NE_EXPR,
> +  boolean_type_node,
> +  t, true_value),
> + expr, boolean_true_node);
> + expr = fold_build3_loc (loc, COND_EXPR, boolean_type_node,
> + fold_build2_loc (loc, NE_EXPR,
> +  boolean_type_node,
> +  t, false_value),
> + expr, boolean_false_node);
> +
> + exp.value = expr;
> +  }
> +  }

I see that you have a test case checking that the above __builtin_trap call
is generated by the FE.
Do you have a test case checking that the trap actually happens at runtime?
> +
>   return exp;
> }
> 
> @@ -8488,7 +8517,7 @@ digest_init (location_t init_loc, tree type, tree init, 
> tree origtype,
>   }
>   }
> 
> -  if (code == VECTOR_TYPE)
> +  if (code == VECTOR_TYPE || c_hardbool_type_attr (type))
>   /* Although the types are compatible, we may require a
>  conversion.  */
>   inside_init = convert (type, inside_init);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 2de212c8c2d84..7b5592502734e 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8681,6 +8681,58 @@ initialization will result in future breakage.
> GCC emits warnings based on this attribute by default; use
> @option{-Wno-designated-init} to suppress them.
> 
> +@cindex @code{hardbool} type attribute
> +@item hardbool
> +@itemx hardbool (@var{false_value})
> +@itemx hardbool (@var{false_value}, @var{true_value})
> +This attribute may only be applied to integral types in C, to introduce
> +hardened boolean types.  It turns the integral type into a boolean-like
> +type with the same size and precision, that uses the specified values as
> +representations for @code{false} and @code{true}.  Underneath, it is
> +actually an enumerate type, but its observable behavior is like that of
> +@code{_Bool}, except for the strict internal representations, verified
> +by runtime checks.
> +
> +If @var{true_value} is omitted, the bitwise negation of
> +@var{false_value} is used.  If @var{false_value} is omitted, zero is
> +used.  The named representation values must be different when converted
> +to the original integral type.  Narrower bitfields are rejected if the
> +representations become indistinguishable.
> +
> +Values of such types automatically decay to @code{_Bool}, at which
> +point, the selected representation values are mapped to the
> +corresponding @code{_Bool} values.  When the represented value is not
> +determined, at compile time, to be either @var{false_value} or
> +@var{true_value}, runtime verification calls @code{__builtin_trap} if it
> +is neither.  This is what makes them hardened boolean types.
> +
> +When converting scalar types to such hardened boolean types, implicitly
> +or explicitly, behavior corresponds to a conversion to @code{_Bool},
> +followed by a mapping from @code{false} and @code{true} to
> +@var{false_value} and @var{true_value}, respectively.
> +
> +@smallexample
> +typedef char __attribute__ ((__hardbool__ (0x5a))) hbool;
> +hbool first = 0;   /* False, stored as (char)0x5a.  */
> +hbool second = !first; /* True, stored as ~(char)0x5a.  */
> +
> +static hbool zeroinit; /* False, stored as (char)0x5a.  */
> +auto hbool uninit; /* Undefined, may trap.  */
> +@end smallexample
> +
> +When zero-initializing a variable or field of hardened boolean type
> +(presumably held in static storage) the implied zero initializer gets
> +converted to @code{_Bool}, and then to the hardened boolean type, so
> +that the initial value is the hardened representation for @code{false}.
> +Using that value is well defined.  This is @emph{not} the case when
> +variables and fields of such types are uninitialized (presumably held in
> +automatic or dynamic
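
The mapping described in the quoted documentation can be modelled outside the compiler.  This is a minimal sketch, assuming the `hbool` example from the text (false stored as 0x5a, true as its bitwise negation); the names `to_hardbool`, `from_hardbool`, `scalar_to_hardbool` and `Decoded` are hypothetical, and `Decoded::Trap` merely marks the case where, per the documentation, the runtime check would call __builtin_trap.  It does not use the attribute itself.

```cpp
#include <cassert>
#include <cstdint>

// Representations taken from the documentation example:
// false is 0x5a, true is the bitwise negation, i.e. 0xa5 in 8 bits.
constexpr std::uint8_t false_rep = 0x5a;
constexpr std::uint8_t true_rep = static_cast<std::uint8_t> (~0x5a);

// Outcome of reading a hardened boolean in this model.
enum class Decoded { False, True, Trap };

// bool -> hardened representation.
constexpr std::uint8_t to_hardbool (bool b)
{
  return b ? true_rep : false_rep;
}

// Scalar -> hardened boolean: first a conversion to bool, then the mapping.
constexpr std::uint8_t scalar_to_hardbool (int v)
{
  return to_hardbool (v != 0);
}

// Hardened representation -> bool; any other bit pattern is the trap case.
constexpr Decoded from_hardbool (std::uint8_t rep)
{
  return rep == true_rep    ? Decoded::True
         : rep == false_rep ? Decoded::False
                            : Decoded::Trap;
}
```

In this model a zero-initialized value goes through `scalar_to_hardbool (0)` and ends up as 0x5a, whereas a raw all-zero byte decodes to the trap case.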

Re: [Patch, fortran] PR108961 - Segfault when associating to pointer from C_F_POINTER

2023-06-21 Thread Paul Richard Thomas via Gcc-patches

Committed as r14-2021-gcaf0892eea67349d9a1e44590c3440768136fe2b

Thanks for the pointers, Tobias and Mikael, I used them both.

Paul

On Tue, 20 Jun 2023 at 21:47, Mikael Morin  wrote:
>
> Le 20/06/2023 à 18:30, Tobias Burnus a écrit :
> > On 20.06.23 18:19, Paul Richard Thomas via Fortran wrote:
> >
> >> Is there a better way to detect a type(c_ptr) formal argument?
> > u.derived->intmod_sym_id == ISOCBINDING_PTR ?
> && u.derived->from_intmod == INTMOD_ISO_C_BINDING ?
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein

Re: [Patch, fortran] PR107900 Select type with intrinsic type inside associate causes ICE / Segmenation fault

2023-06-21 Thread Paul Richard Thomas via Gcc-patches

Committed as r14-2022-g577223aebc7acdd31e62b33c1682fe54a622ae27

Thanks for the help and the review Harald. Thanks to Steve too for
picking up Neil Carlson's bugs.

Cheers

Paul

On Tue, 20 Jun 2023 at 22:57, Harald Anlauf  wrote:
>
> Hi Paul,
>
> On 6/20/23 12:54, Paul Richard Thomas via Gcc-patches wrote:
> > Hi Harald,
> >
> > Fixing the original testcase in this PR turned out to be slightly more
> > involved than I expected. However, it resulted in an open door to fix
> > some other PRs and the attached much larger patch.
> >
> > This time, I did remember to include the testcases in the .diff :-)
>
> indeed! :-)
>
> I've only had a superficial look so far although it looks very good.
> (I have to trust your experience with unlimited polymorphism.)
>
> However, I was wondering about the following helper function:
>
> +bool
> +gfc_is_ptr_fcn (gfc_expr *e)
> +{
> +  return e != NULL && e->expr_type == EXPR_FUNCTION
> + && (gfc_expr_attr (e).pointer
> + || (e->ts.type == BT_CLASS
> + && CLASS_DATA (e)->attr.class_pointer));
> +}
> +
> +
>   /* Copy a shape array.  */
>
> Is there a case where gfc_expr_attr (e).pointer returns false
> and you really need the || part?  Looking at gfc_expr_attr
> and the present context, it might just not be necessary.
>
> > I believe that, between the ChangeLogs and the comments, it is
> > reasonably self-explanatory.
> >
> > OK for trunk?
>
> OK from my side.
>
> Thanks for the patch!
>
> Harald
>
> > Regards
> >
> > Paul
> >
> > Fortran: Fix some bugs in associate [PR87477]
> >
> > 2023-06-20  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/87477
> > PR fortran/88688
> > PR fortran/94380
> > PR fortran/107900
> > PR fortran/110224
> > * decl.cc (char_len_param_value): Fix memory leak.
> > (resolve_block_construct): Remove unnecessary static decls.
> > * expr.cc (gfc_is_ptr_fcn): New function.
> > (gfc_check_vardef_context): Use it to permit pointer function
> > result selectors to be used for associate names in variable
> > definition context.
> > * gfortran.h: Prototype for gfc_is_ptr_fcn.
> > * match.cc (build_associate_name): New function.
> > (gfc_match_select_type): Use the new function to replace inline
> > version and to build a new associate name for the case where
> > the supplied associate name is already used for that purpose.
> > * resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
> > associate names with pointer function targets to be used in
> > variable definition context.
> > * trans-decl.cc (gfc_get_symbol_decl): Unlimited polymorphic
> > variables need deferred initialisation of the vptr.
> > (gfc_trans_deferred_vars): Do the vptr initialisation.
> > * trans-stmt.cc (trans_associate_var): Ensure that a pointer
> > associate name points to the target of the selector and not
> > the selector itself.
> >
> > gcc/testsuite/
> > PR fortran/87477
> > PR fortran/107900
> > * gfortran.dg/pr107900.f90 : New test
> >
> > PR fortran/110224
> > * gfortran.dg/pr110224.f90 : New test
> >
> > PR fortran/88688
> > * gfortran.dg/pr88688.f90 : New test
> >
> > PR fortran/94380
> > * gfortran.dg/pr94380.f90 : New test
> >
> > PR fortran/95398
> > * gfortran.dg/pr95398.f90 : Set -std=f2008, bump the line
> > numbers in the error tests by two and change the text in two.
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein

Re: [PATCH 2/2] rust: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-21 Thread Paul E Murphy via Gcc-patches





On 6/19/23 3:39 AM, Thomas Schwinge wrote:

Hi Paul!

On 2023-06-16T11:00:02-0500, "Paul E. Murphy via Gcc-patches" wrote:

This was noticed when fixing the gccgo usage of the macro, the
rust usage is very similar.

TARGET_AIX is defined as a non-zero value on linux/powerpc64le
which may cause unexpected behavior.  TARGET_AIX_OS should be
used to toggle AIX specific behavior.

gcc/rust/ChangeLog:

   * rust-object-export.cc [TARGET_AIX]: Rename and update
   usage to TARGET_AIX_OS.


I don't have rights to formally approve this GCC/Rust change, but I'll
note that it follows "as obvious" (see "Obvious fixes") from the
corresponding GCC/Go change, which has been approved and which is where
this GCC/Rust code has been copied from, so I suggest you push both
patches at once.


Grüße
  Thomas


Hi Thomas,

Thank you for reviewing.  I do not have commit access, so I cannot push 
this myself.  If this is OK, could one of the rust maintainers push this 
patch?


Thanks,
Paul
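
The pitfall described in the commit message can be shown in isolation.  This is only a sketch: `DEMO_TARGET_AIX` and `DEMO_TARGET_AIX_OS` are hypothetical stand-ins whose values mirror the description for linux/powerpc64le (the first non-zero, the second zero); they are not GCC's definitions, and `export_path` is an invented helper.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical values mirroring the described linux/powerpc64le situation.
#define DEMO_TARGET_AIX 1     // non-zero although the OS is not AIX
#define DEMO_TARGET_AIX_OS 0  // zero: the OS really is not AIX

// Picks the code path a given flag would select.
inline const char *export_path (int flag)
{
  return flag ? "aix-specific" : "generic";
}
```

Testing `DEMO_TARGET_AIX` selects the AIX-specific path on a non-AIX system, while testing `DEMO_TARGET_AIX_OS` selects the generic one, which is the behavior the rename is after.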

Re: PING^2: Re: [PATCH 1/3] testsuite: move handle-multiline-outputs to before check for blank lines

2023-06-21 Thread Mike Stump via Gcc-patches

On Jun 20, 2023, at 10:21 AM, David Malcolm  wrote:
> Does this testsuite patch look OK?
> 
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620275.html
> 
> Thanks
> David
> 
> On Mon, 2023-06-12 at 19:11 -0400, David Malcolm wrote:
>> Please can someone review this testsuite patch:
>>   https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620275.html
>> 
>> Thanks
>> Dave
>> 
>> On Wed, 2023-05-31 at 14:06 -0400, David Malcolm wrote:
>>> I have followup patches that require checking for multiline
>>> patterns
>>> that have blank lines within them, so this moves the handling of
>>> multiline patterns before the check for blank lines, allowing for
>>> such
>>> multiline patterns.
>>> 
>>> Doing so uncovers some issues with existing multiline directives,
>>> which
>>> the patch fixes.

Ok.

[wwwdocs] cxx-status: Add C++26 papers (Spring 2023, Varna)

2023-06-21 Thread Marek Polacek via Gcc-patches

The first C++26 papers have started to trickle in.  Update our docs accordingly.

We don't have -std=c++2c/-std=c++26/-std=gnu++2c/-std=gnu++26 yet, but
I should have a patch for it by the end of this week.

W3C validated.  Pushed.

commit 9c66e33761140358d350c5fb2d1638f6afdaead4
Author: Marek Polacek 
Date:   Wed Jun 21 12:41:25 2023 -0400

cxx-status: Add C++26 papers (Spring 2023, Varna)

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 675fbcd0..99387540 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -20,6 +20,7 @@
 C++17
 C++20
 C++23
+C++26
 Technical Specifications
   
   
@@ -32,6 +33,99 @@
Implementation Status section of the Libstdc++ manual.
   
 
+  C++26 Support in GCC
+
+  GCC has experimental support for the next revision of the C++
+  standard, which is expected to be published in 2026.
+
+
+
+
+  C++26 Language Features
+
+  
+
+  Language Feature
+  Proposal
+  Available in GCC?
+  SD-6 Feature Test
+
+
+
+   Remove undefined behavior from lexing
+   P2621R2 (DR): https://wg21.link/P2621R2
+   No: https://gcc.gnu.org/PR110340
+
+
+
+   Making non-encodable string literals ill-formed
+   P1854R4 (DR): https://wg21.link/P1854R4
+   No: https://gcc.gnu.org/PR110341
+
+
+
+   Unevaluated strings
+   P2361R6: https://wg21.link/P2361R6
+   No: https://gcc.gnu.org/PR110342
+
+
+
+   Add @, $, and ` to the basic character set
+   P2558R2: https://wg21.link/P2558R2
+   No: https://gcc.gnu.org/PR110343
+
+
+
+   constexpr cast from void*
+   P2738R1: https://wg21.link/P2738R1
+   No: https://gcc.gnu.org/PR110344
+
+
+
+   On the ignorability of standard attributes
+   P2552R3 (DR): https://wg21.link/P2552R3
+   No: https://gcc.gnu.org/PR110345
+
+
+
+   Static storage for braced initializers
+   P2752R3 (DR): https://wg21.link/P2752R3
+   No: https://gcc.gnu.org/PR110346
+
+
+
+   User-generated static_assert messages
+   P2741R3: https://wg21.link/P2741R3
+   No: https://gcc.gnu.org/PR110348
+
+
+
+   Placeholder variables with no name
+   P2169R4: https://wg21.link/P2169R4
+   No: https://gcc.gnu.org/PR110349
+
+
+
+  
+
   C++23 Support in GCC
 
   GCC has experimental support for the next revision of the C++

Re: [Patch, fortran] PR107900 Select type with intrinsic type inside associate causes ICE / Segmenation fault

2023-06-21 Thread Steve Kargl via Gcc-patches

On Wed, Jun 21, 2023 at 05:12:22PM +0100, Paul Richard Thomas wrote:
> Committed as r14-2022-g577223aebc7acdd31e62b33c1682fe54a622ae27
> 
> Thanks for the help and the review Harald. Thanks to Steve too for
> picking up Neil Carlson's bugs.
> 

It's only natural.  You fix bugs in a long desired feature,
and people will start to use that feature more. 

I always look at Neil's bug reports.  They're typically
concise code snippets and have cross references to the
Fortran standard.  Unfortunately, I lack the ability to
fix them. :( 

-- 
Steve

[PATCH] c++: redundant targ coercion for var/alias tmpls

2023-06-21 Thread Patrick Palka via Gcc-patches

When stepping through the variable/alias template specialization code
paths, I noticed we perform template argument coercion twice: first from
instantiate_alias_template / finish_template_variable and again from
tsubst_decl (during instantiate_template).  It should suffice to perform
coercion once.

To that end this patch elides the second coercion from tsubst_decl when
possible.  We can't get rid of it completely because we don't always
specialize a variable template from finish_template_variable: we could
also be doing so directly from instantiate_template during variable
template partial specialization selection, in which case the coercion
from tsubst_decl would be the first and only coercion.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  This reduces memory usage of range-v3's zip.cpp by ~0.5%.

gcc/cp/ChangeLog:

* pt.cc (tsubst_decl) : Call
coerce_template_parms only if DECL_TEMPLATE_SPECIALIZATION
is set.
---
 gcc/cp/pt.cc | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index be86051abad..dd10409ce18 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -15232,10 +15232,17 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
complain)
argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
if (argvec != error_mark_node
&& PRIMARY_TEMPLATE_P (gen_tmpl)
-   && TMPL_ARGS_DEPTH (args) >= TMPL_ARGS_DEPTH (argvec))
- /* We're fully specializing a template declaration, so
-we need to coerce the innermost arguments corresponding to
-the template.  */
+   && TMPL_ARGS_DEPTH (args) >= TMPL_ARGS_DEPTH (argvec)
+   && DECL_TEMPLATE_SPECIALIZATION (t))
+ /* We're fully specializing an alias or variable template, so
+coerce the innermost arguments if necessary.  We expect
+instantiate_alias_template and finish_template_variable to
+already have done this relative to the primary template, in
+which case this coercion is unnecessary, but we can also
+get here when substituting a partial variable template
+specialization (directly from instantiate_template), in
+which case DECL_TEMPLATE_SPECIALIZATION is set and coercion
+is necessary.  */
  argvec = (coerce_template_parms
(DECL_TEMPLATE_PARMS (gen_tmpl),
 argvec, tmpl, complain));
-- 
2.41.0.113.g6640c2d06d
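
For readers less familiar with the terms in the patch description, the source-level constructs involved look roughly as follows.  This is a minimal sketch with hypothetical names (`tagged`, `second_t`); it only illustrates what coercing template arguments means for a variable template, a partial specialization of it, and an alias template, and says nothing about which front-end function performs the coercion.

```cpp
#include <cassert>
#include <type_traits>

// Primary variable template: N has type long and T has a default, so
// writing tagged<3> requires coercing the arguments to <3L, int>.
template <long N, typename T = int>
constexpr long tagged = N;

// Partial specialization of the variable template, selected for T = char.
template <long N>
constexpr long tagged<N, char> = -N;

// Alias template whose second argument is filled in from a default.
template <typename T, typename U = T *>
using second_t = U;

static_assert (tagged<3> == 3, "primary template, defaulted T");
static_assert (tagged<3, char> == -3, "partial specialization");
static_assert (std::is_same<second_t<int>, int *>::value, "defaulted U");
```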

Re: [PATCH] Update array address space in c_build_qualified_type

2023-06-21 Thread Joseph Myers

On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote:

> >   This patch sets the address space of the array type to that of the
> >   element type.
> >
> >   Regression tests for avr look ok. Ok for trunk?
> 
> The patch looks OK to me but please let a C frontend maintainer
> double-check (I've CCed Joseph for this).

The question would be whether there are any TYPE_QUALS uses in the C front 
end that behave incorrectly given TYPE_ADDR_SPACE (acting as qualifiers) 
being set on an array type - conceptually, before C2x, array types are 
unqualified, only the element types are qualified.

The fact that this changed in C2x gives a shortcut to doing that analysis 
- you don't need to check all TYPE_QUALS uses in the front end, only a 
limited number of places that might handle qualifiers on arrays that 
already have conditionals to do things differently in C2x mode.  But some 
sort of analysis of those places, to see how they'd react to an array type 
itself having TYPE_ADDR_SPACE set, would be helpful.  If you're lucky, all 
those places only care about TYPE_QUALS on the element type and not on the 
array type itself.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [Patch, fortran] PR107900 Select type with intrinsic type inside associate causes ICE / Segmenation fault

2023-06-21 Thread Harald Anlauf via Gcc-patches


Hi Paul,

while I only had a minor question regarding gfc_is_ptr_fcn(),
can you still try to enlighten me why that second part
was necessary?  (I believed it to be redundant and may have
overlooked the obvious.)

Cheers,
Harald

On 6/21/23 18:12, Paul Richard Thomas via Gcc-patches wrote:

Committed as r14-2022-g577223aebc7acdd31e62b33c1682fe54a622ae27

Thanks for the help and the review Harald. Thanks to Steve too for
picking up Neil Carlson's bugs.

Cheers

Paul

On Tue, 20 Jun 2023 at 22:57, Harald Anlauf  wrote:


Hi Paul,

On 6/20/23 12:54, Paul Richard Thomas via Gcc-patches wrote:

Hi Harald,

Fixing the original testcase in this PR turned out to be slightly more
involved than I expected. However, it resulted in an open door to fix
some other PRs and the attached much larger patch.

This time, I did remember to include the testcases in the .diff :-)


indeed! :-)

I've only had a superficial look so far although it looks very good.
(I have to trust your experience with unlimited polymorphism.)

However, I was wondering about the following helper function:

+bool
+gfc_is_ptr_fcn (gfc_expr *e)
+{
+  return e != NULL && e->expr_type == EXPR_FUNCTION
+ && (gfc_expr_attr (e).pointer
+ || (e->ts.type == BT_CLASS
+ && CLASS_DATA (e)->attr.class_pointer));
+}
+
+
   /* Copy a shape array.  */

Is there a case where gfc_expr_attr (e).pointer returns false
and you really need the || part?  Looking at gfc_expr_attr
and the present context, it might just not be necessary.


I believe that, between the ChangeLogs and the comments, it is
reasonably self-explanatory.

OK for trunk?


OK from my side.

Thanks for the patch!

Harald


Regards

Paul

Fortran: Fix some bugs in associate [PR87477]

2023-06-20  Paul Thomas  

gcc/fortran
PR fortran/87477
PR fortran/88688
PR fortran/94380
PR fortran/107900
PR fortran/110224
* decl.cc (char_len_param_value): Fix memory leak.
(resolve_block_construct): Remove unnecessary static decls.
* expr.cc (gfc_is_ptr_fcn): New function.
(gfc_check_vardef_context): Use it to permit pointer function
result selectors to be used for associate names in variable
definition context.
* gfortran.h: Prototype for gfc_is_ptr_fcn.
* match.cc (build_associate_name): New function.
(gfc_match_select_type): Use the new function to replace inline
version and to build a new associate name for the case where
the supplied associate name is already used for that purpose.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (gfc_get_symbol_decl): Unlimited polymorphic
variables need deferred initialisation of the vptr.
(gfc_trans_deferred_vars): Do the vptr initialisation.
* trans-stmt.cc (trans_associate_var): Ensure that a pointer
associate name points to the target of the selector and not
the selector itself.

gcc/testsuite/
PR fortran/87477
PR fortran/107900
* gfortran.dg/pr107900.f90 : New test

PR fortran/110224
* gfortran.dg/pr110224.f90 : New test

PR fortran/88688
* gfortran.dg/pr88688.f90 : New test

PR fortran/94380
* gfortran.dg/pr94380.f90 : New test

PR fortran/95398
* gfortran.dg/pr95398.f90 : Set -std=f2008, bump the line
numbers in the error tests by two and change the text in two.

[PATCH 1/1] libcpp: allow UCS_LIMIT codepoints in UTF-8 strings

2023-06-21 Thread Ben Boeckel

libcpp/

* charset.cc: Allow `UCS_LIMIT` in UTF-8 strings.

Reported-by: Damien Guibouret 
Fixes: c1dbaa6656a (libcpp: reject codepoints above 0x10, 2023-06-06)
Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d4f573e365f..54ebab2b8a4 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1891,7 +1891,7 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
 invalid because they cannot be represented in UTF-16.
 
 Reject such values.*/
-  if (cp >= UCS_LIMIT)
+  if (cp > UCS_LIMIT)
return false;
 }
   /* No problems encountered.  */
-- 
2.40.1
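
To illustrate the off-by-one that the one-character change above addresses, here is a minimal standalone sketch. It is not the libcpp code: `kUcsLimit` and `valid_code_point` are hypothetical names, and the sketch assumes the limit is the largest valid Unicode code point, 0x10FFFF, so the limit itself has to be accepted.

```cpp
#include <cstdint>

// Assumption for this sketch: the limit is the largest valid Unicode
// code point, so a value equal to the limit is still valid.
constexpr std::uint32_t kUcsLimit = 0x10FFFF;

// Hypothetical helper (not a libcpp function): returns true if CP may
// appear in a well-formed UTF-8 string.
constexpr bool valid_code_point (std::uint32_t cp)
{
  // Surrogates are reserved for UTF-16 and are never valid on their own.
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return false;
  // '>' rather than '>=': only values strictly above the limit are bad.
  if (cp > kUcsLimit)
    return false;
  return true;
}
```

With `>=` in the last test, the code point 0x10FFFF itself would be rejected, which is the behaviour the patch removes.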

Re: [Patch, fortran] PR107900 Select type with intrinsic type inside associate causes ICE / Segmenation fault

2023-06-21 Thread Bernhard Reutner-Fischer via Gcc-patches

Hi!

First of all, many thanks for the patch!
If I may, I have a question concerning the style chosen for the error
message and one nitpick concerning a return type, the latter
primarily prompting this reply.

On Tue, 20 Jun 2023 11:54:25 +0100
Paul Richard Thomas via Fortran  wrote:

> diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
> index d5cfbe0cc55..c960dfeabd9 100644
> --- a/gcc/fortran/expr.cc
> +++ b/gcc/fortran/expr.cc

> @@ -6470,6 +6480,22 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, 
> bool alloc_obj,
>   }
> return false;
>   }
> +  else if (context && gfc_is_ptr_fcn (assoc->target))
> + {
> +   if (!gfc_notify_std (GFC_STD_F2018, "%qs at %L associated to "
> +"pointer function target being used in a "
> +"variable definition context (%s)", name,
> +&e->where, context))

I'm curious why you decided to put context in parentheses and not simply
use quotes as per %qs?

> + return false;
> +   else if (gfc_has_vector_index (e))
> + {
> +   gfc_error ("%qs at %L associated to vector-indexed target"
> +  " cannot be used in a variable definition"
> +  " context (%s)",
> +  name, &e->where, context);

Ditto.

> diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
> index e7be7fddc64..0e4b5440393 100644
> --- a/gcc/fortran/match.cc
> +++ b/gcc/fortran/match.cc
> @@ -6377,6 +6377,39 @@ build_class_sym:
>  }
> 
> 
> +/* Build the associate name  */
> +static int
> +build_associate_name (const char *name, gfc_expr **e1, gfc_expr **e2)
> +{

> +return 1;

> +  return 0;
> +}

I've gone through the frontend recently and changed several such
boolean functions to use bool where appropriate. May I ask folks to use
narrower types in new code, please?
If later in the pipeline it is considered appropriate or beneficial to
promote types, these will eventually be promoted.
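
As a rough illustration of the request above (a sketch only, not GCC code): `expr`, `is_ptr_fcn_int` and `is_ptr_fcn` below are hypothetical stand-ins. Both functions compute the same answer; the second states in its signature that the result is a truth value.

```cpp
// Hypothetical stand-in for a front-end expression node; not a GCC type.
struct expr
{
  bool is_function;
  bool is_pointer;
};

// Older style: 'int' used as a truth value (1 = yes, 0 = no).
int is_ptr_fcn_int (const expr *e)
{
  if (e != nullptr && e->is_function && e->is_pointer)
    return 1;
  return 0;
}

// Narrower type: the return type documents that this is a predicate.
bool is_ptr_fcn (const expr *e)
{
  return e != nullptr && e->is_function && e->is_pointer;
}
```

Callers that only test the result in a condition need no change when a function is converted from the first form to the second.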

> diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
> index e6a4337c0d2..18589e17843 100644
> --- a/gcc/fortran/trans-decl.cc
> +++ b/gcc/fortran/trans-decl.cc

> @@ -1906,6 +1915,7 @@ gfc_get_symbol_decl (gfc_symbol * sym)
>   gcc_assert (!sym->value || sym->value->expr_type == EXPR_NULL);
>  }
> 
> +

ISTM that the addition of vertical whitespace like here is in
contradiction with the coding style.

Please kindly excuse my comment and, again, thanks!

>gfc_finish_var_decl (decl, sym);
> 
>if (sym->ts.type == BT_CHARACTER)

Merge from trunk to gccgo branch

2023-06-21 Thread Ian Lance Taylor via Gcc-patches

I merged trunk revision 577223aebc7acdd31e62b33c1682fe54a622ae27 to
the gccgo branch.

Ian

[committed] function: Change return type of predicate function from int to bool

2023-06-21 Thread Uros Bizjak via Gcc-patches

Also change some internal variables to bool and some functions to void.

gcc/ChangeLog:

* function.h (emit_initial_value_sets):
Change return type from int to void.
(aggregate_value_p): Change return type from int to bool.
(prologue_contains): Ditto.
(epilogue_contains): Ditto.
(prologue_epilogue_contains): Ditto.
* function.cc (temp_slot): Make "in_use" variable bool.
(make_slot_available): Update for changed "in_use" variable.
(assign_stack_temp_for_type): Ditto.
(emit_initial_value_sets): Change return type from int to void
and update function body accordingly.
(instantiate_virtual_regs): Ditto.
(rest_of_handle_thread_prologue_and_epilogue): Ditto.
(safe_insn_predicate): Change return type from int to bool.
(aggregate_value_p): Change return type from int to bool
and update function body accordingly.
(prologue_contains): Change return type from int to bool.
(prologue_epilogue_contains): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/function.cc b/gcc/function.cc
index 82102ed78d7..6a79a8290f6 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -578,8 +578,8 @@ public:
   tree type;
   /* The alignment (in bits) of the slot.  */
   unsigned int align;
-  /* Nonzero if this temporary is currently in use.  */
-  char in_use;
+  /* True if this temporary is currently in use.  */
+  bool in_use;
   /* Nesting level at which this slot is being used.  */
   int level;
   /* The offset of the slot from the frame_pointer, including extra space
@@ -674,7 +674,7 @@ make_slot_available (class temp_slot *temp)
 {
   cut_slot_from_list (temp, temp_slots_at_level (temp->level));
   insert_slot_to_list (temp, &avail_temp_slots);
-  temp->in_use = 0;
+  temp->in_use = false;
   temp->level = -1;
   n_temp_slots_in_use--;
 }
@@ -848,7 +848,7 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 
size, tree type)
  if (known_ge (best_p->size - rounded_size, alignment))
{
  p = ggc_alloc ();
- p->in_use = 0;
+ p->in_use = false;
  p->size = best_p->size - rounded_size;
  p->base_offset = best_p->base_offset + rounded_size;
  p->full_size = best_p->full_size - rounded_size;
@@ -918,7 +918,7 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 
size, tree type)
 }
 
   p = selected;
-  p->in_use = 1;
+  p->in_use = true;
   p->type = type;
   p->level = temp_slot_level;
   n_temp_slots_in_use++;
@@ -1340,7 +1340,7 @@ has_hard_reg_initial_val (machine_mode mode, unsigned int 
regno)
   return NULL_RTX;
 }
 
-unsigned int
+void
 emit_initial_value_sets (void)
 {
   struct initial_value_struct *ivs = crtl->hard_reg_initial_vals;
@@ -1348,7 +1348,7 @@ emit_initial_value_sets (void)
   rtx_insn *seq;
 
   if (ivs == 0)
-return 0;
+return;
 
   start_sequence ();
   for (i = 0; i < ivs->num_entries; i++)
@@ -1357,7 +1357,6 @@ emit_initial_value_sets (void)
   end_sequence ();
 
   emit_insn_at_entry (seq);
-  return 0;
 }
 
 /* Return the hardreg-pseudoreg initial values pair entry I and
@@ -1535,7 +1534,7 @@ instantiate_virtual_regs_in_rtx (rtx *loc)
 /* A subroutine of instantiate_virtual_regs_in_insn.  Return true if X
matches the predicate for insn CODE operand OPERAND.  */
 
-static int
+static bool
 safe_insn_predicate (int code, int operand, rtx x)
 {
   return code < 0 || insn_operand_matches ((enum insn_code) code, operand, x);
@@ -1947,7 +1946,7 @@ instantiate_decls (tree fndecl)
 /* Pass through the INSNS of function FNDECL and convert virtual register
references to hard register references.  */
 
-static unsigned int
+static void
 instantiate_virtual_regs (void)
 {
   rtx_insn *insn;
@@ -2001,8 +2000,6 @@ instantiate_virtual_regs (void)
   /* Indicate that, from now on, assign_stack_local should use
  frame_pointer_rtx.  */
   virtuals_instantiated = 1;
-
-  return 0;
 }
 
 namespace {
@@ -2030,7 +2027,8 @@ public:
   /* opt_pass methods: */
   unsigned int execute (function *) final override
 {
-  return instantiate_virtual_regs ();
+  instantiate_virtual_regs ();
+  return 0;
 }
 
 }; // class pass_instantiate_virtual_regs
@@ -2044,12 +2042,12 @@ make_pass_instantiate_virtual_regs (gcc::context *ctxt)
 }
 
 
-/* Return 1 if EXP is an aggregate type (or a value with aggregate type).
+/* Return true if EXP is an aggregate type (or a value with aggregate type).
This means a type for which function calls must pass an address to the
function or get an address back from the function.
EXP may be a type node or an expression (whose type is tested).  */
 
-int
+bool
 aggregate_value_p (const_tree exp, const_tree fntype)
 {
   const_tree type = (TYPE_P (exp)) ? exp : TREE_TYPE (exp);
@@ -2069,7 +2067,7 @@ aggregate_value_p (const_tree exp, const_tree fntype)
  else
/* For internal functions, assume nothing needs to be

Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-21 Thread Joseph Myers

On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote:

> > > int32_t x = (int32_t)0x1.0p32;
> > > int32_t y = (int32_t)(int64_t)0x1.0p32;
> > >
> > > sets x to 2147483647 and y to 0.
> 
> Hmm, good question.  GENERIC has a direct truncation to unsigned char
> for example, the C standard generally says if the integral part cannot
> be represented then the behavior is undefined.  So I think we should be
> safe here (0x1.0p32 doesn't fit an int).

We should be following Annex F (unspecified value plus "invalid" exception 
for out-of-range floating-to-integer conversions rather than undefined 
behavior).  But we don't achieve that very well at present (see bug 93806 
comments 27-29 for examples of how such conversions produce wobbly 
values).
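
For readers who want to avoid depending on what an out-of-range conversion yields, one option is to test the range before converting. The following is a minimal sketch; `checked_to_int32` is a hypothetical helper, not something GCC or the C library provides.

```cpp
#include <cstdint>

// Hypothetical helper: converts D to int32_t only when the truncated
// value is representable.  Returns false and leaves *OUT untouched for
// out-of-range inputs and for NaN.
bool checked_to_int32 (double d, std::int32_t *out)
{
  // After truncation toward zero the result fits iff
  // -2^31 - 1 < d < 2^31.  Both bounds are exactly representable as
  // doubles.  NaN fails the comparison, so it is rejected as well.
  if (!(d > -2147483649.0 && d < 2147483648.0))
    return false;
  *out = static_cast<std::int32_t> (d);
  return true;
}
```

With this helper, a value such as 0x1.0p32 is reported as unconvertible instead of producing whichever value the target instruction or constant folder happens to pick.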

-- 
Joseph S. Myers
jos...@codesourcery.com

PING: Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-21 Thread David Malcolm via Gcc-patches

I'd like to ping this C++ FE patch for review:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621779.html

Thanks
Dave

On Wed, 2023-06-14 at 20:28 -0400, David Malcolm wrote:
> PR c++/110164 notes that in cases where we have a forward decl
> of a std library type such as:
> 
> std::array x;
> 
> we emit this diagnostic:
> 
> error: aggregate ‘std::array x’ has incomplete type and cannot be 
> defined
> 
> This patch adds this hint to the diagnostic:
> 
> note: ‘std::array’ is defined in header ‘<array>’; this is probably fixable
> by adding ‘#include <array>’
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for trunk?
> 
> gcc/cp/ChangeLog:
> PR c++/110164
> * cp-name-hint.h (maybe_suggest_missing_header): New decl.
> * decl.cc: Define INCLUDE_MEMORY.  Add include of
> "cp/cp-name-hint.h".
> (start_decl_1): Call maybe_suggest_missing_header.
> * name-lookup.cc (maybe_suggest_missing_header): Remove "static".
> 
> gcc/testsuite/ChangeLog:
> PR c++/110164
> * g++.dg/missing-header-pr110164.C: New test.
> ---
>  gcc/cp/cp-name-hint.h  |  3 +++
>  gcc/cp/decl.cc | 10 ++
>  gcc/cp/name-lookup.cc  |  2 +-
>  gcc/testsuite/g++.dg/missing-header-pr110164.C | 10 ++
>  4 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/missing-header-pr110164.C
> 
> diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
> index bfa7c53c8f6..e2387e23d1f 100644
> --- a/gcc/cp/cp-name-hint.h
> +++ b/gcc/cp/cp-name-hint.h
> @@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>  
>  extern name_hint suggest_alternatives_for (location_t, tree, bool);
>  extern name_hint suggest_alternatives_in_other_namespaces (location_t, tree);
> +extern name_hint maybe_suggest_missing_header (location_t location,
> +  tree name,
> +  tree scope);
>  extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
> tree);
>  extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
>  
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index a672e4844f1..504b08ec250 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>     line numbers.  For example, the CONST_DECLs for enum values.  */
>  
>  #include "config.h"
> +#define INCLUDE_MEMORY
>  #include "system.h"
>  #include "coretypes.h"
>  #include "target.h"
> @@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-objc.h"
>  #include "c-family/c-pragma.h"
>  #include "c-family/c-ubsan.h"
> +#include "cp/cp-name-hint.h"
>  #include "debug.h"
>  #include "plugin.h"
>  #include "builtins.h"
> @@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
> ;   /* An auto type is ok.  */
>    else if (TREE_CODE (type) != ARRAY_TYPE)
> {
> + auto_diagnostic_group d;
>   error ("variable %q#D has initializer but incomplete type", decl);
> + maybe_suggest_missing_header (input_location,
> +   TYPE_IDENTIFIER (type),
> +   TYPE_CONTEXT (type));
>   type = TREE_TYPE (decl) = error_mark_node;
> }
>    else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type
> @@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
> gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
>    else
> {
> + auto_diagnostic_group d;
>   error ("aggregate %q#D has incomplete type and cannot be defined",
>  decl);
> + maybe_suggest_missing_header (input_location,
> +   TYPE_IDENTIFIER (type),
> +   TYPE_CONTEXT (type));
>   /* Change the type so that assemble_variable will give
>  DECL an rtl we can live with: (mem (const_int 0)).  */
>   type = TREE_TYPE (decl) = error_mark_node;
> diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> index 6ac58a35b56..917b481c163 100644
> --- a/gcc/cp/name-lookup.cc
> +++ b/gcc/cp/name-lookup.cc
> @@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t location, 
> tree name)
>     for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't
>     applicable.  */
>  
> -static name_hint
> +name_hint
>  maybe_suggest_missing_header (location_t location, tree name, tree scope)
>  {
>    if (scope == NULL_TREE)
> diff --git a/gcc/testsuite/g++.dg/missing-header-pr110164.C 
> b/gcc/testsuite/g++.dg/missing-header-pr110164.C
> new file mode 100644
> index 000..15980071c38
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/missing-header-pr110164.C
> @@ -0,0 +1,10 @@
> +// { dg-require-effective-target c+

Re: [RFC/RFT,V2 0/3] Add compiler support for Kernel Control Flow Integrity

2023-06-21 Thread Kees Cook via Gcc-patches

On Sat, Mar 25, 2023 at 01:11:14AM -0700, Dan Li wrote:
> This series of patches is mainly used to support the control flow
> integrity protection of the linux kernel [1], which is similar to
> -fsanitize=kcfi in clang 16.0 [2,3].
> 
> Any suggestion please let me know :).

Hi Dan,

It's been a couple months, and I didn't see any other feedback on this
proposal. I was curious what the status of this work is. Are you able to
attend GNU Cauldron[1] this year? I'd love to see this get some traction
in GCC.

Thanks!

-Kees

[1] https://gcc.gnu.org/wiki/cauldron2023

-- 
Kees Cook

[PATCH] RISC-V: Refactor the integer ternary autovec pattern

2023-06-21 Thread Juzhe-Zhong

A long time ago, I encountered an ICE when trying to set the clobber register
to Pmode, and I have forgotten the reason.

So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload,
which makes the patterns look unreasonable.

Following Jeff's comments, I tried again; setting the clobber register to
Pmode works now, and the patterns look more reasonable.

All tests pass.  OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md (*fma): Set clobber to Pmode in expand
stage.
(*fma): Ditto.
(*fnma): Ditto.
(*fnma): Ditto.

---
 gcc/config/riscv/autovec.md | 54 +++--
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cf154b3737a..731ffe8ff89 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -596,40 +596,41 @@
 ;;result after reload_completed.
 (define_expand "fma4"
   [(parallel
-[(set (match_operand:VI 0 "register_operand" "=vr")
+[(set (match_operand:VI 0 "register_operand")
  (plus:VI
(mult:VI
- (match_operand:VI 1 "register_operand" " vr")
- (match_operand:VI 2 "register_operand" " vr"))
-   (match_operand:VI 3 "register_operand"   " vr")))
- (clobber (match_scratch:SI 4))])]
+ (match_operand:VI 1 "register_operand")
+ (match_operand:VI 2 "register_operand"))
+   (match_operand:VI 3 "register_operand")))
+ (clobber (match_dup 4))])]
   "TARGET_VECTOR"
-  {})
+  {
+operands[4] = gen_reg_rtx (Pmode);
+  })
 
-(define_insn_and_split "*fma"
+(define_insn_and_split "*fma"
   [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
(plus:VI
  (mult:VI
(match_operand:VI 1 "register_operand" " %0, vr,   vr")
(match_operand:VI 2 "register_operand" " vr, vr,   vr"))
  (match_operand:VI 3 "register_operand"   " vr,  0,   vr")))
-   (clobber (match_scratch:SI 4 "=r,r,r"))]
+   (clobber (match_operand:P 4 "register_operand" "=r,r,r"))]
   "TARGET_VECTOR"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
-PUT_MODE (operands[4], Pmode);
-riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
 if (which_alternative == 2)
   emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
-riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (mode),
- riscv_vector::RVV_TERNOP, ops, 
operands[4]);
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
(mode),
+  riscv_vector::RVV_TERNOP, ops, 
operands[4]);
 DONE;
   }
   [(set_attr "type" "vimuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")])
 
 ;; -
 ;;  [INT] VNMSAC and VNMSUB
@@ -641,40 +642,41 @@
 
 (define_expand "fnma4"
   [(parallel
-[(set (match_operand:VI 0 "register_operand" "=vr")
+[(set (match_operand:VI 0 "register_operand")
(minus:VI
- (match_operand:VI 3 "register_operand"   " vr")
+ (match_operand:VI 3 "register_operand")
  (mult:VI
-   (match_operand:VI 1 "register_operand" " vr")
-   (match_operand:VI 2 "register_operand" " vr"
- (clobber (match_scratch:SI 4))])]
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "register_operand"
+ (clobber (match_dup 4))])]
   "TARGET_VECTOR"
-  {})
+  {
+operands[4] = gen_reg_rtx (Pmode);
+  })
 
-(define_insn_and_split "*fnma"
+(define_insn_and_split "*fnma"
   [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
  (minus:VI
(match_operand:VI 3 "register_operand"   " vr,  0,   vr")
(mult:VI
  (match_operand:VI 1 "register_operand" " %0, vr,   vr")
  (match_operand:VI 2 "register_operand" " vr, vr,   vr"
-   (clobber (match_scratch:SI 4 "=r,r,r"))]
+   (clobber (match_operand:P 4 "register_operand" "=r,r,r"))]
   "TARGET_VECTOR"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
-PUT_MODE (operands[4], Pmode);
-riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
 if (which_alternative == 2)
   emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
-riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
-riscv_vector::RVV_TERNOP, ops, operands[4]);
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
+  riscv_vector::RVV_TERNOP, ops, 
operands[4]);
 DONE;
   }
   [(set_attr "type" "vimuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")])
 
 ;;

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-21 Thread Carl Love via Gcc-patches

On Mon, 2023-06-19 at 15:17 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:46, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch takes the tests in vsx-vector-6-p7.h,  vsx-
> > vector-
> > 6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of
> > smaller
> > test files by functionality rather than processor version.
> > 
> > The patch has been tested on Power 10 with no regressions.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> >Carl
> > 
> > --
> > rs6000: Update the vsx-vector-6.* tests.
> > 
> > The vsx-vector-6.h file is included into the processor specific
> > test files
> > vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The
> > .h file
> > contains a large number of vsx vector builtin tests.  The processor
> > specific files contain the number of instructions that the tests
> > are
> > expected to generate for that processor.  The tests are compile
> > only.
> > 
> > The tests are broken up into a series of files for related
> > tests.  The
> > new tests are runnable tests to verify the builtin argument types
> > and the
> 
> But the newly added test cases are all with "dg-do compile", it
> doesn't
> match what you said here.

Ah, yea, that is wrong.  Fixed.

> 
> > functional correctness of each test rather than verifying the type
> > and
> > number of instructions generated.
> 
> It's good to have more coverage with runnable case, but we miss some
> test
> coverages on the expected insn counts which cases p{7,8,9}.c can
> provide
> originally.  Unless we can ensure it's already tested somewhere else
> (do
> we? it wasn't stated in this patch), I think we still need those
> checks.

Yea, I was going with a runnable test and didn't include the
instruction counts.  Added back in.  Rather than doing it by processor
version (P8, P9, P10), I was able to do it by BE/LE.  The instruction
counts were the same for LE across processor versions, but there are a
few instruction counts that differ between BE and LE.

I did notice in one of the tests that the compiler computed the
answers at compile time and thus didn't actually generate the builtin
code.  After digging a little more I found a few more tests where the
compiler was doing the calculations and just inserting the answers.

So, I moved all of the tests to functions so the compiler would
actually generate the desired builtin code.  
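
The idea of moving the operation into its own function can be sketched as follows. This is a generic illustration, not code from the patch: `test_sqrt` and `check_sqrt` are hypothetical names, `std::sqrt` stands in for a vector builtin, and `noinline` is one possible way to keep the call out of line. Note that `noinline` alone does not rule out every interprocedural optimization; GCC also offers a stricter `noipa` attribute.

```cpp
#include <cmath>

// Hypothetical stand-in for a builtin under test.  Because the function
// is not inlined, its body has to handle an arbitrary SRC rather than
// being folded together with the constant at the call site.
__attribute__ ((noinline))
double test_sqrt (double src)
{
  return std::sqrt (src);
}

// The checker passes its input through the out-of-line function and
// compares against the expected value.
bool check_sqrt ()
{
  double src = 16.0;
  return test_sqrt (src) == 4.0;
}
```

If the operation were written directly in the checker with a constant input, the compiler would be free to replace it with the precomputed answer, which is the effect described above.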

> 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 319 +
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 305 +
> >  .../powerpc/vsx-vector-6-func-2op.c   | 278 
> >  .../powerpc/vsx-vector-6-func-3op.c   | 229 ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 429
> > ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 237 ++
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 --
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 --
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 --
> >  10 files changed, 1797 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p7.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p8.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p9.c
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > new file mode 100644
> > index 000..90a360ea158
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-ve

[PATCH ver 2] rs6000: Update the vsx-vector-6.* tests.

2023-06-21 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  
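
The macro approach mentioned above can be sketched generically like this. It is only an illustration of the token-pasting technique: `op_floor`, `op_ceil` and `DEFINE_CHECK` are hypothetical scalar stand-ins, not the vec_* builtins or the macros from the patch.

```cpp
#include <cmath>

// Hypothetical scalar stand-ins for the operations under test.
double op_floor (double x) { return std::floor (x); }
double op_ceil (double x) { return std::ceil (x); }

// Expands to a function check_NAME that calls op_NAME on SRC and
// compares the result with EXPECTED.  '##' pastes NAME into both
// identifiers, so one macro covers every operation.
#define DEFINE_CHECK(NAME, SRC, EXPECTED)   \
  bool check_##NAME ()                      \
  {                                         \
    return op_##NAME (SRC) == (EXPECTED);   \
  }

DEFINE_CHECK (floor, 2.5, 2.0)
DEFINE_CHECK (ceil, 2.5, 3.0)
```

Each `DEFINE_CHECK` line adds one test, so adding a case is a one-line change and all cases are guaranteed to share the same checking logic.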

The following patch takes the tests in vsx-vector-6-p7.h,
vsx-vector-6-p8.h, and vsx-vector-6-p9.h and reorganizes them into a series
of smaller test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a series of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather than verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 156 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 223 ++
 .../powerpc/vsx-vector-6-func-2op.c   | 142 +
 .../powerpc/vsx-vector-6-func-3op.c   | 273 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 205 +
 .../powerpc/vsx-vector-6-func-cmp.c   | 130 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 --
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1129 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..0d4e237673b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,156 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+  /* Macro to check the results for the various floating point argument tests.
+   */
+#define FLOAT_CHECK(NAME)  \
+  f_result = vec_##NAME(f_src);\
+   \
+  if ((f_result[0] != f_##NAME##_expected[0]) ||   \
+  (f_result[1] != f_##NAME##_expected[1]) ||   \
+  (f_result[2] != f_##NAME##_expected[2]) ||   \
+  (f_result[3] != f_##NAME##_expected[3])) \
+{  \
+  if (DEBUG) { \
+printf("ERROR: vec_%s (float) expected value does not match\n",\
+   #NAME);

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-21 Thread Thiago Jung Bauermann via Gcc-patches



Hello,

Jeff Law  writes:

> On 6/19/23 22:52, Tamar Christina wrote:
>
>>> It's a bit hackish, but could we reject the stack pointer for operand1 in
>>> the stack-tie?  And if we do so, does it help?
>> Yeah, this one I had to defer until later this week to look at more closely,
>> because what I'm wondering about is whether the optimization should apply to
>> frame-related RTX as well.
>> Looking at the description of RTX_FRAME_RELATED_P, this optimization may
>> end up de-optimizing RISC targets by creating an offset that is larger than
>> the offset which can be used from SP, making reload have to spill.  I.e.
>> sometimes the move was explicitly done.  So perhaps it should not apply to
>> RTX_FRAME_RELATED_P in find_oldest_value_reg and copyprop_hardreg_forward_1?
>> Other parts of this pass already seem to bail out in similar situations.  So
>> I needed to write some testcases to check what would happen in these cases,
>> hence the deferral to later in the week.
> Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably better
> in general to me.  The cases where we're looking to clean things up aren't
> really in the prologue/epilogue, but instead in the main body after register
> elimination has turned fp into sp + offset, thus making all kinds of things
> no longer valid.

The problems I reported were fixed by commits:

580b74a79146 "aarch64: Robustify stack tie handling"
079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"

Thanks!

But unfortunately I'm still seeing bootstrap failures (ICE segmentation
fault) in today's trunk with build config bootstrap-lto in both
armv8l-linux-gnueabihf and aarch64-linux-gnu.

If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
the stack pointer if possible" from trunk then both bootstraps succeed.

Here's the command I'm using to build on armv8l:

~/src/configure \
SHELL=/bin/bash \
--with-gnu-as \
--with-gnu-ld \
--disable-libmudflap \
--enable-lto \
--enable-shared \
--without-included-gettext \
--enable-nls \
--with-system-zlib \
--disable-sjlj-exceptions \
--enable-gnu-unique-object \
--enable-linker-build-id \
--disable-libstdcxx-pch \
--enable-c99 \
--enable-clocale=gnu \
--enable-libstdcxx-debug \
--enable-long-long \
--with-cloog=no \
--with-ppl=no \
--with-isl=no \
--disable-multilib \
--with-float=hard \
--with-fpu=neon-fp-armv8 \
--with-mode=thumb \
--with-arch=armv8-a \
--enable-threads=posix \
--enable-multiarch \
--enable-libstdcxx-time=yes \
--enable-gnu-indirect-function \
--disable-werror \
--enable-checking=yes \
--enable-bootstrap \
--with-build-config=bootstrap-lto \
--enable-languages=c,c++,fortran,lto \
&& make \
profiledbootstrap \
SHELL=/bin/bash \
-w \
-j 40 \
CFLAGS_FOR_BUILD="-pipe -g -O2" \
CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
LDFLAGS_FOR_BUILD="-static-libgcc" \
MAKEINFOFLAGS=--force \
BUILD_INFO="" \
MAKEINFO=echo

And here's the slightly different one for aarch64-linux:

~/src/configure \
SHELL=/bin/bash \
--with-gnu-as \
--with-gnu-ld \
--disable-libmudflap \
--enable-lto \
--enable-shared \
--without-included-gettext \
--enable-nls \
--with-system-zlib \
--disable-sjlj-exceptions \
--enable-gnu-unique-object \
--enable-linker-build-id \
--disable-libstdcxx-pch \
--enable-c99 \
--enable-clocale=gnu \
--enable-libstdcxx-debug \
--enable-long-long \
--with-cloog=no \
--with-ppl=no \
--with-isl=no \
--disable-multilib \
--enable-fix-cortex-a53-835769 \
--enable-fix-cortex-a53-843419 \
--with-arch=armv8-a \
--enable-threads=posix \
--enable-multiarch \
--enable-libstdcxx-time=yes \
--enable-gnu-indirect-function \
--disable-werror \
--enable-checking=yes \
--enable-bootstrap \
--with-build-config=bootstrap-lto \
--enable-languages=c,c++,fortran,lto \
&& make \
profiledbootstrap \
SHELL=/bin/bash \
-w \
-j 40 \
LDFLAGS_FOR_TARGET="-Wl,-fix-cortex-a53-843419" \
CFLAGS_FOR_BUILD="-pipe -g -O2" \
CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
LDFLAGS_FOR_BUILD="-static-libgcc" \
MAKEINFOFLAGS=--force \
BUILD_INFO="" \
MAKEINFO=echo

-- 
Thiago

RE: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-21 Thread Li, Pan2 via Gcc-patches

Hi there,

I tried to verify offloading by following the doc below.

https://gcc.gnu.org/wiki/Offloading#How_to_build_an_offloading-enabled_GCC

with some steps:

1. Build nvptx-tools.
2. Symlink nvptx-newlib into the GCC source tree.
3. Build the Nvidia PTX accel compiler.
4. Build the host compiler with nvptx as offload target. I don't have a GPU,
so I dropped the --with-cuda-driver=xxx option.
5. Run the build command, i.e. ./nvptx-tools/usr/local/bin/gcc -O0 -fopenmp
test.c -o test.elf.

The build completes successfully, but it looks like I cannot run it without a
GPU, and I am not sure whether this is good enough for validation.
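
For reference, the kind of test.c meant in step 5 could look like the sketch
below. This is a hypothetical file, not the one used above; the function name
sum_on_target is made up for illustration. It assumes a compiler that accepts
the combined OpenMP "target parallel for" construct. Without -fopenmp the
pragma is ignored and the loop simply runs on the host.

```c
/* Hypothetical offloading smoke test (not the test.c from this thread).
   Sums an array inside an OpenMP target region.  */
int
sum_on_target (const int *data, int n)
{
  int sum = 0;

  /* Map the input to the device and bring the reduction result back.  */
#pragma omp target parallel for map(to: data[0:n]) map(tofrom: sum) reduction(+: sum)
  for (int i = 0; i < n; i++)
    sum += data[i];

  return sum;
}
```

With the default offload policy, a build like this is expected to fall back to
host execution when no device is present, so a correct result alone does not
show that the nvptx path was exercised.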

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Wednesday, June 21, 2023 3:23 PM
To: Jakub Jelinek 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang ; 
kito.ch...@gmail.com; rguent...@suse.de
Subject: RE: [PATCH] RISC-V: Fix out of range memory access of machine mode 
table

Thanks Jakub, will fix the format issue and send the V3 patch, as well as try 
to validate it for offloading.

Pan

-----Original Message-----
From: Jakub Jelinek  
Sent: Wednesday, June 21, 2023 3:16 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang ; 
kito.ch...@gmail.com; rguent...@suse.de
Subject: Re: [PATCH] RISC-V: Fix out of range memory access of machine mode 
table

On Wed, Jun 21, 2023 at 06:59:08AM +, Li, Pan2 wrote:
>  inline machine_mode
>  bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
> -  return (machine_mode)
> -((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
> +  lto_input_block *input_block =  (class lto_input_block *) bp->stream;

Still 2 spaces instead of 1 here, otherwise it LGTM, but the important
question is if you have actually tested it with offloading, because only
that will verify it works correctly.
See https://gcc.gnu.org/wiki/Offloading for details.

Jakub
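
The bound computed in the quoted hunk can be illustrated with a small sketch.
The helper names below are hypothetical stand-ins (ceil_log2_sketch is a
simplified model of the ceil_log2 call in the patch, not GCC's implementation);
the point is only that a hard-coded 1 << 8 covers mode numbers below 256,
while a bound derived from the number of modes grows with it.

```c
/* Smallest k such that (1 << k) >= x, for x >= 1.  A simplified
   stand-in for the ceil_log2 call in the quoted patch.  */
static int
ceil_log2_sketch (unsigned long x)
{
  int k = 0;
  while ((1UL << k) < x)
    k++;
  return k;
}

/* Exclusive upper bound for packing an enum with NUM_VALUES values,
   mirroring "1 << ceil_log2 (MAX_MACHINE_MODE)" from the patch.  */
unsigned long
pack_bound (unsigned long num_values)
{
  return 1UL << ceil_log2_sketch (num_values);
}
```

For example, 256 values still fit the old bound of 256, but 257 values need a
bound of 512.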

Go patch committed: Determine types of Slice_{value, info} expressions

2023-06-21 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend determines the types of a couple of
expression types that accidentally failed to recurse into their
subexpressions.  The test case for this is https://go.dev/cl/505015.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
f42544e04a131cee886cb7cdc65df1e2f09baf8c
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index dbb2d68f909..a028350ba8e 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-6a1d165c2218cd127ee937a1f45599075762f716
+195060166e6045408a2cb95e6aa88c6f0b98f20b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 4ac55af7433..2112de6abfc 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -18307,6 +18307,16 @@ Slice_value_expression::do_traverse(Traverse* traverse)
   return TRAVERSE_CONTINUE;
 }
 
+// Determine type of a slice value.
+
+void
+Slice_value_expression::do_determine_type(const Type_context*)
+{
+  this->valmem_->determine_type_no_context();
+  this->len_->determine_type_no_context();
+  this->cap_->determine_type_no_context();
+}
+
 Expression*
 Slice_value_expression::do_copy()
 {
diff --git a/gcc/go/gofrontend/expressions.h b/gcc/go/gofrontend/expressions.h
index 3d7e78711bd..bdb7ccd010d 100644
--- a/gcc/go/gofrontend/expressions.h
+++ b/gcc/go/gofrontend/expressions.h
@@ -4364,8 +4364,7 @@ class Slice_value_expression : public Expression
   { return this->type_; }
 
   void
-  do_determine_type(const Type_context*)
-  { }
+  do_determine_type(const Type_context*);
 
   Expression*
   do_copy();
@@ -4419,7 +4418,7 @@ class Slice_info_expression : public Expression
 
   void
   do_determine_type(const Type_context*)
-  { }
+  { this->slice_->determine_type_no_context(); }
 
   Expression*
   do_copy()

Re: [PATCH] Introduce hardbool attribute for C

2023-06-21 Thread Alexandre Oliva via Gcc-patches

Thanks for the test.

Did you mean for me to incorporate it into the patch, or do you mean to
contribute it separately, if the feature happens to be accepted?

On Jun 19, 2023, Bernhard Reutner-Fischer  wrote:

> I don't see explicit tests with _Complex nor __complex__. Would we
> want to check these here, or are they handled through the "underlying"
> tests above?

Good question.  The notion of using complex types to hold booleans
hadn't even crossed my mind.

On the one hand, there doesn't seem to be reason to rule them out, and
that could go for literally any other type.

On the other, there doesn't seem to be any useful case for them.  Can
anyone think of one?

> I'd welcome a fortran interop note in the docs

Is there any good place for such interop notes?  I'm not sure I'm
best-suited to write them up, since Fortran is not a language I'm
very familiar with, but I suppose I could give it a try.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH zero-call-used-regs] Add leafy mode for zero-call-used-regs

2023-06-21 Thread Alexandre Oliva via Gcc-patches

Hello, Qing,

On Jun 16, 2023, Qing Zhao  wrote:

> As I mentioned in the previous round of review, I think that the documentation
>  might need to add more details on what’s the LEAFY mode,
> The purpose of it, and how to use it, provide more details to the end-users.

I'm afraid I'm having difficulty picturing what it is that you're
looking for.  The proposal incorporates, by reference, all the
documentation for 'used' and for 'all', and the way to use it is no
different.

>> +Same as @samp{used} in a leaf function, and same as @samp{all} in a
>> +nonleaf function.

If there was documentation on how to choose between e.g. all and used, I
suppose I could build on that to add this intermediate choice, but...  I
can't find any such docs, and I'm uncertain on whether adding that would
be useful to begin with.

Did you have something else in mind?


Re: [PATCH] Introduce hardbool attribute for C

2023-06-21 Thread Alexandre Oliva via Gcc-patches

On Jun 21, 2023, Qing Zhao  wrote:

> I see that you have testing case to check the above built_in_trap call
> is generated by FE.
> Do you have a testing case to check the trap is happening at runtime? 

I have written such tests, using type-punning, but I don't think our
testing infrastructure could take trapping as success and other results
as failure.

> So, when -ftrivial-auto-var-init presents, what’s the behavior for the
> hardened Boolean auto variables?

Good question.  This option was not even available when hardbool was
designed and implemented.  (tests) The deferred_init internal function
initializes with bit-patterns 0x00 or 0xfe, regardless of type, when the
data lives in memory, and otherwise forces the 0x00 bit pattern for
booleans, variable-sized types, types that cannot be accessed with a
single mode or for modes that don't have a set pattern.

It's hard for me to tell what "correct" or "expected" would be here.
Enumerators don't seem to be given special treatment.  Checked
enumerators, constrained integral subtypes, none of these would get
well-formed values or even checking at the assignments.

If I were to design this option myself, I'd probably arrange for it to
handle booleans (including hardened booleans) by zero-initializing as
false and pattern-initializing as true, though I realize that this could
be very confusing if one chose to use 0xfe as the value for false and/or
0x00 as the value for true.

I'd probably have arranged for the front-end to create the initializer
value, because expansion time is too late to figure it out: we may not
even have the front-end at hand any more, in case of lto compilation.

But with the current description and implementation, I guess the
behavior is correct, if not ideal: the bit patterns refer to the
representation, rather than to the language interpretation of the value.
When it comes to integral types, they may match, but floating-point,
fixed fractional types, offsets and multipliers, even boolean members of
larger structs...  not so much: the effect is that of a memset, rather
than that of an assignment of zero or of the pattern to a variable.
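
A small sketch of that memset-versus-assignment distinction, under the
assumption that pattern initialization can be modelled as filling every byte
of the object with one value. The helper names are hypothetical and this is a
model for illustration only, not what deferred_init expands to.

```c
#include <string.h>

/* Model of byte-pattern initialization: every byte of OBJ gets BYTE.  */
static void
pattern_fill (void *obj, size_t size, unsigned char byte)
{
  memset (obj, byte, size);
}

/* Return 1 if filling an unsigned int with BYTE gives the same value
   as assigning BYTE to it.  */
int
fill_matches_assignment_uint (unsigned char byte)
{
  unsigned int filled, assigned = byte;
  pattern_fill (&filled, sizeof filled, byte);
  return filled == assigned;
}

/* Return 1 if filling a float with BYTE gives the same representation
   as assigning the value BYTE to it.  */
int
fill_matches_assignment_float (unsigned char byte)
{
  float filled, assigned = (float) byte;
  pattern_fill (&filled, sizeof filled, byte);
  return memcmp (&filled, &assigned, sizeof filled) == 0;
}
```

On common targets (multi-byte int, IEEE 754 float) the zero pattern agrees
with assigning zero in both cases, while the 0xfe pattern agrees in neither.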

Now, I acknowledge that the decision to make implicit
zero-initialization of boolean types set them to the value for false,
rather than to all-bits-zero representation, is a departure from common
practice of zero-initialization yielding logical zero.  That was unusual
enough that I thought it worth mentioning in the docs.

> This might need to be documented and also handled correctly. 

I suppose the place to document this distinction between logical values
and representation would be under -ftrivial-auto-var-init.  That's
likely where someone using that option would look for guidance on how it
interacts with unusual types, and where exceptions to general
expectations WRT initialization would go.  Do you concur?

That said, it probably makes sense to refer to / mention that
-ftrivial-auto-var-init does not special-case hardened booleans in the
hardened booleans documentation.  I wonder if there are other
conflicting options I'm not even aware of.


Re: [PATCH] Convert remaining uses of value_range in ipa-*.cc to Value_Range.

2023-06-21 Thread Aldy Hernandez via Gcc-patches

Ping*2

On Wed, Jun 14, 2023, 14:10 Aldy Hernandez  wrote:

> PING
>
> On Mon, May 22, 2023 at 8:56 PM Aldy Hernandez  wrote:
> >
> > Minor cleanups to get rid of value_range in IPA.  There's only one left,
> > but it's in the switch code which is integer specific.
> >
> > OK?
> >
> > gcc/ChangeLog:
> >
> > * ipa-cp.cc (decide_whether_version_node): Adjust comment.
> > * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
> > for Value_Range.
> > (set_switch_stmt_execution_predicate): Same.
> > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
> > ---
> >  gcc/ipa-cp.cc|  3 +--
> >  gcc/ipa-fnsummary.cc | 22 ++
> >  gcc/ipa-prop.cc  |  9 +++--
> >  3 files changed, 18 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> > index 03273666ea2..2e64415096e 100644
> > --- a/gcc/ipa-cp.cc
> > +++ b/gcc/ipa-cp.cc
> > @@ -6287,8 +6287,7 @@ decide_whether_version_node (struct cgraph_node *node)
> > {
> >   /* If some values generated for self-recursive calls with
> >  arithmetic jump functions fall outside of the known
> > -value_range for the parameter, we can skip them.  VR interface
> > -supports this only for integers now.  */
> > +range for the parameter, we can skip them.  */
> >   if (TREE_CODE (val->value) == INTEGER_CST
> >   && !plats->m_value_range.bottom_p ()
> >   && !ipa_range_contains_p (plats->m_value_range.m_vr,
> > diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> > index 0474af8991e..1ce8501fe85 100644
> > --- a/gcc/ipa-fnsummary.cc
> > +++ b/gcc/ipa-fnsummary.cc
> > @@ -488,19 +488,20 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
> >   if (vr.varying_p () || vr.undefined_p ())
> > break;
> >
> > - value_range res;
> > + Value_Range res (op->type);
> >   if (!op->val[0])
> > {
> > + Value_Range varying (op->type);
> > + varying.set_varying (op->type);
> >   range_op_handler handler (op->code, op->type);
> >   if (!handler
> >   || !res.supports_type_p (op->type)
> > - || !handler.fold_range (res, op->type, vr,
> > - value_range (op->type)))
> > + || !handler.fold_range (res, op->type, vr, varying))
> > res.set_varying (op->type);
> > }
> >   else if (!op->val[1])
> > {
> > - value_range op0;
> > + Value_Range op0 (op->type);
> >   range_op_handler handler (op->code, op->type);
> >
> >   ipa_range_set_and_normalize (op0, op->val[0]);
> > @@ -518,14 +519,14 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
> > }
> >   if (!vr.varying_p () && !vr.undefined_p ())
> > {
> > - value_range res;
> > - value_range val_vr;
> > + int_range<2> res;
> > + Value_Range val_vr (TREE_TYPE (c->val));
> >   range_op_handler handler (c->code, boolean_type_node);
> >
> >   ipa_range_set_and_normalize (val_vr, c->val);
> >
> >   if (!handler
> > - || !res.supports_type_p (boolean_type_node)
> > + || !val_vr.supports_type_p (TREE_TYPE (c->val))
> >   || !handler.fold_range (res, boolean_type_node, vr, val_vr))
> > res.set_varying (boolean_type_node);
> >
> > @@ -1687,12 +1688,17 @@ set_switch_stmt_execution_predicate (struct ipa_func_body_info *fbi,
> >int bound_limit = opt_for_fn (fbi->node->decl,
> > param_ipa_max_switch_predicate_bounds);
> >int bound_count = 0;
> > -  value_range vr;
> > +  // This can safely be an integer range, as switches can only hold
> > +  // integers.
> > +  int_range<2> vr;
> >
> >get_range_query (cfun)->range_of_expr (vr, op);
> >if (vr.undefined_p ())
> >  vr.set_varying (TREE_TYPE (op));
> >tree vr_min, vr_max;
> > +  // ?? This entire function could use a rewrite to use the irange
> > +  // API, instead of trying to recreate its intersection/union logic.
> > +  // Any use of get_legacy_range() is a serious code smell.
> >value_range_kind vr_type = get_legacy_range (vr, vr_min, vr_max);
> >wide_int vr_wmin = wi::to_wide (vr_min);
> >wide_int vr_wmax = wi::to_wide (vr_max);
> > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> > index 6383bc11e0a..5f9e6dbbff2 100644
> > --- a/gcc/ipa-prop.cc
> > +++ b/gcc/ipa-prop.cc
> > @@ -

Re: [PATCH] Convert ipa_jump_func to use ipa_vr instead of a value_range.

2023-06-21 Thread Aldy Hernandez via Gcc-patches

Ping*2

On Wed, Jun 14, 2023, 14:09 Aldy Hernandez  wrote:

> PING
>
> On Mon, May 22, 2023 at 8:56 PM Aldy Hernandez  wrote:
> >
> > This patch converts the ipa_jump_func code to use the type agnostic
> > ipa_vr suitable for GC instead of value_range which is integer specific.
> >
> > I've disabled the range caching to simplify the patch for review, but
> > it is handled in the next patch in the series.
> >
> > OK?
> >
> > gcc/ChangeLog:
> >
> > * ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
> > * ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
> > (ipa_set_jfunc_vr): Take a range.
> > (ipa_compute_jump_functions_for_edge): Pass range to
> > ipa_set_jfunc_vr.
> > (ipa_write_jump_function): Call streamer write helper.
> > (ipa_read_jump_function): Call streamer read helper.
> > * ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.
> > ---
> >  gcc/ipa-cp.cc   | 15 +++
> >  gcc/ipa-prop.cc | 70 ++---
> >  gcc/ipa-prop.h  |  5 +++-
> >  3 files changed, 44 insertions(+), 46 deletions(-)
> >
> > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> > index bdbc2184b5f..03273666ea2 100644
> > --- a/gcc/ipa-cp.cc
> > +++ b/gcc/ipa-cp.cc
> > @@ -1928,6 +1928,21 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
> >   && !dst_vr.undefined_p ());
> >  }
> >
> > +/* Same as above, but the SRC_VR argument is an IPA_VR which must
> > +   first be extracted onto a vrange.  */
> > +
> > +static bool
> > +ipa_vr_operation_and_type_effects (vrange &dst_vr,
> > +  const ipa_vr &src_vr,
> > +  enum tree_code operation,
> > +  tree dst_type, tree src_type)
> > +{
> > +  Value_Range tmp;
> > +  src_vr.get_vrange (tmp);
> > +  return ipa_vr_operation_and_type_effects (dst_vr, tmp, operation,
> > +   dst_type, src_type);
> > +}
> > +
> >  /* Determine range of JFUNC given that INFO describes the caller node or
> > the one it is inlined to, CS is the call graph edge corresponding to JFUNC
> > and PARM_TYPE of the parameter.  */
> > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> > index bbfe0f8aa45..c46a89f1b49 100644
> > --- a/gcc/ipa-prop.cc
> > +++ b/gcc/ipa-prop.cc
> > @@ -2287,9 +2287,10 @@ ipa_set_jfunc_bits (ipa_jump_func *jf, const widest_int &value,
> >  /* Return a pointer to a value_range just like *TMP, but either find it in
> > ipa_vr_hash_table or allocate it in GC memory.  TMP->equiv must be NULL.  */
> >
> > -static value_range *
> > -ipa_get_value_range (value_range *tmp)
> > +static ipa_vr *
> > +ipa_get_value_range (const vrange &tmp)
> >  {
> > +  /* FIXME: Add hashing support.
> >value_range **slot = ipa_vr_hash_table->find_slot (tmp, INSERT);
> >if (*slot)
> >  return *slot;
> > @@ -2297,40 +2298,27 @@ ipa_get_value_range (value_range *tmp)
> >value_range *vr = new (ggc_alloc ()) value_range;
> >*vr = *tmp;
> >*slot = vr;
> > +  */
> > +  ipa_vr *vr = new (ggc_alloc ()) ipa_vr (tmp);
> >
> >return vr;
> >  }
> >
> > -/* Return a pointer to a value range consisting of TYPE, MIN, MAX and an empty
> > -   equiv set. Use hash table in order to avoid creating multiple same copies of
> > -   value_ranges.  */
> > -
> > -static value_range *
> > -ipa_get_value_range (enum value_range_kind kind, tree min, tree max)
> > -{
> > -  value_range tmp (TREE_TYPE (min),
> > -  wi::to_wide (min), wi::to_wide (max), kind);
> > -  return ipa_get_value_range (&tmp);
> > -}
> > -
> > -/* Assign to JF a pointer to a value_range structure with TYPE, MIN and MAX and
> > -   a NULL equiv bitmap.  Use hash table in order to avoid creating multiple
> > -   same value_range structures.  */
> > +/* Assign to JF a pointer to a value_range just like TMP but either fetch a
> > +   copy from ipa_vr_hash_table or allocate a new on in GC memory.  */
> >
> >  static void
> > -ipa_set_jfunc_vr (ipa_jump_func *jf, enum value_range_kind type,
> > - tree min, tree max)
> > +ipa_set_jfunc_vr (ipa_jump_func *jf, const vrange &tmp)
> >  {
> > -  jf->m_vr = ipa_get_value_range (type, min, max);
> > +  jf->m_vr = ipa_get_value_range (tmp);
> >  }
> >
> > -/* Assign to JF a pointer to a value_range just like TMP but either fetch a
> > -   copy from ipa_vr_hash_table or allocate a new on in GC memory.  */
> > -
> >  static void
> > -ipa_set_jfunc_vr (ipa_jump_func *jf, value_range *tmp)
> > +ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr)
> >  {
> > -  jf->m_vr = ipa_get_value_range (tmp);
> > +  Value_Range tmp;
> > +  vr.get_vrange (tmp);
> > +  ipa_set_jfunc_vr (jf, tmp);
> >  }
> >
> >  /* Compute jump function for all arguments of callsite CS and insert the
> > @@ -2392,8 +2380,8 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
> >
> >   if (addr_nonzer

Re: [PATCH] Implement ipa_vr hashing.

2023-06-21 Thread Aldy Hernandez via Gcc-patches

Ping*2

On Wed, Jun 14, 2023, 14:11 Aldy Hernandez  wrote:

> PING
>
> On Sat, Jun 10, 2023 at 10:30 PM Aldy Hernandez  wrote:
> >
> >
> >
> > On 5/29/23 16:51, Martin Jambor wrote:
> > > Hi,
> > >
> > > On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> > >> Implement hashing for ipa_vr.  When all is said and done, all these
> > >> patches incur a 7.64% slowdown for ipa-cp, which is entirely covered by
> > >> the similar 7% increase in this area last week.  So we get type agnostic
> > >> ranges with "infinite" range precision close to free.
> > >
> > > Do you know why/where this slow-down happens?  Do we perhaps want to
> > > limit the "infiniteness" a little somehow?
> >
> > I addressed the slow down in another mail.
> >
> > >
> > > Also, jump functions live for a long time, have you looked at how memory
> > > hungry they become?  I hope that the hashing would be good at preventing
> > > any issues.
> >
> > On a side-note, the caching does help.  On a (mistaken) hunch, I had
> > played around with removing caching for everything but UNDEFINED/VARYING
> > and zero/nonzero to simplify things, but the cache hit ratio was still
> > surprisingly high (+80%).  So good job there :-).
> >
> > >
> > > Generally, I think I'm OK with the patches if the impact on memory is not
> > > too bad, though I guess they depend on the one I looked at last week, so
> > > we may focus on that one first.
> > I'm not sure whether this was an OK for the other patches, given you
> > approved the first patch, so I'll hold off until you give the go-ahead.
> >
> > Thanks.
> > Aldy
>

Re: [Patch, fortran] PR107900 Select type with intrinsic type inside associate causes ICE / Segmentation fault

2023-06-21 Thread Paul Richard Thomas via Gcc-patches

Hi Both,

> while I only had a minor question regarding gfc_is_ptr_fcn(),
> can you still try to enlighten me why that second part
> was necessary?  (I believed it to be redundant and may have
> overlooked the obvious.)

Blast! I forgot about checking that. Lurking in the back of my mind
and going back to the first days of OOP in gfortran is a distinction
between a class entity with the pointer attribute and one whose data
component has the class_pointer attribute. I'll check it out and do
whatever is needed.


> > +  else if (context && gfc_is_ptr_fcn (assoc->target))
> > + {
> > +   if (!gfc_notify_std (GFC_STD_F2018, "%qs at %L associated to "
> > +"pointer function target being used in a "
> > +"variable definition context (%s)", name,
> > +&e->where, context))
>
> I'm curious why you decided to put context in braces and not simply use
> quotes as per %qs?

That's the way it's done in the preceding errors. I had to keep the
context in context, so to speak.

> > +/* Build the associate name  */
> > +static int
> > +build_associate_name (const char *name, gfc_expr **e1, gfc_expr **e2)
> > +{
>
> > +return 1;
>
> > +  return 0;
> > +}
>
> I've gone through the frontend recently and changed several such
> boolean functions to use bool where appropriate. May I ask folks to use
> narrower types in new code, please?
> Iff later in the pipeline it is considered appropriate or beneficial to
> promote types, these will eventually be promoted.
>
> > diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
> > index e6a4337c0d2..18589e17843 100644
> > --- a/gcc/fortran/trans-decl.cc
> > +++ b/gcc/fortran/trans-decl.cc
>
> > @@ -1906,6 +1915,7 @@ gfc_get_symbol_decl (gfc_symbol * sym)
> >   gcc_assert (!sym->value || sym->value->expr_type == EXPR_NULL);
> >  }
> >
> > +
>

'twas accidental. There had previously been another version of the fix
that I commented out and the extra line crept in when I deleted it.
Thanks for the spot.

>
> Please kindly excuse my comment and, again, thanks!
>
> >gfc_finish_var_decl (decl, sym);
> >
> >if (sym->ts.type == BT_CHARACTER)

Regards

Paul

Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, 21 Jun 2023, Jeff Law wrote:

> 
> 
> On 6/21/23 00:41, Richard Biener wrote:
> >> I thought during the introduction of erroneous path isolation that we
> >> concluded stores, calls and such had observable side effects that must be
> >> preserved, even when we hit a block that leads to __builtin_unreachable.
> > 
> > Indeed, I remember we repeatedly hit this in the past.  But
> > double-checking I see that we instrument
> > 
> >if (x)
> >  *(int *)0 = 0;
> > 
> > as
> > 
> > [local count: 1073741824]:
> >if (x_2(D) != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 536870913]:
> >MEM[(int *)0B] ={v} 0;
> >__builtin_trap ();
> > 
> > path isolation doesn't seem to use __builtin_unreachable ().  I did
> > not add __builtin_trap () as possible sink (but I did want to treat
> > __builtin_unreachable () and __builtin_unreachable_trap () the same
> > way).  The pass also marks the offending store as volatile.
> Nuts.  I mixed up trap vs unreachable in my own head.  Though I think for the
> purposes of this issue they should be treated the same.  The only difference
> is one actively halts the code, the other silently lets it keep going.

I think there's a difference in that __builtin_trap () is observable
while __builtin_unreachable () is not, and reaching __builtin_unreachable ()
invokes undefined behavior while reaching __builtin_trap () does not.

So the isolation code marking the trapping code volatile should be
enough and the trap () is just there to end the basic block
(and maybe be on the safe side to really trap).
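
To make the distinction concrete, here is a minimal sketch with the same store
placed ahead of each builtin.  The function names are hypothetical and not
taken from the patch; it assumes a compiler providing the GCC builtins.  It
only shows the source-level shape under discussion and does not claim what DSE
or path isolation does with either store.

```c
/* Store followed by a trap: reaching the trap is defined behavior
   and halts the program, so the trap itself is observable.  */
int
store_then_trap (int *p, int x)
{
  if (x)
    {
      *p = 1;
      __builtin_trap ();
    }
  return *p;
}

/* Store followed by unreachable: reaching it is undefined behavior,
   so the compiler may assume X is zero on every real execution.  */
int
store_then_unreachable (int *p, int x)
{
  if (x)
    {
      *p = 1;
      __builtin_unreachable ();
    }
  return *p;
}
```

Calling either function with x == 0 is well defined and just returns *p; only
the x != 0 path differs between the two.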

Richard.

> > So yes, I think preserving the original trap kind (if there is any)
> > is important and it still seems to work.  I don't remember whether
> > we have any test coverage for that though.  I'll also note that
> > __builtin_trap () has virtual operands (def and use) while
> > __builtin_unreachable[_trap] () are 'const'.  Honza correctly
> > says they should probably be novops instead of 'const' preserving
> > the fact that they have side-effects.
> If we have test coverage it's probably minimal -- a few things to validate
> proper behavior around builtin_trap plus whatever regressions came up.
> 
> 
> > I think it's desirable for assertions.  Since we elide plain
> > __builtin_unreachable () and fall thru wherever it leads that
> > shouldn't be an issue.
> > 
> > If I manually add a __builtin_unreachable () to the above case
> > I see the *(int *)0 = 0; store DSEd.  Maybe we should avoid
> > removing stores that might trap here?  POSIX wise such a trap
> > could be a way to jump out of the path leading to unreachable ()
> > via siglongjmp ...
> Yea, removing the store seems wrong.  As you note, someone could have a
> handler to catch the store, then longjmp elsewhere to continue some kind of
> sensible execution.
> 
> The erroneous path isolation bits have some code to try and clean up the bogus
> path a bit.  Ideally leaving a single statement with undefined behavior and
> the trap.  I wonder if there's any code you could re-use from that effort.
> 
> Essentially a mini pass that cleans up paths post-dominated by a
> builtin_unreachable or builtin_trap.

Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-06-21 Thread Richard Biener via Gcc-patches

On Wed, 21 Jun 2023, Jeff Law wrote:

> 
> 
> On 6/21/23 01:49, Richard Biener via Gcc-patches wrote:
> > The following addresses a miscompilation by RTL scheduling related
> > to the representation of masked stores.  For that we have
> > 
> > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ]
> > [90])
> >  (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]
> >  )
> >  (const_int -4 [0xfffc] [1 MEM
> >   [(int *)vectp_b.12_28]+0 S64 A32])
> >  (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
> >  (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
> >  (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]
> >  )
> >  (const_int -4 [0xfffc] [1 MEM
> >   [(int *)vectp_b.12_28]+0 S64
> >  A32])
> > 
> > and specifically the memory attributes
> > 
> >[1 MEM  [(int *)vectp_b.12_28]+0 S64 A32]
> > 
> > are problematic.  They tell us the instruction stores and reads a full
> > vector which it of course does not.  There isn't any good MEM_EXPR
> > we can use here (we lack a way to just specify a pointer and restrict
> > info for example), and since the MEMs have a vector mode it's
> > difficult in general as passes do not need to look at the memory
> > attributes at all.
> > 
> > The easiest way to avoid running into the alias analysis problem is
> > to scrap the MEM_EXPR when we expand the internal functions for
> > partial loads/stores.  That avoids the disambiguation we run into
> > which is realizing that we store to an object smaller than
> > the size of the mode we appear to store.
> > 
> > After the patch we see just
> > 
> >[1  S64 A32]
> > 
> > so we preserve the alias set, the alignment and the size (the size
> > is redundant if the MEM isn't BLKmode).  That's still not good
> > in case the RTL alias oracle would implement the same
> > disambiguation but it fends off the gimple one.
> > 
> > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> > and --param=vect-partial-vector-usage=1.
> > 
> > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
> > side we might instead want to use a BLKmode MEM?  Any better
> > ideas here?
> I'd expect that using BLKmode will fend off the RTL aliasing code.

I suspect there's no way to specify the desired semantics?  OTOH
code that looks at the MEM operand only and not the insn (which
should have some UNSPEC wrapped) needs to be conservative, so maybe
the alias code shouldn't assume that a (mem:V16SI ..) actually
performs an access of the size of V16SI at the specified location?
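
As a scalar model of why the full-vector size is misleading, consider the
sketch below.  The names masked_store and bytes_written are hypothetical and
this is not how the expander represents anything; it only shows that a masked
store touches the lanes selected by the mask, which can be far fewer bytes
than the vector mode suggests.

```c
#include <stddef.h>

/* Scalar model of a masked (partial) vector store: lane I of DST is
   written only when MASK[I] is nonzero.  */
void
masked_store (int *dst, const int *src, const unsigned char *mask,
              size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    if (mask[i])
      dst[i] = src[i];
}

/* Number of bytes the masked store above actually writes.  */
size_t
bytes_written (const unsigned char *mask, size_t lanes)
{
  size_t active = 0;
  for (size_t i = 0; i < lanes; i++)
    if (mask[i])
      active++;
  return active * sizeof (int);
}
```

With a mask of {1, 0, 1, 0} over four lanes, only two ints are written and the
other two lanes keep their old contents, even though the "mode" covers four.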

Anyway, any opinion on the actual patch?  It's enough to fix the
observed miscompilation.

Thanks,
Richard.
