Re: [PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

2023-10-09 Thread juzhe.zhong
Do you mean adding a check for whether it is vectorized or not? Sounds reasonable; I can add that in another patch.

Replied Message:
From: Jeff Law
Date: 10/09/2023 21:51
To: Juzhe-Zhong, gcc-patches@gcc.gnu.org
Cc: rguent...@suse.de
Subject: Re: [PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

On 10/9/23 07:39, Juzhe-Zhong wrote:
> RVV vectorize it with stride5 load_lanes.
>  
> gcc/testsuite/ChangeLog:
>  
>     * gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.
OK.

As a follow-up, would it make sense to test the .vect dump for something  
else in the ! {vec_load_lanes && vect_strided5 } case to verify that it  
does and continues to be vectorized for that configuration?

jeff
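
For reference, a follow-up check along those lines might look roughly like the following sketch (illustrative only; the exact dump string and effective-target names would need to match what the vect testsuite infrastructure provides):

```c
/* Hypothetical addition to gcc.dg/vect/slp-perm-4.c: on targets that do NOT
   use load-lanes with stride 5, still require that the loop is vectorized.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"
     { target { ! { vect_load_lanes && vect_strided5 } } } } } */
```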



Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-10 Thread juzhe.zhong
I am working on it. Currently, we have about 50+ additional FAILs after enabling vectorization. Some of them need to be fixed in the middle-end; e.g. Richard fixed a missed CSE optimization. Some need fixes in the test cases. I am analyzing each FAIL one by one. I prefer to postpone this patch since it will cause some additional FAILs, and I will handle that eventually after a full coverage analysis.

Replied Message:
From: Jeff Law
Date: 10/10/2023 21:33
To: juzhe.zh...@rivai.ai, macro
Cc: gcc-patches, Robin Dapp, Kito.cheng, Richard Biener
Subject: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On 10/9/23 19:13, juzhe.zh...@rivai.ai wrote:
> Oh. I realize this patch increase FAIL that I recently fixed:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632247.html  
> 
>  
> This fails because RVV doesn't have vec_pack_trunc_optab (the loop vectorizer
> fails the first time but succeeds the second time),
> so RVV dumps FOLD_EXTRACT_LAST 4 times instead of 2 (ARM SVE dumps it 2
> times because it has vec_pack_trunc_optab).
> 
> I think the root cause of RVV failing multiple "vect" tests is
> that we don't enable the vec_pack/vec_unpack/... optabs;
> we still vectorize successfully, and we want to enable those tests
> (mostly it is just a different approach to vectorizing, which causes dump FAILs,
> because of some changes I made previously in the middle-end).
> 
> So enabling "vec_pack" for RVV will fix some FAILs but introduce some
> other FAILs.
>  
> CC to Richi to see more reasonable suggestions.
So what is the summary on Maciej's patch to enable vec_pack_trunc?  I.e.,
is it something we should move forward with as-is, is it superseded by
your work in this space or does it need further investigation because of  
differences in testing methodologies or something else?

jeff



Re: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization

2023-10-12 Thread juzhe.zhong
LGTM.

Replied Message:
From: pan2...@intel.com
Date: 10/13/2023 13:33
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization


Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread juzhe.zhong
LGTM.

Replied Message:
From: pan2...@intel.com
Date: 11/02/2023 19:48
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator


Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-05 Thread juzhe.zhong
LGTM.

Replied Message:
From: pan2...@intel.com
Date: 11/05/2023 17:30
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec


Re: [PATCH] RISC-V: Early expand DImode vec_duplicate in RV32 system

2023-11-06 Thread juzhe.zhong
OK, will add it.

Replied Message:
From: Kito Cheng
Date: 11/06/2023 20:46
To: juzhe.zh...@rivai.ai
Cc: kito.cheng, gcc-patches, jeffreyalaw, Robin Dapp
Subject: Re: Re: [PATCH] RISC-V: Early expand DImode vec_duplicate in RV32 system

I would prefer to add a dedicated test case to test that, so that we
could also cover that even if we didn't enable multi-lib testing for
RV32, and I suppose that should only require compile test for part of
that test case ?
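
A compile-only test along the lines Kito asks for might look roughly like this (purely illustrative; the real test case, options and any autovec preference parameter in use at the time may differ):

```c
/* { dg-do compile } */
/* { dg-options "-march=rv32gcv -mabi=ilp32d -O3" } */

/* Broadcasting a 64-bit scalar on an RV32 target requires a DImode
   vec_duplicate, the pattern that used to ICE in the vsetvl pass.  */
void
foo (long long *out, long long x, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = x;
}
```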

On Mon, Nov 6, 2023 at 8:41 PM juzhe.zh...@rivai.ai
 wrote:
>
> A testcase already exists on trunk; it was added by Li Pan recently when supporting the rounding-mode autovec.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635280.html
>
> math-llrintf-run-0.c passed on RV64 but causes an ICE on RV32.
>
>
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-11-06 20:38
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH] RISC-V: Early expand DImode vec_duplicate in RV32 system
> Could you add a testcase? other than that LGTM.
>
> On Mon, Nov 6, 2023 at 8:27 PM Juzhe-Zhong  wrote:
> >
> > An ICE was discovered in recent rounding autovec support:
> >
> > config/riscv/riscv-v.cc:4314
> >    65 | }
> >   | ^
> > 0x1fa5223 riscv_vector::validate_change_or_fail(rtx_def*, rtx_def**,
> > rtx_def*, bool)
> > /home/pli/repos/gcc/222/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/riscv-v.cc:4314
> > 0x1fb1aa2 pre_vsetvl::remove_avl_operand()
> > /home/pli/repos/gcc/222/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/riscv-vsetvl.cc:3342
> > 0x1fb18c1 pre_vsetvl::cleaup()
> > /home/pli/repos/gcc/222/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/riscv-vsetvl.cc:3308
> > 0x1fb216d pass_vsetvl::lazy_vsetvl()
> > /home/pli/repos/gcc/222/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/riscv-vsetvl.cc:3480
> > 0x1fb2214 pass_vsetvl::execute(function*)
> > /home/pli/repos/gcc/222/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/riscv-vsetvl.cc:3504
> >
> > The root cause is that the RA reloads into (set (reg) (vec_duplicate:DI ...)). However, this is not valid on an RV32 system,
> > since we don't have a single instruction that broadcasts a DImode scalar on RV32.
> > We should expand it early for RV32 systems.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/predicates.md: Refine predicate.
> > * config/riscv/riscv-protos.h (can_be_broadcasted_p): New function.
> > * config/riscv/riscv-v.cc (can_be_broadcasted_p): Ditto.
> > * config/riscv/vector.md (vec_duplicate): New pattern.
> > (*vec_duplicate): Adapt pattern.
> >
> > ---
> >  gcc/config/riscv/predicates.md  |  9 +
> >  gcc/config/riscv/riscv-protos.h |  1 +
> >  gcc/config/riscv/riscv-v.cc | 20 
> >  gcc/config/riscv/vector.md  | 20 +++-
> >  4 files changed, 41 insertions(+), 9 deletions(-)
> >
> > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> > index db18054607f..df1c66f3a76 100644
> > --- a/gcc/config/riscv/predicates.md
> > +++ b/gcc/config/riscv/predicates.md
> > @@ -553,14 +553,7 @@
> >
> >  ;; The scalar operand can be directly broadcast by RVV instructions.
> >  (define_predicate "direct_broadcast_operand"
> > -  (and (match_test "!(reload_completed && !FLOAT_MODE_P (GET_MODE (op))
> > -   && (register_operand (op, GET_MODE (op)) || CONST_INT_P (op)
> > -   || rtx_equal_p (op, CONST0_RTX (GET_MODE (op
> > -   && maybe_gt (GET_MODE_BITSIZE (GET_MODE (op)), GET_MODE_BITSIZE (Pmode)))")
> > -    (ior (match_test "rtx_equal_p (op, CONST0_RTX (GET_MODE (op)))")
> > - (ior (match_code "const_int,const_poly_int")
> > -  (ior (match_operand 0 "register_operand")
> > -   (match_test "satisfies_constraint_Wdm (op)"))
> > +  (match_test "riscv_vector::can_be_broadcasted_p (op)"))
> >
> >  ;; A CONST_INT operand that has exactly two bits cleared.
> >  (define_predicate "const_nottwobits_operand"
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index 6cbf2130f88..acae00f653f 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -595,6 +595,7 @@ uint8_t get_sew (rtx_insn *);
> >  enum vlmul_type get_vlmul (rtx_insn *);
> >  int count_regno_occurrences (rtx_insn *, unsigned int);
> >  bool imm_avl_p (machine_mode);
> > +bool can_be_broadcasted_p (rtx);
> >  }
> >
> >  /* We classify builtin types into two classes:
> > diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> > index 80d2bb9e289..a64946213c3 100644
> > --- a/gcc/config/riscv/riscv-v.cc
> > +++ b/gcc/config/riscv/riscv-v.cc
> > @@ -4417,4 +4417,24 @@ count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
> >    return count;
> >  }
> >
> > +/* Return true if the OP can be directly broadcasted

Re: [PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

2023-11-06 Thread juzhe.zhong
Not sure who is maintaining this branch; I always develop on master. CCing other RISC-V folks.

Replied Message:
From: Maxim Blinov
Date: 11/06/2023 21:13
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, juzhe.zh...@rivai.ai, maxim.bli...@imgtec.com
Subject: Re: [PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

On Mon, 6 Nov 2023 at 13:07, Richard Biener wrote:
> I see
>
> DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
>    vec_extract, vec_extract)
>
> ?

Oh, you're right! I should have checked the master branch first... and
I was even wondering why it wasn't marked as such. Should perhaps
cherry pick this for gcc-13-with-riscv-opts?



[PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread juzhe.zhong
> gcc/
>
>  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
>  that vectorization factor is a compile-time constant.
>
> ---
>   gcc/tree-vect-loop-manip.cc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6aa3d2ed0bf..1ad1961c788 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
> /* It's guaranteed that vector loop bound before vectorization is at
>least VF, so set range information for newly generated var. */
> -  if (new_var_p)
> +  if (new_var_p && vf.is_constant ())
>   {
> value_range vr (type,
> wi::to_wide (build_int_cst (type, vf)),

I don't think we need to apply this limit in the case of RVV auto-vectorization.
I have talked with Kito and I have a full plan for supporting RVV auto-vectorization.

We are going to support RVV auto-vectorization in 3 configurations according to the
RVV ISA spec (a quick numeric check follows the list):
1. -march=zve32* supports QI and HI auto-vectorization via VNx4QImode and VNx2HImode
2. -march=zve64* supports QI, HI and SI auto-vectorization via VNx8QImode, VNx4HImode
and VNx2SImode
3. -march=v* supports QI, HI, SI and DI auto-vectorization via VNx16QImode, VNx8HImode,
VNx4SImode and VNx2DImode
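
These mode choices simply restate the minimum VLEN each extension guarantees in the RVV spec; as a quick check:

  zve32*: VNx4QImode  = 4 x 8 bits  = 32 bits  (minimum VLEN for Zve32)
  zve64*: VNx8QImode  = 8 x 8 bits  = 64 bits  (minimum VLEN for Zve64)
  v*:     VNx16QImode = 16 x 8 bits = 128 bits (minimum VLEN for V)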

I will support them in GCC 14. The current loop vectorizer works well for us; there is
no need to fix it.
Thanks.


juzhe.zh...@rivai.ai


Re: Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread juzhe.zhong
Currently, upstream GCC is not ready to support auto-vectorization.
I am building the basic RVV infrastructure, and it needs more testing.
I can't support auto-vec now since it depends on the infrastructure that I am
building.
I have open-sourced "rvv-next" in the RISC-V foundation repo, which fully supports
intrinsics && auto-vec.
You can either wait for upstream GCC or develop based on rvv-next.



juzhe.zh...@rivai.ai
 
From: Michael Collison
Date: 2023-02-23 01:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; richard.sandiford; richard.guenther
Subject: Re: [PATCH] vect: Check that vector factor is a compile-time constant
Juzhe,
I disagree with this comment. There are many stakeholders for autovectorization 
and waiting until GCC 14 is not a viable solution for us as well as other 
stakeholders ready to begin work on autovectorization.
As we discussed I have been moving forward with patches for autovectorization 
and am preparing to send them to gcc-patches. This assert is preventing code 
from compiling and needs to be addressed.
If you have a solution in either the RISCV backend or in this file can you 
please present it?
On 2/22/23 10:27, juzhe.zh...@rivai.ai wrote:
> gcc/
>
>  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
>  that vectorization factor is a compile-time constant.
>
> ---
>   gcc/tree-vect-loop-manip.cc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6aa3d2ed0bf..1ad1961c788 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
> /* It's guaranteed that vector loop bound before vectorization is at
>least VF, so set range information for newly generated var. */
> -  if (new_var_p)
> +  if (new_var_p && vf.is_constant ())
>   {
> value_range vr (type,
> wi::to_wide (build_int_cst (type, vf)),

I don't think we need to apply this limit in case of RVV auto-vectorization.
I have talked with Kito and I have a full solution of supporting RVV solution.

We are going to support RVV auto-vectorization in 3 configuration according to 
RVV ISA spec:
1. -march=zve32* support QI and HI auto-vectorization by VNx4QImode and 
VNx2HImode
2. -march=zve64* support QI and HI and SI auto-vectorization by VNx8QImode and 
VNx4HImode and VNx2SImode
3. -march=v* support QI and HI and SI and DI auto-vectorization by VNx16QImode 
and VNx8HImode and VNx4SImode and VNx2DImode

I will support them in GCC 14. Current loop vectorizer works well for us no 
need to fix it. 
Thanks.


juzhe.zh...@rivai.ai


Re: Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread juzhe.zhong
Besides, GCC 13 is currently in stage 4.
Unlike the infrastructure that I am building for intrinsics && auto-vec, which is
safe and will not affect the original RISC-V port functionality,
auto-vectorization will potentially affect the original RISC-V port
functionality, which is not safe to support at the current stage of GCC 13.



juzhe.zh...@rivai.ai
 
From: Michael Collison
Date: 2023-02-23 01:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; richard.sandiford; richard.guenther
Subject: Re: [PATCH] vect: Check that vector factor is a compile-time constant
Juzhe,
I disagree with this comment. There are many stakeholders for autovectorization 
and waiting until GCC 14 is not a viable solution for us as well as other 
stakeholders ready to begin work on autovectorization.
As we discussed I have been moving forward with patches for autovectorization 
and am preparing to send them to gcc-patches. This assert is preventing code 
from compiling and needs to be addressed.
If you have a solution in either the RISCV backend or in this file can you 
please present it?
On 2/22/23 10:27, juzhe.zh...@rivai.ai wrote:
> gcc/
>
>  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
>  that vectorization factor is a compile-time constant.
>
> ---
>   gcc/tree-vect-loop-manip.cc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6aa3d2ed0bf..1ad1961c788 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
> /* It's guaranteed that vector loop bound before vectorization is at
>least VF, so set range information for newly generated var. */
> -  if (new_var_p)
> +  if (new_var_p && vf.is_constant ())
>   {
> value_range vr (type,
> wi::to_wide (build_int_cst (type, vf)),

I don't think we need to apply this limit in case of RVV auto-vectorization.
I have talked with Kito and I have a full solution of supporting RVV solution.

We are going to support RVV auto-vectorization in 3 configuration according to 
RVV ISA spec:
1. -march=zve32* support QI and HI auto-vectorization by VNx4QImode and 
VNx2HImode
2. -march=zve64* support QI and HI and SI auto-vectorization by VNx8QImode and 
VNx4HImode and VNx2SImode
3. -march=v* support QI and HI and SI and DI auto-vectorization by VNx16QImode 
and VNx8HImode and VNx4SImode and VNx2DImode

I will support them in GCC 14. Current loop vectorizer works well for us no 
need to fix it. 
Thanks.


juzhe.zh...@rivai.ai


Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-01 Thread juzhe.zhong
Let me first introduce RVV load/store basics and stack allocation.
For scalable vector memory allocation, we allocate memory according to the machine
vector length.
To get this CPU vector-length value (a runtime invariant that is unknown at compile
time), we have an instruction, csrr vlenb.
For example, csrr a5,vlenb stores the CPU's single-register vector length
(expressed as a byte size) in register a5.
A single register's size in bytes (GET_MODE_SIZE) is the poly value (8,8) bytes. That
means after csrr a5,vlenb, a5 holds a value of size poly (8,8) bytes.

Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI all have the same bytesize
poly (1,1), so their storage consumes the same size.
Meaning that when we want to allocate memory or stack space for register
spills, we should first csrr a5,vlenb, then srli a5,a5,3 (i.e. a5 = a5/8).
Then a5 holds the bytesize value poly (1,1). VNx1BI, VNx2BI, VNx4BI and
VNx8BI all go through the same process as described above. They all consume
the same memory storage size, since we can't model them accurately according to
precision or bitsize.

They consume the same storage (I agree it would be better to model them more
accurately in terms of memory storage consumption).

Well, even though they consume the same amount of memory storage, I can make their
memory access behavior (load/store) accurate by
emitting the appropriate RVV instructions for them according to the RVV ISA.

VNx1BI, VNx2BI, VNx4BI and VNx8BI all consume the same memory storage of size
poly (1,1).
The instructions for these modes are as follows:
VNx1BI: vsevl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
VNx2BI: vsevl e8mf8 + vlm,  loading 1/4 of poly (1,1) storage.
VNx4BI: vsevl e8mf8 + vlm,  loading 1/2 of poly (1,1) storage.
VNx8BI: vsevl e8mf8 + vlm,  loading 1 of poly (1,1) storage.

So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI, VNx8BI
accurately according to precision or bitsize.
This implementation is fine even though their memory storage size is not accurate.

However, the problem is that since they have the same bytesize, GCC will think
they are the same and do some incorrect statement elimination:

(Note: Load same memory base)
load v0 VNx1BI from base0
load v1 VNx2BI from base0
load v2 VNx4BI from base0
load v3 VNx8BI from base0

store v0 base1
store v1 base2
store v2 base3
store v3 base4

For this program sequence, GCC will eliminate the last 3 load instructions.

Then it will become:

load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size 
(1,1) memory data)

store v0 base1
store v0 base2
store v0 base3
store v0 base4

This is what we want to fix. I think as long as we have a way to
differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
GCC will not do the incorrect elimination for RVV.

I think it can work fine even though these 4 modes have an inaccurate memory
storage size, as long as their data memory access (load/store) behavior is accurate.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-03-01 21:19
To: Pan Li via Gcc-patches
CC: Richard Biener; Pan Li; juzhe.zhong@rivai.ai; pan2.li; Kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Pan Li via Gcc-patches  writes:
> I am not very familiar with the memory pattern, maybe juzhe can provide more 
> information or correct me if anything is misleading.
>
> The different precision try to resolve the below bugs, the second vlm(with 
> different size of load bytes compared to first one)
> is eliminated because vbool8 and vbool16 have the same precision size, aka 
> [8, 8].
>
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi    a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi    a1,a1,200
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v   v24,0(a1)
 
But I think it's important to think about the patch as more than a way
of fixing the bug above.  The aim has to be to describe the modes as they
really are.
 
I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
A GET_MODE_SIZE that is too small would cause problems.  So would a
GET_MODE_SIZE that is too big.
 
Like Richard says, I think the question comes down to the amount of padding.
Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
of padding for even X and 0 bits of padding for odd X?
 
I agree getting rid of GET_MODE_SIZE and representing everything in bits
would avoid the problem at this point, but I think it would just be pushing
the difficulty elsewhere.  E.g. stack layout will be "interesting" if we
can't work in byte sizes.
 
Thanks,
Richard
 


Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-01 Thread juzhe.zhong
Sorry for the misleading typo.

>> VNx1BI: vsevl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
>> VNx2BI: vsevl e8mf8 + vlm,  loading 1/4 of poly (1,1) storage.
>> VNx4BI: vsevl e8mf8 + vlm,  loading 1/2 of poly (1,1) storage.
>> VNx8BI: vsevl e8mf8 + vlm,  loading 1 of poly (1,1) storage.

It should be:
 VNx1BI: vsevl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
 VNx2BI: vsevl e8mf4 + vlm,  loading 1/4 of poly (1,1) storage.
 VNx4BI: vsevl e8mf2 + vlm,  loading 1/2 of poly (1,1) storage.
 VNx8BI: vsevl e8m1 + vlm,  loading 1 of poly (1,1) storage.

Please be aware of this. Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-03-01 21:50
To: richard.sandiford; gcc-patches
CC: rguenther; Pan Li; pan2.li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Let's me first introduce RVV load/store basics  and stack allocation.
For scalable vector memory allocation, we allocate memory according to machine 
vector-length.
To get this CPU vector-length value (runtime invariant but compile time 
unknown), we have an instruction call csrr vlenb.
For example, csrr a5,vlenb (store CPU a single register vector-length value 
(describe as bytesize) in a5 register).
A single register size in bytes (GET_MODE_SIZE) is poly value (8,8) bytes. That 
means csrr a5,vlenb, a5 has the value of size poly (8,8) bytes.

Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI has the same bytesize 
poly (1,1). So their storage consumes the same size.
Meaning when we want to allocate a memory storge or stack for register 
spillings, we should first csrr a5, vlenb, then slli a5,a5,3 (means a5 = a5/8)
Then, a5 has the bytesize value of poly (1,1). All VNx1BI, VNx2BI, VNx4BI, 
VNx8BI are doing the same process as I described above. They all consume
the same memory storage size since we can't model them accurately according to 
precision or you bitsize.

They consume the same storage (I am agree it's better to model them more 
accurately in case of memory storage comsuming).

Well, even though they are consuming same size memory storage, I can make their 
memory accessing behavior (load/store) accurately by
emiting  the accurate RVV instruction for them according to RVV ISA.

VNx1BI,VNx2BI, VNx4BI, VNx8BI are consuming same memory storage with size  poly 
(1,1)
The instruction for these modes as follows:
VNx1BI: vsevl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
VNx2BI: vsevl e8mf8 + vlm,  loading 1/4 of poly (1,1) storage.
VNx4BI: vsevl e8mf8 + vlm,  loading 1/2 of poly (1,1) storage.
VNx8BI: vsevl e8mf8 + vlm,  loading 1 of poly (1,1) storage.

So base on these, It's fine that we don't model VNx1BI,VNx2BI, VNx4BI, VNx8BI 
accurately according to precision or bitsize.
This implementation is fine even though their memory storage is not accurate.

However, the problem is that since they have the same bytesize, GCC will think 
they are the same and do some incorrect statement elimination:

(Note: Load same memory base)
load v0 VNx1BI from base0
load v1 VNx2BI from base0
load v2 VNx4BI from base0
load v3 VNx8BI from base0

store v0 base1
store v1 base2
store v2 base3
store v3 base4

This program sequence, in GCC, it will eliminate the last 3 load instructions.

Then it will become:

load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size 
(1,1) memory data)

store v0 base1
store v0 base2
store v0 base3
store v0 base4

This is what we want to fix. I think as long as we can have the way to 
differentiate VNx1BI,VNx2BI, VNx4BI, VNx8BI
and GCC will not do th incorrect elimination for RVV. 

I think it can work fine  even though these 4 modes consume inaccurate memory 
storage size
but accurate data memory access load store behavior.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-03-01 21:19
To: Pan Li via Gcc-patches
CC: Richard Biener; Pan Li; juzhe.zhong@rivai.ai; pan2.li; Kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Pan Li via Gcc-patches  writes:
> I am not very familiar with the memory pattern, maybe juzhe can provide more 
> information or correct me if anything is misleading.
>
> The different precision try to resolve the below bugs, the second vlm(with 
> different size of load bytes compared to first one)
> is eliminated because vbool8 and vbool16 have the same precision size, aka 
> [8, 8].
>
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi    a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi    a1,a1,200
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v   v24,0(a1)
 
But I think it's important to think about the patch as more than a way
of fixing the bug above.  The aim has to be to describe the modes as they
really are.
 
I do

Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-01 Thread juzhe.zhong
>> So given the above I think that modeling the size as being the same
>> but with accurate precision would work.  It's then only the size of the
>> padding in bytes we cannot represent with poly-int which should be fine.

>> Correct?
Yes.

>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> memory address well-defined?  That is, how is the padding handled
>> by the machine load/store instructions?

Storing VNx1BI stores the data from address 0 ~ 1/8 poly (1,1) and keeps the memory
data at address 1/8 poly (1,1) ~ 2/8 poly (1,1) unchanged.
Loading VNx2BI will load 0 ~ 2/8 poly (1,1); note that 0 ~ 1/8 poly (1,1) is the
data that we stored above, and 1/8 poly (1,1) ~ 2/8 poly (1,1) is the original
memory data.
You can see here for this case (LLVM):
https://godbolt.org/z/P9e1adrd3
foo:# @foo
vsetvli a2, zero, e8, mf8, ta, ma
vsm.v   v0, (a0)
vsetvli a2, zero, e8, mf4, ta, ma
vlm.v   v8, (a0)
vsm.v   v8, (a1)
ret

We can also do the same in GCC, as long as we can differentiate VNx1BI and
VNx2BI and GCC does not eliminate statements that have the same bytesize but
different precision.

First we emit vsetvl e8mf8 + vsm for VNx1BI,
then we emit vsetvl e8mf4 + vlm for VNx2BI.
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-03-01 22:03
To: juzhe.zhong
CC: richard.sandiford; gcc-patches; Pan Li; pan2.li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Wed, 1 Mar 2023, Richard Biener wrote:
 
> On Wed, 1 Mar 2023, juzhe.zh...@rivai.ai wrote:
> 
> > Let's me first introduce RVV load/store basics  and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to 
> > machine vector-length.
> > To get this CPU vector-length value (runtime invariant but compile time 
> > unknown), we have an instruction call csrr vlenb.
> > For example, csrr a5,vlenb (store CPU a single register vector-length value 
> > (describe as bytesize) in a5 register).
> > A single register size in bytes (GET_MODE_SIZE) is poly value (8,8) bytes. 
> > That means csrr a5,vlenb, a5 has the value of size poly (8,8) bytes.
> > 
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI has the same 
> > bytesize poly (1,1). So their storage consumes the same size.
> > Meaning when we want to allocate a memory storge or stack for register 
> > spillings, we should first csrr a5, vlenb, then slli a5,a5,3 (means a5 = 
> > a5/8)
> > Then, a5 has the bytesize value of poly (1,1). All VNx1BI, VNx2BI, VNx4BI, 
> > VNx8BI are doing the same process as I described above. They all consume
> > the same memory storage size since we can't model them accurately according 
> > to precision or you bitsize.
> > 
> > They consume the same storage (I am agree it's better to model them more 
> > accurately in case of memory storage comsuming).
> > 
> > Well, even though they are consuming same size memory storage, I can make 
> > their memory accessing behavior (load/store) accurately by
> > emiting  the accurate RVV instruction for them according to RVV ISA.
> > 
> > VNx1BI,VNx2BI, VNx4BI, VNx8BI are consuming same memory storage with size  
> > poly (1,1)
> > The instruction for these modes as follows:
> > VNx1BI: vsevl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
> > VNx2BI: vsevl e8mf8 + vlm,  loading 1/4 of poly (1,1) storage.
> > VNx4BI: vsevl e8mf8 + vlm,  loading 1/2 of poly (1,1) storage.
> > VNx8BI: vsevl e8mf8 + vlm,  loading 1 of poly (1,1) storage.
> > 
> > So base on these, It's fine that we don't model VNx1BI,VNx2BI, VNx4BI, 
> > VNx8BI accurately according to precision or bitsize.
> > This implementation is fine even though their memory storage is not 
> > accurate.
> > 
> > However, the problem is that since they have the same bytesize, GCC will 
> > think they are the same and do some incorrect statement elimination:
> > 
> > (Note: Load same memory base)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> > 
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> > 
> > This program sequence, in GCC, it will eliminate the last 3 load 
> > instructions.
> > 
> > Then it will become:
> > 
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly 
> > size (1,1) memory data)
> > 
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
> > 
> > This is wha

Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-01 Thread juzhe.zhong
>> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
>> bytes loaded by vlm.v and stored by vstm.v (after the appropriate vsetvl)?
>> Or is the GCC size larger in some cases than the number of bytes
>> loaded and stored?
For VNx1BI, VNx2BI, VNx4BI and VNx8BI, we allocate the larger memory or stack size for
register spilling according to ADJUST_BYTESIZE.
After the appropriate vsetvl, VNx1BI loads/stores 1/8 of ADJUST_BYTESIZE
(vsetvl e8mf8).
After the appropriate vsetvl, VNx2BI loads/stores 2/8 of ADJUST_BYTESIZE
(vsetvl e8mf4).
After the appropriate vsetvl, VNx4BI loads/stores 4/8 of ADJUST_BYTESIZE
(vsetvl e8mf2).
After the appropriate vsetvl, VNx8BI loads/stores 8/8 of ADJUST_BYTESIZE
(vsetvl e8m1).

Note: except for these 4 machine modes, for all other RVV machine modes
ADJUST_BYTESIZE equals the real number of bytes of the load/store instruction
that the RVV ISA defines.
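
A concrete illustration, added here for clarity (assuming VLEN = 128, so csrr vlenb gives 16 and poly (1,1) evaluates to 2 bytes = 16 bits):

  VNx8BI: 16 mask bits -> fills the whole 2-byte slot
  VNx4BI:  8 mask bits -> uses 1/2 of the slot
  VNx2BI:  4 mask bits -> uses 1/4 of the slot
  VNx1BI:  2 mask bits -> uses 1/8 of the slot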

Well, as I said, it's fine that we allocate larger memory for VNx1BI, VNx2BI and
VNx4BI; we can emit the appropriate vsetvl to guarantee correctness in the RISC-V
backend according to the machine_mode, as long as GCC doesn't do the incorrect
elimination in the middle-end.

Besides, poly (1,1) is 1/8 of the machine vector length, which is already a really
small number, and it is the real number of bytes loaded/stored for VNx8BI.
You can say that VNx1BI, VNx2BI and VNx4BI consume more memory than we actually
load/store with the appropriate vsetvl, since they have the same ADJUST_BYTESIZE as
VNx8BI. However, I think that is totally fine so far, as long as we can guarantee
correctness; optimizing such memory storage consumption is trivial.

>> And does it equal the size of the corresponding LLVM machine type?

Well, for some reason, in the case of register spilling, LLVM consumes much more
memory than GCC, and it always does whole-register loads/stores (a full vector
register's worth) for register spilling.
That's another story (I am not going to talk too much about this since it's a quite
ugly implementation). They don't model the types accurately according to the RVV ISA
for register spilling.

In the case of a normal load/store like:
vbool8_t v2 = *(vbool8_t*)in;  *(vbool8_t*)(out + 100) = v2;
the load/store instructions in their codegen are accurate.
Even though their instructions are accurate for load/store access behavior, I am not
sure whether the size of their machine type is accurate.

For example, in the IR representation, VNx1BI of GCC is represented as
vscale x 1 x i1 and VNx2BI of GCC as vscale x 2 x i1 in LLVM IR.
I am not sure about the bytesize of vscale x 1 x i1 and vscale x 2 x i1;
I didn't take a deep look at it.

I think this question is not that important. No matter whether VNx1BI and VNx2BI are
modeled accurately in terms of ADJUST_BYTESIZE in GCC, or vscale x 1 x i1 and
vscale x 2 x i1 are modeled accurately in terms of their bytesize, I think that as
long as we can emit the appropriate vsetvl + vlm/vsm it's totally fine for RVV, even
though in some cases the memory allocation is not accurate in the compiler.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-03-02 00:14
To: Li, Pan2
CC: juzhe.zhong@rivai.ai; rguenther; gcc-patches; Pan Li; kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
"Li, Pan2"  writes:
> Thanks all for so much valuable and helpful materials.
>
> As I understand (Please help to correct me if any mistake.), for the VNx*BI 
> (aka, 1, 2, 4, 8, 16, 32, 64),
> the precision and mode size need to be adjusted as below.
>
> Precision size [1, 2, 4, 8, 16, 32, 64]
> Mode size [1, 1, 1, 1, 2, 4, 8]
>
> Given that, if we ignore the self-test failure, only the adjust_precision 
> part is able to fix the bug I mentioned.
> The genmode will first get the precision, and then leverage the mode_size = 
> exact_div / 8 to generate.
> Meanwhile, it also provides the adjust_mode_size after the mode_size 
> generation.
>
> The riscv parts has the mode_size_adjust already and the value of mode_size 
> will be overridden by the adjustments.
 
Ah, OK!  In that case, would the following help:
 
Turn:
 
  mode_size[E_%smode] = exact_div (mode_precision[E_%smode], BITS_PER_UNIT);
 
into:
 
  if (!multiple_p (mode_precision[E_%smode], BITS_PER_UNIT,
   &mode_size[E_%smode]))
mode_size[E_%smode] = -1;
 
where -1 is an "obviously wrong" value.
 
Ports that might hit the -1 are then responsible for setting the size
later, via ADJUST_BYTESIZE.
 
After all the adjustments are complete, genmodes asserts that no size is
known_eq to -1.
 
That way, target-independent code doesn't need to guess what the
correct behaviour is.
 
Does the eventual value set by ADJUST_BYTESIZE equal the real number of
bytes loaded by vlm.v and stored by vstm.v (after the appropriate vsetvl)?
And does it equal t

RISC-V: Add auto-vectorization support

2023-03-03 Thread juzhe.zhong
>> This series of patches adds foundational support for RISC-V 
>> autovectorization. These patches are based on the current upstream rvv 
>> vector intrinsic support and is not a new implementation. Most of the 
>> implementation consists of adding the new vector cost model, the 
>> autovectorization patterns themselves and target hooks.
>> This implementation only provides support for integer addition and 
>> subtraction as a proof of concept.
>> As discussed on this list, if these patches are approved they will be 
>> merged into a "auto-vectorization" branch once gcc-13 branches for release.
>> There are two known issues related to crashes (assert failures) 
>> associated with tree vectorization; one of which I have sent a patch for 
>> and have received feedback. I will be sending a patch for the second 
>> issue tomorrow.

These patches have many issues:
1. You should not add arithmetic operations without supporting auto-vectorization
loads/stores.
2. The RVV cost model is totally incorrect, since it is just my experimental work
without any benchmark tuning.
3. There are ICEs in auto-vectorization based on the current upstream framework.
4. The vector-length/LMUL compile option is not ratified; we can't push an
unratified compile option. The compile option should be consistent with LLVM.
5. The current RVV instruction machine descriptions are not stable; you cannot
support auto-vec based on unstable machine descriptions.
etc., so many issues.

So I totally disagree with this set of patches. These patches come from my
original, ugly, experimental RVV work in the RISC-V repo:
https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next
which I have already abandoned (I no longer maintain it).

Currently, we (I && Kito) have finished all intrinsics (except segment
instructions) and the machine descriptions,
and we will keep testing, fine-tuning && fixing bugs until the GCC 13 release. We
should wait until the machine descriptions
are stable to support auto-vec. So please don't do any auto-vec work during stage 4
of GCC 13.

I have an elegant implementation in my downstream,
and I will start on auto-vec when GCC 14 is open.

Thanks.


juzhe.zh...@rivai.ai


Re: Re: [PATCH] RISC-V: Add fault first load C/C++ support

2023-03-08 Thread juzhe.zhong
Addressed the comment and fixed it in this V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613608.html



juzhe.zh...@rivai.ai
 
From: Bernhard Reutner-Fischer
Date: 2023-03-09 05:16
To: juzhe.zhong; gcc-patches
CC: kito.cheng; Ju-Zhe Zhong
Subject: Re: [PATCH] RISC-V: Add fault first load C/C++ support
On 7 March 2023 07:21:23 CET, juzhe.zh...@rivai.ai wrote:
>From: Ju-Zhe Zhong 
>
 
>+class vleff : public function_base
>+{
>+public:
>+  unsigned int call_properties (const function_instance &) const override
>+  {
>+return CP_READ_MEMORY | CP_WRITE_CSR;
>+  }
>+
>+  gimple *fold (gimple_folder &f) const override
>+  {
>+/* fold vleff (const *base, size_t *new_vl, size_t vl)
>+
>+   > vleff (const *base, size_t vl)
>+  new_vl = MEM_REF[read_vl ()].  */
>+
>+auto_vec<tree, 8> vargs;
 
Where is that magic 8 coming from?
 
Wouldn't you rather have one temporary to hold this manually CSEd
 
nargs = gimple_call_num_args (f.call) - 2;
 
which you would use throughout this function as it does not seem to change?
 
Would you reserve something based off nargs for the auto_vec above?
If not, please add a comment where the 8 comes from?
 
thanks,
 
>+
>+for (unsigned i = 0; i < gimple_call_num_args (f.call); i++)
>+  {
>+ /* Exclude size_t *new_vl argument.  */
>+ if (i == gimple_call_num_args (f.call) - 2)
>+   continue;
>+
>+ vargs.quick_push (gimple_call_arg (f.call, i));
>+  }
>+
>+gimple *repl = gimple_build_call_vec (gimple_call_fn (f.call), vargs);
>+gimple_call_set_lhs (repl, f.lhs);
>+
>+/* Handle size_t *new_vl by read_vl.  */
>+tree new_vl = gimple_call_arg (f.call, gimple_call_num_args (f.call) - 2);
>+if (integer_zerop (new_vl))
>+  {
>+ /* This case happens when user passes the nullptr to new_vl argument.
>+In this case, we just need to ignore the new_vl argument and return
>+vleff instruction directly. */
>+ return repl;
>+  }
>+
>+tree tmp_var = create_tmp_var (size_type_node, "new_vl");
>+tree decl = get_read_vl_decl ();
>+gimple *g = gimple_build_call (decl, 0);
>+gimple_call_set_lhs (g, tmp_var);
>+tree indirect
>+  = fold_build2 (MEM_REF, size_type_node,
>+  gimple_call_arg (f.call,
>+   gimple_call_num_args (f.call) - 2),
>+  build_int_cst (build_pointer_type (size_type_node), 0));
>+gassign *assign = gimple_build_assign (indirect, tmp_var);
>+
>+gsi_insert_after (f.gsi, assign, GSI_SAME_STMT);
>+gsi_insert_after (f.gsi, g, GSI_SAME_STMT);
>+return repl;
>+  }
>+
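
For reference, the shape of the cleanup Bernhard hints at would presumably be something like the following (a hypothetical sketch, not the committed fix; new_vl_idx is an invented name):

```c
/* Compute the position of the size_t *new_vl argument once, and size the
   argument vector from the call itself instead of a magic constant.  */
unsigned nargs = gimple_call_num_args (f.call);
unsigned new_vl_idx = nargs - 2;      /* the size_t *new_vl argument  */
auto_vec<tree> vargs (nargs - 1);     /* every argument except new_vl  */

for (unsigned i = 0; i < nargs; i++)
  if (i != new_vl_idx)
    vargs.quick_push (gimple_call_arg (f.call, i));
```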
 
 


Re: Re: [PATCH] RISC-V: Refine reduction RA constraint according to RVV ISA

2023-03-14 Thread juzhe.zhong
According to the RVV ISA, "The destination vector register can overlap the
source operands, including the mask register."
That means we can have vredsum.vs v0,v8,v9,v0.t. This patch refines the
constraint to allow this, which the current RA constraint doesn't.
As you can see, "vd" is used to match "vm", and vd doesn't include the mask
register (v0).
This trivial optimization gives our RA 1 more register to allocate,
which is overall beneficial to the RA.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-03-15 02:05
To: juzhe.zhong; gcc-patches
CC: kito.cheng
Subject: Re: [PATCH] RISC-V: Refine reduction RA constraint according to RVV ISA
 
 
On 3/13/23 03:05, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> According to RVV ISA:
> 14. Vector Reduction Operations
> 
> "The destination vector register can overlap the source operands, including 
> the mask register."
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Refine RA constraint.
This feels like it ought to wait for gcc-14 as well.
 
One question though, why even bother with the matching constraint at all 
in these patterns?  ISTM it doesn't really accomplish anything. 
Removing it allows a single alternative to handle all the possibilities.
 
Jeff
 


Re: [PATCH V3] RISC-V: Adjust scalar_to_vec cost

2024-01-12 Thread juzhe.zhong
VLA has been a known issue for a long time. GCC doesn't have much CSE optimization for VLA vectors, and it would be a big piece of work to investigate what's going on. I think most CSE optimizations for precomputed results apply to VLS loops, so as long as we do a good job on the cost model and pick the appropriate VLS loop, it's not a big issue and not a high priority for me.

Replied Message:
From: Robin Dapp
Date: 01/12/2024 18:10
To: Juzhe-Zhong, gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com, kito.ch...@gmail.com, kito.ch...@sifive.com, jeffreya...@gmail.com
Subject: Re: [PATCH V3] RISC-V: Adjust scalar_to_vec cost

> Tested on both RV32/RV64 no regression, Ok for trunk ?

Yes, thanks!

Btw out of curiosity, did you see why we actually fail to
optimize away the VLA loop?  We should open a bug for that
I suppose.

Regards
 Robin




Re: [PATCH v2] RISC-V: remove param riscv-vector-abi. [PR113538]

2024-01-25 Thread juzhe.zhong
LGTM.

Replied Message:
From: yanzhang.w...@intel.com
Date: 01/25/2024 21:06
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, kito.ch...@sifive.com, pan2...@intel.com, yanzhang.w...@intel.com
Subject: [PATCH v2] RISC-V: remove param riscv-vector-abi. [PR113538]


Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread juzhe.zhong
I didn't make it a run test since I couldn't reproduce the issue on my local simulator, whether QEMU or Spike, so it's better to check the vsetvl asm. "Full available" is not consistent between the LCM analysis and earliest fusion, so it's safe to postpone it.

Replied Message:
From: Robin Dapp
Date: 12/13/2023 20:08
To: Juzhe-Zhong, gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com, kito.ch...@gmail.com, kito.ch...@sifive.com, jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

Hi Juzhe,

in general looks OK to me.

Just a question for understanding:

> -  if (header_info.valid_p ()
> -      && (anticipated_exp_p (header_info) || block_info.full_available))

Why is full_available true if we cannot use it?

> +/* { dg-do compile } */

It would be nice if we could make this a run test as well.

Regards
 Robin



Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread juzhe.zhong
Do you mean adding some comments to the tests?

Replied Message:
From: Robin Dapp
Date: 12/13/2023 20:16
To: juzhe.zhong
Cc: rdapp@gmail.com, gcc-patches@gcc.gnu.org, kito.ch...@gmail.com, kito.ch...@sifive.com, jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]
> I don't choose to run since I didn't have issue run on my local
> simulator no matter qemu or spike.

Yes it was flaky.  That's kind of expected with the out-of-bounds
writes we did.  They can depend on runtime environment and other
factors.  Of course it's a bit counterintuitive to add a (before)
passing test but, with the proper comment, if it ever FAILed at some
point in the future we'd have a point of reference.  

Regards
 Robin



Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread juzhe.zhong
OK, will add it later.

Replied Message:
From: Robin Dapp
Date: 12/13/2023 20:23
To: juzhe.zhong
Cc: rdapp@gmail.com, gcc-patches@gcc.gnu.org, kito.ch...@gmail.com, kito.ch...@sifive.com, jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

> Do you mean add some comments in tests?

I meant add it as a run test as well and comment that the test
has caused out-of-bounds writes before and passed by the time of
adding it (or so) and is kept regardless.

Regards
 Robin



Re: [PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988

2023-12-13 Thread juzhe.zhong
LGTM from my side, but I'd like to see Robin's comments. Thanks.

Replied Message:
From: pan2...@intel.com
Date: 12/13/2023 21:49
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, rdapp@gmail.com
Subject: [PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988


Re: [PATCH v1] RISC-V: Fix POLY INT handle bug

2023-12-17 Thread juzhe.zhong
LGTM.

Replied Message:
From: pan2...@intel.com
Date: 12/18/2023 08:22
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Fix POLY INT handle bug


Re: [PATCH] RISC-V: Add viota missed avl_type attribute

2023-12-17 Thread juzhe.zhong
LGTM.

Replied Message:
From: Li Xu
Date: 12/18/2023 09:04
To: gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com, pal...@dabbelt.com, juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Add viota missed avl_type attribute


Re: [PATCH] testsuite: Fix dump checks under different riscv-sim for RVV.

2023-12-18 Thread juzhe.zhong
OK.

Replied Message:
From: Li Xu
Date: 12/19/2023 13:31
To: gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com, pal...@dabbelt.com, juzhe.zh...@rivai.ai, xuli
Subject: [PATCH] testsuite: Fix dump checks under different riscv-sim for RVV.


Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-07 Thread juzhe.zhong
I am on vacation today; I will be back tomorrow or late tonight. I think we can land XTheadVector before the Spring Festival as long as it is not invasive to RVV 1.0.

Replied Message:
From: joshua
Date: 01/08/2024 11:17
To: Kito Cheng
Cc: juzhe.zh...@rivai.ai, jeffreyalaw, gcc-patches, Jim Wilson, palmer, andrew, philipp.tomsich, christoph.muellner, jinma, cooper.qu
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

Hi Kito,

Thank you for your support.
So even during stage 4, we can merge this for GCC 14?





--
From: Kito Cheng
Sent: Monday, January 8, 2024 11:06
To: joshua
Cc: "juzhe.zh...@rivai.ai"; jeffreyalaw; "gcc-patches"; Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.


I am ok with merging this for GCC 14, as we discussed several times in
the RISC-V GCC sync up meeting, I think at least we reach consensus
among Jeff Law, Palmer Dabbelt and me.

But please be careful: don't break anything for standard vector stuff.

On Mon, Jan 8, 2024 at 10:11 AM joshua  wrote:
>
> Hi Juzhe,
>
> Stage 3 will close today and there are still some patches left that
> haven't been reviewed.
> So is it possible to get xtheadvector merged in GCC-14?
> We emailed Kito regarding this, but haven't got any reply yet.
>
> Joshua
>
>
>
>
>
>
> --
> From: juzhe.zh...@rivai.ai
> Sent: Thursday, January 4, 2024 17:18
> To: "cooper.joshua"; jeffreyalaw; "gcc-patches"
> Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> \ No newline at end of file
> Each file needs newline.
>
>
> I am not able to review arch stuff. This needs kito.
>
>
> Besides, Andrew Pinski want us defer theadvector to GCC-15.
>
>
> I have no strong opinion here.
>
>
> juzhe.zh...@rivai.ai
>
>
> From: joshua
> Sent: 2024-01-04 17:15
> To: 钟居哲 (Juzhe Zhong); Jeff Law; gcc-patches
> Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
> Subject: Re:Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
> Hi Juzhe,
>
> So is the following patch that this patch relies on OK to commit?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
> Joshua
>
>
>
>
> --
> From: 钟居哲 (Juzhe Zhong)
> Sent: Tuesday, January 2, 2024 06:57
> To: Jeff Law; "cooper.joshua"; "gcc-patches"
> Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "Christoph Müllner"; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> This is Ok from my side.
> But before commit this patch, I think we need this patch first:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
>
> I will be back to work so I will take a look at other patches today.
> juzhe.zh...@rivai.ai
>
>
> From: Jeff Law
> Date: 2024-01-01 01:43
> To: Jun Sha (Joshua); gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; juzhe.zhong; Jin Ma; Xianmiao Qu
> Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
>
> On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > This patch adds th. prefix to all XTheadVector instructions by
> > implementing new assembly output functions. We only check the
> > prefix is 'v', so that no extra attribute is needed.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> >       New function to add assembler insn code prefix/suffix.
> >       * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> >       * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> >
> > Co-authored-by: Jin Ma 
> > Co-authored-by: Xianmiao Qu 
> > Co-authored-by: Christoph Müllner 
> > ---
> >   gcc/config/riscv/riscv-protos.h                    |  1 +
> >   gcc/config/riscv/riscv.cc                          | 14 ++
> >   gcc/config/riscv/riscv.h                           |  4 
> >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c     | 12 
> >   4 files changed, 31 insertions(+)
> >   create mode 10
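
For readers skimming the thread, the mechanism the patch description refers to is roughly the following (a simplified sketch based only on the description above, not the exact committed code):

```c
/* Sketch only: with ASM_OUTPUT_OPCODE wired to this hook, prepend "th." to
   every vector mnemonic (anything starting with 'v') when XTheadVector is
   enabled, so no extra per-pattern attribute is needed.  */
const char *
riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
{
  if (TARGET_XTHEADVECTOR && p[0] == 'v')
    fputs ("th.", asm_out_file);
  return p;
}
```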

Re: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-11-30 Thread juzhe.zhong
Yes, we can defer it. Thanks, Richard.

Replied Message:
From: Richard Biener
Date: 11/30/2023 20:19
To: juzhe.zh...@rivai.ai
Cc: tamar.christina, gcc-patches
Subject: Re: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

On Thu, Nov 30, 2023 at 12:11 PM juzhe.zh...@rivai.ai wrote:

BIAS should be:

    signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
    tree bias = build_int_cst (intQI_type_node, biasval);

Currently, only IBM will set LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS to -1 for some situations of len_load/len_store. Otherwise, it is always 0. But for consistency, I think we should use the code above. I see your patch is big and split into multiple sub-patches. Do you have a patch that can be applied directly for the whole support? I want to support length and test that based on your patch.

Can we defer LEN support for a followup?  I think we still need to set partial loop support
as disabled when there are any lengths with the initial patch, for correctness.

Richard.

Thanks.
juzhe.zh...@rivai.ai
From: Tamar Christina
Date: 2023-11-30 18:58
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Richard Biener
Subject: RE: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code


Hi Juzhe,
 
I meant that “lens” is undefined, from looking around I guess that needs to be
 
  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
 
for `bias` I meant
 
    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
 all true mask, cond, all false mask, len, bias);
 
that variable `bias` isn’t defined. And I can’t find any other usage of IFN_VCOND_MASK_LEN creation to figure out what it’s supposed to be

 
is it just an SImode 0?
 
Thanks,
Tamar
 
 



From: juzhe.zh...@rivai.ai
Sent: Thursday, November 30, 2023 11:49 AM
To: Tamar Christina; gcc-patches
Cc: Richard Biener
Subject: Re: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

Thanks Tamar.

I am not sure whether I am on the same page with you.

IMHO, ARM SVE will use the final mask = loop mask (generated by WHILE_ULT) & conditional mask.
Use that final mask to do the cbranch. Am I right?

If yes, I leverage that for length and avoid too many code changes in your patch.

So, for RVV, the length is pretty much the same as the loop mask in ARM SVE.
For example, suppose n = 4: in ARM SVE, WHILE_ULT (whilelo) generates a mask with the first 4 elements active.
Then use that mask to control the operations.

For RVV it is the same: the length will be 4, and we will only process the elements with index < 4.

For bias, I think that won't be an issue. Currently, BIAS is not used by RVV and is only used
in len_load/len_store for IBM targets.
So the bias value is 0 by default in all situations except len_load/len_store specifically for IBM.

juzhe.zh...@rivai.ai

From: Tamar Christina
Date: 2023-11-30 18:39
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Richard Biener
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code





Hi Juzhe,

I'm happy to take the hunks, just that I can't test them and don't know the specifics of how the lens mechanism works.
I still need to read up on it.

I tried adding that chunk in, but for the first bit `lens` seems undefined, and for the second bit it seems `bias` is undefined.

I'll also need to know what to do in vectorizable_live_operations to get the first element rather than the last.
 
Thanks,
Tamar
 



From: juzhe.zh...@rivai.ai
Sent: Thursday, November 30, 2023 4:48 AM
To: gcc-patches
Cc: Richard Biener; Tamar Christina
Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code


 

Hi, Richard and Tamar.

I am sorry for bothering you.
Hope you don't mind I give some comments:

Can we support partial vectors for length?

IMHO, we can do that as follows:

bool length_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);

if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
  {
    if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                        OPTIMIZE_FOR_SPEED))
      vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
    else
      vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
  }

if (length_loop_p)
  {
    tree len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
    /* Use VCOND_MASK_LEN (all true, cond, all false, len, bias) to generate
       final mask = i < len + bias ? cond[i] : false.  */
    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
                         all true mask, cond, 

Re: [PATCH] RISC-V: testsuite: Fix strcmp-run.c test.

2023-12-11 Thread juzhe.zhong
LGTM.

Replied Message:
From: Robin Dapp
Date: 12/11/2023 21:40
To: gcc-patches, palmer, Kito Cheng, jeffreyalaw, juzhe.zh...@rivai.ai, Li, Pan2
Cc: rdapp@gmail.com
Subject: [PATCH] RISC-V: testsuite: Fix strcmp-run.c test.

Hi,

this fixes expectations in the strcmp-run test which would sometimes
fail with newlib.  The test expects libc strcmp return values and
asserts the vectorized result is similar to those.  Therefore hard-code
the expected results instead of relying on a strcmp call.

Pan has already tested in a lot of configurations and doesn't see
failures anymore.

I'd argue it's obvious enough to push it if nobody complains :)

Regards
 Robin

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: Adjust test
    expectation.
---
 .../riscv/rvv/autovec/builtin/strcmp-run.c    | 23 ++-
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
index 6dec7da91c1..adbe022e0ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -1,8 +1,6 @@
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -minline-strcmp" } */
  
-#include 
-
 int
 __attribute__ ((noipa))
 foo (const char *s, const char *t)
@@ -10,23 +8,26 @@ foo (const char *s, const char *t)
   return __builtin_strcmp (s, t);
 }
  
-int
-__attribute__ ((noipa, optimize ("0")))
-foo2 (const char *s, const char *t)
-{
-  return strcmp (s, t);
-}
-
 #define SZ 10
  
-int main ()
+int
+main ()
 {
   const char *s[SZ]
 = {"",  "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
    "a", "z",    "1", "9",  "12345678901234567889012345678901234567890"};
  
+  const int ref[SZ * SZ]
+    = {0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 97,  0,   49,     97, 64,
+   115, -25, 48,  40,  48,    48,  -49,  0,    48,  15,  -49, -74, -1,     -9, -1,
+   0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 33,  -64, -15, 33, 0,
+   -64, -89, -16, -24, -16, 97,  -115, 49,    97,  64,  0,   -25, 48,     40, 48,
+   122, 25,     74,  122, 89,    25,  0,       73,    65,  73,  49,  -48, 1,     49, 16,
+   -48, -73, 0,   -8,  -50, 57,  -40,  9,    57,  24,  -40, -65, 8,     0,  8,
+   49,  -48, 1,   49,  16,    -48, -73,  50,    -8,  0};
+
   for (int i = 0; i < SZ; i++)
 for (int j = 0; j < SZ; j++)
-  if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+  if (foo (s[i], s[j]) != ref [i * SZ + j])
 __builtin_abort ();
 }
--  
2.43.0




Re: [PATCH 1/1] [fwprop]: Add the support of forwarding the vec_duplicate rtx

2023-01-13 Thread juzhe.zhong
Hi, Richard. Would you mind taking a look at this patch?
This is a proposal patch (we could add more test cases for ARM in the future).
But we want to know if this patch is a correct approach to achieve what we want.

In RVV (RISC-V Vector), we have a bunch of instructions:
vadd.vx/vsub.vx/vmul.vx, etc.
Such instructions allow the CPU to do operations between a vector and a scalar
directly without any vector duplicate or broadcast instruction.
So this patch is quite important for RVV auto-vectorization support, since it
can reduce a lot of gimple IR patterns.
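A tiny illustration of the kind of loop this targets (a sketch only; any
code-generation details are assumptions, but vadd.vx itself does take the
scalar operand directly):

```c
/* Sketch: with vadd.vx the scalar z feeds the vector add directly, so no
   separate vector broadcast/duplicate instruction is needed.  */
void add_scalar (int *restrict out, const int *restrict in, int z, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i] + z;   /* naturally maps to vadd.vx vd, vs2, rs1 */
}
```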
I know GCC 13 is not the appropriate time for this patch; we hope this can be
done in GCC 14.

Thank you so much.


juzhe.zh...@rivai.ai
 
From: lehua.ding
Date: 2023-01-13 17:42
To: gcc-patches
CC: richard.sandiford; juzhe.zhong; Lehua Ding
Subject: [PATCH 1/1] [fwprop]: Add the support of forwarding the vec_duplicate 
rtx
From: Lehua Ding 
 
ps: Resend for adjusting the width of each line of text.
 
Hi,
 
When I was adding the new RISC-V auto-vectorization function, I found that
converting `vector-reg1 vop vector-vreg2` to `scalar-reg3 vop vectorreg2`
is not very easy to handle where `vector-reg1` is a vec_duplicate_expr.
For example the bellow gimple IR:
 
```gimple

vect_cst__51 = [vec_duplicate_expr] z_14(D);
vect_iftmp.13_53 = .LEN_COND_ADD(mask__40.9_47, vect__6.12_50, vect_cst__51, { 0.0, ... }, curr_cnt_60);
```
 
I once wanted to add corresponding functions to gimple IR, such as adding
.LEN_COND_ADD_VS, and then convert .LEN_COND_ADD to .LEN_COND_ADD_VS in 
match.pd.
This method can be realized, but it will cause too many similar internal 
functions
to be added to gimple IR. It doesn't feel necessary. Later, I tried to combine 
them
on the combine pass but failed. Finally, I thought of adding the ability to 
support
forwarding `(vec_duplicate reg)` in the fwprop pass, so I have this patch.
 
Because the current upstream does not support the RISC-V automatic vectorization
function, I found an example in sve that can also be optimized and simply tried
it. For the float type, one instruction can be reduced, for example the bellow C
code. The difference between the new and old assembly code is that the new one
uses the mov instruction to directly move the scalar variable to the vector 
register.
The old assembly code first moves the scalar variable to the vector register 
outside
the loop, and then uses the sel instruction. Compared with the entire assembly 
code,
the new assembly code has one instruction less. In addition, I noticed that some
instructions in the new assembly code are ahead of the `ble .L1` instruction.
I debugged and found that the modification was made in the ce1 pass. This pass
believes that moving up is more beneficial to performance.
 
In addition, for the int type, compared with the float type, the new assembly 
code
will have one more `fmov s2, w2` instruction, so I can't judge whether the
performance is better than the previous one. In fact, I mainly do RISC-V 
development work.
 
This patch is an exploratory patch and has not been tested too much. I mainly
want to see your suggestions on whether this method is feasible and what
potential problems it might have.
 
Best,
Lehua Ding
 
```c
/* compiler options: -O3 -march=armv8.2-a+sve -S */
void test1 (int *pred, float *x, float z, int n)
{
 for (int i = 0; i < n; i += 1)
   {
 x[i] = pred[i] != 1 ? x[i] : z;
   }
}
```
 
The old assembly code like this (compiler explorer link: 
https://godbolt.org/z/hxTnEhaqY):
 
```asm
test1:
 cmp w2, 0
 ble .L1
 mov x3, 0
 cntw x4
 mov z0.s, s0
 whilelo p0.s, wzr, w2
 ptrue p2.b, all
.L3:
 ld1w z2.s, p0/z, [x0, x3, lsl 2]
 ld1w z1.s, p0/z, [x1, x3, lsl 2]
 cmpne p1.s, p2/z, z2.s, #1
 sel z1.s, p1, z1.s, z0.s
 st1w z1.s, p0, [x1, x3, lsl 2]
 add x3, x3, x4
 whilelo p0.s, w3, w2
 b.any .L3
.L1:
 ret
```
 
The new assembly code like this:
 
```asm
test1:
 whilelo p0.s, wzr, w2
 mov x3, 0
 cntw x4
 ptrue p2.b, all
 cmp w2, 0
 ble .L1
.L3:
 ld1w z2.s, p0/z, [x0, x3, lsl 2]
 ld1w z1.s, p0/z, [x1, x3, lsl 2]
 cmpne p1.s, p2/z, z2.s, #1
 mov z1.s, p1/m, s0
 st1w z1.s, p0, [x1, x3, lsl 2]
 add x3, x3, x4
 whilelo p0.s, w3, w2
 b.any .L3
.L1:
 ret
```
 
 
gcc/ChangeLog:
 
* config/aarch64/aarch64-sve.md (@aarch64_sel_dup_vs): Add new 
pattern to capture the new operands order
* fwprop.cc (fwprop_propagation::profitable_p): Add new check
(reg_single_def_for_src_p): Add new function for src rtx
(forward_propagate_into): Change to new function call
 
---
gcc/config/aarch64/aarch64-sve.md | 20 
gcc/fwprop.cc | 16 +++-
2 files changed, 35 insertions(+), 1 deletion(-)
 
diff --git a/gcc/config/aa

Re: [PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-08 Thread juzhe.zhong
lgtm Replied Message FromLehua DingDate11/08/2023 21:27 Togcc-patches@gcc.gnu.org Ccjuzhe.zh...@rivai.ai,kito.ch...@gmail.com,rdapp@gmail.com,pal...@rivosinc.com,jeffreya...@gmail.com,lehua.d...@rivai.aiSubject[PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl


Re: [PATCH v1] RISC-V: Support vec_init for trailing same element

2023-11-09 Thread juzhe.zhong
lgtm Replied Message Frompan2...@intel.comDate11/10/2023 14:22 Togcc-patches@gcc.gnu.org Ccjuzhe.zh...@rivai.ai,pan2...@intel.com,yanzhang.w...@intel.com,kito.ch...@gmail.comSubject[PATCH v1] RISC-V: Support vec_init for trailing same element


Re: [PATCH] RISC-V: testsuite: Fix popcount test.

2023-11-21 Thread juzhe.zhong
ok Replied Message FromRobin DappDate11/21/2023 21:35 Togcc-patches,palmer,Kito Cheng,jeffreyalaw,juzhe.zh...@rivai.ai Ccrdapp@gmail.comSubjectRe: [PATCH] RISC-V: testsuite: Fix popcount test.> Mhm, not so obvious after all.  We vectorize 250 instances with
> rv32gcv, 229 with rv64gcv and 250 with rv64gcv_zbb.  Will have
> another look tomorrow.

The problem is that tree-vect-patterns is more restrictive than
necessary and does not vectorize everything it could.  Therefore
I'm going to commit the attached with a TODO comment and a
separate check for zbb.

Regards
 Robin

Subject: [PATCH v2] RISC-V: testsuite: Fix popcount test.

Due to Jakub's recent middle-end changes we now vectorize some more
popcount instances.  This patch just adjusts the dump check.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/unop/popcount.c: Adjust check.
    * lib/target-supports.exp: Add riscv_zbb.
---
 .../gcc.target/riscv/rvv/autovec/unop/popcount.c  | 10 +-
 gcc/testsuite/lib/target-supports.exp | 11 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
index 585a522aa81..ca1319c2e7e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
@@ -1461,4 +1461,12 @@ main ()
   RUN_ALL ()
 }
  
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
+/* TODO: Due to an over-zealous check in tree-vect-patterns we do not vectorize
+   e.g.
+ uint64_t dst[];
+ uint32_t src[];
+ dst[i] = __builtin_popcountll (src[i]);
+   even though we could.  Therefore, for now, adjust the following checks.
+   This difference was exposed in r14-5557-g6dd4c703be17fa.  */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" { target { { rv64 } && { ! riscv_zbb } } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 250 "vect" { target { { rv32 } || { riscv_zbb } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f3cd0311e27..87b2ae58720 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1983,6 +1983,17 @@ proc check_effective_target_riscv_ztso { } {
 }]
 }
  
+# Return 1 if the target arch supports the Zbb extension, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_riscv_zbb { } {
+    return [check_no_compiler_messages riscv_ext_zbb assembly {
+   #ifndef __riscv_zbb
+   #error "Not __riscv_zbb"
+   #endif
+    }]
+}
+
 # Return 1 if we can execute code when using dg-add-options riscv_v
  
 proc check_effective_target_riscv_v_ok { } {
--  
2.42.0




Re: [PATCH] RISC-V: Fix VSETVL PASS regression

2023-11-27 Thread juzhe.zhong
committed as it passed zvl128/256/512/1024 no regression. Replied Message FromJuzhe-ZhongDate11/27/2023 21:24 Togcc-patches@gcc.gnu.org Cckito.ch...@gmail.com,kito.ch...@sifive.com,jeffreya...@gmail.com,rdapp@gmail.com,Juzhe-ZhongSubject[PATCH] RISC-V: Fix VSETVL PASS regression


Re: [PATCH] RISC-V: Fixed ICE caused by missing operand

2023-09-19 Thread juzhe.zhong
LGTM Replied Message FromLehua DingDate09/20/2023 13:39 Togcc-patches@gcc.gnu.org Ccjuzhe.zh...@rivai.ai,kito.ch...@gmail.com,rdapp@gmail.com,pal...@rivosinc.com,jeffreya...@gmail.com,lehua.d...@rivai.aiSubject[PATCH] RISC-V: Fixed ICE caused by missing operand


Re: [PATCH v1] RISC-V: Rename the test macro for math autovec test

2023-09-21 Thread juzhe.zhong
ok Replied Message Frompan2...@intel.comDate09/22/2023 11:47 Togcc-patches@gcc.gnu.org Ccjuzhe.zh...@rivai.ai,pan2...@intel.com,yanzhang.w...@intel.com,kito.ch...@gmail.comSubject[PATCH v1] RISC-V: Rename the test macro for math autovec test


Re: Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-18 Thread juzhe.zhong
Yes, like kito said.
We won't enable VNx1DImode in auto-vectorization so it's meaningless to fix it 
here.
We dynamically adjust the minimum vector length for different '-march' values
according to the RVV ISA specification.
So we strongly suggest that we should drop this fix.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-04-19 02:21
To: Richard Biener; Jeff Law; Palmer Dabbelt
CC: Michael Collison; gcc-patches; 钟居哲
Subject: Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple 
of 2.
Few more background about RVV:
 
RISC-V provides different VLEN configurations via different ISA
extensions like `zve32x`, `zve64x` and `v`:
zve32x just guarantees the minimal VLEN is 32 bits,
zve64x guarantees the minimal VLEN is 64 bits,
and v guarantees the minimal VLEN is 128 bits.
 
Current status (without that patch):
 
Zve32x: Mode for one vector register mode is VNx1SImode and VNx1DImode
is invalid mode
- one vector register could hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI
 
Zve64x: Mode for one vector register mode is VNx1DImode or VNx2SImode
- one vector register could hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI
- one vector register could hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI
 
So what I want to say here is VNx1DImode is really NOT safe to assume
to have more than two DI in theory.
 
However `v` extension guarantees the minimal VLEN is 128 bits.
 
We are trying to introduce another type/mode mapping for this configure:
 
v: Mode for one vector register mode is VNx2DImode or VNx4SImode
- one vector register could hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
- one vector register could hold 4 + 4x SImode where x is 0~n, so it
will hold at least four SI
 
So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) if it is really possible, which is a more
precise way to model the vector extension for RISC-V .
 
 
 
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng  wrote:
>
> Wait, VNx1DImode can be really evaluate to just one element if
> -march=rv64g_zve64x,
>
> I thinks this should be just fixed on backend by this patch:
>
> https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison  
> > wrote:
> > >
> > > While working on autovectorizing for the RISCV port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
> > > evenly divisible by two. The RISC-V target has vector modes (e.g. 
> > > VNx1DImode),
> > > where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09  Michael Collison  
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > >  gcc/tree-vect-slp.cc | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, 
> > > unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > >   tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > > + poly_int64 half_nelts;
> > >   if (vector_type
> > >   && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > >   && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > > -  GET_MODE_SIZE (base_vector_mode)))
> > > +  GET_MODE_SIZE (base_vector_mode))
> > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > > +2, &half_nelts))
> > > {
> > >   /* Try fusing consecutive sequences of COUNT / NVECTORS 
> > > elements
> > >  together into elements of type INT_TYPE and using the 
> > > result
> > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, 
> > > unsigned int count,
> > >   poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE 
> > > (vector_type));
> > >   vec_perm_builder sel1 (nelts, 2, 3);
> > >   vec_perm_builder sel2 (nelts, 2, 3);
> > > - poly_int64 half_nelts = exact_div (nelts, 2);
> > > +
> > >   for (unsigned int i = 0; i < 3; ++i)
> > > {
> > >   sel1.quick_push (i);
> > > --
> > > 2.34.1
> > >
 


Re: Re: [PATCH] RISC-V: Fix PR109535

2023-04-18 Thread juzhe.zhong
>> ChangeLog should reference the bug number, like this:
>> PR target/109535
>> Seems like this ought to be static. Though it's not clear why
>> count_occurrences didn't do what you needed.  Can you explain why
>> count_occurrences was insufficient for your needs?

Addressing the comments, I will resend a patch referencing the bug PR number and
adding "static".
The reason count_occurrences cannot work is that we want to count the regno
occurrences instead of rtx occurrences.

The bug issue was reported by the google/highway project:
(set(..)
   (reg:QI s0)
(reg:DI s0))

The "avl" operand rtx  = (reg:DI s0)
count_occurrences return 1 however the actual regno occurrences should be 2.
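
For illustration, a rough sketch of the intent of such a regno-counting helper
(not the patch's actual implementation; the walk below is only an approximation
of what riscv-vsetvl.cc does):

```c
/* Sketch: count how many REG sub-rtxes of X refer to REGNO, so that
   (set (reg:QI s0) (reg:DI s0)) counts as 2 even though the two regs have
   different modes and thus are not rtx_equal_p (which is why a plain
   count_occurrences of the (reg:DI s0) rtx only finds 1).  */
static int
count_regno_refs (const_rtx x, unsigned int regno)
{
  if (REG_P (x))
    return REGNO (x) == regno;

  int count = 0;
  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
  for (int i = 0; i < GET_RTX_LENGTH (GET_CODE (x)); i++)
    if (fmt[i] == 'e')
      count += count_regno_refs (XEXP (x, i), regno);
    else if (fmt[i] == 'E')
      for (int j = 0; j < XVECLEN (x, i); j++)
        count += count_regno_refs (XVECEXP (x, i, j), regno);
  return count;
}
```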

Thanks


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-19 03:00
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR109535
 
 
On 4/17/23 20:03, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New 
> function.
>  (pass_vsetvl::cleanup_insns): Fix bug.
ChangeLog should reference the bug number, like this:
 
PR target/109535
 
 
> 
> ---
>   gcc/config/riscv/riscv-vsetvl.cc | 15 ++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 1b66e3b9eeb..43e2cf08377 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1592,6 +1592,19 @@ backward_propagate_worthwhile_p (const basic_block 
> cfg_bb,
> return true;
>   }
>   
> +/* Count the number of REGNO in RINSN.  */
> +int
> +count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
Seems like this ought to be static. Though it's not clear why 
count_occurrences didn't do what you needed.  Can you explain why 
count_occurrences was insufficient for your needs?
 
 
 
 
Jeff
 


Re: Re: [PATCH] CPROP: Allow cprop optimization when the function has a single block

2023-02-02 Thread juzhe.zhong
We set VL/VTYPE, these 2 implicit global status dependency registers, as fixed regs.
Then CSE can do the optimization now.

>> Yea.  I'm wondering about when the right place to introduce these
>>dependencies might be.  I'm still a few months out from worrying about
>>RVV, but it's not too far away.
You don't need to worry about RVV. I can promise you that RVV support in GCC
will be solid and optimal. You can just try. For example, try the VSETVL PASS;
this PASS as implemented in GCC is much better than LLVM's. I have included so
many fancy optimizations there.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-02-02 22:36
To: juzhe.zh...@rivai.ai; rguenther
CC: gcc-patches; kito.cheng; richard.sandiford; apinski
Subject: Re: [PATCH] CPROP: Allow cprop optimization when the function has a 
single block
 
 
On 2/2/23 05:35, juzhe.zh...@rivai.ai wrote:
> Thank you so much. Kito helped me fix it already.
> RVV instruction patterns can have CSE optimizations now.
What was the issue?
 
jeff
 


Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types

2023-02-11 Thread juzhe.zhong
Thanks for contributing this.
Hi, Richard. Can you help us with this issue?
In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t 
(VNx2BImode), vbool64_t (VNx1BImode)
Since we are using a 1-bit mask, each BOOL occupies 1 bit.
According to the RVV ISA, we adjust like:
VNx8BImode (8,8) NUNITS
VNx8BImode (8,8) NUNITS


juzhe.zh...@rivai.ai
 
From: incarnation.p.lee
Date: 2023-02-11 16:46
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
From: Pan Li 
 
Fix the bug for mode tieable of the rvv bool types. The vbool*_t
cannot be tied as the actually load/store size is determinated by
the vl. The mode size of rvv bool types are also adjusted for the
underlying optimization pass. The rvv bool type is vbool*_t, aka
vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
vbool64_t.
 
PR 108185
PR 108654
 
gcc/ChangeLog:
 
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
(riscv_modes_tieable_p):
* config/riscv/riscv.h (riscv_v_adjust_bytesize):
* machmode.h (VECTOR_BOOL_MODE_P):
* tree-ssa-sccvn.cc (visit_reference_op_load):
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-modes.def| 14 ++--
gcc/config/riscv/riscv.cc   | 34 -
gcc/config/riscv/riscv.h|  2 +
gcc/machmode.h  |  3 +
gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
gcc/tree-ssa-sccvn.cc   | 13 +++-
13 files changed, 608 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
 
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..cc21d3c83a2 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
ADJUST_ALIGNMENT (VNx32BI, 1);
ADJUST_ALIGNMENT (VNx64BI, 1);
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
/*
| Mode| MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3b7804b7501..138c052e13c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
   return scale;
}
+/* Call from ADJUST_BYTESIZE in riscv-modes.def.  Return the correct
+   BYTES size for corresponding machine_mode.  */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale)
+{
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+
+  if (riscv_v_ext_vector_mode_p (mode))
+{
+  poly_uint16 mode_size = GET_MODE_SIZE

Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types

2023-02-11 Thread juzhe.zhong
Thanks for contributing this.
Hi, Richard. Can you help us with this issue?
In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t 
(VNx2BImode), vbool64_t (VNx1BImode)
Since we are using a 1-bit mask, each BOOL occupies 1 bit.
According to the RVV ISA, we adjust these modes as follows:

VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)

If we query GET_MODE_BITSIZE or GET_MODE_NUNITS of these modes, their values
are different.
However, if we query GET_MODE_SIZE of these modes, they are all the same
(poly (1,1)).
Such a scenario makes these modes tied together and gives wrong code gen, since
their bitsizes are different.
Consider the case as this:
#include "riscv_vector.h"
void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
{
  vint8m1_t v = *(vint8m1_t*)in;
  *(vint8m1_t*)out = v;
  vbool16_t v4 = *(vbool16_t *)in;
  *(vbool16_t *)(out + 300) = v4;
  vbool8_t v3 = *(vbool8_t*)in;
  *(vbool8_t*)(out + 200) = v3;
}
The second vbool8_t load (vlm.v) is missing, since GCC gives
"v3 = VIEW_CONVERT (vbool8_t) v4" in gimple.
We failed to fix it in the RISC-V backend. Can you help us with this? Thanks.


juzhe.zh...@rivai.ai
 
From: incarnation.p.lee
Date: 2023-02-11 16:46
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
From: Pan Li 
 
Fix the bug for mode tieable of the rvv bool types. The vbool*_t
cannot be tied as the actually load/store size is determinated by
the vl. The mode size of rvv bool types are also adjusted for the
underlying optimization pass. The rvv bool type is vbool*_t, aka
vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
vbool64_t.
 
PR 108185
PR 108654
 
gcc/ChangeLog:
 
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
(riscv_modes_tieable_p):
* config/riscv/riscv.h (riscv_v_adjust_bytesize):
* machmode.h (VECTOR_BOOL_MODE_P):
* tree-ssa-sccvn.cc (visit_reference_op_load):
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-modes.def| 14 ++--
gcc/config/riscv/riscv.cc   | 34 -
gcc/config/riscv/riscv.h|  2 +
gcc/machmode.h  |  3 +
gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
gcc/tree-ssa-sccvn.cc   | 13 +++-
13 files changed, 608 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
 
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..cc21d3c83a2 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
ADJUST_ALIGNMENT (VNx32BI, 1);
ADJUST_ALIGNMENT (VNx64BI, 1);
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_

Re: [PATCH] RISC-V: Handle vlenb correctly in unwinding

2023-02-12 Thread juzhe.zhong
LGTM



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-02-12 19:33
To: gcc-patches; kito.cheng; jim.wilson.gcc; palmer; andrew; juzhe.zhong
CC: Kito Cheng
Subject: [PATCH] RISC-V: Handle vlenb correctly in unwinding
gcc/ChangeLog:
 
* config/riscv/riscv.h (RISCV_DWARF_VLENB): New.
(DWARF_FRAME_REGISTERS): New.
(DWARF_REG_TO_UNWIND_COLUMN): New.
 
libgcc/ChangeLog:
 
* config.host (riscv*-*-*): Add config/riscv/value-unwind.h.
* config/riscv/value-unwind.h: New.
---
gcc/config/riscv/riscv.h   |  7 ++
libgcc/config.host |  3 +++
libgcc/config/riscv/value-unwind.h | 39 ++
3 files changed, 49 insertions(+)
create mode 100644 libgcc/config/riscv/value-unwind.h
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 120faf17c06..5bc7f2f467d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1088,4 +1088,11 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
#define REGMODE_NATURAL_SIZE(MODE) riscv_regmode_natural_size (MODE)
+#define RISCV_DWARF_VLENB (4096 + 0xc22)
+
+#define DWARF_FRAME_REGISTERS (FIRST_PSEUDO_REGISTER + 1 /* VLENB */)
+
+#define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
+  ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
+
#endif /* ! GCC_RISCV_H */
diff --git a/libgcc/config.host b/libgcc/config.host
index 70d47e08e40..b9975de9023 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1559,6 +1559,9 @@ aarch64*-*-*)
# ILP32 needs an extra header for unwinding
tm_file="${tm_file} aarch64/value-unwind.h"
;;
+riscv*-*-*)
+ tm_file="${tm_file} riscv/value-unwind.h"
+ ;;
esac
# Setup to build a shared libgcc for VxWorks when that was requested,
diff --git a/libgcc/config/riscv/value-unwind.h 
b/libgcc/config/riscv/value-unwind.h
new file mode 100644
index 000..d7efdc14e6f
--- /dev/null
+++ b/libgcc/config/riscv/value-unwind.h
@@ -0,0 +1,39 @@
+/* Store register values as _Unwind_Word type in DWARF2 EH unwind context.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Return the value of the VLENB register.  This should only be
+   called if we know this is an vector extension enabled RISC-V host.  */
+static inline long
+riscv_vlenb (void)
+{
+  register long vlenb asm ("a0");
+  /* 0xc2202573 == csrr a0, 0xc22 */
+  asm (".insn 0xc2202573" : "=r"(vlenb));
+  return vlenb;
+}
+
+/* Lazily provide a value for VLENB, so that we don't try to execute RVV
+   instructions unless we know they're needed.  */
+#define DWARF_LAZY_REGISTER_VALUE(REGNO, VALUE) \
+  ((REGNO) == RISCV_DWARF_VLENB && ((*VALUE) = riscv_vlenb (), 1))
-- 
2.37.2
 
 


Re: [PATCH] RISC-V: Define __riscv_v_intrinsic [PR109312]

2023-03-28 Thread juzhe.zhong
LGTM.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-03-28 22:26
To: gcc-patches; kito.cheng; jim.wilson.gcc; palmer; andrew; juzhe.zhong; 
jeffreyalaw
CC: Kito Cheng
Subject: [PATCH] RISC-V: Define __riscv_v_intrinsic [PR109312]
The RVV intrinsic spec has defined a macro to identify the version of the RVV
intrinsic spec; we missed that before. Thankfully we caught this before the
release.
 
gcc/ChangeLog:
 
PR target/109312
* config/riscv/riscv-c.cc (riscv_ext_version_value): New.
(riscv_cpu_cpp_builtins): Define __riscv_v_intrinsic and
minor refactor.
 
gcc/testsuite/ChangeLog:
 
PR target/109312
* gcc.target/riscv/predef-__riscv_v_intrinsic.c: New test.
---
gcc/config/riscv/riscv-c.cc| 18 ++
.../riscv/predef-__riscv_v_intrinsic.c | 11 +++
2 files changed, 25 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index ff07d319d0b..6ad562dcb8b 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,6 +34,12 @@ along with GCC; see the file COPYING3.  If not see
#define builtin_define(TXT) cpp_define (pfile, TXT)
+static int
+riscv_ext_version_value (unsigned major, unsigned minor)
+{
+  return (major * 100) + (minor * 1000);
+}
+
/* Implement TARGET_CPU_CPP_BUILTINS.  */
void
@@ -118,7 +124,11 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 builtin_define_with_int_value ("__riscv_v_elen_fp", 0);
   if (TARGET_MIN_VLEN)
-builtin_define ("__riscv_vector");
+{
+  builtin_define ("__riscv_vector");
+  builtin_define_with_int_value ("__riscv_v_intrinsic",
+  riscv_ext_version_value (0, 11));
+}
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
@@ -141,13 +151,13 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
subset != subset_list->end ();
subset = subset->next)
 {
-  int version_value = (subset->major_version * 100)
-+ (subset->minor_version * 1000);
+  int version_value = riscv_ext_version_value (subset->major_version,
+subset->minor_version);
   /* Special rule for zicsr and zifencei, it's used for ISA spec 2.2 or
earlier.  */
   if ((subset->name == "zicsr" || subset->name == "zifencei")
  && version_value == 0)
- version_value = 200;
+ version_value = riscv_ext_version_value (2, 0);
   sprintf (buf, "__riscv_%s", subset->name.c_str ());
   builtin_define_with_int_value (buf, version_value);
diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
new file mode 100644
index 000..dbbedf54f87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imafdcv -mabi=lp64d" } */
+
+int main () {
+
+#if __riscv_v_intrinsic != 11000
+#error "__riscv_v_intrinsic"
+#endif
+
+  return 0;
+}
-- 
2.39.2
 
 


Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-02 Thread juzhe.zhong
This point is selected not because of LCM but by Phase 3 (VL/VTYPE demand info
backward fusion and propagation), which I introduced into the VSETVL PASS to
enhance LCM && improve vsetvl instruction performance.

This patch suppresses Phase 3's too-aggressive backward fusion and propagation
to the top of the function when there is no defining instruction of the AVL
(the AVL is a 0 ~ 31 imm, since the vsetivli instruction allows an imm value
instead of a reg).

You may want to ask why we need Phase 3 to do the job.
Well, we have so many situations that pure LCM fails to optimize; here I can
show you a simple case to demonstrate it:
void f (void * restrict in, void * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  for (size_t j = 0; j < m; j++){
if (cond) {
  for (size_t i = 0; i < n; i++)
{
  vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
  __riscv_vse8_v_i8mf8 (out + i, v, vl);
}
} else {
  for (size_t i = 0; i < n; i++)
{
  vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
  v = __riscv_vadd_vv_i32mf2 (v,v,vl);
  __riscv_vse32_v_i32mf2 (out + i, v, vl);
}
}
  }
}

You can see:
The first inner loop needs vsetvli e8 mf8 for vle+vse.
The second inner loop needs vsetvli e32 mf2 for vle+vadd+vse.

If we don't have Phase 3 (Only handled by LCM (Phase 4)), we will end up with :

outerloop:
...
vsetvli e8mf8
inner loop 1:


vsetvli e32mf2
inner loop 2:


However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of 
inner loop 2 into vsetvli e8 mf8, then we will end up with this result after 
phase 3:

outerloop:
...
inner loop 1:
vsetvli e32mf2


inner loop 2:
vsetvli e32mf2


Then, this demand information after phase 3 will be well optimized after phase 
4 (LCM), after Phase 4 result is:

vsetvli e32mf2
outerloop:
...
inner loop 1:


inner loop 2:


You can see this is the optimal codegen after the current VSETVL PASS (Phase 3:
demand backward fusion and propagation + Phase 4: LCM). This was a known issue
when I started to implement the VSETVL PASS. I left it to be fixed after I
finished all the targeted GCC 13 features, and Kito postponed this patch to be
merged after GCC 14 is open.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-03 03:41
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR108279
 
 
On 3/27/23 00:59, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
>  PR 108270
> 
> Fix bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.
> 
> Consider the following testcase:
> void f (void * restrict in, void * restrict out, int l, int n, int m)
> {
>for (int i = 0; i < l; i++){
>  for (int j = 0; j < m; j++){
>for (int k = 0; k < n; k++)
>  {
>vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
>__riscv_vse8_v_i8mf8 (out + i + j, v, 17);
>  }
>  }
>}
> }
> 
> Compile option: -O3
> 
> Before this patch:
> mv a7,a2
> mv a6,a0 
>  mv t1,a1
> mv a2,a3
> vsetivli zero,17,e8,mf8,ta,ma
> ...
> 
> After this patch:
>  mv  a7,a2
>  mv  a6,a0
>  mv  t1,a1
>  mv  a2,a3
>  ble a7,zero,.L1
>  ble a4,zero,.L1
>  ble a3,zero,.L1
>  add a1,a0,a4
>  li  a0,0
>  vsetivlizero,17,e8,mf8,ta,ma
> ...
> 
> It will produce potential bug when:
> 
> int main ()
> {
>vsetivli zero, 100,.
>f (in, out, 0,0,0)
>asm volatile ("csrr a0,vl":::"memory");
> 
>// Before this patch the a0 is 17. (Wrong).
>// After this patch the a0 is 100. (Correct).
>...
> }
So why was that point selected in the first place?   I would have 
expected LCM to select the loop entry edge as the desired insertion point.
 
Essentially if LCM selects the point before those branches, then it's 
violating a fundamental principle of LCM, namely that you never put an 
evaluation on a path where it didn't have one before.
 
So not objecting to the patch but it is raising concerns about the LCM 
results.
 
jeff
 


Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-05 Thread juzhe.zhong
>> So fusion in this context is really about identifying cases where two
>> configuration settings are equivalent and you "fuse" them together.
>> Presumably this is only going to be possible when the vector insns are
>> just doing data movement rather than actual computations?

>> If my understanding is correct, I can kind of see why you're doing
>> fusion during phase 3.  My sense is there's a better way, but I'm having
>> a bit of trouble working out the details of what that should be to
>> myself.  In any event, revamping parts of the vsetvl insertion code
>> isn't the kind of thing we should be doing now.

The vsetvl demand fusion does not require the demands to be "equivalent";
instead, we do demand fusion when they are "compatible".
And the fusion can happen between any vector insns including data movement
and actual computations.

What is "compatible" ??  This definition is according to RVV ISA.
For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma.
and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma.

According to RVV ISA:
vadd.vv demands SEW = 32, LMUL = M1, AVL = 4;
vle.v demands RATIO = SEW/LMUL = 32, AVL = 4.
So after demand fusion, the fused demand becomes SEW = 32, LMUL = M1, AVL = 4.
Since a vsetvl instruction can be configured to satisfy this fused demand, we
call the two demands "compatible": we can find a common vsetvl VL/VTYPE status
for both vadd.vv and vle.v.

However, what case is "incompatible"? Take the same example: if the vadd.vv
demands SEW = 32, LMUL = MF2, then the vadd.vv is incompatible with the vle.v,
since we can't find a common VL/VTYPE vsetvl instruction available for both of
them.
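
A minimal sketch of this compatibility rule (an illustrative approximation
only, not the actual riscv-vsetvl.cc code; the struct, its field names and the
LMUL-in-eighths encoding are assumptions):

```c
/* Sketch: LMUL is encoded in eighths (mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, ...),
   so RATIO = SEW * 8 / lmul_eighths.  */
struct demand
{
  int sew;              /* element width in bits */
  int lmul_eighths;     /* LMUL * 8 */
  int avl;              /* application vector length */
  int demands_sew_lmul; /* true if the insn needs this exact SEW/LMUL */
};

static int
ratio (const struct demand *d)
{
  return d->sew * 8 / d->lmul_eighths;
}

/* e32,m1 and e8,mf4 both have ratio 32, so a vadd.vv (exact SEW/LMUL demanded)
   and a vle.v (only the ratio demanded) can share one vsetvl; e32,mf2 has
   ratio 64 and is therefore incompatible with the e8,mf4 load.  */
static int
compatible_p (const struct demand *a, const struct demand *b)
{
  if (a->avl != b->avl)
    return 0;
  if (a->demands_sew_lmul && b->demands_sew_lmul)
    return a->sew == b->sew && a->lmul_eighths == b->lmul_eighths;
  return ratio (a) == ratio (b);
}
```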

We have local demand fusion, which is Phase 1; local demand fusion does the
fusion within a block. We also have global demand fusion, which is Phase 3;
global demand fusion works across blocks.

After Phase 1, each block has a single fused demand. Phase 3 does global demand
fusion, trying to find a common VL/VTYPE status available for a bunch of blocks
and fuse them into a single vsetvl, so that we eliminate redundant vsetvli
instructions.

Here is an example:
   
bb 0:  (vle.v demand RATIO = 32)
  /   \
bb 1  bb 2
  /  \ /   \
 bb 3   bb 4   bb 5
   vadd   vmul  vdiv
(demand  (demand  (demand 
 sew = 8,sew = 8,  sew = 8, 
lmul = mf4)  lmul = mf4,   lmul = mf4,
  tail policy = tu) mask policy = mu)

So in this case, we should do global demand fusion for bb 0, bb 3, bb 4, bb 5,
since they are compatible according to the RVV ISA.
The final demand info of the vsetvl should be vsetvl e8,mf4,tu,mu, placed
in bb 0. Then we can avoid vsetvls in bb 3, 4, 5.

>> We have more fusion rules according to RVV ISA. Phase 3 (Global demand 
>> fusion) is 
>> really important. 

>> That would seem to indicate the function is poorly named.  Unless you're
>> using "empty" here to mean the state is valid or dirty.  Either way it
>> seems like the function name ought to be improved.

>> The comments talk about bb1 being inside a loop.  Nowhere do you check
>> that as far as I can tell.

>> When trying to understand what the patch is going I ran across this comment:

>>   /* The local_dem vector insn_info of the block.  */
 >>   vector_insn_info local_dem;


>> That comment really doesn't improve anything.  "local_dem" is clearly
>> short-hand for something (local demand?), whatever it is, make it
>> clearer in the comment.

Sorry for the bad comments in the code. Currently, I am working on the first
patch of auto-vectorization. After I send the first auto-vectorization patch to
you for review, I would like to re-check all the comments and code style of the
VSETVL PASS and refine them.




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-05 21:05
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR108279
 
 
On 4/2/23 16:40, juzhe.zh...@rivai.ai wrote:
> This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand 
> info backward fusion and propogation) which
> is I introduced into VSETVL PASS to enhance LCM && improve vsetvl 
> instruction performance.
So fusion in this context is really about identifying cases where two 
configuration settings are equivalent and you "fuse" them together. 
Presumably this is only going to be possible when the vector insns are 
just doing data movement rather than actual computations?
 
If my understanding is correct, I can kind of see why you're doing 
fusion during phase 3.  My sense i

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
RVV has many more types than aarch64.
You can see in the rvv-intrinsic doc that there are so many RVV intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md

The RVV intrinsics explode.

For segment instructions, RVV has array types supporting NF from 2 ~ 8 for
LMUL <= 1 (MF8, MF4, MF2, M1), whereas aarch64 only has array types with array
size 2 ~ 4, and only for LMUL = 1 (a whole vector).

I think kito can explain this issue more clearly.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-10 22:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer; jakub; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 08:48, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> According RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
> We have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8
> Also, for segment instructions, we have tuple type for NF = 2 ~ 8.
> For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t,
> we will have NF from 2 ~ 8 tuples: vint32mf2x2_t, vint32mf2x2...  
> vint32mf2x8_t.
> So we will end up with over 220+ vector machine mode for RVV.
> 
> PLUS the scalar machine modes that we already have in RISC-V port.
> 
> The total machine modes in RISC-V port > 256.
> 
> Current GCC can not allow us support RVV segment instructions tuple types.
> 
> So extend machine mode size from 8bit to 16bit.
> 
> I have another solution related to this patch,
> May be adding a target dependent macro is better?
> Revise this patch like this:
> 
> #ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
> ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
> #else
> ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
> #endif
> 
> Not sure whether this solution is better?
> 
> This patch Bootstraped on X86 is PASS. Will run make-check gcc-testsuite 
> tomorrow.
> 
> Expecting land in GCC-14, any suggestions ?
> 
> gcc/ChangeLog:
> 
>  * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
>  * cse.cc (struct qty_table_elem): Ditto.
>  (struct table_elt): Ditto.
>  (struct set): Ditto.
>  * genopinit.cc (main): Ditto.
>  * ira-int.h (struct ira_allocno): Ditto.
>  * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
>  * rtl-ssa/accesses.h: Ditto.
>  * rtl.h (struct GTY): Ditto.
>  (subreg_shape::unique_id): Ditto.
>  * rtlanal.h: Ditto.
>  * tree-core.h (struct tree_type_common): Ditto.
>  (struct tree_decl_common): Ditto.
This is likely going to be very controversial.  It's going to increase 
the size of two of most heavily used data structures in GCC (rtx and trees).
 
The first thing I would ask is whether or not we really need the full 
matrix in practice or if we can combine some of the modes.
 
Why hasn't aarch64 stumbled over this problem?
 
Jeff
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t.
As far as I know, they don't have tuple types for partial vectors.
However, RVV not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t,
vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t

But also, we have vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t, 
vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t

vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t, 
vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t

etc

So many tuple types.  I saw there are redundant scalar modes in the RISC-V port
backend, like UQQmode and HQQmode. Not sure; maybe we can reduce these scalar
modes to make the total machine modes less than 256?


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-10 22:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer; jakub; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 08:48, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> According RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
> We have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8
> Also, for segment instructions, we have tuple type for NF = 2 ~ 8.
> For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t,
> we will have NF from 2 ~ 8 tuples: vint32mf2x2_t, vint32mf2x2...  
> vint32mf2x8_t.
> So we will end up with over 220+ vector machine mode for RVV.
> 
> PLUS the scalar machine modes that we already have in RISC-V port.
> 
> The total machine modes in RISC-V port > 256.
> 
> Current GCC can not allow us support RVV segment instructions tuple types.
> 
> So extend machine mode size from 8bit to 16bit.
> 
> I have another solution related to this patch,
> May be adding a target dependent macro is better?
> Revise this patch like this:
> 
> #ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
> ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
> #else
> ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
> #endif
> 
> Not sure whether this solution is better?
> 
> This patch Bootstraped on X86 is PASS. Will run make-check gcc-testsuite 
> tomorrow.
> 
> Expecting land in GCC-14, any suggestions ?
> 
> gcc/ChangeLog:
> 
>  * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
>  * cse.cc (struct qty_table_elem): Ditto.
>  (struct table_elt): Ditto.
>  (struct set): Ditto.
>  * genopinit.cc (main): Ditto.
>  * ira-int.h (struct ira_allocno): Ditto.
>  * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
>  * rtl-ssa/accesses.h: Ditto.
>  * rtl.h (struct GTY): Ditto.
>  (subreg_shape::unique_id): Ditto.
>  * rtlanal.h: Ditto.
>  * tree-core.h (struct tree_type_common): Ditto.
>  (struct tree_decl_common): Ditto.
This is likely going to be very controversial.  It's going to increase 
the size of two of most heavily used data structures in GCC (rtx and trees).
 
The first thing I would ask is whether or not we really need the full 
matrix in practice or if we can combine some of the modes.
 
Why hasn't aarch64 stumbled over this problem?
 
Jeff
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
Yeah, aarch64 already has 178; RVV has many more types than aarch64...
You can see intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
 
The API number explodes.

Also, RVV has many more tuple types than aarch64.
Maybe we need to ask the RVV API doc maintainer to reduce the types && APIs of RVV?
Not sure.
I think kito may help with this.


juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-10 23:18
To: Jeff Law
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial.  It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
> 
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
> 
> Why hasn't aarch64 stumbled over this problem?
 
From what I can see, x86 has 130 modes and aarch64 178 right now.
 
Jakub
 
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
I saw many redundant scalar modes:

 E_CDImode,   /* machmode.def:267 */
#define HAVE_CDImode
#ifdef USE_ENUM_MODES
#define CDImode E_CDImode
#else
#define CDImode (complex_mode ((complex_mode::from_int) E_CDImode))
#endif
  E_CTImode,   /* machmode.def:267 */
#define HAVE_CTImode
#ifdef USE_ENUM_MODES
#define CTImode E_CTImode
#else
#define CTImode (complex_mode ((complex_mode::from_int) E_CTImode))
#endif
  E_HCmode,/* machmode.def:269 */
#define HAVE_HCmode
#ifdef USE_ENUM_MODES
#define HCmode E_HCmode
#else
#define HCmode (complex_mode ((complex_mode::from_int) E_HCmode))
#endif
  E_SCmode,/* machmode.def:269 */
#define HAVE_SCmode
#ifdef USE_ENUM_MODES
#define SCmode E_SCmode
#else
#define SCmode (complex_mode ((complex_mode::from_int) E_SCmode))
#endif
  E_DCmode,/* machmode.def:269 */
#define HAVE_DCmode
#ifdef USE_ENUM_MODES
#define DCmode E_DCmode
#else
#define DCmode (complex_mode ((complex_mode::from_int) E_DCmode))
#endif
  E_TCmode,/* machmode.def:269 */
#define HAVE_TCmode
#ifdef USE_ENUM_MODES
#define TCmode E_TCmode
#else
#define TCmode (complex_mode ((complex_mode::from_int) E_TCmode))
#endif
...

These scalar modes are redundant, I think; can we forbid them?
There are 40+ scalar modes that are not used.



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-04-10 23:22
To: jakub; Jeff Law
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
Yeah, aarch64 already has 178, RVV has much more types than aarch64...
You can see intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
 
api number explodes.

As well as tuples types in RVV much more than aarch64.
Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
Not sure.
I think kito may help for this.


juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-10 23:18
To: Jeff Law
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial.  It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
> 
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
> 
> Why hasn't aarch64 stumbled over this problem?
 
From what I can see, x86 has 130 modes and aarch64 178 right now.
 
Jakub
 
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
I don't know; maybe we can ask the rvv-intrinsic-doc maintainers why they
define so many tuple types and try to make them reduce the APIs && tuple types?

I am going to remove all the FP16 vector types to see whether we can reduce the
machine modes to <= 256.
I think that will probably help fix this.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-11 04:36
To: Jakub Jelinek
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 09:18, Jakub Jelinek wrote:
> On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
>> This is likely going to be very controversial.  It's going to increase the
>> size of two of most heavily used data structures in GCC (rtx and trees).
>>
>> The first thing I would ask is whether or not we really need the full matrix
>> in practice or if we can combine some of the modes.
>>
>> Why hasn't aarch64 stumbled over this problem?
> 
>  From what I can see, x86 has 130 modes and aarch64 178 right now.
To put it another way.  Why does RISC-V have so many more modes than 
AArch64.
 
Jeff
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
Another feasible solution: maybe we can drop support for the segment intrinsics
in upstream GCC and let the downstream companies support segment in their own
downstream GCC?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-11 04:42
To: juzhe.zhong; jakub
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 09:22, juzhe.zh...@rivai.ai wrote:
> Yeah, aarch64 already has 178, RVV has much more types than aarch64...
> You can see intrinsic doc:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
>  
> <https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md>
> api number explodes.
> 
> As well as tuples types in RVV much more than aarch64.
> Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
> Not sure.
> I think kito may help for this.
I think it's a discussion we need to have.  I really expect efforts to 
have > 256 modes are going to be very controversial.
 
jeff
 
 
 


Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread juzhe.zhong
I don't want to separate the VSETVL PASS into 2 separate passes.
I want to make everything cleaner.

Another example is that the VSETVL PASS can do branch prediction:
https://godbolt.org/z/K44r98E5v
In function "f", you can see we hoist the vsetvl from the more likely block
(i != cond) outside the loop, then eliminate the vsetvl of this block.
(Branch prediction is not that perfect in the VSETVL PASS; I plan to optimize
it more when GCC 14 is open.)
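
For reference, a hypothetical sketch of the loop shape being described (this is
not the exact code behind the godbolt link; the intrinsic arguments and the
pointer types are illustrative assumptions):

```c
#include "riscv_vector.h"

/* Sketch: the likely branch (i != cond) uses e8/mf8 while the unlikely branch
   uses e32/mf2, so hoisting the likely branch's vsetvl above the loop keeps it
   off the hot path.  */
void f (void *restrict in, void *restrict out, int n, int cond)
{
  for (int i = 0; i < n; i++)
    {
      if (i != cond) /* likely */
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, 17);
          __riscv_vse8_v_i8mf8 (out + i, v, 17);
        }
      else /* unlikely */
        {
          vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i, 17);
          __riscv_vse32_v_i32mf2 (out + i, v, 17);
        }
    }
}
```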

"f2" function is the normal fuse that we do in Phase 3.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-12 05:14
To: Richard Biener; juzhe.zhong
CC: gcc-patches; kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR108279
 
 
On 4/11/23 02:55, Richard Biener wrote:
 
> 
> Just to throw in a comment here - I think you should present LCM
> with something it can identify as the same for compatible vsetvl and
> then it should just work?  OTOH if "compatible" is not transitive
> that's not possible (but then I can't quickly make up an example
> where it wouldn't be).
I'm not sure it's that simple.  Or at least not with a single iteration 
of LCM.
 
One problem is that kills may affecting one setting, but not the other. 
I couldn't mentally come up with a single pass LCM to handle the case 
Juzhe was handling.  ie, you may have two compatible settings where you 
can unify them and hoist the compatible setting to a less executed 
point.  But the transp set for one of two compatible settings may be 
different for the other compatible setting because of vector 
instructions in a block.
 
What was starting to form was a two pass approach.  One which worked 
with individual vsetvl settings, another which worked on unified vsetvl 
settings.  It wasn't clear to me which ordering would be better, but I 
didn't work through the likely scenarios -- it was clear this wasn't the 
time to introduce that kind of conceptual change.
 
jeff
 
 
 
 


Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread juzhe.zhong--- via Gcc-patches
Thanks Richard.

I have addressed all comments on V7 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624220.html 

I dropped the vlse/vsse codegen optimization in the RISC-V backend; instead I
will support LEN_MASK_STRIDED_LOAD/LEN_MASK_STRIDED_STORE in the future.
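
For context, a small sketch of the kind of constant-stride access such a
strided load/store ifn (or RVV's vlse/vsse instructions) would cover (the loop
below is an illustrative assumption, not taken from the patch):

```c
/* Sketch: a constant-stride load feeding a contiguous store; with direct
   strided-load support the vectorizer does not need a gather with a wide
   vector offset type for this pattern.  */
void strided_copy (int *restrict out, const int *restrict in,
                   long stride, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i * stride];
}
```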

Thanks. 


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 17:33
To: Richard Biener
CC: juzhe.zhong\@rivai.ai; jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp
Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
Richard Biener  writes:
> On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
>
>> Thanks Richard.
>> 
>> Is it correct that the better way is to add optabs 
>> (len_strided_load/len_strided_store),
>> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
>> len_strided_load/len_strided_store optab (if it is strided load/store) in
>> expand_gather_load_optab_fn 
>> expand_scatter_store_optab_fn
>> 
>> of internal-fn.cc
>> 
>> Am I right? Thanks.
>
> Yes.
>
> In principle the vectorizer can also directly take advantage of this
> and code generate an internal .LEN_STRIDED_LOAD ifn.
 
Yeah, in particular, having a strided load should relax some
of the restrictions around the relationship of the vector offset
type to the loaded/stored data.  E.g. a "gather" of N bytes with a
64-bit stride would in principle be possible without needing an
Nx64-bit vector offset type.
 
Richard