Ok.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/thread.html 
I have added comments as you suggested.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-13 07:21
To: juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc; pan2.li
Subject: Re: [PATCH V2] RISC-V: Support RVV VLA SLP auto-vectorization
 
 
On 6/6/23 21:19, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zh...@rivai.ai>
> 
> This patch enables basic VLA SLP auto-vectorization.
> Consider the following case:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
>    for (int i = 0; i < 100; ++i)
>      {
>        a[i * 8 + 0] = b[i * 8 + 7] + 1;
>        a[i * 8 + 1] = b[i * 8 + 7] + 2;
>        a[i * 8 + 2] = b[i * 8 + 7] + 8;
>        a[i * 8 + 3] = b[i * 8 + 7] + 4;
>        a[i * 8 + 4] = b[i * 8 + 7] + 5;
>        a[i * 8 + 5] = b[i * 8 + 7] + 6;
>        a[i * 8 + 6] = b[i * 8 + 7] + 7;
>        a[i * 8 + 7] = b[i * 8 + 7] + 3;
>      }
> }
> 
> To enable VLA SLP auto-vectorization, we need to be able to handle the
> following const vectors:
> 
> 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
> { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
> 
> 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
> { 1, 2, 8, 4, 5, 6, 7, 3, ... }
> 
> These vectors can be generated in the prologue.
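
As a rough illustration of the two constants above, here is a scalar sketch in plain C. The helper names `step_index` and `pack_pattern` are hypothetical and are not part of the patch. The first mirrors the `vid.v` / `vsrl.vi 3` / `vmul.vx 8` arithmetic in the quoted codegen; the second packs the eight byte-sized pattern elements into one 64-bit scalar, assuming element 0 lands in the least significant byte (little-endian), which is what a broadcast at e64 would replicate.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): element I of
   { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, ..., 16, 16, ... },
   computed as (I >> 3) * 8, like vid.v + vsrl.vi 3 + vmul.vx 8.  */
static uint16_t
step_index (unsigned i)
{
  return (uint16_t) ((i >> 3) * 8);
}

/* Hypothetical helper (not from the patch): pack eight byte-sized
   pattern elements into one 64-bit scalar, with PAT[0] in the least
   significant byte.  */
static uint64_t
pack_pattern (const uint8_t pat[8])
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | pat[i];
  return v;
}
```

For the pattern { 1, 2, 8, 4, 5, 6, 7, 3 } the packed value is 0x0307060504080201, which equals ((50790400 + 1541) << 32) + (67633152 + 513), the scalar built by the li/addi/slli/add sequence in the quoted prologue.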
> 
> After this patch, we end up with the following codegen:
> 
> Prologue:
> ...
>          vsetvli a7,zero,e16,m2,ta,ma
>          vid.v   v4
>          vsrl.vi v4,v4,3
>          li      a3,8
>          vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
> ...
>          li      t1,67633152
>          addi    t1,t1,513
>          li      a3,50790400
>          addi    a3,a3,1541
>          slli    a3,a3,32
>          add     a3,a3,t1
>          vsetvli t1,zero,e64,m1,ta,ma
>          vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
> ...
> LoopBody:
> ...
>          min     a3,...
>          vsetvli zero,a3,e8,m1,ta,ma
>          vle8.v  v2,0(a6)
>          vsetvli a7,zero,e8,m1,ta,ma
>          vrgatherei16.vv v1,v2,v4
>          vadd.vv v1,v1,v3
>          vsetvli zero,a3,e8,m1,ta,ma
>          vse8.v  v1,0(a2)
>          add     a6,a6,a4
>          add     a2,a2,a4
>          mv      a3,a5
>          add     a5,a5,t1
>          bgtu    a3,a4,.L3
> ...
> 
> Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8,
> since "vrgatherei16.vv" covers a larger index range than "vrgather.vv"
> (whose maximum element index is only 255).
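
To make the index-range argument concrete, here is a small sketch in plain C. The helper names `vlmax` and `indices_fit` are hypothetical and not part of the patch; the only formula assumed is the RVV definition VLMAX = LMUL * VLEN / SEW, restricted here to integer LMUL. With SEW = 8 at m1, VLMAX exceeds 256 once VLEN is larger than 2048 bits, at which point 8-bit indices can no longer address every element, while 16-bit indices still can.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: VLMAX = LMUL * VLEN / SEW (integer LMUL only).  */
static unsigned
vlmax (unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
  return lmul * vlen_bits / sew_bits;
}

/* Hypothetical helper: true if every element index 0 .. N_ELTS - 1 is
   representable in an unsigned index element of INDEX_BITS bits.  */
static bool
indices_fit (unsigned n_elts, unsigned index_bits)
{
  return (uint64_t) n_elts <= (UINT64_C (1) << index_bits);
}
```

For example, `indices_fit (vlmax (4096, 8, 1), 8)` is false because VLMAX is 512, whereas `indices_fit (vlmax (4096, 8, 1), 16)` is true.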
> Epilogue:
>          lbu     a5,799(a1)
>          addiw   a4,a5,1
>          sb      a4,792(a0)
>          addiw   a4,a5,2
>          sb      a4,793(a0)
>          addiw   a4,a5,8
>          sb      a4,794(a0)
>          addiw   a4,a5,4
>          sb      a4,795(a0)
>          addiw   a4,a5,5
>          sb      a4,796(a0)
>          addiw   a4,a5,6
>          sb      a4,797(a0)
>          addiw   a4,a5,7
>          sb      a4,798(a0)
>          addiw   a5,a5,3
>          sb      a5,799(a0)
>          ret
> 
> One last thing remains to be done: "Epilogue auto-vectorization", which
> needs VLS mode support.
> I will add VLS mode support for "Epilogue auto-vectorization" in the future.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
>          * config/riscv/riscv-v.cc
>          (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
>          (rvv_builder::single_step_npatterns_p): New function.
>          (rvv_builder::npatterns_all_equal_p): Ditto.
>          (const_vec_all_in_range_p): Support POLY handling.
>          (gen_const_vector_dup): Ditto.
>          (emit_vlmax_gather_insn): Add vrgatherei16.
>          (emit_vlmax_masked_gather_mu_insn): Ditto.
>          (expand_const_vector): Add VLA SLP const vector support.
>          (expand_vec_perm): Support POLY.
>          (struct expand_vec_perm_d): New struct.
>          (shuffle_generic_patterns): New function.
>          (expand_vec_perm_const_1): Ditto.
>          (expand_vec_perm_const): Ditto.
>          * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
>          (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA
>          vectorizer.
>          * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
>          * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
> 
 
 
 
 
> +}
> +
> +/* Return true if all elements of NPATTERNS are equal.
> +
> +   E.g. NPATTERNS = 4:
> +     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
> +   E.g. NPATTERNS = 8:
> +     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
> +*/
> +bool
> +rvv_builder::npatterns_all_equal_p () const
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 1; i < npatterns (); i++)
> +    {
> +      poly_int64 ele = rtx_to_poly_int64 (elt (i));
> +      if (!known_eq (ele, ele0))
> +        return false;
> +    }
> +  return true;
> +}
There seems to be a disconnect here.  You only seem to check the first 
NPATTERN elements.  Don't you need to check the rest?   Or am I just 
getting confused by the function comment?
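
For reference, here is a toy model in plain C of what the quoted loop compares. It is a sketch over a plain array, not GCC's `rvv_builder`, and the name `first_group_all_equal` is hypothetical. It assumes the usual interleaved layout of the vector encoding, in which encoded element i belongs to pattern i % NPATTERNS, so elements 0 .. NPATTERNS - 1 are the first element of each pattern.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC code: ENCODED holds the encoded elements in
   interleaved order, so ENCODED[0 .. NPATTERNS - 1] are the first
   elements of the NPATTERNS patterns.  Return true if those first
   elements are all equal; later elements are not inspected.  */
static bool
first_group_all_equal (const long *encoded, size_t npatterns)
{
  for (size_t i = 1; i < npatterns; i++)
    if (encoded[i] != encoded[0])
      return false;
  return true;
}
```

On { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8 } with NPATTERNS = 4 this returns true even though the vector as a whole is not uniform, and on { 1, 2, 8, 4, 5, 6, 7, 3 } with NPATTERNS = 8 it returns false.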
 
 
 
 
> +
> +static bool
> +expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
Needs a function comment.
 
 
>
> +
> +bool
> +expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
> +        rtx op0, rtx op1, const vec_perm_indices &sel)
Similarly.
 
 
Overall it looks really good.  Just a couple comments to fix and sort 
out whether or not I'm misinterpreting rvv_builder::npatterns_all_equal_p.
 
Jeff
 
