On 6/6/23 21:19, juzhe.zh...@rivai.ai wrote:
From: Juzhe-Zhong <juzhe.zh...@rivai.ai>

This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
   for (int i = 0; i < 100; ++i)
     {
       a[i * 8 + 0] = b[i * 8 + 7] + 1;
       a[i * 8 + 1] = b[i * 8 + 7] + 2;
       a[i * 8 + 2] = b[i * 8 + 7] + 8;
       a[i * 8 + 3] = b[i * 8 + 7] + 4;
       a[i * 8 + 4] = b[i * 8 + 7] + 5;
       a[i * 8 + 5] = b[i * 8 + 7] + 6;
       a[i * 8 + 6] = b[i * 8 + 7] + 7;
       a[i * 8 + 7] = b[i * 8 + 7] + 3;
     }
}

To enable VLA SLP auto-vectorization, we need to be able to handle the
following const vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

These vectors can be generated in the prologue.
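
As a rough illustration of how the two encodings above expand to full vectors, here is a minimal standalone sketch in C. It models the usual GCC rule (NPATTERNS interleaved patterns, each extended by the step between its last two encoded elements when NELTS_PER_PATTERN = 3). The names encoded_elt, index_enc and addend_enc are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encoded form of const vector 1: NPATTERNS = 8, NELTS_PER_PATTERN = 3.  */
static const int64_t index_enc[24] = {
  0, 0, 0, 0, 0, 0, 0, 0,
  8, 8, 8, 8, 8, 8, 8, 8,
  16, 16, 16, 16, 16, 16, 16, 16
};

/* Encoded form of const vector 2: NPATTERNS = 8, NELTS_PER_PATTERN = 1.  */
static const int64_t addend_enc[8] = { 1, 2, 8, 4, 5, 6, 7, 3 };

/* Return element I of the vector described by ENC.  The encoding holds
   NELTS_PER_PATTERN elements for each of NPATTERNS interleaved patterns.  */
int64_t
encoded_elt (const int64_t *enc, unsigned npatterns,
             unsigned nelts_per_pattern, unsigned i)
{
  unsigned pattern = i % npatterns;   /* which interleaved pattern */
  unsigned k = i / npatterns;         /* position inside that pattern */

  if (k < nelts_per_pattern)
    return enc[k * npatterns + pattern];      /* explicitly encoded */
  if (nelts_per_pattern == 1)
    return enc[pattern];                      /* pattern repeats */
  if (nelts_per_pattern == 2)
    return enc[npatterns + pattern];          /* second element repeats */

  /* Three encoded elements: continue as a linear series.  */
  int64_t prev = enc[npatterns + pattern];
  int64_t last = enc[2 * npatterns + pattern];
  return last + (int64_t) (k - 2) * (last - prev);
}

/* Small self-check against the vectors listed in the mail.  */
void
check_encoded_examples (void)
{
  assert (encoded_elt (index_enc, 8, 3, 7) == 0);
  assert (encoded_elt (index_enc, 8, 3, 8) == 8);
  assert (encoded_elt (index_enc, 8, 3, 24) == 24);  /* extrapolated */
  assert (encoded_elt (addend_enc, 8, 1, 10) == 8);  /* repeats every 8 */
}
```

Under this model, vector 1 continues as 24, 32, ... in groups of eight, and vector 2 simply repeats its eight elements.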

After this patch, we end up with the following codegen:

Prologue:
...
         vsetvli a7,zero,e16,m2,ta,ma
         vid.v   v4
         vsrl.vi v4,v4,3
         li      a3,8
         vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
         li      t1,67633152
         addi    t1,t1,513
         li      a3,50790400
         addi    a3,a3,1541
         slli    a3,a3,32
         add     a3,a3,t1
         vsetvli t1,zero,e64,m1,ta,ma
         vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
         min     a3,...
         vsetvli zero,a3,e8,m1,ta,ma
         vle8.v  v2,0(a6)
         vsetvli a7,zero,e8,m1,ta,ma
         vrgatherei16.vv v1,v2,v4
         vadd.vv v1,v1,v3
         vsetvli zero,a3,e8,m1,ta,ma
         vse8.v  v1,0(a2)
         add     a6,a6,a4
         add     a2,a2,a4
         mv      a3,a5
         add     a5,a5,t1
         bgtu    a3,a4,.L3
...

Note: for SEW = 8 we need to use "vrgatherei16.vv" instead of "vrgather.vv",
since "vrgatherei16.vv" can cover a larger index range than "vrgather.vv"
(whose maximum element index is only 255 when SEW = 8).
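
To make the index-range argument concrete, the following is a small sketch in C of the arithmetic only. It assumes VLMAX = VLEN / SEW * LMUL for integer LMUL; the function names are hypothetical and nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* VLMAX for a vector length VLEN (in bits), element width SEW (in bits)
   and an integer LMUL.  */
unsigned
vlmax_elems (unsigned vlen, unsigned sew, unsigned lmul)
{
  return vlen / sew * lmul;
}

/* Return true if every element index 0 .. NELTS - 1 can be represented
   in an unsigned index element of INDEX_BITS bits (INDEX_BITS < 32).  */
bool
index_fits (unsigned nelts, unsigned index_bits)
{
  unsigned max_index = (1u << index_bits) - 1;
  return nelts - 1 <= max_index;
}

/* Self-check for a few SEW = 8 configurations.  */
void
check_index_range (void)
{
  /* VLEN = 2048, e8, m1: 256 elements, indices still fit in 8 bits.  */
  assert (index_fits (vlmax_elems (2048, 8, 1), 8));
  /* VLEN = 4096, e8, m1: 512 elements, 8-bit indices are too narrow.  */
  assert (!index_fits (vlmax_elems (4096, 8, 1), 8));
  /* 16-bit indices cover the same case.  */
  assert (index_fits (vlmax_elems (4096, 8, 1), 16));
}
```

Because the VLA code does not know VLEN at compile time, 8-bit indices cannot be assumed to be sufficient, which is consistent with the index vector v4 being set up under e16,m2 in the prologue.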
Epilogue:
         lbu     a5,799(a1)
         addiw   a4,a5,1
         sb      a4,792(a0)
         addiw   a4,a5,2
         sb      a4,793(a0)
         addiw   a4,a5,8
         sb      a4,794(a0)
         addiw   a4,a5,4
         sb      a4,795(a0)
         addiw   a4,a5,5
         sb      a4,796(a0)
         addiw   a4,a5,6
         sb      a4,797(a0)
         addiw   a4,a5,7
         sb      a4,798(a0)
         addiw   a5,a5,3
         sb      a5,799(a0)
         ret
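
The prologue constants in the listing above can be cross-checked with a short C sketch. It assumes the eight byte elements are packed little-endian into one 64-bit scalar that "vmv.v.x" then broadcasts under e64, and that the index vector is (vid >> 3) * 8. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack eight byte-sized elements into one 64-bit scalar with element 0
   in the least significant byte.  */
uint64_t
pack8 (const uint8_t elts[8])
{
  uint64_t x = 0;
  for (int i = 7; i >= 0; --i)
    x = (x << 8) | elts[i];
  return x;
}

/* Scalar built by the li/addi/li/addi/slli/add sequence in the prologue.  */
uint64_t
prologue_scalar (void)
{
  uint64_t lo = 67633152u + 513u;    /* li t1,67633152; addi t1,t1,513 */
  uint64_t hi = 50790400u + 1541u;   /* li a3,50790400; addi a3,a3,1541 */
  return (hi << 32) + lo;            /* slli a3,a3,32; add a3,a3,t1 */
}

/* Index element J produced by vid.v; vsrl.vi ...,3; vmul.vx ...,8.  */
unsigned
gather_index (unsigned j)
{
  return (j >> 3) * 8;
}

/* Self-check against the values shown in the listing.  */
void
check_prologue (void)
{
  static const uint8_t addend[8] = { 1, 2, 8, 4, 5, 6, 7, 3 };
  assert (pack8 (addend) == UINT64_C (0x0307060504080201));
  assert (prologue_scalar () == pack8 (addend));
  assert (gather_index (7) == 0 && gather_index (8) == 8);
}
```

Both routes give 0x0307060504080201, so each 64-bit lane of v3 reads as the bytes { 1, 2, 8, 4, 5, 6, 7, 3 } when the vector is reinterpreted with e8.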

One more thing remains to be done: "Epilogue auto-vectorization", which
needs VLS modes support.
I will add VLS modes support for "Epilogue auto-vectorization" in the future.

gcc/ChangeLog:

         * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
         * config/riscv/riscv-v.cc
         (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
         (rvv_builder::single_step_npatterns_p): New function.
         (rvv_builder::npatterns_all_equal_p): Ditto.
         (const_vec_all_in_range_p): Support POLY handling.
         (gen_const_vector_dup): Ditto.
         (emit_vlmax_gather_insn): Add vrgatherei16.
         (emit_vlmax_masked_gather_mu_insn): Ditto.
         (expand_const_vector): Add VLA SLP const vector support.
         (expand_vec_perm): Support POLY.
         (struct expand_vec_perm_d): New struct.
         (shuffle_generic_patterns): New function.
         (expand_vec_perm_const_1): Ditto.
         (expand_vec_perm_const): Ditto.
         * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
         (TARGET_VECTORIZE_VEC_PERM_CONST): New target hook.

gcc/testsuite/ChangeLog:

         * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA
         vectorizer.
         * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
         * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
         * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.





+}
+
+/* Return true if all elements of NPATTERNS are equal.
+
+   E.g. NPATTERNS = 4:
+     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
+   E.g. NPATTERNS = 8:
+     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
+*/
+bool
+rvv_builder::npatterns_all_equal_p () const
+{
+  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 1; i < npatterns (); i++)
+    {
+      poly_int64 ele = rtx_to_poly_int64 (elt (i));
+      if (!known_eq (ele, ele0))
+       return false;
+    }
+  return true;
+}
There seems to be a disconnect here. You only seem to check the first NPATTERNS elements. Don't you need to check the rest? Or am I just getting confused by the function comment?
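
To make the question concrete, here is a standalone C model of the two possible readings. It uses plain int64_t arrays in place of the rtx/poly_int64 elements, and both function names are hypothetical. It is only a sketch for discussion and does not say which behavior the patch intends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the quoted loop: compare only the first NPATTERNS encoded
   elements against element 0.  */
bool
first_npatterns_equal (const int64_t *enc, unsigned npatterns)
{
  for (unsigned i = 1; i < npatterns; i++)
    if (enc[i] != enc[0])
      return false;
  return true;
}

/* Stricter reading of the function comment: every group of NPATTERNS
   consecutive encoded elements must be uniform.  */
bool
all_groups_equal (const int64_t *enc, unsigned npatterns,
                  unsigned nelts_per_pattern)
{
  for (unsigned k = 0; k < nelts_per_pattern; k++)
    for (unsigned i = 1; i < npatterns; i++)
      if (enc[k * npatterns + i] != enc[k * npatterns])
        return false;
  return true;
}

/* The two checks agree on the comment's example but differ once a later
   group is not uniform.  */
void
check_npatterns_models (void)
{
  static const int64_t uniform[12] = { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8 };
  static const int64_t mixed[12]   = { 2, 2, 2, 2, 4, 5, 6, 7, 8, 8, 8, 8 };

  assert (first_npatterns_equal (uniform, 4));
  assert (all_groups_equal (uniform, 4, 3));
  assert (first_npatterns_equal (mixed, 4));   /* first group only */
  assert (!all_groups_equal (mixed, 4, 3));    /* second group differs */
}
```

If callers already guarantee the shape of the later groups, the shorter check may be enough, but that would be worth stating in the function comment.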




+
+static bool
+expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
Needs a function comment.



+
+bool
+expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
+                      rtx op0, rtx op1, const vec_perm_indices &sel)
Similarly.


Overall it looks really good. Just a couple comments to fix and sort out whether or not I'm misinterpreting rvv_builder::npatterns_all_equal_p.

Jeff
