[PATCH] RISC-V: Remove @ of vec_series

2023-10-04 Thread Juzhe-Zhong
gcc/ChangeLog:

* config/riscv/autovec.md (@vec_series<mode>): Remove @.
(vec_series<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_const_vector): Use expand_vec_series instead of gen_vec_series.
(shuffle_decompress_patterns): Ditto.

---
 gcc/config/riscv/autovec.md | 2 +-
 gcc/config/riscv/riscv-v.cc | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d6cf376ebca..056f2c352f6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -336,7 +336,7 @@
 ;; - vadd.vx/vadd.vi
 ;; -
 
-(define_expand "@vec_series<mode>"
+(define_expand "vec_series<mode>"
   [(match_operand:V_VLSI 0 "register_operand")
    (match_operand:<VEL> 1 "reg_or_int_operand")
    (match_operand:<VEL> 2 "reg_or_int_operand")]
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 29e138e1da2..23633a2a74d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1014,7 +1014,7 @@ expand_const_vector (rtx target, rtx src)
   rtx base, step;
   if (const_vec_series_p (src, &base, &step))
 {
-  emit_insn (gen_vec_series (mode, target, base, step));
+  expand_vec_series (target, base, step);
   return;
 }
 
@@ -1171,7 +1171,7 @@ expand_const_vector (rtx target, rtx src)
  rtx step = CONST_VECTOR_ELT (src, 2);
  /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
  rtx tmp = gen_reg_rtx (mode);
- emit_insn (gen_vec_series (mode, tmp, base1, step));
+ expand_vec_series (tmp, base1, step);
  /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
  scalar_mode elem_mode = GET_MODE_INNER (mode);
  if (!rtx_equal_p (base0, const0_rtx))
@@ -3020,7 +3020,7 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   /* Generate { 0, 1,  } mask.  */
   rtx vid = gen_reg_rtx (sel_mode);
   rtx vid_repeat = gen_reg_rtx (sel_mode);
-  emit_insn (gen_vec_series (sel_mode, vid, const0_rtx, const1_rtx));
+  expand_vec_series (vid, const0_rtx, const1_rtx);
   rtx and_ops[] = {vid_repeat, vid, const1_rtx};
   emit_vlmax_insn (code_for_pred_scalar (AND, sel_mode), BINARY_OP, and_ops);
   rtx const_vec = gen_const_vector_dup (sel_mode, 1);
-- 
2.36.3



[PATCH] RISC-V: Enable more tests of "vect" for RVV

2023-10-07 Thread Juzhe-Zhong
This patch enables almost full coverage of the vectorization tests for RVV, except for
the following tests (not enabled yet):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure whether we need to enable these:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn

After this patch, we will have the following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-

[PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-07 Thread Juzhe-Zhong
Fix the following TSVC XPASS FAILs for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-

[PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Juzhe-Zhong
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected IR in the dump is the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_LEN_SUB"

This is due to a known bug in GIMPLE folding that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug-fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  6 --
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 12 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 12 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 12 
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..7bddc122037 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,7 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && { vect_fully_masked && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && { vect_fully_masked && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_LEN_ADD} "vect" { target { vect_double_cond_arith && { vect_fully_masked && riscv_v } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_LEN_SUB} "optimized" { target { vect_double_cond_arith && { vect_fully_masked && riscv_v } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe6

[PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Juzhe-Zhong
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected IR in the dump is the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_LEN_SUB"

This is due to a known bug in GIMPLE folding that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug-fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8 
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe642a0..5bb75206a68 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -52,8 +52,8 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "opt

[PATCH] RISC-V: Support movmisalign of RVV VLA modes

2023-10-08 Thread Juzhe-Zhong
Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520

At first I thought RVV doesn't allow misaligned accesses, so I removed that pattern.
However, after a deeper investigation, re-reading the RVV ISA spec and experimenting on SPIKE,
I realized I was wrong.

RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred successfully or an address misaligned exception is raised on that element."

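So the RVV ISA does allow misaligned vector loads/stores: each misaligned element is either transferred successfully or raises an address-misaligned exception.

I confirmed this by experimenting on SPIKE: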

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader
z   ra 00010158 sp 003ffb40 gp 00012c48
tp  t0 000110da t1 000f t2 
s0 00013460 s1  a0 00012ef5 a1 00012018
a2 00012a71 a3 000d a4 0004 a5 00012a71
a6 00012a71 a7 00012018 s2  s3 
s4  s5  s6  s7 
s8  s9  sA  sB 
t3  t4  t5  t6 
pc 00010258 va/inst 020660a7 sr 80026620
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader

As shown above, SPIKE passes the previously *FAILED* execution tests when --misaligned is specified.

So, to honor the RVV ISA spec, and based on the investigation above, we should add the movmisalign pattern back,
since it improves multiple vectorization tests and fixes dump FAILs.

This patch fixes the following dump FAILs:

FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Consider the following case:

struct s {
  unsigned i : 31;
  char a : 4;
};

#define N 32
#define ELT0 {0x7FFFUL, 0}
#define ELT1 {0x7FFFUL, 1}
#define ELT2 {0x7FFFUL, 2}
#define ELT3 {0x7FFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f(struct s *ptr, unsigned n) {
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += ptr[i].a;
  return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
        mv      a4,a0
        beq     a1,zero,.L9
        addiw   a5,a1,-1
        li      a3,14
        vsetivli zero,16,e64,m8,ta,ma
        bleu    a5,a3,.L3
        andi    a5,a0,127
        bne     a5,zero,.L3
        srliw   a3,a1,4
        slli    a3,a3,7
        li      a0,15
        slli    a0,a0,32
        add     a3,a3,a4
        mv      a5,a4
        li      a2,32
        vmv.v.x v16,a0
        vsetvli zero,zero,e32,m4,ta,ma
        vmv.v.i v4,0
.L4:
        vsetvli zero,zero,e64,m8,ta,ma
        vle64.v v8,0(a5)
        addi    a5,a5,128
        vand.vv v8,v8,v16
        vsetvli zero,zero,e32,m4,ta,ma
        vnsrl.wx v8,v8,a2
        vadd.vv v4,v4,v8
        bne     a5,a3,.L4
        li      a3,0
        andi    a5,a1,15
        vmv.s.x v1,a3
        andi    a3,a1,-16
        vredsum.vs v1,v4,v1
        vmv.x.s a0,v1
        mv      a2,a0
        beq     a5,zero,.L15
        sll

[PATCH] TEST: Fix dump FAIL for RVV (RISC-V Vector)

2023-10-08 Thread Juzhe-Zhong
As shown here: https://godbolt.org/z/3K9oK7fx3

ARM SVE prints the FOLD_EXTRACT_LAST message 2 times, whereas RVV prints it 4 times.

This is because RISC-V doesn't enable vec_pack_trunc, so the first analysis attempt fails on the conversion after
fold_extract_last has already been reported; the second attempt then succeeds and reports it again.

So RVV shows "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 times.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-4.c: Add vect_pack_trunc variant.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
index 8820075b1dc..8ea8c538713 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
@@ -42,6 +42,7 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && vect_pack_trunc } } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target { { vect_fold_extract_last } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
 
-- 
2.36.3



[PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-08 Thread Juzhe-Zhong
RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this case well.
So, adjust the dump check for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.

---
 gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
index a61f1a9a221..829a4d41601 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
@@ -35,6 +35,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_unpack } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_unpack || riscv_v } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { { ! vect_unpack } && { ! riscv_v } } } } } */
 
-- 
2.36.3



[PATCH] TEST: Fix dump FAIL for RVV

2023-10-08 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-cond-1.c: Fix dump FAIL for RVV.
* gcc.dg/vect/pr57705.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/pr57705.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
index c8024429e9c..e1ebc23505f 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
@@ -47,6 +47,6 @@ int main ()
 }
 
 /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */
-/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { vect_element_align && { ! amdgcn-*-* } } } } } */
-/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target amdgcn-*-* } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { vect_element_align && { { ! amdgcn-*-* } && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target { amdgcn-*-* || riscv_v } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr57705.c b/gcc/testsuite/gcc.dg/vect/pr57705.c
index 39c32946d74..2dacea0a7a7 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57705.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57705.c
@@ -64,5 +64,5 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 3 "vect" { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" { target { ! vect_pack_trunc } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 3 "vect" { target { vect_pack_trunc || riscv_v } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" { target { { ! vect_pack_trunc } && { ! riscv_v } } } } } */
-- 
2.36.3



[PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

2023-10-08 Thread Juzhe-Zhong
Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds in vectorizing these outer loops.

Fix the following XPASS FAILs:

XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
* gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
* gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
* gcc.dg/vect/no-scevccp-outer-21.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
index c7c2fa8a504..12179949e00 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
@@ -59,4 +59,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
index ba904a6c03e..86554a98169 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
index 5cd4049d08c..624b54accf4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
@@ -49,4 +49,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
index 72e53c2bfb0..b30a5d78819 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
@@ -59,4 +59,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! { vect_pack_trunc } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */
-- 
2.36.3



[PATCH V2] RISC-V: Support movmisalign of RVV VLA modes

2023-10-09 Thread Juzhe-Zhong
This patch fixes the following regression FAILs:
FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Previously, I removed the movmisalign pattern to fix execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520

At first I thought RVV doesn't allow misaligned accesses, so I removed that pattern.
However, after further investigation, rereading the RVV ISA, and experimenting on SPIKE,
I realized I was wrong.

RVV ISA reference:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred successfully or an address misaligned exception is raised on that element."

So the RVV ISA does permit misaligned vector loads/stores: an implementation either performs the access or raises an address-misaligned exception on that element.
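To make "naturally aligned" concrete: an element of N bytes (N a power of two) is naturally aligned when its address is a multiple of N. The helper below is a hypothetical illustration of that check, not code from GCC or SPIKE; per the quoted spec text, an element failing it may either be transferred or raise an address-misaligned exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if ADDR is naturally aligned for an element
   of ELEM_SIZE bytes.  ELEM_SIZE must be a power of two (1, 2, 4, 8 for
   e8/e16/e32/e64 elements).  */
bool naturally_aligned(uintptr_t addr, uintptr_t elem_size)
{
  assert(elem_size != 0 && (elem_size & (elem_size - 1)) == 0);
  return (addr & (elem_size - 1)) == 0;
}
```

For e64 elements, naturally_aligned(0x1000, 8) is true while naturally_aligned(0x1004, 8) is false.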

I confirmed this by experimenting on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader
z   ra 00010158 sp 003ffb40 gp 00012c48
tp  t0 000110da t1 000f t2 
s0 00013460 s1  a0 00012ef5 a1 00012018
a2 00012a71 a3 000d a4 0004 a5 00012a71
a6 00012a71 a7 00012018 s2  s3 
s4  s5  s6  s7 
s8  s9  sA  sB 
t3  t4  t5  t6 
pc 00010258 va/inst 020660a7 sr 80026620
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader

We can see that SPIKE passes the previously *FAILED* execution tests when --misaligned is specified.

So, to honor the RVV ISA spec, and based on the investigation above, we should add the movmisalign pattern back,
since it improves multiple vectorization tests and fixes the dump FAILs listed above.

This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether the misalign pattern is supported
for VLA modes (enabled by default).

Consider the following case:

struct s {
  unsigned i : 31;
  char a : 4;
};

#define N 32
#define ELT0 {0x7FFFUL, 0}
#define ELT1 {0x7FFFUL, 1}
#define ELT2 {0x7FFFUL, 2}
#define ELT3 {0x7FFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f(struct s *ptr, unsigned n) {
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += ptr[i].a;
  return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
mv  a4,a0
beq a1,zero,.L9
addiw   a5,a1,-1
li  a3,14
vsetivli zero,16,e64,m8,ta,ma
bleu a5,a3,.L3
andi a5,a0,127
bne a5,zero,.L3
srliw   a3,a1,4
slli a3,a3,7
li  a0,15
slli a0,a0,32
add a3,a3,a4
mv  a5,a4
li  a2,32
vmv.v.x v16,a0
vsetvli zero,zero,e32,m4,ta,ma
vmv.v.i v4,0
.L4:
vsetvli zero,zero,e64,m8,ta,ma
vle64.v v8,0(a5)
addi a5,a5,128
 

[PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

2023-10-09 Thread Juzhe-Zhong
Reference: https://godbolt.org/z/G9jzf5Grh

RVV is able to vectorize this case using SLP. However, with -fno-vect-cost-model,
RVV vectorizes it using vec_load_lanes with stride 6.
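For readers unfamiliar with load_lanes: an N-way load_lanes reads groups of N interleaved elements and de-interleaves them so that lane k of every group ends up in vector k. The scalar model below is a hypothetical sketch of that data movement (stride 6 corresponds to n == 6); it is not GCC's implementation of the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of an N-way load_lanes: SRC holds COUNT
   groups of N interleaved elements, and lane K of group G is written
   to LANES[K][G].  A stride-6 load_lanes corresponds to N == 6.  */
void load_lanes_model(const double *src, size_t n, size_t count,
                      double *lanes[])
{
  assert(n > 0);
  for (size_t g = 0; g < count; g++)
    for (size_t k = 0; k < n; k++)
      lanes[k][g] = src[g * n + k];
}
```

With src = {0, 1, ..., 11}, n = 6 and count = 2, lane 0 receives {0, 6} and lane 5 receives {5, 11}.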

gcc/testsuite/ChangeLog:

* gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.

---
 gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
index 7c7acd5bab6..96751faae7f 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
@@ -18,4 +18,4 @@ foo (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_strided6 } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression test: Fix FAIL of pr45752.c for RVV

2023-10-09 Thread Juzhe-Zhong
With -fno-vect-cost-model, RVV vectorizes this case using load_lanes with stride 5
instead of SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr45752.c: Adapt dump check for targets that support load_lanes with stride 5.

---
 gcc/testsuite/gcc.dg/vect/pr45752.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr45752.c b/gcc/testsuite/gcc.dg/vect/pr45752.c
index e8b364f29eb..3c87d9b04fc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr45752.c
+++ b/gcc/testsuite/gcc.dg/vect/pr45752.c
@@ -159,4 +159,4 @@ int main (int argc, const char* argv[])
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "gaps requires scalar epilogue loop" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { ! { vect_load_lanes && vect_strided5 } } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression tests: Fix FAIL of pr97832* for RVV

2023-10-09 Thread Juzhe-Zhong
With -fno-vect-cost-model, these cases are vectorized by vec_load_lanes with stride 8
instead of SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97832-2.c: Adapt dump check for targets that support load_lanes with stride 8.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr97832-4.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/pr97832-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/pr97832-3.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/pr97832-4.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-2.c b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
index 4f0578120ee..7d8d2691432 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
@@ -25,5 +25,5 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
   }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-3.c b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
index ad1225ddbaa..c0603e1432e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
@@ -46,5 +46,5 @@ void foo(double* restrict y, const double* restrict x0, const double* restrict x
   }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-4.c b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
index 74ae27ff873..c03442816a4 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
@@ -24,5 +24,5 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
   }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression test: Fix FAIL of slp-12a.c

2023-10-09 Thread Juzhe-Zhong
This case is vectorized by stride8 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.

---
 gcc/testsuite/gcc.dg/vect/slp-12a.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index f0dda55acae..973de6ada21 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -76,5 +76,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression test: Adapt SLP tests like ARM SVE

2023-10-09 Thread Juzhe-Zhong
RVV vectorizes these 2 cases the same way ARM SVE does.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
* gcc.dg/vect/slp-perm-10.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/slp-23.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-10.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-23.c b/gcc/testsuite/gcc.dg/vect/slp-23.c
index d32ee5ba73b..8836acf0330 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-23.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-23.c
@@ -114,5 +114,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_perm } } } } */
 /* SLP fails for the second loop with variable-length SVE because
the load size is greater than the minimum vector size.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm xfail { { aarch64_sve || riscv_v } && vect_variable_length } } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
index 2cce30c2444..03de4c61b50 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
@@ -53,4 +53,4 @@ int main ()
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
 /* SLP fails for variable-length SVE because the load size is greater
than the minimum vector size.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm xfail { { aarch64_sve || riscv_v } && vect_variable_length } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

2023-10-09 Thread Juzhe-Zhong
RVV vectorizes this case with stride-5 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.

---
 gcc/testsuite/gcc.dg/vect/slp-perm-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
index 107968f1f7c..f4bda39c837 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
@@ -115,4 +115,4 @@ int main (int argc, const char* argv[])
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "gaps requires scalar epilogue loop" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! { vect_load_lanes && vect_strided5 } } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV

2023-10-09 Thread Juzhe-Zhong
RVV vectorizes this case with stride-8 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.

---
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
index 15f5c259e98..e2fe01bb13d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
@@ -60,6 +60,6 @@ int main (void)
 /* For variable-length SVE, the number of scalar statements in the
reduction exceeds the number of elements in a 128-bit granule.  */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_multiple_sizes } xfail { vect_no_int_min_max || { aarch64_sve && vect_variable_length } } } } } */
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_multiple_sizes } } } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_multiple_sizes && { ! { vect_load_lanes && vect_strided8 } } } } } } */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
 
-- 
2.36.3



[PATCH] TEST: Add vectorization check

2023-10-09 Thread Juzhe-Zhong
On targets that support load_lanes, these cases no longer check for SLP.

Add a vectorization check so that these situations are still tested.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97832-2.c: Add vectorization check.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr97832-4.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/pr97832-2.c | 1 +
 gcc/testsuite/gcc.dg/vect/pr97832-3.c | 1 +
 gcc/testsuite/gcc.dg/vect/pr97832-4.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-2.c b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
index 7d8d2691432..60e8e8516fc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
@@ -27,3 +27,4 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
 /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-3.c b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
index c0603e1432e..2dc76e5b565 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
@@ -48,3 +48,4 @@ void foo(double* restrict y, const double* restrict x0, const double* restrict x
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
 /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-4.c b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
index c03442816a4..7e74c9313d5 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
@@ -26,3 +26,4 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
 /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-- 
2.36.3



[PATCH] RISC-V: Add available vector size for RVV

2023-10-09 Thread Juzhe-Zhong
For RVV, VLS modes are enabled according to TARGET_MIN_VLEN,
from M1 to M8.

For example, when TARGET_MIN_VLEN = 128 bits, we enable
128/256/512/1024-bit VLS modes.

This patch fixes the following FAILs:
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2
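The arithmetic behind those sizes is simply TARGET_MIN_VLEN multiplied by each LMUL from M1 to M8. The helper below is a hypothetical sketch of that calculation only; it is not taken from GCC or target-supports.exp, and it does not model the smaller entries (0, 32, 64) already present in the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: VLS modes span LMUL = 1, 2, 4, 8 (M1..M8), so
   the enabled VLS vector sizes in bits are MIN_VLEN * LMUL.  Fills OUT
   and returns the number of sizes written.  */
size_t vls_vector_sizes(unsigned min_vlen, unsigned out[4])
{
  size_t n = 0;
  for (unsigned lmul = 1; lmul <= 8; lmul *= 2)
    out[n++] = min_vlen * lmul;
  assert(n == 4);
  return n;
}
```

With min_vlen = 128 this yields 128, 256, 512 and 1024.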

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add 256/512/1024

---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index af52c38433d..dc366d35a0a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8881,7 +8881,7 @@ proc available_vector_sizes { } {
lappend result 4096 2048 1024 512 256 128 64 32 16 8 4 2
 } elseif { [istarget riscv*-*-*] } {
if { [check_effective_target_riscv_v] } {
-   lappend result 0 32 64 128
+   lappend result 0 32 64 128 256 512 1024
}
lappend result 128
 } else {
-- 
2.36.3



[PATCH] RISC-V Regression: Fix dump check of bb-slp-68.c

2023-10-09 Thread Juzhe-Zhong
Like GCN, RVV also has 64-byte (512-bit) vectors, which cause a FAIL in this
test.

It's more reasonable to xfail on "vect512" than on the amdgcn-*-* target.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-68.c: Use vect512.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
index e7573a14933..2dd3d8ee90c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
@@ -20,4 +20,4 @@ void foo ()
 
 /* We want to have the store group split into 4, 2, 4 when using 32byte vectors.
Unfortunately it does not work when 64-byte vectors are available.  */
-/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* } } } */
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail vect512 } } } */
-- 
2.36.3



[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-09 Thread Juzhe-Zhong
Here is the reference comparing dump IR between ARM SVE and RVV.

https://godbolt.org/z/zqess8Gss

We can see RVV has one more line in the dump:
optimized: basic block part vectorized using 128 byte vectors
since RVV has 1024-bit vectors.

The codegen is reasonably good.

However, GCN also has 1024-bit vectors, so this patch may cause this case
to FAIL on the GCN port.

Hi GCN folks, could you check this patch on the GCN port for me?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
* lib/target-supports.exp: Ditto.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c | 3 ++-
 gcc/testsuite/lib/target-supports.exp  | 6 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 8df35327e7a..9ef1330b47c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -67,7 +67,8 @@ int main()
 
 /* We should also be able to use 2-lane SLP to initialize the real and
imaginary components in the first loop of main.  */
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" { target {! { vect1024 } } } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 11 "slp1" { target { { vect1024 } } } } } */
 /* We should see the s->phase[dir] operand splatted and no other operand built
from scalars.  See PR97334.  */
 /* { dg-final { scan-tree-dump "Using a splat" "slp1" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index dc366d35a0a..95c489d7f76 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8903,6 +8903,12 @@ proc check_effective_target_vect_variable_length { } {
 return [expr { [lindex [available_vector_sizes] 0] == 0 }]
 }
 
+# Return 1 if the target supports vectors of 1024 bits.
+
+proc check_effective_target_vect1024 { } {
+return [expr { [lsearch -exact [available_vector_sizes] 1024] >= 0 }]
+}
+
 # Return 1 if the target supports vectors of 512 bits.
 
 proc check_effective_target_vect512 { } {
-- 
2.36.3



[PATCH] RISC-V Regression: Make match patterns more accurate

2023-10-09 Thread Juzhe-Zhong
This patch fixes 2 FAILs in the RVV regression that occur because the dump check
is not accurate.

It's inspired by Robin's previous patch:
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/
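The new pattern only counts a "detected" line if a "succeeded" follows it before any "failed" or "Re-trying". The C function below is a hypothetical model of how many times such a pattern matches, assuming Tcl's default behaviour where `.` also matches newlines and the `*` repetition prefers the longest match; it illustrates the intent of the regex and is not how DejaGnu evaluates it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if S begins with PREFIX.  */
static int starts_with(const char *s, const char *prefix)
{
  return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical model: count non-overlapping matches of
     HEAD(?:(?!failed)(?!Re-trying).)*succeeded
   in DUMP, where '.' also matches newlines and the repetition prefers
   the longest match.  HEAD is a plain literal such as
   "vect_recog_widen_mult_pattern: detected".  */
int count_successful_detections(const char *dump, const char *head)
{
  assert(head != NULL && *head != '\0');
  size_t head_len = strlen(head);
  int count = 0;
  const char *pos = dump;

  while ((pos = strstr(pos, head)) != NULL)
    {
      const char *match_end = NULL;
      /* Scan forward until "failed"/"Re-trying" or end of input,
         remembering the end of the last "succeeded" seen.  */
      for (const char *p = pos + head_len; *p != '\0'; p++)
        {
          if (starts_with(p, "failed") || starts_with(p, "Re-trying"))
            break;
          if (starts_with(p, "succeeded"))
            match_end = p + strlen("succeeded");
        }
      if (match_end != NULL)
        {
          count++;
          pos = match_end;  /* matches do not overlap */
        }
      else
        pos++;  /* no match here; try the next occurrence of HEAD */
    }
  return count;
}
```

Under this model, a "detected" followed by "failed" before any "succeeded" no longer contributes to the count, which is what makes the check stricter than the old "detected"-only pattern.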

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-7.c: Adjust regex pattern.
* gcc.dg/vect/no-scevccp-vect-iv-3.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
index 543ee98b5a4..058d1d2db2d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
@@ -77,4 +77,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
index 7049e4936b9..6f2b2210b11 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
@@ -30,4 +30,4 @@ unsigned int main1 ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
-- 
2.36.3



[PATCH] RISC-V Regression: Fix FAIL of predcom-2.c

2023-10-09 Thread Juzhe-Zhong
Like GCN, add -fno-tree-vectorize.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/predcom-2.c: Add riscv.

---
 gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
index f19edd4cd74..681ff7c696b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details-blocks -fno-tree-pre" } */
-/* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* } } */
+/* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* riscv*-*-* } } */
 
 void abort (void);
 
-- 
2.36.3



[Committed] RISC-V: Add testcase for SCCVN optimization[PR111751]

2023-10-10 Thread Juzhe-Zhong
Add testcase for PR111751 which has been fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html

PR target/111751

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111751.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/pr111751.c   | 55 +++
 1 file changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c
new file mode 100644
index 000..0f1e8a7d567
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#define N 16
+
+int foo1 ()
+{
+  int i;
+  char ia[N];
+  char ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  char ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+
+  /* Not vectorizable, multiplication */
+  for (i = 0; i < N; i++)
+{
+  ia[i] = ib[i] * ic[i];
+}
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+{
+  if (ia[i] != (char) (ib[i] * ic[i]))
+__builtin_abort ();
+}
+
+  return 0;
+}
+
+typedef int half_word;
+
+int foo2 ()
+{
+  int i;
+  half_word ia[N];
+  half_word ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  half_word ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+
+  /* Not worthwhile, only 2 parts per int */
+  for (i = 0; i < N; i++)
+{
+  ia[i] = ib[i] + ic[i];
+}
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+{
+  if (ia[i] != ib[i] + ic[i])
+__builtin_abort ();
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,0\s+ret} 2 } } */
+/* { dg-final { scan-assembler-not {vset} } } */
-- 
2.36.3



[Committed] RISC-V: Add VLS BOOL mode vcond_mask[PR111751]

2023-10-10 Thread Juzhe-Zhong
Richard's patch resolving PR111751:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65

causes an ICE in the RISC-V regression:

FAIL: gcc.dg/torture/pr53144.c   -O2  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (test for excess errors)

A vcond_mask pattern for VLS BOOL modes is needed to fix this ICE.

More details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111751
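As a reminder of what vcond_mask computes: element i of the result is operand 3[i] ? operand 1[i] : operand 2[i]. For boolean (mask) vectors this reduces to bitwise logic, consistent with the first step visible in the expander, mask1 = operands[3] & operands[1]. The function below is a hypothetical bit-level model that packs one element per bit of a uint64_t; it is not the RISC-V expander itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of vcond_mask on boolean vectors of up to 64
   elements, one element per bit: result[i] = sel[i] ? a[i] : b[i].  */
uint64_t vcond_mask_bool_model(uint64_t a, uint64_t b, uint64_t sel)
{
  uint64_t mask1 = sel & a;   /* elements selected from A */
  uint64_t mask2 = ~sel & b;  /* elements selected from B */
  assert((mask1 & mask2) == 0);  /* the two selections never overlap */
  return mask1 | mask2;
}
```

For example, with a = 0b1100, b = 0b1010 and sel = 0b0110, elements 1 and 2 come from a and elements 0 and 3 from b, giving 0b1100.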

Tested and Committed.

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS BOOL modes.

---
 gcc/config/riscv/autovec.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 53e9d34eea1..41bff3a318f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -575,10 +575,10 @@
 ;; -
 
 (define_expand "vcond_mask_"
-  [(match_operand:VB 0 "register_operand")
-   (match_operand:VB 1 "register_operand")
-   (match_operand:VB 2 "register_operand")
-   (match_operand:VB 3 "register_operand")]
+  [(match_operand:VB_VLS 0 "register_operand")
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "register_operand")
+   (match_operand:VB_VLS 3 "register_operand")]
   "TARGET_VECTOR"
   {
 /* mask1 = operands[3] & operands[1].  */
-- 
2.36.3



[PATCH] RISC-V Regression: Fix FAIL of pr65947-8.c for RVV

2023-10-10 Thread Juzhe-Zhong
This test exercises the fold_extract_last pattern, so it's more reasonable to use
vect_fold_extract_last instead of listing specific targets.

This is the vect_fold_extract_last property:
proc check_effective_target_vect_fold_extract_last { } {
return [expr { [check_effective_target_aarch64_sve]
   || [istarget amdgcn*-*-*]
   || [check_effective_target_riscv_v] }]
}

It covers ARM SVE, GCN, and RVV.

This matches exactly what we want, and it is more reasonable and easier to maintain.
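For context, fold_extract_last is what lets these targets vectorize a condition reduction such as `if (cond[i]) last = a[i];`: it returns the element at the last active position of a mask, or a fallback value when no position is active. The scalar model below is a hypothetical sketch of those semantics, not GCC's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of fold_extract_last: return VEC[I] for the
   largest I with MASK[I] set, or DEFAULT_VAL if no mask entry is set.  */
int fold_extract_last_model(int default_val, const bool *mask,
                            const int *vec, size_t n)
{
  assert(n == 0 || (mask != NULL && vec != NULL));
  int result = default_val;
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      result = vec[i];
  return result;
}
```

With mask {1, 0, 1, 0} and vec {10, 20, 30, 40} the result is 30; with an all-false mask it is the default value.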

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr65947-8.c: Use vect_fold_extract_last.

---
 gcc/testsuite/gcc.dg/vect/pr65947-8.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
index d0426792e35..9ced4dbb69f 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-8.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
@@ -41,6 +41,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { amdgcn*-*-* || aarch64_sve } } } } } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { amdgcn*-*-* || aarch64_sve } } } } */
-/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { target { ! { amdgcn*-*-* || aarch64_sve } } } } } */
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_fold_extract_last } } } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { target { ! { vect_fold_extract_last } } } } } */
-- 
2.36.3




[PATCH] RISC-V Regression: Fix FAIL of vect-multitypes-16.c for RVV

2023-10-10 Thread Juzhe-Zhong
As Richard suggested: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html

Add the vect_ext_char_longlong property to fix the FAIL for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV.
* lib/target-supports.exp: Add vect_ext_char_longlong property.

---
 gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c | 4 ++--
 gcc/testsuite/lib/target-supports.exp  | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
index a61f1a9a221..fd17ad7437e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
@@ -35,6 +35,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_unpack } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_unpack } || { vect_variable_length && vect_ext_char_longlong } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { { ! vect_unpack } && {! { vect_variable_length && vect_ext_char_longlong } } } } } } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 95c489d7f76..b454b07359a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4215,6 +4215,15 @@ proc check_effective_target_vect_floatuint_cvt { } {
&& [check_effective_target_riscv_v]) }}]
 }
 
+# Return 1 if the target supports vector integer char -> long long extend optab
+#
+
+proc check_effective_target_vect_ext_char_longlong { } {
+return [check_cached_effective_target_indexed vect_ext_char_longlong {
+  expr { ([istarget riscv*-*-*]
+ && [check_effective_target_riscv_v]) }}]
+}
+
 # Return 1 if peeling for alignment might be profitable on the target
 #
 
-- 
2.36.3



[PATCH] RISC-V Regression: Make pattern match more accurate of vect-live-2.c

2023-10-10 Thread Juzhe-Zhong
Like the previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-live-2.c: Make pattern match more accurate.

---
 gcc/testsuite/gcc.dg/vect/vect-live-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
index dae36e9ed67..0a49c96d4e0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
@@ -58,4 +58,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
-- 
2.36.3



[PATCH] RISC-V: Remove XFAIL of ssa-dom-cse-2.c

2023-10-10 Thread Juzhe-Zhong
Confirmed that RISC-V is able to CSE this case regardless of whether RVV is enabled.

Remove the XFAIL to fix:
XPASS: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove riscv.

---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index a879d305971..5c89e3f8698 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -27,4 +27,4 @@ foo ()
but the loop reads only one element at a time, and DOM cannot resolve these.
The same happens on powerpc depending on the SIMD support available.  */
 
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* mmix-knuth-mmixware } || { { { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } || aarch64_sve } || { arm*-*-* && { ! arm_neon } } } } } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* mmix-knuth-mmixware } || { { { lp64 && { powerpc*-*-* sparc*-*-* } } || aarch64_sve } || { arm*-*-* && { ! arm_neon } } } } } } } */
-- 
2.36.3



[PATCH] RISC-V: Enable full coverage vect tests

2023-10-10 Thread Juzhe-Zhong
I have analyzed all existing FAILs.

Only the following FAILs still need to be addressed:
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-reduc-7.c execution test
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_(LEN_)?SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_(LEN_)?SUB"

All other FAILs are dump-scan failures and can be ignored. I confirmed that ARM SVE has the same FAILs and that they were not fixed in either the tests or the implementation.

Now it is time to enable full-coverage vect tests, including vec_unpack, vec_pack, vec_interleave, etc.

To see what we are still missing:

Before this patch:

=== gcc Summary ===

# of expected passes            182839
# of unexpected failures        79
# of unexpected successes       11
# of expected failures          1275
# of unresolved testcases       4
# of unsupported tests          4223


After this patch:

=== gcc Summary ===

# of expected passes            183411
# of unexpected failures        93
# of unexpected successes       7
# of expected failures          1285
# of unresolved testcases       4
# of unsupported tests          4157

There is one important new issue that I noticed after this patch:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

It has a related PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721

I am going to fix this in the middle-end first, after committing this patch.

OK for trunk?

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add RVV.

---
 gcc/testsuite/lib/target-supports.exp | 45 ---
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b454b07359a..8037dbcee53 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7876,7 +7876,9 @@ proc check_effective_target_vect_sdot_qi { } {
 || [istarget aarch64*-*-*]
 || [istarget arm*-*-*]
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) }}]
+&& [et-is-effective-target mips_msa])
+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v]) }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7891,7 +7893,9 @@ proc check_effective_target_vect_udot_qi { } {
 || [istarget arm*-*-*]
 || [istarget ia64-*-*]
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) }}]
+&& [et-is-effective-target mips_msa])
+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v]) }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7918,7 +7922,9 @@ proc check_effective_target_vect_sdot_hi { } {
 || [istarget ia64-*-*]
 || [istarget i?86-*-*] || [istarget x86_64-*-*]
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) }}]
+&& [et-is-effective-target mips_msa])
+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v]) }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7930,7 +7936,9 @@ proc check_effective_target_vect_udot_hi { } {
 return [check_cached_effective_target_indexed vect_udot_hi {
   expr { ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*])
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) }}]
+&& [et-is-effective-target mips_msa])
+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v]) }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7945,7 +7953,9 @@ proc check_effective_target_vect_usad_char { } {
  || ([istarget aarch64*-*-*]
  && ![check_effective_target_aarch64_sve])
  || ([istarget powerpc*-*-*]
- && [check_p9vector_hw_available])}}]
+ && [check_p9vector_hw_available])
+ || ([istarget riscv*-*-*]
+ && [check_effective_target_riscv_v]) }}]
 }
 
 # Return 1 if the target plus current options supports both signed
@@ -7971,8 +7981,10 @@ proc check_effective_target_vect_mulhrs_hi {} {
 # by power-of-2 operations on vectors of 4-byte integers.
 
 proc check_effective_target_vect_sdiv_pow2_si {} {
-return [expr { [istarget aarch64*-*-*]
-  && [check_effective_targ

[PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread Juzhe-Zhong
I suddenly discovered that I made a mistake which, luckily, had not been exposed.

https://godbolt.org/z/c3jzrh7or

GCC uses a 32-bit index offset:

vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v  v1,(a1),v1

This is wrong, since v1 may overflow 32 bits after vsll.vi.

After this patch:

vsext.vf2   v8,v4
vsll.vi v8,v8,2
vluxei64.v  v8,(a1),v8

This matches Clang.

Regression passed. OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md: Fix offset bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix offset bug.
(gather_scatter_valid_offset_p): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 28 +--
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 16 +--
 .../autovec/gather-scatter/offset_extend-1.c  | 14 ++
 4 files changed, 42 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 41bff3a318f..07607bff71e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -59,7 +59,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -74,7 +74,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -89,7 +89,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -104,7 +104,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -119,7 +119,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -134,7 +134,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -153,7 +153,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -172,7 +172,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -187,7 +187,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -202,7 +202,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -217,7 +217,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -232,7 +232,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_

[PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread Juzhe-Zhong
I suddenly discovered that I made a mistake which, luckily, had not been exposed.

https://godbolt.org/z/c3jzrh7or

GCC uses a 32-bit index offset:

vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v  v1,(a1),v1

This is wrong, since v1 may overflow 32 bits after vsll.vi.

After this patch:

vsext.vf2   v8,v4
vsll.vi v8,v8,2
vluxei64.v  v8,(a1),v8

This matches Clang.

Regression passed. OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md: Fix index bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug.
(gather_scatter_valid_offset_mode_p): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 28 +--
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 13 +++--
 .../autovec/gather-scatter/offset_extend-1.c  | 14 ++
 4 files changed, 39 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 41bff3a318f..a346ad8ec1a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -59,7 +59,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -74,7 +74,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -89,7 +89,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -104,7 +104,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -119,7 +119,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -134,7 +134,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -153,7 +153,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -172,7 +172,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -187,7 +187,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -202,7 +202,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -217,7 +217,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -232,7 +232,7 @@
(match_operand: 5 "vector_mask_operand")
(match_o

[PATCH V3] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread Juzhe-Zhong
I suddenly discovered that I made a mistake which, luckily, had not been exposed.

https://godbolt.org/z/c3jzrh7or

GCC uses a 32-bit index offset:

vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v  v1,(a1),v1

This is wrong, since v1 may overflow 32 bits after vsll.vi.

After this patch:

vsext.vf2   v8,v4
vsll.vi v8,v8,2
vluxei64.v  v8,(a1),v8

This matches Clang.

Regression passed. OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md: Fix index bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug.
(gather_scatter_valid_offset_mode_p): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 28 +--
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 13 +++--
 .../autovec/gather-scatter/offset_extend-1.c  | 14 ++
 4 files changed, 39 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 41bff3a318f..a346ad8ec1a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -59,7 +59,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -74,7 +74,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -89,7 +89,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -104,7 +104,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -119,7 +119,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -134,7 +134,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -153,7 +153,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -172,7 +172,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -187,7 +187,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -202,7 +202,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -217,7 +217,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)"
 {
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -232,7 +232,7 @@
(match_operand: 5 "vector_mask_operand")
(

[PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-11 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.

For RVV, tree-vect-patterns.cc builds MASK_LEN_GATHER_LOAD with a dummy mask (-1)
when the situation is the same as GATHER_LOAD (no conditional mask).

So we make MASK_LEN_GATHER_LOAD reuse the GATHER_LOAD flow when the mask
argument is a dummy mask.

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map):
(vect_build_slp_tree_1):
(vect_build_slp_tree_2):
* tree-vect-stmts.cc (vectorizable_load):

---
 gcc/tree-vect-slp.cc   | 18 --
 gcc/tree-vect-stmts.cc |  4 ++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fa098f9ff4e..712c04ec278 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
  case IFN_MASK_GATHER_LOAD:
return arg1_arg4_map;
 
+ case IFN_MASK_LEN_GATHER_LOAD:
+   /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+   - Unconditional gather load transforms
+ into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
+
+   - Conditional gather load transforms
+ into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+   return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
+ : nullptr;
+
  case IFN_MASK_STORE:
return arg3_arg2_map;
 
@@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
  if (cfn == CFN_MASK_LOAD
  || cfn == CFN_GATHER_LOAD
- || cfn == CFN_MASK_GATHER_LOAD)
+ || cfn == CFN_MASK_GATHER_LOAD
+ || cfn == CFN_MASK_LEN_GATHER_LOAD)
ldst_p = true;
  else if (cfn == CFN_MASK_STORE)
{
@@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
  && rhs_code != CFN_GATHER_LOAD
  && rhs_code != CFN_MASK_GATHER_LOAD
+ && rhs_code != CFN_MASK_LEN_GATHER_LOAD
  /* Not grouped loads are handled as externals for BB
 vectorization.  For loop vectorization we can handle
 splats the same we handle single element interleaving.  */
@@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   if (gcall *stmt = dyn_cast  (stmt_info->stmt))
gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
|| gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-   || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+   || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+   || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
   else
{
  *max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..263acf5d3cd 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
return false;
 
   mask_index = internal_fn_mask_index (ifn);
-  if (mask_index >= 0 && slp_node)
+  if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
-  if (mask_index >= 0
+  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
  &mask, NULL, &mask_dt, &mask_vectype))
return false;
-- 
2.36.3



[PATCH V2] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-11 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.

To reuse the current GATHER_LOAD/MASK_GATHER_LOAD flow naturally, I adjust the
MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE pattern in tree-vect-patterns.cc.

Here are the adjustments in tree-vect-patterns.cc:

1. For unconditional gather load/scatter store:

 MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) ---> MASK_LEN_GATHER_LOAD (base, offset, scale, zero)

 Note that we remove the dummy mask (-1) of MASK_LEN_GATHER_LOAD, so that we can reuse the current SLP flow of GATHER_LOAD.

2. For conditional gather load/scatter store:

 We don't change the IR, so these calls keep an additional conditional mask. We then reuse the current flow of MASK_GATHER_LOAD.

So, after pattern recognition (tree-vect-patterns.cc), we end up with scalar gather/scatter IR
with different numbers of arguments (4 arguments for unconditional, 5 arguments for conditional).

The difference only applies to the scalar gather/scatter IR. We pass the "call" argument
through to "internal_fn_mask_index", which returns the mask_index according to CALL
for mask_len_gather/mask_len_scatter.

For the vector IR, the format is always the same as before: MASK_LEN_GATHER_LOAD (ptr, offset, scale, zero, mask, len, bias).
Hence, the optabs of mask_len gather/scatter don't change.

To conclude, we only change the format of the mask_len gather/scatter scalar IR in tree-vect-patterns.cc.

The flow of MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE after this patch seems more natural and reasonable.

Also, I realized that a test for SLP of conditional gather_load was missing, so I appended one.

RISC-V regression passed, and bootstrap and regression on x86 passed.

OK for trunk?

gcc/ChangeLog:

* internal-fn.cc (internal_fn_mask_index): Add call argument.
* internal-fn.h (internal_fn_mask_index): Ditto.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Delete MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE.
* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Ditto.
(vectorizable_store): Adapt for new interface of internal_fn_mask_index.
(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/internal-fn.cc| 16 ++--
 gcc/internal-fn.h |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
 gcc/tree-vect-patterns.cc |  4 +---
 gcc/tree-vect-slp.cc  | 17 +++--
 gcc/tree-vect-stmts.cc|  6 +++---
 6 files changed, 49 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..009ebd95785 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4701,7 +4701,7 @@ internal_fn_len_index (internal_fn fn)
otherwise return -1.  */
 
 int
-internal_fn_mask_index (internal_fn fn)
+internal_fn_mask_index (internal_fn fn, gcall *call)
 {
   switch (fn)
 {
@@ -4717,9 +4717,21 @@ internal_fn_mask_index (internal_fn fn)
 
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_SCATTER_STORE:
+  return 4;
+
 case IFN_MASK_LEN_GATHER_LOAD:
 case IFN_MASK_LEN_SCATTER_STORE:
-  return 4;
+  /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+ - Unconditional gather load transforms
+   into MASK_LEN_GATHER_LOAD with no mask.
+
+ - Conditional gather load transforms
+   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+  if (!call || gimple_num_args (call) == 5)
+   return 4;
+  else
+   return -1;
 
 default:
   return (conditional_internal_fn_code (fn) != ERROR_MARK
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..62fbbd537f4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -235,7 +235,7 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
-extern int internal_fn_mask_index (internal_fn);
+extern int internal_fn_mask_index (internal_fn, gcall * = nullptr);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_sto

[PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that SLP fails on MASK_LEN_GATHER_LOAD.

A scalar MASK_LEN_GATHER_LOAD recognized by the pattern recognizer takes one
of two forms:

1. Conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero,
   conditional mask).

   For this form, we only need to reuse the existing MASK_GATHER_LOAD handling
   to SLP MASK_LEN_GATHER_LOAD.

2. Unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero,
   -1).

   The current SLP check fails on the dummy mask -1, so we relax the check in
   tree-vect-slp.cc and allow it to be materialized.
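To make the dummy mask -1 concrete, here is a hypothetical element-wise C model of MASK_LEN_GATHER_LOAD (base, offset, scale, zero, mask, len, bias) on 32-bit elements. It is an illustration under stated assumptions, not GCC's definition: the function name and parameter types are made up, inactive lanes are simply filled with ZERO, and a null MASK stands for the all-ones -1 mask of the unconditional form, under which the call behaves like a plain unconditional gather.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of
     MASK_LEN_GATHER_LOAD (base, offset, scale, zero, mask, len, bias)
   for 32-bit elements.  Lane I is loaded from BASE + OFFSET[I] * SCALE
   when I < LEN + BIAS and the mask bit is set; every other lane gets
   ZERO (a simplification made for this sketch).  A null MASK models the
   dummy all-ones mask (-1) of the unconditional form.  */
void
mask_len_gather_load (int32_t *res, const void *base, const int64_t *offset,
                      int64_t scale, int32_t zero, const bool *mask,
                      size_t nlanes, ptrdiff_t len, ptrdiff_t bias)
{
  const char *bytes = (const char *) base;
  ptrdiff_t active = len + bias;
  for (size_t i = 0; i < nlanes; i++)
    {
      bool lane_on = (ptrdiff_t) i < active && (mask == NULL || mask[i]);
      if (lane_on)
        memcpy (&res[i], bytes + offset[i] * scale, sizeof res[i]);
      else
        res[i] = zero;
    }
}
```

With a null mask only LEN + BIAS limits the active lanes, which is why the unconditional form can be treated like the conditional one once SLP accepts the -1 operand.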

Consider the following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

https://godbolt.org/z/WG3M3n7Mo

Before this patch, GCC fails to SLP this loop and uses
VEC_LOAD_LANES/VEC_STORE_LANES instead:

f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret

After this patch:

f:
ble a3,zero,.L5
li  a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv  a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret
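The after-patch loop can be read as a scalar sketch in which the two statements form one SLP group of two lanes: every index feeds one gather (vluxei64.v) and the addend is the repeating vector {1, 2}. The helpers f_slp_model and addend_pair are hypothetical and only model the assembly above; addend_pair reproduces the li/slli/addi sequence that builds the 64-bit splat, whose low and high 32-bit halves become lanes 1 and 2 on little-endian RISC-V.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the SLP form of f: y[2i] and y[2i+1]
   form one group, so all 2 * N indices feed a single gather and the
   addend repeats the pattern {1, 2, 1, 2, ...}.  */
void
f_slp_model (int *restrict y, const int *restrict x,
             const int *restrict indices, int n)
{
  static const int addend[2] = { 1, 2 };
  for (int k = 0; k < 2 * n; ++k)
    y[k] = x[indices[k]] + addend[k % 2];
}

/* The addend vector is built as one 64-bit splat:
     li a5,1 ; slli a5,a5,33 ; addi a5,a5,1
   The low 32 bits hold 1 and the high 32 bits hold 2.  */
uint64_t
addend_pair (void)
{
  uint64_t a5 = 1;
  a5 <<= 33;
  a5 += 1;
  return a5;
}
```

Splatting the pair at e64 (vmv.v.x under e64,m1) and then adding at e32 is a cheap way to materialize the two-lane constant without a load.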

Note that there was no SLP test for conditional-mask gather_load, so this
patch appends one.

Tested on RISC-V and Bootstrap && Regression on X86 passed.

Ok for trunk ?

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
 gcc/tree-vect-slp.cc  | 22 ++
 gcc/tree-vect-stmts.cc| 10 +-
 3 files changed, 42 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  if (cond[i * 2])
+   y[i * 2] = x[indices[i * 2]] + 1;
+  if (cond[i * 2 + 1])
+   y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+}
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fa098f9ff4e..38fe6ba6296 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -542,6 +542,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
return arg1_map;
 
  case IFN_MASK_GATHER_LOAD:
+ case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
 
  case IFN_MASK_STORE:
@@ -700,8 +701,7 @@ vect_get_and_check_sl

[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

2023-10-12 Thread Juzhe-Zhong
Like Arm SVE and AMD GCN, disable this scan for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr69907.c: Add RVV.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
index b348526b62f..f63b42a271a 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
@@ -22,5 +22,5 @@ void foo(unsigned *p1, unsigned short *p2)
 /* Disable for SVE because for long or variable-length vectors we don't
get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
because there we can vectorize the epilogue using mixed vector sizes.
-   Likewise for AMD GCN.  */
-/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { ! amdgcn*-*-* } } } } } */
+   Likewise for AMD GCN and RVV.  */
+/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { { ! amdgcn*-*-* } && { ! riscv_v } } } } } } */
-- 
2.36.3



[PATCH] RISC-V Regression: Fix FAIL of bb-slp-68.c for RVV

2023-10-12 Thread Juzhe-Zhong
As the comment says, this test fails with 64-byte vectors.
Both RVV and GCN have 64-byte vectors.

So it is more reasonable to use vect512.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-68.c: Use vect512.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
index e7573a14933..2dd3d8ee90c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
@@ -20,4 +20,4 @@ void foo ()
 
 /* We want to have the store group split into 4, 2, 4 when using 32byte vectors.
Unfortunately it does not work when 64-byte vectors are available.  */
-/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* } } } */
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail vect512 } } } */
-- 
2.36.3



[Committed] RISC-V: Remove redundant iterators.

2023-10-13 Thread Juzhe-Zhong
These iterators are redundant; removed and committed.
gcc/ChangeLog:

* config/riscv/vector-iterators.md: Remove redundant iterators.

---
 gcc/config/riscv/vector-iterators.md | 110 ---
 1 file changed, 110 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 96ddd34c958..6800f8d3d76 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -295,83 +295,6 @@
   RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
 ])
 
-(define_mode_iterator VLMULEXT2 [
-  RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
-
-  RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-
-  RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
-
-  (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-
-  (RVVM4DI "TARGET_VECTOR_ELEN_64") (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
-
-  (RVVM4DF "TARGET_VECTOR_ELEN_FP_64") (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
-])
-
-(define_mode_iterator VLMULEXT4 [
-  RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
-
-  RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  (RVVM2HF "TARGET_VECTOR_ELEN_FP_16") (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-
-  RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
-
-  (RVVM2SF "TARGET_VECTOR_ELEN_FP_32") (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-
-  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
-
-  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
-])
-
-(define_mode_iterator VLMULEXT8 [
-  RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
-
-  RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-
-  RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
-
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-
-  (RVVM1DI "TARGET_VECTOR_ELEN_64")
-
-  (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
-])
-
-(define_mode_iterator VLMULEXT16 [
-  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
-
-  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-
-  (RVVMF2SI "TARGET_MIN_VLEN > 32")
-
-  (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VLMULEXT32 [
-  RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
-
-  (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VLMULEXT64 [
-  (RVVMF8QI "TARGET_MIN_VLEN > 32")
-])
-
 (define_mode_iterator VEI16 [
   RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
 
@@ -1579,39 +1502,6 @@
   RVVM4x2QI
 ])
 
-(define_mode_iterator VQI [
-  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VHI [
-  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VSI [
-  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VDI [
-  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
-  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
-])
-
-(define_mode_iterator VHF [
-  (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH")
-  (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH")
-  (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VSF [
-  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-])
-
-(define_mode_iterator VDF [
-  (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
-  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
-])
-
 (define_mode_attr V_LMUL1 [
   (RVVM8QI "RVVM1QI") (RVVM4QI "RVVM1QI") (RVVM2QI "RVVM1QI") (RVVM1QI "RVVM1QI") (RVVMF2QI "RVVM1QI") (RVVMF4QI "RVVM1QI") (RVVMF8QI "RVVM1QI")
 
-- 
2.36.3



[Committed] RISC-V: Fix vsingle attribute

2023-10-14 Thread Juzhe-Zhong
The vsingle value of RVVM2x2QI should be rvvm2qi instead of rvvm1qi.
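The vsingle attribute maps a tuple (segment) mode to the single-vector mode with the same LMUL and element type, which is why RVVM2x2QI must map to rvvm2qi. The sketch below derives that value from the mode name by dropping the "x<NF>" field and lower-casing the rest; vsingle_of is a hypothetical helper for checking the table by hand, not something that exists in vector-iterators.md.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derive the vsingle value of a tuple mode name such
   as "RVVM2x2QI" by dropping the "x<NF>" field and lower-casing the rest,
   giving "rvvm2qi".  Returns 0 on success, or -1 if the name has no
   "x<NF>" field or OUT is too small.  */
int
vsingle_of (const char *tuple_mode, char *out, size_t out_size)
{
  const char *x = strchr (tuple_mode, 'x');
  if (!x || !isdigit ((unsigned char) x[1]))
    return -1;

  /* Skip the NF digits that follow the 'x'.  */
  const char *rest = x + 1;
  while (isdigit ((unsigned char) *rest))
    rest++;

  size_t prefix_len = (size_t) (x - tuple_mode);
  size_t rest_len = strlen (rest);
  if (prefix_len + rest_len + 1 > out_size)
    return -1;

  for (size_t i = 0; i < prefix_len; i++)
    out[i] = (char) tolower ((unsigned char) tuple_mode[i]);
  for (size_t i = 0; i < rest_len; i++)
    out[prefix_len + i] = (char) tolower ((unsigned char) rest[i]);
  out[prefix_len + rest_len] = '\0';
  return 0;
}
```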

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix vsingle incorrect attribute for 
RVVM2x2QI.

---
 gcc/config/riscv/vector-iterators.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 6800f8d3d76..0850475edc1 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -2230,7 +2230,7 @@
   (RVVM1x5QI "rvvm1qi") (RVVMF2x5QI "rvvmf2qi") (RVVMF4x5QI "rvvmf4qi") (RVVMF8x5QI "rvvmf8qi")
   (RVVM2x4QI "rvvm2qi") (RVVM1x4QI "rvvm1qi") (RVVMF2x4QI "rvvmf2qi") (RVVMF4x4QI "rvvmf4qi") (RVVMF8x4QI "rvvmf8qi")
   (RVVM2x3QI "rvvm2qi") (RVVM1x3QI "rvvm1qi") (RVVMF2x3QI "rvvmf2qi") (RVVMF4x3QI "rvvmf4qi") (RVVMF8x3QI "rvvmf8qi")
-  (RVVM4x2QI "rvvm4qi") (RVVM2x2QI "rvvm1qi") (RVVM1x2QI "rvvm1qi") (RVVMF2x2QI "rvvmf2qi") (RVVMF4x2QI "rvvmf4qi") (RVVMF8x2QI "rvvmf8qi")
+  (RVVM4x2QI "rvvm4qi") (RVVM2x2QI "rvvm2qi") (RVVM1x2QI "rvvm1qi") (RVVMF2x2QI "rvvmf2qi") (RVVMF4x2QI "rvvmf4qi") (RVVMF8x2QI "rvvmf8qi")
 
   (RVVM1x8HI "rvvm1hi") (RVVMF2x8HI "rvvmf2hi") (RVVMF4x8HI "rvvmf4hi")
   (RVVM1x7HI "rvvm1hi") (RVVMF2x7HI "rvvmf2hi") (RVVMF4x7HI "rvvmf4hi")
-- 
2.36.3



[PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-15 Thread Juzhe-Zhong
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
    {
      sum1 += x[2*i] - a;
      sum1 += x[2*i+1] * b;
      sum2 += x[2*i] - b;
      sum2 += x[2*i+1] * a;
    }
  return sum1 + sum2;
}

Before this patch:

bar:
ble a3,zero,.L5
csrr t0,vlenb
csrr a6,vlenb
slli t1,t0,3
vsetvli a5,zero,e32,m4,ta,ma
sub sp,sp,t1
vid.v   v20
vmv.v.x v12,a1
vand.vi v4,v20,1
vmv.v.x v16,a2
vmseq.vi v4,v4,1
slli t3,a6,2
vsetvli zero,a5,e32,m4,ta,ma
vmv1r.v v0,v4
viota.m v8,v4
add a7,t3,sp
vsetvli a5,zero,e32,m4,ta,mu
vand.vi v28,v20,-2
vadd.vi v4,v28,1
vs4r.v  v20,0(a7)    # spill
vrgather.vv v24,v12,v8
vrgather.vv v20,v16,v8
vrgather.vv v24,v16,v8,v0.t
vrgather.vv v20,v12,v8,v0.t
vs4r.v  v4,0(sp)    # spill
slli a3,a3,1
addi t4,a6,-1
neg t1,a6
vmv4r.v v0,v20
vmv.v.i v4,0
j   .L4
.L13:
vsetvli a5,zero,e32,m4,ta,ma
.L4:
mv  a7,a3
mv  a4,a3
bleu a3,a6,.L3
csrr a4,vlenb
.L3:
vmv.v.x v8,t4
vl4re32.v   v12,0(sp)    # spill
vand.vv v20,v28,v8
vand.vv v8,v12,v8
vsetvli zero,a4,e32,m4,ta,ma
vle32.v v16,0(a0)
vsetvli a5,zero,e32,m4,ta,ma
add a3,a3,t1
vrgather.vv v12,v16,v20
add a0,a0,t3
vrgather.vv v20,v16,v8
vsub.vv v12,v12,v0
vsetvli zero,a4,e32,m4,tu,ma
vadd.vv v4,v4,v12
vmacc.vv v4,v24,v20
bgtu a7,a6,.L13
csrr a1,vlenb
slli a1,a1,2
add a1,a1,sp
li  a4,-1
csrr t0,vlenb
vsetvli a5,zero,e32,m4,ta,ma
vl4re32.v   v12,0(a1)    # spill
vmv.v.i v8,0
vmul.vx v0,v12,a4
li  a2,0
slli t1,t0,3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vv v0,v0,v8
vand.vi v12,v12,1
vmerge.vvm  v16,v8,v4,v0
vmseq.vv v12,v12,v8
vmv.s.x v1,a2
vmv1r.v v0,v12
vredsum.vs  v16,v16,v1
vmerge.vvm  v8,v8,v4,v0
vmv.x.s a0,v16
vredsum.vs  v8,v8,v1
vmv.x.s a5,v8
add sp,sp,t1
addw a0,a0,a5
jr  ra
.L5:
li  a0,0
ret

We can see multiple expensive register spills.
The root cause of this issue is that, for a scalar IR load such as:

_5 = *_4;

we did not check whether it is a contiguous load/store or a gather/scatter
load/store, even though a non-contiguous access is translated into either:

   1. MASK_LEN_GATHER_LOAD (..., perm indices), or
   2. a contiguous load/store + VEC_PERM (..., perm indices).

In both cases we end up consuming one extra vector register group (for the
perm indices) that we did not count before.

As a result, the dynamic LMUL cost model picks LMUL = 4 for this case, which
is the wrong choice.
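Translation 2 can be modeled in scalar C to show where the uncounted register group comes from. This is a hypothetical sketch, not GCC code: the vectorization factor VF = 4 and the function deinterleave_model are made up for illustration. The point is that PERM_EVEN and PERM_ODD are vector values in their own right, so in real vector code each one occupies a register group at the chosen LMUL.

```c
#include <assert.h>

#define VF 4  /* hypothetical vectorization factor */

/* Hypothetical model of "contiguous load + VEC_PERM" for x[2*i] and
   x[2*i+1]: one contiguous load of 2 * VF elements, then one permutation
   per stream, driven by an index vector.  */
void
deinterleave_model (const int *x, int even[VF], int odd[VF])
{
  int loaded[2 * VF];
  int perm_even[VF], perm_odd[VF];   /* the extra "perm indices" vectors */

  for (int i = 0; i < 2 * VF; i++)   /* contiguous load */
    loaded[i] = x[i];

  for (int i = 0; i < VF; i++)       /* build the permutation indices */
    {
      perm_even[i] = 2 * i;
      perm_odd[i] = 2 * i + 1;
    }

  for (int i = 0; i < VF; i++)       /* VEC_PERM */
    {
      even[i] = loaded[perm_even[i]];
      odd[i] = loaded[perm_odd[i]];
    }
}
```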

The key change in this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
      ...
    }

That is, we count one more register group when the load/store is not
adjacent.
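A simplified model of why one extra group matters: RVV has 32 vector registers, and at LMUL = L each live vector value occupies a group of L registers. The sketch picks the largest LMUL whose live groups still fit. It is hypothetical (pick_lmul and the live-group counts are made up for illustration; the real model in riscv-vector-costs.cc works on per-program-point live ranges), but it shows how counting one extra group per non-adjacent access can move the choice from LMUL = 4 to LMUL = 2.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VREGS 32  /* RVV provides 32 vector registers, v0-v31.  */

/* With LMUL = L, every live vector value occupies a group of L
   registers, so LIVE_GROUPS values need LIVE_GROUPS * L registers.  */
static bool
lmul_fits_p (int live_groups, int lmul)
{
  return live_groups * lmul <= NUM_VREGS;
}

/* Hypothetical, simplified LMUL choice: return the largest LMUL in
   {8, 4, 2, 1} for which the live groups fit in the register file.
   NON_ADJACENT counts non-adjacent loads/stores; each one adds one more
   group for its permutation indices, as in the patch.  */
int
pick_lmul (int live_groups, int non_adjacent)
{
  int total = live_groups + non_adjacent;
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (lmul_fits_p (total, lmul))
      return lmul;
  return 1;
}
```

For example, pick_lmul (8, 0) returns 4, while pick_lmul (8, 1) returns 2: a single extra group overflows the register file at LMUL = 4 (9 * 4 = 36 > 32).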

After this patch, it picks LMUL = 2, which is optimal:

bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v   v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv  a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li  a3,

[PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

2023-10-16 Thread Juzhe-Zhong
Consider the following case:

void
foo8 (int64_t *restrict a)
{
  for (int i = 0; i < 16; ++i)
    a[i] = a[i]-16;
}

For such loops, when NITERS is known and small enough to be handled by VLS
modes, we use VLS modes instead of VLA modes, even if dynamic LMUL is
specified.
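The condition added to costs::preferred_new_lmul_p in the diff below can be restated as a small predicate. This is a hypothetical sketch: prefer_vls_p is not a GCC function, ELEM_SIZE stands for GET_MODE_SIZE (biggest_mode), and VLEN_BYTES stands in for BYTES_PER_RISCV_VECTOR, which GCC treats as a poly_int compared with known_le (this sketch uses its minimum value). Under this model and -march=rv64gcv_zvl4096b (VLEN of at least 4096 bits, i.e. 512 bytes), all eight loops in the new test qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restatement of the new early exit: keep VLS modes (skip
   the dynamic LMUL heuristic) when the known iteration count is a power
   of two and NITERS * ELEM_SIZE fits in one LMUL = 8 register group.  */
bool
prefer_vls_p (uint64_t niters, uint64_t elem_size, uint64_t vlen_bytes)
{
  bool fits = niters * elem_size <= 8 * vlen_bytes;  /* RVV_M8 */
  bool pow2 = niters != 0 && (niters & (niters - 1)) == 0;
  return fits && pow2;
}
```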

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Use 
VLS modes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 13 ++--
 .../costmodel/riscv/rvv/no-dynamic-lmul-1.c   | 64 +++
 2 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 11257f7c2bd..4482af2e039 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -530,10 +530,6 @@ costs::preferred_new_lmul_p (const vector_costs *uncast_other) const
   auto other_loop_vinfo = as_a (other->m_vinfo);
   class loop *loop = LOOP_VINFO_LOOP (this_loop_vinfo);
 
-  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (this_loop_vinfo)
-  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (other_loop_vinfo))
-return false;
-
   if (loop_autovec_infos.get (loop) && loop_autovec_infos.get (loop)->end_p)
 return false;
   else if (loop_autovec_infos.get (loop))
@@ -567,6 +563,15 @@ costs::preferred_new_lmul_p (const vector_costs *uncast_other) const
   machine_mode biggest_mode
 = compute_local_live_ranges (program_points_per_bb, live_ranges_per_bb);
 
+  /* If we can use simple VLS modes to handle NITERS element.
+ We don't need to use VLA modes with partial vector auto-vectorization.  */
+  if (LOOP_VINFO_NITERS_KNOWN_P (this_loop_vinfo)
+  && known_le (tree_to_poly_int64 (LOOP_VINFO_NITERS (this_loop_vinfo))
+* GET_MODE_SIZE (biggest_mode).to_constant (),
+  (int) RVV_M8 * BYTES_PER_RISCV_VECTOR)
+  && pow2p_hwi (LOOP_VINFO_INT_NITERS (this_loop_vinfo)))
+return vector_costs::better_main_loop_than_p (other);
+
   /* Update live ranges according to PHI.  */
   update_local_live_ranges (other->m_vinfo, program_points_per_bb,
live_ranges_per_bb);
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c
new file mode 100644
index 000..7ede148396f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl4096b -mabi=lp64d -fdump-tree-vect-details" } */
+
+#include 
+
+void
+foo (int8_t *restrict a)
+{
+  for (int i = 0; i < 4096; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo2 (int16_t *restrict a)
+{
+  for (int i = 0; i < 2048; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo3 (int32_t *restrict a)
+{
+  for (int i = 0; i < 1024; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo4 (int64_t *restrict a)
+{
+  for (int i = 0; i < 512; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo5 (int8_t *restrict a)
+{
+  for (int i = 0; i < 16; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo6 (int16_t *restrict a)
+{
+  for (int i = 0; i < 16; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo7 (int32_t *restrict a)
+{
+  for (int i = 0; i < 16; ++i)
+a[i] = a[i]-16;
+}
+
+void
+foo8 (int64_t *restrict a)
+{
+  for (int i = 0; i < 16; ++i)
+a[i] = a[i]-16;
+}
+
+/* { dg-final { scan-tree-dump-not "Maximum lmul" "vect" } } */
+/* { dg-final { scan-assembler-times {vsetvli} 4 } } */
+/* { dg-final { scan-assembler-times {vsetivli} 4 } } */
-- 
2.36.3



[PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong

[PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong

[PATCH V4] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong

[PATCH V4] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-16 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.

A scalar statement can be recognized as MASK_LEN_GATHER_LOAD in two situations:

1. Conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, conditional mask).

   In this situation we just need to reuse the existing MASK_GATHER_LOAD handling, which can already achieve SLP for MASK_LEN_GATHER_LOAD.

2. Unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)

   The current SLP check fails on the dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.
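A simplified scalar model may help show why the dummy -1 mask is harmless: with every mask bit set, the masked gather reduces to an unconditional gather. This sketch is hypothetical; it folds the scale into element-sized indices, uses a 32-bit lane mask, and omits the bias operand and other details of the real internal function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of MASK_LEN_GATHER_LOAD (not GCC code).
   Lane I is loaded from BASE[IDX[I]] when I < LEN and bit I of MASK is
   set; every other lane receives the zero "else" value.  NLANES <= 32.  */
static void
mask_len_gather_model (int32_t *dst, const int32_t *base,
                       const int32_t *idx, uint32_t mask, size_t len,
                       size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++)
    dst[i] = (i < len && ((mask >> i) & 1u)) ? base[idx[i]] : 0;
}
```

Passing `(uint32_t) -1` as MASK corresponds to situation 2 (unconditional gather), while a data-dependent MASK corresponds to situation 1.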

Consider the following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

https://godbolt.org/z/WG3M3n7Mo

Before this patch, GCC is unable to SLP this loop and uses VEC_LOAD_LANES/VEC_STORE_LANES instead:

f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret

After this patch:

f:
ble a3,zero,.L5
li  a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv  a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret
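The li/slli/addi sequence at the top of the new code builds a single 64-bit constant that, once splatted with vmv.v.x at e64 and read back at e32, provides the { 1, 2, 1, 2, ... } addend vector used by vadd.vv. The following sketch reproduces that arithmetic in C; the function names are hypothetical, and the lane split assumes the little-endian element layout of RVV vector registers (the low half of a 64-bit element is the even-numbered 32-bit element).

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of "li a5,1; slli a5,a5,33; addi a5,a5,1".  */
static uint64_t
pair_constant (void)
{
  uint64_t a5 = 1; /* li   a5,1                        */
  a5 <<= 33;       /* slli a5,a5,33  -> 0x200000000    */
  a5 += 1;         /* addi a5,a5,1   -> 0x200000001    */
  return a5;
}

/* Splat X into N64 64-bit lanes (vmv.v.x at e64) and view the result as
   2 * N64 32-bit lanes: the low half of each 64-bit lane becomes the
   even-numbered 32-bit lane, the high half the odd-numbered one.  */
static void
splat_e64_view_e32 (uint64_t x, uint32_t *out, size_t n64)
{
  for (size_t i = 0; i < n64; i++)
    {
      out[2 * i] = (uint32_t) x;
      out[2 * i + 1] = (uint32_t) (x >> 32);
    }
}
```

With X = 0x200000001 this yields { 1, 2, 1, 2, ... }, matching the "+ 1" and "+ 2" in the even and odd statements of the loop.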

Note that we were missing an SLP test for conditional-mask gather loads, so this
patch adds one.

Tested on RISC-V; bootstrap and regression on x86 passed.

OK for trunk?

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
 gcc/tree-vect-slp.cc  | 22 ++
 gcc/tree-vect-stmts.cc|  9 -
 3 files changed, 41 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+        y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+        y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af8f5031bd2..b379278446b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -550,6 +550,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
return arg1_map;
 
  case IFN_MASK_GATHER_LOAD:
+ case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
 
  case IFN_MASK_STORE:
@@ -717,8 +718,7 @@ vect_get_and_check_slp

[PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]

2023-10-17 Thread Juzhe-Zhong
Robin previously mentioned that dynamic LMUL causes an ICE in SPEC:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html

which is caused by an assertion failure.

When more tests in rvv.exp are enabled with dynamic LMUL, the issue can be
reproduced; it is tracked as PR111832: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832

This patch enables more tests in rvv.exp and fixes the bug.
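The riscv-vector-costs.cc hunk changes how `biggest_mode` is updated. In the old loop, `mode` was overwritten by the LHS and then by each operand, and only its final value was compared with `biggest_mode` once per statement, so a larger mode seen earlier in the same statement could be missed; `get_biggest_mode` now folds every mode into the running maximum. The sketch below models just that difference with plain sizes. The function names and values are hypothetical, and the patch does not state whether this is the exact trigger of the assertion failure.

```c
#include <stddef.h>

/* Old behaviour (modelled): MODE ends up holding only the last size seen,
   and just that value is compared with BIGGEST.  */
static unsigned
biggest_old (const unsigned *sizes, size_t n, unsigned biggest)
{
  unsigned mode = biggest;
  for (size_t i = 0; i < n; i++)
    mode = sizes[i];
  return mode > biggest ? mode : biggest;
}

/* New behaviour (modelled): every size is folded into the running maximum,
   as get_biggest_mode does for each LHS and operand.  */
static unsigned
biggest_new (const unsigned *sizes, size_t n, unsigned biggest)
{
  for (size_t i = 0; i < n; i++)
    biggest = sizes[i] >= biggest ? sizes[i] : biggest;
  return biggest;
}
```

For a statement whose LHS is 512 bits wide but whose last operand is 128 bits wide, starting from 64, `biggest_old` returns 128 while `biggest_new` returns 512.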

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.

---
 gcc/config/riscv/riscv-vector-costs.cc | 19 +--
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 10 --
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 33061efb1d0..af87388a1e4 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -154,6 +154,14 @@ compute_local_program_points (
 }
 }
 
+static machine_mode
+get_biggest_mode (machine_mode mode1, machine_mode mode2)
+{
+  unsigned int mode1_size = GET_MODE_BITSIZE (mode1).to_constant ();
+  unsigned int mode2_size = GET_MODE_BITSIZE (mode2).to_constant ();
+  return mode1_size >= mode2_size ? mode1 : mode2;
+}
+
 /* Compute local live ranges of each vectorized variable.
Note that we only compute local live ranges (within a block) since
local live ranges information is accurate enough for us to determine
@@ -201,12 +209,12 @@ compute_local_live_ranges (
{
  unsigned int point = program_point.point;
  gimple *stmt = program_point.stmt;
- machine_mode mode = biggest_mode;
  tree lhs = gimple_get_lhs (stmt);
  if (lhs != NULL_TREE && is_gimple_reg (lhs)
  && !POINTER_TYPE_P (TREE_TYPE (lhs)))
{
- mode = TYPE_MODE (TREE_TYPE (lhs));
+ biggest_mode = get_biggest_mode (biggest_mode,
+  TYPE_MODE (TREE_TYPE (lhs)));
  bool existed_p = false;
  pair &live_range
= live_ranges->get_or_insert (lhs, &existed_p);
@@ -225,7 +233,9 @@ compute_local_live_ranges (
 the future.  */
  if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
{
- mode = TYPE_MODE (TREE_TYPE (var));
+ biggest_mode
+   = get_biggest_mode (biggest_mode,
+   TYPE_MODE (TREE_TYPE (var)));
  bool existed_p = false;
  pair &live_range
= live_ranges->get_or_insert (var, &existed_p);
@@ -238,9 +248,6 @@ compute_local_live_ranges (
live_range = pair (0, point);
}
}
- if (GET_MODE_SIZE (mode).to_constant ()
- > GET_MODE_SIZE (biggest_mode).to_constant ())
-   biggest_mode = mode;
}
  if (dump_enabled_p ())
for (hash_map::iterator iter = live_ranges->begin ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index ff76e17d0e6..674ba0d72b4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -58,10 +58,12 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m8} \
+  {-ftree-vectorize -O3 --param riscv-autovec-lmul=dynamic} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} \
-  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} ]
+  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} \
+  {-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic} ]
 foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/partial/*.\[cS\]]] \
 "" "$op"
@@ -104,18 +106,22 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=dynamic -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} 

[PATCH] RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx

2023-10-17 Thread Juzhe-Zhong
This patch optimizes the following permutation, whose indices follow a consecutive pattern:

typedef char vnx16i __attribute__ ((vector_size (16)));

#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

lui a5,%hi(.LC0)
addi a5,a5,%lo(.LC0)
vsetivli zero,16,e8,m1,ta,ma
vle8.v  v3,0(a5)
vle8.v  v2,0(a1)
vrgather.vv v1,v2,v3
vse8.v  v1,0(a0)
ret

After this patch:

vsetivli zero,16,e8,mf8,ta,ma
vle8.v  v2,0(a1)
vsetivli zero,4,e32,mf2,ta,ma
vrgather.vi v1,v2,3
vsetivli zero,16,e8,mf8,ta,ma
vse8.v  v1,0(a0)
ret

Overall, this removes one instruction, a vector load, which is much more
expensive than VL toggling.

In addition, using vrgather.vi reduces vector register consumption by one,
since the index vector no longer has to be loaded into a register.
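The idea can be checked with a simplified recognizer. This is a hypothetical sketch, not GCC's shuffle_consecutive_patterns: it accepts a permutation that repeats one run of K consecutive indices, where K is a power of two smaller than the vector length and the run starts at a multiple of K. Such a shuffle is then a broadcast of element perm[0] / K of the inputs reinterpreted with K-times-wider elements, which a single vrgather.vi/vx can perform.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recognizer (simplified; not GCC's implementation).
   On success, *RUN is the number of consecutive indices per group and
   *WIDE_ELT is the element to broadcast after widening SEW by *RUN.  */
static bool
consecutive_broadcast_p (const int *perm, size_t n, size_t *run,
                         size_t *wide_elt)
{
  if (n < 2)
    return false;

  /* Length of the leading run of consecutive indices.  */
  size_t k = 1;
  while (k < n && perm[k] == perm[k - 1] + 1)
    k++;

  /* The run must be a power of two, divide N, and be shorter than N.  */
  if (k == 1 || k == n || (k & (k - 1)) != 0 || n % k != 0)
    return false;

  /* The run must be aligned so that it maps onto exactly one wide element.  */
  if (perm[0] < 0 || perm[0] % (int) k != 0)
    return false;

  /* Every group of K indices must repeat the first group.  */
  for (size_t i = k; i < n; i++)
    if (perm[i] != perm[i % k])
      return false;

  *run = k;
  *wide_elt = (size_t) perm[0] / k;
  return true;
}
```

For MASK_16 the run is 4 and the wide element is 3, matching `vrgather.vi v1,v2,3` at e32 in the new code. A permutation such as <2, 3, 4, 5, 2, 3, 4, 5> is rejected because its run is not aligned to a 32-bit element.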

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 85 +
 .../rvv/autovec/vls-vlmax/consecutive-1.c | 21 +
 .../rvv/autovec/vls-vlmax/consecutive-2.c | 45 +
 .../rvv/autovec/vls-vlmax/consecutive_run-1.c | 27 ++
 .../rvv/autovec/vls-vlmax/consecutive_run-2.c | 51 ++
 .../riscv/rvv/autovec/vls/consecutive-1.c | 94 +++
 .../riscv/rvv/autovec/vls/consecutive-2.c | 68 ++
 .../riscv/rvv/autovec/vls/consecutive-3.c | 68 ++
 .../gcc.target/riscv/rvv/autovec/vls/def.h|  6 ++
 9 files changed, 465 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 21d86c3f917..895c11d13fc 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2822,6 +2822,89 @@ shuffle_merge_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize the consecutive index that we can use a single
+   vrgather.v[x|i] to shuffle the vectors.
+
+   e.g. short[8] = VEC_PERM_EXPR 
+   Use SEW = 32, index = 1 vrgather.vi to get the result.  */
+static bool
+shuffle_consecutive_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  scalar_mode smode = GET_MODE_INNER (vmode);
+  poly_int64 vec_len = d->perm.length ();
+  HOST_WIDE_INT elt;
+
+  if (!vec_len.is_constant () || !d->perm[0].is_constant (&elt))
+    return false;
+  int vlen = vec_len.to_constant ();
+
+  /* Compute the last element index of consecutive pattern from the leading
+     consecutive elements.  */
+  int last_consecutive_idx = -1;
+  int consecutive_num = -1;
+  for (int i = 1; i < vlen; i++)
+    {
+      if (maybe_ne (d->perm[i], d->perm[i - 1] + 1))
+        break;
+      last_consecutive_idx = i;
+      consecutive_num = last_consecutive_idx + 1;
+    }
+
+  int new_vlen = vlen / consecutive_num;
+  if (last_consecutive_idx < 0 || consecutive_num == vlen
+      || !pow2p_hwi (consecutive_num) || !pow2p_hwi (new_vlen))
+    return false;
+  /* VEC_PERM <..., (index, index + 1, ... index + consecutive_num - 1)>.
+     All elements of index, index + 1, ... index + consecutive_num - 1 should
+     locate at the same vector.  */
+  if (maybe_ge (d->perm[0], vec_len)
+      != maybe_ge (d->perm[last_consecutive_idx], vec_len))
+    return false;
+  /* If a vector has 8 elements.  We allow optimizations on consecutive
+     patterns e.g. <0, 1, 2, 3, 0, 1, 2, 3> or <4, 5, 6, 7, 4, 5, 6, 7>.
+     Other patterns like <2, 3, 4, 5, 2, 3, 4, 5> are not feasible patterns
+     to be optimiz

[PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-18 Thread Juzhe-Zhong
I confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

However, it generates horrible register spills.

The root cause is that we didn't hoist the vmv.v.x out of the loop, which
increases the register pressure of the SLP loop.

So this patch changes the CONST_VECTOR move into a vec_duplicate splitter, so
that we gain better optimizations:

1. Better LICM.
2. More opportunities for transforming 'vv' into 'vx' in the future.
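As a rough scalar analogy (hypothetical code, not what GCC generates), the index masking in f3 depends only on loop-invariant values, so both the splat (vmv.v.x) and the vand.vv can be computed once before the loop. `VL`, the function names and the fixed-size arrays are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { VL = 8 }; /* Hypothetical fixed number of elements per vector.  */

/* Before: the splat of VL - 1 (vmv.v.x) and the AND that masks the gather
   indices (vand.vv) are redone on every iteration although neither depends
   on the iteration.  */
static void
gather_before (uint8_t *out, const uint8_t *src, const uint16_t *idx,
               size_t iters)
{
  for (size_t it = 0; it < iters; it++)
    {
      uint16_t splat[VL], sel[VL];
      for (size_t j = 0; j < VL; j++)
        splat[j] = VL - 1;
      for (size_t j = 0; j < VL; j++)
        sel[j] = idx[j] & splat[j];
      for (size_t j = 0; j < VL; j++) /* vrgatherei16.vv */
        out[it * VL + j] = src[it * VL + sel[j]];
    }
}

/* After: the invariant splat and AND are hoisted out of the loop.  */
static void
gather_after (uint8_t *out, const uint8_t *src, const uint16_t *idx,
              size_t iters)
{
  uint16_t sel[VL];
  for (size_t j = 0; j < VL; j++)
    sel[j] = idx[j] & (VL - 1);
  for (size_t it = 0; it < iters; it++)
    for (size_t j = 0; j < VL; j++)
      out[it * VL + j] = src[it * VL + sel[j]];
}
```

Both functions produce the same output; the second performs the invariant work once instead of on every iteration.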

Before this patch:

f3:
ble a4,zero,.L8
csrr t0,vlenb
slli t1,t0,4
csrr a6,vlenb
sub sp,sp,t1
csrr a5,vlenb
slli a6,a6,3
slli a5,a5,2
add a6,a6,sp
vsetvli a7,zero,e16,m8,ta,ma
slli a4,a4,3
vid.v   v8
addi t6,a5,-1
vand.vi v8,v8,-2
neg t5,a5
vs8r.v  v8,0(sp)
vadd.vi v8,v8,1
vs8r.v  v8,0(a6)
j   .L4
.L12:
vsetvli a7,zero,e16,m8,ta,ma
.L4:
csrr t0,vlenb
slli t0,t0,3
vl8re16.v   v16,0(sp)
add t0,t0,sp
vmv.v.x v8,t6
mv  t1,a4
vand.vv v24,v16,v8
mv  a6,a4
vl8re16.v   v16,0(t0)
vand.vv v8,v16,v8
bleu a4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v20,0(a2)
vle8.v  v16,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v24
vadd.vv v4,v16,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v20,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtu t1,a5,.L12
csrr t0,vlenb
slli t1,t0,4
add sp,sp,t1
jr  ra
.L8:
ret

After this patch:

bar:
ble a3,zero,.L5
csrr a5,vlenb
csrr t1,vlenb
srli a5,a5,1
srli a7,t1,1
addi a5,a5,-1
vsetvli a4,zero,e32,m2,ta,ma
slli a3,a3,1
vmv.v.x v2,a5
vid.v   v18
vmv.v.x v6,a1
vand.vi v10,v18,-2
vand.vi v0,v18,1
vadd.vi v16,v10,1
vmseq.vi v0,v0,1
vand.vv v10,v10,v2
vand.vv v16,v16,v2
slli t1,t1,1
vsetvli zero,a4,e32,m2,ta,ma
neg t3,a7
viota.m v4,v0
vsetvli a4,zero,e32,m2,ta,mu
vmv.v.x v8,a2
vrgather.vv v14,v6,v4
vrgather.vv v12,v8,v4
vmv.v.i v2,0
vrgather.vv v14,v8,v4,v0.t
vrgather.vv v12,v6,v4,v0.t
.L4:
mv  a2,a3
mv  a5,a3
bleu a3,a7,.L3
mv  a5,a7
.L3:
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v6,0(a0)
vsetvli a6,zero,e32,m2,ta,ma
add a3,a3,t3
vrgather.vv v4,v6,v10
vrgather.vv v8,v6,v16
vsub.vv v4,v4,v12
add a0,a0,t1
vsetvli zero,a5,e32,m2,tu,ma
vadd.vv v2,v2,v4
vmacc.vv v2,v14,v8
bgtu a2,a7,.L4
li  a5,-1
vsetvli a6,zero,e32,m2,ta,ma
li  a4,0
vmv.v.i v4,0
vmul.vx v0,v18,a5
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vv v0,v0,v4
vand.vi v18,v18,1
vmerge.vvm  v6,v4,v2,v0
vmseq.vv v18,v18,v4
vmv.s.x v1,a4
vmv1r.v v0,v18
vredsum.vs  v6,v6,v1
vmerge.vvm  v4,v4,v2,v0
vmv.x.s a0,v6
vredsum.vs  v4,v4,v1
vmv.x.s a5,v4
addw a0,a0,a5
ret
.L5:
li  a0,0
ret

Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/

[PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-18 Thread Juzhe-Zhong
I confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

However, it generates horrible register spills.

The root cause is that we didn't hoist the vmv.v.x out of the loop, which
increases the register pressure of the SLP loop.

So this patch changes the CONST_VECTOR move into a vec_duplicate splitter, so
that we gain better optimizations:

1. Better LICM.
2. More opportunities for transforming 'vv' into 'vx' in the future.

Before this patch:

f3:
ble a4,zero,.L8
csrr t0,vlenb
slli t1,t0,4
csrr a6,vlenb
sub sp,sp,t1
csrr a5,vlenb
slli a6,a6,3
slli a5,a5,2
add a6,a6,sp
vsetvli a7,zero,e16,m8,ta,ma
slli a4,a4,3
vid.v   v8
addi t6,a5,-1
vand.vi v8,v8,-2
neg t5,a5
vs8r.v  v8,0(sp)
vadd.vi v8,v8,1
vs8r.v  v8,0(a6)
j   .L4
.L12:
vsetvli a7,zero,e16,m8,ta,ma
.L4:
csrr t0,vlenb
slli t0,t0,3
vl8re16.v   v16,0(sp)
add t0,t0,sp
vmv.v.x v8,t6
mv  t1,a4
vand.vv v24,v16,v8
mv  a6,a4
vl8re16.v   v16,0(t0)
vand.vv v8,v16,v8
bleu a4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v20,0(a2)
vle8.v  v16,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v24
vadd.vv v4,v16,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v20,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtu t1,a5,.L12
csrr t0,vlenb
slli t1,t0,4
add sp,sp,t1
jr  ra
.L8:
ret

After this patch:

f3:
ble a4,zero,.L6
csrr a6,vlenb
csrr a5,vlenb
slli a6,a6,2
slli a5,a5,2
addi a6,a6,-1
slli a4,a4,3
neg t5,a5
vsetvli t1,zero,e16,m8,ta,ma
vmv.v.x v24,a6
vid.v   v8
vand.vi v8,v8,-2
vadd.vi v16,v8,1
vand.vv v8,v8,v24
vand.vv v16,v16,v24
.L4:
mv  t1,a4
mv  a6,a4
bleu a4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v28,0(a2)
vle8.v  v24,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v28,v8
vadd.vv v4,v24,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v28,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v28,v16
vadd.vv v4,v4,v24
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtu t1,a5,.L4
.L6:
ret

Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test

They all fail because of bugs in the VSETVL pass:

   10dd4:   0c707057    vsetvli zero,zero,e8,mf2,ta,ma
   10dd8:   5e06b8d7    vmv.v.i v17,13
   10ddc:   9ed030d7    vmv1r.v v1,v13
   10de0:   b21040d7    vncvt.x.x.w v1,v1   --> raises an illegal instruction, since we don't have SEW = 8 -> SEW = 4 narrowing.
   10de4:   5e0785d7    vmv.v.v v11,v15

I confirmed that the recent VSETVL refactor patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633231.html fixes all of them.

So this patch should be committed after the VSETVL refactor patch.

PR target/111848

gcc/Change

[PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-18 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.

A scalar statement can be recognized as MASK_LEN_GATHER_LOAD in two situations:

1. Conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, conditional mask).

   In this situation we just need to reuse the existing MASK_GATHER_LOAD handling, which can already achieve SLP for MASK_LEN_GATHER_LOAD.

2. Unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)

   The current SLP check fails on the dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.

Consider the following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

https://godbolt.org/z/WG3M3n7Mo

Before this patch, GCC is unable to SLP this loop and uses VEC_LOAD_LANES/VEC_STORE_LANES instead:

f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret

After this patch:

f:
ble a3,zero,.L5
li  a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv  a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret

Note that we were missing an SLP test for conditional-mask gather loads, so this
patch adds one.

Tested on RISC-V; bootstrap and regression on x86 passed.

OK for trunk?

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
 gcc/tree-vect-slp.cc  | 22 ++
 gcc/tree-vect-stmts.cc| 12 +++-
 3 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+        y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+        y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d081999a763..146dba658a2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
return arg1_map;
 
  case IFN_MASK_GATHER_LOAD:
+ case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
 
  case IFN_MASK_STORE:
@@ -719,8 +720,7 @@ vect_get_and_check_

[PATCH] RISC-V: Add RVV FMA auto-vectorization support

2023-05-26 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch supports the FMA auto-vectorization pattern:
1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md that generates incorrect information for the VSETVL
   pass, found when testing ternop-3.c.
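The choice between the two instructions comes from which operand each one overwrites. Per the RVV specification, vmacc.vv vd, vs1, vs2 computes vd[i] = vs1[i] * vs2[i] + vd[i] (the addend is the destination), while vmadd.vv vd, vs1, vs2 computes vd[i] = vs1[i] * vd[i] + vs2[i] (a multiplicand is the destination). The scalar models below are a sketch with hypothetical names; they use uint32_t so that overflow wraps as in the vector unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of vmacc.vv vd, vs1, vs2: vd[i] = vs1[i] * vs2[i] + vd[i].
   Suits the case where the FMA addend (operands[3]) shares a register
   with the destination (operands[0]).  */
static void
vmacc_vv_model (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2,
                size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* Model of vmadd.vv vd, vs1, vs2: vd[i] = vs1[i] * vd[i] + vs2[i].
   Suits the case where a multiplicand (operands[1] or operands[2]) shares
   a register with the destination.  */
static void
vmadd_vv_model (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2,
                size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vd[i] + vs2[i];
}
```

Both forms compute a * b + c; they differ only in which input register is consumed, which is why the patch leaves that decision to RA.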

gcc/ChangeLog:

* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add ternary enum.
(emit_vlmax_ternop_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternop_insn): Ditto.
* config/riscv/vector.md: Fix ternary patterns bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add ternop tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   |  65 +++
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   |  22 
 gcc/config/riscv/vector.md|   2 +-
 .../riscv/rvv/autovec/ternop/ternop-1.c   |  27 +
 .../riscv/rvv/autovec/ternop/ternop-2.c   |  33 ++
 .../riscv/rvv/autovec/ternop/ternop-3.c   |  33 ++
 .../riscv/rvv/autovec/ternop/ternop_run-1.c   |  84 ++
 .../riscv/rvv/autovec/ternop/ternop_run-2.c   | 104 ++
 .../riscv/rvv/autovec/ternop/ternop_run-3.c   | 104 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 11 files changed, 477 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..ba1240014dc 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -373,3 +373,68 @@
 DONE;
   }
 )
+
+;; =
+;; == Ternary arithmetic
+;; =
+
+;; -
+;;  [INT] VMACC and VMADD
+;; -
+;; Includes:
+;; - vmacc
+;; - vmadd
+;; -
+
+;; We can't expand FMA for the following reasons:
+;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
+;;    The vmacc is the ideal instruction when operands[3] overlaps operands[0].
+;;    The vmadd is the ideal instruction when operands[1|2] overlaps operands[0].
+;; 2. According to vector.md, the multiply-add patterns has 'merge' operand which
+;;    is the operands[5]. Since operands[5] should overlap operands[0], this operand
+;;    should be allocated the same regno as operands[1|2|3].
+;; 3. The 'merge' operand is always a real merge operand and we don't allow undefined
+;;    operand.
+;; 3. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL operand.
+;;
+;; In this situation, we design the codegen of FMA as follows:
+;; 1. clobber a scratch in the expand pattern of FMA.
+;; 2. Let's RA decide which input operand (operands[1|2|3]) overlap operands[0].
+;; 3. Generate instructions (vmacc or vmadd) according to the register allocation
+;;    result after reload_completed.
+(define_expand "fma4"
+  [(parallel
+    [(set (match_operand:VI 0 "register_operand" "=vr")
+          (plus:VI
+            (mult:VI
+              (match_operand:VI 1 "register_operand" " vr")
+              (match_operand:VI 2 "register_operand" " vr"))
+            (match_operand:VI 3 "register_operand"   " vr")))
+     (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+   (plus:VI
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " %0, vr,   vr")
+   (match_operand:VI 2 "register_operand" " vr, vr,   vr"))
+ (match_operand:VI 3 "register_operand"   

[PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

2023-05-26 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch supports the FMA auto-vectorization pattern:
1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md that generates incorrect information for the VSETVL
   pass, found when testing ternop-3.c.

gcc/ChangeLog:

* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_vlmax_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
* config/riscv/vector.md: Fix vimuladd instruction bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add ternary tests
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   |  65 +++
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   |  20 
 gcc/config/riscv/vector.md|   2 +-
 .../riscv/rvv/autovec/ternop/ternop-1.c   |  28 +
 .../riscv/rvv/autovec/ternop/ternop-2.c   |  34 ++
 .../riscv/rvv/autovec/ternop/ternop-3.c   |  33 ++
 .../riscv/rvv/autovec/ternop/ternop_run-1.c   |  84 ++
 .../riscv/rvv/autovec/ternop/ternop_run-2.c   | 104 ++
 .../riscv/rvv/autovec/ternop/ternop_run-3.c   | 104 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 11 files changed, 477 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..04825df1210 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -373,3 +373,68 @@
 DONE;
   }
 )
+
+;; =
+;; == Ternary arithmetic
+;; =
+
+;; -
+;;  [INT] VMACC and VMADD
+;; -
+;; Includes:
+;; - vmacc
+;; - vmadd
+;; -
+
+;; We can't expand FMA for the following reasons:
+;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
+;;    The vmacc is the ideal instruction when operands[3] overlaps operands[0].
+;;    The vmadd is the ideal instruction when operands[1|2] overlaps operands[0].
+;; 2. According to vector.md, the multiply-add patterns have a 'merge' operand,
+;;    which is operands[5]. Since operands[5] should overlap operands[0], this
+;;    operand should be allocated the same regno as operands[1|2|3].
+;; 3. The 'merge' operand is always a real merge operand and we don't allow an
+;;    undefined operand.
+;; 4. The operation of the FMA pattern needs a VLMAX vsetvli, which needs a VL
+;;    operand.
+;;
+;; In this situation, we design the codegen of FMA as follows:
+;; 1. Clobber a scratch in the expand pattern of FMA.
+;; 2. Let RA decide which input operand (operands[1|2|3]) overlaps operands[0].
+;; 3. Generate instructions (vmacc or vmadd) according to the register
+;;    allocation result after reload_completed.
+(define_expand "fma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+ (plus:VI
+   (mult:VI
+ (match_operand:VI 1 "register_operand" " vr")
+ (match_operand:VI 2 "register_operand" " vr"))
+   (match_operand:VI 3 "register_operand"   " vr")))
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+   (plus:VI
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " %0, vr,   vr")
+   (match_operand:VI 2 "register_operand" " vr, vr,   vr"))
+ (match_operand:VI 3 "register_operand"   

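The vmacc/vmadd choice described in the FMA comment above comes down to which source register is tied to the destination. The sketch below models the RVV element semantics with plain C arrays; the helper names (`model_vmacc`, `model_vmadd`, `check_fma_models`) are hypothetical and not GCC functions, and unsigned elements are used so that wrap-around is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* vmacc.vv vd, vs1, vs2: vd[i] = vs1[i] * vs2[i] + vd[i]
   (the addend lives in the destination register).  */
static void model_vmacc (uint32_t *vd, const uint32_t *vs1,
                         const uint32_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* vmadd.vv vd, vs1, vs2: vd[i] = vs1[i] * vd[i] + vs2[i]
   (one multiplicand lives in the destination register).  */
static void model_vmadd (uint32_t *vd, const uint32_t *vs1,
                         const uint32_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vd[i] + vs2[i];
}

/* dst = a * b + c when RA ties dst to the addend c: use vmacc.
   The memcpy only models "dst already holds c".  */
static void fma_tied_to_addend (uint32_t *dst, const uint32_t *a,
                                const uint32_t *b, const uint32_t *c,
                                size_t n)
{
  memcpy (dst, c, n * sizeof *dst);
  model_vmacc (dst, a, b, n);
}

/* dst = a * b + c when RA ties dst to the multiplicand a: use vmadd.  */
static void fma_tied_to_mult (uint32_t *dst, const uint32_t *a,
                              const uint32_t *b, const uint32_t *c,
                              size_t n)
{
  memcpy (dst, a, n * sizeof *dst);
  model_vmadd (dst, b, c, n);   /* dst = b * a + c */
}

void check_fma_models (void)
{
  const uint32_t a[4] = {1, 2, 3, 4};
  const uint32_t b[4] = {5, 6, 7, 8};
  const uint32_t c[4] = {10, 20, 30, 40};
  uint32_t x[4], y[4];

  fma_tied_to_addend (x, a, b, c, 4);
  fma_tied_to_mult (y, a, b, c, 4);
  for (size_t i = 0; i < 4; i++)
    {
      assert (x[i] == a[i] * b[i] + c[i]);
      assert (y[i] == x[i]);   /* both allocations give the same result */
    }
}
```

Either instruction computes the same FMA; only the input that gets overwritten differs, which is why the choice is deferred until after register allocation.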
[PATCH] RISC-V: Remove redundant printf of abs-run.c

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong 

Notice that this testcase causes an unexpected failure:
FAIL: gcc.target/riscv/rvv/autovec/unop/abs-run.c (test for excess errors)
Excess errors:
/work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: implicit declaration of function 'printf' [-Wimplicit-function-declaration]
/work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch]
/work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch]
/work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch]
/work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch]

spawn /work/home/jzzhong/work/rvv-opensource/output/sim/bin/spike --isa=RV64GCVZfh /work/home/jzzhong/work/rvv-opensource/output/sim/riscv64-rivai-elf/bin/pk ./abs-run.exe
bbl loader
0 0 -64
1 63 -63
2 2 -62
3 61 -61
4 4 -60
5 59 -59
6 6 -58
7 57 -57
8 8 -56
9 55 -55
10 10 -54
11 53 -53
12 12 -52
13 51 -51

Remove the printf since it is unnecessary; it is also what triggers the implicit-declaration warnings above.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Remove redundant printf.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
index 7404dbe037e..d864b54229b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
@@ -19,7 +19,6 @@
   vabs_##TYPE (a##TYPE, a##TYPE, SZ);  \
   for (int i = 0; i < SZ; i++) \
 {  \
-  printf ("%d %d %d\n", i, a##TYPE[i], i - 64);\
   if (i & 1)   \
assert (a##TYPE[i] == abs (i - 64));\
   else \
-- 
2.36.3



[PATCH] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong 

We can't yet support floating-point operations that depend on FRM (for
example, vfadd support is blocked), since the RVV intrinsic doc has not been
updated and we can't support mode switching for this.

However, we can support floating-point to integer conversion now, since it
does not depend on FRM and needs no mode switching: the 'rtz'
(round-towards-zero) conversions are independent of FRM.
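As a hedged illustration, these are the kinds of scalar loops the new pattern targets. C defines float-to-integer conversion as truncation toward zero, which is exactly what the 'rtz' instructions do, so no FRM setting is involved. Assuming `-march=rv64gcv` with auto-vectorization enabled, one would expect the signed loop to go through the `fix_trunc` optab (vfcvt.rtz.x.f.v) and the unsigned one through `fixuns_trunc` (vfcvt.rtz.xu.f.v). The function names are hypothetical and this is not the testsuite template from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* (int32_t) truncates toward zero: candidate for vfcvt.rtz.x.f.v.  */
void cvt_f32_to_i32 (int32_t *restrict dst, const float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int32_t) src[i];
}

/* (uint64_t) truncates toward zero: candidate for vfcvt.rtz.xu.f.v.
   Inputs must be in range; out-of-range conversion is undefined in C.  */
void cvt_f64_to_u64 (uint64_t *restrict dst, const double *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint64_t) src[i];
}

void check_cvt (void)
{
  const float f[5] = {2.75f, -2.75f, 0.5f, -0.5f, 7.0f};
  const double d[3] = {0.9, 1.5, 4294967296.5};
  int32_t i32[5];
  uint64_t u64[3];

  cvt_f32_to_i32 (i32, f, 5);
  assert (i32[0] == 2 && i32[1] == -2 && i32[2] == 0 && i32[3] == 0
          && i32[4] == 7);

  cvt_f64_to_u64 (u64, d, 3);
  assert (u64[0] == 0 && u64[1] == 1 && u64[2] == UINT64_C (4294967296));
}
```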

gcc/ChangeLog:

* config/riscv/autovec.md (2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.

---
 gcc/config/riscv/autovec.md   | 23 
 gcc/config/riscv/iterators.md |  4 +-
 gcc/config/riscv/vector-iterators.md  |  5 ++
 .../rvv/autovec/conversions/vfcvt_rtz-run.c   | 52 +++
 .../autovec/conversions/vfcvt_rtz-rv32gcv.c   |  6 +++
 .../autovec/conversions/vfcvt_rtz-rv64gcv.c   |  6 +++
 .../autovec/conversions/vfcvt_rtz-template.h  | 15 ++
 7 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b24867ae4d0..3989ffb26ee 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -478,6 +478,29 @@
   DONE;
 })
 
+;; =
+;; == Conversions
+;; =
+
+;; -
+;;  [INT<-FP] Conversions
+;; -
+;; Includes:
+;; - vfcvt.rtz.xu.f.v
+;; - vfcvt.rtz.x.f.v
+;; -
+
+(define_expand "2"
+  [(set (match_operand: 0 "register_operand")
+   (any_fix:
+ (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
 ;; =
 ;; == Unary arithmetic
 ;; =
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8afe98e4410..d374a10810c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -225,7 +225,9 @@
 (ss_minus "sssub")
 (us_minus "ussub")
 (sign_extend "extend")
-(zero_extend "zero_extend")])
+(zero_extend "zero_extend")
+(fix "fix_trunc")
+(unsigned_fix "fixuns_trunc")])
 
 ;;  code attributes
 (define_code_attr or_optab [(ior "ior")
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 70fb5b80b1b..937ec3c7f67 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1208,6 +1208,11 @@
  (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI")
 ])
 
+(define_mode_attr vconvert [
+  (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si")
+  (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di")
+])
+
 (define_mode_attr VNCONVERT [
  (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
  (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c b/gcc/testsuite/gcc.target/riscv/r

[PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong 

We can't yet support floating-point operations that depend on FRM (for
example, vfadd support is blocked), since the RVV intrinsic doc has not been
updated and we can't support mode switching for this.

However, we can support floating-point to integer conversion now, since it
does not depend on FRM and needs no mode switching: the 'rtz'
(round-towards-zero) conversions are independent of FRM.

gcc/ChangeLog:

* config/riscv/autovec.md (2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.

---
 gcc/config/riscv/autovec.md   | 23 
 gcc/config/riscv/iterators.md |  4 +-
 gcc/config/riscv/vector-iterators.md  |  5 ++
 .../rvv/autovec/conversions/vfcvt_rtz-run.c   | 52 +++
 .../autovec/conversions/vfcvt_rtz-rv32gcv.c   |  6 +++
 .../autovec/conversions/vfcvt_rtz-rv64gcv.c   |  6 +++
 .../autovec/conversions/vfcvt_rtz-template.h  | 15 ++
 7 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b24867ae4d0..3989ffb26ee 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -478,6 +478,29 @@
   DONE;
 })
 
+;; =
+;; == Conversions
+;; =
+
+;; -
+;;  [INT<-FP] Conversions
+;; -
+;; Includes:
+;; - vfcvt.rtz.xu.f.v
+;; - vfcvt.rtz.x.f.v
+;; -
+
+(define_expand "2"
+  [(set (match_operand: 0 "register_operand")
+   (any_fix:
+ (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
 ;; =
 ;; == Unary arithmetic
 ;; =
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8afe98e4410..d374a10810c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -225,7 +225,9 @@
 (ss_minus "sssub")
 (us_minus "ussub")
 (sign_extend "extend")
-(zero_extend "zero_extend")])
+(zero_extend "zero_extend")
+(fix "fix_trunc")
+(unsigned_fix "fixuns_trunc")])
 
 ;;  code attributes
 (define_code_attr or_optab [(ior "ior")
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 70fb5b80b1b..937ec3c7f67 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1208,6 +1208,11 @@
  (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI")
 ])
 
+(define_mode_attr vconvert [
+  (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si")
+  (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di")
+])
+
 (define_mode_attr VNCONVERT [
  (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
  (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c b/gcc/testsuite/gcc.target/riscv/r

[PATCH] RISC-V: Add RVV FNMA auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong 

Like FMA, add FNMA auto-vectorization support.

gcc/ChangeLog:

* config/riscv/autovec.md (fnma4): New pattern.
(*fnma): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.

---
 gcc/config/riscv/autovec.md   |  45 
 .../riscv/rvv/autovec/ternop/ternop-4.c   |  28 +
 .../riscv/rvv/autovec/ternop/ternop-5.c   |  34 ++
 .../riscv/rvv/autovec/ternop/ternop-6.c   |  33 ++
 .../riscv/rvv/autovec/ternop/ternop_run-4.c   |  84 ++
 .../riscv/rvv/autovec/ternop/ternop_run-5.c   | 104 ++
 .../riscv/rvv/autovec/ternop/ternop_run-6.c   | 104 ++
 7 files changed, 432 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index eff3e484fb4..20004a8af27 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -606,3 +606,48 @@
   }
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [INT] VMACC and VMADD
+;; -
+;; Includes:
+;; - vnmsac
+;; - vnmsub
+;; -
+
+(define_expand "fnma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+ (minus:VI
+   (match_operand:VI 3 "register_operand"   " vr")
+   (mult:VI
+ (match_operand:VI 1 "register_operand" " vr")
+ (match_operand:VI 2 "register_operand" " vr"
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fnma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+   (minus:VI
+ (match_operand:VI 3 "register_operand"   " vr,  0,   vr")
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " %0, vr,   vr")
+   (match_operand:VI 2 "register_operand" " vr, vr,   vr"
+   (clobber (match_scratch:SI 4 "=r,r,r"))]
+  "TARGET_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+PUT_MODE (operands[4], Pmode);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+if (which_alternative == 2)
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (mode),
+  riscv_vector::RVV_TERNOP, ops, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vimuladd")
+   (set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
new file mode 100644
index 000..22d11de89a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)                                                \
+  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,   \
+                                              TYPE *__restrict a,     \
+                                              TYPE *__restrict b, int n) \
+  {                                                                    \
+    for (int i = 0; i < n; i++)                                        \
+      dst[i] += -(a[i] * b[i]);                                        \
+  }
+
+#define TEST_ALL()                                                     \
+  TEST_T

[PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support

2023-05-28 Thread juzhe . zhong
From: Juzhe-Zhong 

Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
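The same register-tying idea applies to FNMA: vnmsac keeps the addend in the destination, vnmsub keeps a multiplicand there. A minimal scalar model, assuming unsigned elements so that negation wraps in a well-defined way; `model_vnmsac`, `model_vnmsub`, `ternop_ref` and `check_fnma_models` are hypothetical names, and `ternop_ref` only mirrors the loop body `dst[i] += -(a[i] * b[i])` used by the new tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* vnmsac.vv vd, vs1, vs2: vd[i] = -(vs1[i] * vs2[i]) + vd[i]  */
static void model_vnmsac (uint32_t *vd, const uint32_t *vs1,
                          const uint32_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = -(vs1[i] * vs2[i]) + vd[i];
}

/* vnmsub.vv vd, vs1, vs2: vd[i] = -(vs1[i] * vd[i]) + vs2[i]  */
static void model_vnmsub (uint32_t *vd, const uint32_t *vs1,
                          const uint32_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = -(vs1[i] * vd[i]) + vs2[i];
}

/* Scalar reference: the loop body of the ternop tests.  */
static void ternop_ref (uint32_t *dst, const uint32_t *a,
                        const uint32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += -(a[i] * b[i]);
}

void check_fnma_models (void)
{
  const uint32_t a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
  const uint32_t c[3] = {100, 100, 100};
  uint32_t ref[3], acc[3], mul[3];

  memcpy (ref, c, sizeof ref);
  ternop_ref (ref, a, b, 3);       /* {96, 90, 82} */

  memcpy (acc, c, sizeof acc);     /* dst tied to the addend */
  model_vnmsac (acc, a, b, 3);

  memcpy (mul, a, sizeof mul);     /* dst tied to a multiplicand */
  model_vnmsub (mul, b, c, 3);

  assert (ref[0] == 96 && ref[1] == 90 && ref[2] == 82);
  for (size_t i = 0; i < 3; i++)
    assert (acc[i] == ref[i] && mul[i] == ref[i]);
}
```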

gcc/ChangeLog:

* config/riscv/autovec.md (fnma4): New pattern.
(*fnma): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.

---
 gcc/config/riscv/autovec.md   |  45 
 .../riscv/rvv/autovec/ternop/ternop-4.c   |  28 +
 .../riscv/rvv/autovec/ternop/ternop-5.c   |  34 ++
 .../riscv/rvv/autovec/ternop/ternop-6.c   |  33 ++
 .../riscv/rvv/autovec/ternop/ternop_run-4.c   |  84 ++
 .../riscv/rvv/autovec/ternop/ternop_run-5.c   | 104 ++
 .../riscv/rvv/autovec/ternop/ternop_run-6.c   | 104 ++
 7 files changed, 432 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index eff3e484fb4..a1028d71467 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -606,3 +606,48 @@
   }
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [INT] VNMSAC and VNMSUB
+;; -
+;; Includes:
+;; - vnmsac
+;; - vnmsub
+;; -
+
+(define_expand "fnma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+ (minus:VI
+   (match_operand:VI 3 "register_operand"   " vr")
+   (mult:VI
+ (match_operand:VI 1 "register_operand" " vr")
+ (match_operand:VI 2 "register_operand" " vr"
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fnma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+   (minus:VI
+ (match_operand:VI 3 "register_operand"   " vr,  0,   vr")
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " %0, vr,   vr")
+   (match_operand:VI 2 "register_operand" " vr, vr,   vr"
+   (clobber (match_scratch:SI 4 "=r,r,r"))]
+  "TARGET_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+PUT_MODE (operands[4], Pmode);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+if (which_alternative == 2)
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (mode),
+  riscv_vector::RVV_TERNOP, ops, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vimuladd")
+   (set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
new file mode 100644
index 000..22d11de89a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)                                                \
+  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,   \
+                                              TYPE *__restrict a,     \
+                                              TYPE *__restrict b, int n) \
+  {                                                                    \
+    for (int i = 0; i < n; i++)                                        \
+      dst[i] += -(a[i] * b[i]);                                        \
+  }
+
+#define TEST_ALL()

[PATCH] RISC-V: Fix warning in riscv.md

2023-05-29 Thread juzhe . zhong
From: Juzhe-Zhong 

Notice the following warnings:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md: In function 'rtx_def* gen_anddi3(rtx, rtx, rtx)':
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))

Add an unsigned conversion to fix these warnings.
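Why the cast is safe: in a signed/unsigned `==` comparison, C and C++ already convert the signed operand to the unsigned type, so the explicit cast leaves the result unchanged and only makes the conversion visible to -Wsign-compare. A minimal sketch, assuming `int64_t` stands in for the signed value returned by `INTVAL` and `uint64_t` for `GET_MODE_MASK`; `is_mode_mask` and `check_mode_mask` are hypothetical names, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "(unsigned HOST_WIDE_INT) INTVAL (op) == GET_MODE_MASK (m)".  */
static int is_mode_mask (int64_t intval, uint64_t mask)
{
  /* Writing "intval == mask" would convert intval to uint64_t anyway;
     the explicit cast gives the same result without the warning.  */
  return (uint64_t) intval == mask;
}

void check_mode_mask (void)
{
  const uint64_t hi_mask = 0xffff;       /* like GET_MODE_MASK (HImode) */
  const uint64_t si_mask = 0xffffffff;   /* like GET_MODE_MASK (SImode) */

  assert (is_mode_mask (0xffff, hi_mask));
  assert (is_mode_mask (0xffffffff, si_mask));
  assert (!is_mode_mask (0xffff, si_mask));

  /* A negative constant such as -1 only matches an all-ones 64-bit mask.  */
  assert (!is_mode_mask (-1, si_mask));
  assert (is_mode_mask (-1, UINT64_MAX));
}
```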

gcc/ChangeLog:

* config/riscv/riscv.md: Fix signed and unsigned comparison warning.

---
 gcc/config/riscv/riscv.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index aba203318a7..3d71f59c3a9 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1353,9 +1353,9 @@
   if (CONST_INT_P (operands[2]))
 {
   enum machine_mode tmode = VOIDmode;
-  if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
+  if ((unsigned HOST_WIDE_INT) INTVAL (operands[2]) == GET_MODE_MASK (HImode))
tmode = HImode;
-  else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
+  else if ((unsigned HOST_WIDE_INT) INTVAL (operands[2]) == GET_MODE_MASK (SImode))
tmode = SImode;
 
   if (tmode != VOIDmode)
-- 
2.36.3



[PATCH V2] RISC-V: Fix warning in riscv.md

2023-05-29 Thread juzhe . zhong
From: Juzhe-Zhong 

Notice the following warnings:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md: In function 'rtx_def* gen_anddi3(rtx, rtx, rtx)':
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))

Add an unsigned conversion to fix these warnings.

gcc/ChangeLog:

* config/riscv/riscv.md: Fix signed and unsigned comparison warning.

---
 gcc/config/riscv/riscv.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index aba203318a7..f545874edc1 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1353,9 +1353,9 @@
   if (CONST_INT_P (operands[2]))
 {
   enum machine_mode tmode = VOIDmode;
-  if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
+  if (UINTVAL (operands[2]) == GET_MODE_MASK (HImode))
tmode = HImode;
-  else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
+  else if (UINTVAL (operands[2]) == GET_MODE_MASK (SImode))
tmode = SImode;
 
   if (tmode != VOIDmode)
-- 
2.36.3



[PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Following Richi's suggestion, I changed the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain > vf);

to enhance SCEV.
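The two flows can be compared with a small scalar simulation. In the new form the IV changes by a loop-invariant step each iteration, which is presumably what makes it easier for SCEV to analyze; MIN (vf, remain) is only used for the length. This sketch assumes a fixed VF (the patch uses a POLY_INT_CST such as [4, 4]) and uses `>` in the exit test to match the GT_EXPR condition the patch emits; `old_flow`, `new_flow` and `check_decrement_iv` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old flow: decrement by the length actually processed; exit at zero.  */
static size_t old_flow (size_t n, size_t vf, size_t *processed)
{
  size_t remain = n, iters = 0;
  *processed = 0;
  do
    {
      size_t len = MIN (vf, remain);
      *processed += len;
      remain -= len;
      iters++;
    }
  while (remain != 0);
  return iters;
}

/* New flow: decrement by the constant VF and test the value from
   before the decrement.  */
static size_t new_flow (size_t n, size_t vf, size_t *processed)
{
  size_t remain = n, old_remain, iters = 0;
  *processed = 0;
  do
    {
      old_remain = remain;
      size_t len = MIN (vf, remain);
      *processed += len;
      remain -= vf;   /* may wrap on the last iteration; not read afterwards */
      iters++;
    }
  while (old_remain > vf);
  return iters;
}

void check_decrement_iv (void)
{
  for (size_t vf = 1; vf <= 8; vf++)
    for (size_t n = 1; n <= 100; n++)
      {
        size_t p_old, p_new;
        size_t i_old = old_flow (n, vf, &p_old);
        size_t i_new = new_flow (n, vf, &p_new);
        assert (i_old == i_new);
        assert (i_new == (n + vf - 1) / vf);   /* ceil (n / vf) */
        assert (p_old == n && p_new == n);     /* every element handled once */
      }
}
```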

All RVV tests (decrement IV) pass.
Ok for trunk?

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 40 +
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..ef28711c58f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
 gimple_stmt_iterator loop_cond_gsi,
 rgroup_controls *rgc, tree niters,
 tree niters_skip, bool might_wrap_p,
-tree *iv_step)
+tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,24 +538,26 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-  ivtmp_35 = ivtmp_9 - _36;
+  ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-  if (ivtmp_35 != 0)
-goto ; [83.33%]
+  if (ivtmp_9 > POLY_INT_CST [4, 4])
+goto ; [83.33%]
   else
-goto ; [16.67%]
+goto ; [16.67%]
   */
   nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 &preheader_seq, &header_seq,
 loop_cond_gsi, rgc, niters,
 niters_skip, might_wrap_p,
-&iv_step);
+&iv_step, &compare_step);
 
iv_rgc = rgc;
  }
@@ -884,11 +887,22 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  gcond *cond_stmt;
   tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-   NULL_TREE, NULL_TREE);
-  gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  cond_stmt = gimple_build_cond (GT_EXPR, test_ctrl, compare_step,
+NULL_TREE, NULL_TREE);
+  gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  cond_stmt
+   = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+  gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
+}
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
  Subtract one from this to get the latch count.  */
-- 
2.36.1



[PATCH] RISC-V: Support RVV permutation auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch supports vector permutation for VLS modes only, via the vec_perm
pattern. We will implement TARGET_VECTORIZE_VEC_PERM_CONST to support VLA
permutation in the future.
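For reference, the vec_perm optab selects each output element from the concatenation of the two input vectors, with selector values taken modulo twice the element count. A minimal scalar model, assuming a hypothetical fixed-length vector of 4 elements (`NELTS`); `model_vec_perm` and `check_vec_perm` are illustrative names. The patch adds gather helpers such as emit_vlmax_gather_insn, so the RVV expansion is presumably built on vrgather, which also selects elements through an index vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELTS 4  /* hypothetical fixed-length (VLS) vector with 4 elements */

/* out[i] = element (sel[i] mod 2 * NELTS) of the concatenation op1 ++ op2.  */
static void model_vec_perm (int32_t out[NELTS], const int32_t op1[NELTS],
                            const int32_t op2[NELTS],
                            const uint32_t sel[NELTS])
{
  for (size_t i = 0; i < NELTS; i++)
    {
      uint32_t idx = sel[i] % (2 * NELTS);
      out[i] = idx < NELTS ? op1[idx] : op2[idx - NELTS];
    }
}

void check_vec_perm (void)
{
  const int32_t a[NELTS] = {10, 11, 12, 13};
  const int32_t b[NELTS] = {20, 21, 22, 23};
  const uint32_t sel[NELTS] = {0, 5, 3, 9};   /* 9 wraps to 1 */
  int32_t out[NELTS];

  model_vec_perm (out, a, b, sel);
  assert (out[0] == 10 && out[1] == 21 && out[2] == 13 && out[3] == 11);
}
```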

gcc/ChangeLog:

* config/riscv/autovec.md (vec_perm): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.

---
 gcc/config/riscv/autovec.md   |  19 +++
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   | 130 +
 .../riscv/rvv/autovec/vls-vlmax/perm-1.c  |  58 
 .../riscv/rvv/autovec/vls-vlmax/perm-2.c  |  33 +
 .../riscv/rvv/autovec/vls-vlmax/perm-3.c  |  29 
 .../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  58 
 .../riscv/rvv/autovec/vls-vlmax/perm-5.c  |  49 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-6.c  |  58 
 .../riscv/rvv/autovec/vls-vlmax/perm-7.c  |  49 +++
 .../riscv/rvv/autovec/vls-vlmax/perm.h|  70 +
 .../riscv/rvv/autovec/vls-vlmax/perm_run-1.c  | 104 +
 .../riscv/rvv/autovec/vls-vlmax/perm_run-2.c  |  32 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-3.c  |  20 +++
 .../riscv/rvv/autovec/vls-vlmax/perm_run-4.c  | 104 +
 .../riscv/rvv/autovec/vls-vlmax/perm_run-5.c  | 137 ++
 .../riscv/rvv/autovec/vls-vlmax/perm_run-6.c  | 104 +
 .../riscv/rvv/autovec/vls-vlmax/perm_run-7.c  | 135 +
 19 files changed, 1195 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0314e7587d0..4834bb4b412 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -83,6 +83,25 @@
   }
 )
 
+;; -
+;;  [INT,FP] permutation
+;; -
+;; This is the pattern that permutes the vector.
+;; -
+
+(define_expand "vec_perm"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand:V 2 "register_operand")
+   (match_operand: 3 &

[PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

Apparently, we are missing vrsub.vi tests.
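vrsub.vi computes `imm - x`, where imm is a 5-bit signed immediate. Judging from the assertions, the new RUN3 and RUN4 checks use -16 and 15, the two ends of that immediate range. A minimal scalar model using int32_t elements and the RUN3 input values; `model_vrsub_vi` and `check_vrsub_vi` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vrsub.vi vd, vs2, imm: vd[i] = imm - vs2[i], imm in [-16, 15].  */
static void model_vrsub_vi (int32_t *vd, const int32_t *vs2, int imm,
                            size_t vl)
{
  assert (imm >= -16 && imm <= 15);
  for (size_t i = 0; i < vl; i++)
    vd[i] = imm - vs2[i];
}

void check_vrsub_vi (void)
{
  int32_t v[32];
  for (int i = 0; i < 32; i++)
    v[i] = i * 33 - 779;
  /* In place, like vsubi_##TYPE (as2##TYPE, as2##TYPE, SZ) in RUN3.  */
  model_vrsub_vi (v, v, -16, 32);
  for (int i = 0; i < 32; i++)
    assert (v[i] == -16 - (i * 33 - 779));
}
```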

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vrsub.vi tests.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.

---
 .../riscv/rvv/autovec/binop/vsub-run.c| 30 ++-
 .../riscv/rvv/autovec/binop/vsub-rv32gcv.c|  1 +
 .../riscv/rvv/autovec/binop/vsub-rv64gcv.c|  1 +
 .../riscv/rvv/autovec/binop/vsub-template.h   | 28 +
 4 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c
index 8c6d8e88d1a..4f254872e33 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c
@@ -27,6 +27,22 @@
   for (int i = 0; i < SZ; i++) \
 assert (as##TYPE[i] == 999 - VAL);
 
+#define RUN3(TYPE) \
+  TYPE as2##TYPE[SZ];  \
+  for (int i = 0; i < SZ; i++) \
+as2##TYPE[i] = i * 33 - 779;   \
+  vsubi_##TYPE (as2##TYPE, as2##TYPE, SZ); \
+  for (int i = 0; i < SZ; i++) \
+assert (as2##TYPE[i] == (TYPE)(-16 - (i * 33 - 779)));
+
+#define RUN4(TYPE) \
+  TYPE as3##TYPE[SZ];  \
+  for (int i = 0; i < SZ; i++) \
+as3##TYPE[i] = i * -17 + 667;  \
+  vsubi2_##TYPE (as3##TYPE, as3##TYPE, SZ);\
+  for (int i = 0; i < SZ; i++) \
+assert (as3##TYPE[i] == (TYPE)(15 - (i * -17 + 667)));
+
 #define RUN_ALL()  \
  RUN(int16_t, 1)   \
  RUN(uint16_t, 2)  \
@@ -39,7 +55,19 @@
  RUN2(int32_t, 9)  \
  RUN2(uint32_t, 10)\
  RUN2(int64_t, 11) \
- RUN2(uint64_t, 12)
+ RUN2(uint64_t, 12)\
+ RUN3(int16_t) \
+ RUN3(uint16_t)\
+ RUN3(int32_t) \
+ RUN3(uint32_t)\
+ RUN3(int64_t) \
+ RUN3(uint64_t)\
+ RUN4(int16_t) \
+ RUN4(uint16_t)\
+ RUN4(int32_t) \
+ RUN4(uint32_t)\
+ RUN4(int64_t) \
+ RUN4(uint64_t)
 
 int main ()
 {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c
index e2bdd0fe904..a0d3802be65 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c
@@ -4,3 +4,4 @@
 #include "vsub-template.h"
 
 /* { dg-final { scan-assembler-times {\tvsub\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvrsub\.vi} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c
index f7a2691b9f3..562c026a7e4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c
@@ -4,3 +4,4 @@
 #include "vsub-template.h"
 
 /* { dg-final { scan-assembler-times {\tvsub\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvrsub\.vi} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h
index 8c0a9c99217..47f07f13462 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h
@@ -16,6 +16,22 @@
   dst[i] = a[i] - b;   \
   }
 
+#define TEST3_TYPE(TYPE)   \
+  __attribute__((noipa))   \
+  void vsubi_##TYPE (TYPE *dst, TYPE *a, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = -16 - a[i]; \
+  }
+
+#define TEST4_TYPE(TYPE)   \
+  __attribute__((noipa))   \
+  void vsubi2_##TYPE (TYPE *dst, TYPE *a, int n)   \
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = 15 - a[i];  \
+  }
+
 /* *int8_t not autovec currently. */
 #define TEST_ALL() \
  TEST_TYPE(int16_t)\
@@ -30,5 +46,17 @@
  TEST2_TYPE(uint32_t)  \
  TEST2_TYPE(int64_t)   \
  TEST2_TYPE(uint64_t)
+ TEST3_TYPE(int16_t)   \
+ TEST3_TYPE(uint16_t)  \
+ TEST3_TYPE(int32_t)   \
+ TEST3_TYPE(uint32_t)  \
+ TEST3_TYPE(int64_t)   \
+ TEST3_TYPE(uint64_t)  \
+ TEST4_TYPE(int16_t)   \
+ TEST4_TYPE(uint16_t)  \
+ TEST4_TYPE(int32_t)   \
+ TES

[PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt does not depend on FRM, so remove the FRM operand in preparation for
mode-switching support.
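To illustrate why no rounding mode is involved, the sketch below models an f32 -> f64 widening conversion (the SEW=32 case of vfwcvt.f.f.v) purely with bit manipulation: the exponent is re-biased and the significand shifted left, so there is never a rounding decision to make. It assumes IEEE-754 binary32/binary64 `float`/`double` and finite inputs only; `widen_f32_bits` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-level model of an exact binary32 -> binary64
   widening conversion (finite inputs only).  */
static double widen_f32_bits (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);
  uint64_t sign = (uint64_t) (x >> 31) << 63;
  int32_t biased_exp = (int32_t) ((x >> 23) & 0xff);
  uint64_t mant = x & 0x7fffff;
  uint64_t bits;

  assert (biased_exp != 0xff);    /* Inf and NaN are out of scope.  */
  if (biased_exp == 0 && mant == 0)
    bits = sign;                  /* +0.0 or -0.0 */
  else
    {
      if (biased_exp == 0)
        {
          /* binary32 subnormal: normalize it, since every binary32
             subnormal is a normal binary64 number.  */
          biased_exp = 1;
          while (!(mant & 0x800000))
            {
              mant <<= 1;
              biased_exp--;
            }
          mant &= 0x7fffff;       /* drop the now-implicit leading 1 */
        }
      /* Re-bias the exponent (127 -> 1023) and left-align the 23-bit
         fraction in the 52-bit field.  No bits are discarded.  */
      bits = sign
             | ((uint64_t) (biased_exp - 127 + 1023) << 52)
             | (mant << 29);
    }

  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

Its result should match a plain `(double) f` cast bit for bit for every finite float, including subnormals, which is another way of saying the conversion is exact.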

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

---
 gcc/config/riscv/vector.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cd696da5d89..28e7e63ce69 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7180,10 +7180,8 @@
 (match_operand 5 "const_int_operand" "i,i")
 (match_operand 6 "const_int_operand" "i,i")
 (match_operand 7 "const_int_operand" "i,i")
-(match_operand 8 "const_int_operand" "i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)
-(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (float_extend:VWEXTF
 (match_operand: 3 "register_operand" "   vr,   vr"))
  (match_operand:VWEXTF 2 "vector_merge_operand"  "   vu,0")))]
-- 
2.36.1



[PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt.f.x.v does not depend on FRM, so remove the FRM operand in preparation
for mode-switching support.
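The same argument applies to integer-to-float widening: a 2*SEW-bit float has enough significand bits to hold every SEW-bit integer (24 bits for binary32 versus 16-bit integers, 53 bits for binary64 versus 32-bit integers), so the result is always exact. The hedged sketch below checks this in scalar C, assuming IEEE-754 `float`/`double`; the function names are illustrative only. By contrast, a same-width int32 -> f32 conversion can be inexact (16777217 has no exact binary32 representation), which is why the non-widening conversions still depend on FRM.

```c
#include <assert.h>
#include <stdint.h>

/* Every int16_t converts to float (24-bit significand) exactly.
   int16_t -> double is exact as well, so compare in double.  */
static void check_i16_to_f32_exact (void)
{
  for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
    assert ((double) (float) v == (double) v);
}

/* int32_t -> double (53-bit significand) is exact; check a stride
   through the whole range plus both extremes.  */
static void check_i32_to_f64_exact (void)
{
  for (int64_t v = INT32_MIN; v <= INT32_MAX; v += 65537)
    assert ((int64_t) (double) (int32_t) v == v);
  assert ((int64_t) (double) INT32_MAX == INT32_MAX);
  assert ((int64_t) (double) INT32_MIN == INT32_MIN);
}
```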

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

---
 gcc/config/riscv/vector.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 28e7e63ce69..3c4565dc775 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7159,10 +7159,8 @@
 (match_operand 5 "const_int_operand""i,i")
 (match_operand 6 "const_int_operand""i,i")
 (match_operand 7 "const_int_operand""i,i")
-(match_operand 8 "const_int_operand""i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)
-(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (any_float:VF
 (match_operand: 3 "register_operand" "   vr,   vr"))
  (match_operand:VF 2 "vector_merge_operand" "   vu,0")))]
-- 
2.36.1



[PATCH] RISC-V: Remove FRM for vfncvt.rod instruction

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

The rounding mode of vfncvt.rod (round-towards-odd) is encoded in the instruction itself, so it does not need FRM.
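For intuition, round-to-odd can be modelled in scalar C as: truncate toward zero, then force the least significant significand bit to 1 whenever the result is inexact. The sketch below does this for an f64 -> f32 narrowing (the binary32-destination case of vfncvt.rod.f.f.w), assuming IEEE-754 types and finite inputs; the helper names are hypothetical. It starts from an ordinary `(float) x` cast and corrects the result, so its output does not depend on whichever rounding mode that cast used, mirroring why the instruction has no use for FRM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t f32_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float bits_f32 (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical round-to-odd narrowing f64 -> f32 (finite inputs).  */
static float narrow_rod (double x)
{
  assert (x - x == 0.0);        /* finite only: Inf and NaN give NaN here */

  /* Any IEEE rounding mode yields x itself or an adjacent value
     (or +/-Inf on overflow), so the correction below is FRM-independent.  */
  float f = (float) x;
  if ((double) f == x)
    return f;                   /* exact, nothing was rounded */

  uint32_t u = f32_bits (f);
  /* If f was rounded away from zero, step its magnitude down by one
     ulp to obtain the truncated value.  This also maps +/-Inf
     to +/-FLT_MAX.  */
  if ((x > 0 && (double) f > x) || (x < 0 && (double) f < x))
    u -= 1;
  u |= 1;                       /* inexact: make the result odd */
  return bits_f32 (u);
}
```

For example, 1.0 + 2.5 ulp (binary32 ulp at 1.0 is 2^-23) becomes 1.0 + 3 ulp, whereas round-to-nearest-even would give 1.0 + 2 ulp.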

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

---
 gcc/config/riscv/vector.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3c4565dc775..cd41ebbb24f 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7286,10 +7286,8 @@
 (match_operand 5 "const_int_operand"  "  i,  i,  i,  i,i,i")
 (match_operand 6 "const_int_operand"  "  i,  i,  i,  i,i,i")
 (match_operand 7 "const_int_operand"  "  i,  i,  i,  i,i,i")
-(match_operand 8 "const_int_operand"  "  i,  i,  i,  i,i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)
-(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:
[(float_truncate:
   (match_operand:VWEXTF 3 "register_operand"  "  0,  0,  0,  0,   vr,   vr"))] UNSPEC_ROD)
-- 
2.36.1



[PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimization for RVV auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

The approach is simple: changing the extension pattern from define_expand into
define_insn_and_split lets the combine pass naturally merge the extensions into
widening operations.
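The combine is only valid because widening cannot overflow: the sum or difference of two sign-extended 8-bit values always fits in 16 bits, so vsext.vf2 + vsext.vf2 + vadd.vv (which wraps at 16 bits) gives exactly what vwadd.vv produces, and likewise for vsub.vv and vwsub.vv. A small exhaustive scalar check of that claim for SEW=8 might look like this; `wrap16` and `check_widen_add_sub` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce an exact result modulo 2^16.  Since extension commutes with
   modular arithmetic, this is what a SEW=16 vadd.vv/vsub.vv on the
   sign-extended operands computes.  */
static int16_t wrap16 (long v)
{
  return (int16_t) (uint16_t) (unsigned long) v;
}

/* For every pair of int8_t inputs, the wrapped 16-bit result equals
   the exact sum/difference, i.e. what vwadd.vv/vwsub.vv produce.  */
static void check_widen_add_sub (void)
{
  for (long a = -128; a <= 127; a++)
    for (long b = -128; b <= 127; b++)
      {
        assert (wrap16 (a + b) == a + b);
        assert (wrap16 (a - b) == a - b);
      }
}
```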

gcc/ChangeLog:

* config/riscv/autovec.md (2): Change expand into define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 13 ---
 .../riscv/rvv/autovec/widen/widen-1.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-2.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-3.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-4.c | 23 +
 .../riscv/rvv/autovec/widen/widen_run-1.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-2.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-3.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-4.c | 31 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 13 +++
 10 files changed, 259 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4834bb4b412..e96de60123b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -401,16 +401,21 @@
 ;; - vsext.vf[2|4|8]
 ;; -
 
-(define_expand "2"
-  [(set (match_operand:VWEXTI 0 "register_operand")
+(define_insn_and_split "2"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=&vr")
 (any_extend:VWEXTI
- (match_operand: 1 "register_operand")))]
+ (match_operand: 1 "register_operand" "vr")))]
   "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
 {
   insn_code icode = code_for_pred_vf2 (, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
-})
+}
+  [(set_attr "type" "vext")
+   (set_attr "mode" "")])
 
 (define_expand "2"
   [(set (match_operand:VQEXTI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
new file mode 100644
index 000..00edecab089
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,   \
+                                                      TYPE2 *__restrict a,    \
+                                                      TYPE2 *__restrict b,    \
+                                                      int n)                  \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = (TYPE1) a[i] + (TYPE1) b[i];                                    \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)   

[PATCH V2] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimization for RVV auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

Based on the V1 patch, this version adds the following comment:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;;   vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv
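The multiply forms named in the subject rely on the same property: the product of two extended 8-bit operands always fits in 16 bits (signed x signed lies in [-16256, 16384], signed x unsigned in [-32640, 32385]), so a 16-bit vmul.vv on the extended operands matches vwmul.vv and vwmulsu.vv. The following scalar sketch checks this exhaustively for SEW=8; `wrap16` and `check_widen_mul` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* A SEW=16 vmul.vv on the extended operands yields the exact product
   reduced modulo 2^16.  */
static int16_t wrap16 (long v)
{
  return (int16_t) (uint16_t) (unsigned long) v;
}

/* If no product is altered by the 16-bit wrap, the extensions can be
   folded into a single widening multiply.  */
static void check_widen_mul (void)
{
  for (long a = -128; a <= 127; a++)
    {
      for (long b = -128; b <= 127; b++)  /* vwmul.vv: sext x sext */
        assert (wrap16 (a * b) == a * b);
      for (long b = 0; b <= 255; b++)     /* vwmulsu.vv: sext x zext */
        assert (wrap16 (a * b) == a * b);
    }
}
```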

gcc/ChangeLog:

* config/riscv/autovec.md (2): Change expand into define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 16 ++---
 .../riscv/rvv/autovec/widen/widen-1.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-2.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-3.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-4.c | 23 +
 .../riscv/rvv/autovec/widen/widen_run-1.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-2.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-3.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-4.c | 31 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 13 +++
 10 files changed, 262 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4834bb4b412..2a21ce3f93c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -401,16 +401,24 @@
 ;; - vsext.vf[2|4|8]
 ;; -
 
-(define_expand "2"
-  [(set (match_operand:VWEXTI 0 "register_operand")
+;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
+;; to combine instructions as below:
+;;   vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv
+(define_insn_and_split "2"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=&vr")
 (any_extend:VWEXTI
- (match_operand: 1 "register_operand")))]
+ (match_operand: 1 "register_operand" "vr")))]
   "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
 {
   insn_code icode = code_for_pred_vf2 (, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
-})
+}
+  [(set_attr "type" "vext")
+   (set_attr "mode" "")])
 
 (define_expand "2"
   [(set (match_operand:VQEXTI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
new file mode 100644
index 000..00edecab089
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,   \
+                                                      TYPE2 *__restrict a,    \
+                                                      TYPE2 *__restrict b,    \
+                                                      int n)                  \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = (TYPE1) a[i] + (TYPE1) b[i];                                    \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)  

[PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Following Richi's suggestion, I changed the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain > vf);

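so that the IV steps by the loop-invariant VF instead of the variable MIN result,
which makes it easier for SCEV to analyze.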
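The two loop shapes can be compared with a small scalar simulation. The sketch below records the `len` value of every iteration for the old flow (IV step = MIN (vf, remain)) and the new flow (IV step = vf, exit test on the pre-decrement value) and checks that they agree. It assumes a constant, nonzero `vf` and an unsigned `remain`, and uses the strict comparison `old_remain > vf`, which is what the `ivtmp_9 > POLY_INT_CST [4, 4]` test in the diff computes; with `>=`, a count that is an exact multiple of `vf` would run one extra iteration with `len == 0`. All names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITERS 64

/* Old flow: the IV is decremented by the variable length.  */
static size_t old_flow (size_t n, size_t vf, size_t *lens)
{
  size_t remain = n, iters = 0;
  do
    {
      size_t len = remain < vf ? remain : vf;   /* MIN (vf, remain) */
      assert (iters < MAX_ITERS);
      lens[iters++] = len;
      remain -= len;
    }
  while (remain != 0);
  return iters;
}

/* New flow: the IV is decremented by the invariant vf and the exit
   test looks at the value before the decrement.  */
static size_t new_flow (size_t n, size_t vf, size_t *lens)
{
  size_t remain = n, old_remain, iters = 0;
  do
    {
      old_remain = remain;
      size_t len = remain < vf ? remain : vf;   /* MIN (vf, remain) */
      assert (iters < MAX_ITERS);
      lens[iters++] = len;
      remain -= vf;   /* may wrap on the last iteration; unused then */
    }
  while (old_remain > vf);
  return iters;
}

/* Both flows must produce the same sequence of lengths.  */
static void check_flows_agree (size_t n, size_t vf)
{
  size_t a[MAX_ITERS], b[MAX_ITERS];
  assert (vf > 0);
  size_t ia = old_flow (n, vf, a);
  size_t ib = new_flow (n, vf, b);
  assert (ia == ib);
  for (size_t i = 0; i < ia; i++)
    assert (a[i] == b[i]);
}
```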

Includes fixes from Kewen.


This patch will need to wait for Kewen's test feedback.

Testing on x86 is ongoing.

Co-Authored by: Kewen Lin  

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
 gimple_stmt_iterator loop_cond_gsi,
 rgroup_controls *rgc, tree niters,
 tree niters_skip, bool might_wrap_p,
-tree *iv_step)
+tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-  ivtmp_35 = ivtmp_9 - _36;
+  ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-  if (ivtmp_35 != 0)
+  if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 &preheader_seq, &header_seq,
 loop_cond_gsi, rgc, niters,
 niters_skip, might_wrap_p,
-&iv_step);
+&iv_step, &compare_step);
 
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-   NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+   = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3



[PATCH V2] RISC-V: Support RVV permutation auto-vectorization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch supports vector permutation for VLS modes only, through the vec_perm pattern.
We plan to implement TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.
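As a reference for what the vec_perm expander has to implement: GCC's `vec_perm` optab numbers the elements of operand 1 as 0..N-1 and those of operand 2 as N..2N-1, with selector values taken modulo 2N. The ChangeLog's `emit_vlmax_gather_insn` and `emit_vlmax_masked_gather_mu_insn` helpers suggest a lowering built from vrgather.vv; the scalar sketch below models one such two-gather scheme (a plain gather from op1, then a masked, mask-undisturbed gather from op2) and checks it against the reference semantics. It is an illustration under assumptions (fixed N = 8, a power of two, 32-bit elements), not the patch's actual code sequence, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELTS 8   /* illustrative element count (a power of two) */

/* vec_perm semantics: elements 0..N-1 come from op1, N..2N-1 from op2,
   and selector values are reduced modulo 2N.  */
static void vec_perm_ref (int32_t *out, const int32_t *op1,
                          const int32_t *op2, const uint32_t *sel)
{
  for (int i = 0; i < NELTS; i++)
    {
      uint32_t s = sel[i] % (2 * NELTS);
      out[i] = s < NELTS ? op1[s] : op2[s - NELTS];
    }
}

/* Scalar model of vrgather.vv: vd[i] = idx[i] < VLMAX ? src[idx[i]] : 0.
   With a non-NULL mask, inactive elements keep their old value
   (mask-undisturbed policy).  */
static void vrgather_model (int32_t *vd, const int32_t *src,
                            const uint32_t *idx, const int *mask)
{
  for (int i = 0; i < NELTS; i++)
    if (mask == NULL || mask[i])
      vd[i] = idx[i] < NELTS ? src[idx[i]] : 0;
}

/* Hypothetical two-gather lowering of vec_perm.  */
static void vec_perm_two_gathers (int32_t *out, const int32_t *op1,
                                  const int32_t *op2, const uint32_t *sel)
{
  uint32_t s[NELTS], s_op2[NELTS];
  int from_op2[NELTS];
  for (int i = 0; i < NELTS; i++)
    {
      s[i] = sel[i] & (2 * NELTS - 1);  /* modulo 2N, N a power of two */
      from_op2[i] = s[i] >= NELTS;      /* lanes that read op2 */
      s_op2[i] = s[i] - NELTS;          /* op2 index, used only where masked */
    }
  vrgather_model (out, op1, s, NULL);          /* lanes with s >= N get 0 */
  vrgather_model (out, op2, s_op2, from_op2);  /* fill those lanes from op2 */
}
```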

Addressed the comments from Robin.
OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md (vec_perm): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.

---
 gcc/config/riscv/autovec.md   |  18 +++
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   | 153 ++
 .../riscv/rvv/autovec/vls-vlmax/perm-1.c  |  58 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-2.c  |  33 
 .../riscv/rvv/autovec/vls-vlmax/perm-3.c  |  29 
 .../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  58 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-5.c  |  49 ++
 .../riscv/rvv/autovec/vls-vlmax/perm-6.c  |  58 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-7.c  |  49 ++
 .../riscv/rvv/autovec/vls-vlmax/perm.h|  70 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-1.c  | 104 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-2.c  |  32 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-3.c  |  20 +++
 .../riscv/rvv/autovec/vls-vlmax/perm_run-4.c  | 104 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-5.c  | 137 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-6.c  | 104 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-7.c  | 135 
 19 files changed, 1217 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3a1e1316732..5c3aad7ee44 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -83,6 +83,24 @@
   }
 )
 
+;; -
+;;  [INT,FP] permutation
+;; -
+;; This is the pattern permutes the vector
+;; -
+
+(define_expand "vec_perm"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand:V 2 "regis

[PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization

2023-05-31 Thread juzhe . zhong
From: Juzhe-Zhong 

1. This patch optimizes the codegen for the following auto-vectorized code:

#include <stdint.h>

void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
{
for (int i = 0; i < n; i++)
  c[i] = (int64_t)a[i] + b[i];
}

This combines the instructions from:

...
vsext.vf2
vadd.vv
...

into:

...
vwadd.wv
...

For PLUS, GCC prefers the following RTL operand order when combining:

(plus: (sign_extend:..)
   (reg:)

instead of

(plus: (reg:..)
   (sign_extend:)

which differs from the MINUS pattern.

I therefore split the vwadd/vwsub patterns and added a dedicated pattern for each of them.

2. Besides the case in (1), this patch also improves vwadd.vv/vwsub.vv
   optimization for more complicated PLUS/MINUS code. Consider the following code:

__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int8_t *__restrict a,
  int8_t *__restrict b, int8_t *__restrict a2,
  int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] + (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
}
}

Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v  v2,0(a3)
vle8.v  v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2   v3,v2
vsext.vf2   v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v  v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2   v1,v4
vadd.vv v2,v1,v2
...

After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v  v3,0(a4)
vle8.v  v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv    v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v  v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv    v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv    v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...

Current upstream GCC cannot fully optimize such code with vwadd because the
combine pass needs an intermediate RTL form in which only one operand is
extended (the vwadd.wv pattern); starting from that intermediate form, it can
then absorb the extension of the other operand and generate vwadd.vv.

So the vwadd.wv/vwsub.wv patterns also enable the vwadd.vv/vwsub.vv optimizations.
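The two-round rewrite can be sketched with scalar element models at SEW=8 (2*SEW=16): the starting sequence with two explicit sign extensions, the intermediate vwadd.wv form in which only one extension has been absorbed, and the final vwadd.vv. This is only a semantic illustration of why each combine step preserves the result, not GCC's RTL; the function names are hypothetical. vwsub.wv is included for completeness: subtraction is not commutative, so only the "wide minus extended" shape corresponds to it.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit wrap-around, as a SEW=16 vector add/sub would produce.  */
static int16_t wrap16 (long v)
{
  return (int16_t) (uint16_t) (unsigned long) v;
}

/* Starting point: vsext.vf2 (a), vsext.vf2 (b), vadd.vv.  */
static int16_t add_two_sext (int8_t a, int8_t b)
{
  int16_t wa = a, wb = b;
  return wrap16 ((long) wa + wb);
}

/* Intermediate: one extension absorbed.  vwadd.wv adds a 2*SEW
   operand and a sign-extended SEW operand.  */
static int16_t vwadd_wv (int16_t wide, int8_t narrow)
{
  return wrap16 ((long) wide + narrow);
}

/* vwsub.wv: the 2*SEW operand minus the sign-extended SEW operand.  */
static int16_t vwsub_wv (int16_t wide, int8_t narrow)
{
  return wrap16 ((long) wide - narrow);
}

/* Final form: both extensions absorbed.  */
static int16_t vwadd_vv (int8_t a, int8_t b)
{
  return wrap16 ((long) a + b);
}

/* Every rewrite step preserves the result for all int8_t inputs.  */
static void check_combine_steps (void)
{
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      {
        int16_t r0 = add_two_sext ((int8_t) a, (int8_t) b);
        assert (vwadd_wv ((int16_t) a, (int8_t) b) == r0);
        assert (vwadd_vv ((int8_t) a, (int8_t) b) == r0);
        assert (vwsub_wv ((int16_t) a, (int8_t) b) == wrap16 ((long) a - b));
      }
}
```

Note that in real code the wide operand of vwadd.wv need not itself be an extension result, which is why it wraps at 2*SEW, for example `vwadd_wv (INT16_MAX, 1) == INT16_MIN`.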
 
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv intrinsic API expander.
* config/riscv/vector.md (@pred_single_widen_): Remove it.
(@pred_single_widen_sub): New pattern.
(@pred_single_widen_add): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc  |  8 +++--
 gcc/config/riscv/vector.md| 29 +---
 .../riscv/rvv/autovec/widen/widen-5.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-6.c | 27 +++
 .../rvv/autovec/widen/widen-complicate-1.c| 31 +
 .../rvv/autovec/widen/widen-complicate-2.c| 31 +
 .../riscv/rvv/autovec/widen/widen_run-5.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-6.c | 34 +++
 8 files changed, 215 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-6.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index a8113f6602b..3f92084929d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -361,8 +361,12 @@ public:
return e.use_exact_insn (
  code_for_pred_dual_widen_scalar (CODE1, CODE2, e.vector_mode ()));
   case OP_TYPE_wv:
-   return e.use_exact_insn (
- code_for_pred_single_widen (CODE1, CODE2, e.vecto

[PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Following Richi's suggestion, I changed the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain > vf);

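so that the IV steps by the loop-invariant VF instead of the variable MIN result,
which makes it easier for SCEV to analyze.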
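Because the new IV always steps by the loop-invariant VF, its value after k iterations is simply niters - k * VF, an affine form that SCEV can describe, and the trip count has the closed form quoted in the vectorizer comment in the diff, (NITERS - 1) / VF + 1. The scalar sketch below (constant VF, NITERS >= 1, illustrative names) checks that closed form against a direct simulation of the new loop.

```c
#include <assert.h>
#include <stddef.h>

/* Trip count of the new decrement-IV loop, by direct simulation.  */
static size_t simulate_iters (size_t n, size_t vf)
{
  size_t remain = n, old_remain, iters = 0;
  do
    {
      old_remain = remain;
      remain -= vf;          /* step is the loop-invariant vf */
      iters++;
    }
  while (old_remain > vf);
  return iters;
}

/* Closed form from the vectorizer comment: (NITERS - 1) / VF + 1.  */
static size_t closed_form_iters (size_t n, size_t vf)
{
  assert (n >= 1 && vf >= 1);
  return (n - 1) / vf + 1;
}
```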

Includes fixes from Kewen.


This patch will need to wait for Kewen's test feedback.

Testing on x86 is ongoing.

Co-Authored by: Kewen Lin  

  PR tree-optimization/109971

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
 gimple_stmt_iterator loop_cond_gsi,
 rgroup_controls *rgc, tree niters,
 tree niters_skip, bool might_wrap_p,
-tree *iv_step)
+tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-  ivtmp_35 = ivtmp_9 - _36;
+  ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-  if (ivtmp_35 != 0)
+  if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 &preheader_seq, &header_seq,
 loop_cond_gsi, rgc, niters,
 niters_skip, might_wrap_p,
-&iv_step);
+&iv_step, &compare_step);
 
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-   NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+   = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3



[PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enhances vwmul.vv combine optimizations.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int16_t *__restrict dst4,
  int8_t *__restrict a, int8_t *__restrict b,
  int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] * (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
  dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}

In such a complicated case, the operands are not single-use; each one is used
by multiple statements.
The GCC combine pass therefore combines the operands iteratively:

First round -> combine one of the operands and change vsext + vmul into vwmul.wv
Second round -> combine the other operand and change vwmul.wv into vwmul.vv
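To make the two rounds concrete, here is a scalar C model of the element-wise semantics. It is only a sketch: vsext_vf2, pseudo_vwmul_wv and vwmul_vv are hypothetical helper names, not GCC or RVV intrinsic APIs. It shows why folding the remaining vsext into the pseudo vwmul.wv preserves values: when the wide operand is itself a sign-extended int8_t, the pseudo vwmul.wv and vwmul.vv give identical int16_t results.

```c
#include <stddef.h>
#include <stdint.h>

/* vsext.vf2: sign-extend each int8_t element to int16_t.  */
void vsext_vf2 (int16_t *dst, const int8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int16_t) src[i];
}

/* Pseudo vwmul.wv (no such RVV instruction): a wide operand times a
   sign-extended narrow operand, truncated to 16 bits.  */
void pseudo_vwmul_wv (int16_t *dst, const int16_t *wide,
                      const int8_t *narrow, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int16_t) (wide[i] * (int16_t) narrow[i]);
}

/* vwmul.vv: sign-extend both narrow operands, then multiply.  */
void vwmul_vv (int16_t *dst, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int16_t) ((int16_t) a[i] * (int16_t) b[i]);
}
```

Round one corresponds to replacing vsext_vf2 + vmul on one operand with pseudo_vwmul_wv; round two replaces vsext_vf2 + pseudo_vwmul_wv with vwmul_vv.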

Note that adding the pseudo vwmul.wv pattern makes the vwmulsu.vv testcase
fail, since GCC prefers this operand order:

(mult: (zero_extend)
  (sign_extend))

So this patch swaps the operand order of the vwmulsu.vv pattern.

gcc/ChangeLog:

* config/riscv/vector.md: Swap zero_extend and sign_extend order.
* config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 56 +++
 gcc/config/riscv/vector.md|  9 +--
 .../riscv/rvv/autovec/widen/widen-7.c | 27 +
 .../rvv/autovec/widen/widen-complicate-3.c| 32 +++
 .../riscv/rvv/autovec/widen/widen_run-7.c | 34 +++
 5 files changed, 154 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/riscv/autovec-opt.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 000..5b7dc9bef8c
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,56 @@
+;; Machine description for optimization of RVV auto-vectorization.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; RVV has no vwmul.wv instruction (unlike vwadd.wv).
+;; This pattern is an intermediate RTL form acting as a pseudo vwmul.wv,
+;; which helps the combine pass form vwmul.vv.
+(define_insn_and_split "@pred_single_widen_mul"
+  [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
+   (if_then_else:VWEXTI
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
+(match_operand 5 "vector_length_operand"  "   rK,   rK")
+(match_operand 6 "const_int_operand"  "    i,    i")
+(match_operand 7 "const_int_operand"  "    i,    i")
+(match_operand 8 "const_int_operand"  "    i,    i")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 4 "register_operand" "   vr,   vr"))
+   (match_operand:VWEXTI 3 "register_operand" "   vr,   vr"))
+ (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,    0")))]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vf2 (, mode);
+rtx tmp = gen_reg_rtx (mode);
+rtx ops[] = {tmp, operands[4]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+emit_

[PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enhances vwmul.vv combine optimizations.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int16_t *__restrict dst4,
  int8_t *__restrict a, int8_t *__restrict b,
  int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] * (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
  dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}

In such a complicated case, the operands are not single-use; each one is used
by multiple statements.
The GCC combine pass therefore combines the operands iteratively.

Also, this patch adds another vwmulsu.vv pattern to improve vwmulsu.vv
optimization. Currently, vector.md has this form for intrinsic calls:

(mult: (sign_extend) (zero_extend))

This patch adds a new vwmulsu.vv pattern with the form:
(mult: (zero_extend) (sign_extend))

to handle cases like the following, which mix signed and unsigned widening
multiplications:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int16_t *__restrict dst4,
  int8_t *__restrict a, uint8_t *__restrict b,
  uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] * (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
  dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
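Why both RTL shapes are needed can be seen from a scalar model of vwmulsu (a sketch; vwmulsu_sext_first and vwmulsu_zext_first are hypothetical names). Multiplication commutes, so (mult (sign_extend a) (zero_extend b)) and (mult (zero_extend b) (sign_extend a)) denote the same vwmulsu.vv result, yet combine may produce either operand order, so presumably each order needs a matching pattern.

```c
#include <stdint.h>

/* vwmulsu: signed narrow operand times unsigned narrow operand,
   widened to 16 bits.  RTL shape (mult (sign_extend a) (zero_extend b)).  */
int16_t vwmulsu_sext_first (int8_t a, uint8_t b)
{
  return (int16_t) ((int16_t) a * (int16_t) b);
}

/* The same operation in the (mult (zero_extend b) (sign_extend a)) order.
   The product always lies in [-32640, 32385], so it fits int16_t.  */
int16_t vwmulsu_zext_first (uint8_t b, int8_t a)
{
  return (int16_t) ((int16_t) b * (int16_t) a);
}
```

In the example above, dst[i] and dst3[i] are both vwmulsu operations, with the signed operand appearing on different sides of the multiplication.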

Before this patch:

...
vsetvli zero,t1,e8,m1,ta,ma
vle8.v  v1,0(a4)
vsetvli t3,zero,e16,m2,ta,ma
vsext.vf2   v6,v1
vsetvli zero,t1,e8,m1,ta,ma
vle8.v  v1,0(a5)
vsetvli t3,zero,e16,m2,ta,ma
add t0,a0,t4
vzext.vf2   v4,v1
vmul.vv v2,v4,v6
vsetvli zero,t1,e16,m2,ta,ma
vse16.v v2,0(t0)
vle8.v  v1,0(a6)
vsetvli t3,zero,e16,m2,ta,ma
add t0,a1,t4
vzext.vf2   v2,v1
vmul.vv v4,v2,v4
vsetvli zero,t1,e16,m2,ta,ma
vse16.v v4,0(t0)
vsetvli t3,zero,e16,m2,ta,ma
add t0,a2,t4
vmul.vv v2,v2,v6
vsetvli zero,t1,e16,m2,ta,ma
vse16.v v2,0(t0)
add t0,a3,t4
vle8.v  v1,0(a7)
vsetvli t3,zero,e16,m2,ta,ma
sub t6,t6,t1
vsext.vf2   v2,v1
vmul.vv v2,v2,v6
vsetvli zero,t1,e16,m2,ta,ma
vse16.v v2,0(t0)
...

After this patch:
...
vsetvli zero,t1,e8,mf2,ta,ma
vle8.v  v1,0(a4)
vle8.v  v3,0(a5)
vsetvli t6,zero,e8,mf2,ta,ma
add t0,a0,t3
vwmulsu.vv  v2,v1,v3
vsetvli zero,t1,e16,m1,ta,ma
vse16.v v2,0(t0)
vle8.v  v2,0(a6)
vsetvli t6,zero,e8,mf2,ta,ma
add t0,a1,t3
vwmulu.vv   v4,v3,v2
vsetvli zero,t1,e16,m1,ta,ma
vse16.v v4,0(t0)
vsetvli t6,zero,e8,mf2,ta,ma
add t0,a2,t3
vwmulsu.vv  v3,v1,v2
vsetvli zero,t1,e16,m1,ta,ma
vse16.v v3,0(t0)
add t0,a3,t3
vle8.v  v3,0(a7)
vsetvli t6,zero,e8,mf2,ta,ma
sub t4,t4,t1
vwmul.vv    v2,v1,v3
vsetvli zero,t1,e16,m1,ta,ma
vse16.v v2,0(t0)
...

gcc/ChangeLog:

* config/riscv/vector.md: Include autovec-opt.md.
* config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 80 +++
 gcc/config/riscv/vector.md|  3 +-
 .../riscv/rvv/autovec/widen/widen-7.c | 27 +++
 .../rvv/autovec/widen/widen-complicate-3.c| 32 
 .../rvv/autovec/widen/widen-complicate-4.c| 31 +++
 .../riscv/rvv/autovec/widen/widen_run-7.c | 34 
 6 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/autovec-opt.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 000..92cdc4e9a16
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,80 @@
+;; Machine description for optimizat

[PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233

Add _mu C++ overloaded intrinsics for load && viota && vid.
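The change is a relaxation of the can_be_overloaded_p predicates. The following sketch models it in plain C; the enum is a hypothetical subset of GCC's predication_type_index (only enumerator names that appear in the diff), and the function names are made up. The viota/vid pair corresponds to the hunks at @@ -963 and @@ -979, which, judging by the subject line, belong to viota and vid.

```c
#include <stdbool.h>

/* Hypothetical subset of predication_type_index.  */
enum predication_type_index
{
  PRED_TYPE_none,
  PRED_TYPE_m,
  PRED_TYPE_tu,
  PRED_TYPE_tum,
  PRED_TYPE_tumu,
  PRED_TYPE_mu
};

/* Load bases before the patch: _mu had no overloaded form.  */
bool load_overloadable_old (enum predication_type_index pred)
{
  return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
}

/* Load bases after the patch.  */
bool load_overloadable_new (enum predication_type_index pred)
{
  return pred != PRED_TYPE_none;
}

/* viota/vid bases before the patch.  */
bool viota_vid_overloadable_old (enum predication_type_index pred)
{
  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
         || pred == PRED_TYPE_tumu;
}

/* viota/vid bases after the patch: _mu is added to the list.  */
bool viota_vid_overloadable_new (enum predication_type_index pred)
{
  return viota_vid_overloadable_old (pred) || pred == PRED_TYPE_mu;
}
```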

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.

---
 gcc/config/riscv/riscv-vector-builtins-bases.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index a8113f6602b..498c6ba042e 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -164,7 +164,7 @@ public:
   {
 if (STORE_P || LST_TYPE == LST_INDEXED)
   return true;
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -963,7 +963,7 @@ public:
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
 return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
-  || pred == PRED_TYPE_tumu;
+  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
   }
 
   rtx expand (function_expander &e) const override
@@ -979,7 +979,7 @@ public:
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
 return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
-  || pred == PRED_TYPE_tumu;
+  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
   }
 
   rtx expand (function_expander &e) const override
@@ -1749,7 +1749,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -1794,7 +1794,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
-- 
2.36.1



[PATCH] RISC-V: Add __RISCV_ prefix to VXRM and FRM enum

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

According to the RVV intrinsic documentation:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

Add __RISCV_ prefix to VXRM and FRM enum.
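The registration code relies on an X-macro plus stringizing ("__RISCV_VXRM_" #NAME). A self-contained sketch of the same technique follows; FOR_EACH_VXRM, MAKE_ENTRY, vxrm_values and vxrm_lookup are hypothetical stand-ins for the DEF_RVV_VXRM_ENUM entries in riscv-vector-builtins.def and for simulate_enum_decl. The values follow the RVV vxrm encoding (rnu = 0, rne = 1, rdn = 2, rod = 3), the same order that vxrm-1.c returns them in; the FRM enum works the same way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the DEF_RVV_VXRM_ENUM entries.  */
#define FOR_EACH_VXRM(X) \
  X (RNU, 0)             \
  X (RNE, 1)             \
  X (RDN, 2)             \
  X (ROD, 3)

struct name_value
{
  const char *name;
  int value;
};

/* Same trick as the patch: "__RISCV_VXRM_" #NAME concatenates the
   prefix with the stringized enumerator name.  */
#define MAKE_ENTRY(NAME, VALUE) { "__RISCV_VXRM_" #NAME, VALUE },
static const struct name_value vxrm_values[] = { FOR_EACH_VXRM (MAKE_ENTRY) };
#undef MAKE_ENTRY

/* Return the value registered for NAME, or -1 if NAME is unknown.  */
int vxrm_lookup (const char *name)
{
  for (size_t i = 0; i < sizeof vxrm_values / sizeof vxrm_values[0]; i++)
    if (strcmp (vxrm_values[i].name, name) == 0)
      return vxrm_values[i].value;
  return -1;
}
```

With the prefix in place, the old unprefixed spelling (for example VXRM_RDN) is no longer a registered name.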

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add __RISCV_ prefix.
(DEF_RVV_FRM_ENUM): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/frm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-10.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-11.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-12.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-6.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-7.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-8.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.

---
 gcc/config/riscv/riscv-vector-builtins.cc |  8 
 gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c   | 10 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c  |  8 
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c |  8 
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-11.c |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-12.c |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-6.c  |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-7.c  |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c  |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c  |  8 
 10 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 43bf6d8f262..9e6dae98a6d 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4026,11 +4026,11 @@ register_vxrm ()
 {
   auto_vec values;
 #define DEF_RVV_VXRM_ENUM(NAME, VALUE)  \
-  values.quick_push (string_int_pair ("VXRM_" #NAME, VALUE));
+  values.quick_push (string_int_pair ("__RISCV_VXRM_" #NAME, VALUE));
 #include "riscv-vector-builtins.def"
 #undef DEF_RVV_VXRM_ENUM
 
-  lang_hooks.types.simulate_enum_decl (input_location, "RVV_VXRM", &values);
+  lang_hooks.types.simulate_enum_decl (input_location, "__RISCV_VXRM", &values);
 }
 
 /* Register the frm enum.  */
@@ -4039,11 +4039,11 @@ register_frm ()
 {
   auto_vec values;
 #define DEF_RVV_FRM_ENUM(NAME, VALUE)  \
-  values.quick_push (string_int_pair ("FRM_" #NAME, VALUE));
+  values.quick_push (string_int_pair ("__RISCV_FRM_" #NAME, VALUE));
 #include "riscv-vector-builtins.def"
 #undef DEF_RVV_FRM_ENUM
 
-  lang_hooks.types.simulate_enum_decl (input_location, "RVV_FRM", &values);
+  lang_hooks.types.simulate_enum_decl (input_location, "__RISCV_FRM", &values);
 }
 
 /* Implement #pragma riscv intrinsic vector.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c
index f5635fb959e..ff19c8bc089 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c
@@ -5,27 +5,27 @@
 
 size_t f0 ()
 {
-  return FRM_RNE;
+  return __RISCV_FRM_RNE;
 }
 
 size_t f1 ()
 {
-  return FRM_RTZ;
+  return __RISCV_FRM_RTZ;
 }
 
 size_t f2 ()
 {
-  return FRM_RDN;
+  return __RISCV_FRM_RDN;
 }
 
 size_t f3 ()
 {
-  return FRM_RUP;
+  return __RISCV_FRM_RUP;
 }
 
 size_t f4 ()
 {
-  return FRM_RMM;
+  return __RISCV_FRM_RMM;
 }
 
 /* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,\s*0} 1} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c
index 0d364787ad0..b0ed27b0520 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c
@@ -5,22 +5,22 @@
 
 size_t f0 ()
 {
-  return VXRM_RNU;
+  return __RISCV_VXRM_RNU;
 }
 
 size_t f1 ()
 {
-  return VXRM_RNE;
+  return __RISCV_VXRM_RNE;
 }
 
 size_t f2 ()
 {
-  return VXRM_RDN;
+  return __RISCV_VXRM_RDN;
 }
 
 size_t f3 ()
 {
-  return VXRM_ROD;
+  return __RISCV_VXRM_ROD;
 }
 
 /* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,\s*0} 1} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
index a707aa1645e..3c7872bb73d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
@@ -8,16 +8,16 @@ void f (void * in, void *out, int32_t x, int n, int m)
   for (int i = 0; i < n; i++) {
 vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
 vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
-vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
-v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
+vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, __RISCV_VXR

[PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233

Add _mu C++ overloaded intrinsics for load && viota && vid.
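Compared with the previous version, this one also changes fault_load_def::get_name in riscv-vector-builtins-shapes.cc: instead of hard-coding PRED_TYPE_none/PRED_TYPE_mu, the shape now asks the function base via can_be_overloaded_p, so the overloadability rule lives in one place. A rough C model of that delegation follows; every name in it (function_base_model, fault_load_base, get_name_model) is hypothetical, and the real code is C++ with virtual member functions.

```c
#include <stdbool.h>
#include <stddef.h>

enum predication_type_index { PRED_TYPE_none, PRED_TYPE_tu, PRED_TYPE_mu };

/* Stand-in for a function base: it owns the overloadability rule.  */
struct function_base_model
{
  bool (*can_be_overloaded_p) (enum predication_type_index);
};

static bool
fault_load_can_be_overloaded_p (enum predication_type_index pred)
{
  return pred != PRED_TYPE_none;
}

const struct function_base_model fault_load_base
  = { fault_load_can_be_overloaded_p };

/* Stand-in for the shape's get_name hook: return no name (so no
   overloaded function is registered) when the base says the
   predication cannot be overloaded.  */
const char *
get_name_model (const struct function_base_model *base,
                enum predication_type_index pred, bool overloaded_p,
                const char *name)
{
  if (overloaded_p && !base->can_be_overloaded_p (pred))
    return NULL;
  return name;
}
```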

Co-authored-by: KuanLin Chen 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins-bases.cc | 17 +++--
 .../riscv/riscv-vector-builtins-shapes.cc   |  5 ++---
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 3f92084929d..09870c327fa 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -164,7 +164,7 @@ public:
   {
 if (STORE_P || LST_TYPE == LST_INDEXED)
   return true;
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -967,7 +967,7 @@ public:
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
 return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
-  || pred == PRED_TYPE_tumu;
+  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
   }
 
   rtx expand (function_expander &e) const override
@@ -983,7 +983,7 @@ public:
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
 return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
-  || pred == PRED_TYPE_tumu;
+  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
   }
 
   rtx expand (function_expander &e) const override
@@ -1715,6 +1715,11 @@ public:
 return CP_READ_MEMORY | CP_WRITE_CSR;
   }
 
+  bool can_be_overloaded_p (enum predication_type_index pred) const override
+  {
+return pred != PRED_TYPE_none;
+  }
+
   gimple *fold (gimple_folder &f) const override
   {
 return fold_fault_load (f);
@@ -1753,7 +1758,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -1798,7 +1803,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -1888,7 +1893,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   gimple *fold (gimple_folder &f) const override
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 76262f07ce4..c8daae01f91 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -550,9 +550,8 @@ struct fault_load_def : public build_base
   char *get_name (function_builder &b, const function_instance &instance,
  bool overloaded_p) const override
   {
-if (overloaded_p)
-  if (instance.pred == PRED_TYPE_none || instance.pred == PRED_TYPE_mu)
-   return nullptr;
+if (overloaded_p && !instance.base->can_be_overloaded_p (instance.pred))
+  return nullptr;
 tree type = builtin_types[instance.type.index].vector;
 machine_mode mode = TYPE_MODE (type);
 int sew = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
-- 
2.36.1



[PATCH V2] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233

Add _mu C++ overloaded intrinsics for load && viota && vid.

Co-authored-by: KuanLin Chen 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins-bases.cc | 17 +++--
 .../riscv/riscv-vector-builtins-shapes.cc   |  5 ++---
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 3f92084929d..09870c327fa 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -164,7 +164,7 @@ public:
   {
 if (STORE_P || LST_TYPE == LST_INDEXED)
   return true;
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -967,7 +967,7 @@ public:
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
 return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
-  || pred == PRED_TYPE_tumu;
+  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
   }
 
   rtx expand (function_expander &e) const override
@@ -983,7 +983,7 @@ public:
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
 return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
-  || pred == PRED_TYPE_tumu;
+  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
   }
 
   rtx expand (function_expander &e) const override
@@ -1715,6 +1715,11 @@ public:
 return CP_READ_MEMORY | CP_WRITE_CSR;
   }
 
+  bool can_be_overloaded_p (enum predication_type_index pred) const override
+  {
+return pred != PRED_TYPE_none;
+  }
+
   gimple *fold (gimple_folder &f) const override
   {
 return fold_fault_load (f);
@@ -1753,7 +1758,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -1798,7 +1803,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   rtx expand (function_expander &e) const override
@@ -1888,7 +1893,7 @@ public:
 
   bool can_be_overloaded_p (enum predication_type_index pred) const override
   {
-return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
+return pred != PRED_TYPE_none;
   }
 
   gimple *fold (gimple_folder &f) const override
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 76262f07ce4..c8daae01f91 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -550,9 +550,8 @@ struct fault_load_def : public build_base
   char *get_name (function_builder &b, const function_instance &instance,
  bool overloaded_p) const override
   {
-if (overloaded_p)
-  if (instance.pred == PRED_TYPE_none || instance.pred == PRED_TYPE_mu)
-   return nullptr;
+if (overloaded_p && !instance.base->can_be_overloaded_p (instance.pred))
+  return nullptr;
 tree type = builtin_types[instance.type.index].vector;
 machine_mode mode = TYPE_MODE (type);
 int sew = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
-- 
2.36.1



[PATCH] RISC-V: Fix warning in predicated.md

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

There is a -Wsign-compare warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 (match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 || INTVAL (op) == GET_MODE_MASK (SImode)"
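INTVAL yields a signed HOST_WIDE_INT while GET_MODE_MASK yields an unsigned one, so under C's usual arithmetic conversions the signed side was already being converted to unsigned; switching to UINTVAL only makes that conversion explicit and silences the warning without changing the result. A minimal sketch of the patched test, using hypothetical typedefs hwi/uhwi for HOST_WIDE_INT and its unsigned counterpart and a hypothetical mode_mask helper in place of GET_MODE_MASK:

```c
#include <stdbool.h>

typedef long long hwi;           /* stands in for HOST_WIDE_INT */
typedef unsigned long long uhwi; /* stands in for unsigned HOST_WIDE_INT */

/* Like GET_MODE_MASK for an integer mode that is BITS bits wide.  */
uhwi mode_mask (unsigned bits)
{
  return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* The patched predicate body: compare the constant's bits as unsigned,
   which is what UINTVAL (op) provides.  */
bool arith_mask_match_p (hwi intval)
{
  uhwi uintval = (uhwi) intval;
  return uintval == mode_mask (16) || uintval == mode_mask (32);
}
```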

gcc/ChangeLog:

* config/riscv/predicates.md: Change INTVAL into UINTVAL.

---
 gcc/config/riscv/predicates.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 1ed84850e35..d14b1ca30bb 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -31,7 +31,7 @@
   (ior (match_operand 0 "arith_operand")
(and (match_code "const_int")
 (match_test "INTVAL (op) == GET_MODE_MASK (HImode)
-|| INTVAL (op) == GET_MODE_MASK (SImode)"
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
 
 (define_predicate "lui_operand"
   (and (match_code "const_int")
-- 
2.36.1



[PATCH] RISC-V: Optimize reverse series index vector

2023-06-01 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch optimizes the following series vector:
[nunits - 1, nunits - 2, ..., 0]

Before this patch:
vid
vmul
vadd

After this patch:
vid
vrsub

This patch is an obvious and simple optimization, ok for trunk?
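A scalar sketch of the two code sequences follows; vid, series_generic and series_reverse are hypothetical helpers that model the element-wise semantics of vid.v, vmul + vadd and vrsub, not GCC functions. vrsub computes scalar - v[i], so with the scalar nunits - 1 one vrsub replaces the vmul + vadd pair. Presumably vrsub.vi is chosen when nunits - 1 fits the 5-bit signed immediate and vrsub.vx otherwise (including scalable modes, where nunits is only known at run time), which would be consistent with the two scan-assembler counts added to perm-4.c.

```c
#include <stddef.h>
#include <stdint.h>

/* vid.v: element i receives the value i.  */
void vid (int64_t *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    v[i] = (int64_t) i;
}

/* Generic path: BASE + vid * STEP, i.e. vid + vmul + vadd.  */
void series_generic (int64_t *dst, size_t n, int64_t base, int64_t step)
{
  vid (dst, n);
  for (size_t i = 0; i < n; i++)
    dst[i] = base + dst[i] * step;
}

/* Special case {nunits - 1, ..., 0}: vid + vrsub with scalar nunits - 1.  */
void series_reverse (int64_t *dst, size_t nunits)
{
  vid (dst, nunits);
  for (size_t i = 0; i < nunits; i++)
    dst[i] = (int64_t) (nunits - 1) - dst[i];
}
```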

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.

---
 gcc/config/riscv/riscv-v.cc | 17 +
 .../riscv/rvv/autovec/vls-vlmax/perm-4.c|  2 ++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 1cd3bd3438e..75cf00b7eba 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -530,6 +530,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
   machine_mode mode = GET_MODE (dest);
   machine_mode mask_mode;
   gcc_assert (get_mask_mode (mode).exists (&mask_mode));
+  poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1;
+  poly_int64 value;
 
   /* VECT_IV = BASE + I * STEP.  */
 
@@ -545,6 +547,21 @@ expand_vec_series (rtx dest, rtx base, rtx step)
   rtx step_adj;
   if (rtx_equal_p (step, const1_rtx))
 step_adj = vid;
+  else if (rtx_equal_p (step, constm1_rtx) && poly_int_rtx_p (base, &value)
+  && known_eq (nunits_m1, value))
+{
+  /* Special case:
+  {nunits - 1, nunits - 2, ... , 0}.
+  nunits can be either const_int or const_poly_int.
+
+Code sequence:
+  vid.v v
+  vrsub nunits - 1, v.  */
+  rtx ops[] = {dest, vid, gen_int_mode (nunits_m1, GET_MODE_INNER (mode))};
+  insn_code icode = code_for_pred_sub_reverse_scalar (mode);
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
+  return;
+}
   else
 {
   step_adj = gen_reg_rtx (mode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
index 179c8274a92..aa328810c30 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
@@ -56,3 +56,5 @@
 TEST_ALL (PERMUTE)
 
 /* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */
+/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */
+/* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */
-- 
2.36.1



[PATCH V2] RISC-V: Fix warning in predicated.md

2023-06-02 Thread juzhe . zhong
From: Juzhe-Zhong 

There is a -Wsign-compare warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 (match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 || INTVAL (op) == GET_MODE_MASK (SImode)"

gcc/ChangeLog:

* config/riscv/predicates.md: Change INTVAL into UINTVAL.

---
 gcc/config/riscv/predicates.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index d14b1ca30bb..04ca6ceabc7 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -30,7 +30,7 @@
 (define_predicate "arith_operand_or_mode_mask"
   (ior (match_operand 0 "arith_operand")
(and (match_code "const_int")
-(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
+(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
 || UINTVAL (op) == GET_MODE_MASK (SImode)"
 
 (define_predicate "lui_operand"
-- 
2.36.1



[PATCH] RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong 

PR target/110109

This patch fixes PR110109. The issue is caused by this pattern:

(define_insn_and_split "*vlmul_extx2"
  [(set (match_operand: 0 "register_operand"  "=vr, ?&vr")
   (subreg:
 (match_operand:VLMULEXT2 1 "register_operand" " 0,   vr") 0))]
  "TARGET_VECTOR"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  emit_insn (gen_rtx_SET (gen_lowpart (mode, operands[0]), operands[1]));
  DONE;
})

This pattern generates the following code in insn-recog.cc:
static int
pattern57 (rtx x1)
{
  rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0];
  rtx x2;
  int res ATTRIBUTE_UNUSED;
  if (maybe_ne (SUBREG_BYTE (x1).to_constant (), 0))
return -1;
...

PR110109 ICEs at maybe_ne (SUBREG_BYTE (x1).to_constant (), 0), because for
scalable RVV modes SUBREG_BYTE (x1) may be a non-constant poly_int and so
cannot be accessed as SUBREG_BYTE (x1).to_constant ().
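The failure mode can be modelled with a one-indeterminate polynomial, value = coeff0 + coeff1 * X with X >= 0 known only at run time. This is a simplified sketch, not GCC's poly-int.h: struct poly_model and all function names are hypothetical, and the assert mirrors the checking assertion GCC places in to_constant. Forcing to_constant before maybe_ne, as the generated recognizer did, is only safe for constant offsets; querying maybe_ne on the poly value directly handles both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* value = coeff0 + coeff1 * X, where X >= 0 is a run-time quantity.  */
struct poly_model
{
  int64_t coeff0;
  int64_t coeff1;
};

bool poly_is_constant (struct poly_model p)
{
  return p.coeff1 == 0;
}

/* Mirrors the assertion in to_constant: aborts on scalable values.  */
int64_t poly_to_constant (struct poly_model p)
{
  assert (poly_is_constant (p));
  return p.coeff0;
}

/* True if P differs from C for at least one X >= 0 (like maybe_ne).  */
bool poly_maybe_ne (struct poly_model p, int64_t c)
{
  return p.coeff0 != c || p.coeff1 != 0;
}

/* Shape of the generated check: forces a constant first.  */
bool subreg_byte_check_forced (struct poly_model byte)
{
  struct poly_model c = { poly_to_constant (byte), 0 };
  return poly_maybe_ne (c, 0);
}

/* Safe variant: asks the poly value directly.  */
bool subreg_byte_check_direct (struct poly_model byte)
{
  return poly_maybe_ne (byte, 0);
}
```

Calling subreg_byte_check_forced on a scalable offset such as { 0, 16 } trips the assertion, which corresponds to the ICE.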

I originally created these patterns to optimize the following test:
vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
  return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
}

codegen:
test_vlmul_ext_v_f32mf2_f32m2:
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.i v2,0
vsetvli a5,zero,e32,mf2,ta,ma
vle32.v v2,0(a1)
vs2r.v  v2,0(a0)
ret

There is a redundant 'vmv.v.i' here, since GCC IR has no undefined value
(unlike LLVM, which has undef/poison). For the vlmul_ext_* RVV intrinsics,
GCC initializes the register to all zeros. However, I don't think this will
be a big issue once we support subreg liveness tracking.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Change expand approach.
* config/riscv/vector.md (@vlmul_extx2): Remove it.
(@vlmul_extx4): Ditto.
(@vlmul_extx8): Ditto.
(@vlmul_extx16): Ditto.
(@vlmul_extx32): Ditto.
(@vlmul_extx64): Ditto.
(*vlmul_extx2): Ditto.
(*vlmul_extx4): Ditto.
(*vlmul_extx8): Ditto.
(*vlmul_extx16): Ditto.
(*vlmul_extx32): Ditto.
(*vlmul_extx64): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110109-1.c: New test.
* gcc.target/riscv/rvv/base/pr110109-2.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc  |  28 +-
 gcc/config/riscv/vector.md| 120 -
 .../gcc.target/riscv/rvv/base/pr110109-1.c|  40 ++
 .../gcc.target/riscv/rvv/base/pr110109-2.c| 485 ++
 4 files changed, 529 insertions(+), 144 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-2.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 09870c327fa..87a684dd127 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1565,30 +1565,10 @@ public:
 
   rtx expand (function_expander &e) const override
   {
-e.add_input_operand (0);
-switch (e.op_info->ret.base_type)
-  {
-  case RVV_BASE_vlmul_ext_x2:
-   return e.generate_insn (
- code_for_vlmul_extx2 (e.vector_mode ()));
-  case RVV_BASE_vlmul_ext_x4:
-   return e.generate_insn (
- code_for_vlmul_extx4 (e.vector_mode ()));
-  case RVV_BASE_vlmul_ext_x8:
-   return e.generate_insn (
- code_for_vlmul_extx8 (e.vector_mode ()));
-  case RVV_BASE_vlmul_ext_x16:
-   return e.generate_insn (
- code_for_vlmul_extx16 (e.vector_mode ()));
-  case RVV_BASE_vlmul_ext_x32:
-   return e.generate_insn (
- code_for_vlmul_extx32 (e.vector_mode ()));
-  case RVV_BASE_vlmul_ext_x64:
-   return e.generate_insn (
- code_for_vlmul_extx64 (e.vector_mode ()));
-  default:
-   gcc_unreachable ();
-  }
+tree arg = CALL_EXPR_ARG (e.exp, 0);
+rtx src = expand_normal (arg);
+emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+return e.target;
   }
 };
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 79f1644732a..2496eff7874 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -498,126 +498,6 @@
   }
 )
 
-(define_expand "@vlmul_extx2"
-  [(set (match_operand: 0 "register_operand")
-   (subreg:
- (match_operand:VLMULEXT2 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx4"
-  [(set (match_operand: 0 "register_operand")
-   (subreg:
- (match_operand:VLMULEXT4 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx8"
-  [(set (match_operand: 0 "register_operand")
-   (subreg:
- (match_operand:VLMULEXT8 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-

[NFC] RISC-V: Reorganize riscv-v.cc

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch only reorganizes functions in preparation for the following patch.

It moves rvv_builder and the emit_* functions before expand_const_vector,
since the following patch uses them in expand_const_vector.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (class rvv_builder): Reorganize functions.
(rvv_builder::can_duplicate_repeating_sequence_p): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::get_merge_scalar_mask): Ditto.
(emit_scalar_move_insn): Ditto.
(emit_vlmax_integer_move_insn): Ditto.
(emit_nonvlmax_integer_move_insn): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(get_repeating_sequence_dup_machine_mode): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 497 ++--
 1 file changed, 249 insertions(+), 248 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 75cf00b7eba..fa13bd94f9d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -239,6 +239,165 @@ private:
   expand_operand m_ops[MAX_OPERANDS];
 };
 
+
+class rvv_builder : public rtx_vector_builder
+{
+public:
+  rvv_builder () : rtx_vector_builder () {}
+  rvv_builder (machine_mode mode, unsigned int npatterns,
+  unsigned int nelts_per_pattern)
+: rtx_vector_builder (mode, npatterns, nelts_per_pattern)
+  {
+m_inner_mode = GET_MODE_INNER (mode);
+m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
+m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
+
+gcc_assert (
+  int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
+  }
+
+  bool can_duplicate_repeating_sequence_p ();
+  rtx get_merged_repeating_sequence ();
+
+  bool repeating_sequence_use_merge_profitable_p ();
+  rtx get_merge_scalar_mask (unsigned int) const;
+
+  machine_mode new_mode () const { return m_new_mode; }
+  scalar_mode inner_mode () const { return m_inner_mode; }
+  scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
+  unsigned int inner_bits_size () const { return m_inner_bits_size; }
+  unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
+
+private:
+  scalar_mode m_inner_mode;
+  scalar_int_mode m_inner_int_mode;
+  machine_mode m_new_mode;
+  scalar_int_mode m_new_inner_mode;
+  unsigned int m_inner_bits_size;
+  unsigned int m_inner_bytes_size;
+};
+
+/* Return true if the vector can be duplicated from a super element that is
+   the fusion of consecutive elements.
+
+ v = { a, b, a, b } super element = ab, v = { ab, ab }  */
+bool
+rvv_builder::can_duplicate_repeating_sequence_p ()
+{
+  poly_uint64 new_size = exact_div (full_nelts (), npatterns ());
+  unsigned int new_inner_size = m_inner_bits_size * npatterns ();
+  if (!int_mode_for_size (new_inner_size, 0).exists (&m_new_inner_mode)
+  || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD
+  || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
+return false;
+  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+}
+
+/* Return true if it is a repeating sequence for which the merge
+   approach gives better codegen than the default approach
+   (slide1down).
+
+   Sequence A:
+ {a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b}
+
+   nelts = 16
+   npatterns = 2
+
+   for merging a we need mask 101010
+   for merging b we need mask 010101
+
+   For each element in the npatterns, we need to build a mask in a scalar
+   register.  Mostly we need 3 instructions (aka COST = 3), which consist of
+   2 scalar instructions and 1 scalar move to the v0 register.  Finally we
+   need a vector merge to merge them.
+
+   lui a5, #imm
+   add a5, #imm
+   vmv.s.x     v0, a5
+   vmerge.vxm  v9, v9, a1, v0
+
+   So the overall (roughly) COST of Sequence A = (3 + 1) * npatterns = 8.
+   If we use slide1down, the COST = nelts = 16 > 8 (COST of merge).
+   So return true in this case as it is profitable.
+
+   Sequence B:
+ {a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h}
+
+   nelts = 16
+   npatterns = 8
+
+   COST of merge approach = (3 + 1) * npatterns = 32
+   COST of slide1down approach = nelts = 16
+   Return false in this case as it is NOT profitable in merge approach.
+*/
+bool
+rvv_builder::repeating_sequence_use_merge_profitable_p ()
+{
+  if (inner_bytes_size () > UNITS_PER_WORD)
+return false;
+
+  unsigned int nelts = full_nelts ().to_constant ();
+
+  if (!repeating_sequence_p (0, nelts, npatterns ()))
+return false;
+
+  unsigned int merge_cost = 1;
+  unsigned int build_merge_mask_cost = 3;
+  unsigned int slide1down_cost = nelts;
+
+  return (build_merge_mask_cost + merge_cost) * npatterns () < slide1down_cost;
+}
+
+/* Merge the repeating sequence into a single element and return the RT

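The cost model in the repeating_sequence_use_merge_profitable_p comment can be checked with a small scalar sketch. It assumes a plain `int` array in place of `rvv_builder`, and `merge_profitable_p` is a hypothetical name; only the constants (3 to build a mask, 1 for the merge, `nelts` for slide1down) come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of the checks in
   rvv_builder::repeating_sequence_use_merge_profitable_p; GCC itself
   works on an rtx_vector_builder, not an int array.  */

/* True if v[i] == v[i % npatterns] for every element.  */
static bool
repeating_sequence_p (const int *v, size_t nelts, size_t npatterns)
{
  assert (npatterns > 0 && npatterns <= nelts);
  for (size_t i = npatterns; i < nelts; i++)
    if (v[i] != v[i % npatterns])
      return false;
  return true;
}

/* Each pattern costs about 3 instructions to build its mask plus 1
   vmerge; slide1down costs one instruction per element.  */
static bool
merge_profitable_p (const int *v, size_t nelts, size_t npatterns)
{
  const size_t build_merge_mask_cost = 3;
  const size_t merge_cost = 1;
  const size_t slide1down_cost = nelts;

  if (!repeating_sequence_p (v, nelts, npatterns))
    return false;
  return (build_merge_mask_cost + merge_cost) * npatterns < slide1down_cost;
}
```

For Sequence A the merge cost is (3 + 1) * 2 = 8 < 16, so merging wins; for Sequence B it is (3 + 1) * 8 = 32, which is not below 16, so slide1down is kept.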
[PATCH] RISC-V: Split arguments of expand_vec_perm

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong 

Since the following patch calls expand_vec_perm with split
arguments, this patch changes the expand_vec_perm interface
accordingly.

gcc/ChangeLog:

* config/riscv/autovec.md: Split arguments.
* config/riscv/riscv-protos.h (expand_vec_perm): Ditto.
* config/riscv/riscv-v.cc (expand_vec_perm): Ditto.

---
 gcc/config/riscv/autovec.md | 3 ++-
 gcc/config/riscv/riscv-protos.h | 2 +-
 gcc/config/riscv/riscv-v.cc | 6 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5c3aad7ee44..ec038fe87cd 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -96,7 +96,8 @@
(match_operand: 3 "vector_perm_operand")]
   "TARGET_VECTOR && GET_MODE_NUNITS (mode).is_constant ()"
   {
-riscv_vector::expand_vec_perm (operands);
+riscv_vector::expand_vec_perm (operands[0], operands[1],
+  operands[2], operands[3]);
 DONE;
   }
 )
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d032f569a36..00e1b20c6c6 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -241,7 +241,7 @@ opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
 void expand_vec_init (rtx, rtx);
 void expand_vcond (rtx *);
-void expand_vec_perm (rtx *);
+void expand_vec_perm (rtx, rtx, rtx, rtx);
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum vxrm_field_enum
 {
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fa13bd94f9d..49752cd8899 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2025,12 +2025,8 @@ expand_vcond (rtx *ops)
 /* Implement vec_perm.  */
 
 void
-expand_vec_perm (rtx *operands)
+expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
 {
-  rtx target = operands[0];
-  rtx op0 = operands[1];
-  rtx op1 = operands[2];
-  rtx sel = operands[3];
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
 
-- 
2.36.3



[NFC] RISC-V: Move optimization patterns into autovec-opt.md

2023-06-04 Thread juzhe . zhong
From: Juzhe-Zhong 

Move all optimization patterns into autovec-opt.md to make them
easier to organize and maintain.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*not): Move to autovec-opt.md.
(*n): Ditto.
* config/riscv/autovec.md (*not): Ditto.
(*n): Ditto.
* config/riscv/vector.md: Ditto.

---
 gcc/config/riscv/autovec-opt.md | 92 +
 gcc/config/riscv/autovec.md | 52 ---
 gcc/config/riscv/vector.md  | 39 --
 3 files changed, 92 insertions(+), 91 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 92cdc4e9a16..f6052b50572 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -78,3 +78,95 @@
   "vwmulsu.vv\t%0,%3,%4%p1"
   [(set_attr "type" "viwmul")
(set_attr "mode" "")])
+
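+;; -------------------------------------------------------------------------
+;; ---- Integer Compare Instructions Simplification
+;; -------------------------------------------------------------------------
+;; Simplify OP(V, V) Instructions to VMCLR.m Includes:
+;; - 1.  VMSNE
+;; - 2.  VMSLT
+;; - 3.  VMSLTU
+;; - 4.  VMSGT
+;; - 5.  VMSGTU
+;; -------------------------------------------------------------------------
+;; Simplify OP(V, V) Instructions to VMSET.m Includes:
+;; - 1.  VMSEQ
+;; - 2.  VMSLE
+;; - 3.  VMSLEU
+;; - 4.  VMSGE
+;; - 5.  VMSGEU
+;; -------------------------------------------------------------------------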
+
+(define_split
+  [(set (match_operand:VB  0 "register_operand")
+   (if_then_else:VB
+ (unspec:VB
+   [(match_operand:VB 1 "vector_all_trues_mask_operand")
+(match_operand 4 "vector_length_operand")
+(match_operand 5 "const_int_operand")
+(match_operand 6 "const_int_operand")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+ (match_operand:VB 3 "vector_move_operand")
+ (match_operand:VB 2 "vector_undef_operand")))]
+  "TARGET_VECTOR"
+  [(const_int 0)]
+  {
+emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode),
+RVV_VUNDEF (mode), operands[3],
+operands[4], operands[5]));
+DONE;
+  }
+)
+
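+;; -------------------------------------------------------------------------
+;; ---- [BOOL] Binary logical operations (inverted second input)
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vmandnot.mm
+;; - vmornot.mm
+;; -------------------------------------------------------------------------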
+
+(define_insn_and_split "*not"
+  [(set (match_operand:VB 0 "register_operand"   "=vr")
+   (bitmanip_bitwise:VB
+ (not:VB (match_operand:VB 2 "register_operand" " vr"))
+ (match_operand:VB 1 "register_operand" " vr")))]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_not (, mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+DONE;
+  }
+  [(set_attr "type" "vmalu")
+   (set_attr "mode" "")])
+
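+;; -------------------------------------------------------------------------
+;; ---- [BOOL] Binary logical operations (inverted result)
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vmnand.mm
+;; - vmnor.mm
+;; - vmxnor.mm
+;; -------------------------------------------------------------------------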
+
+(define_insn_and_split "*n"
+  [(set (match_operand:VB 0 "register_operand" "=vr")
+   (not:VB
+ (any_bitwise:VB
+   (match_operand:VB 1 "register_operand" " vr")
+   (match_operand:VB 2 "register_operand" " vr"))))]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_n (, mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+DONE;
+  }
+  [(set_attr "type" "vmalu")
+   (set_attr "mode" "")])
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ec038fe87cd..9f4492db23c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -229,58 +229,6 @@
   [(set_attr "type" "vmalu")
(set_attr "mode" "")])
 
-;; -------------------------------------------------------------------------
-;; ---- [BOOL] Binary logical operations (inverted second

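The two header comments in the moved autovec-opt.md code rest on simple identities. A minimal scalar sketch, assuming one mask bit per element modelled as a 64-bit integer; all function names here are hypothetical, and RVV 1.0 spells the inverted-input instructions vmandn.mm and vmorn.mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Comparing a value with itself: the not-equal and strict forms are
   always false (so the vector form can become vmclr.m), while the equal
   and non-strict forms are always true (vmset.m).  Signed and unsigned
   variants behave the same way.  */
static bool
self_compares_fold (int32_t s, uint32_t u)
{
  bool any_true_strict = (s != s) || (s < s) || (s > s) || (u < u) || (u > u);
  bool all_true_nonstrict
    = (s == s) && (s <= s) && (s >= s) && (u <= u) && (u >= u);
  return !any_true_strict && all_true_nonstrict;
}

/* Inverted second input: (and (not b) a) and (ior (not b) a).  */
static uint64_t mask_andn (uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t mask_orn (uint64_t a, uint64_t b) { return a | ~b; }

/* Inverted result: (not (and a b)), (not (ior a b)), (not (xor a b)).  */
static uint64_t mask_nand (uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t mask_nor (uint64_t a, uint64_t b) { return ~(a | b); }
static uint64_t mask_xnor (uint64_t a, uint64_t b) { return ~(a ^ b); }
```

Compilers may warn about the self-comparisons; that is exactly the redundancy the split patterns remove at the vector level.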
[PATCH V2] VECT: Add SELECT_VL support

2023-06-04 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch addresses comments from Richard and rebases onto trunk.

This patch adds middle-end support for SELECT_VL, allowing targets
to apply target-dependent optimizations when calculating the
length.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves the same as LLVM's "get_vector_length", with
the following properties:

1. Only applies to single-rgroup control.
2. Only applies to non-SLP.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows a non-final iteration to process fewer than VF elements.
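The loop shape these properties describe can be modelled in scalar C. This is only a sketch: `select_vl_min` is a hypothetical stand-in that returns MIN (remaining, vf), the same value as the MIN_EXPR approach, and `vvaddint32_model` is a scalar model of the vvaddint32 example, not the vectorizer's output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IFN_SELECT_VL.  A real target may return
   any value up to MIN (remaining, vf).  */
static size_t
select_vl_min (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* The length returned by select_vl both decrements the loop-control IV
   and advances the data pointers, so each iteration may process a
   different number of elements.  Returns the number of iterations.  */
static size_t
vvaddint32_model (size_t n, const int *x, const int *y, int *z, size_t vf)
{
  size_t iterations = 0;

  assert (vf > 0);
  for (size_t remaining = n; remaining > 0;)
    {
      size_t len = select_vl_min (remaining, vf);
      for (size_t i = 0; i < len; i++)   /* stands in for .LEN_LOAD/.LEN_STORE */
        z[i] = x[i] + y[i];
      x += len;                          /* data reference IVs step by len */
      y += len;
      z += len;
      remaining -= len;                  /* loop control IV steps by len */
      iterations++;
    }
  return iterations;
}
```

With n = 10 and vf = 4 the lengths are 4, 4 and 2; a target that picks smaller lengths would only change how `len` is computed.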

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+  index_before_incr, nitems_step);
+ gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
+   }
+  else
+   {
+ create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
+ gimple_seq_add_stmt (header_seq,
+  gimple_build_assign (step, MIN_EXPR,
+   index_before_incr,
+   nitems_step));
+   }
   *iv_step = step;
   *compare_step = nitems_step;
-  return index_before_incr;
+  return LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? index_after_incr
+  : index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -888,7 +901,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
   gcond *cond_stmt;
-  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
 {
   gcc_assert (compare_step);
  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..68c3432c0a4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,53 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
 
+  /* If we're using the decrement IV approach in loop control, we can use the
+ output of SELECT_VL to adjust the IV of loop control and data references
+ when it satisfies the following checks:
+
+ (a) SELECT_VL is supported by the target.
+ (b) LOOP_VINFO is single-rgroup control.
+ (c) non-SLP.
+ (d) LOOP can not be unrolled.
+
+ Otherwise, we use MIN_EXPR approach.
+
+ 1. We only apply SELECT_VL on single-rgroup since:
+
+ (1). Multiple-rgroup controls N vector loads/stores would need N pointer
+ updates by variable amounts.
+ (2). SELECT_VL allows flexible length (<=VF) in each iteration.
+ (3). For decrement IV approach, we calculate the MAX length of the loop
+ and then deduce the length of each control from this MAX length.
+
+ Based on (1), (2) and (3), if we try to use SELECT_VL on
+ multiple-rgroup control, we need to generate multiple SELECT_VLs to
+ carefully adjust the length of each control.  Such an approach is very inefficient
+ and unprofitable for targets that are using a standalone instruction
+ to configure the length of each 

[PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford

This patch addresses comments from Richard and rebases onto trunk.

This patch adds middle-end support for SELECT_VL, allowing targets
to apply target-dependent optimizations when calculating the
length.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves the same as LLVM's "get_vector_length", with
the following properties:

1. Only applies to single-rgroup control.
2. Only applies to non-SLP.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows a non-final iteration to process fewer than VF elements.

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i

---
 gcc/doc/md.texi | 22 
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 -
 gcc/tree-vect-loop.cc   | 72 +
 gcc/tree-vect-stmts.cc  | 66 ++
 gcc/tree-vectorizer.h   |  6 
 7 files changed, 191 insertions(+), 9 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae7..6f6fa7d37f9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b30..b637471b76e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len = gimple_build (header_seq, IFN_SELEC

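The md.texi entry for select_vl allows a target to return less than MIN (operand1, operand2). One precedent is RVV's vsetvli: the RVV 1.0 spec lets vl be anywhere in [ceil(AVL / 2), VLMAX] when VLMAX < AVL < 2 * VLMAX. The sketch below assumes a target that always takes that lower bound; `select_vl_rvv_like` and `check_select_vl_loop` are hypothetical names, not GCC or RVV APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target select_vl following the RVV vsetvli AVL rules,
   always choosing ceil(AVL / 2) when VLMAX < AVL < 2 * VLMAX.  */
static size_t
select_vl_rvv_like (size_t avl, size_t vlmax)
{
  if (avl <= vlmax)
    return avl;
  if (avl < 2 * vlmax)
    return (avl + 1) / 2;
  return vlmax;
}

/* Walk a loop of n scalar iterations, recording each step and checking
   the md.texi contract: 0 < result <= MIN (op1, op2) and the steps add
   up to n.  Returns the number of vector iterations, or 0 on failure.  */
static size_t
check_select_vl_loop (size_t n, size_t vlmax, size_t *steps, size_t max_steps)
{
  size_t count = 0, total = 0, remaining = n;

  assert (vlmax > 0);
  while (remaining > 0)
    {
      size_t len = select_vl_rvv_like (remaining, vlmax);
      size_t bound = remaining < vlmax ? remaining : vlmax;   /* MIN (op1, op2) */

      if (len == 0 || len > bound || count == max_steps)
        return 0;
      steps[count++] = len;
      total += len;
      remaining -= len;   /* the IV has a variable step */
    }
  return total == n ? count : 0;
}
```

With n = 10 and VLMAX = 4 this target produces steps 4, 3, 3, whereas the MIN_EXPR approach produces 4, 4, 2; both satisfy the contract, which is why induction variables must tolerate a variable step.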
[PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-05 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
{
  a[i * 8 + 0] = b[i * 8 + 7] + 1;
  a[i * 8 + 1] = b[i * 8 + 7] + 2;
  a[i * 8 + 2] = b[i * 8 + 7] + 8;
  a[i * 8 + 3] = b[i * 8 + 7] + 4;
  a[i * 8 + 4] = b[i * 8 + 7] + 5;
  a[i * 8 + 5] = b[i * 8 + 7] + 6;
  a[i * 8 + 6] = b[i * 8 + 7] + 7;
  a[i * 8 + 7] = b[i * 8 + 7] + 3;
}
}

To enable VLA SLP auto-vectorization, we need to be able to handle the
following constant vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

Both vectors can be generated in the loop prologue.
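A scalar sketch of how the two constant vectors can be built, assuming RVV's little-endian element layout (element 0 in the lowest bits). Both function names are hypothetical; the first models the vid.v / vsrl.vi / vmul.vx sequence in the prologue codegen, and the second fuses the eight repeating bytes into one 64-bit super element that a single vmv.v.x at SEW = 64 can broadcast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element i of the stepped vector is (i >> 3) * 8:
   { 0 x8, 8 x8, 16 x8, ... } (NPATTERNS = 8, NELTS_PER_PATTERN = 3).  */
static void
build_stepped_index (uint16_t *v, size_t nelts)
{
  for (size_t i = 0; i < nelts; i++)
    v[i] = (uint16_t) ((i >> 3) * 8);   /* vid.v, vsrl.vi 3, vmul.vx 8 */
}

/* Pack 8 byte elements into one 64-bit word, element 0 in the lowest
   byte, so the SEW = 64 broadcast reproduces the SEW = 8 pattern.  */
static uint64_t
pack_bytes_le (const uint8_t bytes[8])
{
  uint64_t word = 0;
  for (int i = 7; i >= 0; i--)
    word = (word << 8) | bytes[i];
  return word;
}
```

Packing { 1, 2, 8, 4, 5, 6, 7, 3 } gives 0x0307060504080201, which is the value the li/addi/slli/add sequence in the prologue assembles in a3: the low word is 67633152 + 513 and the high word is 50790400 + 1541.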

After this patch, we get the following codegen:

Prologue:
...
vsetvli a7,zero,e16,m2,ta,ma
vid.v   v4
vsrl.vi v4,v4,3
li  a3,8
vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
li  t1,67633152
addi    t1,t1,513
li  a3,50790400
addi    a3,a3,1541
slli    a3,a3,32
add a3,a3,t1
vsetvli t1,zero,e64,m1,ta,ma
vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
min a3,...
vsetvli zero,a3,e8,m1,ta,ma
vle8.v  v2,0(a6)
vsetvli a7,zero,e8,m1,ta,ma
vrgatherei16.vv v1,v2,v4
vadd.vv v1,v1,v3
vsetvli zero,a3,e8,m1,ta,ma
vse8.v  v1,0(a2)
add a6,a6,a4
add a2,a2,a4
mv  a3,a5
add a5,a5,t1
bgtu    a3,a4,.L3
...

Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8,
since "vrgatherei16.vv" uses 16-bit indices and so covers a larger range than
"vrgather.vv", whose 8-bit indices can address at most element 255.
Epilogue:
lbu a5,799(a1)
addiw   a4,a5,1
sb  a4,792(a0)
addiw   a4,a5,2
sb  a4,793(a0)
addiw   a4,a5,8
sb  a4,794(a0)
addiw   a4,a5,4
sb  a4,795(a0)
addiw   a4,a5,5
sb  a4,796(a0)
addiw   a4,a5,6
sb  a4,797(a0)
addiw   a4,a5,7
sb  a4,798(a0)
addiw   a5,a5,3
sb  a5,799(a0)
ret

One remaining piece of work is "Epilogue auto-vectorization", which needs
VLS mode support. I will add VLS mode support for epilogue
auto-vectorization in the future.
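Regarding the vrgatherei16.vv note above: vrgather.vv reads its indices at the data SEW, so at SEW = 8 it can only name elements 0..255, and VLA code cannot assume VLEN is small enough for that. A sketch of the check, using the RVV formula VLMAX = LMUL * VLEN / SEW (integer LMUL only); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VLMAX = LMUL * VLEN / SEW elements per register group (LMUL >= 1).  */
static uint32_t
vlmax (uint32_t vlen_bits, uint32_t sew_bits, uint32_t lmul)
{
  assert (sew_bits > 0 && vlen_bits % sew_bits == 0);
  return lmul * (vlen_bits / sew_bits);
}

/* True if 8-bit indices (0..255) can address every element of a
   SEW = 8 register group; otherwise 16-bit indices are needed.  */
static bool
sew8_indices_suffice (uint32_t vlen_bits, uint32_t lmul)
{
  return vlmax (vlen_bits, 8, lmul) <= 256;
}
```

For example, VLEN = 4096 already gives 512 elements at e8/m1, so a compiler that must work for any VLEN falls back to 16-bit indices.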

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc 
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/

[PATCH] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
  int n)
{
  for (int i = 0; i < n; i++)
dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
...

After this patch:
...
vwmaccsu.vv
...

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_fma): New pattern.
(*single_mult_plus): Ditto.
(*double_mult_plus): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 162 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/rvv/autovec/widen/widen-8.c |  27 +++
 .../riscv/rvv/autovec/widen/widen-9.c |  23 +++
 .../rvv/autovec/widen/widen-complicate-5.c|  32 
 .../rvv/autovec/widen/widen-complicate-6.c|  30 
 .../riscv/rvv/autovec/widen/widen_run-8.c |  38 
 .../riscv/rvv/autovec/widen/widen_run-9.c |  35 
 8 files changed, 348 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f6052b50572..b18783b22eb 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -170,3 +170,165 @@
   }
   [(set_attr "type" "vmalu")
(set_attr "mode" "")])
+
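+;; =========================================================================
+;; == Widening Ternary arithmetic
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] VWMACC
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vwmacc.vv
+;; - vwmaccu.vv
+;; -------------------------------------------------------------------------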
+
+;; Combine ext + ext + fma ===> widen fma.
+;; In most circumstances, the loop vectorizer generates the following IR:
+;;   vect__8.64_40 = (vector([4,4]) int) vect__7.63_41;
+;;   vect__11.68_35 = (vector([4,4]) int) vect__10.67_36;
+;;   vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45);
+(define_insn_and_split "*_fma"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (any_extend:VWEXTI
+ (match_operand: 3 "register_operand")))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode),
+  riscv_vector::RVV_WIDEN_TERNOP, ops);
+DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "")])
+
+;; Enhance the combine optimizations.
+(define_insn_and_split "*single_mult_plus"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (match_operand:VWEXTI 3 "register_operand"))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vf2 (, mode);
+rtx tmp = gen_reg_rtx (mode);
+rtx ext_ops[] = {tmp, operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
+
+rtx dst = expand_ternary_op (mode, fma_optab, tmp,

[PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This version addresses Robin's comments on the V1 patch.

This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
  int n)
{
  for (int i = 0; i < n; i++)
dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
...

After this patch:
...
vwmaccsu.vv
...

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_fma): New pattern.
(*single_mult_plus): Ditto.
(*double_mult_plus): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 162 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/rvv/autovec/widen/widen-8.c |  27 +++
 .../riscv/rvv/autovec/widen/widen-9.c |  23 +++
 .../rvv/autovec/widen/widen-complicate-5.c|  32 
 .../rvv/autovec/widen/widen-complicate-6.c|  30 
 .../riscv/rvv/autovec/widen/widen_run-8.c |  38 
 .../riscv/rvv/autovec/widen/widen_run-9.c |  35 
 8 files changed, 348 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f6052b50572..1c36b5f56be 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -170,3 +170,165 @@
   }
   [(set_attr "type" "vmalu")
(set_attr "mode" "")])
+
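+;; =========================================================================
+;; == Widening Ternary arithmetic
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] VWMACC
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vwmacc.vv
+;; - vwmaccu.vv
+;; -------------------------------------------------------------------------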
+
+;; Combine ext + ext + fma ===> widen fma.
+;; In most circumstances, the loop vectorizer generates the following IR:
+;;   vect__8.64_40 = (vector([4,4]) int) vect__7.63_41;
+;;   vect__11.68_35 = (vector([4,4]) int) vect__10.67_36;
+;;   vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45);
+(define_insn_and_split "*_fma"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (any_extend:VWEXTI
+ (match_operand: 3 "register_operand")))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode),
+  riscv_vector::RVV_WIDEN_TERNOP, ops);
+DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "")])
+
+;; This helps to match ext + fma to enhance the combine optimizations.
+(define_insn_and_split "*single_mult_plus"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (match_operand:VWEXTI 3 "register_operand"))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vf2 (, mode);
+rtx tmp = gen_reg_rtx (mode);
+rtx ext_ops[] = {tmp, operands[2]};
+riscv_vector::emit_vlmax_insn (icode,

[PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables basic VLA SLP auto-vectorization.
Consider this following case:
#include <stdint.h>

void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}

To enable VLA SLP auto-vectorization, we need to be able to handle the
following const vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

Both vectors can be generated in the loop prologue.
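For illustration only (this is not code from the patch), the first constant is a stepped series whose element i is (i >> 3) * 8: the shift by 3 groups elements into runs of NPATTERNS = 8, and the multiply applies the step of 8 between runs. The plain C sketch below models the vid.v / vsrl.vi / vmul.vx sequence in the prologue codegen that follows. The length parameter n is an assumption standing in for the run-time vector length, which is not known at compile time for VLA vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NPATTERNS of the first constant vector; log2 (8) == 3 is the shift
   amount used by vsrl.vi.  */
enum { NPATTERNS = 8, LOG2_NPATTERNS = 3 };

/* Scalar model of the prologue sequence:
     vid.v   v4          ; v4[i] = i
     vsrl.vi v4,v4,3     ; v4[i] = i >> 3
     vmul.vx v4,v4,a3    ; v4[i] = (i >> 3) * 8   (a3 = 8)
   N is a stand-in for the run-time vector length.  */
static void
build_stepped_series (uint16_t *v4, size_t n)
{
  assert (v4 != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    v4[i] = (uint16_t) ((i >> LOG2_NPATTERNS) * NPATTERNS);
}
```

On a 24-element buffer this produces eight 0s, eight 8s and eight 16s. uint16_t mirrors the e16 element width selected by the vsetvli that precedes vid.v.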

After this patch, we get the following codegen:

Prologue:
...
vsetvli a7,zero,e16,m2,ta,ma
vid.v   v4
vsrl.vi v4,v4,3
li  a3,8
vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
li  t1,67633152
addi    t1,t1,513
li  a3,50790400
addi    a3,a3,1541
slli    a3,a3,32
add a3,a3,t1
vsetvli t1,zero,e64,m1,ta,ma
vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
min a3,...
vsetvli zero,a3,e8,m1,ta,ma
vle8.v  v2,0(a6)
vsetvli a7,zero,e8,m1,ta,ma
vrgatherei16.vv v1,v2,v4
vadd.vv v1,v1,v3
vsetvli zero,a3,e8,m1,ta,ma
vse8.v  v1,0(a2)
add a6,a6,a4
add a2,a2,a4
mv  a3,a5
add a5,a5,t1
bgtu    a3,a4,.L3
...

Note: for SEW = 8 we need to use "vrgatherei16.vv" instead of "vrgather.vv",
since "vrgatherei16.vv" can cover a larger index range than "vrgather.vv"
(whose maximum element index is 255 at SEW = 8).
Epilogue:
lbu a5,799(a1)
addiw   a4,a5,1
sb  a4,792(a0)
addiw   a4,a5,2
sb  a4,793(a0)
addiw   a4,a5,8
sb  a4,794(a0)
addiw   a4,a5,4
sb  a4,795(a0)
addiw   a4,a5,5
sb  a4,796(a0)
addiw   a4,a5,6
sb  a4,797(a0)
addiw   a4,a5,7
sb  a4,798(a0)
addiw   a5,a5,3
sb  a5,799(a0)
ret
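The vrgatherei16 note above can be made concrete with the RVV formula VLMAX = LMUL * VLEN / SEW. vrgather.vv takes its indices at the data SEW, so at SEW = 8 it can only address elements 0..255. A long enough VLEN (the spec allows up to 65536 bits) gives more elements than that. The 16-bit index vector in the prologue uses e16,m2, which has the same VLMAX as the e8,m1 data. The helper names below are hypothetical, and this is a sketch of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* VLMAX = LMUL * VLEN / SEW.  LMUL is given as num/den so fractional
   LMUL (e.g. mf2 = 1/2) is also representable.  */
static uint32_t
vlmax (uint32_t vlen, uint32_t sew, uint32_t lmul_num, uint32_t lmul_den)
{
  assert (sew != 0 && lmul_den != 0);
  return (uint32_t) ((uint64_t) vlen * lmul_num
                     / ((uint64_t) sew * lmul_den));
}

/* With SEW = 8, vrgather.vv indices are 8 bits wide (max 255).  If the
   data vector can hold more than 256 elements, some lanes would be
   unreachable, so the 16-bit index form is required.  */
static int
needs_ei16_for_e8 (uint32_t vlen, uint32_t lmul_num, uint32_t lmul_den)
{
  return vlmax (vlen, 8, lmul_num, lmul_den) > 256;
}
```

For example, with e8,m1 8-bit indices suffice up to VLEN = 2048 (256 elements) but not at VLEN = 4096 (512 elements). Because a VLA loop must work for any VLEN, always using vrgatherei16.vv for SEW = 8 avoids the problem.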

One last thing remains to be done: "epilogue auto-vectorization", which needs
VLS mode support.
I will add VLS mode support for epilogue auto-vectorization in the future.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc 
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/

[PATCH] RISC-V: Enable SELECT_VL for RVV

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/autovec.md (select_vl): New pattern.
* config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): Export globally.
* config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Adapt test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 19 +
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   |  2 +-
 .../riscv/rvv/autovec/partial/select_vl-1.c   | 28 +++
 .../riscv/rvv/autovec/ternop/ternop-2.c   |  2 +-
 .../riscv/rvv/autovec/ternop/ternop-5.c   |  2 +-
 6 files changed, 51 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9f4492db23c..c298f069714 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -18,6 +18,25 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+;; =
+;; == SELECT_VL
+;; =
+
+(define_expand "select_vl"
+  [(match_operand:P 0 "register_operand")
+   (match_operand:P 1 "vector_length_operand")
+   (match_operand:P 2 "")]
+  "TARGET_VECTOR"
+{
+  poly_int64 nunits = rtx_to_poly_int64 (operands[2]);
+  /* We arbitrarily pick QImode as the inner scalar mode to get the vector mode,
+     since vsetvl only demands the ratio.  We let the VSETVL pass optimize it.  */
+  scalar_int_mode mode = QImode;
+  machine_mode rvv_mode = riscv_vector::get_vector_mode (mode, nunits).require ();
+  emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (rvv_mode, operands[0], operands[1]));
+  DONE;
+})
+
 ;; =
 ;; == Loads/Stores
 ;; =
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 00e1b20c6c6..d770e5e826e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -232,6 +232,7 @@ enum vlen_enum
   RVV_64 = 64,
   RVV_65536 = 65536
 };
+rtx gen_no_side_effects_vsetvl_rtx (machine_mode, rtx, rtx);
 bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 49752cd8899..83277fc2c05 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1280,7 +1280,7 @@ force_vector_length_operand (rtx vl)
   return vl;
 }
 
-static rtx
+rtx
 gen_no_side_effects_vsetvl_rtx (machine_mode vmode, rtx vl, rtx avl)
 {
   unsigned int sew = get_sew (vmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
new file mode 100644
index 000..b8e0ca0f1f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns -fdump-tree-optimized-details" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)                                                        \
+  __attribute__ ((noipa)) void select_vl_##TYPE (TYPE *__restrict dst,        \
+                                                 TYPE *__restrict a, int n)   \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = a[i];                                                           \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int8_t)                                                           \
+  TEST_TYPE (uint8_t)                                                          \
+  TEST_TYPE (int16_t)                                                          \
+  TEST_TYPE (uint16_t)                                                         \
+  TEST_TYPE (int32_t)                                                          \
+  TEST_TYPE (uint32_t)                                                         \
+  TEST_TYPE (int64_t)                                                          \
+  TEST_TYPE (uint64_t)                                                         \
+  TEST_TYPE (float)   

[PATCH V3] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

Fixed according to Robin's comments on the V1 patch.

This patch adds a combine optimization for the following case:
#include <stdint.h>

__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
          int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
...

After this patch:
...
vwmaccsu.vv
...

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_fma): New pattern.
(*single_mult_plus): Ditto.
(*double_mult_plus): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 162 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/rvv/autovec/widen/widen-8.c |  27 +++
 .../riscv/rvv/autovec/widen/widen-9.c |  23 +++
 .../rvv/autovec/widen/widen-complicate-5.c|  32 
 .../rvv/autovec/widen/widen-complicate-6.c|  30 
 .../riscv/rvv/autovec/widen/widen_run-8.c |  38 
 .../riscv/rvv/autovec/widen/widen_run-9.c |  35 
 8 files changed, 348 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f6052b50572..1c36b5f56be 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -170,3 +170,165 @@
   }
   [(set_attr "type" "vmalu")
(set_attr "mode" "")])
+
+;; =
+;; == Widening Ternary arithmetic
+;; =
+
+;; -
+;;  [INT] VWMACC
+;; -
+;; Includes:
+;; - vwmacc.vv
+;; - vwmaccu.vv
+;; -
+
+;; Combine ext + ext + fma ===> widen fma.
+;; In most circumstances, the loop vectorizer will generate the following IR:
+;;   vect__8.64_40 = (vector([4,4]) int) vect__7.63_41;
+;;   vect__11.68_35 = (vector([4,4]) int) vect__10.67_36;
+;;   vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45);
+(define_insn_and_split "*_fma"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (any_extend:VWEXTI
+ (match_operand: 3 "register_operand")))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode),
+  riscv_vector::RVV_WIDEN_TERNOP, ops);
+DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "")])
+
+;; This helps to match ext + fma to enhance the combine optimizations.
+(define_insn_and_split "*single_mult_plus"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (match_operand:VWEXTI 3 "register_operand"))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vf2 (, mode);
+rtx tmp = gen_reg_rtx (mode);
+rtx ext_ops[] = {tmp, operands[2]};
+riscv_vector::emit_vlmax_insn (icode,

[PATCH V4] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

Fixed according to Robin's comments on the V1 patch.

This patch adds a combine optimization for the following case:
#include <stdint.h>

__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
          int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
...

After this patch:
...
vwmaccsu.vv
...

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_fma): New pattern.
(*single_mult_plus): Ditto.
(*double_mult_plus): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 160 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/rvv/autovec/widen/widen-8.c |  27 +++
 .../riscv/rvv/autovec/widen/widen-9.c |  23 +++
 .../rvv/autovec/widen/widen-complicate-5.c|  32 
 .../rvv/autovec/widen/widen-complicate-6.c|  30 
 .../riscv/rvv/autovec/widen/widen_run-8.c |  38 +
 .../riscv/rvv/autovec/widen/widen_run-9.c |  35 
 8 files changed, 346 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f6052b50572..7bb93eed220 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -170,3 +170,163 @@
   }
   [(set_attr "type" "vmalu")
(set_attr "mode" "")])
+
+;; =
+;; == Widening Ternary arithmetic
+;; =
+
+;; -
+;;  [INT] VWMACC
+;; -
+;; Includes:
+;; - vwmacc.vv
+;; - vwmaccu.vv
+;; -
+
+;; Combine ext + ext + fma ===> widen fma.
+;; In most circumstances, the loop vectorizer will generate the following IR:
+;;   vect__8.64_40 = (vector([4,4]) int) vect__7.63_41;
+;;   vect__11.68_35 = (vector([4,4]) int) vect__10.67_36;
+;;   vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45);
+(define_insn_and_split "*_fma"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (any_extend:VWEXTI
+ (match_operand: 3 "register_operand")))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode),
+  riscv_vector::RVV_WIDEN_TERNOP, operands);
+DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "")])
+
+;; This helps to match ext + fma.
+(define_insn_and_split "*single_mult_plus"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+   (plus:VWEXTI
+ (mult:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand"))
+   (match_operand:VWEXTI 3 "register_operand"))
+ (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vf2 (, mode);
+rtx tmp = gen_reg_rtx (mode);
+rtx ext_ops[] = {tmp, operands[2]};
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
+
+rtx dst = expand_ternary_op (mode, fma_optab, tmp, operands[3],

[PATCH V2] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
#include <stdint.h>

void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}

To enable VLA SLP auto-vectorization, we need to be able to handle the
following const vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

Both vectors can be generated in the loop prologue.

After this patch, we get the following codegen:

Prologue:
...
vsetvli a7,zero,e16,m2,ta,ma
vid.v   v4
vsrl.vi v4,v4,3
li  a3,8
vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
li  t1,67633152
addi    t1,t1,513
li  a3,50790400
addi    a3,a3,1541
slli    a3,a3,32
add a3,a3,t1
vsetvli t1,zero,e64,m1,ta,ma
vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
min a3,...
vsetvli zero,a3,e8,m1,ta,ma
vle8.v  v2,0(a6)
vsetvli a7,zero,e8,m1,ta,ma
vrgatherei16.vv v1,v2,v4
vadd.vv v1,v1,v3
vsetvli zero,a3,e8,m1,ta,ma
vse8.v  v1,0(a2)
add a6,a6,a4
add a2,a2,a4
mv  a3,a5
add a5,a5,t1
bgtu    a3,a4,.L3
...

Note: for SEW = 8 we need to use "vrgatherei16.vv" instead of "vrgather.vv",
since "vrgatherei16.vv" can cover a larger index range than "vrgather.vv"
(whose maximum element index is 255 at SEW = 8).
Epilogue:
lbu a5,799(a1)
addiw   a4,a5,1
sb  a4,792(a0)
addiw   a4,a5,2
sb  a4,793(a0)
addiw   a4,a5,8
sb  a4,794(a0)
addiw   a4,a5,4
sb  a4,795(a0)
addiw   a4,a5,5
sb  a4,796(a0)
addiw   a4,a5,6
sb  a4,797(a0)
addiw   a4,a5,7
sb  a4,798(a0)
addiw   a5,a5,3
sb  a5,799(a0)
ret

One last thing remains to be done: "epilogue auto-vectorization", which needs
VLS mode support.
I will add VLS mode support for epilogue auto-vectorization in the future.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc 
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/

[PATCH V4] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard and Richi and rebases onto trunk.

This patch adds middle-end support for SELECT_VL, allowing targets
to apply target-dependent optimizations to the length
calculation.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves the same as LLVM's "get_vector_length", with
the following properties:

1. Only applies to single-rgroup loops.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows a non-final iteration to process fewer than VF elements.
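The properties above can be illustrated with a plain C model of the vvaddint32 loop. Here select_vl_model is a hypothetical stand-in for the target's SELECT_VL, and its exact policy is an assumption. It splits the last two chunks evenly when VF < remaining < 2 * VF, a choice the RVV specification permits (ceil (AVL / 2) <= vl <= VLMAX in that range). This demonstrates property 5. Because each chunk length varies, both the loop-control IV and the data pointers step by vl rather than by a constant VF (properties 3 and 4).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SELECT_VL model: never exceeds MIN (remaining, vf), but
   may return less in a non-final iteration.  */
static size_t
select_vl_model (size_t remaining, size_t vf)
{
  assert (vf > 0);
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2; /* ceil (remaining / 2) */
  return vf;
}

/* Strip-mined z[i] = x[i] + y[i].  Returns the number of vector-loop
   iterations executed.  */
static size_t
vvaddint32_model (size_t n, const int *x, const int *y, int *z, size_t vf)
{
  size_t iters = 0;
  while (n > 0)
    {
      size_t vl = select_vl_model (n, vf);
      for (size_t k = 0; k < vl; k++) /* one "vector" add of vl lanes */
        z[k] = x[k] + y[k];
      x += vl; /* data reference IVs step by vl */
      y += vl;
      z += vl;
      n -= vl; /* decrementing loop control IV steps by vl */
      iters++;
    }
  return iters;
}
```

With n = 10 and VF = 4, this model processes chunks of 4, 3 and 3 elements. The second iteration is non-final yet handles fewer than VF elements, which a plain MIN_EXPR-based length would never do.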

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
Co-authored-by: Richard Biener 

---
 gcc/doc/md.texi | 22 ++
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 ++
 gcc/tree-vect-loop.cc   | 72 ++
 gcc/tree-vect-stmts.cc  | 87 -
 gcc/tree-vectorizer.h   |  6 +++
 7 files changed, 202 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 3ac9d82aace..5d638de6d06 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6c064ff4993..f31b69c5d85 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after

[PATCH V5] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard and Richi and rebases onto trunk.

This patch adds middle-end support for SELECT_VL, allowing targets
to apply target-dependent optimizations to the length
calculation.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves the same as LLVM's "get_vector_length", with
the following properties:

1. Only applies to single-rgroup loops.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows a non-final iteration to process fewer than VF elements.

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
Co-authored-by: Richard Biener 

---
 gcc/doc/md.texi | 22 ++
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 ++
 gcc/tree-vect-loop.cc   | 72 +++
 gcc/tree-vect-stmts.cc  | 86 -
 gcc/tree-vectorizer.h   |  6 +++
 7 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 3ac9d82aace..5d638de6d06 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6c064ff4993..f31b69c5d85 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_afte

[PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard and Richi and rebases onto trunk.

This patch adds middle-end support for SELECT_VL, allowing targets
to apply target-dependent optimizations to the length
calculation.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves the same as LLVM's "get_vector_length", with
the following properties:

1. Only applies to single-rgroup loops.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows a non-final iteration to process fewer than VF elements.

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+  index_before_incr, nitems_step);
+ gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
+   }
+  else
+   {
+ create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
+ gimple_seq_add_stmt (header_seq,
+  gimple_build_assign (step, MIN_EXPR,
+   index_before_incr,
+   nitems_step));
+   }
   *iv_step = step;
   *compare_step = nitems_step;
-  return index_before_incr;
+  return LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? index_after_incr
+  : index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -888,7 +901,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
   gcond *cond_stmt;
-  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
 {
   gcc_assert (compare_step);
   tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..ace9e759f5b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
     using_decrementing_iv_p (false),
+    using_select_vl_p (false),
     epil_using_partial_vectors_p (false),
     partial_load_store_bias (0),
     peeling_for_gaps (false),
@@ -2737,6 +2738,77 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
 
+  /* If a loop uses length controls and has a decrementing loop control IV,
+     we will normally pass that IV through a MIN_EXPR to calculate the
+     basis for the length controls.  E.g. in a loop that processes one
+     element per scalar iteration, the number of elements would be
+     MIN_EXPR <N, VF>, where N is the number of scalar iterations left.
+
+     This MIN_EXPR approach allows us to use pointer IVs with an invariant
+     step, since only the final iteration of the vector loop can have
+     inactive lanes.
+
+     However, some targets have a dedicated instruction for calculating the
+     preferred length, given the total number of elements that still need to
+     be processed.  This is encapsulated in the SELECT_VL internal function.
+
+     If the target supports SELECT_VL, we can use it instead of MIN_EXPR
+     to determine the basis for the length controls.  However, unlike the
+     MIN_EXPR calculation, the SELECT_VL calculation can decide to make
+     lanes inactive in any iteration of the vector loop, not just the

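The comment above contrasts MIN_EXPR-based and SELECT_VL-based length controls. As a rough illustration only, here is a scalar C model of both schemes for the vvaddint32 loop, assuming a hypothetical vectorization factor `VF` of 4; `len_min_expr`, `len_select_vl` and this `vvaddint32` signature are made up for the sketch and are not GCC internals or RVV intrinsics. The SELECT_VL model follows the RVV 1.0 rule that `vsetvli` may return any vl between ceil(AVL/2) and VLMAX when VLMAX < AVL < 2*VLMAX, and it picks ceil(AVL/2), so a non-final iteration can process fewer than VF elements and the pointers have to advance by the returned length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectorization factor (plays the role of VLMAX).  */
#define VF 4

/* MIN_EXPR scheme: len = MIN (remaining, VF).  Only the final iteration
   can be shorter than VF, so the pointer IVs could use an invariant step.  */
static size_t len_min_expr (size_t remaining)
{
  return remaining < VF ? remaining : VF;
}

/* SELECT_VL scheme, modeled on the RVV vsetvli rules for AVL:
     AVL <= VLMAX            -> vl = AVL
     VLMAX < AVL < 2*VLMAX   -> ceil (AVL / 2) <= vl <= VLMAX
     AVL >= 2*VLMAX          -> vl = VLMAX
   This model picks ceil (AVL / 2) in the middle case.  */
static size_t len_select_vl (size_t remaining)
{
  if (remaining <= VF)
    return remaining;
  if (remaining < 2 * VF)
    return (remaining + 1) / 2;
  return VF;
}

/* Length-controlled loop for z[i] = x[i] + y[i].  The inner loop stands in
   for .LEN_LOAD / vector add / .LEN_STORE on LEN active lanes.  Returns the
   number of vector iterations executed.  */
static size_t vvaddint32 (size_t n, const int32_t *x, const int32_t *y,
                          int32_t *z, size_t (*select_len) (size_t))
{
  size_t iters = 0;
  for (size_t remaining = n; remaining > 0; iters++)
    {
      size_t len = select_len (remaining);
      assert (len >= 1 && len <= remaining);  /* every iteration must make progress */
      for (size_t i = 0; i < len; i++)
        z[i] = x[i] + y[i];
      /* Advance by the length actually processed, not by VF.  */
      x += len;
      y += len;
      z += len;
      remaining -= len;  /* decrementing IV */
    }
  return iters;
}
```

For n = 6, both schemes take two iterations: the MIN_EXPR model processes 4 then 2 elements, while the SELECT_VL model processes 3 then 3.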
[PATCH] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch reworks Phase 5 and Phase 6 of the VSETVL PASS, since they are quite
messy and cause some bugs discovered by my downstream auto-vectorization test
generator.

Before this patch:

Phase 5 (cleanup_insns) removes the AVL operand dependency from each RVV
instruction. E.g. vadd.vv (use a5) ==> vadd.vv (use const_int 0) after Phase 5.
Since "a5" is used by the "vsetvl" instructions, once the correct "vsetvl"
instructions are inserted, each RVV instruction no longer needs the AVL operand
"a5". Removing this operand dependency helps the following scheduling PASS.

Phase 6 (propagate_avl) does the following 2 things:
1. Local and global optimization of user vsetvl instructions.
   E.g.
  vsetvli a2, a2, e8, mf8   ==> Change it into vsetvli a2, a2, e32, mf2
  vsetvli zero,a2, e32, mf2  ==> eliminate
2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not
   used by any instruction.
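The fusion in item 1 is only legal because both vtype settings have the same SEW/LMUL ratio. A minimal C sketch of that check, assuming the RVV relation VLMAX = VLEN * LMUL / SEW and an example VLEN of 128; `struct vtype`, `vlmax` and `can_fuse_vtype` are hypothetical helpers, not code from the VSETVL pass. For e8,mf8 and e32,mf2 the ratio is 64 in both cases, so VLMAX is identical and, since vl is determined by AVL and VLMAX, rewriting the first vsetvli to e32,mf2 leaves a2 unchanged. The second vsetvli then has AVL = a2 <= VLMAX and would only re-set vtype, which is why it can be eliminated.

```c
#include <assert.h>
#include <stdbool.h>

/* SEW in bits and LMUL as the fraction lmul_num / lmul_den
   (e.g. mf8 = 1/8, mf2 = 1/2, m1 = 1/1, m4 = 4/1).  */
struct vtype
{
  unsigned sew;
  unsigned lmul_num;
  unsigned lmul_den;
};

/* VLMAX = VLEN * LMUL / SEW.  */
static unsigned vlmax (unsigned vlen, struct vtype t)
{
  assert (t.sew != 0 && t.lmul_num != 0 && t.lmul_den != 0);  /* reject malformed settings */
  return vlen * t.lmul_num / (t.lmul_den * t.sew);
}

/* SEW_a / LMUL_a == SEW_b / LMUL_b, cross-multiplied to stay in integers.
   Equal ratios mean equal VLMAX for every VLEN, so a vsetvli can switch
   from A to B without changing the vl it computes for a given AVL.  */
static bool can_fuse_vtype (struct vtype a, struct vtype b)
{
  return a.sew * a.lmul_den * b.lmul_num == b.sew * b.lmul_den * a.lmul_num;
}
```

With VLEN = 128, `vlmax` gives 2 for both e8,mf8 and e32,mf2 but 4 for e32,m1, so `can_fuse_vtype` accepts the first pair and rejects e8,mf8 with e32,m1.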
Since Phases 1 ~ 4 insert "vsetvli" instructions based on LCM, which changes the
CFG, I re-initialize the RTL_SSA framework (which is more expensive than just
using DF) for Phase 6 and optimize user vsetvli instructions based on the new
RTL_SSA.

There are 2 issues in Phase 5 and Phase 6:
1. local_eliminate_vsetvl_insn, introduced by @kito, performs local user vsetvl
   optimizations better than Phase 6 does, and it doesn't need to re-initialize
   the RTL_SSA framework. So the local user vsetvli optimization in Phase 6 is
   redundant and should be removed.
2. A bug discovered by my downstream auto-vectorization test generator (I can't
   put the test in this patch since we are missing the autovec patterns needed
   to reproduce the issue with upstream GCC directly, but I will add it back
   after I support the necessary autovec patterns). The bug is caused by
   re-initializing the RTL_SSA framework. The issue is as follows:
   
Before Phase 6:
   ...
   insn1: vsetvli a3, 17 <== generated by SELECT_VL auto-vec pattern.
   slli a4,a3,3
   ...
   insn2: vsetvli zero, a3, ...
   load (use const_int 0; before Phase 5 it used a3, but that use is removed in Phase 5)
   ...

In Phase 6, we iterate to insn2 and then get the def of "a3", which is insn1.
insn2 is the vsetvli instruction inserted in Phase 4, which is not included in
the RTL_SSA framework even though we re-initialize it (I haven't looked into
why, and I don't think we need to now).
Because of this, the def_info of "a3" obtained from insn2 reports
"set->single_nondebug_insn_use ()" as true. Obviously this information is not
correct, since insn1 has at least 2 uses:
1) slli a4,a3,3  2) insn2: vsetvli zero, a3, ...
As a result, the execution test produced by my downstream test generator failed.
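A toy C model can show why a stale use count is dangerous. `struct insn` and `count_uses` are hypothetical and do not reflect the RTL_SSA API; the sketch simply scans a straight-line instruction list for uses of the register defined by insn1. Scanning a view that lacks insn2, as the re-initialized RTL_SSA framework effectively did, finds a single use of a3, while scanning the real instruction stream finds two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy instruction: at most one defined register and two used registers
   (NULL when absent).  TEXT is only for readability.  */
struct insn
{
  const char *text;
  const char *def;
  const char *use[2];
};

/* Count the uses of the register defined by INSNS[DEF_IDX] that are
   reached by that definition, scanning forward in a straight-line list.  */
static size_t count_uses (const struct insn *insns, size_t n, size_t def_idx)
{
  const char *reg = insns[def_idx].def;
  assert (reg != NULL);  /* DEF_IDX must name a defining instruction */
  size_t uses = 0;
  for (size_t i = def_idx + 1; i < n; i++)
    {
      for (size_t j = 0; j < 2; j++)
        if (insns[i].use[j] && strcmp (insns[i].use[j], reg) == 0)
          uses++;
      /* A redefinition ends the live range (its own uses were counted).  */
      if (insns[i].def && strcmp (insns[i].def, reg) == 0)
        break;
    }
  return uses;
}
```

Encoding insn1, slli, insn2 and the load in order, `count_uses` returns 2 for insn1; removing insn2 from the list makes it return 1, which is the misleading single-use answer.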

Conclusion of RTL_SSA framework:
Before this patch, we initialize RTL_SSA twice: once at the beginning of the
VSETVL PASS, which is absolutely correct, and once more after Phase 4 (LCM),
which has incorrect information that causes bugs.

Besides, initializing RTL_SSA a second time seems wasteful, since we only need it
for a small optimization.

Based on all the circumstances described above, I rework and reorganize Phase 5
and Phase 6 as follows:
1. Phase 5 is called ssa_post_optimization, which performs optimizations based on
   the RTL_SSA information (RTL_SSA is initialized at the beginning of the VSETVL
   PASS; there is no need to re-initialize it). This phase includes 3 optimizations:
   1). local_eliminate_vsetvl_insn, which we already have (no change).
   2). global_eliminate_vsetvl_insn ---> a new optimization split from the original
       Phase 6, with a more powerful and reliable implementation.
  E.g.
  #include "riscv_vector.h"

  void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
    size_t avl;
    if (m > 100)
      avl = __riscv_vsetvl_e16mf4(vl << 4);
    else {
      avl = __riscv_vsetvl_e8mf8(vl);
    }
    for (size_t i = 0; i < m; i++) {
      vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
      __riscv_vse8_v_i8mf8(out + i, v0, avl);
    }
  }
  Before this patch, this example missed the global user vsetvl optimization:
  f:
  ...
  vsetvli a2,a2,e16,mf4,ta,mu
  .L3:
  li  a5,0
  vsetvli zero,a2,e8,mf8,ta,ma
  .L5:
  ...
  vle8.v  v1,0(a6)
  addi    a5,a5,1
  vse8.v  v1,0(a4)
  bgtu    a3,a5,.L5
  .L10:
  ret
  .L2:
  beq a3,zero,.L10
  vsetvli a2,a2,e8,mf8,ta,mu
  j   .L3
  With this patch:
  f:
  ...
  vsetvli zer
