[PATCH] RISC-V: Remove @ of vec_series
gcc/ChangeLog:

	* config/riscv/autovec.md (@vec_series<mode>): Remove @.
	(vec_series<mode>): Ditto.
	* config/riscv/riscv-v.cc (expand_const_vector): Ditto.
	(shuffle_decompress_patterns): Ditto.

---
 gcc/config/riscv/autovec.md | 2 +-
 gcc/config/riscv/riscv-v.cc | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d6cf376ebca..056f2c352f6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -336,7 +336,7 @@
 ;; - vadd.vx/vadd.vi
 ;; -
 
-(define_expand "@vec_series<mode>"
+(define_expand "vec_series<mode>"
   [(match_operand:V_VLSI 0 "register_operand")
    (match_operand:<VEL> 1 "reg_or_int_operand")
    (match_operand:<VEL> 2 "reg_or_int_operand")]
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 29e138e1da2..23633a2a74d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1014,7 +1014,7 @@ expand_const_vector (rtx target, rtx src)
   rtx base, step;
   if (const_vec_series_p (src, &base, &step))
     {
-      emit_insn (gen_vec_series (mode, target, base, step));
+      expand_vec_series (target, base, step);
       return;
     }
@@ -1171,7 +1171,7 @@ expand_const_vector (rtx target, rtx src)
       rtx step = CONST_VECTOR_ELT (src, 2);
       /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
       rtx tmp = gen_reg_rtx (mode);
-      emit_insn (gen_vec_series (mode, tmp, base1, step));
+      expand_vec_series (tmp, base1, step);
       /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
       scalar_mode elem_mode = GET_MODE_INNER (mode);
       if (!rtx_equal_p (base0, const0_rtx))
@@ -3020,7 +3020,7 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   /* Generate { 0, 1, } mask.  */
   rtx vid = gen_reg_rtx (sel_mode);
   rtx vid_repeat = gen_reg_rtx (sel_mode);
-  emit_insn (gen_vec_series (sel_mode, vid, const0_rtx, const1_rtx));
+  expand_vec_series (vid, const0_rtx, const1_rtx);
   rtx and_ops[] = {vid_repeat, vid, const1_rtx};
   emit_vlmax_insn (code_for_pred_scalar (AND, sel_mode), BINARY_OP, and_ops);
   rtx const_vec = gen_const_vector_dup (sel_mode, 1);
--
2.36.3
[PATCH] RISC-V: Enable more tests of "vect" for RVV
This patch enables almost full coverage of the vectorization tests for RVV, except for the following tests (not enabled yet):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure yet whether we need to enable them:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn

After this patch, we will have the following additional FAILs:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-
[PATCH] TEST: Fix XPASS of TSVC testsuites for RVV
Fix the following XPASS FAILs of TSVC for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
	* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Ditto.
	* gcc.dg/vect/tsvc/vect-tsvc-
[PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV
This patch fixes the following dump FAILs:

FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR is the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_LEN_SUB"

since we have a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
	* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
	* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
	* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  6 --
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 12
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 12
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 12
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..7bddc122037 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,7 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && { vect_fully_masked && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && { vect_fully_masked && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_LEN_ADD} "vect" { target { vect_double_cond_arith && { vect_fully_masked && riscv_v } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_LEN_SUB} "optimized" { target { vect_double_cond_arith && { vect_fully_masked && riscv_v } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe6
[PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
This patch fixes the following dump FAILs:

FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR is the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_LEN_SUB"

since we have a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
	* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
	* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
	* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe642a0..5bb75206a68 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -52,8 +52,8 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "opt
[PATCH] RISC-V: Support movmisalign of RVV VLA modes
Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520

I was thinking that RVV doesn't allow misaligned accesses at all, so I removed that pattern. However, after deeper investigation, re-reading the RVV ISA, and experimenting on SPIKE, I realized I was wrong.

RVV ISA reference:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred successfully or an address misaligned exception is raised on that element."

So the RVV ISA clearly does allow misaligned vector loads/stores. This was confirmed by experiments on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader

We can see that SPIKE passes the previously *FAILED* execution tests when --misaligned is specified.

So, to honor the RVV ISA spec, we should add the movmisalign pattern back based on these investigations, since it improves multiple vectorization tests and fixes dump FAILs.
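As a rough illustration (a hypothetical example of my own, not code from the patch or the testsuite): with a movmisalign<mode> expander the vectorizer can keep using ordinary unit-stride vector loads/stores even when it cannot prove that the accesses are aligned to the full vector width.

/* Hypothetical sketch: dst and src are only known to be element-aligned,
   not vector-aligned.  Without misalignment support on a strict-alignment
   target the vectorizer has to peel or version for alignment (or give up);
   with movmisalign it can emit plain vle32.v/vse32.v accesses directly.  */
void
add_one (int *restrict dst, const int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] + 1;
}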
This patch fixes the following dump FAILs:

FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Consider the following case:

struct s { unsigned i : 31; char a : 4; };

#define N 32
#define ELT0 {0x7FFFUL, 0}
#define ELT1 {0x7FFFUL, 1}
#define ELT2 {0x7FFFUL, 2}
#define ELT3 {0x7FFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f (struct s *ptr, unsigned n)
{
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += ptr[i].a;
  return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
	mv	a4,a0
	beq	a1,zero,.L9
	addiw	a5,a1,-1
	li	a3,14
	vsetivli	zero,16,e64,m8,ta,ma
	bleu	a5,a3,.L3
	andi	a5,a0,127
	bne	a5,zero,.L3
	srliw	a3,a1,4
	slli	a3,a3,7
	li	a0,15
	slli	a0,a0,32
	add	a3,a3,a4
	mv	a5,a4
	li	a2,32
	vmv.v.x	v16,a0
	vsetvli	zero,zero,e32,m4,ta,ma
	vmv.v.i	v4,0
.L4:
	vsetvli	zero,zero,e64,m8,ta,ma
	vle64.v	v8,0(a5)
	addi	a5,a5,128
	vand.vv	v8,v8,v16
	vsetvli	zero,zero,e32,m4,ta,ma
	vnsrl.wx	v8,v8,a2
	vadd.vv	v4,v4,v8
	bne	a5,a3,.L4
	li	a3,0
	andi	a5,a1,15
	vmv.s.x	v1,a3
	andi	a3,a1,-16
	vredsum.vs	v1,v4,v1
	vmv.x.s	a0,v1
	mv	a2,a0
	beq	a5,zero,.L15
	sll
[PATCH] TEST: Fix dump FAIL for RVV (RISC-V Vector)
As shown here: https://godbolt.org/z/3K9oK7fx3

ARM SVE shows FOLD_EXTRACT_LAST 2 times, whereas RVV shows it 4 times. This is because RISC-V doesn't enable vec_pack_trunc, so the conversion and fold_extract_last fail in the first analysis attempt and only succeed in the second one. That is why RVV shows "FOLD_EXTRACT_LAST" 4 times.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-cond-reduc-4.c: Add vect_pack_trunc variant.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
index 8820075b1dc..8ea8c538713 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
@@ -42,6 +42,7 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && vect_pack_trunc } } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target { { vect_fold_extract_last } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
--
2.36.3
[PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV
RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this case well. So, adjust the dump check for RVV.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.

---
 gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
index a61f1a9a221..829a4d41601 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
@@ -35,6 +35,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_unpack } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_unpack || riscv_v } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { { ! vect_unpack } && { ! riscv_v } } } } } */
--
2.36.3
[PATCH] TEST: Fix dump FAIL for RVV
gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-cond-1.c: Fix dump FAIL for RVV.
	* gcc.dg/vect/pr57705.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/pr57705.c       | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
index c8024429e9c..e1ebc23505f 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
@@ -47,6 +47,6 @@ int main ()
 }
 
 /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */
-/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { vect_element_align && { ! amdgcn-*-* } } } } } */
-/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target amdgcn-*-* } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { vect_element_align && { { ! amdgcn-*-* } && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target { amdgcn-*-* || riscv_v } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr57705.c b/gcc/testsuite/gcc.dg/vect/pr57705.c
index 39c32946d74..2dacea0a7a7 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57705.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57705.c
@@ -64,5 +64,5 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 3 "vect" { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" { target { ! vect_pack_trunc } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 3 "vect" { target { vect_pack_trunc || riscv_v } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" { target { { ! vect_pack_trunc } && { ! riscv_v } } } } } */
--
2.36.3
[PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV
Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds on outer loop vectorization. Fix the following XPASS FAILs:

XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
	* gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
	* gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
	* gcc.dg/vect/no-scevccp-outer-21.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
index c7c2fa8a504..12179949e00 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
@@ -59,4 +59,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
index ba904a6c03e..86554a98169 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
index 5cd4049d08c..624b54accf4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
@@ -49,4 +49,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
index 72e53c2bfb0..b30a5d78819 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
@@ -59,4 +59,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! { vect_pack_trunc } } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */
--
2.36.3
[PATCH V2] RISC-V: Support movmisalign of RVV VLA modes
This patch fixes the following FAILs in the regression:

FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520

I was thinking that RVV doesn't allow misaligned accesses at all, so I removed that pattern. However, after deeper investigation, re-reading the RVV ISA, and experimenting on SPIKE, I realized I was wrong.

RVV ISA reference:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred successfully or an address misaligned exception is raised on that element."

So the RVV ISA clearly does allow misaligned vector loads/stores. This was confirmed by experiments on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$ ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader

We can see that SPIKE passes the previously *FAILED* execution tests when --misaligned is specified.

So, to honor the RVV ISA spec, we should add the movmisalign pattern back based on these investigations, since it improves multiple vectorization tests and fixes dump FAILs.

This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support the misalign pattern for VLA modes (by default it is enabled).
Consider the following case:

struct s { unsigned i : 31; char a : 4; };

#define N 32
#define ELT0 {0x7FFFUL, 0}
#define ELT1 {0x7FFFUL, 1}
#define ELT2 {0x7FFFUL, 2}
#define ELT3 {0x7FFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f (struct s *ptr, unsigned n)
{
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += ptr[i].a;
  return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
	mv	a4,a0
	beq	a1,zero,.L9
	addiw	a5,a1,-1
	li	a3,14
	vsetivli	zero,16,e64,m8,ta,ma
	bleu	a5,a3,.L3
	andi	a5,a0,127
	bne	a5,zero,.L3
	srliw	a3,a1,4
	slli	a3,a3,7
	li	a0,15
	slli	a0,a0,32
	add	a3,a3,a4
	mv	a5,a4
	li	a2,32
	vmv.v.x	v16,a0
	vsetvli	zero,zero,e32,m4,ta,ma
	vmv.v.i	v4,0
.L4:
	vsetvli	zero,zero,e64,m8,ta,ma
	vle64.v	v8,0(a5)
	addi	a5,a5,128
[PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV
Reference: https://godbolt.org/z/G9jzf5Grh

RVV is able to vectorize this case using SLP. However, with -fno-vect-cost-model, RVV vectorizes it with vec_load_lanes with stride 6.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.

---
 gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
index 7c7acd5bab6..96751faae7f 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
@@ -18,4 +18,4 @@ foo (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_strided6 } } } } */
--
2.36.3
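For reference, "load_lanes with stride 6" refers to a group-of-six access pattern; a minimal sketch of that shape (a made-up example, not the actual fast-math-slp-38.c source) looks like this. Each iteration consumes six consecutive elements, which RVV can load with a segment load (vlseg6e*) instead of SLP with permutes.

/* Hypothetical sketch of a stride-6 grouped access that maps to
   vlseg6e* segment loads ("load_lanes with stride 6") on RVV.  */
float a[6 * 64], b[64];

void
foo6 (void)
{
  for (int i = 0; i < 64; i++)
    b[i] = a[6 * i] + a[6 * i + 1] + a[6 * i + 2]
	   + a[6 * i + 3] + a[6 * i + 4] + a[6 * i + 5];
}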
[PATCH] RISC-V Regression test: Fix FAIL of pr45752.c for RVV
RVV uses load_lanes with stride = 5 to vectorize this case with -fno-vect-cost-model, instead of SLP.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr45752.c: Adapt dump check for target supports
	load_lanes with stride = 5.

---
 gcc/testsuite/gcc.dg/vect/pr45752.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr45752.c b/gcc/testsuite/gcc.dg/vect/pr45752.c
index e8b364f29eb..3c87d9b04fc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr45752.c
+++ b/gcc/testsuite/gcc.dg/vect/pr45752.c
@@ -159,4 +159,4 @@ int main (int argc, const char* argv[])
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "gaps requires scalar epilogue loop" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { ! { vect_load_lanes && vect_strided5 } } } } } */
--
2.36.3
[PATCH] RISC-V Regression tests: Fix FAIL of pr97832* for RVV
These cases are vectorized by vec_load_lanes with stride = 8 instead of SLP with -fno-vect-cost-model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports
	load_lanes with stride = 8.
	* gcc.dg/vect/pr97832-3.c: Ditto.
	* gcc.dg/vect/pr97832-4.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/pr97832-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/pr97832-3.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/pr97832-4.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-2.c b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
index 4f0578120ee..7d8d2691432 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
@@ -25,5 +25,5 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
   }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-3.c b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
index ad1225ddbaa..c0603e1432e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
@@ -46,5 +46,5 @@ void foo(double* restrict y, const double* restrict x0, const double* restrict x
   }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-4.c b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
index 74ae27ff873..c03442816a4 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
@@ -24,5 +24,5 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
   }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
--
2.36.3
[PATCH] RISC-V Regression test: Fix FAIL of slp-12a.c
This case is vectorized by stride-8 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.

---
 gcc/testsuite/gcc.dg/vect/slp-12a.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index f0dda55acae..973de6ada21 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -76,5 +76,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
--
2.36.3
[PATCH] RISC-V Regression test: Adapt SLP tests like ARM SVE
Like ARM SVE, RVV is vectorizing these 2 cases in the same way.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
	* gcc.dg/vect/slp-perm-10.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/slp-23.c      | 2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-10.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-23.c b/gcc/testsuite/gcc.dg/vect/slp-23.c
index d32ee5ba73b..8836acf0330 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-23.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-23.c
@@ -114,5 +114,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_perm } } } } */
 /* SLP fails for the second loop with variable-length SVE because the
    load size is greater than the minimum vector size.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm xfail { { aarch64_sve || riscv_v } && vect_variable_length } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
index 2cce30c2444..03de4c61b50 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
@@ -53,4 +53,4 @@ int main ()
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
 /* SLP fails for variable-length SVE because the load size is greater
    than the minimum vector size.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm xfail { { aarch64_sve || riscv_v } && vect_variable_length } } } } */
--
2.36.3
[PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV
RVV vectorizes it with stride-5 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.

---
 gcc/testsuite/gcc.dg/vect/slp-perm-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
index 107968f1f7c..f4bda39c837 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
@@ -115,4 +115,4 @@ int main (int argc, const char* argv[])
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "gaps requires scalar epilogue loop" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! { vect_load_lanes && vect_strided5 } } } } } */
--
2.36.3
[PATCH] RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV
RVV vectorizes this case with stride-8 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.

---
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
index 15f5c259e98..e2fe01bb13d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
@@ -60,6 +60,6 @@ int main (void)
 /* For variable-length SVE, the number of scalar statements in the
    reduction exceeds the number of elements in a 128-bit granule.  */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_multiple_sizes } xfail { vect_no_int_min_max || { aarch64_sve && vect_variable_length } } } } } */
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_multiple_sizes } } } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_multiple_sizes && { ! { vect_load_lanes && vect_strided8 } } } } } } */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
--
2.36.3
[PATCH] TEST: Add vectorization check
These cases don't check SLP on targets that support load_lanes. Add a vectorization check for those situations.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr97832-2.c: Add vectorization check.
	* gcc.dg/vect/pr97832-3.c: Ditto.
	* gcc.dg/vect/pr97832-4.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/pr97832-2.c | 1 +
 gcc/testsuite/gcc.dg/vect/pr97832-3.c | 1 +
 gcc/testsuite/gcc.dg/vect/pr97832-4.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-2.c b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
index 7d8d2691432..60e8e8516fc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
@@ -27,3 +27,4 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
 /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-3.c b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
index c0603e1432e..2dc76e5b565 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-3.c
@@ -48,3 +48,4 @@ void foo(double* restrict y, const double* restrict x0, const double* restrict x
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
 /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-4.c b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
index c03442816a4..7e74c9313d5 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97832-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
@@ -26,3 +26,4 @@ void foo1x1(double* restrict y, const double* restrict x, int clen)
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
 /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target { ! { vect_load_lanes && vect_strided8 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
--
2.36.3
[PATCH] RISC-V: Add available vector size for RVV
For RVV, we enable VLS modes from M1 to M8 according to TARGET_MIN_VLEN. For example, when TARGET_MIN_VLEN is 128 bits, we enable 128/256/512/1024-bit VLS modes.

This patch fixes the following FAILs:

FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add 256/512/1024.

---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index af52c38433d..dc366d35a0a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8881,7 +8881,7 @@ proc available_vector_sizes { } {
	lappend result 4096 2048 1024 512 256 128 64 32 16 8 4 2
     } elseif { [istarget riscv*-*-*] } {
	if { [check_effective_target_riscv_v] } {
-	    lappend result 0 32 64 128
+	    lappend result 0 32 64 128 256 512 1024
	}
	lappend result 128
     } else {
--
2.36.3
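For context on the sizes above (my own illustration, not part of the patch): with VLEN = 128 the VLS sizes correspond to LMUL = 1/2/4/8, i.e. 128, 256, 512 and 1024 bits. A 1024-bit VLS operation can be written with GCC's generic vector extension like this:

/* Hypothetical example: 128 bytes = 1024 bits, i.e. VLEN=128 with LMUL=8.  */
typedef int v32si __attribute__ ((vector_size (128)));

v32si
add_v32si (v32si a, v32si b)
{
  return a + b;
}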
[PATCH] RISC-V Regression: Fix dump check of bb-slp-68.c
Like GCN, RVV also has 64-byte vectors (512 bits), which causes a FAIL in this test. It's more reasonable to use "vect512" instead of AMDGCN.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-68.c: Use vect512.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
index e7573a14933..2dd3d8ee90c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
@@ -20,4 +20,4 @@ void foo ()
 
 /* We want to have the store group split into 4, 2, 4 when using 32byte vectors.
    Unfortunately it does not work when 64-byte vectors are available.  */
-/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* } } } */
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail vect512 } } } */
--
2.36.3
[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV
Here is a reference comparing the dump IR between ARM SVE and RVV:
https://godbolt.org/z/zqess8Gss

We can see RVV has one more dump line, "optimized: basic block part vectorized using 128 byte vectors", since RVV has 1024-bit vectors. The codegen is reasonably good.

However, GCN also has 1024-bit vectors, so this patch may cause this case to FAIL on the GCN port. Hi GCN folks, could you check this patch on the GCN port for me?

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
	* lib/target-supports.exp: Ditto.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c | 3 ++-
 gcc/testsuite/lib/target-supports.exp      | 6 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 8df35327e7a..9ef1330b47c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -67,7 +67,8 @@ int main()
 
 /* We should also be able to use 2-lane SLP to initialize the real and
    imaginary components in the first loop of main.  */
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" { target {! { vect1024 } } } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 11 "slp1" { target { { vect1024 } } } } } */
 /* We should see the s->phase[dir] operand splatted and no other operand built
    from scalars.  See PR97334.  */
 /* { dg-final { scan-tree-dump "Using a splat" "slp1" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index dc366d35a0a..95c489d7f76 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8903,6 +8903,12 @@ proc check_effective_target_vect_variable_length { } {
     return [expr { [lindex [available_vector_sizes] 0] == 0 }]
 }
 
+# Return 1 if the target supports vectors of 1024 bits.
+
+proc check_effective_target_vect1024 { } {
+    return [expr { [lsearch -exact [available_vector_sizes] 1024] >= 0 }]
+}
+
 # Return 1 if the target supports vectors of 512 bits.
 
 proc check_effective_target_vect512 { } {
--
2.36.3
[PATCH] RISC-V Regression: Make match patterns more accurate
This patch fixes the following 2 FAILs in the RVV regression, since the checks are not accurate. It's inspired by Robin's previous patch:
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/no-scevccp-outer-7.c: Adjust regex pattern.
	* gcc.dg/vect/no-scevccp-vect-iv-3.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
index 543ee98b5a4..058d1d2db2d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
@@ -77,4 +77,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
index 7049e4936b9..6f2b2210b11 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-3.c
@@ -30,4 +30,4 @@ unsigned int main1 ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
--
2.36.3
[PATCH] RISC-V Regression: Fix FAIL of predcom-2.c
Like GCN, add -fno-tree-vectorize.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/predcom-2.c: Add riscv.

---
 gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
index f19edd4cd74..681ff7c696b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details-blocks -fno-tree-pre" } */
-/* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* } } */
+/* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* riscv*-*-* } } */
 
 void abort (void);
--
2.36.3
[Committed] RISC-V: Add testcase for SCCVN optimization[PR111751]
Add testcase for PR111751 which has been fixed: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html PR target/111751 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr111751.c: New test. --- .../gcc.target/riscv/rvv/autovec/pr111751.c | 55 +++ 1 file changed, 55 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c new file mode 100644 index 000..0f1e8a7d567 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111751.c @@ -0,0 +1,55 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#define N 16 + +int foo1 () +{ + int i; + char ia[N]; + char ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45}; + char ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45}; + + /* Not vectorizable, multiplication */ + for (i = 0; i < N; i++) +{ + ia[i] = ib[i] * ic[i]; +} + + /* check results: */ + for (i = 0; i < N; i++) +{ + if (ia[i] != (char) (ib[i] * ic[i])) +__builtin_abort (); +} + + return 0; +} + +typedef int half_word; + +int foo2 () +{ + int i; + half_word ia[N]; + half_word ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45}; + half_word ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45}; + + /* Not worthwhile, only 2 parts per int */ + for (i = 0; i < N; i++) +{ + ia[i] = ib[i] + ic[i]; +} + + /* check results: */ + for (i = 0; i < N; i++) +{ + if (ia[i] != ib[i] + ic[i]) +__builtin_abort (); +} + + return 0; +} + +/* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,0\s+ret} 2 } } */ +/* { dg-final { scan-assembler-not {vset} } } */ -- 2.36.3
[Committed] RISC-V: Add VLS BOOL mode vcond_mask[PR111751]
Richard's patch resolving PR111751: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65 causes these ICEs in the RISC-V regression: FAIL: gcc.dg/torture/pr53144.c -O2 (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328) FAIL: gcc.dg/torture/pr53144.c -O2 (test for excess errors) FAIL: gcc.dg/torture/pr53144.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328) FAIL: gcc.dg/torture/pr53144.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/torture/pr53144.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328) FAIL: gcc.dg/torture/pr53144.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: gcc.dg/torture/pr53144.c -O3 -g (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328) FAIL: gcc.dg/torture/pr53144.c -O3 -g (test for excess errors) A vcond_mask pattern for the VLS BOOL modes is needed to fix this regression ICE. More details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111751 Tested and committed. gcc/ChangeLog: * config/riscv/autovec.md: Add VLS BOOL modes. --- gcc/config/riscv/autovec.md | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 53e9d34eea1..41bff3a318f 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -575,10 +575,10 @@ ;; - (define_expand "vcond_mask_" - [(match_operand:VB 0 "register_operand") - (match_operand:VB 1 "register_operand") - (match_operand:VB 2 "register_operand") - (match_operand:VB 3 "register_operand")] + [(match_operand:VB_VLS 0 "register_operand") + (match_operand:VB_VLS 1 "register_operand") + (match_operand:VB_VLS 2 "register_operand") + (match_operand:VB_VLS 3 "register_operand")] "TARGET_VECTOR" { /* mask1 = operands[3] & operands[1]. */ -- 2.36.3
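For reference, here is a rough sketch of the kind of source that can end up as a VEC_COND_EXPR whose operands are mask vectors, which gimple-isel lowers through the vcond_mask optab; with a fixed trip count the vectorizer can pick VLS modes. This is a hypothetical reduced example (not pr53144.c itself) and the function name is illustrative only:

/* Selecting between two comparison results can yield a select on mask
   vectors after vectorization.  */
void
select_masks (int *restrict out, int *restrict a, int *restrict b,
              int *restrict c, int *restrict d, int *restrict p)
{
  for (int i = 0; i < 16; i++)
    out[i] = p[i] ? (a[i] < b[i]) : (c[i] < d[i]);
}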
[PATCH] RISC-V Regression: Fix FAIL of pr65947-8.c for RVV
This test is testing the fold_extract_last pattern, so it is more reasonable to use vect_fold_extract_last instead of spelling out the individual targets. This is the vect_fold_extract_last property: proc check_effective_target_vect_fold_extract_last { } { return [expr { [check_effective_target_aarch64_sve] || [istarget amdgcn*-*-*] || [check_effective_target_riscv_v] }] } which covers ARM SVE, GCN and RVV. It matches exactly what we want and is easier to maintain. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr65947-8.c: Use vect_fold_extract_last. --- gcc/testsuite/gcc.dg/vect/pr65947-8.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c b/gcc/testsuite/gcc.dg/vect/pr65947-8.c index d0426792e35..9ced4dbb69f 100644 --- a/gcc/testsuite/gcc.dg/vect/pr65947-8.c +++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c @@ -41,6 +41,6 @@ main (void) return 0; } -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { amdgcn*-*-* || aarch64_sve } } } } } */ -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { amdgcn*-*-* || aarch64_sve } } } } */ -/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { target { ! { amdgcn*-*-* || aarch64_sve } } } } } */ +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_fold_extract_last } } } } } */ +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_fold_extract_last } } } } */ +/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { target { ! { vect_fold_extract_last } } } } } */ -- 2.36.3
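To illustrate, here is a minimal sketch (not the pr65947-8.c testcase itself; the function name is made up) of the loop shape that fold_extract_last targets can vectorize: a condition reduction whose result is the last value assigned under the condition.

/* The result is a[i] for the last iteration where the condition held;
   targets with fold_extract_last support implement this with
   .FOLD_EXTRACT_LAST instead of giving up on the reduction.  */
int
last_under_condition (int *a, int *b, int n)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    if (b[i] > 0)
      last = a[i];
  return last;
}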
[PATCH] RISC-V Regression: Fix FAIL of vect-multitypes-16.c for RVV
As Richard suggested: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html Add vect_ext_char_longlong to fix FAIL for RVV. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV. * lib/target-supports.exp: Add vect_ext_char_longlong property. --- gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c | 4 ++-- gcc/testsuite/lib/target-supports.exp | 9 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c index a61f1a9a221..fd17ad7437e 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c @@ -35,6 +35,6 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_unpack } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_unpack } || { vect_variable_length && vect_ext_char_longlong } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { { ! vect_unpack } && {! { vect_variable_length && vect_ext_char_longlong } } } } } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 95c489d7f76..b454b07359a 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4215,6 +4215,15 @@ proc check_effective_target_vect_floatuint_cvt { } { && [check_effective_target_riscv_v]) }}] } +# Return 1 if the target supports vector integer char -> long long extend optab +# + +proc check_effective_target_vect_ext_char_longlong { } { +return [check_cached_effective_target_indexed vect_ext_char_longlong { + expr { ([istarget riscv*-*-*] + && [check_effective_target_riscv_v]) }}] +} + # Return 1 if peeling for alignment might be profitable on the target # -- 2.36.3
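For illustration, a minimal sketch (not vect-multitypes-16.c itself; the function name is made up) of the access pattern the new effective target describes: extending char elements all the way to long long inside a vectorized loop.

/* Needs a vector char -> long long extension, either as a chain of
   unpack operations or as a native widening extend.  */
void
widen_char_to_ll (long long *restrict dst, signed char *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i];
}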
[PATCH] RISC-V Regression: Make pattern match more accurate of vect-live-2.c
Like previous patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/ gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-live-2.c: Make pattern match more accurate. --- gcc/testsuite/gcc.dg/vect/vect-live-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-2.c index dae36e9ed67..0a49c96d4e0 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c +++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c @@ -58,4 +58,4 @@ main (void) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */ -- 2.36.3
[PATCH] RISC-V: Remove XFAIL of ssa-dom-cse-2.c
Confirm RISC-V is able to CSE this case no matter whether we enable RVV or not. Remove XFAIL, to fix: XPASS: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;" gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove riscv. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c index a879d305971..5c89e3f8698 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c @@ -27,4 +27,4 @@ foo () but the loop reads only one element at a time, and DOM cannot resolve these. The same happens on powerpc depending on the SIMD support available. */ -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* mmix-knuth-mmixware } || { { { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } || aarch64_sve } || { arm*-*-* && { ! arm_neon } } } } } } } */ +/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* mmix-knuth-mmixware } || { { { lp64 && { powerpc*-*-* sparc*-*-* } } || aarch64_sve } || { arm*-*-* && { ! arm_neon } } } } } } } */ -- 2.36.3
[PATCH] RISC-V: Enable full coverage vect tests
I have analyzed all existing FAILs. Except these following FAILs need to be addressed: FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/slp-reduc-7.c execution test FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_(LEN_)?SUB" FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_(LEN_)?SUB" All other FAILs are dumple fail can be ignored (Confirm ARM SVE also has such FAILs and didn't fix them on either tests or implementation). Now, It's time to enable full coverage vect tests including vec_unpack, vec_pack, vec_interleave, ... etc. To see what we are still missing: Before this patch: === gcc Summary === # of expected passes182839 # of unexpected failures79 # of unexpected successes 11 # of expected failures 1275 # of unresolved testcases 4 # of unsupported tests 4223 After this patch: === gcc Summary === # of expected passes183411 # of unexpected failures93 # of unexpected successes 7 # of expected failures 1285 # of unresolved testcases 4 # of unsupported tests 4157 There is an important issue increased that I have noticed after this patch: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" It has a related PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721 I am gonna fix this first in the middle-end after commit this patch. Ok for trunk ? gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add RVV. --- gcc/testsuite/lib/target-supports.exp | 45 --- 1 file changed, 33 insertions(+), 12 deletions(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index b454b07359a..8037dbcee53 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -7876,7 +7876,9 @@ proc check_effective_target_vect_sdot_qi { } { || [istarget aarch64*-*-*] || [istarget arm*-*-*] || ([istarget mips*-*-*] -&& [et-is-effective-target mips_msa]) }}] +&& [et-is-effective-target mips_msa]) +|| ([istarget riscv*-*-*] +&& [check_effective_target_riscv_v]) }}] } # Return 1 if the target plus current options supports a vector @@ -7891,7 +7893,9 @@ proc check_effective_target_vect_udot_qi { } { || [istarget arm*-*-*] || [istarget ia64-*-*] || ([istarget mips*-*-*] -&& [et-is-effective-target mips_msa]) }}] +&& [et-is-effective-target mips_msa]) +|| ([istarget riscv*-*-*] +&& [check_effective_target_riscv_v]) }}] } # Return 1 if the target plus current options supports a vector @@ -7918,7 +7922,9 @@ proc check_effective_target_vect_sdot_hi { } { || [istarget ia64-*-*] || [istarget i?86-*-*] || [istarget x86_64-*-*] || ([istarget mips*-*-*] -&& [et-is-effective-target mips_msa]) }}] +&& [et-is-effective-target mips_msa]) +|| ([istarget riscv*-*-*] +&& [check_effective_target_riscv_v]) }}] } # Return 1 if the target plus current options supports a vector @@ -7930,7 +7936,9 @@ proc check_effective_target_vect_udot_hi { } { return [check_cached_effective_target_indexed vect_udot_hi { expr { ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) || ([istarget mips*-*-*] -&& [et-is-effective-target mips_msa]) }}] +&& [et-is-effective-target mips_msa]) +|| ([istarget riscv*-*-*] +&& 
[check_effective_target_riscv_v]) }}] } # Return 1 if the target plus current options supports a vector @@ -7945,7 +7953,9 @@ proc check_effective_target_vect_usad_char { } { || ([istarget aarch64*-*-*] && ![check_effective_target_aarch64_sve]) || ([istarget powerpc*-*-*] - && [check_p9vector_hw_available])}}] + && [check_p9vector_hw_available]) + || ([istarget riscv*-*-*] + && [check_effective_target_riscv_v]) }}] } # Return 1 if the target plus current options supports both signed @@ -7971,8 +7981,10 @@ proc check_effective_target_vect_mulhrs_hi {} { # by power-of-2 operations on vectors of 4-byte integers. proc check_effective_target_vect_sdiv_pow2_si {} { -return [expr { [istarget aarch64*-*-*] - && [check_effective_targ
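For reference, a minimal sketch of the classic pattern behind the vect_sdot_qi / vect_udot_qi effective targets touched above (a hypothetical example, not one of the testsuite files): narrow elements are multiplied and accumulated into a wider sum.

/* Widening signed dot-product reduction: char * char accumulated into int.  */
int
sdot_qi (signed char *restrict a, signed char *restrict b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}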
[PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter
I suddenly I made a mistake that was lucky un-exposed. https://godbolt.org/z/c3jzrh7or GCC is using 32 bit index offset: vsll.vi v1,v1,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v1,(a1),v1 This is wrong since v1 may overflow 32bit after vsll.vi. After this patch: vsext.vf2 v8,v4 vsll.vi v8,v8,2 vluxei64.v v8,(a1),v8 Same as Clang. Regression passed. Ok for trunk ? gcc/ChangeLog: * config/riscv/autovec.md: Fix offset bug. * config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function. * config/riscv/riscv-v.cc (expand_gather_scatter): Fix offset bug. (gather_scatter_valid_offset_p): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test. --- gcc/config/riscv/autovec.md | 28 +-- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 16 +-- .../autovec/gather-scatter/offset_extend-1.c | 14 ++ 4 files changed, 42 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 41bff3a318f..07607bff71e 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -59,7 +59,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -74,7 +74,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -89,7 +89,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -104,7 +104,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -119,7 +119,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -134,7 +134,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -153,7 +153,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -172,7 +172,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ 
-187,7 +187,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -202,7 +202,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -217,7 +217,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -232,7 +232,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_
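To make the failure mode concrete, here is a sketch of the affected access pattern (a hypothetical example, not the new offset_extend-1.c test; the function name is made up): the 32-bit index is scaled by the element size to form a byte offset, so the offset can exceed 32 bits even when the index itself fits, and must therefore be widened to 64 bits before the shift.

/* idx[i] fits in 32 bits, but the byte offset idx[i] * sizeof (int)
   may not, so the offset vector has to be sign-extended first.  */
void
gather_int (int *restrict dst, int *restrict src, int *restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[idx[i]];
}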
[PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter
I suddenly discovered I made a mistake that was lucky un-exposed. https://godbolt.org/z/c3jzrh7or GCC is using 32 bit index offset: vsll.vi v1,v1,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v1,(a1),v1 This is wrong since v1 may overflow 32bit after vsll.vi. After this patch: vsext.vf2 v8,v4 vsll.vi v8,v8,2 vluxei64.v v8,(a1),v8 Same as Clang. Regression passed. Ok for trunk ? gcc/ChangeLog: * config/riscv/autovec.md: Fix index bug. * config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function. * config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug. (gather_scatter_valid_offset_mode_p): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test. --- gcc/config/riscv/autovec.md | 28 +-- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 13 +++-- .../autovec/gather-scatter/offset_extend-1.c | 14 ++ 4 files changed, 39 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 41bff3a318f..a346ad8ec1a 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -59,7 +59,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -74,7 +74,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -89,7 +89,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -104,7 +104,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -119,7 +119,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -134,7 +134,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -153,7 +153,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -172,7 +172,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { 
riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -187,7 +187,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -202,7 +202,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -217,7 +217,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -232,7 +232,7 @@ (match_operand: 5 "vector_mask_operand") (match_o
[PATCH V3] RISC-V: Fix incorrect index(offset) of gather/scatter
I suddenly discovered I made a mistake that was lucky un-exposed. https://godbolt.org/z/c3jzrh7or GCC is using 32 bit index offset: vsll.vi v1,v1,2 vsetvli zero,a5,e32,m1,ta,ma vluxei32.v v1,(a1),v1 This is wrong since v1 may overflow 32bit after vsll.vi. After this patch: vsext.vf2 v8,v4 vsll.vi v8,v8,2 vluxei64.v v8,(a1),v8 Same as Clang. Regression passed. Ok for trunk ? gcc/ChangeLog: * config/riscv/autovec.md: Fix index bug. * config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function. * config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug. (gather_scatter_valid_offset_mode_p): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test. --- gcc/config/riscv/autovec.md | 28 +-- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 13 +++-- .../autovec/gather-scatter/offset_extend-1.c | 14 ++ 4 files changed, 39 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 41bff3a318f..a346ad8ec1a 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -59,7 +59,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -74,7 +74,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -89,7 +89,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -104,7 +104,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -119,7 +119,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -134,7 +134,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -153,7 +153,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -172,7 +172,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { 
riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -187,7 +187,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -202,7 +202,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -217,7 +217,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_mode_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -232,7 +232,7 @@ (match_operand: 5 "vector_mask_operand") (
[PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same situation as GATHER_LOAD (no conditional mask). So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask. gcc/ChangeLog: * tree-vect-slp.cc (vect_get_operand_map): (vect_build_slp_tree_1): (vect_build_slp_tree_2): * tree-vect-stmts.cc (vectorizable_load): --- gcc/tree-vect-slp.cc | 18 -- gcc/tree-vect-stmts.cc | 4 ++-- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index fa098f9ff4e..712c04ec278 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0) case IFN_MASK_GATHER_LOAD: return arg1_arg4_map; + case IFN_MASK_LEN_GATHER_LOAD: + /* In tree-vect-patterns.cc, we will have these 2 situations: + + - Unconditional gather load transforms + into MASK_LEN_GATHER_LOAD with dummy mask which is -1. + + - Conditional gather load transforms + into MASK_LEN_GATHER_LOAD with real conditional mask.*/ + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map + : nullptr; + case IFN_MASK_STORE: return arg3_arg2_map; @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, if (cfn == CFN_MASK_LOAD || cfn == CFN_GATHER_LOAD - || cfn == CFN_MASK_GATHER_LOAD) + || cfn == CFN_MASK_GATHER_LOAD + || cfn == CFN_MASK_LEN_GATHER_LOAD) ldst_p = true; else if (cfn == CFN_MASK_STORE) { @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)) && rhs_code != CFN_GATHER_LOAD && rhs_code != CFN_MASK_GATHER_LOAD + && rhs_code != CFN_MASK_LEN_GATHER_LOAD /* Not grouped loads are handled as externals for BB vectorization. For loop vectorization we can handle splats the same we handle single element interleaving. */ @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, if (gcall *stmt = dyn_cast (stmt_info->stmt)) gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD) || gimple_call_internal_p (stmt, IFN_GATHER_LOAD) - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)); + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD) + || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD)); else { *max_nunits = this_max_nunits; diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index cd7c1090d88..263acf5d3cd 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo, return false; mask_index = internal_fn_mask_index (ifn); - if (mask_index >= 0 && slp_node) + if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0) mask_index = vect_slp_child_index_for_operand (call, mask_index); - if (mask_index >= 0 + if (mask_index >= 0 && internal_fn_len_index (ifn) < 0 && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index, &mask, NULL, &mask_dt, &mask_vectype)) return false; -- 2.36.3
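For reference, this is the shape of the unconditional, two-lane gather that the change lets SLP handle (the same loop used as an example in the later revisions of this patch): with no real condition, the recognized scalar IR is MASK_LEN_GATHER_LOAD with a dummy all-ones mask.

void
slp_gather (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}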
[PATCH V2] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. To naturally reuse the current flow of GATHER_LOAD/MASK_GATHER_LOAD. I adjust MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE pattern in tree-vect-patterns.cc Here is adjustment in tree-vect-patterns.cc: 1. For un-conditional gather load/scatter store: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) ---> MASK_LEN_GATHER_LOAD (base, offset, scale, zero) Note that we remove the dummy mask (-1) of MASK_LEN_GATHER_LOAD, so that we can reuse the current SLP flow of GATHER_LOAD. 2. For conditional gather load/scatter store: We don't change the IR, so they have an additional conditional mask. Then, we reuse the current flow of MASK_GATHER_LOAD. So, after the recognization of patterns (tree-vect-patterns.cc), we will end up with scalar gather/scatter IR with different num arguments. (4 arguments for un-conditional, 5 arguments for conditional). The difference only apply on scalar gather/scatter IR. Pass through "call" argument to "internal_fn_mask_index" and return the mask_index according to CALL for mask_len_gather/mask_len_scatter. For vector IR, they are always same (keep original format): MASK_GATHER_LOAD (ptr, offset, scale, zero, mask, len, bias). Hence, the optab of mask_len gather/scatter don't change. To conclude, we only change the format of mask_len gather/scatter scalar IR in tree-vect-patterns.cc It seems the flow of MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE after this patch seems to be more natural and reasonable. Also, I realize that SLP of conditional gather_load is missing so I append a test for that. RISC-V regression passed and Bootstrap && Regression on X86 passed. Ok for trunk ? gcc/ChangeLog: * internal-fn.cc (internal_fn_mask_index): Add call argument. * internal-fn.h (internal_fn_mask_index): Ditto. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Delete MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE. * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD. (vect_build_slp_tree_1): Ditto. (vect_build_slp_tree_2): Ditto. * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Ditto. (vectorizable_store): Adapt for new interface of internal_fn_mask_index. (vectorizable_load): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-gather-6.c: New test. --- gcc/internal-fn.cc| 16 ++-- gcc/internal-fn.h | 2 +- gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++ gcc/tree-vect-patterns.cc | 4 +--- gcc/tree-vect-slp.cc | 17 +++-- gcc/tree-vect-stmts.cc| 6 +++--- 6 files changed, 49 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 61d5a9e4772..009ebd95785 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4701,7 +4701,7 @@ internal_fn_len_index (internal_fn fn) otherwise return -1. 
*/ int -internal_fn_mask_index (internal_fn fn) +internal_fn_mask_index (internal_fn fn, gcall *call) { switch (fn) { @@ -4717,9 +4717,21 @@ internal_fn_mask_index (internal_fn fn) case IFN_MASK_GATHER_LOAD: case IFN_MASK_SCATTER_STORE: + return 4; + case IFN_MASK_LEN_GATHER_LOAD: case IFN_MASK_LEN_SCATTER_STORE: - return 4; + /* In tree-vect-patterns.cc, we will have these 2 situations: + + - Unconditional gather load transforms + into MASK_LEN_GATHER_LOAD with no mask. + + - Conditional gather load transforms + into MASK_LEN_GATHER_LOAD with real conditional mask.*/ + if (!call || gimple_num_args (call) == 5) + return 4; + else + return -1; default: return (conditional_internal_fn_code (fn) != ERROR_MARK diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 99de13a0199..62fbbd537f4 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -235,7 +235,7 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *, extern bool internal_load_fn_p (internal_fn); extern bool internal_store_fn_p (internal_fn); extern bool internal_gather_scatter_fn_p (internal_fn); -extern int internal_fn_mask_index (internal_fn); +extern int internal_fn_mask_index (internal_fn, gcall * = nullptr); extern int internal_fn_len_index (internal_fn); extern int internal_fn_sto
[PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD: 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask). This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD. 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized. Consider this following case: void __attribute__((noipa)) f (int *restrict y, int *restrict x, int *restrict indices, int n) { for (int i = 0; i < n; ++i) { y[i * 2] = x[indices[i * 2]] + 1; y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; } } https://godbolt.org/z/WG3M3n7Mo GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES: f: ble a3,zero,.L5 .L3: vsetvli a5,a3,e8,mf4,ta,ma vsetvli zero,a5,e32,m1,ta,ma vlseg2e32.v v6,(a2) vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v6 vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v1,(a1),v2 vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v7 vsetvli zero,zero,e32,m1,ta,ma vadd.vi v4,v1,1 vsetvli zero,zero,e64,m2,ta,ma vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v2,(a1),v2 vsetvli a4,zero,e32,m1,ta,ma sllia6,a5,3 vadd.vi v5,v2,2 sub a3,a3,a5 vsetvli zero,a5,e32,m1,ta,ma vsseg2e32.v v4,(a0) add a2,a2,a6 add a0,a0,a6 bne a3,zero,.L3 .L5: ret After this patch: f: ble a3,zero,.L5 li a5,1 csrrt1,vlenb sllia5,a5,33 srlia7,t1,2 addia5,a5,1 sllia3,a3,1 neg t3,a7 vsetvli a4,zero,e64,m1,ta,ma vmv.v.x v4,a5 .L3: minua5,a3,a7 vsetvli zero,a5,e32,m1,ta,ma vle32.v v1,0(a2) vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v1 vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v2,(a1),v2 vsetvli a4,zero,e32,m1,ta,ma mv a6,a3 vadd.vv v2,v2,v4 vsetvli zero,a5,e32,m1,ta,ma vse32.v v2,0(a0) add a2,a2,t1 add a0,a0,t1 add a3,a3,t3 bgtua6,a7,.L3 .L5: ret Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch. Tested on RISC-V and Bootstrap && Regression on X86 passed. Ok for trunk ? gcc/ChangeLog: * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD. (vect_get_and_check_slp_defs): Ditto. (vect_build_slp_tree_1): Ditto. (vect_build_slp_tree_2): Ditto. * tree-vect-stmts.cc (vectorizable_load): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-gather-6.c: New test. 
--- gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++ gcc/tree-vect-slp.cc | 22 ++ gcc/tree-vect-stmts.cc| 10 +- 3 files changed, 42 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c new file mode 100644 index 000..ff55f321854 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ + +void +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n) +{ + for (int i = 0; i < n; ++i) +{ + if (cond[i * 2]) + y[i * 2] = x[indices[i * 2]] + 1; + if (cond[i * 2 + 1]) + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; +} +} + +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index fa098f9ff4e..38fe6ba6296 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -542,6 +542,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0) return arg1_map; case IFN_MASK_GATHER_LOAD: + case IFN_MASK_LEN_GATHER_LOAD: return arg1_arg4_map; case IFN_MASK_STORE: @@ -700,8 +701,7 @@ vect_get_and_check_sl
[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV
Like ARM SVE and GCN, add RVV. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-pr69907.c: Add RVV. --- gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c index b348526b62f..f63b42a271a 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c @@ -22,5 +22,5 @@ void foo(unsigned *p1, unsigned short *p2) /* Disable for SVE because for long or variable-length vectors we don't get an unrolled epilogue loop. Also disable for AArch64 Advanced SIMD, because there we can vectorize the epilogue using mixed vector sizes. - Likewise for AMD GCN. */ -/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { ! amdgcn*-*-* } } } } } */ + Likewise for AMD GCN and RVV. */ +/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { { ! amdgcn*-*-* } && { ! riscv_v } } } } } } */ -- 2.36.3
[PATCH] RISC-V Regression: Fix FAIL of bb-slp-68.c for RVV
As the comment says, this test fails when 64-byte vectors are available. Both RVV and GCN have 64-byte vectors, so it is more accurate to key the xfail on vect512. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-68.c: Use vect512. --- gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c index e7573a14933..2dd3d8ee90c 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c @@ -20,4 +20,4 @@ void foo () /* We want to have the store group split into 4, 2, 4 when using 32byte vectors. Unfortunately it does not work when 64-byte vectors are available. */ -/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* } } } */ +/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail vect512 } } } */ -- 2.36.3
[Committed] RISC-V: Remove redundant iterators.
These iterators are redundant, removed and commmitted. gcc/ChangeLog: * config/riscv/vector-iterators.md: Remove redundant iterators. --- gcc/config/riscv/vector-iterators.md | 110 --- 1 file changed, 110 deletions(-) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 96ddd34c958..6800f8d3d76 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -295,83 +295,6 @@ RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") ]) -(define_mode_iterator VLMULEXT2 [ - RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") - - RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") - - (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") (RVVM2HF "TARGET_VECTOR_ELEN_FP_16") - (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16") - (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32") - - RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32") - - (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") (RVVM2SF "TARGET_VECTOR_ELEN_FP_32") - (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32") - - (RVVM4DI "TARGET_VECTOR_ELEN_64") (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64") - - (RVVM4DF "TARGET_VECTOR_ELEN_FP_64") (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64") -]) - -(define_mode_iterator VLMULEXT4 [ - RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") - - RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") - - (RVVM2HF "TARGET_VECTOR_ELEN_FP_16") (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16") - (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32") - - RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32") - - (RVVM2SF "TARGET_VECTOR_ELEN_FP_32") (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32") - - (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64") - - (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64") -]) - -(define_mode_iterator VLMULEXT8 [ - RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") - - RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") - - (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16") - (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32") - - RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32") - - (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32") - - (RVVM1DI "TARGET_VECTOR_ELEN_64") - - (RVVM1DF "TARGET_VECTOR_ELEN_FP_64") -]) - -(define_mode_iterator VLMULEXT16 [ - RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") - - RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") - - (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32") - - (RVVMF2SI "TARGET_MIN_VLEN > 32") - - (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VLMULEXT32 [ - RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") - - (RVVMF4HI "TARGET_MIN_VLEN > 32") - - (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VLMULEXT64 [ - (RVVMF8QI "TARGET_MIN_VLEN > 32") -]) - (define_mode_iterator VEI16 [ RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") @@ -1579,39 +1502,6 @@ RVVM4x2QI ]) -(define_mode_iterator VQI [ - RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VHI [ - RVVM8HI RVVM4HI RVVM2HI 
RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VSI [ - RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VDI [ - (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64") - (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64") -]) - -(define_mode_iterator VHF [ - (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH") - (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH") - (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VSF [ - (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") (RVVM2SF "TARGET_VECTOR_ELEN_FP_32") - (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32") -]) - -(define_mode_iterator VDF [ - (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64") - (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64") -]) - (define_mode_attr V_LMUL1 [ (RVVM8QI "RVVM1QI") (RVVM4QI "RVVM1QI") (RVVM2QI "RVVM1QI") (RVVM1QI "RVVM1QI") (RVVMF2QI "RVVM1QI") (RVVMF4QI "RVVM1QI") (RVVMF8QI "RVVM1QI") -- 2.36.3
[Committed] RISC-V: Fix vsingle attribute
The vsingle attribute of RVVM2x2QI should be rvvm2qi instead of rvvm1qi. gcc/ChangeLog: * config/riscv/vector-iterators.md: Fix vsingle incorrect attribute for RVVM2x2QI. --- gcc/config/riscv/vector-iterators.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 6800f8d3d76..0850475edc1 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -2230,7 +2230,7 @@ (RVVM1x5QI "rvvm1qi") (RVVMF2x5QI "rvvmf2qi") (RVVMF4x5QI "rvvmf4qi") (RVVMF8x5QI "rvvmf8qi") (RVVM2x4QI "rvvm2qi") (RVVM1x4QI "rvvm1qi") (RVVMF2x4QI "rvvmf2qi") (RVVMF4x4QI "rvvmf4qi") (RVVMF8x4QI "rvvmf8qi") (RVVM2x3QI "rvvm2qi") (RVVM1x3QI "rvvm1qi") (RVVMF2x3QI "rvvmf2qi") (RVVMF4x3QI "rvvmf4qi") (RVVMF8x3QI "rvvmf8qi") - (RVVM4x2QI "rvvm4qi") (RVVM2x2QI "rvvm1qi") (RVVM1x2QI "rvvm1qi") (RVVMF2x2QI "rvvmf2qi") (RVVMF4x2QI "rvvmf4qi") (RVVMF8x2QI "rvvmf8qi") + (RVVM4x2QI "rvvm4qi") (RVVM2x2QI "rvvm2qi") (RVVM1x2QI "rvvm1qi") (RVVMF2x2QI "rvvmf2qi") (RVVMF4x2QI "rvvmf4qi") (RVVMF8x2QI "rvvmf8qi") (RVVM1x8HI "rvvm1hi") (RVVMF2x8HI "rvvmf2hi") (RVVMF4x8HI "rvvmf4hi") (RVVM1x7HI "rvvm1hi") (RVVMF2x7HI "rvvmf2hi") (RVVMF4x7HI "rvvmf4hi") -- 2.36.3
[PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1] * a; } return sum1 + sum2; } Before this patch: bar: ble a3,zero,.L5 csrrt0,vlenb csrra6,vlenb sllit1,t0,3 vsetvli a5,zero,e32,m4,ta,ma sub sp,sp,t1 vid.v v20 vmv.v.x v12,a1 vand.vi v4,v20,1 vmv.v.x v16,a2 vmseq.viv4,v4,1 sllit3,a6,2 vsetvli zero,a5,e32,m4,ta,ma vmv1r.v v0,v4 viota.m v8,v4 add a7,t3,sp vsetvli a5,zero,e32,m4,ta,mu vand.vi v28,v20,-2 vadd.vi v4,v28,1 vs4r.v v20,0(a7)- spill vrgather.vv v24,v12,v8 vrgather.vv v20,v16,v8 vrgather.vv v24,v16,v8,v0.t vrgather.vv v20,v12,v8,v0.t vs4r.v v4,0(sp) - spill sllia3,a3,1 addit4,a6,-1 neg t1,a6 vmv4r.v v0,v20 vmv.v.i v4,0 j .L4 .L13: vsetvli a5,zero,e32,m4,ta,ma .L4: mv a7,a3 mv a4,a3 bleua3,a6,.L3 csrra4,vlenb .L3: vmv.v.x v8,t4 vl4re32.v v12,0(sp) spill vand.vv v20,v28,v8 vand.vv v8,v12,v8 vsetvli zero,a4,e32,m4,ta,ma vle32.v v16,0(a0) vsetvli a5,zero,e32,m4,ta,ma add a3,a3,t1 vrgather.vv v12,v16,v20 add a0,a0,t3 vrgather.vv v20,v16,v8 vsub.vv v12,v12,v0 vsetvli zero,a4,e32,m4,tu,ma vadd.vv v4,v4,v12 vmacc.vvv4,v24,v20 bgtua7,a6,.L13 csrra1,vlenb sllia1,a1,2 add a1,a1,sp li a4,-1 csrrt0,vlenb vsetvli a5,zero,e32,m4,ta,ma vl4re32.v v12,0(a1) spill vmv.v.i v8,0 vmul.vx v0,v12,a4 li a2,0 sllit1,t0,3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vvv0,v0,v8 vand.vi v12,v12,1 vmerge.vvm v16,v8,v4,v0 vmseq.vvv12,v12,v8 vmv.s.x v1,a2 vmv1r.v v0,v12 vredsum.vs v16,v16,v1 vmerge.vvm v8,v8,v4,v0 vmv.x.s a0,v16 vredsum.vs v8,v8,v1 vmv.x.s a5,v8 add sp,sp,t1 addwa0,a0,a5 jr ra .L5: li a0,0 ret We can there are multiple horrible register spillings. The root cause of this issue is for a scalar IR load: _5 = *_4; We didn't check whether it is a continguous load/store or gather/scatter load/store Since it will be translate into: 1. MASK_LEN_GATHER_LOAD (..., perm indice). 2. Continguous load/store + VEC_PERM (..., perm indice) It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice) that we didn't count it before. So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model. The key of this patch is: if ((type == load_vec_info_type || type == store_vec_info_type) && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info))) { ... } Add one more register consumption if it is not an adjacent load/store. After this patch, it pick LMUL = 2 which is optimal: bar: ble a3,zero,.L4 csrra6,vlenb vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v6,a2 srlia2,a6,1 vmv.v.x v4,a1 vid.v v12 sllia3,a3,1 vand.vi v0,v12,1 addit1,a2,-1 vmseq.viv0,v0,1 sllia6,a6,1 vsetvli zero,a5,e32,m2,ta,ma neg a7,a2 viota.m v2,v0 vsetvli a5,zero,e32,m2,ta,mu vrgather.vv v16,v4,v2 vrgather.vv v14,v6,v2 vrgather.vv v16,v6,v2,v0.t vrgather.vv v14,v4,v2,v0.t vand.vi v18,v12,-2 vmv.v.i v2,0 vadd.vi v20,v18,1 .L3: minua4,a3,a2 vsetvli zero,a4,e32,m2,ta,ma vle32.v v8,0(a0) vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v4,t1 vand.vv v10,v18,v4 vrgather.vv v6,v8,v10 vsub.vv v6,v6,v14 vsetvli zero,a4,e32,m2,tu,ma vadd.vv v2,v2,v6 vsetvli a1,zero,e32,m2,ta,ma vand.vv v4,v20,v4 vrgather.vv v6,v8,v4 vsetvli zero,a4,e32,m2,tu,ma mv a4,a3 add a0,a0,a6 add a3,a3,a7 vmacc.vvv2,v16,v6 bgtua4,a2,.L3 vsetvli a1,zero,e32,m2,ta,ma vand.vi v0,v12,1 vmv.v.i v4,0 li a3,
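To summarize the distinction the cost model now accounts for, here is a minimal sketch (not the testcase added by this patch; function names are made up): a contiguous ("adjacent") access only occupies the loaded vector register group, while a strided or permuted access also needs a register group for the permute/gather indices, and that extra group was previously not counted.

/* Adjacent: contiguous load, no extra index vector needed.  */
int
sum_all (int *restrict x, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += x[i];
  return sum;
}

/* Non-adjacent: every other element; the permute or gather indices
   consume an additional vector register group.  */
int
sum_even (int *restrict x, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += x[2 * i];
  return sum;
}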
[PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.
void foo8 (int64_t *restrict a) { for (int i = 0; i < 16; ++i) a[i] = a[i]-16; } We use VLS modes instead of VLA modes even it is specified by dynamic LMUL. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Use VLS modes. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: New test. --- gcc/config/riscv/riscv-vector-costs.cc| 13 ++-- .../costmodel/riscv/rvv/no-dynamic-lmul-1.c | 64 +++ 2 files changed, 73 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc index 11257f7c2bd..4482af2e039 100644 --- a/gcc/config/riscv/riscv-vector-costs.cc +++ b/gcc/config/riscv/riscv-vector-costs.cc @@ -530,10 +530,6 @@ costs::preferred_new_lmul_p (const vector_costs *uncast_other) const auto other_loop_vinfo = as_a (other->m_vinfo); class loop *loop = LOOP_VINFO_LOOP (this_loop_vinfo); - if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (this_loop_vinfo) - && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (other_loop_vinfo)) -return false; - if (loop_autovec_infos.get (loop) && loop_autovec_infos.get (loop)->end_p) return false; else if (loop_autovec_infos.get (loop)) @@ -567,6 +563,15 @@ costs::preferred_new_lmul_p (const vector_costs *uncast_other) const machine_mode biggest_mode = compute_local_live_ranges (program_points_per_bb, live_ranges_per_bb); + /* If we can use simple VLS modes to handle NITERS element. + We don't need to use VLA modes with partial vector auto-vectorization. */ + if (LOOP_VINFO_NITERS_KNOWN_P (this_loop_vinfo) + && known_le (tree_to_poly_int64 (LOOP_VINFO_NITERS (this_loop_vinfo)) +* GET_MODE_SIZE (biggest_mode).to_constant (), + (int) RVV_M8 * BYTES_PER_RISCV_VECTOR) + && pow2p_hwi (LOOP_VINFO_INT_NITERS (this_loop_vinfo))) +return vector_costs::better_main_loop_than_p (other); + /* Update live ranges according to PHI. */ update_local_live_ranges (other->m_vinfo, program_points_per_bb, live_ranges_per_bb); diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c new file mode 100644 index 000..7ede148396f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c @@ -0,0 +1,64 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mabi=lp64d -fdump-tree-vect-details" } */ + +#include + +void +foo (int8_t *restrict a) +{ + for (int i = 0; i < 4096; ++i) +a[i] = a[i]-16; +} + +void +foo2 (int16_t *restrict a) +{ + for (int i = 0; i < 2048; ++i) +a[i] = a[i]-16; +} + +void +foo3 (int32_t *restrict a) +{ + for (int i = 0; i < 1024; ++i) +a[i] = a[i]-16; +} + +void +foo4 (int64_t *restrict a) +{ + for (int i = 0; i < 512; ++i) +a[i] = a[i]-16; +} + +void +foo5 (int8_t *restrict a) +{ + for (int i = 0; i < 16; ++i) +a[i] = a[i]-16; +} + +void +foo6 (int16_t *restrict a) +{ + for (int i = 0; i < 16; ++i) +a[i] = a[i]-16; +} + +void +foo7 (int32_t *restrict a) +{ + for (int i = 0; i < 16; ++i) +a[i] = a[i]-16; +} + +void +foo8 (int64_t *restrict a) +{ + for (int i = 0; i < 16; ++i) +a[i] = a[i]-16; +} + +/* { dg-final { scan-tree-dump-not "Maximum lmul" "vect" } } */ +/* { dg-final { scan-assembler-times {vsetvli} 4 } } */ +/* { dg-final { scan-assembler-times {vsetivli} 4 } } */ -- 2.36.3
[PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1] * a; } return sum1 + sum2; } Before this patch: bar: ble a3,zero,.L5 csrrt0,vlenb csrra6,vlenb sllit1,t0,3 vsetvli a5,zero,e32,m4,ta,ma sub sp,sp,t1 vid.v v20 vmv.v.x v12,a1 vand.vi v4,v20,1 vmv.v.x v16,a2 vmseq.viv4,v4,1 sllit3,a6,2 vsetvli zero,a5,e32,m4,ta,ma vmv1r.v v0,v4 viota.m v8,v4 add a7,t3,sp vsetvli a5,zero,e32,m4,ta,mu vand.vi v28,v20,-2 vadd.vi v4,v28,1 vs4r.v v20,0(a7)- spill vrgather.vv v24,v12,v8 vrgather.vv v20,v16,v8 vrgather.vv v24,v16,v8,v0.t vrgather.vv v20,v12,v8,v0.t vs4r.v v4,0(sp) - spill sllia3,a3,1 addit4,a6,-1 neg t1,a6 vmv4r.v v0,v20 vmv.v.i v4,0 j .L4 .L13: vsetvli a5,zero,e32,m4,ta,ma .L4: mv a7,a3 mv a4,a3 bleua3,a6,.L3 csrra4,vlenb .L3: vmv.v.x v8,t4 vl4re32.v v12,0(sp) spill vand.vv v20,v28,v8 vand.vv v8,v12,v8 vsetvli zero,a4,e32,m4,ta,ma vle32.v v16,0(a0) vsetvli a5,zero,e32,m4,ta,ma add a3,a3,t1 vrgather.vv v12,v16,v20 add a0,a0,t3 vrgather.vv v20,v16,v8 vsub.vv v12,v12,v0 vsetvli zero,a4,e32,m4,tu,ma vadd.vv v4,v4,v12 vmacc.vvv4,v24,v20 bgtua7,a6,.L13 csrra1,vlenb sllia1,a1,2 add a1,a1,sp li a4,-1 csrrt0,vlenb vsetvli a5,zero,e32,m4,ta,ma vl4re32.v v12,0(a1) spill vmv.v.i v8,0 vmul.vx v0,v12,a4 li a2,0 sllit1,t0,3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vvv0,v0,v8 vand.vi v12,v12,1 vmerge.vvm v16,v8,v4,v0 vmseq.vvv12,v12,v8 vmv.s.x v1,a2 vmv1r.v v0,v12 vredsum.vs v16,v16,v1 vmerge.vvm v8,v8,v4,v0 vmv.x.s a0,v16 vredsum.vs v8,v8,v1 vmv.x.s a5,v8 add sp,sp,t1 addwa0,a0,a5 jr ra .L5: li a0,0 ret We can there are multiple horrible register spillings. The root cause of this issue is for a scalar IR load: _5 = *_4; We didn't check whether it is a continguous load/store or gather/scatter load/store Since it will be translate into: 1. MASK_LEN_GATHER_LOAD (..., perm indice). 2. Continguous load/store + VEC_PERM (..., perm indice) It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice) that we didn't count it before. So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model. The key of this patch is: if ((type == load_vec_info_type || type == store_vec_info_type) && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info))) { ... } Add one more register consumption if it is not an adjacent load/store. After this patch, it pick LMUL = 2 which is optimal: bar: ble a3,zero,.L4 csrra6,vlenb vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v6,a2 srlia2,a6,1 vmv.v.x v4,a1 vid.v v12 sllia3,a3,1 vand.vi v0,v12,1 addit1,a2,-1 vmseq.viv0,v0,1 sllia6,a6,1 vsetvli zero,a5,e32,m2,ta,ma neg a7,a2 viota.m v2,v0 vsetvli a5,zero,e32,m2,ta,mu vrgather.vv v16,v4,v2 vrgather.vv v14,v6,v2 vrgather.vv v16,v6,v2,v0.t vrgather.vv v14,v4,v2,v0.t vand.vi v18,v12,-2 vmv.v.i v2,0 vadd.vi v20,v18,1 .L3: minua4,a3,a2 vsetvli zero,a4,e32,m2,ta,ma vle32.v v8,0(a0) vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v4,t1 vand.vv v10,v18,v4 vrgather.vv v6,v8,v10 vsub.vv v6,v6,v14 vsetvli zero,a4,e32,m2,tu,ma vadd.vv v2,v2,v6 vsetvli a1,zero,e32,m2,ta,ma vand.vv v4,v20,v4 vrgather.vv v6,v8,v4 vsetvli zero,a4,e32,m2,tu,ma mv a4,a3 add a0,a0,a6 add a3,a3,a7 vmacc.vvv2,v16,v6 bgtua4,a2,.L3 vsetvli a1,zero,e32,m2,ta,ma vand.vi v0,v12,1 vmv.v.i v4,0 li a3,
[PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1] * a; } return sum1 + sum2; } Before this patch: bar: ble a3,zero,.L5 csrrt0,vlenb csrra6,vlenb sllit1,t0,3 vsetvli a5,zero,e32,m4,ta,ma sub sp,sp,t1 vid.v v20 vmv.v.x v12,a1 vand.vi v4,v20,1 vmv.v.x v16,a2 vmseq.viv4,v4,1 sllit3,a6,2 vsetvli zero,a5,e32,m4,ta,ma vmv1r.v v0,v4 viota.m v8,v4 add a7,t3,sp vsetvli a5,zero,e32,m4,ta,mu vand.vi v28,v20,-2 vadd.vi v4,v28,1 vs4r.v v20,0(a7)- spill vrgather.vv v24,v12,v8 vrgather.vv v20,v16,v8 vrgather.vv v24,v16,v8,v0.t vrgather.vv v20,v12,v8,v0.t vs4r.v v4,0(sp) - spill sllia3,a3,1 addit4,a6,-1 neg t1,a6 vmv4r.v v0,v20 vmv.v.i v4,0 j .L4 .L13: vsetvli a5,zero,e32,m4,ta,ma .L4: mv a7,a3 mv a4,a3 bleua3,a6,.L3 csrra4,vlenb .L3: vmv.v.x v8,t4 vl4re32.v v12,0(sp) spill vand.vv v20,v28,v8 vand.vv v8,v12,v8 vsetvli zero,a4,e32,m4,ta,ma vle32.v v16,0(a0) vsetvli a5,zero,e32,m4,ta,ma add a3,a3,t1 vrgather.vv v12,v16,v20 add a0,a0,t3 vrgather.vv v20,v16,v8 vsub.vv v12,v12,v0 vsetvli zero,a4,e32,m4,tu,ma vadd.vv v4,v4,v12 vmacc.vvv4,v24,v20 bgtua7,a6,.L13 csrra1,vlenb sllia1,a1,2 add a1,a1,sp li a4,-1 csrrt0,vlenb vsetvli a5,zero,e32,m4,ta,ma vl4re32.v v12,0(a1) spill vmv.v.i v8,0 vmul.vx v0,v12,a4 li a2,0 sllit1,t0,3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vvv0,v0,v8 vand.vi v12,v12,1 vmerge.vvm v16,v8,v4,v0 vmseq.vvv12,v12,v8 vmv.s.x v1,a2 vmv1r.v v0,v12 vredsum.vs v16,v16,v1 vmerge.vvm v8,v8,v4,v0 vmv.x.s a0,v16 vredsum.vs v8,v8,v1 vmv.x.s a5,v8 add sp,sp,t1 addwa0,a0,a5 jr ra .L5: li a0,0 ret We can there are multiple horrible register spillings. The root cause of this issue is for a scalar IR load: _5 = *_4; We didn't check whether it is a continguous load/store or gather/scatter load/store Since it will be translate into: 1. MASK_LEN_GATHER_LOAD (..., perm indice). 2. Continguous load/store + VEC_PERM (..., perm indice) It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice) that we didn't count it before. So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model. The key of this patch is: if ((type == load_vec_info_type || type == store_vec_info_type) && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info))) { ... } Add one more register consumption if it is not an adjacent load/store. After this patch, it pick LMUL = 2 which is optimal: bar: ble a3,zero,.L4 csrra6,vlenb vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v6,a2 srlia2,a6,1 vmv.v.x v4,a1 vid.v v12 sllia3,a3,1 vand.vi v0,v12,1 addit1,a2,-1 vmseq.viv0,v0,1 sllia6,a6,1 vsetvli zero,a5,e32,m2,ta,ma neg a7,a2 viota.m v2,v0 vsetvli a5,zero,e32,m2,ta,mu vrgather.vv v16,v4,v2 vrgather.vv v14,v6,v2 vrgather.vv v16,v6,v2,v0.t vrgather.vv v14,v4,v2,v0.t vand.vi v18,v12,-2 vmv.v.i v2,0 vadd.vi v20,v18,1 .L3: minua4,a3,a2 vsetvli zero,a4,e32,m2,ta,ma vle32.v v8,0(a0) vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v4,t1 vand.vv v10,v18,v4 vrgather.vv v6,v8,v10 vsub.vv v6,v6,v14 vsetvli zero,a4,e32,m2,tu,ma vadd.vv v2,v2,v6 vsetvli a1,zero,e32,m2,ta,ma vand.vv v4,v20,v4 vrgather.vv v6,v8,v4 vsetvli zero,a4,e32,m2,tu,ma mv a4,a3 add a0,a0,a6 add a3,a3,a7 vmacc.vvv2,v16,v6 bgtua4,a2,.L3 vsetvli a1,zero,e32,m2,ta,ma vand.vi v0,v12,1 vmv.v.i v4,0 li a3,
[PATCH V4] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store
Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1] * a; } return sum1 + sum2; } Before this patch: bar: ble a3,zero,.L5 csrrt0,vlenb csrra6,vlenb sllit1,t0,3 vsetvli a5,zero,e32,m4,ta,ma sub sp,sp,t1 vid.v v20 vmv.v.x v12,a1 vand.vi v4,v20,1 vmv.v.x v16,a2 vmseq.viv4,v4,1 sllit3,a6,2 vsetvli zero,a5,e32,m4,ta,ma vmv1r.v v0,v4 viota.m v8,v4 add a7,t3,sp vsetvli a5,zero,e32,m4,ta,mu vand.vi v28,v20,-2 vadd.vi v4,v28,1 vs4r.v v20,0(a7)- spill vrgather.vv v24,v12,v8 vrgather.vv v20,v16,v8 vrgather.vv v24,v16,v8,v0.t vrgather.vv v20,v12,v8,v0.t vs4r.v v4,0(sp) - spill sllia3,a3,1 addit4,a6,-1 neg t1,a6 vmv4r.v v0,v20 vmv.v.i v4,0 j .L4 .L13: vsetvli a5,zero,e32,m4,ta,ma .L4: mv a7,a3 mv a4,a3 bleua3,a6,.L3 csrra4,vlenb .L3: vmv.v.x v8,t4 vl4re32.v v12,0(sp) spill vand.vv v20,v28,v8 vand.vv v8,v12,v8 vsetvli zero,a4,e32,m4,ta,ma vle32.v v16,0(a0) vsetvli a5,zero,e32,m4,ta,ma add a3,a3,t1 vrgather.vv v12,v16,v20 add a0,a0,t3 vrgather.vv v20,v16,v8 vsub.vv v12,v12,v0 vsetvli zero,a4,e32,m4,tu,ma vadd.vv v4,v4,v12 vmacc.vvv4,v24,v20 bgtua7,a6,.L13 csrra1,vlenb sllia1,a1,2 add a1,a1,sp li a4,-1 csrrt0,vlenb vsetvli a5,zero,e32,m4,ta,ma vl4re32.v v12,0(a1) spill vmv.v.i v8,0 vmul.vx v0,v12,a4 li a2,0 sllit1,t0,3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vvv0,v0,v8 vand.vi v12,v12,1 vmerge.vvm v16,v8,v4,v0 vmseq.vvv12,v12,v8 vmv.s.x v1,a2 vmv1r.v v0,v12 vredsum.vs v16,v16,v1 vmerge.vvm v8,v8,v4,v0 vmv.x.s a0,v16 vredsum.vs v8,v8,v1 vmv.x.s a5,v8 add sp,sp,t1 addwa0,a0,a5 jr ra .L5: li a0,0 ret We can there are multiple horrible register spillings. The root cause of this issue is for a scalar IR load: _5 = *_4; We didn't check whether it is a continguous load/store or gather/scatter load/store Since it will be translate into: 1. MASK_LEN_GATHER_LOAD (..., perm indice). 2. Continguous load/store + VEC_PERM (..., perm indice) It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice) that we didn't count it before. So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model. The key of this patch is: if ((type == load_vec_info_type || type == store_vec_info_type) && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info))) { ... } Add one more register consumption if it is not an adjacent load/store. After this patch, it pick LMUL = 2 which is optimal: bar: ble a3,zero,.L4 csrra6,vlenb vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v6,a2 srlia2,a6,1 vmv.v.x v4,a1 vid.v v12 sllia3,a3,1 vand.vi v0,v12,1 addit1,a2,-1 vmseq.viv0,v0,1 sllia6,a6,1 vsetvli zero,a5,e32,m2,ta,ma neg a7,a2 viota.m v2,v0 vsetvli a5,zero,e32,m2,ta,mu vrgather.vv v16,v4,v2 vrgather.vv v14,v6,v2 vrgather.vv v16,v6,v2,v0.t vrgather.vv v14,v4,v2,v0.t vand.vi v18,v12,-2 vmv.v.i v2,0 vadd.vi v20,v18,1 .L3: minua4,a3,a2 vsetvli zero,a4,e32,m2,ta,ma vle32.v v8,0(a0) vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v4,t1 vand.vv v10,v18,v4 vrgather.vv v6,v8,v10 vsub.vv v6,v6,v14 vsetvli zero,a4,e32,m2,tu,ma vadd.vv v2,v2,v6 vsetvli a1,zero,e32,m2,ta,ma vand.vv v4,v20,v4 vrgather.vv v6,v8,v4 vsetvli zero,a4,e32,m2,tu,ma mv a4,a3 add a0,a0,a6 add a3,a3,a7 vmacc.vvv2,v16,v6 bgtua4,a2,.L3 vsetvli a1,zero,e32,m2,ta,ma vand.vi v0,v12,1 vmv.v.i v4,0 li a3,
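To make the bookkeeping concrete, here is a toy illustration of the idea only; it is not the riscv-vector-costs.cc implementation (which works on live ranges), and the statement encoding is made up for this example. A non-adjacent load or store pays for one extra register group holding the permute indices:

/* Toy register-group counter, illustration only.  */
#include <stdio.h>
#include <stdbool.h>

enum stmt_kind { STMT_ARITH, STMT_LOAD, STMT_STORE };

struct stmt
{
  enum stmt_kind kind;
  bool adjacent;   /* contiguous data reference?  */
};

static int count_register_groups (const struct stmt *stmts, int n)
{
  int groups = 0;
  for (int i = 0; i < n; i++)
    {
      groups++;                               /* the value itself          */
      if ((stmts[i].kind == STMT_LOAD || stmts[i].kind == STMT_STORE)
          && !stmts[i].adjacent)
        groups++;                             /* the permute index vector  */
    }
  return groups;
}

int main (void)
{
  /* The x[2*i] / x[2*i+1] accesses from bar() are strided, not adjacent.  */
  struct stmt loop[] = {
    { STMT_LOAD, false }, { STMT_ARITH, true }, { STMT_ARITH, true },
  };
  printf ("groups: %d\n", count_register_groups (loop, 3));  /* 4, not 3 */
  return 0;
}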
[PATCH V4] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD: 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask). This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD. 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized. Consider this following case: void __attribute__((noipa)) f (int *restrict y, int *restrict x, int *restrict indices, int n) { for (int i = 0; i < n; ++i) { y[i * 2] = x[indices[i * 2]] + 1; y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; } } https://godbolt.org/z/WG3M3n7Mo GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES: f: ble a3,zero,.L5 .L3: vsetvli a5,a3,e8,mf4,ta,ma vsetvli zero,a5,e32,m1,ta,ma vlseg2e32.v v6,(a2) vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v6 vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v1,(a1),v2 vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v7 vsetvli zero,zero,e32,m1,ta,ma vadd.vi v4,v1,1 vsetvli zero,zero,e64,m2,ta,ma vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v2,(a1),v2 vsetvli a4,zero,e32,m1,ta,ma sllia6,a5,3 vadd.vi v5,v2,2 sub a3,a3,a5 vsetvli zero,a5,e32,m1,ta,ma vsseg2e32.v v4,(a0) add a2,a2,a6 add a0,a0,a6 bne a3,zero,.L3 .L5: ret After this patch: f: ble a3,zero,.L5 li a5,1 csrrt1,vlenb sllia5,a5,33 srlia7,t1,2 addia5,a5,1 sllia3,a3,1 neg t3,a7 vsetvli a4,zero,e64,m1,ta,ma vmv.v.x v4,a5 .L3: minua5,a3,a7 vsetvli zero,a5,e32,m1,ta,ma vle32.v v1,0(a2) vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v1 vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v2,(a1),v2 vsetvli a4,zero,e32,m1,ta,ma mv a6,a3 vadd.vv v2,v2,v4 vsetvli zero,a5,e32,m1,ta,ma vse32.v v2,0(a0) add a2,a2,t1 add a0,a0,t1 add a3,a3,t3 bgtua6,a7,.L3 .L5: ret Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch. Tested on RISC-V and Bootstrap && Regression on X86 passed. Ok for trunk ? gcc/ChangeLog: * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD. (vect_get_and_check_slp_defs): Ditto. (vect_build_slp_tree_1): Ditto. (vect_build_slp_tree_2): Ditto. * tree-vect-stmts.cc (vectorizable_load): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-gather-6.c: New test. 
--- gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++ gcc/tree-vect-slp.cc | 22 ++ gcc/tree-vect-stmts.cc| 9 - 3 files changed, 41 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c new file mode 100644 index 000..ff55f321854 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ + +void +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n) +{ + for (int i = 0; i < n; ++i) +{ + if (cond[i * 2]) + y[i * 2] = x[indices[i * 2]] + 1; + if (cond[i * 2 + 1]) + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; +} +} + +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index af8f5031bd2..b379278446b 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -550,6 +550,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0) return arg1_map; case IFN_MASK_GATHER_LOAD: + case IFN_MASK_LEN_GATHER_LOAD: return arg1_arg4_map; case IFN_MASK_STORE: @@ -717,8 +718,7 @@ vect_get_and_check_slp
[PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]
Last time, Robin has mentioned that dynamic LMUL will cause ICE in SPEC: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html which is caused by assertion FAIL. When we enable more currents in rvv.exp with dynamic LMUL, such issue can be reproduced and has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832 Now, we enable more tests in rvv.exp in this patch and fix the bug. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests. --- gcc/config/riscv/riscv-vector-costs.cc | 19 +-- gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 10 -- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc index 33061efb1d0..af87388a1e4 100644 --- a/gcc/config/riscv/riscv-vector-costs.cc +++ b/gcc/config/riscv/riscv-vector-costs.cc @@ -154,6 +154,14 @@ compute_local_program_points ( } } +static machine_mode +get_biggest_mode (machine_mode mode1, machine_mode mode2) +{ + unsigned int mode1_size = GET_MODE_BITSIZE (mode1).to_constant (); + unsigned int mode2_size = GET_MODE_BITSIZE (mode2).to_constant (); + return mode1_size >= mode2_size ? mode1 : mode2; +} + /* Compute local live ranges of each vectorized variable. Note that we only compute local live ranges (within a block) since local live ranges information is accurate enough for us to determine @@ -201,12 +209,12 @@ compute_local_live_ranges ( { unsigned int point = program_point.point; gimple *stmt = program_point.stmt; - machine_mode mode = biggest_mode; tree lhs = gimple_get_lhs (stmt); if (lhs != NULL_TREE && is_gimple_reg (lhs) && !POINTER_TYPE_P (TREE_TYPE (lhs))) { - mode = TYPE_MODE (TREE_TYPE (lhs)); + biggest_mode = get_biggest_mode (biggest_mode, + TYPE_MODE (TREE_TYPE (lhs))); bool existed_p = false; pair &live_range = live_ranges->get_or_insert (lhs, &existed_p); @@ -225,7 +233,9 @@ compute_local_live_ranges ( the future. 
*/ if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE (var))) { - mode = TYPE_MODE (TREE_TYPE (var)); + biggest_mode + = get_biggest_mode (biggest_mode, + TYPE_MODE (TREE_TYPE (var))); bool existed_p = false; pair &live_range = live_ranges->get_or_insert (var, &existed_p); @@ -238,9 +248,6 @@ compute_local_live_ranges ( live_range = pair (0, point); } } - if (GET_MODE_SIZE (mode).to_constant () - > GET_MODE_SIZE (biggest_mode).to_constant ()) - biggest_mode = mode; } if (dump_enabled_p ()) for (hash_map::iterator iter = live_ranges->begin (); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp index ff76e17d0e6..674ba0d72b4 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp @@ -58,10 +58,12 @@ set AUTOVEC_TEST_OPTS [list \ {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \ {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \ {-ftree-vectorize -O3 --param riscv-autovec-lmul=m8} \ + {-ftree-vectorize -O3 --param riscv-autovec-lmul=dynamic} \ {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \ {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \ {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} \ - {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} ] + {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} \ + {-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic} ] foreach op $AUTOVEC_TEST_OPTS { dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/partial/*.\[cS\]]] \ "" "$op" @@ -104,18 +106,22 @@ set AUTOVEC_TEST_OPTS [list \ {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \ {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \ {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \ + {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=dynamic -ffast-math} \ {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math}
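For context, the kind of mixed element-width loop where accumulating the biggest mode across the whole block matters looks like the sketch below. This is an illustration of the pattern, not the reproducer attached to PR111832: the int16_t store has to win over the int8_t loads when the cost model picks its reference element size.

#include <stdint.h>

void
widen_sum (int16_t *restrict dst, const int8_t *restrict a,
           const int8_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int16_t) (a[i] + b[i]);   /* int8 inputs, int16 result  */
}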
[PATCH] RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx
This patch optimize this following permutation with consecutive patterns index: typedef char vnx16i __attribute__ ((vector_size (16))); #define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15 vnx16i __attribute__ ((noinline, noclone)) test_1 (vnx16i x, vnx16i y) { return __builtin_shufflevector (x, y, MASK_16); } Before this patch: lui a5,%hi(.LC0) addia5,a5,%lo(.LC0) vsetivlizero,16,e8,m1,ta,ma vle8.v v3,0(a5) vle8.v v2,0(a1) vrgather.vv v1,v2,v3 vse8.v v1,0(a0) ret After this patch: vsetivlizero,16,e8,mf8,ta,ma vle8.v v2,0(a1) vsetivlizero,4,e32,mf2,ta,ma vrgather.vi v1,v2,3 vsetivlizero,16,e8,mf8,ta,ma vse8.v v1,0(a0) ret Overal reduce 1 instruction which is vector load instruction which is much more expansive than VL toggling. Also, with this patch, we are using vrgather.vi which reduce 1 vector register consumption. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function. (expand_vec_perm_const_1): Add consecutive pattern recognition. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Add new test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test. --- gcc/config/riscv/riscv-v.cc | 85 + .../rvv/autovec/vls-vlmax/consecutive-1.c | 21 + .../rvv/autovec/vls-vlmax/consecutive-2.c | 45 + .../rvv/autovec/vls-vlmax/consecutive_run-1.c | 27 ++ .../rvv/autovec/vls-vlmax/consecutive_run-2.c | 51 ++ .../riscv/rvv/autovec/vls/consecutive-1.c | 94 +++ .../riscv/rvv/autovec/vls/consecutive-2.c | 68 ++ .../riscv/rvv/autovec/vls/consecutive-3.c | 68 ++ .../gcc.target/riscv/rvv/autovec/vls/def.h| 6 ++ 9 files changed, 465 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 21d86c3f917..895c11d13fc 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -2822,6 +2822,89 @@ shuffle_merge_patterns (struct expand_vec_perm_d *d) return true; } +/* Recognize the consecutive index that we can use a single + vrgather.v[x|i] to shuffle the vectors. + + e.g. short[8] = VEC_PERM_EXPR + Use SEW = 32, index = 1 vrgather.vi to get the result. */ +static bool +shuffle_consecutive_patterns (struct expand_vec_perm_d *d) +{ + machine_mode vmode = d->vmode; + scalar_mode smode = GET_MODE_INNER (vmode); + poly_int64 vec_len = d->perm.length (); + HOST_WIDE_INT elt; + + if (!vec_len.is_constant () || !d->perm[0].is_constant (&elt)) +return false; + int vlen = vec_len.to_constant (); + + /* Compute the last element index of consecutive pattern from the leading + consecutive elements. 
*/ + int last_consecutive_idx = -1; + int consecutive_num = -1; + for (int i = 1; i < vlen; i++) +{ + if (maybe_ne (d->perm[i], d->perm[i - 1] + 1)) + break; + last_consecutive_idx = i; + consecutive_num = last_consecutive_idx + 1; +} + + int new_vlen = vlen / consecutive_num; + if (last_consecutive_idx < 0 || consecutive_num == vlen + || !pow2p_hwi (consecutive_num) || !pow2p_hwi (new_vlen)) +return false; + /* VEC_PERM <..., (index, index + 1, ... index + consecutive_num - 1)>. + All elements of index, index + 1, ... index + consecutive_num - 1 should + locate at the same vector. */ + if (maybe_ge (d->perm[0], vec_len) + != maybe_ge (d->perm[last_consecutive_idx], vec_len)) +return false; + /* If a vector has 8 elements. We allow optimizations on consecutive + patterns e.g. <0, 1, 2, 3, 0, 1, 2, 3> or <4, 5, 6, 7, 4, 5, 6, 7>. + Other patterns like <2, 3, 4, 5, 2, 3, 4, 5> are not feasible patterns + to be optimiz
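The index arithmetic behind this optimization can be checked with a small host-side program (plain C, no RVV intrinsics): a repeated run of N consecutive byte indices starting at a multiple of N is the same as gathering one wider element, so MASK_16's 12,13,14,15 pattern becomes "SEW=32, index = 12 / 4 = 3".

#include <assert.h>
#include <stdint.h>
#include <string.h>

int main (void)
{
  uint8_t src[16], byte_shuffled[16], word_gathered[16];
  for (int i = 0; i < 16; i++)
    src[i] = (uint8_t) (i * 3 + 1);

  /* Byte-level permutation: { 12,13,14,15, 12,13,14,15, ... }.  */
  for (int i = 0; i < 16; i++)
    byte_shuffled[i] = src[12 + (i % 4)];

  /* Equivalent wider-element gather: what vrgather.vi vd, vs, 3 does.  */
  uint32_t words[4], out[4];
  memcpy (words, src, sizeof words);
  for (int i = 0; i < 4; i++)
    out[i] = words[3];
  memcpy (word_gathered, out, sizeof word_gathered);

  assert (memcmp (byte_shuffled, word_gathered, 16) == 0);
  return 0;
}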
[PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
Confirm dynamic LMUL algorithm works well for choosing LMUL = 4 for the PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848 But it generate horrible register spillings. The root cause is that we didn't hoist the vmv.v.x outside the loop which increase the SLP loop register pressure. So, change the COSNT_VECTOR move into vec_duplicate splitter that we can gain better optimizations: 1. better LICM. 2. More opportunities of transforming 'vv' into 'vx' in the future. Before this patch: f3: ble a4,zero,.L8 csrrt0,vlenb sllit1,t0,4 csrra6,vlenb sub sp,sp,t1 csrra5,vlenb sllia6,a6,3 sllia5,a5,2 add a6,a6,sp vsetvli a7,zero,e16,m8,ta,ma sllia4,a4,3 vid.v v8 addit6,a5,-1 vand.vi v8,v8,-2 neg t5,a5 vs8r.v v8,0(sp) vadd.vi v8,v8,1 vs8r.v v8,0(a6) j .L4 .L12: vsetvli a7,zero,e16,m8,ta,ma .L4: csrrt0,vlenb sllit0,t0,3 vl8re16.v v16,0(sp) add t0,t0,sp vmv.v.x v8,t6 mv t1,a4 vand.vv v24,v16,v8 mv a6,a4 vl8re16.v v16,0(t0) vand.vv v8,v16,v8 bleua4,a5,.L3 mv a6,a5 .L3: vsetvli zero,a6,e8,m4,ta,ma vle8.v v20,0(a2) vle8.v v16,0(a3) vsetvli a7,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v24 vadd.vv v4,v16,v4 vsetvli zero,a6,e8,m4,ta,ma vse8.v v4,0(a0) vle8.v v20,0(a2) vsetvli a7,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,a6,e8,m4,ta,ma vse8.v v4,0(a1) add a4,a4,t5 add a0,a0,a5 add a3,a3,a5 add a1,a1,a5 add a2,a2,a5 bgtut1,a5,.L12 csrrt0,vlenb sllit1,t0,4 add sp,sp,t1 jr ra .L8: ret After this patch: bar: ble a3,zero,.L5 csrra5,vlenb csrrt1,vlenb srlia5,a5,1 srlia7,t1,1 addia5,a5,-1 vsetvli a4,zero,e32,m2,ta,ma sllia3,a3,1 vmv.v.x v2,a5 vid.v v18 vmv.v.x v6,a1 vand.vi v10,v18,-2 vand.vi v0,v18,1 vadd.vi v16,v10,1 vmseq.viv0,v0,1 vand.vv v10,v10,v2 vand.vv v16,v16,v2 sllit1,t1,1 vsetvli zero,a4,e32,m2,ta,ma neg t3,a7 viota.m v4,v0 vsetvli a4,zero,e32,m2,ta,mu vmv.v.x v8,a2 vrgather.vv v14,v6,v4 vrgather.vv v12,v8,v4 vmv.v.i v2,0 vrgather.vv v14,v8,v4,v0.t vrgather.vv v12,v6,v4,v0.t .L4: mv a2,a3 mv a5,a3 bleua3,a7,.L3 mv a5,a7 .L3: vsetvli zero,a5,e32,m2,ta,ma vle32.v v6,0(a0) vsetvli a6,zero,e32,m2,ta,ma add a3,a3,t3 vrgather.vv v4,v6,v10 vrgather.vv v8,v6,v16 vsub.vv v4,v4,v12 add a0,a0,t1 vsetvli zero,a5,e32,m2,tu,ma vadd.vv v2,v2,v4 vmacc.vvv2,v14,v8 bgtua2,a7,.L4 li a5,-1 vsetvli a6,zero,e32,m2,ta,ma li a4,0 vmv.v.i v4,0 vmul.vx v0,v18,a5 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vvv0,v0,v4 vand.vi v18,v18,1 vmerge.vvm v6,v4,v2,v0 vmseq.vvv18,v18,v4 vmv.s.x v1,a4 vmv1r.v v0,v18 vredsum.vs v6,v6,v1 vmerge.vvm v4,v4,v2,v0 vmv.x.s a0,v6 vredsum.vs v4,v4,v1 vmv.x.s a5,v4 addwa0,a0,a5 ret .L5: li a0,0 ret Note that this patch triggers multiple FAILs: FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c 
execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/
[PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
Confirm dynamic LMUL algorithm works well for choosing LMUL = 4 for the PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848 But it generate horrible register spillings. The root cause is that we didn't hoist the vmv.v.x outside the loop which increase the SLP loop register pressure. So, change the COSNT_VECTOR move into vec_duplicate splitter that we can gain better optimizations: 1. better LICM. 2. More opportunities of transforming 'vv' into 'vx' in the future. Before this patch: f3: ble a4,zero,.L8 csrrt0,vlenb sllit1,t0,4 csrra6,vlenb sub sp,sp,t1 csrra5,vlenb sllia6,a6,3 sllia5,a5,2 add a6,a6,sp vsetvli a7,zero,e16,m8,ta,ma sllia4,a4,3 vid.v v8 addit6,a5,-1 vand.vi v8,v8,-2 neg t5,a5 vs8r.v v8,0(sp) vadd.vi v8,v8,1 vs8r.v v8,0(a6) j .L4 .L12: vsetvli a7,zero,e16,m8,ta,ma .L4: csrrt0,vlenb sllit0,t0,3 vl8re16.v v16,0(sp) add t0,t0,sp vmv.v.x v8,t6 mv t1,a4 vand.vv v24,v16,v8 mv a6,a4 vl8re16.v v16,0(t0) vand.vv v8,v16,v8 bleua4,a5,.L3 mv a6,a5 .L3: vsetvli zero,a6,e8,m4,ta,ma vle8.v v20,0(a2) vle8.v v16,0(a3) vsetvli a7,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v24 vadd.vv v4,v16,v4 vsetvli zero,a6,e8,m4,ta,ma vse8.v v4,0(a0) vle8.v v20,0(a2) vsetvli a7,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,a6,e8,m4,ta,ma vse8.v v4,0(a1) add a4,a4,t5 add a0,a0,a5 add a3,a3,a5 add a1,a1,a5 add a2,a2,a5 bgtut1,a5,.L12 csrrt0,vlenb sllit1,t0,4 add sp,sp,t1 jr ra .L8: ret After this patch: f3: ble a4,zero,.L6 csrra6,vlenb csrra5,vlenb sllia6,a6,2 sllia5,a5,2 addia6,a6,-1 sllia4,a4,3 neg t5,a5 vsetvli t1,zero,e16,m8,ta,ma vmv.v.x v24,a6 vid.v v8 vand.vi v8,v8,-2 vadd.vi v16,v8,1 vand.vv v8,v8,v24 vand.vv v16,v16,v24 .L4: mv t1,a4 mv a6,a4 bleua4,a5,.L3 mv a6,a5 .L3: vsetvli zero,a6,e8,m4,ta,ma vle8.v v28,0(a2) vle8.v v24,0(a3) vsetvli a7,zero,e8,m4,ta,ma vrgatherei16.vv v4,v28,v8 vadd.vv v4,v24,v4 vsetvli zero,a6,e8,m4,ta,ma vse8.v v4,0(a0) vle8.v v28,0(a2) vsetvli a7,zero,e8,m4,ta,ma vrgatherei16.vv v4,v28,v16 vadd.vv v4,v4,v24 vsetvli zero,a6,e8,m4,ta,ma vse8.v v4,0(a1) add a4,a4,t5 add a0,a0,a5 add a3,a3,a5 add a1,a1,a5 add a2,a2,a5 bgtut1,a5,.L4 .L6: ret Note that this patch triggers multiple FAILs: FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test They failed are all because of bugs on VSETVL PASS: 10dd4: 0c707057vsetvli zero,zero,e8,mf2,ta,ma 10dd8: 5e06b8d7vmv.v.i v17,13 10ddc: 9ed030d7vmv1r.v v1,v13 10de0: b21040d7vncvt.x.x.w v1,v1 > raise illegal instruction since we don't have SEW = 8 -> SEW = 4 
narrowing. 10de4: 5e0785d7 vmv.v.v v11,v15 Confirmed that the recent VSETVL refactor patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633231.html fixes all of them. So this patch should be committed after the VSETVL refactor patch. PR target/111848 gcc/Change
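For reference, the hoisting opportunity described above shows up in loops like the sketch below (an illustration only, not the f3/bar reproducer from this patch): the broadcast of the scalar mask becomes a vmv.v.x once vectorized, it is loop-invariant, and after the CONST_VECTOR move is split into vec_duplicate LICM can hoist it out of the loop instead of re-materializing it inside the loop body.

#include <stdint.h>

void
mask_all (uint16_t *restrict dst, const uint16_t *restrict src,
          uint16_t mask, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] & mask;   /* "mask" is broadcast once, outside the loop  */
}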
[PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes this following FAILs in RISC-V regression: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD: 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask). This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD. 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized. Consider this following case: void __attribute__((noipa)) f (int *restrict y, int *restrict x, int *restrict indices, int n) { for (int i = 0; i < n; ++i) { y[i * 2] = x[indices[i * 2]] + 1; y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; } } https://godbolt.org/z/WG3M3n7Mo GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES: f: ble a3,zero,.L5 .L3: vsetvli a5,a3,e8,mf4,ta,ma vsetvli zero,a5,e32,m1,ta,ma vlseg2e32.v v6,(a2) vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v6 vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v1,(a1),v2 vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v7 vsetvli zero,zero,e32,m1,ta,ma vadd.vi v4,v1,1 vsetvli zero,zero,e64,m2,ta,ma vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v2,(a1),v2 vsetvli a4,zero,e32,m1,ta,ma sllia6,a5,3 vadd.vi v5,v2,2 sub a3,a3,a5 vsetvli zero,a5,e32,m1,ta,ma vsseg2e32.v v4,(a0) add a2,a2,a6 add a0,a0,a6 bne a3,zero,.L3 .L5: ret After this patch: f: ble a3,zero,.L5 li a5,1 csrrt1,vlenb sllia5,a5,33 srlia7,t1,2 addia5,a5,1 sllia3,a3,1 neg t3,a7 vsetvli a4,zero,e64,m1,ta,ma vmv.v.x v4,a5 .L3: minua5,a3,a7 vsetvli zero,a5,e32,m1,ta,ma vle32.v v1,0(a2) vsetvli a4,zero,e64,m2,ta,ma vsext.vf2 v2,v1 vsll.vi v2,v2,2 vsetvli zero,a5,e32,m1,ta,ma vluxei64.v v2,(a1),v2 vsetvli a4,zero,e32,m1,ta,ma mv a6,a3 vadd.vv v2,v2,v4 vsetvli zero,a5,e32,m1,ta,ma vse32.v v2,0(a0) add a2,a2,t1 add a0,a0,t1 add a3,a3,t3 bgtua6,a7,.L3 .L5: ret Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch. Tested on RISC-V and Bootstrap && Regression on X86 passed. Ok for trunk ? gcc/ChangeLog: * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD. (vect_get_and_check_slp_defs): Ditto. (vect_build_slp_tree_1): Ditto. (vect_build_slp_tree_2): Ditto. * tree-vect-stmts.cc (vectorizable_load): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-gather-6.c: New test. 
--- gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++ gcc/tree-vect-slp.cc | 22 ++ gcc/tree-vect-stmts.cc| 12 +++- 3 files changed, 44 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c new file mode 100644 index 000..ff55f321854 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ + +void +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n) +{ + for (int i = 0; i < n; ++i) +{ + if (cond[i * 2]) + y[i * 2] = x[indices[i * 2]] + 1; + if (cond[i * 2 + 1]) + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; +} +} + +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index d081999a763..146dba658a2 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0) return arg1_map; case IFN_MASK_GATHER_LOAD: + case IFN_MASK_LEN_GATHER_LOAD: return arg1_arg4_map; case IFN_MASK_STORE: @@ -719,8 +720,7 @@ vect_get_and_check_
[PATCH] RISC-V: Add RVV FMA auto-vectorization support
From: Juzhe-Zhong This patch support FMA auto-vectorization pattern. 1. Let's RA decide vmacc or vmadd. 2. Fix bug of vector.md which generate incorrect information to VSETVL PASS when testing ternop-3.c. gcc/ChangeLog: * config/riscv/autovec.md (fma4): New pattern. (*fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): Add ternary enum. (emit_vlmax_ternop_insn): New function. * config/riscv/riscv-v.cc (emit_vlmax_ternop_insn): Ditto. * config/riscv/vector.md: Fix ternary patterns bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Add ternop tests. * gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test. --- gcc/config/riscv/autovec.md | 65 +++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 22 gcc/config/riscv/vector.md| 2 +- .../riscv/rvv/autovec/ternop/ternop-1.c | 27 + .../riscv/rvv/autovec/ternop/ternop-2.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop-3.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop_run-1.c | 84 ++ .../riscv/rvv/autovec/ternop/ternop_run-2.c | 104 ++ .../riscv/rvv/autovec/ternop/ternop_run-3.c | 104 ++ gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 2 + 11 files changed, 477 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 7fe4d94de39..ba1240014dc 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -373,3 +373,68 @@ DONE; } ) + +;; = +;; == Ternary arithmetic +;; = + +;; - +;; [INT] VMACC and VMADD +;; - +;; Includes: +;; - vmacc +;; - vmadd +;; - + +;; We can't expand FMA for the following reasons: +;; 1. Before RA, we don't know which multiply-add instruction is the ideal one. +;;The vmacc is the ideal instruction when operands[3] overlaps operands[0]. +;;The vmadd is the ideal instruction when operands[1|2] overlaps operands[0]. +;; 2. According to vector.md, the multiply-add patterns has 'merge' operand which +;;is the operands[5]. Since operands[5] should overlap operands[0], this operand +;;should be allocated the same regno as operands[1|2|3]. +;; 3. The 'merge' operand is always a real merge operand and we don't allow undefined +;;operand. +;; 3. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL operand. +;; +;; In this situation, we design the codegen of FMA as follows: +;; 1. clobber a scratch in the expand pattern of FMA. +;; 2. Let's RA decide which input operand (operands[1|2|3]) overlap operands[0]. +;; 3. Generate instructions (vmacc or vmadd) according to the register allocation +;;result after reload_completed. 
+(define_expand "fma4" + [(parallel +[(set (match_operand:VI 0 "register_operand" "=vr") + (plus:VI + (mult:VI + (match_operand:VI 1 "register_operand" " vr") + (match_operand:VI 2 "register_operand" " vr")) + (match_operand:VI 3 "register_operand" " vr"))) + (clobber (match_scratch:SI 4))])] + "TARGET_VECTOR" + {}) + +(define_insn_and_split "*fma" + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") + (plus:VI + (mult:VI + (match_operand:VI 1 "register_operand" " %0, vr, vr") + (match_operand:VI 2 "register_operand" " vr, vr, vr")) + (match_operand:VI 3 "register_operand"
[PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
From: Juzhe-Zhong This patch support FMA auto-vectorization pattern. 1. Let's RA decide vmacc or vmadd. 2. Fix bug of vector.md which generate incorrect information to VSETVL PASS when testing ternop-3.c. gcc/ChangeLog: * config/riscv/autovec.md (fma4): New pattern. (*fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New enum. (emit_vlmax_ternary_insn): New function. * config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto. * config/riscv/vector.md: Fix vimuladd instruction bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Add ternary tests * gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test. --- gcc/config/riscv/autovec.md | 65 +++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 20 gcc/config/riscv/vector.md| 2 +- .../riscv/rvv/autovec/ternop/ternop-1.c | 28 + .../riscv/rvv/autovec/ternop/ternop-2.c | 34 ++ .../riscv/rvv/autovec/ternop/ternop-3.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop_run-1.c | 84 ++ .../riscv/rvv/autovec/ternop/ternop_run-2.c | 104 ++ .../riscv/rvv/autovec/ternop/ternop_run-3.c | 104 ++ gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 2 + 11 files changed, 477 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 7fe4d94de39..04825df1210 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -373,3 +373,68 @@ DONE; } ) + +;; = +;; == Ternary arithmetic +;; = + +;; - +;; [INT] VMACC and VMADD +;; - +;; Includes: +;; - vmacc +;; - vmadd +;; - + +;; We can't expand FMA for the following reasons: +;; 1. Before RA, we don't know which multiply-add instruction is the ideal one. +;;The vmacc is the ideal instruction when operands[3] overlaps operands[0]. +;;The vmadd is the ideal instruction when operands[1|2] overlaps operands[0]. +;; 2. According to vector.md, the multiply-add patterns has 'merge' operand which +;;is the operands[5]. Since operands[5] should overlap operands[0], this operand +;;should be allocated the same regno as operands[1|2|3]. +;; 3. The 'merge' operand is always a real merge operand and we don't allow undefined +;;operand. +;; 4. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL operand. +;; +;; In this situation, we design the codegen of FMA as follows: +;; 1. clobber a scratch in the expand pattern of FMA. +;; 2. Let's RA decide which input operand (operands[1|2|3]) overlap operands[0]. +;; 3. Generate instructions (vmacc or vmadd) according to the register allocation +;;result after reload_completed. 
+(define_expand "fma4" + [(parallel +[(set (match_operand:VI 0 "register_operand" "=vr") + (plus:VI + (mult:VI + (match_operand:VI 1 "register_operand" " vr") + (match_operand:VI 2 "register_operand" " vr")) + (match_operand:VI 3 "register_operand" " vr"))) + (clobber (match_scratch:SI 4))])] + "TARGET_VECTOR" + {}) + +(define_insn_and_split "*fma" + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") + (plus:VI + (mult:VI + (match_operand:VI 1 "register_operand" " %0, vr, vr") + (match_operand:VI 2 "register_operand" " vr, vr, vr")) + (match_operand:VI 3 "register_operand"
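The test bodies are not reproduced in full above, so as an assumed, representative example, the kind of loop the fma<mode>4 expander targets looks like this; after reload the splitter emits vmacc or vmadd depending on which input the register allocator overlapped with the destination.

#include <stdint.h>

void
ternop_int32 (int32_t *restrict dst, const int32_t *restrict a,
              const int32_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += a[i] * b[i];   /* multiply-add: vmacc or vmadd after RA  */
}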
[PATCH] RISC-V: Remove redundant printf of abs-run.c
From: Juzhe-Zhong Notice that this testcase cause unexpected fail: FAIL: gcc.target/riscv/rvv/autovec/unop/abs-run.c (test for excess errors) Excess errors: /work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: implicit declaration of function 'printf' [-Wimplicit-function-declaration] /work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch] /work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch] /work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch] /work/home/jzzhong/work/rvv-opensource/software/host/toolchain/gcc/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c:22:7: warning: incompatible implicit declaration of built-in function 'printf' [-Wbuiltin-declaration-mismatch] spawn /work/home/jzzhong/work/rvv-opensource/output/sim/bin/spike --isa=RV64GCVZfh /work/home/jzzhong/work/rvv-opensource/output/sim/riscv64-rivai-elf/bin/pk ./abs-run.exe^M bbl loader^M^M 0 0 -64^M 1 63 -63^M 2 2 -62^M 3 61 -61^M 4 4 -60^M 5 59 -59^M 6 6 -58^M 7 57 -57^M 8 8 -56^M 9 55 -55^M 10 10 -54^M 11 53 -53^M 12 12 -52^M 13 51 -51^M Remove printf since it's unnecessary. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/abs-run.c: Remove redundant printf. --- gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c index 7404dbe037e..d864b54229b 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c @@ -19,7 +19,6 @@ vabs_##TYPE (a##TYPE, a##TYPE, SZ); \ for (int i = 0; i < SZ; i++) \ { \ - printf ("%d %d %d\n", i, a##TYPE[i], i - 64);\ if (i & 1) \ assert (a##TYPE[i] == abs (i - 64));\ else \ -- 2.36.3
[PATCH] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support
From: Juzhe-Zhong Even though we can't support floating-point operations which are depending on FRM yet, (for example vfadd support is blocked) since the RVV intrinsic doc is not updated and we can't support mode switching for this. We can support floating-point to integer conversion now since it's not depending on FRM and we don't need mode switching support for this ('rtz' conversions independent FRM). gcc/ChangeLog: * config/riscv/autovec.md (2): New pattern. * config/riscv/iterators.md: New attribute. * config/riscv/vector-iterators.md: New attribute. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test. --- gcc/config/riscv/autovec.md | 23 gcc/config/riscv/iterators.md | 4 +- gcc/config/riscv/vector-iterators.md | 5 ++ .../rvv/autovec/conversions/vfcvt_rtz-run.c | 52 +++ .../autovec/conversions/vfcvt_rtz-rv32gcv.c | 6 +++ .../autovec/conversions/vfcvt_rtz-rv64gcv.c | 6 +++ .../autovec/conversions/vfcvt_rtz-template.h | 15 ++ 7 files changed, 110 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index b24867ae4d0..3989ffb26ee 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -478,6 +478,29 @@ DONE; }) +;; = +;; == Conversions +;; = + +;; - +;; [INT<-FP] Conversions +;; - +;; Includes: +;; - vfcvt.rtz.xu.f.v +;; - vfcvt.rtz.x.f.v +;; - + +(define_expand "2" + [(set (match_operand: 0 "register_operand") + (any_fix: + (match_operand:VF 1 "register_operand")))] + "TARGET_VECTOR" +{ + insn_code icode = code_for_pred (, mode); + riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands); + DONE; +}) + ;; = ;; == Unary arithmetic ;; = diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index 8afe98e4410..d374a10810c 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -225,7 +225,9 @@ (ss_minus "sssub") (us_minus "ussub") (sign_extend "extend") -(zero_extend "zero_extend")]) +(zero_extend "zero_extend") +(fix "fix_trunc") +(unsigned_fix "fixuns_trunc")]) ;; code attributes (define_code_attr or_optab [(ior "ior") diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 70fb5b80b1b..937ec3c7f67 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -1208,6 +1208,11 @@ (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI") ]) +(define_mode_attr vconvert [ + (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si") + (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di") +]) + (define_mode_attr VNCONVERT [ (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI") (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF") diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c b/gcc/testsuite/gcc.target/riscv/r
[PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support
From: Juzhe-Zhong Even though we can't support floating-point operations which are depending on FRM yet, (for example vfadd support is blocked) since the RVV intrinsic doc is not updated and we can't support mode switching for this. We can support floating-point to integer conversion now since it's not depending on FRM and we don't need mode switching support for this ('rtz' conversions independent FRM). gcc/ChangeLog: * config/riscv/autovec.md (2): New pattern. * config/riscv/iterators.md: New attribute. * config/riscv/vector-iterators.md: New attribute. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test. --- gcc/config/riscv/autovec.md | 23 gcc/config/riscv/iterators.md | 4 +- gcc/config/riscv/vector-iterators.md | 5 ++ .../rvv/autovec/conversions/vfcvt_rtz-run.c | 52 +++ .../autovec/conversions/vfcvt_rtz-rv32gcv.c | 6 +++ .../autovec/conversions/vfcvt_rtz-rv64gcv.c | 6 +++ .../autovec/conversions/vfcvt_rtz-template.h | 15 ++ 7 files changed, 110 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index b24867ae4d0..3989ffb26ee 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -478,6 +478,29 @@ DONE; }) +;; = +;; == Conversions +;; = + +;; - +;; [INT<-FP] Conversions +;; - +;; Includes: +;; - vfcvt.rtz.xu.f.v +;; - vfcvt.rtz.x.f.v +;; - + +(define_expand "2" + [(set (match_operand: 0 "register_operand") + (any_fix: + (match_operand:VF 1 "register_operand")))] + "TARGET_VECTOR" +{ + insn_code icode = code_for_pred (, mode); + riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands); + DONE; +}) + ;; = ;; == Unary arithmetic ;; = diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index 8afe98e4410..d374a10810c 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -225,7 +225,9 @@ (ss_minus "sssub") (us_minus "ussub") (sign_extend "extend") -(zero_extend "zero_extend")]) +(zero_extend "zero_extend") +(fix "fix_trunc") +(unsigned_fix "fixuns_trunc")]) ;; code attributes (define_code_attr or_optab [(ior "ior") diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 70fb5b80b1b..937ec3c7f67 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -1208,6 +1208,11 @@ (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI") ]) +(define_mode_attr vconvert [ + (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si") + (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di") +]) + (define_mode_attr VNCONVERT [ (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI") (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF") diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c b/gcc/testsuite/gcc.target/riscv/r
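The vfcvt_rtz-template.h contents are truncated above, so treat the following as an assumed, representative conversion loop of the kind the new expander vectorizes with vfcvt.rtz.x.f.v / vfcvt.rtz.xu.f.v; C's float-to-integer cast truncates towards zero, which is exactly the RTZ behaviour and therefore independent of FRM.

#include <stdint.h>

void
convert_float_to_int (int32_t *restrict dst, const float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int32_t) src[i];   /* signed variant: vfcvt.rtz.x.f.v   */
}

void
convert_float_to_uint (uint32_t *restrict dst, const float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint32_t) src[i];  /* unsigned variant: vfcvt.rtz.xu.f.v  */
}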
[PATCH] RISC-V: Add RVV FNMA auto-vectorization support
From: Juzhe-Zhong Like FMA, Add FNMA auto-vectorization support. gcc/ChangeLog: * config/riscv/autovec.md (fnma4): New pattern. (*fnma): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test. --- gcc/config/riscv/autovec.md | 45 .../riscv/rvv/autovec/ternop/ternop-4.c | 28 + .../riscv/rvv/autovec/ternop/ternop-5.c | 34 ++ .../riscv/rvv/autovec/ternop/ternop-6.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop_run-4.c | 84 ++ .../riscv/rvv/autovec/ternop/ternop_run-5.c | 104 ++ .../riscv/rvv/autovec/ternop/ternop_run-6.c | 104 ++ 7 files changed, 432 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index eff3e484fb4..20004a8af27 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -606,3 +606,48 @@ } [(set_attr "type" "vimuladd") (set_attr "mode" "")]) + +;; - +;; [INT] VMACC and VMADD +;; - +;; Includes: +;; - vnmsac +;; - vnmsub +;; - + +(define_expand "fnma4" + [(parallel +[(set (match_operand:VI 0 "register_operand" "=vr") + (minus:VI + (match_operand:VI 3 "register_operand" " vr") + (mult:VI + (match_operand:VI 1 "register_operand" " vr") + (match_operand:VI 2 "register_operand" " vr" + (clobber (match_scratch:SI 4))])] + "TARGET_VECTOR" + {}) + +(define_insn_and_split "*fnma" + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") + (minus:VI + (match_operand:VI 3 "register_operand" " vr, 0, vr") + (mult:VI + (match_operand:VI 1 "register_operand" " %0, vr, vr") + (match_operand:VI 2 "register_operand" " vr, vr, vr" + (clobber (match_scratch:SI 4 "=r,r,r"))] + "TARGET_VECTOR" + "#" + "&& reload_completed" + [(const_int 0)] + { +PUT_MODE (operands[4], Pmode); +riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); +if (which_alternative == 2) + emit_insn (gen_rtx_SET (operands[0], operands[3])); +rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (mode), + riscv_vector::RVV_TERNOP, ops, operands[4]); +DONE; + } + [(set_attr "type" "vimuladd") + (set_attr "mode" "")]) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c new file mode 100644 index 000..22d11de89a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */ + +#include + +#define TEST_TYPE(TYPE) \ + __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \ + TYPE *__restrict a, \ + TYPE *__restrict b, int n) \ + { \ +for (int i = 0; i < n; i++) \ + dst[i] += -(a[i] * b[i]); \ + } + 
+#define TEST_ALL() \ + TEST_T
[PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support
From: Juzhe-Zhong Like FMA, Add FNMA (VNMSAC or VNMSUB) auto-vectorization support. gcc/ChangeLog: * config/riscv/autovec.md (fnma4): New pattern. (*fnma): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test. --- gcc/config/riscv/autovec.md | 45 .../riscv/rvv/autovec/ternop/ternop-4.c | 28 + .../riscv/rvv/autovec/ternop/ternop-5.c | 34 ++ .../riscv/rvv/autovec/ternop/ternop-6.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop_run-4.c | 84 ++ .../riscv/rvv/autovec/ternop/ternop_run-5.c | 104 ++ .../riscv/rvv/autovec/ternop/ternop_run-6.c | 104 ++ 7 files changed, 432 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index eff3e484fb4..a1028d71467 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -606,3 +606,48 @@ } [(set_attr "type" "vimuladd") (set_attr "mode" "")]) + +;; - +;; [INT] VNMSAC and VNMSUB +;; - +;; Includes: +;; - vnmsac +;; - vnmsub +;; - + +(define_expand "fnma4" + [(parallel +[(set (match_operand:VI 0 "register_operand" "=vr") + (minus:VI + (match_operand:VI 3 "register_operand" " vr") + (mult:VI + (match_operand:VI 1 "register_operand" " vr") + (match_operand:VI 2 "register_operand" " vr" + (clobber (match_scratch:SI 4))])] + "TARGET_VECTOR" + {}) + +(define_insn_and_split "*fnma" + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") + (minus:VI + (match_operand:VI 3 "register_operand" " vr, 0, vr") + (mult:VI + (match_operand:VI 1 "register_operand" " %0, vr, vr") + (match_operand:VI 2 "register_operand" " vr, vr, vr" + (clobber (match_scratch:SI 4 "=r,r,r"))] + "TARGET_VECTOR" + "#" + "&& reload_completed" + [(const_int 0)] + { +PUT_MODE (operands[4], Pmode); +riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); +if (which_alternative == 2) + emit_insn (gen_rtx_SET (operands[0], operands[3])); +rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (mode), + riscv_vector::RVV_TERNOP, ops, operands[4]); +DONE; + } + [(set_attr "type" "vimuladd") + (set_attr "mode" "")]) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c new file mode 100644 index 000..22d11de89a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */ + +#include + +#define TEST_TYPE(TYPE) \ + __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \ + TYPE *__restrict a, \ + TYPE *__restrict b, int n) \ + { \ +for (int i = 0; i < n; i++) \ + dst[i] += -(a[i] * 
b[i]); \ + } + +#define TEST_ALL()
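For reference, the new fnma4 expander implements d = -(a * b) + c (operand 3 minus the product of operands 1 and 2), which is exactly what vnmsac.vv/vnmsub.vv compute. A minimal scalar sketch of a loop this pattern vectorizes (the function name is illustrative, not taken from the testsuite):

  #include <stdint.h>

  /* Scalar form of the FNMA operation: acc = -(a * b) + acc.
     With the RVV autovectorizer enabled this is expected to become
     vnmsac.vv (or vnmsub.vv, depending on which operand is overwritten).  */
  void
  fnma_i32 (int32_t *__restrict acc, int32_t *__restrict a,
            int32_t *__restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      acc[i] -= a[i] * b[i];
  }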
[PATCH] RISC-V: Fix warning in riscv.md
From: Juzhe-Zhong Notice there are warnings: ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md: In function 'rtx_def* gen_anddi3(rtx, rtx, rtx)': ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode)) Add an unsigned conversion to fix these warnings. gcc/ChangeLog: * config/riscv/riscv.md: Fix signed and unsigned comparison warning. --- gcc/config/riscv/riscv.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index aba203318a7..3d71f59c3a9 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -1353,9 +1353,9 @@ if (CONST_INT_P (operands[2])) { enum machine_mode tmode = VOIDmode; - if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) + if ((unsigned HOST_WIDE_INT) INTVAL (operands[2]) == GET_MODE_MASK (HImode)) tmode = HImode; - else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode)) + else if ((unsigned HOST_WIDE_INT) INTVAL (operands[2]) == GET_MODE_MASK (SImode)) tmode = SImode; if (tmode != VOIDmode) -- 2.36.3
[PATCH V2] RISC-V: Fix warning in riscv.md
From: Juzhe-Zhong Notice there are warnings: ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md: In function 'rtx_def* gen_anddi3(rtx, rtx, rtx)': ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode)) Add an unsigned conversion to fix these warnings. gcc/ChangeLog: * config/riscv/riscv.md: Fix signed and unsigned comparison warning. --- gcc/config/riscv/riscv.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index aba203318a7..f545874edc1 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -1353,9 +1353,9 @@ if (CONST_INT_P (operands[2])) { enum machine_mode tmode = VOIDmode; - if (INTVAL (operands[2]) == GET_MODE_MASK (HImode)) + if (UINTVAL (operands[2]) == GET_MODE_MASK (HImode)) tmode = HImode; - else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode)) + else if (UINTVAL (operands[2]) == GET_MODE_MASK (SImode)) tmode = SImode; if (tmode != VOIDmode) -- 2.36.3
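The warning is easy to reproduce outside of GCC: GET_MODE_MASK yields an unsigned HOST_WIDE_INT while INTVAL returns a signed one, so the comparison mixes signedness; UINTVAL simply compares as unsigned. A reduced stand-alone illustration in plain C (not GCC sources; the names are made up):

  /* Compile with -Wsign-compare.  */
  int
  warns (long long val, unsigned long long mask)
  {
    return val == mask;                        /* -Wsign-compare triggers here */
  }

  int
  fixed (long long val, unsigned long long mask)
  {
    return (unsigned long long) val == mask;   /* explicit cast, like UINTVAL */
  }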
[PATCH] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong Follow Richi's suggestion, I change current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. ALL tests (decrement IV) of RVV are passed. Ok for trunk? gcc/ChangeLog: * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow. (vect_set_loop_condition_partial_vectors): Ditto. --- gcc/tree-vect-loop-manip.cc | 40 + 1 file changed, 27 insertions(+), 13 deletions(-) diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index acf3642ceb2..ef28711c58f 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, gimple_stmt_iterator loop_cond_gsi, rgroup_controls *rgc, tree niters, tree niters_skip, bool might_wrap_p, -tree *iv_step) +tree *iv_step, tree *compare_step) { tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo); tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); @@ -538,24 +538,26 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... - ivtmp_35 = ivtmp_9 - _36; + ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4]; ... - if (ivtmp_35 != 0) -goto ; [83.33%] + if (ivtmp_9 > POLY_INT_CST [4, 4]) +goto ; [83.33%] else -goto ; [16.67%] +goto ; [16.67%] */ nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total); tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, -insert_after, &index_before_incr, &index_after_incr); + create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, +&incr_gsi, insert_after, &index_before_incr, +&index_after_incr); gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, index_before_incr, nitems_step)); *iv_step = step; - return index_after_incr; + *compare_step = nitems_step; + return index_before_incr; } /* Create increment IV. */ @@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, arbitrarily pick the last. */ tree test_ctrl = NULL_TREE; tree iv_step = NULL_TREE; + tree compare_step = NULL_TREE; rgroup_controls *rgc; rgroup_controls *iv_rgc = nullptr; unsigned int i; @@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, &preheader_seq, &header_seq, loop_cond_gsi, rgc, niters, niters_skip, might_wrap_p, -&iv_step); +&iv_step, &compare_step); iv_rgc = rgc; } @@ -884,11 +887,22 @@ vect_set_loop_condition_partial_vectors (class loop *loop, /* Get a boolean result that tells us whether to iterate. */ edge exit_edge = single_exit (loop); - tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR; + gcond *cond_stmt; tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl)); - gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl, - NULL_TREE, NULL_TREE); - gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) +{ + gcc_assert (compare_step); + cond_stmt = gimple_build_cond (GT_EXPR, test_ctrl, compare_step, +NULL_TREE, NULL_TREE); + gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); +} + else +{ + tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? 
EQ_EXPR : NE_EXPR; + cond_stmt + = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE); + gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); +} /* The loop iterates (NITERS - 1) / VF + 1 times. Subtract one from this to get the latch count. */ -- 2.36.1
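In scalar terms, the change keeps the IV stepping by the constant VF and moves the exit test onto the counter value before the decrement, which is what lets SCEV treat the IV as an affine induction. A minimal sketch of the two flows (plain C; remain is the remaining element count, vf the vectorization factor, and remain > 0 is assumed):

  /* Old flow: the decrement is the data-dependent MIN result, so the IV
     is not a simple affine induction for SCEV.  */
  unsigned
  old_flow (unsigned remain, unsigned vf)
  {
    unsigned processed = 0;
    do
      {
        unsigned len = remain < vf ? remain : vf;   /* MIN (vf, remain) */
        processed += len;                           /* stand-in for the vector body */
        remain -= len;
      }
    while (remain != 0);
    return processed;
  }

  /* New flow: the IV steps by the constant VF; the (possibly wrapped)
     decremented value is never tested, only the value before the step.  */
  unsigned
  new_flow (unsigned remain, unsigned vf)
  {
    unsigned processed = 0;
    unsigned old_remain;
    do
      {
        old_remain = remain;
        unsigned len = remain < vf ? remain : vf;   /* still MIN (vf, remain) */
        processed += len;
        remain -= vf;                               /* constant step */
      }
    while (old_remain > vf);                        /* matches the GT_EXPR exit test */
    return processed;
  }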
[PATCH] RISC-V: Support RVV permutation auto-vectorization
From: Juzhe-Zhong This patch supports vector permutation for VLS only by vec_perm pattern. We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation in the future. gcc/ChangeLog: * config/riscv/autovec.md (vec_perm): New pattern. * config/riscv/predicates.md (vector_perm_operand): New predicate. * config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_vec_perm): New function. * config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto. (gen_const_vector_dup): Ditto. (emit_vlmax_gather_insn): Ditto. (emit_vlmax_masked_gather_mu_insn): Ditto. (expand_vec_perm): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test. --- gcc/config/riscv/autovec.md | 19 +++ gcc/config/riscv/predicates.md| 4 + gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 130 + .../riscv/rvv/autovec/vls-vlmax/perm-1.c | 58 .../riscv/rvv/autovec/vls-vlmax/perm-2.c | 33 + .../riscv/rvv/autovec/vls-vlmax/perm-3.c | 29 .../riscv/rvv/autovec/vls-vlmax/perm-4.c | 58 .../riscv/rvv/autovec/vls-vlmax/perm-5.c | 49 +++ .../riscv/rvv/autovec/vls-vlmax/perm-6.c | 58 .../riscv/rvv/autovec/vls-vlmax/perm-7.c | 49 +++ .../riscv/rvv/autovec/vls-vlmax/perm.h| 70 + .../riscv/rvv/autovec/vls-vlmax/perm_run-1.c | 104 + .../riscv/rvv/autovec/vls-vlmax/perm_run-2.c | 32 .../riscv/rvv/autovec/vls-vlmax/perm_run-3.c | 20 +++ .../riscv/rvv/autovec/vls-vlmax/perm_run-4.c | 104 + .../riscv/rvv/autovec/vls-vlmax/perm_run-5.c | 137 ++ .../riscv/rvv/autovec/vls-vlmax/perm_run-6.c | 104 + .../riscv/rvv/autovec/vls-vlmax/perm_run-7.c | 135 + 19 files changed, 1195 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 0314e7587d0..4834bb4b412 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -83,6 +83,25 @@ } ) +;; - +;; [INT,FP] permutation +;; - +;; This is the pattern permutes the vector +;; - + +(define_expand "vec_perm" + [(match_operand:V 0 "register_operand") + (match_operand:V 1 "register_operand") + (match_operand:V 2 "register_operand") + (match_operand: 3 &
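For context, vec_perm is the standard name the middle end expands for VEC_PERM_EXPR, e.g. GCC's __builtin_shuffle on fixed-length (VLS) vector types, which this patch lowers through the vrgather-based helpers listed in the ChangeLog. A minimal sketch of source code that reaches the new pattern (the 4 x 32-bit vector type and function name are illustrative):

  #include <stdint.h>

  typedef int32_t v4si __attribute__ ((vector_size (16)));

  /* Two-source shuffle: selector elements 0..3 pick from A, 4..7 pick
     from B (indices are taken modulo 8).  Expanded through vec_perm.  */
  v4si
  shuffle2 (v4si a, v4si b, v4si sel)
  {
    return __builtin_shuffle (a, b, sel);
  }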
[PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization
From: Juzhe-Zhong Apparently, we are missing vrsub.vi tests. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto. --- .../riscv/rvv/autovec/binop/vsub-run.c| 30 ++- .../riscv/rvv/autovec/binop/vsub-rv32gcv.c| 1 + .../riscv/rvv/autovec/binop/vsub-rv64gcv.c| 1 + .../riscv/rvv/autovec/binop/vsub-template.h | 28 + 4 files changed, 59 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c index 8c6d8e88d1a..4f254872e33 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c @@ -27,6 +27,22 @@ for (int i = 0; i < SZ; i++) \ assert (as##TYPE[i] == 999 - VAL); +#define RUN3(TYPE) \ + TYPE as2##TYPE[SZ]; \ + for (int i = 0; i < SZ; i++) \ +as2##TYPE[i] = i * 33 - 779; \ + vsubi_##TYPE (as2##TYPE, as2##TYPE, SZ); \ + for (int i = 0; i < SZ; i++) \ +assert (as2##TYPE[i] == (TYPE)(-16 - (i * 33 - 779))); + +#define RUN4(TYPE) \ + TYPE as3##TYPE[SZ]; \ + for (int i = 0; i < SZ; i++) \ +as3##TYPE[i] = i * -17 + 667; \ + vsubi2_##TYPE (as3##TYPE, as3##TYPE, SZ);\ + for (int i = 0; i < SZ; i++) \ +assert (as3##TYPE[i] == (TYPE)(15 - (i * -17 + 667))); + #define RUN_ALL() \ RUN(int16_t, 1) \ RUN(uint16_t, 2) \ @@ -39,7 +55,19 @@ RUN2(int32_t, 9) \ RUN2(uint32_t, 10)\ RUN2(int64_t, 11) \ - RUN2(uint64_t, 12) + RUN2(uint64_t, 12)\ + RUN3(int16_t) \ + RUN3(uint16_t)\ + RUN3(int32_t) \ + RUN3(uint32_t)\ + RUN3(int64_t) \ + RUN3(uint64_t)\ + RUN4(int16_t) \ + RUN4(uint16_t)\ + RUN4(int32_t) \ + RUN4(uint32_t)\ + RUN4(int64_t) \ + RUN4(uint64_t) int main () { diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c index e2bdd0fe904..a0d3802be65 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c @@ -4,3 +4,4 @@ #include "vsub-template.h" /* { dg-final { scan-assembler-times {\tvsub\.vv} 12 } } */ +/* { dg-final { scan-assembler-times {\tvrsub\.vi} 12 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c index f7a2691b9f3..562c026a7e4 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c @@ -4,3 +4,4 @@ #include "vsub-template.h" /* { dg-final { scan-assembler-times {\tvsub\.vv} 12 } } */ +/* { dg-final { scan-assembler-times {\tvrsub\.vi} 12 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h index 8c0a9c99217..47f07f13462 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h @@ -16,6 +16,22 @@ dst[i] = a[i] - b; \ } +#define TEST3_TYPE(TYPE) \ + __attribute__((noipa)) \ + void vsubi_##TYPE (TYPE *dst, TYPE *a, int n)\ + {\ +for (int i = 0; i < n; i++)\ + dst[i] = -16 - a[i]; \ + } + +#define TEST4_TYPE(TYPE) \ + __attribute__((noipa)) \ + void vsubi2_##TYPE (TYPE *dst, TYPE *a, int n) \ + {\ +for (int i = 0; i < n; i++)\ + dst[i] = 15 - a[i]; \ + } + /* 
*int8_t not autovec currently. */ #define TEST_ALL() \ TEST_TYPE(int16_t)\ @@ -30,5 +46,17 @@ TEST2_TYPE(uint32_t) \ TEST2_TYPE(int64_t) \ TEST2_TYPE(uint64_t) + TEST3_TYPE(int16_t) \ + TEST3_TYPE(uint16_t) \ + TEST3_TYPE(int32_t) \ + TEST3_TYPE(uint32_t) \ + TEST3_TYPE(int64_t) \ + TEST3_TYPE(uint64_t) \ + TEST4_TYPE(int16_t) \ + TEST4_TYPE(uint16_t) \ + TEST4_TYPE(int32_t) \ + TES
[PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)
From: Juzhe-Zhong Based on the discussion here: https://github.com/riscv/riscv-v-spec/issues/884 vfwcvt doesn't depend on FRM. So remove FRM in preparation for mode-switching support. gcc/ChangeLog: * config/riscv/vector.md: Remove FRM. --- gcc/config/riscv/vector.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index cd696da5d89..28e7e63ce69 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -7180,10 +7180,8 @@ (match_operand 5 "const_int_operand" "i,i") (match_operand 6 "const_int_operand" "i,i") (match_operand 7 "const_int_operand" "i,i") -(match_operand 8 "const_int_operand" "i,i") (reg:SI VL_REGNUM) -(reg:SI VTYPE_REGNUM) -(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (float_extend:VWEXTF (match_operand: 3 "register_operand" " vr, vr")) (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0")))] -- 2.36.1
[PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)
From: Juzhe-Zhong Based on the discussion here: https://github.com/riscv/riscv-v-spec/issues/884 vfwcvt.f.x.v doesn't depend on FRM. So remove FRM in preparation for mode-switching support. gcc/ChangeLog: * config/riscv/vector.md: Remove FRM. --- gcc/config/riscv/vector.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 28e7e63ce69..3c4565dc775 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -7159,10 +7159,8 @@ (match_operand 5 "const_int_operand""i,i") (match_operand 6 "const_int_operand""i,i") (match_operand 7 "const_int_operand""i,i") -(match_operand 8 "const_int_operand""i,i") (reg:SI VL_REGNUM) -(reg:SI VTYPE_REGNUM) -(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (any_float:VF (match_operand: 3 "register_operand" " vr, vr")) (match_operand:VF 2 "vector_merge_operand" " vu,0")))] -- 2.36.1
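Both of these removals rest on the same fact, which the commit messages only reference via the spec discussion: a widening conversion can always represent its source value exactly (float to double, and a SEW-bit integer to a 2*SEW-bit float), so no rounding ever happens and the dynamic rounding mode in FRM is never consulted. A one-line scalar illustration:

  /* Exact for every float input: double has more than enough precision,
     so the result never depends on the rounding mode.  The same argument
     applies to e.g. int32_t -> double and int16_t -> float.  */
  double
  widen (float x)
  {
    return (double) x;
  }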
[PATCH] RISC-V: Remove FRM for vfncvt.rod instruction
From: Juzhe-Zhong Apparently, the vfncvt.rod rounding mode (round-to-odd) is encoded in the instruction itself, so we don't need FRM. gcc/ChangeLog: * config/riscv/vector.md: Remove FRM. --- gcc/config/riscv/vector.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 3c4565dc775..cd41ebbb24f 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -7286,10 +7286,8 @@ (match_operand 5 "const_int_operand" " i, i, i, i,i,i") (match_operand 6 "const_int_operand" " i, i, i, i,i,i") (match_operand 7 "const_int_operand" " i, i, i, i,i,i") -(match_operand 8 "const_int_operand" " i, i, i, i,i,i") (reg:SI VL_REGNUM) -(reg:SI VTYPE_REGNUM) -(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (unspec: [(float_truncate: (match_operand:VWEXTF 3 "register_operand" " 0, 0, 0, 0, vr, vr"))] UNSPEC_ROD) -- 2.36.1
[PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimization for RVV auto-vectorization
From: Juzhe-Zhong The approach is quite simple and obvious, changing extension pattern into define_insn_and_split will make combine PASS combine into widen operations naturally. gcc/ChangeLog: * config/riscv/autovec.md (2): Change expand into define_insn_and_split. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: * gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test. --- gcc/config/riscv/autovec.md | 13 --- .../riscv/rvv/autovec/widen/widen-1.c | 27 +++ .../riscv/rvv/autovec/widen/widen-2.c | 27 +++ .../riscv/rvv/autovec/widen/widen-3.c | 27 +++ .../riscv/rvv/autovec/widen/widen-4.c | 23 + .../riscv/rvv/autovec/widen/widen_run-1.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-2.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-3.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-4.c | 31 + gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 13 +++ 10 files changed, 259 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-4.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 4834bb4b412..e96de60123b 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -401,16 +401,21 @@ ;; - vsext.vf[2|4|8] ;; - -(define_expand "2" - [(set (match_operand:VWEXTI 0 "register_operand") +(define_insn_and_split "2" + [(set (match_operand:VWEXTI 0 "register_operand" "=&vr") (any_extend:VWEXTI - (match_operand: 1 "register_operand")))] + (match_operand: 1 "register_operand" "vr")))] "TARGET_VECTOR" + "#" + "&& can_create_pseudo_p ()" + [(const_int 0)] { insn_code icode = code_for_pred_vf2 (, mode); riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands); DONE; -}) +} + [(set_attr "type" "vext") + (set_attr "mode" "")]) (define_expand "2" [(set (match_operand:VQEXTI 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c new file mode 100644 index 000..00edecab089 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */ + +#include + +#define TEST_TYPE(TYPE1, TYPE2) \ + __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (TYPE1 *__restrict dst, \ + TYPE2 *__restrict a, \ + TYPE2 *__restrict b, \ + int n) \ + { \ +for (int i = 0; i < n; i++) \ + dst[i] = (TYPE1) a[i] + (TYPE1) b[i]; \ + } + +#define TEST_ALL() \ + TEST_TYPE (int16_t, int8_t) \ + TEST_TYPE (uint16_t, 
uint8_t) \ + TEST_TYPE (int32_t, int16_t) \ + TEST_TYPE (uint32_t, uint16_t) \ + TEST_TYPE (int64_t, int32_t)
[PATCH V2] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimization for RVV auto-vectorization
From: Juzhe-Zhong Base on V1 patch, adding comment: ;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS ;; to combine instructions as below: ;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv gcc/ChangeLog: * config/riscv/autovec.md (2): Change expand into define_insn_and_split. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: * gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test. --- gcc/config/riscv/autovec.md | 16 ++--- .../riscv/rvv/autovec/widen/widen-1.c | 27 +++ .../riscv/rvv/autovec/widen/widen-2.c | 27 +++ .../riscv/rvv/autovec/widen/widen-3.c | 27 +++ .../riscv/rvv/autovec/widen/widen-4.c | 23 + .../riscv/rvv/autovec/widen/widen_run-1.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-2.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-3.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-4.c | 31 + gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 13 +++ 10 files changed, 262 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-4.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 4834bb4b412..2a21ce3f93c 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -401,16 +401,24 @@ ;; - vsext.vf[2|4|8] ;; - -(define_expand "2" - [(set (match_operand:VWEXTI 0 "register_operand") +;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS +;; to combine instructions as below: +;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv +(define_insn_and_split "2" + [(set (match_operand:VWEXTI 0 "register_operand" "=&vr") (any_extend:VWEXTI - (match_operand: 1 "register_operand")))] + (match_operand: 1 "register_operand" "vr")))] "TARGET_VECTOR" + "#" + "&& can_create_pseudo_p ()" + [(const_int 0)] { insn_code icode = code_for_pred_vf2 (, mode); riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands); DONE; -}) +} + [(set_attr "type" "vext") + (set_attr "mode" "")]) (define_expand "2" [(set (match_operand:VQEXTI 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c new file mode 100644 index 000..00edecab089 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */ + +#include + +#define TEST_TYPE(TYPE1, TYPE2) \ + __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (TYPE1 *__restrict dst, \ + TYPE2 
*__restrict a, \ + TYPE2 *__restrict b, \ + int n) \ + { \ +for (int i = 0; i < n; i++) \ + dst[i] = (TYPE1) a[i] + (TYPE1) b[i]; \ + } + +#define TEST_ALL() \ + TEST_TYPE (int16_t, int8_t) \ + TEST_TYPE (uint16_t, uint8_t)
[PATCH V2] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong Follow Richi's suggestion, I change current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. Include fixes from kewen. This patch will need to wait for Kewen's test feedback. Testing on X86 is on-going Co-Authored by: Kewen Lin gcc/ChangeLog: * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow. (vect_set_loop_condition_partial_vectors): Ditto. --- gcc/tree-vect-loop-manip.cc | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index acf3642ceb2..3f735945e67 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, gimple_stmt_iterator loop_cond_gsi, rgroup_controls *rgc, tree niters, tree niters_skip, bool might_wrap_p, -tree *iv_step) +tree *iv_step, tree *compare_step) { tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo); tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... - ivtmp_35 = ivtmp_9 - _36; + ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4]; ... - if (ivtmp_35 != 0) + if (ivtmp_9 > POLY_INT_CST [4, 4]) goto ; [83.33%] else goto ; [16.67%] @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, -insert_after, &index_before_incr, &index_after_incr); + create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, +&incr_gsi, insert_after, &index_before_incr, +&index_after_incr); gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, index_before_incr, nitems_step)); *iv_step = step; - return index_after_incr; + *compare_step = nitems_step; + return index_before_incr; } /* Create increment IV. */ @@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, arbitrarily pick the last. */ tree test_ctrl = NULL_TREE; tree iv_step = NULL_TREE; + tree compare_step = NULL_TREE; rgroup_controls *rgc; rgroup_controls *iv_rgc = nullptr; unsigned int i; @@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, &preheader_seq, &header_seq, loop_cond_gsi, rgc, niters, niters_skip, might_wrap_p, -&iv_step); +&iv_step, &compare_step); iv_rgc = rgc; } @@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop, /* Get a boolean result that tells us whether to iterate. */ edge exit_edge = single_exit (loop); - tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR; - tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl)); - gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl, - NULL_TREE, NULL_TREE); + gcond *cond_stmt; + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) +{ + gcc_assert (compare_step); + tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR; + cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE, +NULL_TREE); +} + else +{ + tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? 
EQ_EXPR : NE_EXPR; + tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl)); + cond_stmt + = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE); +} gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); /* The loop iterates (NITERS - 1) / VF + 1 times. -- 2.36.3
[PATCH V2] RISC-V: Support RVV permutation auto-vectorization
From: Juzhe-Zhong This patch supports vector permutation for VLS only by vec_perm pattern. We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation in the future. Fixed following comments from Robin. Ok for trunk? gcc/ChangeLog: * config/riscv/autovec.md (vec_perm): New pattern. * config/riscv/predicates.md (vector_perm_operand): New predicate. * config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_vec_perm): New function. * config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto. (gen_const_vector_dup): Ditto. (emit_vlmax_gather_insn): Ditto. (emit_vlmax_masked_gather_mu_insn): Ditto. (expand_vec_perm): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test. --- gcc/config/riscv/autovec.md | 18 +++ gcc/config/riscv/predicates.md| 4 + gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 153 ++ .../riscv/rvv/autovec/vls-vlmax/perm-1.c | 58 +++ .../riscv/rvv/autovec/vls-vlmax/perm-2.c | 33 .../riscv/rvv/autovec/vls-vlmax/perm-3.c | 29 .../riscv/rvv/autovec/vls-vlmax/perm-4.c | 58 +++ .../riscv/rvv/autovec/vls-vlmax/perm-5.c | 49 ++ .../riscv/rvv/autovec/vls-vlmax/perm-6.c | 58 +++ .../riscv/rvv/autovec/vls-vlmax/perm-7.c | 49 ++ .../riscv/rvv/autovec/vls-vlmax/perm.h| 70 .../riscv/rvv/autovec/vls-vlmax/perm_run-1.c | 104 .../riscv/rvv/autovec/vls-vlmax/perm_run-2.c | 32 .../riscv/rvv/autovec/vls-vlmax/perm_run-3.c | 20 +++ .../riscv/rvv/autovec/vls-vlmax/perm_run-4.c | 104 .../riscv/rvv/autovec/vls-vlmax/perm_run-5.c | 137 .../riscv/rvv/autovec/vls-vlmax/perm_run-6.c | 104 .../riscv/rvv/autovec/vls-vlmax/perm_run-7.c | 135 19 files changed, 1217 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 3a1e1316732..5c3aad7ee44 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -83,6 +83,24 @@ } ) +;; - +;; [INT,FP] permutation +;; - +;; This is the pattern permutes the vector +;; - + +(define_expand "vec_perm" + [(match_operand:V 0 "register_operand") + (match_operand:V 1 "register_operand") + (match_operand:V 2 "regis
[PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization
From: Juzhe-Zhong 1. This patch optimize the codegen of the following auto-vectorization codes: void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n) { for (int i = 0; i < n; i++) c[i] = (int64_t)a[i] + b[i]; } Combine instruction from: ... vsext.vf2 vadd.vv ... into: ... vwadd.wv ... Since for PLUS operation, GCC prefer the following RTL operand order when combining: (plus: (sign_extend:..) (reg:) instead of (plus: (reg:..) (sign_extend:) which is different from MINUS pattern. I split patterns of vwadd/vwsub, and add dedicated patterns for them. 2. This patch not only optimize the case as above (1) mentioned, also enhance vwadd.vv/vwsub.vv optimization for complicate PLUS/MINUS codes, consider this following codes: __attribute__ ((noipa)) void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2, int8_t *__restrict b2, int n) { for (int i = 0; i < n; i++) { dst[i] = (int16_t) a[i] + (int16_t) b[i]; dst2[i] = (int16_t) a2[i] + (int16_t) b[i]; dst3[i] = (int16_t) a2[i] + (int16_t) a[i]; } } Before this patch: ... vsetvli zero,a6,e8,mf2,ta,ma vle8.v v2,0(a3) vle8.v v1,0(a4) vsetvli t1,zero,e16,m1,ta,ma vsext.vf2 v3,v2 vsext.vf2 v2,v1 vadd.vv v1,v2,v3 vsetvli zero,a6,e16,m1,ta,ma vse16.v v1,0(a0) vle8.v v4,0(a5) vsetvli t1,zero,e16,m1,ta,ma vsext.vf2 v1,v4 vadd.vv v2,v1,v2 ... After this patch: ... vsetvli zero,a6,e8,mf2,ta,ma vle8.v v3,0(a4) vle8.v v1,0(a3) vsetvli t4,zero,e8,mf2,ta,ma vwadd.vvv2,v1,v3 vsetvli zero,a6,e16,m1,ta,ma vse16.v v2,0(a0) vle8.v v2,0(a5) vsetvli t4,zero,e8,mf2,ta,ma vwadd.vvv4,v3,v2 vsetvli zero,a6,e16,m1,ta,ma vse16.v v4,0(a1) vsetvli t4,zero,e8,mf2,ta,ma sub a7,a7,a6 vwadd.vvv3,v2,v1 vsetvli zero,a6,e16,m1,ta,ma vse16.v v3,0(a2) ... The reason why current upstream GCC can not optimize codes using vwadd thoroughly is combine PASS needs intermediate RTL IR (extend one of the operand pattern (vwadd.wv)), then base on this intermediate RTL IR, extend the other operand to generate vwadd.vv. So vwadd.wv/vwsub.wv definitely helps to vwadd.vv/vwsub.vv code optimizations. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv intrinsic API expander * config/riscv/vector.md (@pred_single_widen_): Remove it. (@pred_single_widen_sub): New pattern. (@pred_single_widen_add): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test. 
--- .../riscv/riscv-vector-builtins-bases.cc | 8 +++-- gcc/config/riscv/vector.md| 29 +--- .../riscv/rvv/autovec/widen/widen-5.c | 27 +++ .../riscv/rvv/autovec/widen/widen-6.c | 27 +++ .../rvv/autovec/widen/widen-complicate-1.c| 31 + .../rvv/autovec/widen/widen-complicate-2.c| 31 + .../riscv/rvv/autovec/widen/widen_run-5.c | 34 +++ .../riscv/rvv/autovec/widen/widen_run-6.c | 34 +++ 8 files changed, 215 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-6.c diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index a8113f6602b..3f92084929d 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -361,8 +361,12 @@ public: return e.use_exact_insn ( code_for_pred_dual_widen_scalar (CODE1, CODE2, e.vector_mode ())); case OP_TYPE_wv: - return e.use_exact_insn ( - code_for_pred_single_widen (CODE1, CODE2, e.vecto
[PATCH V3] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong Follow Richi's suggestion, I change current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. Include fixes from kewen. This patch will need to wait for Kewen's test feedback. Testing on X86 is on-going Co-Authored by: Kewen Lin PR tree-optimization/109971 gcc/ChangeLog: * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow. (vect_set_loop_condition_partial_vectors): Ditto. --- gcc/tree-vect-loop-manip.cc | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index acf3642ceb2..3f735945e67 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, gimple_stmt_iterator loop_cond_gsi, rgroup_controls *rgc, tree niters, tree niters_skip, bool might_wrap_p, -tree *iv_step) +tree *iv_step, tree *compare_step) { tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo); tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... - ivtmp_35 = ivtmp_9 - _36; + ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4]; ... - if (ivtmp_35 != 0) + if (ivtmp_9 > POLY_INT_CST [4, 4]) goto ; [83.33%] else goto ; [16.67%] @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, -insert_after, &index_before_incr, &index_after_incr); + create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, +&incr_gsi, insert_after, &index_before_incr, +&index_after_incr); gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, index_before_incr, nitems_step)); *iv_step = step; - return index_after_incr; + *compare_step = nitems_step; + return index_before_incr; } /* Create increment IV. */ @@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, arbitrarily pick the last. */ tree test_ctrl = NULL_TREE; tree iv_step = NULL_TREE; + tree compare_step = NULL_TREE; rgroup_controls *rgc; rgroup_controls *iv_rgc = nullptr; unsigned int i; @@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, &preheader_seq, &header_seq, loop_cond_gsi, rgc, niters, niters_skip, might_wrap_p, -&iv_step); +&iv_step, &compare_step); iv_rgc = rgc; } @@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop, /* Get a boolean result that tells us whether to iterate. */ edge exit_edge = single_exit (loop); - tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR; - tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl)); - gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl, - NULL_TREE, NULL_TREE); + gcond *cond_stmt; + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) +{ + gcc_assert (compare_step); + tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR; + cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE, +NULL_TREE); +} + else +{ + tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? 
EQ_EXPR : NE_EXPR; + tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl)); + cond_stmt + = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE); +} gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); /* The loop iterates (NITERS - 1) / VF + 1 times. -- 2.36.3
[PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
From: Juzhe-Zhong This patch is to enhance vwmul.vv combine optimizations. Consider this following code: void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int16_t *__restrict dst4, int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2, int8_t *__restrict b2, int n) { for (int i = 0; i < n; i++) { dst[i] = (int16_t) a[i] * (int16_t) b[i]; dst2[i] = (int16_t) a2[i] * (int16_t) b[i]; dst3[i] = (int16_t) a2[i] * (int16_t) a[i]; dst4[i] = (int16_t) a[i] * (int16_t) b2[i]; } } In such complicate case, the operand is not single used, used by multiple statements. GCC combine optimization will iterate the combination of the operands. First round -> combine one of the operand and change vsext + vmul into vwmul.wv Second round -> combine the other operand and change vwmul.wv into vwmul.vv Notice when I add a pseudo vwmul.wv pattern, it makes vwmulsu.vv testcase fail since GCC prefer such pattern order: (mul: (zero_extend) (sign_exted)) So change vwmulsu.vv instruction operands order. gcc/ChangeLog: * config/riscv/vector.md: Shift zero_extend and sign_extend order. * config/riscv/autovec-opt.md: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test. --- gcc/config/riscv/autovec-opt.md | 56 +++ gcc/config/riscv/vector.md| 9 +-- .../riscv/rvv/autovec/widen/widen-7.c | 27 + .../rvv/autovec/widen/widen-complicate-3.c| 32 +++ .../riscv/rvv/autovec/widen/widen_run-7.c | 34 +++ 5 files changed, 154 insertions(+), 4 deletions(-) create mode 100644 gcc/config/riscv/autovec-opt.md create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md new file mode 100644 index 000..5b7dc9bef8c --- /dev/null +++ b/gcc/config/riscv/autovec-opt.md @@ -0,0 +1,56 @@ +;; Machine description for optimization of RVV auto-vectorization. +;; Copyright (C) 2023 Free Software Foundation, Inc. +;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd. + +;; This file is part of GCC. + +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. + +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. + +;; We don't have vwmul.wv instruction like vwadd.wv in RVV. +;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance +;; optimization of instructions combine. 
+(define_insn_and_split "@pred_single_widen_mul" + [(set (match_operand:VWEXTI 0 "register_operand" "=&vr,&vr") + (if_then_else:VWEXTI + (unspec: + [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") +(match_operand 5 "vector_length_operand" " rK, rK") +(match_operand 6 "const_int_operand" "i, i") +(match_operand 7 "const_int_operand" "i, i") +(match_operand 8 "const_int_operand" "i, i") +(reg:SI VL_REGNUM) +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 4 "register_operand" " vr, vr")) + (match_operand:VWEXTI 3 "register_operand" " vr, vr")) + (match_operand:VWEXTI 2 "vector_merge_operand" " vu, 0")))] + "TARGET_VECTOR" + "#" + "&& can_create_pseudo_p ()" + [(const_int 0)] + { +insn_code icode = code_for_pred_vf2 (, mode); +rtx tmp = gen_reg_rtx (mode); +rtx ops[] = {tmp, operands[4]}; +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops); + +emit_
[PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
From: Juzhe-Zhong This patch is to enhance vwmul.vv combine optimizations. Consider this following code: void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int16_t *__restrict dst4, int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2, int8_t *__restrict b2, int n) { for (int i = 0; i < n; i++) { dst[i] = (int16_t) a[i] * (int16_t) b[i]; dst2[i] = (int16_t) a2[i] * (int16_t) b[i]; dst3[i] = (int16_t) a2[i] * (int16_t) a[i]; dst4[i] = (int16_t) a[i] * (int16_t) b2[i]; } } In such complicate case, the operand is not single used, used by multiple statements. GCC combine optimization will iterate the combination of the operands. Also, we add another pattern of vwmulsu.vv to enhance the vwmulsu.vv optimization. Currently, we have format: (mult: (sign_extend) (zero_extend)) in vector.md for intrinsics calling. Now, we add a new vwmulsu.ww with this format: (mult: (zero_extend) (sign_extend)) To handle this following cases (sign and unsigned widening multiplication mixing codes): void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int16_t *__restrict dst4, int8_t *__restrict a, uint8_t *__restrict b, uint8_t *__restrict a2, int8_t *__restrict b2, int n) { for (int i = 0; i < n; i++) { dst[i] = (int16_t) a[i] * (int16_t) b[i]; dst2[i] = (int16_t) a2[i] * (int16_t) b[i]; dst3[i] = (int16_t) a2[i] * (int16_t) a[i]; dst4[i] = (int16_t) a[i] * (int16_t) b2[i]; } } Before this patch: ... vsetvli zero,t1,e8,m1,ta,ma vle8.v v1,0(a4) vsetvli t3,zero,e16,m2,ta,ma vsext.vf2 v6,v1 vsetvli zero,t1,e8,m1,ta,ma vle8.v v1,0(a5) vsetvli t3,zero,e16,m2,ta,ma add t0,a0,t4 vzext.vf2 v4,v1 vmul.vv v2,v4,v6 vsetvli zero,t1,e16,m2,ta,ma vse16.v v2,0(t0) vle8.v v1,0(a6) vsetvli t3,zero,e16,m2,ta,ma add t0,a1,t4 vzext.vf2 v2,v1 vmul.vv v4,v2,v4 vsetvli zero,t1,e16,m2,ta,ma vse16.v v4,0(t0) vsetvli t3,zero,e16,m2,ta,ma add t0,a2,t4 vmul.vv v2,v2,v6 vsetvli zero,t1,e16,m2,ta,ma vse16.v v2,0(t0) add t0,a3,t4 vle8.v v1,0(a7) vsetvli t3,zero,e16,m2,ta,ma sub t6,t6,t1 vsext.vf2 v2,v1 vmul.vv v2,v2,v6 vsetvli zero,t1,e16,m2,ta,ma vse16.v v2,0(t0) ... After this patch: ... vsetvli zero,t1,e8,mf2,ta,ma vle8.v v1,0(a4) vle8.v v3,0(a5) vsetvli t6,zero,e8,mf2,ta,ma add t0,a0,t3 vwmulsu.vv v2,v1,v3 vsetvli zero,t1,e16,m1,ta,ma vse16.v v2,0(t0) vle8.v v2,0(a6) vsetvli t6,zero,e8,mf2,ta,ma add t0,a1,t3 vwmulu.vv v4,v3,v2 vsetvli zero,t1,e16,m1,ta,ma vse16.v v4,0(t0) vsetvli t6,zero,e8,mf2,ta,ma add t0,a2,t3 vwmulsu.vv v3,v1,v2 vsetvli zero,t1,e16,m1,ta,ma vse16.v v3,0(t0) add t0,a3,t3 vle8.v v3,0(a7) vsetvli t6,zero,e8,mf2,ta,ma sub t4,t4,t1 vwmul.vvv2,v1,v3 vsetvli zero,t1,e16,m1,ta,ma vse16.v v2,0(t0) ... gcc/ChangeLog: * config/riscv/vector.md: Add vector-opt.md. * config/riscv/autovec-opt.md: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test. 
--- gcc/config/riscv/autovec-opt.md | 80 +++ gcc/config/riscv/vector.md| 3 +- .../riscv/rvv/autovec/widen/widen-7.c | 27 +++ .../rvv/autovec/widen/widen-complicate-3.c| 32 .../rvv/autovec/widen/widen-complicate-4.c| 31 +++ .../riscv/rvv/autovec/widen/widen_run-7.c | 34 6 files changed, 206 insertions(+), 1 deletion(-) create mode 100644 gcc/config/riscv/autovec-opt.md create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md new file mode 100644 index 000..92cdc4e9a16 --- /dev/null +++ b/gcc/config/riscv/autovec-opt.md @@ -0,0 +1,80 @@ +;; Machine description for optimizat
[PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid
From: Juzhe-Zhong Base on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics. --- gcc/config/riscv/riscv-vector-builtins-bases.cc | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index a8113f6602b..498c6ba042e 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -164,7 +164,7 @@ public: { if (STORE_P || LST_TYPE == LST_INDEXED) return true; -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -963,7 +963,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum - || pred == PRED_TYPE_tumu; + || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu; } rtx expand (function_expander &e) const override @@ -979,7 +979,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum - || pred == PRED_TYPE_tumu; + || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu; } rtx expand (function_expander &e) const override @@ -1749,7 +1749,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -1794,7 +1794,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override -- 2.36.1
[PATCH] RISC-V: Add __RISCV_ prefix to VXRM and FRM enum
From: Juzhe-Zhong According to doc: https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 Add __RISCV_ prefix to VXRM and FRM enum. gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add __RISCV_ prefix. (DEF_RVV_FRM_ENUM): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/frm-1.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-1.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-10.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-11.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-12.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-6.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-7.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-8.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-9.c: Ditto. --- gcc/config/riscv/riscv-vector-builtins.cc | 8 gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c | 10 +- gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c | 8 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c | 8 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-11.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-12.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-6.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-7.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c | 8 10 files changed, 31 insertions(+), 31 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc index 43bf6d8f262..9e6dae98a6d 100644 --- a/gcc/config/riscv/riscv-vector-builtins.cc +++ b/gcc/config/riscv/riscv-vector-builtins.cc @@ -4026,11 +4026,11 @@ register_vxrm () { auto_vec values; #define DEF_RVV_VXRM_ENUM(NAME, VALUE) \ - values.quick_push (string_int_pair ("VXRM_" #NAME, VALUE)); + values.quick_push (string_int_pair ("__RISCV_VXRM_" #NAME, VALUE)); #include "riscv-vector-builtins.def" #undef DEF_RVV_VXRM_ENUM - lang_hooks.types.simulate_enum_decl (input_location, "RVV_VXRM", &values); + lang_hooks.types.simulate_enum_decl (input_location, "__RISCV_VXRM", &values); } /* Register the frm enum. */ @@ -4039,11 +4039,11 @@ register_frm () { auto_vec values; #define DEF_RVV_FRM_ENUM(NAME, VALUE) \ - values.quick_push (string_int_pair ("FRM_" #NAME, VALUE)); + values.quick_push (string_int_pair ("__RISCV_FRM_" #NAME, VALUE)); #include "riscv-vector-builtins.def" #undef DEF_RVV_FRM_ENUM - lang_hooks.types.simulate_enum_decl (input_location, "RVV_FRM", &values); + lang_hooks.types.simulate_enum_decl (input_location, "__RISCV_FRM", &values); } /* Implement #pragma riscv intrinsic vector. 
*/ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c index f5635fb959e..ff19c8bc089 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/frm-1.c @@ -5,27 +5,27 @@ size_t f0 () { - return FRM_RNE; + return __RISCV_FRM_RNE; } size_t f1 () { - return FRM_RTZ; + return __RISCV_FRM_RTZ; } size_t f2 () { - return FRM_RDN; + return __RISCV_FRM_RDN; } size_t f3 () { - return FRM_RUP; + return __RISCV_FRM_RUP; } size_t f4 () { - return FRM_RMM; + return __RISCV_FRM_RMM; } /* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,\s*0} 1} } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c index 0d364787ad0..b0ed27b0520 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c @@ -5,22 +5,22 @@ size_t f0 () { - return VXRM_RNU; + return __RISCV_VXRM_RNU; } size_t f1 () { - return VXRM_RNE; + return __RISCV_VXRM_RNE; } size_t f2 () { - return VXRM_RDN; + return __RISCV_VXRM_RDN; } size_t f3 () { - return VXRM_ROD; + return __RISCV_VXRM_ROD; } /* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,\s*0} 1} } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c index a707aa1645e..3c7872bb73d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c @@ -8,16 +8,16 @@ void f (void * in, void *out, int32_t x, int n, int m) for (int i = 0; i < n; i++) { vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4); vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4); -vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4); -v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4); +vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, __RISCV_VXR
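After this renaming, user code spells the fixed-point and floating-point rounding-mode arguments with the __RISCV_ prefix. A minimal usage sketch built from the intrinsics that already appear in the adjusted tests (the vl parameter and function names are illustrative):

  #include <riscv_vector.h>

  /* Fixed-point averaging add with an explicit rounding mode; the enum
     constant is now __RISCV_VXRM_RDN instead of VXRM_RDN.  */
  vint32m1_t
  avg_round_down (vint32m1_t a, size_t vl)
  {
    return __riscv_vaadd_vx_i32m1 (a, 0, __RISCV_VXRM_RDN, vl);
  }

  /* Floating-point rounding-mode enumerators are prefixed the same way.  */
  size_t
  frm_rne (void)
  {
    return __RISCV_FRM_RNE;
  }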
[PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid
From: Juzhe-Zhong Base on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. Co-authored-by: KuanLin Chen gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics. * config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto. --- gcc/config/riscv/riscv-vector-builtins-bases.cc | 17 +++-- .../riscv/riscv-vector-builtins-shapes.cc | 5 ++--- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index 3f92084929d..09870c327fa 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -164,7 +164,7 @@ public: { if (STORE_P || LST_TYPE == LST_INDEXED) return true; -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -967,7 +967,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum - || pred == PRED_TYPE_tumu; + || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu; } rtx expand (function_expander &e) const override @@ -983,7 +983,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum - || pred == PRED_TYPE_tumu; + || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu; } rtx expand (function_expander &e) const override @@ -1715,6 +1715,11 @@ public: return CP_READ_MEMORY | CP_WRITE_CSR; } + bool can_be_overloaded_p (enum predication_type_index pred) const override + { +return pred != PRED_TYPE_none; + } + gimple *fold (gimple_folder &f) const override { return fold_fault_load (f); @@ -1753,7 +1758,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -1798,7 +1803,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -1888,7 +1893,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } gimple *fold (gimple_folder &f) const override diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc index 76262f07ce4..c8daae01f91 100644 --- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc +++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc @@ -550,9 +550,8 @@ struct fault_load_def : public build_base char *get_name (function_builder &b, const function_instance &instance, bool overloaded_p) const override { -if (overloaded_p) - if (instance.pred == PRED_TYPE_none || instance.pred == PRED_TYPE_mu) - return nullptr; +if (overloaded_p && !instance.base->can_be_overloaded_p (instance.pred)) + return nullptr; tree type = builtin_types[instance.type.index].vector; machine_mode mode = TYPE_MODE (type); int sew = GET_MODE_BITSIZE (GET_MODE_INNER (mode)); -- 2.36.1
[PATCH V2] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid
From: Juzhe-Zhong Base on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. Co-authored-by: KuanLin Chen gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics. * config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto. --- gcc/config/riscv/riscv-vector-builtins-bases.cc | 17 +++-- .../riscv/riscv-vector-builtins-shapes.cc | 5 ++--- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index 3f92084929d..09870c327fa 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -164,7 +164,7 @@ public: { if (STORE_P || LST_TYPE == LST_INDEXED) return true; -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -967,7 +967,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum - || pred == PRED_TYPE_tumu; + || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu; } rtx expand (function_expander &e) const override @@ -983,7 +983,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum - || pred == PRED_TYPE_tumu; + || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu; } rtx expand (function_expander &e) const override @@ -1715,6 +1715,11 @@ public: return CP_READ_MEMORY | CP_WRITE_CSR; } + bool can_be_overloaded_p (enum predication_type_index pred) const override + { +return pred != PRED_TYPE_none; + } + gimple *fold (gimple_folder &f) const override { return fold_fault_load (f); @@ -1753,7 +1758,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -1798,7 +1803,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } rtx expand (function_expander &e) const override @@ -1888,7 +1893,7 @@ public: bool can_be_overloaded_p (enum predication_type_index pred) const override { -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu; +return pred != PRED_TYPE_none; } gimple *fold (gimple_folder &f) const override diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc index 76262f07ce4..c8daae01f91 100644 --- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc +++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc @@ -550,9 +550,8 @@ struct fault_load_def : public build_base char *get_name (function_builder &b, const function_instance &instance, bool overloaded_p) const override { -if (overloaded_p) - if (instance.pred == PRED_TYPE_none || instance.pred == PRED_TYPE_mu) - return nullptr; +if (overloaded_p && !instance.base->can_be_overloaded_p (instance.pred)) + return nullptr; tree type = builtin_types[instance.type.index].vector; machine_mode mode = TYPE_MODE (type); int sew = GET_MODE_BITSIZE (GET_MODE_INNER (mode)); -- 2.36.1
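As a rough C++ illustration of what the patch enables on the user side: with _mu the maskedoff operand (vd) carries the result type, so the overloaded forms can resolve. The spellings __riscv_vle32_mu, __riscv_viota_mu and __riscv_vid_mu below are assumptions derived from the naming scheme in the linked rvv-intrinsic-doc issue, not names quoted from this mail.

#include <riscv_vector.h>

/* _mu: masked-off elements keep the value of vd, active elements are updated.  */
vint32m1_t load_mu (vbool32_t vm, vint32m1_t vd, const int32_t *rs1, size_t vl)
{
  return __riscv_vle32_mu (vm, vd, rs1, vl);   /* load */
}

vuint32m1_t iota_mu (vbool32_t vm, vuint32m1_t vd, vbool32_t vs2, size_t vl)
{
  return __riscv_viota_mu (vm, vd, vs2, vl);   /* viota */
}

vuint32m1_t id_mu (vbool32_t vm, vuint32m1_t vd, size_t vl)
{
  return __riscv_vid_mu (vm, vd, vl);          /* vid */
}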
[PATCH] RISC-V: Fix warning in predicates.md
From: Juzhe-Zhong Notice there is a warning in predicates.md: ../../../riscv-gcc/gcc/config/riscv/predicates.md: In function 'bool arith_operand_or_mode_mask(rtx, machine_mode)': ../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] (match_test "INTVAL (op) == GET_MODE_MASK (HImode) ../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] || INTVAL (op) == GET_MODE_MASK (SImode)" gcc/ChangeLog: * config/riscv/predicates.md: Change INTVAL into UINTVAL. --- gcc/config/riscv/predicates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 1ed84850e35..d14b1ca30bb 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -31,7 +31,7 @@ (ior (match_operand 0 "arith_operand") (and (match_code "const_int") (match_test "INTVAL (op) == GET_MODE_MASK (HImode) -|| INTVAL (op) == GET_MODE_MASK (SImode)" +|| UINTVAL (op) == GET_MODE_MASK (SImode)" (define_predicate "lui_operand" (and (match_code "const_int") -- 2.36.1
[PATCH] RISC-V: Optimize reverse series index vector
From: Juzhe-Zhong This patch optimizes the following series vector: [nunits - 1, nunits - 2, ..., 0] Before this patch: vid vmul vadd After this patch: vid vrsub This patch is an obvious and simple optimization, ok for trunk? gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check. --- gcc/config/riscv/riscv-v.cc | 17 + .../riscv/rvv/autovec/vls-vlmax/perm-4.c| 2 ++ 2 files changed, 19 insertions(+) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 1cd3bd3438e..75cf00b7eba 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -530,6 +530,8 @@ expand_vec_series (rtx dest, rtx base, rtx step) machine_mode mode = GET_MODE (dest); machine_mode mask_mode; gcc_assert (get_mask_mode (mode).exists (&mask_mode)); + poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1; + poly_int64 value; /* VECT_IV = BASE + I * STEP. */ @@ -545,6 +547,21 @@ expand_vec_series (rtx dest, rtx base, rtx step) rtx step_adj; if (rtx_equal_p (step, const1_rtx)) step_adj = vid; + else if (rtx_equal_p (step, constm1_rtx) && poly_int_rtx_p (base, &value) + && known_eq (nunits_m1, value)) +{ + /* Special case: + {nunits - 1, nunits - 2, ... , 0}. + nunits can be either const_int or const_poly_int. + +Code sequence: + vid.v v + vrsub nunits - 1, v. */ + rtx ops[] = {dest, vid, gen_int_mode (nunits_m1, GET_MODE_INNER (mode))}; + insn_code icode = code_for_pred_sub_reverse_scalar (mode); + emit_vlmax_insn (icode, RVV_BINOP, ops); + return; +} else { step_adj = gen_reg_rtx (mode); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c index 179c8274a92..aa328810c30 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c @@ -56,3 +56,5 @@ TEST_ALL (PERMUTE) /* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */ +/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */ +/* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */ -- 2.36.1
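As a sketch of where such a reverse series shows up: a plain reversal permute needs exactly the selector { nunits - 1, nunits - 2, ..., 0 }. The loop below is only an illustration of that shape, not one of the testcases touched by this patch:

#include <cstdint>

/* Reversing a block is a vec_perm whose selector is the decreasing series,
   which after this patch lowers to vid.v + vrsub instead of vid.v + vmul + vadd.  */
void reverse_block (int32_t *__restrict dst, const int32_t *__restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = src[n - 1 - i];
}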
[PATCH V2] RISC-V: Fix warning in predicates.md
From: Juzhe-Zhong Notice there is a warning in predicates.md: ../../../riscv-gcc/gcc/config/riscv/predicates.md: In function 'bool arith_operand_or_mode_mask(rtx, machine_mode)': ../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] (match_test "INTVAL (op) == GET_MODE_MASK (HImode) ../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] || INTVAL (op) == GET_MODE_MASK (SImode)" gcc/ChangeLog: * config/riscv/predicates.md: Change INTVAL into UINTVAL. --- gcc/config/riscv/predicates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index d14b1ca30bb..04ca6ceabc7 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -30,7 +30,7 @@ (define_predicate "arith_operand_or_mode_mask" (ior (match_operand 0 "arith_operand") (and (match_code "const_int") -(match_test "INTVAL (op) == GET_MODE_MASK (HImode) +(match_test "UINTVAL (op) == GET_MODE_MASK (HImode) || UINTVAL (op) == GET_MODE_MASK (SImode)" (define_predicate "lui_operand" -- 2.36.1
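The warning itself is ordinary signed/unsigned comparison promotion; a standalone C++ sketch of the same situation (the names are made up, only the types mirror INTVAL/UINTVAL returning a signed value and GET_MODE_MASK being unsigned):

#include <cstdint>

bool mode_mask_p (int64_t intval)            /* like INTVAL (op): a signed HOST_WIDE_INT */
{
  const uint64_t mask = 0xffffffffu;         /* like GET_MODE_MASK (SImode): unsigned */
  /* "intval == mask" warns with -Wsign-compare; comparing the unsigned view,
     as UINTVAL does, checks the same bits without the warning.  */
  return (uint64_t) intval == mask;
}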
[PATCH] RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109
From: Juzhe-Zhong PR target/110109 This patch fixes PR110109. The issue happens because of this pattern: (define_insn_and_split "*vlmul_extx2" [(set (match_operand: 0 "register_operand" "=vr, ?&vr") (subreg: (match_operand:VLMULEXT2 1 "register_operand" " 0, vr") 0))] "TARGET_VECTOR" "#" "&& reload_completed" [(const_int 0)] { emit_insn (gen_rtx_SET (gen_lowpart (mode, operands[0]), operands[1])); DONE; }) Such a pattern generates the following code in insn-recog.cc: static int pattern57 (rtx x1) { rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0]; rtx x2; int res ATTRIBUTE_UNUSED; if (maybe_ne (SUBREG_BYTE (x1).to_constant (), 0)) return -1; ... PR110109 ICEs at maybe_ne (SUBREG_BYTE (x1).to_constant (), 0) since the SUBREG_BYTE of scalable RVV modes can not be accessed with .to_constant (). I created those patterns to optimize the following test: vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) { return __riscv_vlmul_ext_v_f32mf2_f32m2(op1); } codegen: test_vlmul_ext_v_f32mf2_f32m2: vsetvli a5,zero,e32,m2,ta,ma vmv.v.i v2,0 vsetvli a5,zero,e32,mf2,ta,ma vle32.v v2,0(a1) vs2r.v v2,0(a0) ret There is a redundant 'vmv.v.i' here, since GCC has no undefined IR (unlike LLVM, which has undef/poison). For the vlmul_ext_* RVV intrinsics, GCC initializes the register with all zeros. However, I think it will not be a big issue once we support subreg liveness tracking. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Change expand approach. * config/riscv/vector.md (@vlmul_extx2): Remove it. (@vlmul_extx4): Ditto. (@vlmul_extx8): Ditto. (@vlmul_extx16): Ditto. (@vlmul_extx32): Ditto. (@vlmul_extx64): Ditto. (*vlmul_extx2): Ditto. (*vlmul_extx4): Ditto. (*vlmul_extx8): Ditto. (*vlmul_extx16): Ditto. (*vlmul_extx32): Ditto. (*vlmul_extx64): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr110109-1.c: New test. * gcc.target/riscv/rvv/base/pr110109-2.c: New test.
--- .../riscv/riscv-vector-builtins-bases.cc | 28 +- gcc/config/riscv/vector.md| 120 - .../gcc.target/riscv/rvv/base/pr110109-1.c| 40 ++ .../gcc.target/riscv/rvv/base/pr110109-2.c| 485 ++ 4 files changed, 529 insertions(+), 144 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-2.c diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index 09870c327fa..87a684dd127 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -1565,30 +1565,10 @@ public: rtx expand (function_expander &e) const override { -e.add_input_operand (0); -switch (e.op_info->ret.base_type) - { - case RVV_BASE_vlmul_ext_x2: - return e.generate_insn ( - code_for_vlmul_extx2 (e.vector_mode ())); - case RVV_BASE_vlmul_ext_x4: - return e.generate_insn ( - code_for_vlmul_extx4 (e.vector_mode ())); - case RVV_BASE_vlmul_ext_x8: - return e.generate_insn ( - code_for_vlmul_extx8 (e.vector_mode ())); - case RVV_BASE_vlmul_ext_x16: - return e.generate_insn ( - code_for_vlmul_extx16 (e.vector_mode ())); - case RVV_BASE_vlmul_ext_x32: - return e.generate_insn ( - code_for_vlmul_extx32 (e.vector_mode ())); - case RVV_BASE_vlmul_ext_x64: - return e.generate_insn ( - code_for_vlmul_extx64 (e.vector_mode ())); - default: - gcc_unreachable (); - } +tree arg = CALL_EXPR_ARG (e.exp, 0); +rtx src = expand_normal (arg); +emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src)); +return e.target; } }; diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 79f1644732a..2496eff7874 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -498,126 +498,6 @@ } ) -(define_expand "@vlmul_extx2" - [(set (match_operand: 0 "register_operand") - (subreg: - (match_operand:VLMULEXT2 1 "register_operand") 0))] - "TARGET_VECTOR" -{}) - -(define_expand "@vlmul_extx4" - [(set (match_operand: 0 "register_operand") - (subreg: - (match_operand:VLMULEXT4 1 "register_operand") 0))] - "TARGET_VECTOR" -{}) - -(define_expand "@vlmul_extx8" - [(set (match_operand: 0 "register_operand") - (subreg: - (match_operand:VLMULEXT8 1 "register_operand") 0))] - "TARGET_VECTOR" -{}) -
[NFC] RISC-V: Reorganize riscv-v.cc
From: Juzhe-Zhong This patch is just reorganizing the functions for the following patch. I put rvv_builder and emit_* functions located before expand_const_vector function since I will use them in expand_const_vector in the following patch. gcc/ChangeLog: * config/riscv/riscv-v.cc (class rvv_builder): Reorganize functions. (rvv_builder::can_duplicate_repeating_sequence_p): Ditto. (rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto. (rvv_builder::get_merged_repeating_sequence): Ditto. (rvv_builder::get_merge_scalar_mask): Ditto. (emit_scalar_move_insn): Ditto. (emit_vlmax_integer_move_insn): Ditto. (emit_nonvlmax_integer_move_insn): Ditto. (emit_vlmax_gather_insn): Ditto. (emit_vlmax_masked_gather_mu_insn): Ditto. (get_repeating_sequence_dup_machine_mode): Ditto. --- gcc/config/riscv/riscv-v.cc | 497 ++-- 1 file changed, 249 insertions(+), 248 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 75cf00b7eba..fa13bd94f9d 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -239,6 +239,165 @@ private: expand_operand m_ops[MAX_OPERANDS]; }; + +class rvv_builder : public rtx_vector_builder +{ +public: + rvv_builder () : rtx_vector_builder () {} + rvv_builder (machine_mode mode, unsigned int npatterns, + unsigned int nelts_per_pattern) +: rtx_vector_builder (mode, npatterns, nelts_per_pattern) + { +m_inner_mode = GET_MODE_INNER (mode); +m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode); +m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode); + +gcc_assert ( + int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode)); + } + + bool can_duplicate_repeating_sequence_p (); + rtx get_merged_repeating_sequence (); + + bool repeating_sequence_use_merge_profitable_p (); + rtx get_merge_scalar_mask (unsigned int) const; + + machine_mode new_mode () const { return m_new_mode; } + scalar_mode inner_mode () const { return m_inner_mode; } + scalar_int_mode inner_int_mode () const { return m_inner_int_mode; } + unsigned int inner_bits_size () const { return m_inner_bits_size; } + unsigned int inner_bytes_size () const { return m_inner_bytes_size; } + +private: + scalar_mode m_inner_mode; + scalar_int_mode m_inner_int_mode; + machine_mode m_new_mode; + scalar_int_mode m_new_inner_mode; + unsigned int m_inner_bits_size; + unsigned int m_inner_bytes_size; +}; + +/* Return true if the vector duplicated by a super element which is the fusion + of consecutive elements. + + v = { a, b, a, b } super element = ab, v = { ab, ab } */ +bool +rvv_builder::can_duplicate_repeating_sequence_p () +{ + poly_uint64 new_size = exact_div (full_nelts (), npatterns ()); + unsigned int new_inner_size = m_inner_bits_size * npatterns (); + if (!int_mode_for_size (new_inner_size, 0).exists (&m_new_inner_mode) + || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD + || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode)) +return false; + return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ()); +} + +/* Return true if it is a repeating sequence that using + merge approach has better codegen than using default + approach (slide1down). + + Sequence A: + {a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b} + + nelts = 16 + npatterns = 2 + + for merging a we need mask 101010 + for merging b we need mask 010101 + + Foreach element in the npattern, we need to build a mask in scalar register. + Mostely we need 3 instructions (aka COST = 3), which is consist of 2 scalar + instruction and 1 scalar move to v0 register. 
Finally we need vector merge + to merge them. + + lui a5, #imm + add a5, #imm + vmov.s.xv0, a5 + vmerge.vxm v9, v9, a1, v0 + + So the overall (roughly) COST of Sequence A = (3 + 1) * npatterns = 8. + If we use slide1down, the COST = nelts = 16 > 8 (COST of merge). + So return true in this case as it is profitable. + + Sequence B: + {a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h} + + nelts = 16 + npatterns = 8 + + COST of merge approach = (3 + 1) * npatterns = 24 + COST of slide1down approach = nelts = 16 + Return false in this case as it is NOT profitable in merge approach. +*/ +bool +rvv_builder::repeating_sequence_use_merge_profitable_p () +{ + if (inner_bytes_size () > UNITS_PER_WORD) +return false; + + unsigned int nelts = full_nelts ().to_constant (); + + if (!repeating_sequence_p (0, nelts, npatterns ())) +return false; + + unsigned int merge_cost = 1; + unsigned int build_merge_mask_cost = 3; + unsigned int slide1down_cost = nelts; + + return (build_merge_mask_cost + merge_cost) * npatterns () < slide1down_cost; +} + +/* Merge the repeating sequence into a single element and return the RT
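The cost comparison quoted in the moved comment boils down to a few integers; here is a self-contained C++ sketch of that rule (the constants mirror the comment and the code above and are illustrative, not a promise about the final cost model):

/* Decide, as repeating_sequence_use_merge_profitable_p does, whether building
   per-pattern masks + vmerge beats one slide1down per element.  */
static bool merge_profitable_p (unsigned nelts, unsigned npatterns)
{
  const unsigned build_merge_mask_cost = 3;  /* scalar insns + vmv.s.x per pattern */
  const unsigned merge_cost = 1;             /* one vmerge per pattern */
  const unsigned slide1down_cost = nelts;    /* one slide1down per element */
  return (build_merge_mask_cost + merge_cost) * npatterns < slide1down_cost;
}

/* Sequence A: nelts = 16, npatterns = 2 -> 8 < 16, merge is profitable.
   Sequence B: nelts = 16, npatterns = 8 -> not profitable, use slide1down.  */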
[PATCH] RISC-V: Split arguments of expand_vec_perm
From: Juzhe-Zhong Since the following patch will call expand_vec_perm with split arguments, change the expand_vec_perm interface in this patch. gcc/ChangeLog: * config/riscv/autovec.md: Split arguments. * config/riscv/riscv-protos.h (expand_vec_perm): Ditto. * config/riscv/riscv-v.cc (expand_vec_perm): Ditto. --- gcc/config/riscv/autovec.md | 3 ++- gcc/config/riscv/riscv-protos.h | 2 +- gcc/config/riscv/riscv-v.cc | 6 +- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 5c3aad7ee44..ec038fe87cd 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -96,7 +96,8 @@ (match_operand: 3 "vector_perm_operand")] "TARGET_VECTOR && GET_MODE_NUNITS (mode).is_constant ()" { -riscv_vector::expand_vec_perm (operands); +riscv_vector::expand_vec_perm (operands[0], operands[1], + operands[2], operands[3]); DONE; } ) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index d032f569a36..00e1b20c6c6 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -241,7 +241,7 @@ opt_machine_mode get_mask_mode (machine_mode); void expand_vec_series (rtx, rtx, rtx); void expand_vec_init (rtx, rtx); void expand_vcond (rtx *); -void expand_vec_perm (rtx *); +void expand_vec_perm (rtx, rtx, rtx, rtx); /* Rounding mode bitfield for fixed point VXRM. */ enum vxrm_field_enum { diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index fa13bd94f9d..49752cd8899 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -2025,12 +2025,8 @@ expand_vcond (rtx *ops) /* Implement vec_perm. */ void -expand_vec_perm (rtx *operands) +expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) { - rtx target = operands[0]; - rtx op0 = operands[1]; - rtx op1 = operands[2]; - rtx sel = operands[3]; machine_mode data_mode = GET_MODE (target); machine_mode sel_mode = GET_MODE (sel); -- 2.36.3
[NFC] RISC-V: Move optimization patterns into autovec-opt.md
From: Juzhe-Zhong Move all optimization patterns into autovec-opt.md to make organization easier maintain. gcc/ChangeLog: * config/riscv/autovec-opt.md (*not): Move to autovec-opt.md. (*n): Ditto. * config/riscv/autovec.md (*not): Ditto. (*n): Ditto. * config/riscv/vector.md: Ditto. --- gcc/config/riscv/autovec-opt.md | 92 + gcc/config/riscv/autovec.md | 52 --- gcc/config/riscv/vector.md | 39 -- 3 files changed, 92 insertions(+), 91 deletions(-) diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index 92cdc4e9a16..f6052b50572 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -78,3 +78,95 @@ "vwmulsu.vv\t%0,%3,%4%p1" [(set_attr "type" "viwmul") (set_attr "mode" "")]) + +;; - +;; Integer Compare Instructions Simplification +;; - +;; Simplify OP(V, V) Instructions to VMCLR.m Includes: +;; - 1. VMSNE +;; - 2. VMSLT +;; - 3. VMSLTU +;; - 4. VMSGT +;; - 5. VMSGTU +;; - +;; Simplify OP(V, V) Instructions to VMSET.m Includes: +;; - 1. VMSEQ +;; - 2. VMSLE +;; - 3. VMSLEU +;; - 4. VMSGE +;; - 5. VMSGEU +;; - + +(define_split + [(set (match_operand:VB 0 "register_operand") + (if_then_else:VB + (unspec:VB + [(match_operand:VB 1 "vector_all_trues_mask_operand") +(match_operand4 "vector_length_operand") +(match_operand5 "const_int_operand") +(match_operand6 "const_int_operand") +(reg:SI VL_REGNUM) +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) + (match_operand:VB3 "vector_move_operand") + (match_operand:VB2 "vector_undef_operand")))] + "TARGET_VECTOR" + [(const_int 0)] + { +emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode), +RVV_VUNDEF (mode), operands[3], +operands[4], operands[5])); +DONE; + } +) + +;; - +;; [BOOL] Binary logical operations (inverted second input) +;; - +;; Includes: +;; - vmandnot.mm +;; - vmornot.mm +;; - + +(define_insn_and_split "*not" + [(set (match_operand:VB 0 "register_operand" "=vr") + (bitmanip_bitwise:VB + (not:VB (match_operand:VB 2 "register_operand" " vr")) + (match_operand:VB 1 "register_operand" " vr")))] + "TARGET_VECTOR" + "#" + "&& can_create_pseudo_p ()" + [(const_int 0)] + { +insn_code icode = code_for_pred_not (, mode); +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands); +DONE; + } + [(set_attr "type" "vmalu") + (set_attr "mode" "")]) + +;; - +;; [BOOL] Binary logical operations (inverted result) +;; - +;; Includes: +;; - vmnand.mm +;; - vmnor.mm +;; - vmxnor.mm +;; - + +(define_insn_and_split "*n" + [(set (match_operand:VB 0 "register_operand" "=vr") + (not:VB + (any_bitwise:VB + (match_operand:VB 1 "register_operand" " vr") + (match_operand:VB 2 "register_operand" " vr"] + "TARGET_VECTOR" + "#" + "&& can_create_pseudo_p ()" + [(const_int 0)] + { +insn_code icode = code_for_pred_n (, mode); +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands); +DONE; + } + [(set_attr "type" "vmalu") + (set_attr "mode" "")]) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index ec038fe87cd..9f4492db23c 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -229,58 +229,6 @@ [(set_attr "type" "vmalu") (set_attr "mode" "")]) -;; - -;; [BOOL] Binary logical operations (inverted second
[PATCH V2] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong This patch address comments from Richard and rebase to trunk. This patch is adding SELECT_VL middle-end support allow target have target dependent optimization in case of length calculation. This patch is inspired by RVV ISA and LLVM: https://reviews.llvm.org/D99750 The SELECT_VL is same behavior as LLVM "get_vector_length" with these following properties: 1. Only apply on single-rgroup. 2. non SLP. 3. adjust loop control IV. 4. adjust data reference IV. 5. allow non-vf elements processing in non-final iteration Code: # void vvaddint32(size_t n, const int*x, const int*y, int*z) # { for (size_t i=0; i - _36 = MIN_EXPR ; + _36 = (MIN_EXPR | SELECT_VL) ; ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... @@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, -&incr_gsi, insert_after, &index_before_incr, -&index_after_incr); - gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, - index_before_incr, - nitems_step)); + if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) + { + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, +insert_after, &index_before_incr, &index_after_incr); + tree len = gimple_build (header_seq, IFN_SELECT_VL, iv_type, + index_before_incr, nitems_step); + gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len)); + } + else + { + create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, +&incr_gsi, insert_after, &index_before_incr, +&index_after_incr); + gimple_seq_add_stmt (header_seq, + gimple_build_assign (step, MIN_EXPR, + index_before_incr, + nitems_step)); + } *iv_step = step; *compare_step = nitems_step; - return index_before_incr; + return LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? index_after_incr + : index_before_incr; } /* Create increment IV. */ @@ -888,7 +901,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop, /* Get a boolean result that tells us whether to iterate. */ edge exit_edge = single_exit (loop); gcond *cond_stmt; - if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) + && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) { gcc_assert (compare_step); tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR; diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 5b7a0da0034..68c3432c0a4 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), using_partial_vectors_p (false), using_decrementing_iv_p (false), +using_select_vl_p (false), epil_using_partial_vectors_p (false), partial_load_store_bias (0), peeling_for_gaps (false), @@ -2737,6 +2738,53 @@ start_over: LOOP_VINFO_VECT_FACTOR (loop_vinfo LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; + /* If we're using decrement IV approach in loop control, we can use output of + SELECT_VL to adjust IV of loop control and data reference when it satisfies + the following checks: + + (a) SELECT_VL is supported by the target. + (b) LOOP_VINFO is single-rgroup control. + (c) non-SLP. + (d) LOOP can not be unrolled. + + Otherwise, we use MIN_EXPR approach. + + 1. We only apply SELECT_VL on single-rgroup since: + + (1). 
Multiple-rgroup controls N vector loads/stores would need N pointer + updates by variable amounts. + (2). SELECT_VL allows flexible length (<=VF) in each iteration. + (3). For decrement IV approach, we calculate the MAX length of the loop + and then deduce the length of each control from this MAX length. + + Base on (1), (2) and (3) situations, if we try to use SELECT_VL on + multiple-rgroup control, we need to generate multiple SELECT_VL to + carefully adjust length of each control. Such approach is very inefficient + and unprofitable for targets that are using a standalone instruction + to configure the length of each
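To make the adjustment of the loop-control and data-reference IVs concrete, here is a scalar C++ model of a single-rgroup loop driven by SELECT_VL, mirroring the vvaddint32 example above. The select_vl helper below is a stand-in for the new internal function (a target may return less than the MIN shown here), and VF and the function names are illustrative:

#include <cstddef>

static const size_t VF = 4;              /* stands for the vectorization factor */

static size_t select_vl (size_t remaining, size_t vf)
{
  /* The optab result is at most MIN (remaining, vf); this model simply
     returns that upper bound, a real target could return less.  */
  return remaining < vf ? remaining : vf;
}

void vvadd (size_t n, const int *x, const int *y, int *z)
{
  while (n > 0)
    {
      size_t vl = select_vl (n, VF);     /* may differ from iteration to iteration */
      for (size_t i = 0; i < vl; ++i)    /* models the length-controlled vector body */
        z[i] = x[i] + y[i];
      x += vl;                           /* data-reference IVs step by the variable vl */
      y += vl;
      z += vl;
      n -= vl;                           /* the loop-control IV also steps by vl */
    }
}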
[PATCH V3] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford This patch address comments from Richard and rebase to trunk. This patch is adding SELECT_VL middle-end support allow target have target dependent optimization in case of length calculation. This patch is inspired by RVV ISA and LLVM: https://reviews.llvm.org/D99750 The SELECT_VL is same behavior as LLVM "get_vector_length" with these following properties: 1. Only apply on single-rgroup. 2. non SLP. 3. adjust loop control IV. 4. adjust data reference IV. 5. allow non-vf elements processing in non-final iteration Code: # void vvaddint32(size_t n, const int*x, const int*y, int*z) # { for (size_t i=0; i --- gcc/doc/md.texi | 22 gcc/internal-fn.def | 1 + gcc/optabs.def | 1 + gcc/tree-vect-loop-manip.cc | 32 - gcc/tree-vect-loop.cc | 72 + gcc/tree-vect-stmts.cc | 66 ++ gcc/tree-vectorizer.h | 6 7 files changed, 191 insertions(+), 9 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 6a435eb4461..95f7fe1f802 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); @end smallexample +@cindex @code{select_vl@var{m}} instruction pattern +@item @code{select_vl@var{m}} +Set operand 0 to the number of scalar iterations that should be handled +by one iteration of a vector loop. Operand 1 is the total number of +scalar iterations that the loop needs to process and operand 2 is a +maximum bound on the result (also known as the maximum ``vectorization +factor''). + +The maximum value of operand 0 is given by: +@smallexample +operand0 = MIN (operand1, operand2) +@end smallexample +However, targets might choose a lower value than this, based on +target-specific criteria. Each iteration of the vector loop might +therefore process a different number of scalar iterations, which in turn +means that induction variables will have a variable step. Because of +this, it is generally not useful to define this instruction if it will +always calculate the maximum value. + +This optab is only useful on targets that implement @samp{len_load_@var{m}} +and/or @samp{len_store_@var{m}}. 
+ @cindex @code{check_raw_ptrs@var{m}} instruction pattern @item @samp{check_raw_ptrs@var{m}} Check whether, given two pointers @var{a} and @var{b} and a length @var{len}, diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 7fe742c2ae7..6f6fa7d37f9 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set) DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store) DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) +DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary) DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW, check_raw_ptrs, check_ptrs) DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW, diff --git a/gcc/optabs.def b/gcc/optabs.def index 695f5911b30..b637471b76e 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") OPTAB_D (len_load_optab, "len_load_$a") OPTAB_D (len_store_optab, "len_store_$a") +OPTAB_D (select_vl_optab, "select_vl$a") diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 3f735945e67..1c8100c1a1c 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, _10 = (unsigned long) count_12(D); ... # ivtmp_9 = PHI - _36 = MIN_EXPR ; + _36 = (MIN_EXPR | SELECT_VL) ; ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... @@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, -&incr_gsi, insert_after, &index_before_incr, -&index_after_incr); - gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, - index_before_incr, - nitems_step)); + if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) + { + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, +insert_after, &index_before_incr, &index_after_incr); + tree len = gimple_build (header_seq, IFN_SELEC
[PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
From: Juzhe-Zhong This patch enables basic VLA SLP auto-vectorization. Consider this following case: void f (uint8_t *restrict a, uint8_t *restrict b) { for (int i = 0; i < 100; ++i) { a[i * 8 + 0] = b[i * 8 + 7] + 1; a[i * 8 + 1] = b[i * 8 + 7] + 2; a[i * 8 + 2] = b[i * 8 + 7] + 8; a[i * 8 + 3] = b[i * 8 + 7] + 4; a[i * 8 + 4] = b[i * 8 + 7] + 5; a[i * 8 + 5] = b[i * 8 + 7] + 6; a[i * 8 + 6] = b[i * 8 + 7] + 7; a[i * 8 + 7] = b[i * 8 + 7] + 3; } } To enable VLA SLP auto-vectorization, we should be able to handle this following const vector: 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3. { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... } 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. { 1, 2, 8, 4, 5, 6, 7, 3, ... } And these vector can be generated at prologue. After this patch, we end up with this following codegen: Prologue: ... vsetvli a7,zero,e16,m2,ta,ma vid.v v4 vsrl.vi v4,v4,3 li a3,8 vmul.vx v4,v4,a3 ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... } ... li t1,67633152 addit1,t1,513 li a3,50790400 addia3,a3,1541 sllia3,a3,32 add a3,a3,t1 vsetvli t1,zero,e64,m1,ta,ma vmv.v.x v3,a3 ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... } ... LoopBody: ... min a3,... vsetvli zero,a3,e8,m1,ta,ma vle8.v v2,0(a6) vsetvli a7,zero,e8,m1,ta,ma vrgatherei16.vv v1,v2,v4 vadd.vv v1,v1,v3 vsetvli zero,a3,e8,m1,ta,ma vse8.v v1,0(a2) add a6,a6,a4 add a2,a2,a4 mv a3,a5 add a5,a5,t1 bgtua3,a4,.L3 ... Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8 since "vrgatherei16.vv" can cover larger range than "vrgather.vv" (which only can maximum element index = 255). Epilogue: lbu a5,799(a1) addiw a4,a5,1 sb a4,792(a0) addiw a4,a5,2 sb a4,793(a0) addiw a4,a5,8 sb a4,794(a0) addiw a4,a5,4 sb a4,795(a0) addiw a4,a5,5 sb a4,796(a0) addiw a4,a5,6 sb a4,797(a0) addiw a4,a5,7 sb a4,798(a0) addiw a5,a5,3 sb a5,799(a0) ret There is one more last thing we need to do is the "Epilogue auto-vectorization" which needs VLS modes support. I will support VLS modes for "Epilogue auto-vectorization" in the future. gcc/ChangeLog: * config/riscv/riscv-protos.h (expand_vec_perm_const): New function. * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling. (rvv_builder::single_step_npatterns_p): New function. (rvv_builder::npatterns_all_equal_p): Ditto. (const_vec_all_in_range_p): Support POLY handling. (gen_const_vector_dup): Ditto. (emit_vlmax_gather_insn): Add vrgatherei16. (emit_vlmax_masked_gather_mu_insn): Ditto. (expand_const_vector): Add VLA SLP const vector support. (expand_vec_perm): Support POLY. (struct expand_vec_perm_d): New struct. (shuffle_generic_patterns): New function. (expand_vec_perm_const_1): Ditto. (expand_vec_perm_const): Ditto. * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto. (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer. * gcc.target/riscv/rvv/autovec/v-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test. 
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/
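The first constant above relies on GCC's stepped vector encoding; the snippet below is only a scalar C++ model of how an (NPATTERNS, NELTS_PER_PATTERN = 3) encoding unrolls into elements, under the usual rule that each pattern continues as an arithmetic series. It is not GCC's implementation and the names are made up:

#include <cstdio>

/* Element i belongs to pattern j = i % npatterns, occurrence k = i / npatterns;
   from the second occurrence on, the pattern steps by (third - second).  */
static long element (const long *first, const long *second, const long *third,
                     int npatterns, int i)
{
  int j = i % npatterns, k = i / npatterns;
  if (k == 0)
    return first[j];
  return second[j] + (long) (k - 1) * (third[j] - second[j]);
}

int main ()
{
  const int npatterns = 8;
  const long zero[8] = {0};
  const long eight[8] = {8, 8, 8, 8, 8, 8, 8, 8};
  const long sixteen[8] = {16, 16, 16, 16, 16, 16, 16, 16};
  for (int i = 0; i < 24; i++)           /* prints 0 x8, 8 x8, 16 x8, ... */
    printf ("%ld ", element (zero, eight, sixteen, npatterns, i));
  printf ("\n");
  return 0;
}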
[PATCH] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization
From: Juzhe-Zhong This patch add combine optimization for following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) dst[i] += (int16_t) a[i] * (int16_t) b[i]; } Before this patch: ... vsext.vf2 vzext.vf2 vmadd.vv .. After this patch: ... vwmaccsu.vv ... gcc/ChangeLog: * config/riscv/autovec-opt.md (*_fma): New pattern. (*single_mult_plus): Ditto. (*double_mult_plus): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New enum. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test. --- gcc/config/riscv/autovec-opt.md | 162 ++ gcc/config/riscv/riscv-protos.h | 1 + .../riscv/rvv/autovec/widen/widen-8.c | 27 +++ .../riscv/rvv/autovec/widen/widen-9.c | 23 +++ .../rvv/autovec/widen/widen-complicate-5.c| 32 .../rvv/autovec/widen/widen-complicate-6.c| 30 .../riscv/rvv/autovec/widen/widen_run-8.c | 38 .../riscv/rvv/autovec/widen/widen_run-9.c | 35 8 files changed, 348 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index f6052b50572..b18783b22eb 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -170,3 +170,165 @@ } [(set_attr "type" "vmalu") (set_attr "mode" "")]) + +;; = +;; == Widening Ternary arithmetic +;; = + +;; - +;; [INT] VWMACC +;; - +;; Includes: +;; - vwmacc.vv +;; - vwmaccu.vv +;; - + +;; Combine ext + ext + fma ===> widen fma. +;; Most of circumstantces, LoopVectorizer will generate the following IR: +;; vect__8.64_40 = (vector([4,4]) int) vect__7.63_41; +;; vect__11.68_35 = (vector([4,4]) int) vect__10.67_36; +;; vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45); +(define_insn_and_split "*_fma" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (any_extend:VWEXTI + (match_operand: 3 "register_operand"))) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +rtx ops[] = {operands[0], operands[1], operands[2], operands[3]}; +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode), + riscv_vector::RVV_WIDEN_TERNOP, ops); +DONE; + } + [(set_attr "type" "viwmuladd") + (set_attr "mode" "")]) + +;; Enhance the combine optimizations. 
+(define_insn_and_split "*single_mult_plus" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (match_operand:VWEXTI 3 "register_operand")) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +insn_code icode = code_for_pred_vf2 (, mode); +rtx tmp = gen_reg_rtx (mode); +rtx ext_ops[] = {tmp, operands[2]}; +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops); + +rtx dst = expand_ternary_op (mode, fma_optab, tmp,
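For completeness, hedged companion loops for the two sibling instructions named in the subject, written in the same shape as the vwmaccsu loop above; the "expected" comments are an assumption based on this patch's intent, not testcases quoted from it:

#include <cstdint>

__attribute__ ((noipa)) void
vwmacc (int16_t *__restrict dst, int8_t *__restrict a, int8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (int16_t) a[i] * (int16_t) b[i];   /* expected to combine into vwmacc.vv */
}

__attribute__ ((noipa)) void
vwmaccu (uint16_t *__restrict dst, uint8_t *__restrict a, uint8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (uint16_t) a[i] * (uint16_t) b[i]; /* expected to combine into vwmaccu.vv */
}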
[PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization
From: Juzhe-Zhong Fix according to comments from Robin of V1 patch. This patch add combine optimization for following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) dst[i] += (int16_t) a[i] * (int16_t) b[i]; } Before this patch: ... vsext.vf2 vzext.vf2 vmadd.vv .. After this patch: ... vwmaccsu.vv ... gcc/ChangeLog: * config/riscv/autovec-opt.md (*_fma): New pattern. (*single_mult_plus): Ditto. (*double_mult_plus): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New enum. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test. --- gcc/config/riscv/autovec-opt.md | 162 ++ gcc/config/riscv/riscv-protos.h | 1 + .../riscv/rvv/autovec/widen/widen-8.c | 27 +++ .../riscv/rvv/autovec/widen/widen-9.c | 23 +++ .../rvv/autovec/widen/widen-complicate-5.c| 32 .../rvv/autovec/widen/widen-complicate-6.c| 30 .../riscv/rvv/autovec/widen/widen_run-8.c | 38 .../riscv/rvv/autovec/widen/widen_run-9.c | 35 8 files changed, 348 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index f6052b50572..1c36b5f56be 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -170,3 +170,165 @@ } [(set_attr "type" "vmalu") (set_attr "mode" "")]) + +;; = +;; == Widening Ternary arithmetic +;; = + +;; - +;; [INT] VWMACC +;; - +;; Includes: +;; - vwmacc.vv +;; - vwmaccu.vv +;; - + +;; Combine ext + ext + fma ===> widen fma. +;; Most of circumstantces, LoopVectorizer will generate the following IR: +;; vect__8.64_40 = (vector([4,4]) int) vect__7.63_41; +;; vect__11.68_35 = (vector([4,4]) int) vect__10.67_36; +;; vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45); +(define_insn_and_split "*_fma" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (any_extend:VWEXTI + (match_operand: 3 "register_operand"))) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +rtx ops[] = {operands[0], operands[1], operands[2], operands[3]}; +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode), + riscv_vector::RVV_WIDEN_TERNOP, ops); +DONE; + } + [(set_attr "type" "viwmuladd") + (set_attr "mode" "")]) + +;; This helps to match ext + fma to enhance the combine optimizations. 
+(define_insn_and_split "*single_mult_plus" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (match_operand:VWEXTI 3 "register_operand")) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +insn_code icode = code_for_pred_vf2 (, mode); +rtx tmp = gen_reg_rtx (mode); +rtx ext_ops[] = {tmp, operands[2]}; +riscv_vector::emit_vlmax_insn (icode,
[PATCH] RISC-V: Enable SELECT_VL for RVV
From: Juzhe-Zhong gcc/ChangeLog: * config/riscv/autovec.md (select_vl): New pattern. * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): export global. * config/riscv/riscv-v.cc (force_vector_length_operand): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Adapt test. * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: New test. --- gcc/config/riscv/autovec.md | 19 + gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 2 +- .../riscv/rvv/autovec/partial/select_vl-1.c | 28 +++ .../riscv/rvv/autovec/ternop/ternop-2.c | 2 +- .../riscv/rvv/autovec/ternop/ternop-5.c | 2 +- 6 files changed, 51 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 9f4492db23c..c298f069714 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -18,6 +18,25 @@ ;; along with GCC; see the file COPYING3. If not see ;; <http://www.gnu.org/licenses/>. +;; = +;; == SELECT_VL +;; = + +(define_expand "select_vl" + [(match_operand:P 0 "register_operand") + (match_operand:P 1 "vector_length_operand") + (match_operand:P 2 "")] + "TARGET_VECTOR" +{ + poly_int64 nunits = rtx_to_poly_int64 (operands[2]); + /* We arbitrary picked QImode as inner scalar mode to get vector mode. + since vsetvl only demand ratio. We let VSETVL PASS to optimize it. */ + scalar_int_mode mode = QImode; + machine_mode rvv_mode = riscv_vector::get_vector_mode (mode, nunits).require (); + emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (rvv_mode, operands[0], operands[1])); + DONE; +}) + ;; = ;; == Loads/Stores ;; = diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 00e1b20c6c6..d770e5e826e 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -232,6 +232,7 @@ enum vlen_enum RVV_64 = 64, RVV_65536 = 65536 }; +rtx gen_no_side_effects_vsetvl_rtx (machine_mode, rtx, rtx); bool slide1_sew64_helper (int, machine_mode, machine_mode, machine_mode, rtx *); rtx gen_avl_for_scalar_move (rtx); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 49752cd8899..83277fc2c05 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1280,7 +1280,7 @@ force_vector_length_operand (rtx vl) return vl; } -static rtx +rtx gen_no_side_effects_vsetvl_rtx (machine_mode vmode, rtx vl, rtx avl) { unsigned int sew = get_sew (vmode); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c new file mode 100644 index 000..b8e0ca0f1f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns -fdump-tree-optimized-details" } */ + +#include + +#define TEST_TYPE(TYPE) \ + __attribute__ ((noipa)) void select_vl_##TYPE (TYPE *__restrict dst, \ +TYPE *__restrict a, int n)\ + { \ +for (int i = 0; i < n; i++) \ + dst[i] = a[i]; \ + } + +#define TEST_ALL() \ + TEST_TYPE (int8_t) \ + TEST_TYPE (uint8_t) \ + TEST_TYPE (int16_t) \ + TEST_TYPE (uint16_t) \ + TEST_TYPE (int32_t) \ + TEST_TYPE (uint32_t) \ + TEST_TYPE (int64_t) \ + TEST_TYPE (uint64_t) \ + TEST_TYPE (float)
[PATCH V3] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization
From: Juzhe-Zhong Fix according to comments from Robin of V1 patch. This patch add combine optimization for following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) dst[i] += (int16_t) a[i] * (int16_t) b[i]; } Before this patch: ... vsext.vf2 vzext.vf2 vmadd.vv .. After this patch: ... vwmaccsu.vv ... gcc/ChangeLog: * config/riscv/autovec-opt.md (*_fma): New pattern. (*single_mult_plus): Ditto. (*double_mult_plus): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New enum. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test. --- gcc/config/riscv/autovec-opt.md | 162 ++ gcc/config/riscv/riscv-protos.h | 1 + .../riscv/rvv/autovec/widen/widen-8.c | 27 +++ .../riscv/rvv/autovec/widen/widen-9.c | 23 +++ .../rvv/autovec/widen/widen-complicate-5.c| 32 .../rvv/autovec/widen/widen-complicate-6.c| 30 .../riscv/rvv/autovec/widen/widen_run-8.c | 38 .../riscv/rvv/autovec/widen/widen_run-9.c | 35 8 files changed, 348 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index f6052b50572..1c36b5f56be 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -170,3 +170,165 @@ } [(set_attr "type" "vmalu") (set_attr "mode" "")]) + +;; = +;; == Widening Ternary arithmetic +;; = + +;; - +;; [INT] VWMACC +;; - +;; Includes: +;; - vwmacc.vv +;; - vwmaccu.vv +;; - + +;; Combine ext + ext + fma ===> widen fma. +;; Most of circumstantces, LoopVectorizer will generate the following IR: +;; vect__8.64_40 = (vector([4,4]) int) vect__7.63_41; +;; vect__11.68_35 = (vector([4,4]) int) vect__10.67_36; +;; vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45); +(define_insn_and_split "*_fma" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (any_extend:VWEXTI + (match_operand: 3 "register_operand"))) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +rtx ops[] = {operands[0], operands[1], operands[2], operands[3]}; +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode), + riscv_vector::RVV_WIDEN_TERNOP, ops); +DONE; + } + [(set_attr "type" "viwmuladd") + (set_attr "mode" "")]) + +;; This helps to match ext + fma to enhance the combine optimizations. 
+(define_insn_and_split "*single_mult_plus" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (match_operand:VWEXTI 3 "register_operand")) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +insn_code icode = code_for_pred_vf2 (, mode); +rtx tmp = gen_reg_rtx (mode); +rtx ext_ops[] = {tmp, operands[2]}; +riscv_vector::emit_vlmax_insn (icode,
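To make the mapping concrete, these are the kinds of scalar kernels the new widen-8.c/widen-9.c tests are meant to cover (a sketch only; the exact test contents are not reproduced here, so the function names are hypothetical). The signed variant should combine into vwmacc.vv and the unsigned variant into vwmaccu.vv, next to the mixed-sign vwmaccsu case already shown above.

#include <stdint.h>

/* signed x signed accumulate: expected to become vwmacc.vv.  */
__attribute__ ((noipa)) void
widen_macc_s (int16_t *__restrict dst, int8_t *__restrict a,
              int8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

/* unsigned x unsigned accumulate: expected to become vwmaccu.vv.  */
__attribute__ ((noipa)) void
widen_macc_u (uint16_t *__restrict dst, uint8_t *__restrict a,
              uint8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (uint16_t) a[i] * (uint16_t) b[i];
}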
[PATCH V4] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization
From: Juzhe-Zhong Fix according to comments from Robin of V1 patch. This patch add combine optimization for following case: __attribute__ ((noipa)) void vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n) { for (int i = 0; i < n; i++) dst[i] += (int16_t) a[i] * (int16_t) b[i]; } Before this patch: ... vsext.vf2 vzext.vf2 vmadd.vv .. After this patch: ... vwmaccsu.vv ... gcc/ChangeLog: * config/riscv/autovec-opt.md (*_fma): New pattern. (*single_mult_plus): Ditto. (*double_mult_plus): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New enum. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test. --- gcc/config/riscv/autovec-opt.md | 160 ++ gcc/config/riscv/riscv-protos.h | 1 + .../riscv/rvv/autovec/widen/widen-8.c | 27 +++ .../riscv/rvv/autovec/widen/widen-9.c | 23 +++ .../rvv/autovec/widen/widen-complicate-5.c| 32 .../rvv/autovec/widen/widen-complicate-6.c| 30 .../riscv/rvv/autovec/widen/widen_run-8.c | 38 + .../riscv/rvv/autovec/widen/widen_run-9.c | 35 8 files changed, 346 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index f6052b50572..7bb93eed220 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -170,3 +170,163 @@ } [(set_attr "type" "vmalu") (set_attr "mode" "")]) + +;; = +;; == Widening Ternary arithmetic +;; = + +;; - +;; [INT] VWMACC +;; - +;; Includes: +;; - vwmacc.vv +;; - vwmaccu.vv +;; - + +;; Combine ext + ext + fma ===> widen fma. +;; Most of circumstantces, LoopVectorizer will generate the following IR: +;; vect__8.64_40 = (vector([4,4]) int) vect__7.63_41; +;; vect__11.68_35 = (vector([4,4]) int) vect__10.67_36; +;; vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45); +(define_insn_and_split "*_fma" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (any_extend:VWEXTI + (match_operand: 3 "register_operand"))) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (, mode), + riscv_vector::RVV_WIDEN_TERNOP, operands); +DONE; + } + [(set_attr "type" "viwmuladd") + (set_attr "mode" "")]) + +;; This helps to match ext + fma. 
+(define_insn_and_split "*single_mult_plus" + [(set (match_operand:VWEXTI 0 "register_operand") + (plus:VWEXTI + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 2 "register_operand")) + (match_operand:VWEXTI 3 "register_operand")) + (match_operand:VWEXTI 1 "register_operand")))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] + { +insn_code icode = code_for_pred_vf2 (, mode); +rtx tmp = gen_reg_rtx (mode); +rtx ext_ops[] = {tmp, operands[2]}; +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops); + +rtx dst = expand_ternary_op (mode, fma_optab, tmp, operands[3],
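For completeness, an execution-test skeleton in the style of the new widen_run tests (hypothetical contents; the real widen_run-8.c is not shown here) that checks the combined vwmaccsu result against a plain scalar reference:

#include <stdint.h>
#include <assert.h>

#define N 79

/* noipa keeps this out of line, so the call below is compiled as the
   vectorizable test kernel rather than folded into main.  */
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a,
          uint8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

int
main (void)
{
  int8_t a[N];
  uint8_t b[N];
  int16_t dst[N], ref[N];
  for (int i = 0; i < N; i++)
    {
      a[i] = (int8_t) (i - 40);
      b[i] = (uint8_t) (i * 3);
      dst[i] = ref[i] = (int16_t) (i - 5);
    }
  vwmaccsu (dst, a, b, N);
  /* Scalar reference computation.  */
  for (int i = 0; i < N; i++)
    ref[i] += (int16_t) a[i] * (int16_t) b[i];
  for (int i = 0; i < N; i++)
    assert (dst[i] == ref[i]);
  return 0;
}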
[PATCH V2] RISC-V: Support RVV VLA SLP auto-vectorization
From: Juzhe-Zhong This patch enables basic VLA SLP auto-vectorization. Consider this following case: void f (uint8_t *restrict a, uint8_t *restrict b) { for (int i = 0; i < 100; ++i) { a[i * 8 + 0] = b[i * 8 + 7] + 1; a[i * 8 + 1] = b[i * 8 + 7] + 2; a[i * 8 + 2] = b[i * 8 + 7] + 8; a[i * 8 + 3] = b[i * 8 + 7] + 4; a[i * 8 + 4] = b[i * 8 + 7] + 5; a[i * 8 + 5] = b[i * 8 + 7] + 6; a[i * 8 + 6] = b[i * 8 + 7] + 7; a[i * 8 + 7] = b[i * 8 + 7] + 3; } } To enable VLA SLP auto-vectorization, we should be able to handle this following const vector: 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3. { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... } 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. { 1, 2, 8, 4, 5, 6, 7, 3, ... } And these vector can be generated at prologue. After this patch, we end up with this following codegen: Prologue: ... vsetvli a7,zero,e16,m2,ta,ma vid.v v4 vsrl.vi v4,v4,3 li a3,8 vmul.vx v4,v4,a3 ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... } ... li t1,67633152 addit1,t1,513 li a3,50790400 addia3,a3,1541 sllia3,a3,32 add a3,a3,t1 vsetvli t1,zero,e64,m1,ta,ma vmv.v.x v3,a3 ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... } ... LoopBody: ... min a3,... vsetvli zero,a3,e8,m1,ta,ma vle8.v v2,0(a6) vsetvli a7,zero,e8,m1,ta,ma vrgatherei16.vv v1,v2,v4 vadd.vv v1,v1,v3 vsetvli zero,a3,e8,m1,ta,ma vse8.v v1,0(a2) add a6,a6,a4 add a2,a2,a4 mv a3,a5 add a5,a5,t1 bgtua3,a4,.L3 ... Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8 since "vrgatherei16.vv" can cover larger range than "vrgather.vv" (which only can maximum element index = 255). Epilogue: lbu a5,799(a1) addiw a4,a5,1 sb a4,792(a0) addiw a4,a5,2 sb a4,793(a0) addiw a4,a5,8 sb a4,794(a0) addiw a4,a5,4 sb a4,795(a0) addiw a4,a5,5 sb a4,796(a0) addiw a4,a5,6 sb a4,797(a0) addiw a4,a5,7 sb a4,798(a0) addiw a5,a5,3 sb a5,799(a0) ret There is one more last thing we need to do is the "Epilogue auto-vectorization" which needs VLS modes support. I will support VLS modes for "Epilogue auto-vectorization" in the future. gcc/ChangeLog: * config/riscv/riscv-protos.h (expand_vec_perm_const): New function. * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling. (rvv_builder::single_step_npatterns_p): New function. (rvv_builder::npatterns_all_equal_p): Ditto. (const_vec_all_in_range_p): Support POLY handling. (gen_const_vector_dup): Ditto. (emit_vlmax_gather_insn): Add vrgatherei16. (emit_vlmax_masked_gather_mu_insn): Ditto. (expand_const_vector): Add VLA SLP const vector support. (expand_vec_perm): Support POLY. (struct expand_vec_perm_d): New struct. (shuffle_generic_patterns): New function. (expand_vec_perm_const_1): Ditto. (expand_vec_perm_const): Ditto. * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto. (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer. * gcc.target/riscv/rvv/autovec/v-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test. 
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/
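An illustrative helper (not part of the patch; the name is hypothetical) that builds the first stepped constant described above, i.e. npatterns = 8 with nelts_per_pattern = 3 and a step of 8 between repetitions. 16-bit selector elements mirror the vrgatherei16.vv choice made for SEW = 8.

#include <stdint.h>
#include <stddef.h>

static void
build_gather_selector (uint16_t *sel, size_t nelts)
{
  const size_t npatterns = 8;
  /* Element i is (i / 8) * 8, matching the vid.v / vsrl.vi 3 / vmul.vx 8
     sequence emitted in the prologue.  */
  for (size_t i = 0; i < nelts; i++)
    sel[i] = (uint16_t) ((i / npatterns) * npatterns);
  /* e.g. nelts = 24 gives { 0,0,0,0,0,0,0,0, 8,8,8,8,8,8,8,8, 16,... }.  */
}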
[PATCH V4] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford Co-authored-by: Richard Biener This patch address comments from Richard && Richi and rebase to trunk. This patch is adding SELECT_VL middle-end support allow target have target dependent optimization in case of length calculation. This patch is inspired by RVV ISA and LLVM: https://reviews.llvm.org/D99750 The SELECT_VL is same behavior as LLVM "get_vector_length" with these following properties: 1. Only apply on single-rgroup. 2. non SLP. 3. adjust loop control IV. 4. adjust data reference IV. 5. allow non-vf elements processing in non-final iteration Code: # void vvaddint32(size_t n, const int*x, const int*y, int*z) # { for (size_t i=0; i Co-authored-by: Richard Biener --- gcc/doc/md.texi | 22 ++ gcc/internal-fn.def | 1 + gcc/optabs.def | 1 + gcc/tree-vect-loop-manip.cc | 32 ++ gcc/tree-vect-loop.cc | 72 ++ gcc/tree-vect-stmts.cc | 87 - gcc/tree-vectorizer.h | 6 +++ 7 files changed, 202 insertions(+), 19 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 6a435eb4461..95f7fe1f802 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); @end smallexample +@cindex @code{select_vl@var{m}} instruction pattern +@item @code{select_vl@var{m}} +Set operand 0 to the number of scalar iterations that should be handled +by one iteration of a vector loop. Operand 1 is the total number of +scalar iterations that the loop needs to process and operand 2 is a +maximum bound on the result (also known as the maximum ``vectorization +factor''). + +The maximum value of operand 0 is given by: +@smallexample +operand0 = MIN (operand1, operand2) +@end smallexample +However, targets might choose a lower value than this, based on +target-specific criteria. Each iteration of the vector loop might +therefore process a different number of scalar iterations, which in turn +means that induction variables will have a variable step. Because of +this, it is generally not useful to define this instruction if it will +always calculate the maximum value. + +This optab is only useful on targets that implement @samp{len_load_@var{m}} +and/or @samp{len_store_@var{m}}. 
+ @cindex @code{check_raw_ptrs@var{m}} instruction pattern @item @samp{check_raw_ptrs@var{m}} Check whether, given two pointers @var{a} and @var{b} and a length @var{len}, diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 3ac9d82aace..5d638de6d06 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set) DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store) DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) +DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary) DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW, check_raw_ptrs, check_ptrs) DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW, diff --git a/gcc/optabs.def b/gcc/optabs.def index 6c064ff4993..f31b69c5d85 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") OPTAB_D (len_load_optab, "len_load_$a") OPTAB_D (len_store_optab, "len_store_$a") +OPTAB_D (select_vl_optab, "select_vl$a") diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 3f735945e67..1c8100c1a1c 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, _10 = (unsigned long) count_12(D); ... # ivtmp_9 = PHI - _36 = MIN_EXPR ; + _36 = (MIN_EXPR | SELECT_VL) ; ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... @@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, -&incr_gsi, insert_after, &index_before_incr, -&index_after_incr); - gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, - index_before_incr, - nitems_step)); + if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) + { + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, +insert_after, &index_before_incr, &index_after
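A scalar sketch (not generated code; names are illustrative) of the loop structure SELECT_VL enables: the control IV counts the remaining scalar iterations downwards, and the data-reference IVs advance by whatever length the target returned, so their step is no longer invariant.

#include <stddef.h>

/* Stands in for the .SELECT_VL result; any value in [1, MIN (remaining, vf)]
   would be a valid choice for the target (vf is assumed to be >= 1).  */
static size_t
select_vl_model (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

void
copy_with_select_vl (int *dst, const int *src, size_t n, size_t vf)
{
  size_t remaining = n;                      /* decrementing control IV */
  while (remaining > 0)
    {
      size_t vl = select_vl_model (remaining, vf);
      for (size_t i = 0; i < vl; i++)        /* models len_load/len_store */
        dst[i] = src[i];
      dst += vl;                             /* variable-step data IVs */
      src += vl;
      remaining -= vl;
    }
}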
[PATCH V5] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford Co-authored-by: Richard Biener This patch address comments from Richard && Richi and rebase to trunk. This patch is adding SELECT_VL middle-end support allow target have target dependent optimization in case of length calculation. This patch is inspired by RVV ISA and LLVM: https://reviews.llvm.org/D99750 The SELECT_VL is same behavior as LLVM "get_vector_length" with these following properties: 1. Only apply on single-rgroup. 2. non SLP. 3. adjust loop control IV. 4. adjust data reference IV. 5. allow non-vf elements processing in non-final iteration Code: # void vvaddint32(size_t n, const int*x, const int*y, int*z) # { for (size_t i=0; i Co-authored-by: Richard Biener --- gcc/doc/md.texi | 22 ++ gcc/internal-fn.def | 1 + gcc/optabs.def | 1 + gcc/tree-vect-loop-manip.cc | 32 ++ gcc/tree-vect-loop.cc | 72 +++ gcc/tree-vect-stmts.cc | 86 - gcc/tree-vectorizer.h | 6 +++ 7 files changed, 201 insertions(+), 19 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 6a435eb4461..95f7fe1f802 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); @end smallexample +@cindex @code{select_vl@var{m}} instruction pattern +@item @code{select_vl@var{m}} +Set operand 0 to the number of scalar iterations that should be handled +by one iteration of a vector loop. Operand 1 is the total number of +scalar iterations that the loop needs to process and operand 2 is a +maximum bound on the result (also known as the maximum ``vectorization +factor''). + +The maximum value of operand 0 is given by: +@smallexample +operand0 = MIN (operand1, operand2) +@end smallexample +However, targets might choose a lower value than this, based on +target-specific criteria. Each iteration of the vector loop might +therefore process a different number of scalar iterations, which in turn +means that induction variables will have a variable step. Because of +this, it is generally not useful to define this instruction if it will +always calculate the maximum value. + +This optab is only useful on targets that implement @samp{len_load_@var{m}} +and/or @samp{len_store_@var{m}}. 
+ @cindex @code{check_raw_ptrs@var{m}} instruction pattern @item @samp{check_raw_ptrs@var{m}} Check whether, given two pointers @var{a} and @var{b} and a length @var{len}, diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 3ac9d82aace..5d638de6d06 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set) DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store) DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) +DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary) DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW, check_raw_ptrs, check_ptrs) DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW, diff --git a/gcc/optabs.def b/gcc/optabs.def index 6c064ff4993..f31b69c5d85 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") OPTAB_D (len_load_optab, "len_load_$a") OPTAB_D (len_store_optab, "len_store_$a") +OPTAB_D (select_vl_optab, "select_vl$a") diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 3f735945e67..1c8100c1a1c 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, _10 = (unsigned long) count_12(D); ... # ivtmp_9 = PHI - _36 = MIN_EXPR ; + _36 = (MIN_EXPR | SELECT_VL) ; ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... @@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, -&incr_gsi, insert_after, &index_before_incr, -&index_after_incr); - gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, - index_before_incr, - nitems_step)); + if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) + { + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, +insert_after, &index_before_incr, &index_afte
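For comparison, a hand-written RVV-intrinsic version of the vvaddint32 kernel named in the commit message (illustrative, not part of the patch; it assumes the standard <riscv_vector.h> intrinsics). The vl that vsetvli hands back on each iteration plays exactly the role of the .SELECT_VL result.

#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

void
vvaddint32 (size_t n, const int32_t *x, const int32_t *y, int32_t *z)
{
  while (n > 0)
    {
      size_t vl = __riscv_vsetvl_e32m1 (n);          /* vl <= MIN (n, VLMAX) */
      vint32m1_t vx = __riscv_vle32_v_i32m1 (x, vl);
      vint32m1_t vy = __riscv_vle32_v_i32m1 (y, vl);
      vint32m1_t vz = __riscv_vadd_vv_i32m1 (vx, vy, vl);
      __riscv_vse32_v_i32m1 (z, vz, vl);
      x += vl;                                       /* variable-step IVs */
      y += vl;
      z += vl;
      n -= vl;
    }
}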
[PATCH V6] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong Co-authored-by: Richard Sandiford Co-authored-by: Richard Biener This patch address comments from Richard && Richi and rebase to trunk. This patch is adding SELECT_VL middle-end support allow target have target dependent optimization in case of length calculation. This patch is inspired by RVV ISA and LLVM: https://reviews.llvm.org/D99750 The SELECT_VL is same behavior as LLVM "get_vector_length" with these following properties: 1. Only apply on single-rgroup. 2. non SLP. 3. adjust loop control IV. 4. adjust data reference IV. 5. allow non-vf elements processing in non-final iteration Code: # void vvaddint32(size_t n, const int*x, const int*y, int*z) # { for (size_t i=0; i - _36 = MIN_EXPR ; + _36 = (MIN_EXPR | SELECT_VL) ; ... vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); ... @@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, tree step = rgc->controls.length () == 1 ? rgc->controls[0] : make_ssa_name (iv_type); /* Create decrement IV. */ - create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, -&incr_gsi, insert_after, &index_before_incr, -&index_after_incr); - gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, - index_before_incr, - nitems_step)); + if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) + { + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, +insert_after, &index_before_incr, &index_after_incr); + tree len = gimple_build (header_seq, IFN_SELECT_VL, iv_type, + index_before_incr, nitems_step); + gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len)); + } + else + { + create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop, +&incr_gsi, insert_after, &index_before_incr, +&index_after_incr); + gimple_seq_add_stmt (header_seq, + gimple_build_assign (step, MIN_EXPR, + index_before_incr, + nitems_step)); + } *iv_step = step; *compare_step = nitems_step; - return index_before_incr; + return LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? index_after_incr + : index_before_incr; } /* Create increment IV. */ @@ -888,7 +901,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop, /* Get a boolean result that tells us whether to iterate. */ edge exit_edge = single_exit (loop); gcond *cond_stmt; - if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) + && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) { gcc_assert (compare_step); tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR; diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 5b7a0da0034..ace9e759f5b 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), using_partial_vectors_p (false), using_decrementing_iv_p (false), +using_select_vl_p (false), epil_using_partial_vectors_p (false), partial_load_store_bias (0), peeling_for_gaps (false), @@ -2737,6 +2738,77 @@ start_over: LOOP_VINFO_VECT_FACTOR (loop_vinfo LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; + /* If a loop uses length controls and has a decrementing loop control IV, + we will normally pass that IV through a MIN_EXPR to calcaluate the + basis for the length controls. E.g. in a loop that processes one + element per scalar iteration, the number of elements would be + MIN_EXPR , where N is the number of scalar iterations left. 
+ + This MIN_EXPR approach allows us to use pointer IVs with an invariant + step, since only the final iteration of the vector loop can have + inactive lanes. + + However, some targets have a dedicated instruction for calculating the + preferred length, given the total number of elements that still need to + be processed. This is encapsulated in the SELECT_VL internal function. + + If the target supports SELECT_VL, we can use it instead of MIN_EXPR + to determine the basis for the length controls. However, unlike the + MIN_EXPR calculation, the SELECT_VL calculation can decide to make + lanes inactive in any iteration of the vector loop, not just the
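To contrast the two bases the comment above describes, here is a scalar sketch of the MIN_EXPR form (illustrative only; vf, the vectorization factor, is assumed to be at least 1): the index IV advances by the invariant vf every iteration and only the final iteration can process fewer than vf elements, unlike the variable-step SELECT_VL sketch shown after the V4 patch.

#include <stddef.h>

void
copy_with_min_expr (int *dst, const int *src, size_t n, size_t vf)
{
  for (size_t done = 0; done < n; done += vf)        /* invariant step */
    {
      size_t remaining = n - done;
      size_t len = remaining < vf ? remaining : vf;  /* MIN_EXPR basis */
      for (size_t i = 0; i < len; i++)               /* models len_load/store */
        dst[done + i] = src[done + i];
    }
}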
[PATCH] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
From: Juzhe-Zhong This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && Phase 6 are quite messy and cause some bugs discovered by my downstream auto-vectorization test-generator. Before this patch. Phase 5 is cleanup_insns is the function remove AVL operand dependency from each RVV instruction. E.g. vadd.vv (use a5), after Phase 5, > vadd.vv (use const_int 0). Since "a5" is used in "vsetvl" instructions and after the correct "vsetvl" instructions are inserted, each RVV instruction doesn't need AVL operand "a5" anymore. Then, we remove this operand dependency helps for the following scheduling PASS. Phase 6 is propagate_avl do the following 2 things: 1. Local && Global user vsetvl instructions optimization. E.g. vsetvli a2, a2, e8, mf8 ==> Change it into vsetvli a2, a2, e32, mf2 vsetvli zero,a2, e32, mf2 ==> eliminate 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not used by any instructions. Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on LCM which change the CFG, I re-new a new RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and optmize user vsetvli base on the new RTL_SSA. There are 2 issues in Phase 5 && Phase 6: 1. local_eliminate_vsetvl_insn was introduced by @kito which can do better local user vsetvl optimizations better than Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. So the local user vsetvli instructions optimizaiton in Phase 6 is redundant and should be removed. 2. A bug discovered by my downstream auto-vectorization test-generator (I can't put the test in this patch since we are missing autovec patterns for it so we can't use the upstream GCC directly reproduce such issue but I will remember put it back after I support the necessary autovec patterns). Such bug is causing by using RTL_SSA re-new framework. The issue description is this: Before Phase 6: ... insn1: vsetlvi a3, 17 <== generated by SELECT_VL auto-vec pattern. slli a4,a3,3 ... insn2: vsetvli zero, a3, ... load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" is removed in Phase 5) ... In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1. insn2 is the vsetvli instruction inserted in Phase 4 which is not included in the RLT_SSA framework even though we renew it (I didn't take a look at it and I don't think we need to now). Base on this situation, the def_info of insn2 has the information "set->single_nondebug_insn_use ()" which return true. Obviously, this information is not correct, since insn1 has aleast 2 uses: 1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated by my downstream test-generator execution test failed. Conclusion of RTL_SSA framework: Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of the VSETVL PASS which is absolutely correct, the other is re-new after Phase 4 (LCM) has incorrect information that causes bugs. Besides, we don't like to initialize RTL_SSA second time it seems to be a waste since we just need to do a little optimization. Base on all circumstances I described above, I rework and reorganize Phase 5 && Phase 6 as follows: 1. Phase 5 is called ssa_post_optimization which is doing the optimization base on the RTL_SSA information (The RTL_SSA is initialized at the beginning of the VSETVL PASS, no need to re-new it again). This phase includes 3 optimizaitons: 1). local_eliminate_vsetvl_insn we already have (no change). 2). 
global_eliminate_vsetvl_insn ---> new optimizaiton splitted from orignal Phase 6 but with more powerful and reliable implementation. E.g. void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { size_t avl; if (m > 100) avl = __riscv_vsetvl_e16mf4(vl << 4); else{ avl = __riscv_vsetvl_e8mf8(vl); } for (size_t i = 0; i < m; i++) { vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl); __riscv_vse8_v_i8mf8(out + i, v0, avl); } } This example failed to global user vsetvl optimize before this patch: f: ... vsetvli a2,a2,e16,mf4,ta,mu .L3: li a5,0 vsetvli zero,a2,e8,mf8,ta,ma .L5: ... vle8.v v1,0(a6) addia5,a5,1 vse8.v v1,0(a4) bgtua3,a5,.L5 .L10: ret .L2: beq a3,zero,.L10 vsetvli a2,a2,e8,mf8,ta,mu j .L3 With this patch: f: ... vsetvli zer