[PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF [GCC 14 regression]

2024-01-12 Thread Juzhe-Zhong
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14) GCC 13.2.0: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0 ret Trunk GCC: vsetvli a5,zero,e8,mf2,ta,ma li a4,-32768 vid.v v1

[PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-14 Thread Juzhe-Zhong
Update in v2: Add dynamic lmul test. This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14) GCC 13.2.0: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0 ret Trunk GCC: vsetvli a5,zero,e8,mf2,ta,ma li

[PATCH] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-14 Thread Juzhe-Zhong
This patch fixes the 70% performance drop from GCC-13.2 to GCC-14 with -march=rv64gcv on real hardware. The root cause is an incorrect cost model that causes inefficient vectorization, which makes performance drop significantly. So this patch does: 1. Adjust vector to scalar cost by introducing v to sca

[Committed] RISC-V: Fix attributes bug configuration of ternary instructions

2024-01-14 Thread Juzhe-Zhong
This patch fixes the following FAILs: Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax FAIL: gcc.c-torture/execute/pr68532.c -O0 execution test FAIL: gcc.c-torture/execute/pr68532.c -O1 execution test FAIL: gcc.c-torture/execut

[Committed] RISC-V: Add optimized dump check of VLS reduc tests

2024-01-15 Thread Juzhe-Zhong
Add more dump checks to robustify the tests. Committed. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/reduc-1.c: Add dump check. * gcc.target/riscv/rvv/autovec/vls/reduc-10.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/reduc-11.c: Ditto. * gcc.target/r

[Committed V3] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-15 Thread Juzhe-Zhong
Rebase in v3: Rebase to the trunk and commit it, since it was approved by Robin. Update in v2: Add dynamic lmul test. This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14) GCC 13.2.0: lui a5,%hi(a) li a4,19 sb a4,%lo(a)(a5) li a0,0
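The premise of the title can be checked with plain arithmetic: a length-controlled loop processes up to VF elements per iteration, so it runs ceil(NITER / VF) times, and with a known NITER smaller than VF it runs exactly once. That appears to be the intuition behind costing the length adjustment as 1 in that case. A minimal sketch, assuming a known constant NITER and a fixed VF; the helper name is hypothetical and not GCC code:

```c
#include <assert.h>

/* Hypothetical helper: number of vector iterations a length-controlled
   loop needs for a known trip count NITERS and vectorization factor VF.  */
static unsigned
vector_iterations (unsigned niters, unsigned vf)
{
  assert (vf > 0);
  /* Ceiling division: the last iteration handles the remainder.  */
  return (niters + vf - 1) / vf;
}
```

For NITER = 3 and VF = 4 this gives a single iteration, so any per-iteration length setup is paid for only once.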

[Committed V2] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-15 Thread Juzhe-Zhong
This patch fixes the 70% performance drop from GCC-13.2 to GCC-14 with -march=rv64gcv on real hardware. The root cause is an incorrect cost model that causes inefficient vectorization, which makes performance drop significantly. So this patch does: 1. Adjust vector to scalar cost by introducing v to sca

[PATCH] RISC-V: Report Sorry when users enable RVV in big-endian mode [PR113404]

2024-01-15 Thread Juzhe-Zhong
As PR113404 mentioned: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113404 We hit an ICE when we enable RVV in big-endian mode: during RTL pass: expand a-float-point-dynamic-frm-66.i:2:14: internal compiler error: in to_constant, at poly-int.h:588 0xab4c2c poly_int<2u, unsigned short>::to_constant

[PATCH] test regression fix: Remove xfail for variable length targets

2024-01-15 Thread Juzhe-Zhong
I recently noticed an XPASS on RISC-V: XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2 "vector operands from scalars" XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not slp2 "vector operands from scalars" I checked both ARM SVE and RVV: https://godbolt.org/

[PATCH] test regression fix: Remove xfail for variable length targets of bb-slp-subgroups-3.c

2024-01-15 Thread Juzhe-Zhong
I noticed a recent regression: XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 2 XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized: basic block" 2 Checked on both ARM SVE and RVV: https://godbo

[PATCH v2] test regression fix: Add vect128 for bb-slp-43.c

2024-01-16 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-43.c: Add vect128. --- gcc/testsuite/gcc.dg/vect/bb-slp-43.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-43.c b/gcc/testsuite/gcc.dg/vect/bb-slp-43.c index dad2d24262d..8aedb06bf72 100

[PATCH] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes a SPEC2017 cam4 mismatch caused by a missing has-compatible check in conflict vsetvl fusion. Buggy assembler before this patch: .L69: vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl vsetivli zero,8,e8,mf2,ta,ma vmv.v.i v1,0 vse8

[PATCH V2] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes a SPEC2017 cam4 mismatch caused by a missing has-compatible check in conflict vsetvl fusion. Buggy assembler before this patch: .L69: vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl vsetivli zero,8,e8,mf2,ta,ma vmv.v.i v1,0 vse8

[Committed V3] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
V3: Rebase to trunk and commit it. This patch fixes a SPEC2017 cam4 mismatch caused by a missing has-compatible check in conflict vsetvl fusion. Buggy assembler before this patch: .L69: vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl vsetivli zero,8,e8,mf2,ta,

[PATCH v2] test regression fix: Add !vect128 for variable length targets of bb-slp-subgroups-3.c

2024-01-17 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128. --- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3

[PATCH] RISC-V: Support vi variant for vec_cmp

2024-01-18 Thread Juzhe-Zhong
While running various benchmarks, I noticed that we lack vi variant support for integer comparison. That is, we can vectorize code into vadd.vi but we can't vectorize into vmseq.vi. Consider the following case: void foo (int n, int **__restrict a) { int b; int c; int d; for (b = 0; b < n; b+
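In RVV, the .vi forms encode the scalar operand as a 5-bit signed immediate (simm5, range [-16, 15]) inside the instruction, so a comparison against a small constant can use `vmseq.vi` instead of first moving the constant into a scalar register for `vmseq.vx`. A minimal sketch of that encodability check; `fits_simm5` and `vmseq_form` are hypothetical helpers, not the GCC predicates:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if VALUE fits the 5-bit signed immediate
   (simm5) field used by RVV .vi instructions such as vmseq.vi.  */
static bool
fits_simm5 (int64_t value)
{
  return value >= -16 && value <= 15;
}

/* Pick the mnemonic for an equality compare against a constant.  */
static const char *
vmseq_form (int64_t value)
{
  return fits_simm5 (value) ? "vmseq.vi" : "vmseq.vx";
}
```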

[PATCH] RISC-V: Fix RVV_VLMAX

2024-01-19 Thread Juzhe-Zhong
This patch fixes a memory hog found in the SPEC2017 wrf benchmark, caused by RVV_VLMAX: RVV_VLMAX generates a brand new rtx via gen_rtx_REG (Pmode, X0_REGNUM) every time it is called, that is, we are always generating garbage and redundant (reg:DI 0 zero) rtx. After this fix, the memo

[PATCH V2] RISC-V: Fix RVV_VLMAX

2024-01-19 Thread Juzhe-Zhong
This patch fixes a memory hog found in the SPEC2017 wrf benchmark, caused by RVV_VLMAX: RVV_VLMAX generates a brand new rtx via gen_rtx_REG (Pmode, X0_REGNUM) every time it is called, that is, we are always generating garbage and redundant (reg:DI 0 zero) rtx. After this fix, the memo
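The general pattern the description points to, returning one shared object instead of building a fresh one on every call, can be sketched in plain C. The `struct reg` type and `vlmax_reg` function are hypothetical stand-ins for an rtx and RVV_VLMAX; GCC manages rtx lifetime with its own garbage collector, which this sketch does not model, and the sketch makes no claim about how the actual patch obtains the shared rtx:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a register rtx such as (reg:DI 0 zero).  */
struct reg
{
  int regno;
};

/* Return a shared object for register 0, creating it on first use.
   Repeated calls no longer allocate new garbage each time.  */
static struct reg *
vlmax_reg (void)
{
  static struct reg *cached;
  if (!cached)
    {
      cached = malloc (sizeof *cached);
      if (!cached)
        abort ();
      cached->regno = 0;
    }
  return cached;
}
```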

[Committed] RISC-V: Suppress warning

2024-01-19 Thread Juzhe-Zhong
../../gcc/config/riscv/riscv.cc: In function 'void riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)': ../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl' [-Werror=unused-parameter] 4879 | tree fndecl, |

[PATCH] RISC-V: Fix vfirst/vmsbf/vmsif/vmsof ratio attributes

2024-01-21 Thread Juzhe-Zhong
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand the ratio instead of demanding sew_lmul. But a previous typo of mine makes the VSETVL PASS fail to honor the RISC-V V spec. Consider the following simple case: int foo4 (void * in, void * out) { vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4); v = __r
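The ratio here is SEW/LMUL. Two vtype settings with the same ratio give the same VLMAX (VLEN * LMUL / SEW = VLEN / ratio), which is why demanding only the ratio leaves the VSETVL PASS more freedom than demanding the exact SEW and LMUL. A minimal sketch with LMUL expressed in eighths so fractional values stay integral; the helper names are hypothetical:

```c
#include <assert.h>

/* LMUL in eighths: mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, m2 = 16, m4 = 32,
   m8 = 64.  Returns SEW / LMUL.  */
static unsigned
sew_lmul_ratio (unsigned sew, unsigned lmul8)
{
  assert (lmul8 > 0);
  return sew * 8 / lmul8;
}

/* VLMAX = VLEN * LMUL / SEW = VLEN / ratio.  */
static unsigned
vlmax (unsigned vlen, unsigned sew, unsigned lmul8)
{
  return vlen / sew_lmul_ratio (sew, lmul8);
}
```

For example, e8,mf4 and e32,m1 both have ratio 32, so with VLEN = 128 both give VLMAX = 4.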

[PATCH] RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.x

2024-01-21 Thread Juzhe-Zhong
I noticed that on an AI benchmark, GCC is 3% slower than Clang. This is because Clang/LLVM has a simplification that transforms vmv.v.x (avl = 1) into vmv.s.x. Since vmv.s.x has a more flexible vsetvl demand than vmv.v.x, this gives us better chances to fuse vsetvls. Consider the followi
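Why the transform is safe follows from the element-level semantics: with vl = 1, `vmv.v.x` writes x to element 0 only, which is what `vmv.s.x` does, while `vmv.s.x` gives that result for any nonzero vl. The scalar model below is a sketch that assumes a tail-undisturbed policy (elements that are not written keep their old values) and ignores vstart; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vmv.v.x model: splat X into elements [0, vl); the tail is left as-is.  */
static void
model_vmv_v_x (int32_t *vd, size_t vl, int32_t x)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = x;
}

/* vmv.s.x model: write X to element 0 whenever vl > 0; under the
   tail-undisturbed assumption every other element is unchanged.  */
static void
model_vmv_s_x (int32_t *vd, size_t vl, int32_t x)
{
  if (vl > 0)
    vd[0] = x;
}
```

Because `vmv.s.x` only needs vl to be nonzero, its demand on the surrounding vsetvl is weaker, which is what opens up more fusion opportunities.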

[PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

2024-01-22 Thread Juzhe-Zhong
This patch fixes the recent regression: FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in reg_or_sub

[PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

2024-01-23 Thread Juzhe-Zhong
The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL PASS: it consumes over 33 GB of memory, which makes it impossible to compile SPEC 2017 wrf on a laptop. The root cause is memory-wasting variables: unsigned num_exprs = num_bbs * num_regs; sbitmap *avl_def_loc = sbitmap
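To get a feel for why such a variable is expensive: assuming (the allocation is cut off in the preview) one bitmap of num_exprs bits per basic block, total memory grows as num_bbs × num_bbs × num_regs bits, i.e. quadratically in the number of basic blocks. A sketch of that estimate; the helper and the example sizes are hypothetical, and real sbitmaps round to machine words rather than bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for NUM_BBS bitmaps of NUM_BBS * NUM_REGS bits each,
   rounded up to whole bytes per bitmap.  Assumes one bitmap per block.  */
static uint64_t
bitmap_vector_bytes (uint64_t num_bbs, uint64_t num_regs)
{
  uint64_t num_exprs = num_bbs * num_regs; /* bits per bitmap */
  return num_bbs * ((num_exprs + 7) / 8);
}
```

Under these assumptions, 10000 blocks and 32 registers already need 400,000,000 bytes for a single such vector, and doubling the block count quadruples it.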

[Committed] RISC-V: Add optim-no-fusion compile option [VSETVL PASS]

2024-01-24 Thread Juzhe-Zhong
This patch adds a no-fusion compile option that disables phase 2 global fusion. It helps with compile-time analysis and debugging. Committed. gcc/ChangeLog: * config/riscv/riscv-opts.h (enum vsetvl_strategy_enum): Add optim-no-fusion option. * config/riscv/riscv-vsetvl.cc (pa

[Committed] RISC-V: Remove redundant full available computation [NFC]

2024-01-25 Thread Juzhe-Zhong
I noticed that full availability is computed in every round of earliest fusion, which is redundant; we only need to compute it once in phase 3. It's an NFC patch, tested with no regressions. Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_vsetvl_def_data): Remove redu

[PATCH] RISC-V: Add LCM delete block predecessors dump information

2024-01-25 Thread Juzhe-Zhong
While looking into PR113469, I noticed that LCM deletes a vsetvl incorrectly. This patch adds dump information listing all predecessors of the block whose vsetvl LCM deletes, for better debugging. Tested with no regressions. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (get_all_predecessors): New function.

[PATCH] RISC-V: Fix incorrect LCM delete bug [VSETVL PASS]

2024-01-25 Thread Juzhe-Zhong
This patch fixes the recently noticed bug in RV32 glibc. We incorrectly deleted a vsetvl: ... and a4,a4,a3 vmv.v.i v1,0 ---> the missing vsetvl causes an illegal instruction report. vse8.v v1,0(a5) The root cause is that the laterin in LCM is incorrect.

[Committed V2] RISC-V: Fix incorrect LCM delete bug [VSETVL PASS]

2024-01-25 Thread Juzhe-Zhong
This patch fixes the recently noticed bug in RV32 glibc. We incorrectly deleted a vsetvl: ... and a4,a4,a3 vmv.v.i v1,0 ---> the missing vsetvl causes an illegal instruction report. vse8.v v1,0(a5) The root cause is that the laterin in LCM is incorrect.

[Committed] RISC-V: Refine some codes of VSETVL PASS [NFC]

2024-01-26 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): Refine some codes. (pre_vsetvl::emit_vsetvl): Ditto. --- gcc/config/riscv/riscv-vsetvl.cc | 69 +--- 1 file changed, 27 insertions(+), 42 deletions(-) diff --git a

[PATCH] RISC-V: Fix VSETLV PASS compile-time issue

2024-01-29 Thread Juzhe-Zhong
The compile-time issue was discovered in SPEC 2017 wrf: we use time and -ftime-report to analyze the profile data of the SPEC 2017 wrf compilation. Before this patch (Lazy vsetvl): scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%) machine dep reorg

[Committed] RISC-V: Fix regression

2024-01-29 Thread Juzhe-Zhong
Due to recent middle-end loop vectorizer changes, these tests regressed, and the changes are reasonable. Adapt the tests to fix the regression. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adapt test. * gcc.target/riscv/rvv/autovec/binop/shift-rv

[PATCH] middle-end: Enhance conditional reduction vectorization by re-association in ifcvt [PR109088]

2024-01-30 Thread Juzhe-Zhong
This patch targets GCC-15. Consider the following case: unsigned int single_loop_with_if_condition (unsigned int *restrict a, unsigned int *restrict b, unsigned int *restrict c, unsigned int loop_size) { unsigned int result = 0; for (unsigned int i = 0; i < lo

[PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-13 Thread Juzhe-Zhong
Hi, before this patch, a simple conversion case for RVV codegen: foo: ble a2,zero,.L8 addiw a5,a2,-1 li a4,6 bleu a5,a4,.L6 srliw a3,a2,3 slli a3,a3,3 add a3,a3,a0 mv a5,a0 mv a4,a1 vse

[PATCH] RISC-V: Add RVV builtin vectorization cost model

2023-12-13 Thread Juzhe-Zhong
This patch fixes PR11153: ble a1,zero,.L8 addiw a5,a1,-1 li a4,4 addi sp,sp,-16 mv a2,a0 sext.w a3,a1 bleu a5,a4,.L9 srliw a4,a3,2 slli a4,a4,4 mv a5,a0 add a4,a4,a0

[Committed] RISC-V: Add failed SLP testcase

2023-12-13 Thread Juzhe-Zhong
After a recent RVV cost model tweak, I found that this PR issue has been fixed. Added the testcase and committed. PR target/112387 gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: New test. --- .../vect/costmodel/riscv/rvv/pr112387.c | 19 +++ 1

[PATCH] Middle-end: Do not model address cost for SELECT_VL style vectorization

2023-12-14 Thread Juzhe-Zhong
Following Richard's suggestions, we should not model address cost in the loop vectorizer for select_vl or decrement IV, since other vectorization styles don't do that; this keeps the cost model comparison apples to apples. This patch sets COST from 2 to 1, which turns out to give better codegen in various codegen

[Committed] RISC-V: Adjust test

2023-12-14 Thread Juzhe-Zhong
Since the middle-end patch https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640595.html will change vectorization code, adapt tests for this patch. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Adapt test. --- gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr11298

[Committed] RISC-V: Tweak generic vector COST model

2023-12-14 Thread Juzhe-Zhong
I noticed that the current generic vector cost model makes PR112387 fail to vectorize. Adapt it to match the ARM SVE generic vector cost model, which fixes it. Committed as it is an obvious fix. PR target/112387 gcc/ChangeLog: * config/riscv/riscv.cc: Adapt generic cost model same ARM SVE. gcc/

[PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-14 Thread Juzhe-Zhong
This patch fixes the following FAILs in "full coverage" testing: Running target riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c -flto -ffat-lto-objects execution

[Committed] RISC-V: Remove xfail for some of the SLP tests

2023-12-15 Thread Juzhe-Zhong
Due to recent middle-end cost model changes, we can now do more VLA SLP. Fix the following regressions: XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvand XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvand XPASS: gcc.target/riscv/rvv/autovec/parti

[PATCH V2] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Juzhe-Zhong
This patch fixes the following FAILs in "full coverage" testing: Running target riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c -flto -ffat-lto-objects execution

[PATCH] RISC-V: Fix natural regsize for fixed-vlmax of -march=rv64gc_zve32f

2023-12-17 Thread Juzhe-Zhong
This patch fixes 12 ICEs of "full coverage" testing: Running target riscv-sim/-march=rv64gc_zve32f/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax FAIL: gcc.dg/torture/pr96513.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftr

[PATCH] RISC-V: Enable vect test for RV32

2023-12-17 Thread Juzhe-Zhong
After recent fixes, almost all real FAILs on RV64 full coverage testing are fixed. So it's reasonable to start RV32 vect testing now. We will enable full coverage testing on RV32 soon to see what else needs to be fixed. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Enable

[PATCH V2] RISC-V: Enable vect test for RV32

2023-12-18 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add RV32. --- gcc/testsuite/lib/target-supports.exp | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index bd38d72562d..370df10978d

[PATCH] RISC-V: Support one more overlap for wv instructions

2023-12-18 Thread Juzhe-Zhong
For 'wv' instructions, e.g. vwadd.wv vd,vs2,vs1: vs2 has the same EEW as vd, while vs1 has a smaller EEW than vd. So vs2 can overlap with vd, but vs1 can only overlap the highest-numbered part of vd when the LMUL of vs1 is at least 1. We already support overlap for vs1 LMUL >= 1. But I forgot that when vs1 LMUL < 1, vs2 can

[PATCH V2] RISC-V: Support one more overlap for wv instructions

2023-12-18 Thread Juzhe-Zhong
For 'wv' instructions, e.g. vwadd.wv vd,vs2,vs1: vs2 has the same EEW as vd, while vs1 has a smaller EEW than vd. So vs2 can overlap with vd, but vs1 can only overlap the highest-numbered part of vd when the LMUL of vs1 is at least 1. We already support overlap for vs1 LMUL >= 1. But I forgot that when vs1 LMUL < 1, vs2 can
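The RVV specification's rule for the narrower source (vs1 here) can be written as a small predicate: a narrower source group may overlap a wider destination group only if the source EMUL is at least 1 and the source occupies the highest-numbered part of the destination group. A sketch, with EMUL expressed in eighths of a register (m1 = 8) and groups given by their first register number; the functions are hypothetical, not GCC code:

```c
#include <assert.h>
#include <stdbool.h>

/* Registers a group occupies; a fractional EMUL still uses one register.  */
static unsigned
group_regs (unsigned emul8)
{
  return emul8 >= 8 ? emul8 / 8 : 1;
}

/* May the narrow source group starting at VS (EMUL SRC_EMUL8) be used with
   the wide destination group starting at VD (EMUL DST_EMUL8)?  */
static bool
widening_overlap_ok (unsigned vd, unsigned dst_emul8,
                     unsigned vs, unsigned src_emul8)
{
  unsigned d = group_regs (dst_emul8);
  unsigned s = group_regs (src_emul8);
  bool overlap = vs < vd + d && vd < vs + s;
  if (!overlap)
    return true;
  /* Overlap is legal only for source EMUL >= 1, in the top part of VD.  */
  return src_emul8 >= 8 && vs + s == vd + d;
}
```

The spec's own example, `vzext.vf4 v0, v6` with LMUL = 8 (source EMUL = 2), is accepted, while sources v0, v2 or v4 are rejected.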

[Committed] RISC-V: Remove 256/512/1024 VLS vectors

2023-12-18 Thread Juzhe-Zhong
Since https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2e7abd09621a4401d44f4513adf126bce4b4828b we only allow VLS modes with size <= TARGET_MIN_VLEN * TARGET_MAX_LMUL. So with -march=rv64gcv and the default LMUL = 1, we don't have VLS modes of 256/512/1024 vectors. Disable them in the vect tests, which fixes the

[Committed] RISC-V: Fix FAIL of dynamic-lmul2-7.c

2023-12-18 Thread Juzhe-Zhong
Fix this FAIL: FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c scan-tree-dump-times vect "Maximum lmul = 2" 1 gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test. --- gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c | 2 +- 1

[Committed] RISC-V: Refine some codes of expand_const_vector [NFC]

2023-12-19 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Use builder.inner_mode (). --- gcc/config/riscv/riscv-v.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index d1eb7a0a9a5..486f5deb296 1

[PATCH] Regression FIX: Remove vect_variable_length XFAIL from some tests

2023-12-19 Thread Juzhe-Zhong
Hi, this patch fixes the following regression FAILs on RVV: XPASS: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;" XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2 "vector operands from scalars" XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not sl

[PATCH] RISC-V: Fix FAIL of bb-slp-cond-1.c for RVV

2023-12-19 Thread Juzhe-Zhong
Due to recent VLS mode changes (made to fix an ICE and a run-time FAIL), the dump check is now the same as for ARM SVE, so adapt the test for RISC-V. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-cond-1.c: Adapt for RISC-V. --- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++-- 1 file changed, 2

[PATCH] RISC-V: Fix bug of VSETVL fusion

2023-12-19 Thread Juzhe-Zhong
This patch fixes bugs in the fusion of the following case: li a5,-1 vmv.s.x v0,a5 -> demand any non-zero AVL vsetvli a5, ... Incorrect fusion after VSETVL PASS: li a5,-1 vsetvli a5... vmv.s.x v0, a5 --> a5 is overwritten with an incorrect value. We disallow the incorrect fusion above. Full coverage
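The illegal fusion comes down to a register dependency: the fused `vsetvli a5,...` writes a5, so it cannot be placed before an instruction that still reads the old a5 (here `vmv.s.x v0,a5`). A minimal sketch of such a check, using a hypothetical instruction record rather than GCC's real data structures; a real pass must also consider other hazards, and this only covers the case in the example:

```c
#include <assert.h>
#include <stdbool.h>

#define X0 0 /* x0 (zero): writes to it are discarded.  */

/* Hypothetical, simplified instruction: up to two scalar source regs.  */
struct insn
{
  int uses[2]; /* -1 marks an unused slot.  */
};

/* May a vsetvli writing scalar register RD be moved above INSN?  */
static bool
can_hoist_vsetvl_over (int rd, const struct insn *insn)
{
  if (rd == X0)
    return true; /* vsetvli zero,... clobbers no scalar register.  */
  for (int i = 0; i < 2; i++)
    if (insn->uses[i] == rd)
      return false; /* INSN would read the vsetvli's result instead.  */
  return true;
}
```

In the usage below, a5 is register x15.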

[PATCH] RISC-V: Optimize SELECT_VL codegen when length is known as smaller than VF

2023-12-19 Thread Juzhe-Zhong
While trying to fix bugs of PR113097, I noticed the following situation where we generate a redundant vsetvli: _255 = SELECT_VL (3, POLY_INT_CST [4, 4]); COND_LEN (..., _255) Before this patch: vsetivli a5, 3... ... vadd.vv (use a5) After this patch: ... vadd.vv (use AVL = 3) The reason we can do this i
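The simplification rests on the vsetvli rule that vl = AVL whenever AVL ≤ VLMAX. `POLY_INT_CST [4, 4]` stands for 4 + 4·x with a runtime x ≥ 0, so its minimum is 4, and any constant length known to be at most that minimum is itself the resulting vl. A sketch with a hypothetical two-coefficient type modelled on that notation (not GCC's poly_int API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a degree-1 poly_int: c0 + c1 * x, with x >= 0.  */
struct poly2
{
  int64_t c0, c1;
};

/* True if N <= P for every x >= 0; conservatively false if c1 < 0.  */
static bool
known_le_const (int64_t n, struct poly2 p)
{
  return p.c1 >= 0 && n <= p.c0;
}

/* Fold SELECT_VL (N, VF) to N when N is known to fit; -1 means unknown.  */
static int64_t
fold_select_vl (int64_t n, struct poly2 vf)
{
  return known_le_const (n, vf) ? n : -1;
}
```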

[Committed] RISC-V: Fix ICE of moving SUBREG of vector mode to DImode scalar register on RV32 system.

2023-12-20 Thread Juzhe-Zhong
This patch fixes following ICE on full coverage testing of RV32. Running target riscv-sim/-march=rv32gc_zve32f/-mabi=ilp32d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic FAIL: gcc.c-torture/compile/930120-1.c -O2 (internal compiler error: in emit_move_insn, at expr.cc:4606) FAIL: gcc.c-t

[Committed] RISC-V: Add dynamic LMUL test for x264

2023-12-21 Thread Juzhe-Zhong
When working on evaluating x264 performance, I noticed that the best LMUL for this case with -march=rv64gcv is LMUL = 2. LMUL = 1: x264_pixel_8x8: add a4,a1,a2 addi a6,a0,16 vsetivli zero,4,e8,mf4,ta,ma add a5,a4,a2 vle8.v v12,0(a6) vle

[PATCH] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Juzhe-Zhong
Consider the following case: foo: ble a0,zero,.L11 lui a2,%hi(.LANCHOR0) addi sp,sp,-128 addi a2,a2,%lo(.LANCHOR0) mv a1,a0 vsetvli a6,zero,e32,m8,ta,ma vid.v v8 vs8r.v v8,0(sp) ---> spill .L

[Committed] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Juzhe-Zhong
Consider the following case: foo: ble a0,zero,.L11 lui a2,%hi(.LANCHOR0) addi sp,sp,-128 addi a2,a2,%lo(.LANCHOR0) mv a1,a0 vsetvli a6,zero,e32,m8,ta,ma vid.v v8 vs8r.v v8,0(sp) ---> spill .L

[Committed] RISC-V: Add one more ASM check in PR113112-1.c

2023-12-24 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Add one more ASM check. --- gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c b/gcc/testsuite/

[PATCH] RISC-V: Move RVV V_REGS liveness computation into analyze_loop_vinfo

2023-12-25 Thread Juzhe-Zhong
Currently, we compute RVV V_REGS liveness during better_main_loop_than_p, which is not the appropriate time to do so: for example, when the code will finally pick the LMUL = 8 vectorization factor, we compute liveness for LMUL = 8 multiple times, which is redundant. Since we have leverage
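The idea behind a liveness-driven LMUL choice can be shown with a simplified model: at LMUL = k, every live vector value occupies k of the 32 vector registers, so the largest LMUL whose peak live count still fits avoids spilling. This is a hypothetical sketch of the principle only, not the heuristics in riscv-vector-costs.cc, which are more detailed:

```c
#include <assert.h>

#define NUM_VREGS 32 /* RVV architectural vector registers v0-v31.  */

/* Pick the largest LMUL in {8, 4, 2, 1} such that MAX_LIVE vector values,
   each occupying LMUL registers, fit in the register file.  */
static unsigned
pick_lmul (unsigned max_live)
{
  for (unsigned lmul = 8; lmul > 1; lmul /= 2)
    if (max_live * lmul <= NUM_VREGS)
      return lmul;
  return 1; /* Fall back to LMUL = 1 even if pressure is still high.  */
}
```

Miscounting live values in either direction changes the outcome: under-counting pushes the model toward a larger LMUL that may spill, while over-counting gives up a larger LMUL that would have fit.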

[Committed] RISC-V: Some minior tweak on dynamic LMUL cost model

2023-12-26 Thread Juzhe-Zhong
Tweak some code in the dynamic LMUL cost model to make the computation more predictable and accurate. Tested on both RV32 and RV64 with no regressions. Committed. PR target/113112 gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Tweak LMUL estimation.

[Committed] RISC-V: Fix typo

2023-12-26 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Fix typo. --- .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.

[PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread Juzhe-Zhong
I noticed the following situation: vsetivli zero,4,e32,m1,ta,ma vlseg4e32.v v4,(a5) vlseg4e32.v v12,(a3) vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax vfadd.vf v3,v13,f

[PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread Juzhe-Zhong
I noticed the following situation: vsetivli zero,4,e32,m1,ta,ma vlseg4e32.v v4,(a5) vlseg4e32.v v12,(a3) vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax vfadd.vf v3,v13,f
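The range [0, 31] in the title appears to correspond to the encoding of `vsetivli`, whose AVL is a 5-bit unsigned immediate (uimm5). A constant length in that range can be written straight into `vsetivli`, so rewriting it as a VLMAX AVL (`vsetvli a5,zero,...`) gains nothing and, as the dump shows, can cost an extra vsetvli. A minimal sketch of that encodability test; the helper is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if AVL can be encoded in vsetivli's 5-bit unsigned immediate.  */
static bool
fits_vsetivli_avl (int64_t avl)
{
  return avl >= 0 && avl <= 31;
}
```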

[Committed] RISC-V: Make known NITERS loop be aware of dynamic lmul cost model liveness information

2023-12-27 Thread Juzhe-Zhong
Consider the following case: int f[12][100]; void bad1(int v1, int v2) { for (int r = 0; r < 100; r += 4) { int i = r + 1; f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]); f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]); f[0][r+2] = f[1][r+2] * (f[2][r+2]) -

[Committed] RISC-V: Make dynamic LMUL cost model more accurate for conversion codes

2023-12-27 Thread Juzhe-Zhong
I noticed that the current dynamic LMUL cost model is not accurate for conversion code. Refine it; as a result, a current case changes from choosing LMUL = 4 to choosing LMUL = 8. Tested with no regressions, committed. Before this patch (LMUL = 4): After this patch (LMUL = 8): lw a7,56(sp)

[PATCH] RISC-V: Count pointer type SSA into RVV regs liveness for dynamic LMUL cost model

2023-12-28 Thread Juzhe-Zhong
This patch fixes the following case, which chooses an unexpectedly big LMUL that causes register spilling. Before this patch, choosing LMUL = 4: addi sp,sp,-160 addiw t1,a2,-1 li a5,7 bleu t1,a5,.L16 vsetivli zero,8,e64,m4,ta,ma vmv.v.x v4,a0

[Committed] RISC-V: Robostify testcase pr113112-1.c

2023-12-28 Thread Juzhe-Zhong
The redundant dump check is fragile, easily changed, and not necessary. Tested on both RV32/RV64 with no regressions. Removed it and committed. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks. --- gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr

[Committed] RISC-V: Declare STMT_VINFO_TYPE (...) as local variable

2024-01-01 Thread Juzhe-Zhong
Committed. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc: Move STMT_VINFO_TYPE (...) to local. --- gcc/config/riscv/riscv-vector-costs.cc | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-

[PATCH] RISC-V: Make liveness be aware of rgroup number of LENS[dynamic LMUL]

2024-01-01 Thread Juzhe-Zhong
This patch fixes the following situation: vl4re16.v v12,0(a5) ... vl4re16.v v16,0(a3) vs4r.v v12,0(a5) ... vl4re16.v v4,0(a0) vs4r.v v16,0(a3) ... vsetvli a3,zero,e16,m4,ta,ma ... vmv.v.x v8,t6 vmsgeu.vv v2,v16,v8 vsub.vv v16,v16,v8 vs4r.v v16,0(a5) ... vs4r.v v4,0(a0) v

[Committed] RISC-V: Add simplification of dummy len and dummy mask COND_LEN_xxx pattern

2024-01-01 Thread Juzhe-Zhong
In https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d1eacedc6d9ba9f5522f2c8d49ccfdf7939ad72d I optimized the COND_LEN_xxx pattern with dummy len and dummy mask using too simple a solution, which causes a redundant vsetvli in the following case: vsetvli a5,a2,e8,m1,ta,ma vle32.v v8,0(a0)

[PATCH] RISC-V: Fix bug of earliest fusion for infinite loop[VSETVL PASS]

2024-01-03 Thread Juzhe-Zhong
As reported in PR113206, the bug happens in the following situation: li a4,32 ... vsetvli zero,a4,e8,m8,ta,ma ... slliw a4,a3,24 sraiw a4,a4,24 bge a3,a1,.L8 sb a4,%lo(e)(a0) vsetvli zero,a4,e8,m8,ta,ma --> a4 is pollu

[PATCH V2] RISC-V: Fix bug of earliest fusion for infinite loop[VSETVL PASS]

2024-01-03 Thread Juzhe-Zhong
As reported in PR113206 and PR113209, the bug happens in the following situation: li a4,32 ... vsetvli zero,a4,e8,m8,ta,ma ... slliw a4,a3,24 sraiw a4,a4,24 bge a3,a1,.L8 sb a4,%lo(e)(a0) vsetvli zero,a4,e8,m8,ta,ma --

[Committed V3] RISC-V: Fix bug of earliest fusion for infinite loop[VSETVL PASS]

2024-01-03 Thread Juzhe-Zhong
As reported in PR113206 and PR113209, the bug happens in the following situation: li a4,32 ... vsetvli zero,a4,e8,m8,ta,ma ... slliw a4,a3,24 sraiw a4,a4,24 bge a3,a1,.L8 sb a4,%lo(e)(a0) vsetvli zero,a4,e8,m8,ta,ma --

[Committed] RISC-V: Fix indent

2024-01-03 Thread Juzhe-Zhong
Fix the indentation of some code so that it is aligned to 8 spaces. Committed. gcc/ChangeLog: * config/riscv/vector.md: Fix indent. --- gcc/config/riscv/vector.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md in

[Committed] RISC-V: Refine LMUL computation for MASK_LEN_LOAD/MASK_LEN_STORE IFN

2024-01-03 Thread Juzhe-Zhong
I noticed a case reporting "Maximum lmul = 16", which is incorrect. Correct the LMUL estimation for MASK_LEN_LOAD/MASK_LEN_STORE. Committed. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (variable_vectorized_p): New function. (compute_nregs_for_mode): Refine LMUL. (max_number_of

[PATCH] RISC-V: Teach liveness estimation be aware of .vi variant

2024-01-04 Thread Juzhe-Zhong
Consider the following case: void f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n) { for (int i = 0; i < n; i++) { int tmp = b[i] + 15; int tmp2 = tmp + b[i]; c[i] = tmp2 + b[i]; d[i] = tmp + tmp2 + b[i]; } } Current dynamic LMUL cos

[Committed V2] RISC-V: Make liveness estimation be aware of .vi variant

2024-01-04 Thread Juzhe-Zhong
Consider the following case: void f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n) { for (int i = 0; i < n; i++) { int tmp = b[i] + 15; int tmp2 = tmp + b[i]; c[i] = tmp2 + b[i]; d[i] = tmp + tmp2 + b[i]; } } Current dynamic LMUL cos

[Committed V3] RISC-V: Make liveness estimation be aware of .vi variant

2024-01-04 Thread Juzhe-Zhong
Consider the following case: void f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n) { for (int i = 0; i < n; i++) { int tmp = b[i] + 15; int tmp2 = tmp + b[i]; c[i] = tmp2 + b[i]; d[i] = tmp + tmp2 + b[i]; } } Current dynamic LMUL cos

[PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]

2024-01-04 Thread Juzhe-Zhong
1) We not only have vashl_optab, vashr_optab, vlshr_optab, which vectorize shifts with a vector shift amount, that is, vectorization of 'a[i] >> x[i]', where the shift amount is loop variant. 2) We also have ashl_optab, ashr_optab, lshr_optab, which can vectorize shifts with a scalar shift amount, that is

[PATCH] RISC-V: Allow simplification non-vlmax with len = NUNITS reg to reg move

2024-01-04 Thread Juzhe-Zhong
While working on fixing a bug, I noticed that the following code has a redundant move: #include "riscv_vector.h" void f (float x, float y, void *out) { float f[4] = { x, x, x, y }; vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4); __riscv_vse32_v_f32m1 (out, v, 4); } Before this patch: f: vs

[Committed V2] RISC-V: Allow simplification non-vlmax with len = NUNITS reg to reg move

2024-01-05 Thread Juzhe-Zhong
V2: Address comments from Robin. While working on fixing a bug, I noticed that the following code has a redundant move: #include "riscv_vector.h" void f (float x, float y, void *out) { float f[4] = { x, x, x, y }; vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4); __riscv_vse32_v_f32m1 (out, v, 4); }

[Committed V2] RISC-V: Teach liveness computation loop invariant shift amount

2024-01-05 Thread Juzhe-Zhong
1) We not only have vashl_optab, vashr_optab, vlshr_optab, which vectorize shifts with a vector shift amount, that is, vectorization of 'a[i] >> x[i]', where the shift amount is loop variant. 2) We also have ashl_optab, ashr_optab, lshr_optab, which can vectorize shifts with a scalar shift amount, that is
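The two source patterns being distinguished look like this in C. In the first, the shift amount `x[i]` is loop variant, so it is itself a vector operand (the vashl/vashr/vlshr optabs); in the second, `x` is loop invariant and can typically stay in a scalar register for a `.vx` shift (the ashl/ashr/lshr optabs), so it need not occupy a vector register. The function names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Loop-variant shift amount: each element has its own amount.  */
static void
shift_by_vector (int *restrict a, const int *restrict x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] >> x[i];
}

/* Loop-invariant shift amount: one scalar for every element.  */
static void
shift_by_scalar (int *restrict a, int x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] >> x;
}
```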

[Committed] RISC-V: Update MAX_SEW for available vsevl info[VSETVL PASS]

2024-01-05 Thread Juzhe-Zhong
This patch fixes a bug of the VSETVL PASS in the following situation: Ignore curr info since prev info available with it: prev_info: VALID (insn 8, bb 2) Demand fields: demand_ratio_and_ge_sew demand_avl SEW=16, VLMUL=mf4, RATIO=64, MAX_SEW=64 TAIL_POLICY=agnostic, M

[Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

2024-01-06 Thread Juzhe-Zhong
Obvious fix, Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: replace std::max by MAX. --- gcc/config/riscv/riscv-vsetvl.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 7d748edc

[PATCH] RISC-V: Fix loop invariant check

2024-01-08 Thread Juzhe-Zhong
As Robin suggested, remove the gimple_uid check, which is sufficient for our needs. Tested on both RV32/RV64 with no regressions; ok for trunk? gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop invariant check. --- gcc/config/riscv/riscv-vector-costs.cc | 2 +-

[Committed] RISC-V: Fix comments of segment load/store intrinsic[NFC]

2024-01-08 Thread Juzhe-Zhong
We have supported segment load/store intrinsics. Committed as it is obvious. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments to real place. (vcreate): Ditto. --- gcc/config/riscv/riscv-vector-builtins-functions.def | 4 +--- 1 file chan

[Committed] RISC-V: Fix comments of segment load/store intrinsic

2024-01-08 Thread Juzhe-Zhong
We have supported segment load/store intrinsics. Committed as it is obvious. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments. (vundefined): Ditto. --- gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++-- 1 file changed, 2 inse

[PATCH] RISC-V: Remove earlyclobber for wx/wf instructions.

2023-11-30 Thread Juzhe-Zhong
While working on overlap for widening instructions, I realized that we mark vwadd.wx/vfwadd.wf as earlyclobber, which is incorrect, since according to the RVV ISA: "The destination EEW equals the source EEW." For both vwadd.wx and vfwadd.wf, the source vector and destination vector operands have the same EEW. So, they should

[PATCH V2] RISC-V: Remove earlyclobber for wx/wf instructions.

2023-11-30 Thread Juzhe-Zhong
While working on overlap for widening instructions, I realized that we mark vwadd.wx/vfwadd.wf as earlyclobber, which is incorrect, since according to the RVV ISA: "The destination EEW equals the source EEW." vwadd.vx widens the first source operand (i.e. 2 * source EEW = dest EEW) while vwadd.wx only w
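
To make the EEW difference concrete, here is an element-wise C model of the two forms (a sketch; the names and the choice SEW=16 are assumptions for illustration, not GCC or ISA code). In .vx the vector source is SEW wide and the result 2*SEW wide; in .wx the vector source already has the 2*SEW destination width and only the scalar is widened, which is the basis of the patch's argument that no earlyclobber is needed there.

```c
#include <stddef.h>
#include <stdint.h>

/* vwadd.vx model, SEW = 16: vs2 has EEW 16, vd has EEW 32. */
static void vwadd_vx_model (int32_t *vd, const int16_t *vs2,
                            int16_t rs1, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = (int32_t) vs2[i] + (int32_t) rs1; /* cannot overflow int32_t */
}

/* vwadd.wx model, SEW = 16: vs2 and vd both have EEW 32; only the
   scalar is sign-extended. The sum wraps modulo 2^32 like the hardware
   (the conversion back to int32_t is implementation-defined in ISO C;
   GCC wraps). */
static void vwadd_wx_model (int32_t *vd, const int32_t *vs2,
                            int16_t rs1, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = (int32_t) ((uint32_t) vs2[i] + (uint32_t) (int32_t) rs1);
}
```

The .wx model can run in place (vd == vs2) because both operands have the same element width; the .vx model cannot, since its source and destination element sizes differ.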

[PATCH] RISC-V: Fix VSETVL PASS regression

2023-11-30 Thread Juzhe-Zhong
This patch fixes 2 regressions (one is a bug regression, the other is a performance regression). Both regressions are caused by comparing the ratio for the same AVL in the wrong place. 1. BUG regression: avl_single-84.c: f0: li a5,999424 add a1,a1,a5 li a4,299008

[PATCH] RISC-V: Support highpart register overlap for widen vx/vf instructions

2023-11-30 Thread Juzhe-Zhong
This patch leverages the same approach as vwcvt. Before this patch: .L5: add a3,s0,s1 add a4,s6,s1 add a5,s7,s1 vsetvli zero,s0,e32,m4,ta,ma vle32.v v16,0(s1) vle32.v v12,0(a3) mv s1,s2 vle32.v v8,0(a4) vle32

[PATCH] RISC-V: Support highpart overlap for indexed load with SRC EEW < DEST EEW

2023-12-01 Thread Juzhe-Zhong
This patch leverages the previous approach. Before this patch: .L5: add a3,s0,s2 add a4,s6,s2 add a5,s7,s2 vsetvli zero,s0,e64,m8,ta,ma vle8.v v4,0(s2) vle8.v v3,0(a3) mv s2,s1 vle8.v v2,0(a4) vle8.v v1,0(a5) nop
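
The highpart overlap used by these patches comes from the RVV ISA rule that, when the destination EEW is greater than the source EEW, the source group may overlap the destination group only if the source EMUL is at least 1 and the overlap is in the highest-numbered part of the destination group (the ISA's own example: with LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not). A small checker for that rule; the helper names and the "EMUL in eighths" encoding are our own assumptions:

```c
#include <stdbool.h>

/* EMUL encoded in eighths: mf8 = 1, mf4 = 2, mf2 = 4,
   m1 = 8, m2 = 16, m4 = 32, m8 = 64. Fractional EMUL uses one register. */
static int regs_in_group (int emul8)
{
  return emul8 >= 8 ? emul8 / 8 : 1;
}

/* Do register ranges [a, a + a_regs) and [b, b + b_regs) intersect? */
static bool groups_overlap (int a, int a_regs, int b, int b_regs)
{
  return a < b + b_regs && b < a + a_regs;
}

/* Widening case (dest EEW > source EEW): legal if the groups do not
   overlap, or if the source EMUL is >= 1 and the source group sits
   exactly in the highest-numbered part of the destination group. */
static bool widening_overlap_ok (int vd, int dest_emul8,
                                 int vs, int src_emul8)
{
  int d_regs = regs_in_group (dest_emul8);
  int s_regs = regs_in_group (src_emul8);
  if (!groups_overlap (vd, d_regs, vs, s_regs))
    return true;
  return src_emul8 >= 8 && vs == vd + d_regs - s_regs;
}
```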

[PATCH] RISC-V: Fix incorrect combine of extended scalar pattern

2023-12-01 Thread Juzhe-Zhong
Background: for RVV vx instructions, for example vadd.vx, when EEW = 64 on RV32, we can't use vadd.vx directly. Instead, we need to use: sw, sw, vlse, vadd.vv. However, there are some special situations where we can still use vadd.vx directly for EEW=64 && RV32, that is, when the scalar is a known
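
The RVV ISA specifies that when XLEN < SEW, the scalar a .vx instruction reads from the x register is sign-extended to SEW bits. The snippet above is truncated, so the exact condition the patch checks is not shown here; the sketch below (function names are ours) only models that ISA rule: on RV32 with SEW=64, vadd.vx adds the sign extension of a 32-bit register, so a 64-bit constant can be used directly only if it lies in the signed 32-bit range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the ISA rule: XLEN = 32 < SEW = 64, so the 32-bit x register
   value is sign-extended to 64 bits (bit 31 acts as the sign bit). */
static int64_t sext_xreg_to_sew64 (uint32_t xreg)
{
  return (xreg & 0x80000000u) ? (int64_t) xreg - 0x100000000LL
                              : (int64_t) xreg;
}

/* A 64-bit scalar survives that sign extension unchanged only if it is
   representable as a signed 32-bit value. */
static bool fits_sext32 (int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}
```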

[Committed] RISC-V: Robustify the W43, W86, W87 constraint enabled attribute

2023-12-03 Thread Juzhe-Zhong
Committed as it is an obvious fix. gcc/ChangeLog: * config/riscv/riscv.md: Robustify the constraints. --- gcc/config/riscv/riscv.md | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 4c6f63677df

[PATCH] RISC-V: Fix overlap group incorrect overlap on v0

2023-12-03 Thread Juzhe-Zhong
In a case with serious register pressure (appended in this patch), we see vluxei8.v v0,(s1),v1,v0.t, which is not allowed, since according to the RVV ISA: +;; The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0), +;; unless the destina
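
The ISA rule quoted above (whose full text continues: unless the destination is being written with a mask value, e.g. a compare, or with the scalar result of a reduction) can be expressed as a simple predicate. A sketch with hypothetical names: the mask source is always v0, and register groups are non-negative and start at vd, so a destination group overlaps v0 only when it starts at v0.

```c
#include <stdbool.h>

/* Is a masked instruction's destination group starting at vd legal?
   dest_is_mask_or_reduction_scalar covers the ISA's two exceptions:
   writing a mask value (e.g. compares) or a reduction's scalar result. */
static bool masked_dest_ok (int vd, bool dest_is_mask_or_reduction_scalar)
{
  bool overlaps_v0 = (vd == 0);
  return !overlaps_v0 || dest_is_mask_or_reduction_scalar;
}
```

Under this check, the vluxei8.v v0,(s1),v1,v0.t seen above is rejected, while a masked compare writing v0 is allowed.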

[PATCH] RISC-V: Remove earlyclobber from widen reduction

2023-12-04 Thread Juzhe-Zhong
Since the destination of a reduction is not a vector register group, there is no need to apply the overlap constraint. This is also confirmed with Clang: the MIR in LLVM has early clobber: early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed %17:vr, %48:gpr, %0:gprnox0, 3, 0; example.c:59:24 The mi

[PATCH] RISC-V: Support highest-number regno overlap for widen ternary vx instructions

2023-12-04 Thread Juzhe-Zhong
Consider this example: #include "riscv_vector.h" void foo6 (void *in, void *out) { vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4); vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1); vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4 (high_eew64); vint32m4_t high

[PATCH V2] RISC-V: Support highest-number regno overlap for widen ternary

2023-12-04 Thread Juzhe-Zhong
Consider this example: #include "riscv_vector.h" void foo6 (void *in, void *out) { vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4); vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1); vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4 (high_eew64); vint32m4_t high

[Committed V2] RISC-V: Fix overlap group incorrect overlap on v0

2023-12-04 Thread Juzhe-Zhong
In a case with serious register pressure (appended in this patch), we see vluxei8.v v0,(s1),v1,v0.t, which is not allowed, since according to the RVV ISA: +;; The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0), +;; unless the destina

[PATCH] RISC-V: Add blocker for gather/scatter auto-vectorization

2023-12-04 Thread Juzhe-Zhong
This patch fixes an ICE exposed by full coverage testing: === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=dynamic === FAIL: g++.dg/pr106219.C -std=gnu++14 (internal compiler error: in require, at machmode.h:313) FAIL: g++
