[PATCH] RISC-V: Disable user vsetvl fusion into EMPTY block

2023-08-28 Thread Juzhe-Zhong
This patch fixes a bunch of ICEs in the "vect" testsuite: FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314) FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors) FAIL: gcc.dg/vect/pr109025.c (internal c

[PATCH V2] RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

2023-08-28 Thread Juzhe-Zhong
This patch fixes a bunch of ICEs in the "vect" testsuite: FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314) FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors) FAIL: gcc.dg/vect/pr109025.c (internal c

[PATCH V3] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Juzhe-Zhong
FAIL: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects scan-tree-dump slp2 "unsupported unaligned access" FAIL: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported unaligned access" XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 XPASS: gcc.dg/ve

[PATCH] RISC-V: Fix uninitialized probability for GIMPLE IR tests

2023-08-28 Thread Juzhe-Zhong
This patch fixes an uninitialized probability in GIMPLE IR code tests: FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358) FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors) FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-o

[PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Juzhe-Zhong
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 XPASS: gcc.dg/

[PATCH] RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

2023-08-28 Thread Juzhe-Zhong
Fix a bunch of ICEs in the "vect" testsuite: FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors) FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault

[PATCH] RISC-V: Fix ASM check of vlmax_switch_vtype-16.c

2023-08-28 Thread Juzhe-Zhong
Notice there is a failure: FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c -O2 scan-assembler-times vsetvli\\s+zero,\\s*zero 2 Change the expected count from "2" to "3"; the generated assembly is correct and better. Committed. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c:

[PATCH] vect test: Remove xfail for riscv

2023-08-28 Thread Juzhe-Zhong
We are planning to enable "vect" testsuite with scalable vector auto-vectorization. This case XPASS: XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 like ARM SVE. --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c | 2 +- 1 file changed, 1 insert

[PATCH] RISC-V: Remove movmisalign pattern for VLA modes

2023-08-29 Thread Juzhe-Zhong
This patch fixes this bunch of failures in the "vect" testsuite: FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/pr63341-1.c execution test FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/pr63341-2.c execution test FAIL: gcc.

[PATCH] RISC-V: Enable movmisalign for VLS modes

2023-08-29 Thread Juzhe-Zhong
The previous patch (which removed the movmisalign pattern for VLA modes) fixed a run-time bug, but it disabled vectorization of misaligned data movement. After checking the LLVM code, I found that LLVM supports misaligned access for VLS modes. Before this patch: sll a5,a4,0x1 add a5,a5,a1 lhu a3,64(a5) lbu a5,66(a5)

[PATCH] RISC-V: Make sure we get VL REG operand for VLMAX vsetvl

2023-08-29 Thread Juzhe-Zhong
Fix ICE in the "vect" testsuite: FAIL: gcc.dg/vect/pr64495.c (internal compiler error: in df_uses_record, at df-scan.cc:2958) FAIL: gcc.dg/vect/pr64495.c (test for excess errors) After this patch, all VSETVL PASS related bugs found so far in "vect" are fixed. gcc/ChangeLog: * config/riscv

[PATCH] middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

2023-08-29 Thread Juzhe-Zhong
Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant. Bootstrap and Regression on X86 passed. Ok for trunk? gcc/ChangeLog: * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant. (call_may_clobber_ref_p_1): Ditto. * tree-ssa-loop-ivopts.cc (get_m

[PATCH V5] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-30 Thread Juzhe-Zhong
Add vect_strided and vect_widen so that we will remove these following failures: FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 0 FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c scan-tree-dump-times vect "vectorized

[PATCH] test: Add xfail for riscv_vector

2023-08-30 Thread Juzhe-Zhong
Like ARM SVE, when we enable scalable vectorization for RVV, we can't do constant fold for these yet for both ARM SVE and RVV. Ok for trunk ? gcc/testsuite/ChangeLog: * gcc.dg/vect/pr88598-1.c: Add riscv_vector. * gcc.dg/vect/pr88598-2.c: Ditto. * gcc.dg/vect/pr88598-3.c

[PATCH] test: Fix XPASS of RVV

2023-08-30 Thread Juzhe-Zhong
XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1 XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1 XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects scan-tree-dump-times vect "OUTER LOOP

[PATCH] test: Adapt slp-26.c check for RVV

2023-08-30 Thread Juzhe-Zhong
Fix FAILs: FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1 FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0 FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1

[PATCH] test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

2023-08-30 Thread Juzhe-Zhong
Like ARM SVE, add RVV variable length xfail. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-reduc-7.c: Add RVV. --- gcc/testsuite/gcc.dg/vect/slp-reduc-7.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c b/gcc/testsuite/gcc.dg/vec

[PATCH V6] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-30 Thread Juzhe-Zhong
This patch is the final version of enabling vect_int test for RVV. There are still 80+ FAILs and they can't be fixed by adjusting testcases or target-supports.exp Here is the analysis of **ALL** FAILs: 1. REAL highest priority FAILs: ICE: FAIL: gcc.dg/vect/vect-live-6.c (internal compiler

[PATCH] RISC-V: Add Vector cost model framework for RVV

2023-08-31 Thread Juzhe-Zhong
SC-V 'V' Extension for GNU compiler. + Copyright (C) 2023-2023 Free Software Foundation, Inc. + Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms

[PATCH] RISC-V: Enable VECT_COMPARE_COSTS by default

2023-08-31 Thread Juzhe-Zhong
Since we have added the COST framework, enable VECT_COMPARE_COSTS by default. Also, add 16/32/64 to provide more choices for COST comparison. This patch doesn't change any behavior in the current testsuite since we are using the default COST model. gcc/ChangeLog: * config/riscv/riscv-v.cc

[PATCH] RISC-V: Add dynamic LMUL compile option

2023-08-31 Thread Juzhe-Zhong
We are going to support dynamic LMUL. gcc/ChangeLog: * config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Add dynamic enum. * config/riscv/riscv.opt: Add dynamic compile option. --- gcc/config/riscv/riscv-opts.h | 4 +++- gcc/config/riscv/riscv.opt| 3 +++ 2

[PATCH] RISC-V: Fix Dynamic LMUL compile option

2023-09-04 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic status. * config/riscv/riscv-v.cc (preferred_simd_mode): Ditto. (autovectorize_vector_modes): Ditto. (vectorize_related_mode): Ditto. --- gcc/config/riscv/riscv-opts.h | 2 +-

[PATCH] RISC-V: Support Dynamic LMUL Cost model

2023-09-04 Thread Juzhe-Zhong
This patch supports dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic. Consider the following case: void foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c, int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2, int32_t *__res

[PATCH] RISC-V: Export functions as global extern preparing for dynamic LMUL patch use

2023-09-05 Thread Juzhe-Zhong
These functions need to be used by the COST model for dynamic LMUL. Extracted as a separate patch and committed. gcc/ChangeLog: * config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export global. (get_all_predecessors): New function. (get_all_successors): Ditto

[PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-05 Thread Juzhe-Zhong
This patch supports dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic. Consider the following case: void foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c, int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2, int32_t *__res

[PATCH] RISC-V: Fix incorrect mode tieable which cause ICE in RA[PR111296]

2023-09-06 Thread Juzhe-Zhong
This patch fixes the incorrect mode tieable check between DI and V2SI, which causes an ICE in RA. PR target/111296 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_modes_tieable_p): Fix bug. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/pr111296.C: New test. --- gcc/config/riscv/r

[PATCH] RISC-V: Remove unreasonable TARGET_64BIT for VLS modes with size = 64bit

2023-09-06 Thread Juzhe-Zhong
Previously, I added a TARGET_64BIT condition to block VLS modes with size = 64bit (e.g. V8QI) on RV32 systems, since I realized such modes may cause inferior codegen in some situations on RV32. However, this is really quite ugly and it causes ICEs for some cases on RV32: FAIL: gcc.target/riscv/r

[PATCH] RISC-V: Fix VSETVL PASS AVL/VL fetch bug[111295]

2023-09-06 Thread Juzhe-Zhong
Fix bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111295 PR target/111295 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (insert_vsetvl): gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr111295.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc

[Committed V2] RISC-V: Fix incorrect mode tieable which cause ICE in RA[PR111296]

2023-09-06 Thread Juzhe-Zhong
This patch fixes the incorrect mode tieable check between DI and V2SI, which causes an ICE in RA. PR target/111296 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_modes_tieable_p): Fix incorrect mode tieable for RVV modes. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/pr111296.C:

[PATCH] RISC-V: Remove incorrect earliest vsetvl post optimization[PR111313]

2023-09-06 Thread Juzhe-Zhong
This patch removes the incorrect earliest post vsetvl optimization. The bug was found in vect-double-reduc-5.c, which fails at runtime (execution fail), and also in PR111313. For VLMAX intrinsics, we always emit a bogus pattern which is vlmax_avl (see vector.md) to occupy a scalar register which is used

[PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]

2023-09-07 Thread Juzhe-Zhong
This patch is not ready but they all will be fixed very soon. gcc/ChangeLog: * config/riscv/riscv.opt: Set default as scalable vectorization. --- gcc/config/riscv/riscv.opt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/

[PATCH] RISC-V: Add VLS mask modes mov patterns[PR111311]

2023-09-07 Thread Juzhe-Zhong
These patterns fix the following ICE FAILs when running the whole GCC testsuite with scalable vectorization enabled by default. All of these FAILs are fixed: FAIL: c-c++-common/opaque-vector.c -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079) FAIL: c-c++-common/opaque-vec

[PATCH] RISC-V: Replace rtx REG for zero REGS operations

2023-09-07 Thread Juzhe-Zhong
This patch fixes the following FAILs: FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176) FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (test for excess errors) FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O1 (internal comp

[PATCH] RISC-V: Fix incorrect nregs calculation for VLS modes

2023-09-08 Thread Juzhe-Zhong
This patch fixes an obvious bug: TARGET_MIN_VLEN is a size in bits. All of the following bugs are fixed with this patch: FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176) FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (test for excess err

[PATCH] RISC-V: Suppress bogus warning for VLS types

2023-09-08 Thread Juzhe-Zhong
This patch fixes more than 100 bogus FAILs due to the experimental vector ABI warning. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_pass_in_vector_p): Only allow RVV type. --- gcc/config/riscv/riscv.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv.c

[Committed] RISC-V: Fix VLS floating-point operations predicate

2023-09-08 Thread Juzhe-Zhong
VLS vfadd should depend on ZVFH instead of ZVFHMIN. Obvious fix and committed. gcc/ChangeLog: * config/riscv/vector-iterators.md: Fix floating-point operations predicate. --- gcc/config/riscv/vector-iterators.md | 24 1 file changed, 12 insertions(+), 12 deleti

[PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]

2023-09-08 Thread Juzhe-Zhong
This patch adds VLS modes VEC_PERM support, which fixes the following FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311: FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0 FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EX

[PATCH] RISC-V: Fix dump FILE of VSETVL PASS[PR111311]

2023-09-09 Thread Juzhe-Zhong
To keep the dump FILE from getting too big, add TDF_DETAILS. This patch fixes the following FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.*r.vsetvl, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions comparison FAI

[PATCH] RISC-V: Expand fixed-vlmax/vls vector permutation in targethook

2023-09-09 Thread Juzhe-Zhong
When debugging FAIL: gcc.dg/pr92301.c execution test. Realize a vls vector permutation situation failed to vectorize since early return false: - /* For constant size indices, we dont't need to handle it here. - Just leave it to vec_perm. */ - if (d->perm.length ().is_constant ()) -retu

[PATCH] RISC-V: Avoid unnecessary slideup in compress pattern of vec_perm

2023-09-09 Thread Juzhe-Zhong
If all elements of a const vector are the same, the slideup is unnecessary. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid unnecessary slideup. --- gcc/config/riscv/riscv-v.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/ri

[PATCH V2] RISC-V: Avoid unnecessary slideup in compress pattern of vec_perm

2023-09-10 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid unnecessary slideup. --- gcc/config/riscv/riscv-v.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index bee60de1d26..3cd1f61de0e

[Committed] RISC-V: Add missing VLS mask bool mode reg -> reg patterns

2023-09-10 Thread Juzhe-Zhong
Committed. gcc/ChangeLog: * config/riscv/autovec-vls.md (*mov_vls): New pattern. * config/riscv/vector-iterators.md: New iterator --- gcc/config/riscv/autovec-vls.md | 8 gcc/config/riscv/vector-iterators.md | 15 +++ 2 files changed, 23 insertions(+)

[Committed V2] RISC-V: Add VLS modes VEC_PERM support[PR111311]

2023-09-10 Thread Juzhe-Zhong
This patch adds VLS modes VEC_PERM support, which fixes the following FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311: FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0 FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EX

[PATCH] RISC-V: Use dominance analysis in global vsetvl elimination

2023-09-10 Thread Juzhe-Zhong
I found that it's more reasonable to use existing dominance analysis. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::global_eliminate_vsetvl_insn): Use dominance analysis. (pass_vsetvl::init): Ditto. (pass_vsetvl::done): Ditto. --- gcc/config/riscv/riscv-vs

[PATCH] RISC-V: Remove redundant functions

2023-09-11 Thread Juzhe-Zhong
I just finished the V2 version of the LMUL cost model. It turns out we don't need these redundant functions. Remove them. gcc/ChangeLog: * config/riscv/riscv-protos.h (get_all_predecessors): Remove. (get_all_successors): Ditto. * config/riscv/riscv-v.cc (get_all_predecessors): Ditto.

[PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-11 Thread Juzhe-Zhong
This patch supports dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic. Consider the following case: void foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c, int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2, int32_t *__res

[PATCH V3] RISC-V: Support Dynamic LMUL Cost model

2023-09-11 Thread Juzhe-Zhong
This patch supports dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic. Consider the following case: void foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c, int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2, int32_t *__res

[PATCH] test: Fix XPASS of bb-slp-43.c for RVV

2023-11-06 Thread Juzhe-Zhong
RVV is a variable-length vector architecture but also has 256-bit VLS mode vectors. This test is vectorized as: f: vsetivli zero,8,e32,m2,ta,ma vle32.v v2,0(a0) vmv.v.i v4,1 vle16.v v1,0(a1) vmseq.vv v0,v2,v4 vsetvli zero,zero,e16,m1,ta,ma vmse

[PATCH] test: Fix FAIL of bb-slp-cond-1.c for RVV

2023-11-06 Thread Juzhe-Zhong
This patch fixes: FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "loop vectorized" 1 FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1 For RVV, "loop vectorized" appears 2 times instead of 1. Because: optimized: loop vectorized u

[PATCH] RISC-V regression test: Fix FAIL of bb-slp-39.c

2023-11-06 Thread Juzhe-Zhong
Like s390, add riscv to fix the fail. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-39.c: Add RISCV. --- gcc/testsuite/gcc.dg/vect/bb-slp-39.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-39.c b/gcc/testsuite/gcc.dg/vect/bb-sl

[PATCH] test: Fix FAIL of SAD tests for RVV

2023-11-06 Thread Juzhe-Zhong
RVV didn't explicitly enable SAD optab but we can vectorize it since loop vectorizer is able to recognize SAD pattern for RVV during analysis. Current scan check of explicit SAD pattern looks odd, it should be more reasonable to check recognition of SAD pattern during Loop vectorize analysis. Ot

[PATCH] test: Fix FAIL of vect-sdiv-pow2-1.c for RVV

2023-11-06 Thread Juzhe-Zhong
RVV didn't explicitly enable the DIV_POW2 optab but we can vectorize it. We should check pattern recognition instead of an explicit pattern check. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-sdiv-pow2-1.c: Fix dump check. --- gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +- 1 file changed,

[PATCH] test: Fix FAIL of pr97428.c for RVV

2023-11-06 Thread Juzhe-Zhong
This test shows "vectorizing stmts using SLP" 4 times instead of 2 for RVV. The reason is that RVV has 512-bit vectors. Here is a comparison between RVV and ARM SVE: https://godbolt.org/z/xc5KE5rPs But I notice AMDGCN also has 512-bit vectors, so it seems this patch will cause a FAIL on GCN? Not sure whether GCN

[PATCH] RISC-V regression test: Fix FAIL bb-slp-cond-1.c for RVV

2023-11-07 Thread Juzhe-Zhong
Previously, in this patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635392.html I use vect64 && vect128 to represent both RVV and AMDGCN. However, it caused additional FAIL on ARM SVE. I don't know why ARM SVE vect64 is set as true since their AdvSIMD is 128bit vector and they don

[PATCH] test: Fix bb-slp-33.c for RVV

2023-11-07 Thread Juzhe-Zhong
As https://godbolt.org/z/hPsqahEa5 shows. RVV failed dump check since "vectorizing stmts using SLP" shows 3 times instead of 2. The root cause is this code in main: if (a[0] != 1 || a[1] != 2 || a[2] != 3 || a[3] != 4 || a[4] != 7 || a[5] != 0 || a[6] != 0

[PATCH] test: Fix FAIL of pr65518.c for RVV[PR112420]

2023-11-07 Thread Juzhe-Zhong
PR target/112420 gcc/testsuite/ChangeLog: * gcc.dg/vect/pr65518.c: Fix check for RVV. --- gcc/testsuite/gcc.dg/vect/pr65518.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/pr65518.c b/gcc/testsuite/gcc.dg/vect/pr65518.c index 3

[PATCH] RISC-V: Add RISC-V into vect_cmdline_needed

2023-11-07 Thread Juzhe-Zhong
Like all other targets, we add RISC-V into vect_cmdline_needed. This patch fixes following FAILs: FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 loops" 1 FAIL: gcc.dg/tree-ssa/gen-vect-11c.c scan-tree-dump-times vect "vectorized 0 loops" 1 FAIL: gcc.dg/tree-ssa/gen

[PATCH V2] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Juzhe-Zhong
This test shows "vectorizing stmts using SLP" 4 times instead of 2 for RVV. The reason is that RVV has 512-bit vectors. Here is a comparison between RVV and ARM SVE: https://godbolt.org/z/xc5KE5rPs Confirmed that GCN also matches 4 SLP. This patch passes on both GCN and RVV. Ok for trunk ? gcc/testsuite/Chang

[PATCH] test: Recover sdiv_pow2 check and remove test of RISC-V

2023-11-07 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-sdiv-pow2-1.c: Recover scan check. * lib/target-supports.exp: Remove riscv. --- gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +- gcc/testsuite/lib/target-supports.exp| 4 +--- 2 files changed, 2 insertions(+), 4 deletions(-)

[PATCH V2] test: Fix bb-slp-33.c for RVV

2023-11-07 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-33.c: Rewrite the condition. --- gcc/testsuite/gcc.dg/vect/bb-slp-33.c | 35 --- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c b/gcc/testsuite/gcc.dg/vect/bb-slp-

[PATCH V3] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog: * gcc.dg/vect/pr97428.c: Add additional compile option for riscv. --- gcc/testsuite/gcc.dg/vect/pr97428.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c b/gcc/testsuite/gcc.dg/vect/pr97428.c index ad6416096aa..60dd984cfd3

[PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread Juzhe-Zhong
Our user vsetvl intrinsics are defined to just calculate the VL output, which is the number of elements to be processed. Such intrinsics do not have any side effects. We should normalize them when they have the same ratio. E.g. the __riscv_vsetvl_e8mf8 result is the same as __riscv_vsetvl_e64m1. Nor
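
The "same ratio" remark can be illustrated with a small standalone model. This is only a sketch, not the code from the patch: `struct vtype`, `sew_lmul_ratio`, `vlmax` and `model_vsetvl` are hypothetical names, and `model_vsetvl` assumes the simple `min(AVL, VLMAX)` behaviour, which is one result the RVV specification permits. Since VLMAX = VLEN * LMUL / SEW, two configurations with equal SEW/LMUL yield the same VL for the same AVL.

```c
#include <assert.h>

/* LMUL is kept as a fraction num/den, e.g. mf8 -> 1/8, m1 -> 1/1.
   All names here are hypothetical and used for illustration only. */
struct vtype {
  unsigned sew;      /* element width in bits */
  unsigned lmul_num; /* LMUL numerator */
  unsigned lmul_den; /* LMUL denominator */
};

static const struct vtype E8MF8 = {8, 1, 8};  /* SEW=8,  LMUL=1/8 */
static const struct vtype E64M1 = {64, 1, 1}; /* SEW=64, LMUL=1   */

/* SEW/LMUL ratio, the quantity the excerpt calls "ratio". */
static unsigned sew_lmul_ratio(struct vtype t)
{
  return t.sew * t.lmul_den / t.lmul_num;
}

/* VLMAX = VLEN * LMUL / SEW = VLEN / ratio. */
static unsigned vlmax(unsigned vlen_bits, struct vtype t)
{
  return vlen_bits / sew_lmul_ratio(t);
}

/* Assumed model of a user vsetvl: VL = min(AVL, VLMAX). */
static unsigned model_vsetvl(unsigned avl, unsigned vlen_bits, struct vtype t)
{
  unsigned max = vlmax(vlen_bits, t);
  return avl < max ? avl : max;
}
```

With VLEN = 128, both configurations have ratio 64 and VLMAX 2, so the modelled VL is identical for any AVL; that equivalence is what makes normalizing the two intrinsics plausible.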

[PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-08 Thread Juzhe-Zhong
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438 The SELECT_VL result is not necessarily always VF in a non-final iteration. Current GIMPLE IR is wrong: # vect_vec_iv_.21_25 = PHI <_24(4), { 0, 1, 2, ... }(3)> ... _24 = vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }; After this patch which is

[Committed] RISC-V: Fix VSETVL VL check condition bug

2023-11-08 Thread Juzhe-Zhong
When fixing the induction variable vectorization bug, I noticed there is an ICE bug in the VSETVL PASS: 0x178015b rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*) ../../../../gcc/gcc/rtl.cc:770 0x1079cdd rhs_regno(rtx_def const*) ../../../../gcc/gcc/rtl.h:19

[Committed] RISC-V: Fix dynamic tests [NFC]

2023-11-08 Thread Juzhe-Zhong
This patch just adapt dynamic LMUL tests for following preparing patches. Committed. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: Adapt test. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Ditto. * gcc.dg/vect/costmodel/riscv/rv

[PATCH] RISC-V: Fix dynamic LMUL cost model ICE

2023-11-08 Thread Juzhe-Zhong
When trying to compile benchmarks with dynamic LMUL, I noticed a bunch of ICEs. This patch fixes those ICEs and adds tests. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Fix ICE. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/

[Committed] RISC-V: Add PR112450 test to avoid regression

2023-11-09 Thread Juzhe-Zhong
ICE has been fixed by Richard:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112450. Add test to avoid future regression. Committed. PR target/112450 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112450.c: New test. --- .../gcc.target/riscv/rvv/autovec/pr112450.c

[PATCH] RISC-V: Move cond_copysign from combine pattern to autovec pattern

2023-11-09 Thread Juzhe-Zhong
Since cond_copysign is now supported in match.pd (middle-end), we don't need to support conditional copysign through the RTL combine pass. Instead, we can support it with a direct explicit cond_copysign optab. Conditional copysign tests are already available in the testsuite. No need to add tests. gcc/Chan

[PATCH] RISC-V: Robustify vec_init pattern[NFC]

2023-11-09 Thread Juzhe-Zhong
Although current GCC doesn't ICE when I create an FP16 vec_init case with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong, since the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas vec_init needs vfslide1down/vfslide1up. It makes more sense to robustify the vec_init patterns

[PATCH] RISC-V: Add combine optimization by slideup for vec_init vectorization

2023-11-09 Thread Juzhe-Zhong
This patch is a small optimization for vector initialization, discovered while I was evaluating benchmarks. Consider the following case: void foo3 (int8_t *out, int8_t x, int8_t y) { v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x}; *(v16qi*)out = v; } Before this patch: vse

[PATCH V2] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-10 Thread Juzhe-Zhong
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438 1. The SELECT_VL result is not necessarily always VF in a non-final iteration. Current GIMPLE IR is wrong: # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)> ... _35 = .SELECT_VL (ivtmp_33, VF); _21 = vect_vec_iv_.8_22 + { VF, ... }; E.
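
Why stepping the induction vector by VF goes wrong can be shown with a scalar simulation. This is a hedged sketch, not the vectorizer's code: `model_select_vl` is a hypothetical stand-in that returns fewer than VF elements when between VF and 2*VF elements remain (a choice the RVV specification allows), and `count_wrong_iv` is an illustrative helper. Stepping by the selected length is assumed here as the natural correction; the truncated excerpt does not show the actual fix.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4       /* vectorization factor assumed for the model */
#define MAX_N 64   /* capacity of the internal output buffer */

/* Hypothetical SELECT_VL model: may return less than VF in a
   non-final iteration, here when VF < remain < 2 * VF. */
static size_t model_select_vl(size_t remain)
{
  if (remain >= 2 * VF)
    return VF;
  if (remain > VF)
    return (remain + 1) / 2;
  return remain;
}

/* Stores the vector induction variable into out[0..n) and returns how
   many elements differ from the expected value out[i] == i.
   step_by_len != 0: advance the IV by the SELECT_VL result.
   step_by_len == 0: advance the IV by VF, as in the IR quoted above. */
static size_t count_wrong_iv(size_t n, int step_by_len)
{
  int out[MAX_N];
  int iv[VF];
  size_t done = 0, wrong = 0;

  if (n > MAX_N)
    n = MAX_N;
  for (int k = 0; k < VF; ++k)
    iv[k] = k; /* initial value { 0, 1, 2, ... } */

  while (done < n) {
    size_t len = model_select_vl(n - done);
    for (size_t k = 0; k < len; ++k)
      out[done + k] = iv[k];
    int step = step_by_len ? (int) len : VF;
    for (int k = 0; k < VF; ++k)
      iv[k] += step;
    done += len;
  }

  for (size_t i = 0; i < n; ++i)
    wrong += (out[i] != (int) i);
  return wrong;
}
```

For n = 6 the model processes 3 + 3 elements; stepping by VF makes the second group start at 4 instead of 3, so every element of that group is wrong, while stepping by the selected length keeps all elements correct.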

[Committed] RISC-V: Add test for PR112469

2023-11-10 Thread Juzhe-Zhong
As PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112469 which has been fixed by Richard patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635994.html Add tests to avoid regression. Committed. PR target/112469 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autov

[Committed] RISC-V: Adapt VLS init tests

2023-11-13 Thread Juzhe-Zhong
I realized that the init tests were wrong because of my previous mistakes. Fixed them and committed. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Fix init test. * gcc.target/riscv/rvv/autovec/vls/init-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/init-2.c: Ditto.

[Committed] RISC-V: Fix init-2.c assembly check

2023-11-13 Thread Juzhe-Zhong
Notice the assembly check of init-2.c is wrong. Committed. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/init-2.c: Fix vid.v check. --- gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/init-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuit

[Commit QUEUE V3] RISC-V: Support strided load/store

2023-11-13 Thread Juzhe-Zhong
Strided load/store has been approved. Rebased on V3 and adapted for the middle-end IR change. Will commit after the middle-end patch is approved. gcc/ChangeLog: * config/riscv/autovec.md (mask_len_strided_load_): New pattern. (mask_len_strided_store_): Ditto. * config/riscv/riscv-

[PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN

2023-11-13 Thread Juzhe-Zhong
This patch adds mask_len_strided_load/mask_len_strided_store. The documentation has already been reviewed. This patch adds OPTAB/IFN support as follows: 1. strided load GIMPLE level: v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) is expanded (by internal-fn.cc) into: v = mask_len_strided_load
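
A scalar reference model may help when reading the `MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)` form. The sketch below rests on assumptions that the excerpt does not confirm: the stride is taken to be in bytes, only the first `len + bias` elements are active, and inactive or masked-off lanes are simply left untouched. `model_mask_len_strided_load` and `demo_lane` are hypothetical names, not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed semantics: for each i < len + bias with mask[i] set,
   dest[i] = *(int32_t *) ((char *) ptr + i * stride_bytes).
   Other lanes of dest are not written. */
static void model_mask_len_strided_load(int32_t *dest, const void *ptr,
                                        ptrdiff_t stride_bytes,
                                        const uint8_t *mask,
                                        size_t len, ptrdiff_t bias)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t i = 0; i < active; ++i)
    if (mask[i])
      memcpy(&dest[i], (const char *) ptr + (ptrdiff_t) i * stride_bytes,
             sizeof dest[i]);
}

/* Small demo: load every second int32_t from src with mask {1,0,1,1}
   and len = 3, then return one lane of the 4-lane result. */
static int32_t demo_lane(size_t lane)
{
  int32_t src[16];
  int32_t dest[4] = {-1, -1, -1, -1};
  const uint8_t mask[4] = {1, 0, 1, 1};

  for (int i = 0; i < 16; ++i)
    src[i] = 10 * i;
  model_mask_len_strided_load(dest, src, 2 * (ptrdiff_t) sizeof(int32_t),
                              mask, 3, 0);
  return dest[lane];
}
```

In the demo, lane 1 is skipped by the mask and lane 3 lies beyond `len`, so both keep their initial value; lanes 0 and 2 read `src[0]` and `src[4]`.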

[PATCH] VECT: Add MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE into loop vectorizer

2023-11-13 Thread Juzhe-Zhong
This patch supports generating MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE IR for invariant-stride memory accesses. It's a special optimization for targets like RVV. RVV has both indexed load/store and strided load/store. In RVV, we always have gather/scatter and strided optabs at the same time. E.

[PATCH] RISC-V: Disallow RVV mode address for any load/store[PR112535]

2023-11-14 Thread Juzhe-Zhong
This is a quite obvious patch which disallows RVV modes for the load/store address register. PR target/112535 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimate_address_p): Disallow RVV modes base address. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autove

[PATCH] VECT: Clear LOOP_VINFO_USING_SELECT_VL_P when loop is not partial vectorized

2023-11-15 Thread Juzhe-Zhong
This patch fixes ICE: https://godbolt.org/z/z8T6o6qov : In function 'b': :2:6: error: missing definition 2 | void b() { | ^ for SSA_NAME: loop_len_8 in statement: _1 = -loop_len_8; during GIMPLE pass: vect :2:6: internal compiler error: verify_ssa failed 0x7f1b56331082 __libc_start_

[PATCH] RISC-V: Optimize VLA SLP with duplicate VLA shuffle indice

2023-11-16 Thread Juzhe-Zhong
When evaluating dynamic LMUL, I noticed we can do better on VLA SLP with duplicate VLA shuffle indices. Consider the following case: void foo (uint16_t *restrict a, uint16_t *restrict b, int n) { for (int i = 0; i < n; ++i) { a[i * 8] = b[i * 8 + 3] + 1; a[i * 8 + 1] = b[i * 8 + 6

[PATCH] RISC-V: Fix bug of tuple move splitter[PR112561]

2023-11-16 Thread Juzhe-Zhong
Fix segment fault on tuple move: bbl loader z ra 000102ac sp 003ffaf0 gp 0001c0b8 tp t0 000104a0 t1 000f t2 s0 s1 a0 003ffb30 a1 003ffb58 a2 000

[PATCH V2] RISC-V: Fix bug of tuple move splitter

2023-11-17 Thread Juzhe-Zhong
Fix segment fault on tuple move: bbl loader z ra 000102ac sp 003ffaf0 gp 0001c0b8 tp t0 000104a0 t1 000f t2 s0 s1 a0 003ffb30 a1 003ffb58 a2 000

[PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Juzhe-Zhong
This patch refactors RVV iterators for easier maintenance. E.g. (define_mode_iterator V [ RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVV

[Committed V3] RISC-V: Fix bug of tuple move splitter

2023-11-18 Thread Juzhe-Zhong
PR target/112561 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_tuple_move): Fix bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112561.c: New test. --- gcc/config/riscv/riscv-v.cc | 4 .../gcc.target/riscv/rvv/autovec/pr11

[Committed V2] RISC-V: Optimize constant AVL for LRA pattern

2023-11-19 Thread Juzhe-Zhong
This optimization was discovered in the tuple move splitter bug fix patch. Before this patch: vsetivli zero,4,e16,mf2,ta,ma lhu a3,96(a5) vlseg8e16.v v1,(a5) lw a4,%lo(e)(a2) vsetvli a6,zero,e64,m2,ta,ma addi a0,a7,8 vse16.v v

[BUG FIX] RISC-V: Fix VLS DI mode of slide1 instruction attribute

2023-11-19 Thread Juzhe-Zhong
The slide1 attributes are wrong for SEW=64 VLS modes. This patch fixes the following FAILs: FAIL: gcc.c-torture/execute/pr89369.c -O2 execution test FAIL: gcc.c-torture/execute/pr89369.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr8936

[PATCH] RISC-V Regression: Remove scalable compile option

2023-11-20 Thread Juzhe-Zhong
Since we already set scalable vectorization by default, this flag is redundant. Also, we are starting full-coverage testing with different compile options, e.g. --param=riscv-autovec-preference=fixed-vlmax. To avoid compile option confusion, remove it. gcc/testsuite/ChangeLog: * lib/target

[BUG FIX] RISC-V: Fix intermediate mode on slide1 instruction for SEW64 on RV32

2023-11-20 Thread Juzhe-Zhong
This bug was discovered on PR112597, with -march=rv32gcv_zvl256b --param=riscv-autovec-preference=fixed-vlmax ICE: bug.c:10:1: error: unrecognizable insn: 10 | } | ^ (insn 10 9 11 2 (set (reg:V4SI 140) (unspec:V4SI [ (unspec:V4BI [ (const_v

[Committed] RISC-V: Fix reduc_run-9.c test value check bug

2023-11-20 Thread Juzhe-Zhong
The current test value check is incorrect, which is exposed on -march=rv64gcv_zvl256b. Confirmed that it also aborts on X86: [jzzhong@rios-cad121:/work/home/jzzhong/work/insn]$./a.out --33.00,4078.00,45001776.00,63369904.00--- Aborted (core dumped) Adapt the value check according to X86 r

[BUG FIX] RISC-V: Disallow COSNT_VECTOR for DI on RV32

2023-11-21 Thread Juzhe-Zhong
This bug is exposed when testing on a zvl512b RV32 system. The root cause is that RA reloads a DI CONST_VECTOR into vmv.v.x, which then ICEs. So disallow DI CONST_VECTOR on RV32. PR target/112598 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_const_insns): Disallow DI CONST_VECTOR on RV32. gc

[Committed] RISC-V: Add missing dump check of pr112438.c

2023-11-21 Thread Juzhe-Zhong
Noticed that the dump check is missing; add it. Committed as it is obvious. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112438.c: Add missing dump check. --- gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/g

[PATCH] RISC-V: Fix permutation indice mode bug

2023-11-21 Thread Juzhe-Zhong
This patch fixes the following FAILs on zvl512b: FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/

[PATCH] RISC-V: Fix incorrect use of vcompress in permutation auto-vectorization

2023-11-22 Thread Juzhe-Zhong
This patch fixes the following FAILs on a zvl512b RV32 system: FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c execution test FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c execution test The root cause is that for permutation indices = {0,3,7,0} use vcompress optimizat

[Committed] RISC-V: Refine some codes of riscv-v.cc[NFC]

2023-11-23 Thread Juzhe-Zhong
This is an NFC patch that refines some unreasonable code I noticed. Tested on zvl128b/zvl256b/zvl512b/zvl1024b with no regressions. Committed. gcc/ChangeLog: * config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Refine codes. (emit_vlmax_masked_gather_mu_insn): Ditto. (modulo_sel_ind

[PATCH] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread Juzhe-Zhong
This patch fixes the following FAILs on zvl1024b for both RV32 and RV64: FAIL: gcc.c-torture/execute/990128-1.c -O2 execution test FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fuse-link

[Committed V2] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread Juzhe-Zhong
This patch fixes the following FAILs on zvl1024b for both RV32 and RV64: FAIL: gcc.c-torture/execute/990128-1.c -O2 execution test FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fuse-li

[Committed] RISC-V: Add wrapper for emit vec_extract[NFC]

2023-11-23 Thread Juzhe-Zhong
Add wrapper for vec_extract since my following patch will need to call it. gcc/ChangeLog: * config/riscv/riscv-protos.h (emit_vec_extract): New function. * config/riscv/riscv-v.cc (emit_vec_extract): Ditto. * config/riscv/riscv.cc (riscv_legitimize_move): Refine codes. ---

[PATCH] RISC-V: Optimize a special case of VLA SLP

2023-11-23 Thread Juzhe-Zhong
While fixing bugs on zvl1024b, I noticed a special VLA SLP case that can be better optimized. v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... }) Before this patch, we are using the generic approach (vrgather): vid vadd.vx vrgather vmsgeu vrgather With this patch, we use vec_extrac

[PATCH V2] RISC-V: Optimize a special case of VLA SLP

2023-11-23 Thread Juzhe-Zhong
While fixing bugs on zvl1024b, I noticed a special VLA SLP case that can be better optimized. v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... }) Before this patch, we are using the generic approach (vrgather): vid vadd.vx vrgather vmsgeu vrgather With this patch, we use vec_extrac
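
For readers unfamiliar with the selector in the preview above, the following is a minimal scalar sketch of what vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... }) selects. It is only a reference model, not the code from the patch: the function names vec_perm_ref and shift_in_last_ref, the uint16_t element type, and the fixed-length arrays are assumptions made for illustration. It relies only on the usual vec_perm convention that selector values index into the concatenation of op1 and op2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of vec_perm (op1, op2, sel): each selector
   value indexes into the concatenation op1 ++ op2, where both operands
   hold nunits elements.  */
static void
vec_perm_ref (uint16_t *dst, const uint16_t *op1, const uint16_t *op2,
              const size_t *sel, size_t nunits)
{
  for (size_t i = 0; i < nunits; ++i)
    {
      assert (sel[i] < 2 * nunits);
      dst[i] = sel[i] < nunits ? op1[sel[i]] : op2[sel[i] - nunits];
    }
}

/* Hypothetical direct form of the special selector
   { nunits - 1, nunits, nunits + 1, ... }: the last element of op1,
   followed by the first nunits - 1 elements of op2.  */
static void
shift_in_last_ref (uint16_t *dst, const uint16_t *op1, const uint16_t *op2,
                   size_t nunits)
{
  assert (nunits > 0);
  dst[0] = op1[nunits - 1];
  for (size_t i = 1; i < nunits; ++i)
    dst[i] = op2[i - 1];
}
```

In this model the permutation needs one scalar taken from the end of op1 and a one-element shift of op2, so no general index vector is required; how the patch actually emits this is not shown in the truncated preview.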
