[PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && Phase 6 are quite messy and cause some bugs discovered by my downstream auto-vectorization test-generator. Before this patch. Phase 5 is cleanup_insns is the function remove AVL op

[PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong This patch fixes the requirement of V_WHOLE and V_FRACT. E.g. VNx8QI in V_WHOLE has no requirement which is incorrect. Actually, VNx8QI should be whole(full) mode when TARGET_MIN_VLEN < 128 since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2 which is fractio

[PATCH V3] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong Address comments from Jeff. This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && Phase 6 are quite messy and cause some bugs discovered by my downstream auto-vectorization test-generator. Before this patch. Phase 5 is cleanup_insns

[PATCH] RISC-V: Enable select_vl for RVV auto-vectorization

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong Consider this following example: void vec_add(int32_t *restrict c, int32_t *restrict a, int32_t *restrict b, int N) { for (long i = 0; i < N; i++) { c[i] = a[i] + b[i]; } } After this patch: vec_add: ble a3,zero,.L5 .L3: vsetvli a5
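
The loop shape that length-based control gives for vec_add can be sketched in scalar C. This is only a model under stated assumptions: VLMAX and the select_vl helper below are hypothetical, and real hardware may choose a different legal length than the simple minimum used here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of 32-bit lanes per vector register. */
#define VLMAX 4

/* Model of the length chosen per iteration: at most VLMAX elements.
   (Assumption: real hardware may pick a different legal value.) */
static size_t select_vl(size_t remaining)
{
    return remaining < VLMAX ? remaining : VLMAX;
}

/* Strip-mined vec_add: each outer iteration handles vl elements,
   so no separate scalar epilogue loop is needed. */
void vec_add(int32_t *restrict c, const int32_t *restrict a,
             const int32_t *restrict b, int n)
{
    size_t left = n > 0 ? (size_t)n : 0;
    size_t i = 0;
    while (left > 0) {
        size_t vl = select_vl(left);
        for (size_t l = 0; l < vl; l++)   /* one "vector" operation */
            c[i + l] = a[i + l] + b[i + l];
        i += vl;
        left -= vl;
    }
}
```

With n = 10 and VLMAX = 4 the model runs three iterations with lengths 4, 4 and 2.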

[PATCH] VECT: Add LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-11 Thread juzhe . zhong
From: Ju-Zhe Zhong Target like ARM SVE in GCC has an elegant way to handle both loop control and flow control simultaneously: loop_control_mask = WHILE_ULT flow_control_mask = comparison control_mask = loop_control_mask & flow_control_mask; MASK_LOAD (control_mask) MASK_STORE (control_mask) How
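
The mask combination described above can be modelled lane by lane in plain C. This is a hedged sketch, not vectorizer output: VF and cond_double are hypothetical names, and the "vector" operations are simulated with inner loops.

```c
#include <stdbool.h>
#include <stddef.h>

#define VF 4  /* hypothetical vectorization factor */

/* out[i] = in[i] * 2 where in[i] > 0; other elements are left untouched. */
void cond_double(int *restrict out, const int *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i += VF) {
        bool control[VF];
        for (size_t l = 0; l < VF; l++) {
            /* loop_control_mask = WHILE_ULT: lane is inside the array */
            bool loop_mask = i + l < n;
            /* flow_control_mask = comparison (only read in-bounds lanes) */
            bool flow_mask = loop_mask && in[i + l] > 0;
            control[l] = loop_mask && flow_mask;
        }
        /* MASK_LOAD (control_mask) followed by MASK_STORE (control_mask) */
        for (size_t l = 0; l < VF; l++)
            if (control[l])
                out[i + l] = in[i + l] * 2;
    }
}
```

A length-controlled target has no loop mask to AND with, which is the gap a combined len+mask load/store is meant to fill.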

[PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-11 Thread juzhe . zhong
From: Juzhe-Zhong Optimize the following auto-vectorization codes: void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n) { for (int i = 0; i < n; i++) a[i] = b[i] >> c; } Before this patch: foo: ble a3,zero,.L5 .L3: vsetvli a5,a3,e32

[PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-11 Thread juzhe . zhong
From: Ju-Zhe Zhong Target like ARM SVE in GCC has an elegant way to handle both loop control and flow control simultaneously: loop_control_mask = WHILE_ULT flow_control_mask = comparison control_mask = loop_control_mask & flow_control_mask; MASK_LOAD (control_mask) MASK_STORE (control_mask) How

[PATCH] RISC-V: Add ZVFHMIN autovec block testcase

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong To be safe, add a ZVFHMIN autovec block testcase to make sure we won't enable autovec for ZVFHMIN by mistake. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test. --- .../gcc.target/riscv/rvv/autovec/zvfhmin-1.c | 34 +

[PATCH V2] RISC-V: Add ZVFHMIN block autovec testcase

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong To be safe, add a ZVFHMIN autovec block testcase to make sure we won't enable autovec for ZVFHMIN by mistake. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test. --- .../gcc.target/riscv/rvv/autovec/zvfhmin-1.c | 35 +

[PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong According to RVV ISA: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc We can enhance VLA SLP auto-vectorization with (16.5.1. Synthesizing vdecompress) Decompress operation. Case 1 (nunits = POLY_INT_CST [16, 16]): _48 = VEC_PERM_EXPR <_37, _35,

[PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong According to RVV ISA: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc We can enhance VLA SLP auto-vectorization with (16.5.1. Synthesizing vdecompress) Decompress operation. Case 1 (nunits = POLY_INT_CST [16, 16]): _48 = VEC_PERM_EXPR <_37, _35,

[PATCH] RISC-V: Add comments of some functions

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong gcc/ChangeLog: * config/riscv/riscv-v.cc (rvv_builder::single_step_npatterns_p): Add comment. (shuffle_generic_patterns): Ditto. (expand_vec_perm_const_1): Ditto. --- gcc/config/riscv/riscv-v.cc | 7 +++ 1 file changed, 7 insertions(+) diff

[PATCH] RISC-V: Add more SLP tests

2023-06-13 Thread juzhe . zhong
From: Juzhe-Zhong gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-10.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-11.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-10.c: New test. * gcc.target/riscv/rvv/autovec/partial

[PATCH] RISC-V: Fix bug of VLA SLP auto-vectorization

2023-06-13 Thread juzhe . zhong
From: Juzhe-Zhong Sorry for producing bugs in the previous VLA SLP patch. Consider this following permutation: _85 = VEC_PERM_EXPR <{ 99, 17, ... }, { 11, 80, ... }, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>; The correct result should be: _85 = {

[PATCH V2] RISC-V: Add more SLP tests

2023-06-13 Thread juzhe . zhong
From: Juzhe-Zhong gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-10.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-11.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-10.c: New test. * gcc.target/riscv/rvv/autovec/partial

[PATCH V3] RISC-V: Add more SLP tests

2023-06-13 Thread juzhe . zhong
From: Juzhe-Zhong gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-10.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-11.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-13.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp

[PATCH] RISC-V: Use merge approach to optimize vector permutation

2023-06-13 Thread juzhe . zhong
From: Juzhe-Zhong This patch optimizes the permutation case that is suitable for the merge approach. Consider the following case: typedef int8_t vnx16qi __attribute__((vector_size (16))); #define MASK_16 0, 17, 2, 19, 4, 21, 6, 23, 8, 25, 10, 27, 12, 29, 14, 31 void __attribute__
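
Why a selector such as MASK_16 can be handled by a merge is easiest to see in a scalar model. The sketch below is an illustration only; LANES, vec_perm, is_merge_pattern and vec_merge are hypothetical helpers, not functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16

/* Generic two-input permutation: sel[i] in [0, 2*LANES) indexes the
   concatenation of a and b. */
void vec_perm(int8_t *out, const int8_t *a, const int8_t *b,
              const uint8_t *sel)
{
    for (int i = 0; i < LANES; i++)
        out[i] = sel[i] < LANES ? a[sel[i]] : b[sel[i] - LANES];
}

/* Returns true when every lane i selects lane i of a or lane i of b;
   take_b[i] then records which input lane i comes from. */
bool is_merge_pattern(const uint8_t *sel, bool *take_b)
{
    for (int i = 0; i < LANES; i++) {
        if (sel[i] != i && sel[i] != i + LANES)
            return false;
        take_b[i] = sel[i] == i + LANES;
    }
    return true;
}

/* Lane-wise select, the scalar model of a vector merge. */
void vec_merge(int8_t *out, const int8_t *a, const int8_t *b,
               const bool *take_b)
{
    for (int i = 0; i < LANES; i++)
        out[i] = take_b[i] ? b[i] : a[i];
}
```

For the selector 0, 17, 2, 19, ... no element ever changes lane, so the general permutation and the lane-wise select produce the same vector.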

[PATCH V3] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-15 Thread juzhe . zhong
From: Ju-Zhe Zhong According to comments from Richi, split the first patch to add only the ifn && optabs of LEN_MASK_{LOAD,STORE}; we don't apply them in the vectorizer in this patch. Also add a BIAS argument for possible future use by s390. The description of the patterns in the doc comes from Robin. Af

[PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-15 Thread juzhe . zhong
From: Ju-Zhe Zhong This patch passes bootstrap on X86. OK for trunk? According to comments from Richi, split the first patch to add only the ifn && optabs of LEN_MASK_{LOAD,STORE}; we don't apply them in the vectorizer in this patch. Also add a BIAS argument for possible future use by s390. The descri

[PATCH] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-11 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. As we discussed before, COND_LEN_* patterns were added for multiple situations. This patch applies COND_LEN_* to the following situation in "vectorizable_operation": /* If operating on inactive elements could generate spu

[PATCH] RISC-V: Support COND_LEN_* patterns

2023-07-11 Thread Juzhe-Zhong
This patch depends on the following vectorizer patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624179.html With this patch, we can handle operations that may trap on elements outside the loop. The following 2 cases are addressed by this patch: 1. integer division: #define
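
The trapping problem with integer division can be illustrated with a scalar model of a conditional, length-controlled operation. This is a sketch under assumptions: cond_len_div and its parameter names are hypothetical, and the bias operand is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a conditional, length-controlled division.
   Lane i is active only when i < len and mask[i] is set. Inactive lanes
   take the fallback value, and the division is never evaluated for them,
   so a zero divisor in an inactive lane cannot trap. */
void cond_len_div(int32_t *out, const bool *mask, const int32_t *a,
                  const int32_t *b, const int32_t *fallback,
                  size_t len, size_t nlanes)
{
    for (size_t i = 0; i < nlanes; i++)
        out[i] = (i < len && mask[i]) ? a[i] / b[i] : fallback[i];
}
```

An unconditional full-width division followed by a select would compute the same values only if every lane's divisor were safe, which is exactly what cannot be assumed for lanes past the end of the loop.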

[PATCH] RISC-V: Support integer mult highpart auto-vectorization

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong This patch is adding an obvious missing mult_high auto-vectorization pattern. Consider this following case: #define DEF_LOOP(TYPE) \ void __attribute__ ((noipa))\ mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count)

[PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. As we discussed before, COND_LEN_* patterns were added for multiple situations. This patch applies COND_LEN_* to the following situation in "vectorizable_operation": /* If operating on inactive elements could generate spu

[PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. As we discussed before, COND_LEN_* patterns were added for multiple situations. This patch applies COND_LEN_* to the following situation in "vectorizable_operation": /* If operating on inactive elements could generate spu

[PATCH V4] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. As we discussed before, COND_LEN_* patterns were added for multiple situations. This patch applies COND_LEN_* to the following situation in "vectorizable_operation": /* If operating on inactive elements could generate spu

[PATCH V2] RISC-V: Support COND_LEN_* patterns

2023-07-12 Thread Juzhe-Zhong
The middle-end support has been merged: https://github.com/gcc-mirror/gcc/commit/0d4dd7e07a879d6c07a33edb2799710faa95651e With this patch, we can handle operations that may trap on elements outside the loop. The following 2 cases are addressed by this patch: 1. integer division: #define TEST_TYP

[PATCH] SSA MATH: Support COND_LEN_FMA for floating-point math optimization

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. The previous patch supported COND_LEN_* binary operations but not COND_LEN_* ternary operations. This patch adds support for COND_LEN_* ternary operations. Consider the following case: #define TEST_TYPE(TYPE)

[PATCH V2] SSA MATH: Support COND_LEN_FMA for floating-point math optimization

2023-07-13 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. The previous patch supported COND_LEN_* binary operations but not COND_LEN_* ternary operations. This patch adds support for COND_LEN_* ternary operations. Consider the following case: #define TEST_TYPE(TYPE)

[PATCH] RISC-V: Enable COND_LEN_FMA auto-vectorization

2023-07-13 Thread Juzhe-Zhong
Enable COND_LEN_FMA auto-vectorization for floating-point FMA auto-vectorization **NO** ffast-math. Since the middle-end support has been approved and I will merge it after I finished bootstrap && regression on X86. https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624395.html Now, it's time

[PATCH V2] RISC-V: Enable COND_LEN_FMA auto-vectorization

2023-07-13 Thread Juzhe-Zhong
Add comments as Robin's suggestion in scatter_store_run-7.c Enable COND_LEN_FMA auto-vectorization for floating-point FMA auto-vectorization **NO** ffast-math. Since the middle-end support has been approved and I will merge it after I finished bootstrap && regression on X86. https://gcc.gnu.org

[PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-14 Thread juzhe . zhong
From: Ju-Zhe Zhong This patch adds reduc_*_scal to support reduction auto-vectorization. Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization. Consider the following case: int __attribute__((noipa)) and_loop (int32_t * __restrict x, int32_t n, int res) { for (int i =

[PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction

2023-07-14 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch adds the mask_len_fold_left_plus pattern to support in-order floating-point reduction for targets that support length loop control. Consider the following case: double foo2 (double *__restrict a, double init, int *__restrict cond, int n)

[PATCH] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-16 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv.cc (riscv_option_override): Report ERROR for TARGET_MIN_VLEN > 4096 --- gcc/config/riscv/riscv.cc | 8 1 file changed, 8 insertions(+) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 6ed735d6983..ce523eea9ba 100644 -

[PATCH V2] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-16 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv.cc (riscv_option_override): Add TARGET_MIN_VLEN < 4096 check. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvl-unimplemented.c: New test. --- gcc/config/riscv/riscv.cc | 8 .../gcc.target/risc

[PATCH V2] RISC-V: Support non-SLP unordered reduction

2023-07-17 Thread Juzhe-Zhong
This patch adds reduc_*_scal to support reduction auto-vectorization. Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization. Consider the following case: int __attribute__((noipa)) and_loop (int32_t * __restrict x, int32_t n, int res) { for (int i = 0; i < n; ++i) r

[PATCH] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread Juzhe-Zhong
Hi, Richard. The RISC-V port needs to add a bunch of VLS modes (V16QI, V32QI, V64QI, etc.). They share the same REG_CLASS as the VLA modes (VNx16QI, VNx32QI, etc.). When I add those VLS modes, the RTL_SSA initialization in the VSETVL PASS (inserted after RA) ICEs: rvv.c:13:1: internal compiler error: in

[PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-17 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/riscv.cc (riscv_option_override): Add sorry check. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvl-unimplemented-1.c: New test. * gcc.target/riscv/rvv/base/zvl-unimplemented-2.c: New test. --- gcc/config/riscv/riscv.cc

[PATCH V2] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard. The RISC-V port needs to add a bunch of VLS modes (V16QI, V32QI, V64QI, etc.). They share the same REG_CLASS as the VLA modes (VNx16QI, VNx32QI, etc.). When I add those VLS modes, the RTL_SSA initialization in the VSETVL PASS (inserted after RA) ICEs: rvv.c:13:1: intern

[PATCH] RISC-V: Enable SLP un-order reduction

2023-07-17 Thread Juzhe-Zhong
This patch enables SLP unordered reduction auto-vectorization. Consider the following case: int __attribute__((noipa)) add_loop (int *x, int n, int res) { for (int i = 0; i < n; ++i) { res += x[i * 2]; res += x[i * 2 + 1]; } return res; } --param riscv-autovec-prefer

[PATCH] RISC-V: Dynamic adjust size of VLA vector according to TARGET_MIN_VLEN

2023-07-17 Thread Juzhe-Zhong
This patch dynamically adjusts the size of VLA vectors according to TARGET_MIN_VLEN (-march=*zvl*b). Currently, VNx16QImode is always [16,16] when TARGET_MIN_VLEN >= 128. We are going to add a bunch of VLS modes (V16QI, V32QI, etc.); these modes should always be considered as having smaller size

[PATCH V2] RISC-V: Enable SLP un-order reduction

2023-07-18 Thread Juzhe-Zhong
This patch enables SLP unordered reduction auto-vectorization. Consider the following case: int __attribute__((noipa)) add_loop (int *x, int n, int res) { for (int i = 0; i < n; ++i) { res += x[i * 2]; res += x[i * 2 + 1]; } return res; } --param riscv-autovec-prefer

[PATCH] MAINTAINERS: Add myself as riscv port reviewer.

2023-07-18 Thread juzhe . zhong
+306,7 @@ register allocation Peter Bergner register allocation Kenneth Zadeck register allocation Seongbae Park riscv port Robin Dapp +riscv port Juzhe Zhong RTL optimizers Steven Bosscher

[PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-19 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch supports floating-point in-order reduction for loop length control. Consider the following case: float foo (float *__restrict a, int n) { float result = 1.0; for (int i = 0; i < n; i++) result += a[i]; return result; } When compile
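
The difference between an in-order reduction and a reassociated one can be sketched in scalar C. The model below is illustrative only; VF, fold_left_sum and reassoc_sum are hypothetical names, and the inner loop stands in for a single length-controlled fold-left operation.

```c
#include <stddef.h>

#define VF 4  /* hypothetical number of float lanes */

/* In-order (fold-left) reduction with length control: the lanes of each
   chunk are added to the accumulator strictly left to right, in the same
   order as the scalar loop. */
float fold_left_sum(const float *a, size_t n, float init)
{
    float acc = init;
    for (size_t i = 0; i < n; i += VF) {
        size_t len = n - i < VF ? n - i : VF;  /* active lanes this trip */
        for (size_t l = 0; l < len; l++)
            acc += a[i + l];
    }
    return acc;
}

/* Reassociated reduction: VF partial sums combined at the end. This
   changes the order of the additions, so it is only a valid
   transformation when reassociation is allowed (e.g. -ffast-math). */
float reassoc_sum(const float *a, size_t n, float init)
{
    float part[VF] = {0};
    for (size_t i = 0; i < n; i++)
        part[i % VF] += a[i];
    float acc = init;
    for (size_t l = 0; l < VF; l++)
        acc += part[l];
    return acc;
}
```

Both functions agree whenever every intermediate sum is exactly representable; they can differ once rounding occurs, which is why the in-order form is needed without -ffast-math.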

[PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Juzhe-Zhong
This patch is depending on: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html Consider this following case: float foo (float *__restrict a, int n) { float result = 1.0; for (int i = 0; i < n; i++) result += a[i]; return result; } Compile with **NO** -ffast-math: Before thi

[PATCH] CODE STRUCTURE: Refine codes in Vectorizer

2023-07-20 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. I plan to refine the codes that I recently support for RVV auto-vectorization. This patch is inspired last review comments from Richard: https://patchwork.sourceware.org/project/gcc/patch/20230712042124.111818-1-juzhe.zh...@rivai.ai/ Richard said he pre

[PATCH V2] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Juzhe-Zhong
This patch is depending on: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html Consider this following case: float foo (float *__restrict a, int n) { float result = 1.0; for (int i = 0; i < n; i++) result += a[i]; return result; } Compile with **NO** -ffast-math: Before thi

[PATCH] cleanup: Change LEN_MASK into MASK_LEN

2023-07-20 Thread Juzhe-Zhong
Hi. Starting from the LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE and COND_LEN_* patterns, the order of len and mask is {mask,len,bias}. The "mask" argument comes before "len" because we want to keep the "mask" location the same as in the mask_* or cond_* patterns to make use of current code
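
The {mask,len,bias} operand order can be pictured with a scalar model of a load controlled by both a mask and a length. This is a sketch based on my reading of the pattern semantics; mask_len_load here is a hypothetical C function, not a GCC API, and leaving inactive lanes unchanged is a modelling choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a mask+len controlled load with the control operands
   in {mask, len, bias} order. Lane i is loaded only when i < len + bias
   and mask[i] is set. Inactive lanes are left unchanged here; a real
   target may leave them undefined. */
void mask_len_load(int32_t *dest, const int32_t *src, const bool *mask,
                   size_t len, ptrdiff_t bias, size_t nlanes)
{
    ptrdiff_t active = (ptrdiff_t)len + bias;
    for (size_t i = 0; i < nlanes; i++)
        if ((ptrdiff_t)i < active && mask[i])
            dest[i] = src[i];
}
```

Keeping the mask in the same position as in the mask-only patterns means code that only inspects the mask operand does not need to know whether a length follows it.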

[PATCH] cleanup: make all cond_len_* and mask_len_* consistent on the order of mask and len

2023-07-20 Thread Juzhe-Zhong
This patch depends on: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625121.html Hi, Richard and Richi. This patch aligns the order of mask and len. Currently, according to this piece of code: if (final_len && final_mask) call = gimp

[PATCH] cleanup: Change condition order

2023-07-20 Thread Juzhe-Zhong
Hi, Richard and Richi. I have double-checked the recent code for len && mask support. In some places the code structure is: if (len_mask_fn) ... else if (mask_fn) ... and in other places it is: if (mask_len_fn) ... else if (mask) Based on a previous review comment from Richi: https://gcc.gnu.org/pip

[committed] RISC-V: Fix redundant variable declaration.

2023-07-21 Thread Juzhe-Zhong
I noticed a mistake I made for RISC-V in the last patch. This fixes it. Sorry about that. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_gather_scatter): Remove redundant variables. --- gcc/config/riscv/riscv-v.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/

[PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch supports floating-point in-order reduction for loop length control. Consider the following case: float foo (float *__restrict a, int n) { float result = 1.0; for (int i = 0; i < n; i++) result += a[i]; return result; } When compile

[PATCH V3] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch supports floating-point in-order reduction for loop length control. Consider the following case: float foo (float *__restrict a, int n) { float result = 1.0; for (int i = 0; i < n; i++) result += a[i]; return result; } When compile

[PATCH V4] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch supports floating-point in-order reduction for loop length control. Consider the following case: float foo (float *__restrict a, int n) { float result = 1.0; for (int i = 0; i < n; i++) result += a[i]; return result; } When compile

[PATCH] VECT: Support CALL vectorization for COND_LEN_*

2023-07-24 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch supports CALL vectorization by COND_LEN_*. Consider this following case: void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n) { for (int i = 0; i < n; i++) if (cond[i]) a[i] = b[i] + a[i]; } Outpu

[PATCH] internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions

2023-07-25 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. Based on previous discussions, we should make COND_* and COND_LEN_* consistent. So this patch defines these internal functions together using these 2 wrappers: #ifndef DEF_INTERNAL_COND_FN #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE)

[PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-28 Thread Juzhe-Zhong
Hi, Richard and Richi. Based on the suggestions from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html This patch chooses approach (1) that Richard provided, meaning: RVV implements cond_* optabs as expanders. RVV therefore supports both IFN_COND_ADD and IFN_COND_LEN_ADD.

[PATCH] RISC-V: Support CALL conditional autovec patterns

2023-07-28 Thread Juzhe-Zhong
This patch is depending on middle-end support: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625696.html Consider this following case: void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n) { for (int i = 0; i < n; i++) if (cond[i]) a[i] = b[i] + a[i

[PATCH] RISC-V: Enable basic VLS auto-vectorization

2023-07-29 Thread Juzhe-Zhong
Consider the following case: void foo (int8_t *in, int8_t *out, int8_t x) { for (int i = 0; i < 16; i++) in[i] = x; } Compile option: --param=riscv-autovec-preference=scalable -fno-builtin Before this patch: foo: li a5,16 csrr a4,vlenb vsetvli a3,zero,e8,m1

[PATCH V2] RISC-V: Enable basic VLS auto-vectorization

2023-07-30 Thread Juzhe-Zhong
Consider the following case: void foo (int8_t *in, int8_t *out, int8_t x) { for (int i = 0; i < 16; i++) in[i] = x; } Compile option: --param=riscv-autovec-preference=scalable -fno-builtin Before this patch: foo: li a5,16 csrr a4,vlenb vsetvli a3,zero,e8,m1

[PATCH] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Juzhe-Zhong
This patch is inspired by "lowerCTPOP" in LLVM. Support popcount auto-vectorization by following LLVM approach. https://godbolt.org/z/3K3GzvY7f Before this patch: :7:21: missed: couldn't vectorize loop :8:14: missed: not vectorized: relevant stmt not supported: _5 = __builtin_popcount (_4); Aft
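
A popcount written only with shifts, masks, adds and one multiply can be vectorized even when the target has no vector popcount instruction. The sketch below shows the classic bit-parallel formulation; it is an assumption about the general technique, not a transcription of the patch or of LLVM's code, and popcount32 and popcount_loop are hypothetical names.

```c
#include <stdint.h>

/* Bit-parallel population count built from element-wise operations. */
uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* add the 4 bytes */
}

/* Loop of the kind reported as "relevant stmt not supported" when the
   body is a call to __builtin_popcount with no vector expansion. */
void popcount_loop(uint32_t *restrict dst, const uint32_t *restrict src,
                   int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = popcount32(src[i]);
}
```

Every step is a lane-wise operation, so the whole sequence maps onto ordinary vector shift, and, add and multiply instructions.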

[committed] RISC-V: Fix bug of get_mask_mode

2023-07-31 Thread Juzhe-Zhong
Fix bugs: ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function 'void riscv_vector::emit_vlmax_masked_fp_mu_insn(unsigned int, int, rtx_def**)': ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:999:54: error: request for member 'require' in 'riscv_vector::get_mask_mode(dest_mode)'

[PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Juzhe-Zhong
This patch is inspired by "lowerCTPOP" in LLVM. Support popcount auto-vectorization by LLVM approach. Before this patch: :7:21: missed: couldn't vectorize loop :8:14: missed: not vectorized: relevant stmt not supported: _5 = __builtin_popcount (_4); After this patch: popcount_32: ble

[PATCH V3] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. Based on the suggestions from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html This patch chooses approach (1) that Richard provided, meaning: RVV implements cond_* optabs as expanders. RVV therefore supports both IFN_COND_ADD an

[PATCH 1/1] Fix bit-position comparison

2022-07-26 Thread juzhe . zhong
From: zhongjuzhe gcc/ChangeLog: * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE --- gcc/expr.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/expr.cc b/gcc/expr.cc index 80bb1b8a4c5..ac2b3c07df6 100644 --- a/gcc/expr.cc +++ b/gcc/

[PATCH 0/1] middle-end: Fix bit position comparison

2022-07-26 Thread juzhe . zhong
From: zhongjuzhe Hi, variable "bitpos" is computed using bitsize. I think it makes sense to use GET_MODE_BITSIZE instead of GET_MODE_PRECISION when checking whether the bit position is out of bounds of the array. This patch is useful for RVV (RISC-V 'V') support that I am going to push upstream. Thanks!

[PATCH V4] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. Based on the suggestions from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html This patch chooses approach (1) that Richard provided, meaning: RVV implements cond_* optabs as expanders. RVV therefore supports both IFN_COND_ADD an

[PATCH] RISC-V: Support VLS basic operation auto-vectorization

2023-08-07 Thread Juzhe-Zhong
This patch supports VLS modes auto-vectorization to enhance VLA auto-vectorization when niters is known. Consider the following case: #include #define DEF_OP_VV(PREFIX, NUM, TYPE, OP) \ void __attribute__ ((noinline, noclone))

[PATCH V4] VECT: Support CALL vectorization for COND_LEN_*

2023-08-07 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. Based on the suggestions from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html This patch chooses approach (1) that Richard provided, meaning: RVV implements cond_* optabs as expanders. RVV therefore supports both IFN_COND_ADD an

[PATCH] tree-optimization/110897 - Fix missed vectorization of shift on both RISC-V and aarch64

2023-08-07 Thread Juzhe-Zhong
Consider this following case: #include #define TEST2_TYPE(TYPE)\ __attribute__((noipa))\ void vshiftr_##TYPE (TYPE *__restrict dst, TYPE *__restrict a, TYPE *__restrict b, int n) \ {

[PATCH] RISC-V: Support VLS shift vectorization

2023-08-07 Thread Juzhe-Zhong
After this patch, this following case will be well optimized: #include "riscv_vector.h" #define DEF_OP_VV(PREFIX, NUM, TYPE, OP) \ void __attribute__ ((noinline, noclone)) \ PREFIX##_##TYPE##NUM (TYPE *restrict a, TYPE *

[PATCH] RISC-V: Support neg VLS auto-vectorization

2023-08-07 Thread Juzhe-Zhong
#include "riscv_vector.h" #define DEF_OP_V(PREFIX, NUM, TYPE, OP)\ void __attribute__ ((noinline, noclone)) \ PREFIX##_##TYPE##NUM (TYPE *restrict a, TYPE *restrict b)\ {

[PATCH] RISC-V: Allow CONST_VECTOR for VLS modes.

2023-08-08 Thread Juzhe-Zhong
This patch enables CONST_VECTOR for VLS modes. void foo1 (int * __restrict a) { for (int i = 0; i < 16; i++) a[i] = 8; } void foo2 (int * __restrict a) { for (int i = 0; i < 16; i++) a[i] = i; } Compile option: -O3 --param=riscv-autovec-preference=scalable Before this patch:

[PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-08 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, this patch is adding loop len control on extract_last autovectorization. Consider this following case: #include #define EXTRACT_LAST(TYPE) \ TYPE __attribute__ ((noinline, noclone)) \ test_##TYPE (TYPE *x, int n, TYPE value) \ {

[PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread Juzhe-Zhong
I realized we have a bug in the VSETVL PASS which is triggered by strided_load_run-1.c on RV32 systems. FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test FAIL: gcc.target/riscv/rvv/

[PATCH] RISC-V: Support NPATTERNS = 1 stepped vector[PR110950]

2023-08-09 Thread Juzhe-Zhong
This patch fixes ICE: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950 0x1cf8939 expand_const_vector ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1587 PR target/110950 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Add NPATTERNS = 1 stepped vector su

[PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch adds support for live vectorization via VEC_EXTRACT for LEN loop control. Consider the following case: #include #define EXTRACT_LAST(TYPE) \ TYPE __attribute__ ((noinline, noclone)) \ test_##TYPE (TYPE *x, int n, T

[PATCH] RISC-V: Add missing modes to the iterators

2023-08-10 Thread Juzhe-Zhong
gcc/ChangeLog: * config/riscv/vector-iterators.md: Add missing modes. --- gcc/config/riscv/vector-iterators.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 14829989e09..30808ceb241 100644 --- a/g

[PATCH] RISC-V: Support TU for integer ternary OP[PR110964]

2023-08-10 Thread Juzhe-Zhong
PR target/110964 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_cond_len_ternop): Add integer ternary. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr110964.c: New test. --- gcc/config/riscv/riscv-v.cc | 3 +-- .../gcc.target/riscv

[PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread Juzhe-Zhong
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110962 SUBROUTINE a(b,c,d) LOGICAL,DIMENSION(INOUT) :: b LOGICAL e REAL, DIMENSION(IN) :: c REAL, DIMENSION(INOUT) :: d REAL, DIMENSION(SIZE(c)) :: f WHERE (b.AND.e) WHERE (f>=0.) d = g ENDWHER

[PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch adds support for live vectorization via VEC_EXTRACT for LEN loop control. Consider the following case: #include #define EXTRACT_LAST(TYPE) \ TYPE __attribute__ ((noinline, noclone)) \ test_##TYPE (TYPE *x, int n, T

[PATCH] VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns

2023-08-10 Thread Juzhe-Zhong
This patch adds the vec_mask_len_{load_lanes,store_lanes} autovectorization patterns. Here we want to support the following autovectorization: #include void foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict cond, int n) { for (intptr_t i = 0; i < n; ++i) { if (con

[PATCH] RISC-V: Fix vec_series expander[PR110985]

2023-08-11 Thread Juzhe-Zhong
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985 PR target/110985 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vec_series): Refactor the expander. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c: New test. --- gc

[PATCH V2] RISC-V: Allow CONST_VECTOR for VLS modes

2023-08-11 Thread Juzhe-Zhong
This patch enables CONST_VECTOR for VLS modes. void foo1 (int * __restrict a) { for (int i = 0; i < 16; i++) a[i] = 8; } void foo2 (int * __restrict a) { for (int i = 0; i < 16; i++) a[i] = i; } Compile option: -O3 --param=riscv-autovec-preference=scalable Before this patch:

[PATCH] VECT: Fix ICE on MASK_LEN_{LOAD, STORE} when no LEN recorded[PR110989]

2023-08-11 Thread Juzhe-Zhong
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110989 This ICE is caused because of this situation: mask__49.21_99 = vect__17.19_96 == { 0.0, ... }; ... vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0); The MASK_LEN_LOAD is using re

[PATCH V2] VECT: Fix ICE on MASK_LEN_{LOAD, STORE} when no LEN recorded[PR110989]

2023-08-11 Thread Juzhe-Zhong
This ICE is caused by this situation: mask__49.21_99 = vect__17.19_96 == { 0.0, ... }; ... vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0); The MASK_LEN_LOAD is using real MASK which is produced by the EQ comparison whereas the LEN is the dummy

[PATCH V4] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch adds support for live vectorization via VEC_EXTRACT for LEN loop control. Consider the following case: #include #define EXTRACT_LAST(TYPE) \ TYPE __attribute__ ((noinline, noclone)) \ test_##TYPE (TYPE *x, int n, T

[PATCH] RISC-V: Add TARGET_VECTOR check into VLS modes

2023-08-11 Thread Juzhe-Zhong
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110994 This is caused by incorrect code for VLS modes in register allocation. The original case that triggers the ICE is Fortran code, but I can reproduce it with C code. PR target/110994 gcc/ChangeLog: * config/riscv/riscv-

[PATCH] RISC-V: Fix autovec_length_operand predicate[PR110989]

2023-08-12 Thread Juzhe-Zhong
An incorrect configuration of the autovec_length_operand predicate was discovered in PR110989 in the following situation: vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0); ---> dummy length = VF. The current autovec length operand failed to recogniz

[PATCH] VECT: Apply MASK_LEN_{LOAD_LANES, STORE_LANES} into vectorizer

2023-08-13 Thread juzhe . zhong
From: Ju-Zhe Zhong Hi, Richard and Richi. This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the vectorizer. Consider this simple case: void __attribute__ ((noinline, noclone)) foo (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict d, int *__restri

[PATCH] genrecog: Add SUBREG_BYTE.to_constant check to the genrecog

2023-08-14 Thread Juzhe-Zhong
Hi, there is a genrecog issue in the RISC-V backend. This is the ICE info: 0xfa3ba4 poly_int_pod<2u, unsigned short>::to_constant() const ../../../riscv-gcc/gcc/poly-int.h:504 0x28eaa91 recog_5 ../../../riscv-gcc/gcc/config/riscv/bitmanip.md:314 0x28ec5b4 recog_7 ../../.

[PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-14 Thread Juzhe-Zhong
This patch depends on middle-end support: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html This patch allows us to auto-vectorize the following case: #define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ void __attribute__ ((noinline, noclone))

[PATCH V2] VECT: Apply MASK_LEN_{LOAD_LANES, STORE_LANES} into vectorizer

2023-08-15 Thread Juzhe-Zhong
Hi, Richard and Richi. This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the vectorizer. Consider this simple case: void __attribute__ ((noinline, noclone)) foo (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict d, int *__restrict e, int *__restrict

[PATCH V2] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-15 Thread Juzhe-Zhong
This patch allows us to auto-vectorize the following case: #define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ void __attribute__ ((noinline, noclone)) \ NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \

[PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-16 Thread Juzhe-Zhong
Hi, Richard and Richi. Currently, GCC supports COND_LEN_FMA for floating-point without -ffast-math. It's supported in tree-ssa-math-opts.cc. However, GCC fails to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS. Consider the following case: #define TEST_TYPE(TYPE)
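The four operations named above differ only in the signs applied to the product and the addend. The scalar sketch below illustrates that, under stated assumptions: the helper `model_cond_len_fma` and the `fma_kind` enum are hypothetical names, the bias operand is omitted, inactive lanes take the else value, and the product and sum are computed as two separately rounded steps rather than as a true fused operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the four sign variants.  */
enum fma_kind { KIND_FMA, KIND_FMS, KIND_FNMA, KIND_FNMS };

/* Scalar model of a conditional, length-controlled multiply-add.
   Lane i is computed only when i < len and cond[i] is nonzero;
   every other lane receives else_val[i].  */
static void
model_cond_len_fma (enum fma_kind kind, double *res,
                    const unsigned char *cond, const double *a,
                    const double *b, const double *c,
                    const double *else_val, size_t nunits, size_t len)
{
  assert (len <= nunits);

  for (size_t i = 0; i < nunits; i++)
    {
      double prod = a[i] * b[i];
      double val;

      switch (kind)
        {
        case KIND_FMA:  val = prod + c[i];  break;  /*   a * b  + c  */
        case KIND_FMS:  val = prod - c[i];  break;  /*   a * b  - c  */
        case KIND_FNMA: val = -prod + c[i]; break;  /* -(a * b) + c  */
        default:        val = -prod - c[i]; break;  /* -(a * b) - c  */
        }

      res[i] = (i < len && cond[i]) ? val : else_val[i];
    }
}
```

Because the sketch rounds twice, it matches a fused operation only for inputs where the product is exactly representable, such as small integers.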

[PATCH] RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases

2023-08-16 Thread Juzhe-Zhong
This patch depends on the middle-end patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html We already have COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS patterns. Remove TARGET_PREFERRED_ELSE_VALUE since it forbids the COND_LEN_FMS/COND_LEN_FNMS STMT fold. gcc/ChangeLog: * con

[PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

2023-08-16 Thread Juzhe-Zhong
void foo(_Float16 y, int64_t *i64p) { vint64m1_t vx = __riscv_vle64_v_i64m1 (i64p, 1); vx = __riscv_vadd_vv_i64m1 (vx, vx, 1); vfloat16m1_t vy = __riscv_vfmv_s_f_f16m1 (y, 1); asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy)); } zve64f: foo: vsetivli zero,1,e16,mf4,ta,ma

Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-18 Thread Juzhe Zhong
Thanks, Richi. I will wait for Richard's comments, address the feedback from both of you, and then send a V2 patch.

[PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

2023-08-20 Thread Juzhe-Zhong
void foo(_Float16 y, int64_t *i64p) { vint64m1_t vx = __riscv_vle64_v_i64m1 (i64p, 1); vx = __riscv_vadd_vv_i64m1 (vx, vx, 1); vfloat16m1_t vy = __riscv_vfmv_s_f_f16m1 (y, 1); asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy)); } zve64f: foo: vsetivli zero,1,e16,mf4,ta,ma

[PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-20 Thread Juzhe-Zhong
This patch exports 'compute_antinout_edge' and 'compute_earliest' to global scope for use in the VSETVL PASS of the RISC-V backend. The demand fusion is the fusion of VSETVL information to emit a VSETVL which dominates and pre-configures most of the RVV instructions in order to elide redu

Re: [PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

2023-08-20 Thread Juzhe Zhong
I am sorry for sending the wrong, duplicate patch. Please disregard this patch.
