From: Juzhe-Zhong
This patch reworks Phase 5 && Phase 6 of VSETVL PASS, since Phase 5 && Phase 6
are quite messy and cause some bugs discovered by my downstream
auto-vectorization test-generator.
Before this patch:
Phase 5 is cleanup_insns, the function that removes AVL op
From: Juzhe-Zhong
This patch fixes the requirement of V_WHOLE and V_FRACT.
E.g. VNx8QI in V_WHOLE has no requirement which is incorrect.
Actually, VNx8QI should be whole(full) mode when TARGET_MIN_VLEN < 128
since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2 which is fractio
From: Juzhe-Zhong
Address comments from Jeff.
This patch reworks Phase 5 && Phase 6 of VSETVL PASS, since Phase 5 && Phase 6
are quite messy and cause some bugs discovered by my downstream
auto-vectorization test-generator.
Before this patch:
Phase 5 is cleanup_insns
From: Juzhe-Zhong
Consider the following example:
void vec_add(int32_t *restrict c, int32_t *restrict a, int32_t *restrict b,
int N) {
for (long i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}
}
After this patch:
vec_add:
ble a3,zero,.L5
.L3:
vsetvli a5
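As a reading aid for the `vec_add` example, here is a minimal scalar sketch of what a `vsetvli`-driven, length-controlled loop computes. The names `model_vsetvl` and `vec_add_model` and the `vlmax` parameter are hypothetical and do not come from the patch. The model assumes each iteration processes `min(remaining, vlmax)` elements, which is only one of the behaviours the RVV spec permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vsetvli: returns the number of elements the
   next vector iteration processes.  Assumption: vl = min(avl, vlmax).  */
static size_t model_vsetvl(size_t avl, size_t vlmax)
{
    assert(vlmax > 0);
    return avl < vlmax ? avl : vlmax;
}

/* Scalar model of the length-controlled loop for c[i] = a[i] + b[i].  */
static void vec_add_model(int32_t *restrict c, const int32_t *restrict a,
                          const int32_t *restrict b, int n, size_t vlmax)
{
    /* Mirrors the "ble a3,zero" guard: nothing to do for n <= 0.  */
    size_t remaining = n > 0 ? (size_t)n : 0;
    size_t i = 0;

    while (remaining > 0) {
        size_t vl = model_vsetvl(remaining, vlmax);
        /* One "vector" iteration: vl lanes are active.  */
        for (size_t lane = 0; lane < vl; lane++)
            c[i + lane] = a[i + lane] + b[i + lane];
        i += vl;
        remaining -= vl;
    }
}
```

The point of the sketch is that no scalar epilogue is needed: the last iteration simply runs with a smaller `vl`.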
From: Ju-Zhe Zhong
Targets like ARM SVE in GCC have an elegant way to handle both loop control
and flow control simultaneously:
loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)
How
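The mask combination quoted above can be modelled in scalar C. This is only an illustrative sketch: `LANES`, `while_ult` and `masked_copy` are hypothetical names, the vector length is fixed at 4 for the model, and the loop body (`a[i] = b[i]` where `cond[i]` is non-zero) is an assumed example, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical fixed vector length for this model */

/* Model of WHILE_ULT: lane k is active while i + k < n.  */
static void while_ult(size_t i, size_t n, bool mask[LANES])
{
    for (size_t k = 0; k < LANES; k++)
        mask[k] = (i + k) < n;
}

/* Model of: for (i = 0; i < n; i++) if (cond[i]) a[i] = b[i];  */
static void masked_copy(int32_t *a, const int32_t *b, const int32_t *cond,
                        size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && cond != NULL));

    for (size_t i = 0; i < n; i += LANES) {
        bool loop_control_mask[LANES];
        bool flow_control_mask[LANES];
        bool control_mask[LANES];

        while_ult(i, n, loop_control_mask);

        for (size_t k = 0; k < LANES; k++) {
            /* MASK_LOAD (loop_control_mask): inactive lanes read as 0.  */
            int32_t c = loop_control_mask[k] ? cond[i + k] : 0;
            flow_control_mask[k] = c != 0;
            control_mask[k] = loop_control_mask[k] && flow_control_mask[k];
        }

        /* MASK_LOAD and MASK_STORE under control_mask.  */
        for (size_t k = 0; k < LANES; k++)
            if (control_mask[k])
                a[i + k] = b[i + k];
    }
}
```

A single combined mask guards both the tail of the loop and the `if`, so out-of-range lanes are never loaded or stored.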
From: Juzhe-Zhong
Optimize the following auto-vectorization code:
void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
{
for (int i = 0; i < n; i++)
a[i] = b[i] >> c;
}
Before this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32
From: Ju-Zhe Zhong
Targets like ARM SVE in GCC have an elegant way to handle both loop control
and flow control simultaneously:
loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)
How
From: Juzhe-Zhong
To be safe, add a ZVFHMIN autovec block testcase to make sure
we don't enable autovec for ZVFHMIN by mistake.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test.
---
.../gcc.target/riscv/rvv/autovec/zvfhmin-1.c | 34 +
From: Juzhe-Zhong
To be safe, add a ZVFHMIN autovec block testcase to make sure
we don't enable autovec for ZVFHMIN by mistake.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test.
---
.../gcc.target/riscv/rvv/autovec/zvfhmin-1.c | 35 +
From: Juzhe-Zhong
According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
We can enhance VLA SLP auto-vectorization with the decompress operation
(16.5.1. Synthesizing vdecompress).
Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35,
From: Juzhe-Zhong
According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
We can enhance VLA SLP auto-vectorization with the decompress operation
(16.5.1. Synthesizing vdecompress).
Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35,
From: Juzhe-Zhong
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::single_step_npatterns_p): Add
comment.
(shuffle_generic_patterns): Ditto.
(expand_vec_perm_const_1): Ditto.
---
gcc/config/riscv/riscv-v.cc | 7 +++
1 file changed, 7 insertions(+)
diff
From: Juzhe-Zhong
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/slp-10.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-11.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/partial
From: Juzhe-Zhong
Sorry for introducing bugs in the previous VLA SLP patch.
Consider the following permutation:
_85 = VEC_PERM_EXPR <{ 99, 17, ... }, { 11, 80, ... }, { 0, POLY_INT_CST [4,
4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
The correct result should be:
_85 = {
From: Juzhe-Zhong
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/slp-10.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-11.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/partial
From: Juzhe-Zhong
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/slp-10.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-11.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-13.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp
From: Juzhe-Zhong
This patch optimizes the permutation case that is suitable for the
merge approach.
Consider the following case:
typedef int8_t vnx16qi __attribute__((vector_size (16)));
#define MASK_16 0, 17, 2, 19, 4, 21, 6, 23, 8, 25, 10, 27, 12, 29, 14, 31
void __attribute__
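To illustrate why a selector such as MASK_16 suits a merge, the sketch below compares a generic two-input permute with a lane-preserving merge. All function names (`perm2`, `is_merge_pattern`, `merge2`) are hypothetical and the code is plain scalar C, not the RISC-V expander from the patch. The assumption is that a selector qualifies when every index is either `i` (take lane `i` of the first vector) or `i + NUNITS` (take lane `i` of the second).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUNITS 16

/* Reference two-input permute: an index below NUNITS selects from v0,
   otherwise from v1.  */
static void perm2(int8_t *out, const int8_t *v0, const int8_t *v1,
                  const uint8_t *sel)
{
    for (size_t i = 0; i < NUNITS; i++) {
        assert(sel[i] < 2 * NUNITS);
        out[i] = sel[i] < NUNITS ? v0[sel[i]] : v1[sel[i] - NUNITS];
    }
}

/* True when every lane stays in place and only the source vector varies.  */
static bool is_merge_pattern(const uint8_t *sel)
{
    for (size_t i = 0; i < NUNITS; i++)
        if (sel[i] != i && sel[i] != i + NUNITS)
            return false;
    return true;
}

/* Lane-preserving merge: the selector only acts as a per-lane mask.  */
static void merge2(int8_t *out, const int8_t *v0, const int8_t *v1,
                   const uint8_t *sel)
{
    assert(is_merge_pattern(sel));
    for (size_t i = 0; i < NUNITS; i++) {
        bool take_v1 = sel[i] >= NUNITS;
        out[i] = take_v1 ? v1[i] : v0[i];
    }
}
```

For a selector that passes `is_merge_pattern`, `perm2` and `merge2` produce the same vector, so no cross-lane data movement is required.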
From: Ju-Zhe Zhong
According to comments from Richi, split the first patch to add only the ifn && optabs
of LEN_MASK_{LOAD,STORE}; we don't apply them in the vectorizer in this
patch. Also add a BIAS argument for possible future use by s390.
The descriptions of the patterns in the docs come from Robin.
Af
From: Ju-Zhe Zhong
This patch passes bootstrap on X86. OK for trunk?
According to comments from Richi, split the first patch to add only the ifn && optabs
of LEN_MASK_{LOAD,STORE}; we don't apply them in the vectorizer in this
patch. Also add a BIAS argument for possible future use by s390.
The descri
From: Ju-Zhe Zhong
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:
Support for the situation that in "vectorizable_operation":
/* If operating on inactive elements could generate spu
This patch depends on the following vectorizer patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624179.html
With this patch, we can handle operations that may trap on elements outside the loop.
The following 2 cases are addressed by this patch:
1. integer division:
#define
From: Ju-Zhe Zhong
This patch adds an obviously missing mult_high auto-vectorization pattern.
Consider the following case:
#define DEF_LOOP(TYPE) \
void __attribute__ ((noipa))\
mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count)
From: Ju-Zhe Zhong
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:
Support for the situation that in "vectorizable_operation":
/* If operating on inactive elements could generate spu
From: Ju-Zhe Zhong
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:
Support for the situation that in "vectorizable_operation":
/* If operating on inactive elements could generate spu
From: Ju-Zhe Zhong
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:
Support for the situation that in "vectorizable_operation":
/* If operating on inactive elements could generate spu
The middle-end support has been merged:
https://github.com/gcc-mirror/gcc/commit/0d4dd7e07a879d6c07a33edb2799710faa95651e
With this patch, we can handle operations that may trap on elements outside the loop.
The following 2 cases are addressed by this patch:
1. integer division:
#define TEST_TYP
From: Ju-Zhe Zhong
Hi, Richard and Richi.
The previous patch added support for COND_LEN_* binary operations. However, it didn't
support COND_LEN_* ternary operations.
Now, this patch supports COND_LEN_* ternary operations. Consider the following case:
#define TEST_TYPE(TYPE)
From: Ju-Zhe Zhong
Hi, Richard and Richi.
The previous patch added support for COND_LEN_* binary operations. However, it didn't
support COND_LEN_* ternary operations.
Now, this patch supports COND_LEN_* ternary operations. Consider the following case:
#define TEST_TYPE(TYPE)
Enable COND_LEN_FMA auto-vectorization for floating-point FMA
with **NO** -ffast-math.
The middle-end support has been approved, and I will merge it after I
finish bootstrap && regression testing on X86.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624395.html
Now, it's time
Add comments in scatter_store_run-7.c as Robin suggested.
Enable COND_LEN_FMA auto-vectorization for floating-point FMA
with **NO** -ffast-math.
The middle-end support has been approved, and I will merge it after I
finish bootstrap && regression testing on X86.
https://gcc.gnu.org
From: Ju-Zhe Zhong
This patch adds reduc_*_scal to support reduction auto-vectorization.
Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.
Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x,
int32_t n, int res)
{
for (int i =
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch adds the mask_len_fold_left_plus pattern to support in-order
floating-point reduction for targets that support len loop control.
Consider the following case:
double
foo2 (double *__restrict a,
double init,
int *__restrict cond,
int n)
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_option_override): Report ERROR for
TARGET_MIN_VLEN > 4096
---
gcc/config/riscv/riscv.cc | 8
1 file changed, 8 insertions(+)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6ed735d6983..ce523eea9ba 100644
-
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_option_override): Add TARGET_MIN_VLEN <
4096 check.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvl-unimplemented.c: New test.
---
gcc/config/riscv/riscv.cc | 8
.../gcc.target/risc
This patch adds reduc_*_scal to support reduction auto-vectorization.
Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.
Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x,
int32_t n, int res)
{
for (int i = 0; i < n; ++i)
r
Hi, Richard.
The RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc).
They share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc).
When I add those VLS modes, the RTL_SSA initialization in VSETVL PASS
(inserted after RA) hits an ICE:
rvv.c:13:1: internal compiler error: in
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_option_override): Add sorry check.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvl-unimplemented-1.c: New test.
* gcc.target/riscv/rvv/base/zvl-unimplemented-2.c: New test.
---
gcc/config/riscv/riscv.cc
From: Ju-Zhe Zhong
Hi, Richard.
The RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc).
They share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc).
When I add those VLS modes, the RTL_SSA initialization in VSETVL PASS
(inserted after RA) hits an ICE:
rvv.c:13:1: intern
This patch enables SLP unordered reduction auto-vectorization.
Consider the following case:
int __attribute__((noipa))
add_loop (int *x, int n, int res)
{
for (int i = 0; i < n; ++i)
{
res += x[i * 2];
res += x[i * 2 + 1];
}
return res;
}
--param riscv-autovec-prefer
This patch dynamically adjusts the size of VLA vectors according to
TARGET_MIN_VLEN (-march=*zvl*b).
Currently, VNx16QImode is always [16,16] when TARGET_MIN_VLEN >= 128.
We are going to add a bunch of VLS modes (V16QI,V32QI,etc), and these modes
should always be considered as having smaller size
This patch enables SLP unordered reduction auto-vectorization.
Consider the following case:
int __attribute__((noipa))
add_loop (int *x, int n, int res)
{
for (int i = 0; i < n; ++i)
{
res += x[i * 2];
res += x[i * 2 + 1];
}
return res;
}
--param riscv-autovec-prefer
+306,7 @@ register allocation Peter Bergner
register allocation Kenneth Zadeck
register allocation Seongbae Park
riscv port Robin Dapp
+riscv port Juzhe Zhong
RTL optimizers Steven Bosscher
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch supports floating-point in-order reduction for loop length control.
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
When compile
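The reason in-order reduction matters for floating point can be seen with a small scalar experiment. The sketch below is not taken from the patch: `sum_in_order`, `sum_reassociated` and `LANES` are hypothetical names, and the second function merely imitates what a reassociating (unordered) vector reduction with two lanes would compute.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 2 /* hypothetical number of vector lanes in the model */

/* In-order (fold-left) reduction: the order the source loop specifies.  */
static float sum_in_order(const float *a, size_t n, float init)
{
    assert(a != NULL || n == 0);
    float result = init;
    for (size_t i = 0; i < n; i++)
        result += a[i];
    return result;
}

/* Reassociated reduction: each lane keeps a partial sum and the partial
   sums are combined at the end.  This changes the rounding order.  */
static float sum_reassociated(const float *a, size_t n, float init)
{
    assert(a != NULL || n == 0);
    float partial[LANES] = {0.0f, 0.0f};
    for (size_t i = 0; i < n; i++)
        partial[i % LANES] += a[i];

    float result = init;
    for (size_t k = 0; k < LANES; k++)
        result += partial[k];
    return result;
}
```

With the input `{1e8f, 1.0f, -1e8f, 1.0f}` and `init = 0.0f`, and assuming float arithmetic is evaluated in IEEE single precision, the in-order sum loses the first `1.0f` to rounding and returns `1.0f`, while the reassociated sum returns `2.0f`. Since the two orders can disagree, the compiler may only use the reassociated form under -ffast-math, and needs an in-order reduction otherwise.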
This patch depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
Compile with **NO** -ffast-math:
Before thi
From: Ju-Zhe Zhong
Hi, Richard and Richi.
I plan to refine the code that I recently added for RVV auto-vectorization support.
This patch is inspired by the last review comments from Richard:
https://patchwork.sourceware.org/project/gcc/patch/20230712042124.111818-1-juzhe.zh...@rivai.ai/
Richard said he pre
This patch depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
Compile with **NO** -ffast-math:
Before thi
Hi.
Starting from the LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE and COND_LEN_*
patterns, the order of len and mask is {mask,len,bias}.
The reason the "mask" argument comes before "len" is that we want to keep
the "mask" location the same as in mask_* or cond_* patterns to make use of current
code
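For orientation, the {mask,len,bias} argument order can be pictured with a scalar model of a masked, length-controlled load. This is a hedged sketch only: `model_mask_len_load` and its signature are hypothetical, and the real pattern leaves inactive lanes undefined, whereas this model leaves them untouched so the effect is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a load taking (mask, len, bias) in that
   order.  Lane i is loaded only when i < len + bias and mask[i] is set.  */
static void model_mask_len_load(int32_t *dest, const int32_t *src,
                                size_t nunits, const bool *mask,
                                size_t len, int bias)
{
    /* Assumption: bias is either 0 or -1.  */
    assert(bias == 0 || bias == -1);
    ptrdiff_t active = (ptrdiff_t)len + bias;
    assert(active >= 0 && (size_t)active <= nunits);

    for (size_t i = 0; i < nunits; i++)
        if (i < (size_t)active && mask[i])
            dest[i] = src[i];
}
```

Keeping `mask` in the same position as in the mask-only variants means code that already locates the mask operand by index does not need a special case for the length-controlled forms.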
This patch depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625121.html
Hi, Richard and Richi.
This patch aligns the order of mask and len.
Currently, according to this piece of code:
if (final_len && final_mask)
call = gimp
Hi, Richard and Richi.
I have double-checked the recent code for len && mask support again.
In some places the code structure is:
if (len_mask_fn)
...
else if (mask_fn)
...
In other places the code structure is:
if (mask_len_fn)
...
else if (mask)
Based on the previous review comment from Richi:
https://gcc.gnu.org/pip
I noticed mistakes that I made for RISC-V in the last patch.
This patch fixes them. Sorry about that.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_gather_scatter): Remove redundant
variables.
---
gcc/config/riscv/riscv-v.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch supports floating-point in-order reduction for loop length control.
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
When compile
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch supports floating-point in-order reduction for loop length control.
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
When compile
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch supports floating-point in-order reduction for loop length control.
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
When compile
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch supports CALL vectorization by COND_LEN_*.
Consider the following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond,
int n)
{
for (int i = 0; i < n; i++)
if (cond[i])
a[i] = b[i] + a[i];
}
Outpu
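As a reading aid for the conditional example, the sketch below models a conditional, length-controlled add in scalar C. The name `model_cond_len_add` and its signature are hypothetical and not from the patch. It assumes the semantics that a lane computes the sum only when it is inside `len + bias` and its mask bit is set, and otherwise takes the corresponding "else" value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of a conditional, length-controlled add.
   Lane i yields a[i] + b[i] when i < len + bias and mask[i] is set;
   every other lane yields else_val[i].  */
static void model_cond_len_add(float *dest, const bool *mask,
                               const float *a, const float *b,
                               const float *else_val, size_t nunits,
                               size_t len, int bias)
{
    /* Assumption: bias is either 0 or -1.  */
    assert(bias == 0 || bias == -1);
    ptrdiff_t active = (ptrdiff_t)len + bias;
    assert(active >= 0 && (size_t)active <= nunits);

    for (size_t i = 0; i < nunits; i++)
        dest[i] = (i < (size_t)active && mask[i]) ? a[i] + b[i]
                                                  : else_val[i];
}
```

For a loop such as `if (cond[i]) a[i] = b[i] + a[i];`, passing `a` as the else value keeps lanes with a false condition, and lanes beyond the length, unchanged.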
From: Ju-Zhe Zhong
Hi, Richard and Richi.
Based on previous discussions, we should make COND_* and COND_LEN_*
consistent.
So, this patch defines these internal functions together using these 2
wrappers:
#ifndef DEF_INTERNAL_COND_FN
#define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE)
Hi, Richard and Richi.
Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
This patch chooses approach (1) that Richard provided, meaning:
RVV implements cond_* optabs as expanders. RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.
This patch depends on middle-end support:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625696.html
Consider the following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond,
int n)
{
for (int i = 0; i < n; i++)
if (cond[i])
a[i] = b[i] + a[i
Consider the following case:
void
foo (int8_t *in, int8_t *out, int8_t x)
{
for (int i = 0; i < 16; i++)
in[i] = x;
}
Compile option: --param=riscv-autovec-preference=scalable -fno-builtin
Before this patch:
foo:
li a5,16
csrr a4,vlenb
vsetvli a3,zero,e8,m1
Consider the following case:
void
foo (int8_t *in, int8_t *out, int8_t x)
{
for (int i = 0; i < 16; i++)
in[i] = x;
}
Compile option: --param=riscv-autovec-preference=scalable -fno-builtin
Before this patch:
foo:
li a5,16
csrr a4,vlenb
vsetvli a3,zero,e8,m1
This patch is inspired by "lowerCTPOP" in LLVM.
Support popcount auto-vectorization by following the LLVM approach.
https://godbolt.org/z/3K3GzvY7f
Before this patch:
:7:21: missed: couldn't vectorize loop
:8:14: missed: not vectorized: relevant stmt not supported: _5 =
__builtin_popcount (_4);
Aft
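For context, a branch-free popcount built only from shifts, masks, adds and one multiply is the kind of sequence that maps directly onto element-wise vector instructions. The sketch below shows the classic parallel bit-count in scalar C. It is an illustration under the assumption that the lowering follows this well-known scheme; the snippet above does not show the exact sequence the patch emits, and the names `popcount32_bitwise` and `popcount_array` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic parallel bit count for a 32-bit value.  */
static uint32_t popcount32_bitwise(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* add bytes   */
}

/* A loop of this shape is what the vectorizer would process lane by lane.  */
static void popcount_array(int *restrict dst, const uint32_t *restrict src,
                           size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = (int)popcount32_bitwise(src[i]);
}
```

Every step is a plain element-wise operation, so no dedicated vector popcount instruction is required.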
Fix bugs:
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function 'void
riscv_vector::emit_vlmax_masked_fp_mu_insn(unsigned int, int, rtx_def**)':
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:999:54: error: request for
member 'require' in 'riscv_vector::get_mask_mode(dest_mode)'
This patch is inspired by "lowerCTPOP" in LLVM.
Support popcount auto-vectorization by following the LLVM approach.
Before this patch:
:7:21: missed: couldn't vectorize loop
:8:14: missed: not vectorized: relevant stmt not supported: _5 =
__builtin_popcount (_4);
After this patch:
popcount_32:
ble
From: Ju-Zhe Zhong
Hi, Richard and Richi.
Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
This patch chooses approach (1) that Richard provided, meaning:
RVV implements cond_* optabs as expanders. RVV therefore supports
both IFN_COND_ADD an
From: zhongjuzhe
gcc/ChangeLog:
* expr.cc (expand_assignment): Change GET_MODE_PRECISION to
GET_MODE_BITSIZE
---
gcc/expr.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5..ac2b3c07df6 100644
--- a/gcc/expr.cc
+++ b/gcc/
From: zhongjuzhe
Hi, the variable "bitpos" is computed using bitsize. I think it makes
sense to check whether the bit position is out of bounds of the
array using GET_MODE_BITSIZE instead of GET_MODE_PRECISION.
This patch is useful for the RVV (RISC-V 'V') support that I am
going to push upstream. Thanks!
From: Ju-Zhe Zhong
Hi, Richard and Richi.
Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
This patch chooses approach (1) that Richard provided, meaning:
RVV implements cond_* optabs as expanders. RVV therefore supports
both IFN_COND_ADD an
This patch supports VLS modes auto-vectorization to enhance VLA
auto-vectorization when niters is known.
Consider the following case:
#include
#define DEF_OP_VV(PREFIX, NUM, TYPE, OP) \
void __attribute__ ((noinline, noclone))
From: Ju-Zhe Zhong
Hi, Richard and Richi.
Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
This patch chooses approach (1) that Richard provided, meaning:
RVV implements cond_* optabs as expanders. RVV therefore supports
both IFN_COND_ADD an
Consider the following case:
#include
#define TEST2_TYPE(TYPE)\
__attribute__((noipa))\
void vshiftr_##TYPE (TYPE *__restrict dst, TYPE *__restrict a, TYPE
*__restrict b, int n) \
{
After this patch, this following case will be well optimized:
#include "riscv_vector.h"
#define DEF_OP_VV(PREFIX, NUM, TYPE, OP) \
void __attribute__ ((noinline, noclone)) \
PREFIX##_##TYPE##NUM (TYPE *restrict a, TYPE *
#include "riscv_vector.h"
#define DEF_OP_V(PREFIX, NUM, TYPE, OP)\
void __attribute__ ((noinline, noclone)) \
PREFIX##_##TYPE##NUM (TYPE *restrict a, TYPE *restrict b)\
{
This patch enables CONST_VECTOR for VLS modes.
void foo1 (int * __restrict a)
{
for (int i = 0; i < 16; i++)
a[i] = 8;
}
void foo2 (int * __restrict a)
{
for (int i = 0; i < 16; i++)
a[i] = i;
}
Compile option: -O3 --param=riscv-autovec-preference=scalable
Before this patch:
From: Ju-Zhe Zhong
Hi, this patch adds loop len control to extract_last auto-vectorization.
Consider the following case:
#include
#define EXTRACT_LAST(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
test_##TYPE (TYPE *x, int n, TYPE value) \
{
I realized we have a bug in VSETVL PASS which is triggered by strided_load_run-1.c
on RV32 systems.
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/
This patch fixes an ICE: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950
0x1cf8939 expand_const_vector
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1587
PR target/110950
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Add NPATTERNS = 1
stepped vector su
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch adds support for live vectorization by VEC_EXTRACT for LEN loop control.
Consider the following case:
#include
#define EXTRACT_LAST(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
test_##TYPE (TYPE *x, int n, T
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Add missing modes.
---
gcc/config/riscv/vector-iterators.md | 3 +++
1 file changed, 3 insertions(+)
diff --git a/gcc/config/riscv/vector-iterators.md
b/gcc/config/riscv/vector-iterators.md
index 14829989e09..30808ceb241 100644
--- a/g
PR target/110964
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_cond_len_ternop): Add integer ternary.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr110964.c: New test.
---
gcc/config/riscv/riscv-v.cc | 3 +--
.../gcc.target/riscv
This patch fixes the bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110962
SUBROUTINE a(b,c,d)
LOGICAL,DIMENSION(INOUT) :: b
LOGICAL e
REAL, DIMENSION(IN) :: c
REAL, DIMENSION(INOUT) :: d
REAL, DIMENSION(SIZE(c)) :: f
WHERE (b.AND.e)
WHERE (f>=0.)
d = g
ENDWHER
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch adds support for live vectorization by VEC_EXTRACT for LEN loop control.
Consider the following case:
#include
#define EXTRACT_LAST(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
test_##TYPE (TYPE *x, int n, T
This patch adds vec_mask_len_{load_lanes,store_lanes} auto-vectorization
patterns.
Here we want to support the following auto-vectorization:
#include
void
foo (int8_t *__restrict a,
int8_t *__restrict b,
int8_t *__restrict cond,
int n)
{
for (intptr_t i = 0; i < n; ++i)
{
if (con
This patch fixes the bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985
PR target/110985
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_series): Refactor the expander.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c: New test.
---
gc
This patch enables CONST_VECTOR for VLS modes.
void foo1 (int * __restrict a)
{
for (int i = 0; i < 16; i++)
a[i] = 8;
}
void foo2 (int * __restrict a)
{
for (int i = 0; i < 16; i++)
a[i] = i;
}
Compile option: -O3 --param=riscv-autovec-preference=scalable
Before this patch:
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110989
This ICE is caused by the following situation:
mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
...
vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99,
POLY_INT_CST [2, 2], 0);
The MASK_LEN_LOAD is using re
This ICE is caused by the following situation:
mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
...
vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99,
POLY_INT_CST [2, 2], 0);
The MASK_LEN_LOAD is using the real MASK, which is produced by the EQ comparison,
whereas the LEN is the dummy
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch adds support for live vectorization by VEC_EXTRACT for LEN loop control.
Consider the following case:
#include
#define EXTRACT_LAST(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
test_##TYPE (TYPE *x, int n, T
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110994
This is caused by incorrect code for VLS modes in register allocation.
The original case that triggers the ICE is Fortran code, but I can reproduce
it with C code.
PR target/110994
gcc/ChangeLog:
* config/riscv/riscv-
An incorrect configuration of the autovec_length_operand predicate was
discovered in PR110989 in the following situation:
vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99,
POLY_INT_CST [2, 2], 0); ---> dummy length = VF.
The current autovec length operand failed to recogniz
From: Ju-Zhe Zhong
Hi, Richard and Richi.
This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the vectorizer.
Consider this simple case:
void __attribute__ ((noinline, noclone))
foo (int *__restrict a, int *__restrict b, int *__restrict c,
int *__restrict d, int *__restri
Hi, there is a genrecog issue in the RISC-V backend.
This is the ICE info:
0xfa3ba4 poly_int_pod<2u, unsigned short>::to_constant() const
../../../riscv-gcc/gcc/poly-int.h:504
0x28eaa91 recog_5
../../../riscv-gcc/gcc/config/riscv/bitmanip.md:314
0x28ec5b4 recog_7
../../.
This patch depends on middle-end support:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html
This patch allows us to auto-vectorize the following case:
#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \
void __attribute__ ((noinline, noclone))
Hi, Richard and Richi.
This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the vectorizer.
Consider this simple case:
void __attribute__ ((noinline, noclone))
foo (int *__restrict a, int *__restrict b, int *__restrict c,
int *__restrict d, int *__restrict e, int *__restrict
This patch allows us to auto-vectorize the following case:
#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \
void __attribute__ ((noinline, noclone)) \
NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \
Hi, Richard and Richi.
Currently, GCC supports COND_LEN_FMA for floating-point with **NO** -ffast-math.
It's supported in tree-ssa-math-opts.cc. However, GCC fails to support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
Consider the following case:
#define TEST_TYPE(TYPE)
This patch depends on the middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html
We already had COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS patterns.
Remove TARGET_PREFERRED_ELSE_VALUE since it forbids the
COND_LEN_FMS/COND_LEN_FNMS STMT fold.
gcc/ChangeLog:
* con
void foo(_Float16 y, int64_t *i64p)
{
vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}
zve64f:
foo:
vsetivli zero,1,e16,mf4,ta,ma
Thanks, Richi.
I will wait for Richard's comments, address the feedback from both of you,
and then send a V2 patch.
void foo(_Float16 y, int64_t *i64p)
{
vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}
zve64f:
foo:
vsetivli zero,1,e16,mf4,ta,ma
This patch exports 'compute_antinout_edge' and 'compute_earliest' to global
scope; they are going to be used in VSETVL PASS of the RISC-V backend.
The demand fusion is the fusion of VSETVL information to emit VSETVL which
dominate and pre-config for most of the RVV instructions in order to elide redu
I am sorry for sending the wrong, duplicate patch.
Please disregard this patch.