This patch fixes a bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c (internal c
FAIL: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects scan-tree-dump slp2
"unsupported unaligned access"
FAIL: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported unaligned access"
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
XPASS: gcc.dg/ve
This patch fixes an uninitialized probability in GIMPLE IR code tests:
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-o
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
XPASS: gcc.dg/
Fix a bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation
fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal
compiler error: Segmentation fault
Notice there is a failure:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c -O2
scan-assembler-times vsetvli\\s+zero,\\s*zero 2
Change "2" to "3"; the resulting assembly is correct and better.
Committed.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c:
We are planning to enable "vect" testsuite with scalable vector
auto-vectorization.
This case XPASS:
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
like ARM SVE.
---
gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c | 2 +-
1 file changed, 1 insert
This patch fixes a bunch of failures in the "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.
The previous patch (which removed the VLA modes movmisalign pattern) fixed a run-time bug,
but it disabled vectorization of misaligned data movement.
After checking the LLVM code, I found that LLVM supports misaligned access for VLS modes.
Before this patch:
sll a5,a4,0x1
add a5,a5,a1
lhu a3,64(a5)
lbu a5,66(a5)
Fix ICE in "vect" testsuite:
FAIL: gcc.dg/vect/pr64495.c (internal compiler error: in df_uses_record, at
df-scan.cc:2958)
FAIL: gcc.dg/vect/pr64495.c (test for excess errors
After this patch, all currently known VSETVL pass bugs in "vect" are
fixed.
gcc/ChangeLog:
* config/riscv
Like MASK_LOAD_LANES/MASK_STORE_LANES, add the MASK_LEN_ variant.
Bootstrap and regression testing on x86 passed.
Ok for trunk?
gcc/ChangeLog:
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-loop-ivopts.cc (get_m
Add vect_strided and vect_widen to remove the following failures:
FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c scan-tree-dump-times vect
"vectorized
Like ARM SVE, when scalable vectorization is enabled for RVV,
these cases cannot be constant folded yet; this applies to both ARM SVE and RVV.
Ok for trunk?
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr88598-1.c: Add riscv_vector.
* gcc.dg/vect/pr88598-2.c: Ditto.
* gcc.dg/vect/pr88598-3.c
XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects
scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects
scan-tree-dump-times vect "OUTER LOOP
Fix FAILs:
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
Like ARM SVE, add a variable-length xfail for RVV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-reduc-7.c: Add RVV.
---
gcc/testsuite/gcc.dg/vect/slp-reduc-7.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
b/gcc/testsuite/gcc.dg/vec
This patch is the final version of enabling the vect_int tests for RVV.
There are still 80+ FAILs that can't be fixed by adjusting testcases or
target-supports.exp.
Here is the analysis of **ALL** FAILs:
1. REAL highest priority FAILs:
ICE:
FAIL: gcc.dg/vect/vect-live-6.c (internal compiler
SC-V 'V' Extension for GNU compiler.
+ Copyright (C) 2023-2023 Free Software Foundation, Inc.
+ Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms
Since we have added the COST framework, we enable VECT_COMPARE_COSTS by default.
Also, add 16/32/64 to provide more choices for COST comparison.
This patch doesn't change any behavior in the current testsuite since we are
using the default COST model.
gcc/ChangeLog:
* config/riscv/riscv-v.cc
We are going to support dynamic LMUL.
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Add dynamic
enum.
* config/riscv/riscv.opt: Add dynamic compile option.
---
gcc/config/riscv/riscv-opts.h | 4 +++-
gcc/config/riscv/riscv.opt| 3 +++
2
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic
status.
* config/riscv/riscv-v.cc (preferred_simd_mode): Ditto.
(autovectorize_vector_modes): Ditto.
(vectorize_related_mode): Ditto.
---
gcc/config/riscv/riscv-opts.h | 2 +-
This patch supports dynamic LMUL cost modeling with
--param=riscv-autovec-lmul=dynamic.
Consider the following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c,
int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
int32_t *__res
Note that these functions need to be used by the COST model for dynamic LMUL.
Extracted as a separate patch and committed.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export
global.
(get_all_predecessors): New function.
(get_all_successors): Ditto
This patch supports dynamic LMUL cost modeling with
--param=riscv-autovec-lmul=dynamic.
Consider the following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c,
int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
int32_t *__res
This patch fixes the incorrect mode tieable check between DI and V2SI, which causes an ICE
in RA.
PR target/111296
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_modes_tieable_p): Fix bug.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/base/pr111296.C: New test.
---
gcc/config/riscv/r
Previously, I added a TARGET_64BIT condition to block VLS modes with size = 64 bits
on RV32 systems,
e.g. V8QI,
since I realized such modes may cause inferior codegen in some situations on
RV32 systems.
However, this is really quite ugly and it causes ICEs for some cases on RV32:
FAIL: gcc.target/riscv/r
Fix bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111295
PR target/111295
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (insert_vsetvl):
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr111295.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc
This patch fixes the incorrect mode tieable check between DI and V2SI, which causes an ICE
in RA.
PR target/111296
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_modes_tieable_p): Fix incorrect mode
tieable for RVV modes.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/base/pr111296.C:
This patch removes the incorrect earliest poset vsetvl optimization.
The bug was found in vect-double-reduc-5.c, which fails at runtime (execution fail),
and also in PR111313.
For VLMAX intrinsics, we always emit a bogus pattern, vlmax_avl (see
vector.md), to
occupy a scalar register which is used
This patch is not ready but they all will be fixed very soon.
gcc/ChangeLog:
* config/riscv/riscv.opt: Set default as scalable vectorization.
---
gcc/config/riscv/riscv.opt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/
These patterns fix the following ICE FAILs when running the whole GCC testsuite
with scalable vector enabled by default.
All of these FAILs are fixed:
FAIL: c-c++-common/opaque-vector.c -std=c++14 (internal compiler error: in
emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vec
This patch fixes the following FAILs:
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (internal compiler error:
in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O1 (internal comp
This patch fixes an obvious bug: TARGET_MIN_VLEN is a bit size.
All of the following bugs are fixed by this patch:
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (internal compiler error:
in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (test for excess err
This patch fixes more than 100 bogus FAILs caused by the experimental vector ABI warning.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_pass_in_vector_p): Only allow RVV type.
---
gcc/config/riscv/riscv.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/riscv/riscv.c
VLS vfadd should depend on ZVFH instead of ZVFHMIN.
Obvious fix and committed.
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Fix floating-point operations
predicate.
---
gcc/config/riscv/vector-iterators.md | 24
1 file changed, 12 insertions(+), 12 deleti
This patch adds VEC_PERM support for VLS modes, which fixes the following
FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311:
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized
"BIT_INSERT_EX
To keep the dump file from getting too big, add TDF_DETAILS.
This patch fixes the following FAILs in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.*r.vsetvl, -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions comparison
FAI
While debugging FAIL: gcc.dg/pr92301.c execution test,
I realized that a VLS vector permutation failed to vectorize because of an early
"return false":
- /* For constant size indices, we dont't need to handle it here.
- Just leave it to vec_perm. */
- if (d->perm.length ().is_constant ())
-retu
If all elements of a const vector are the same, the slideup is unnecessary.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid
unnecessary slideup.
---
gcc/config/riscv/riscv-v.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/riscv/ri
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid
unnecessary slideup.
---
gcc/config/riscv/riscv-v.cc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index bee60de1d26..3cd1f61de0e
Committed.
gcc/ChangeLog:
* config/riscv/autovec-vls.md (*mov_vls): New pattern.
* config/riscv/vector-iterators.md: New iterator
---
gcc/config/riscv/autovec-vls.md | 8
gcc/config/riscv/vector-iterators.md | 15 +++
2 files changed, 23 insertions(+)
This patch adds VEC_PERM support for VLS modes, which fixes the following
FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311:
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized
"BIT_INSERT_EX
I found that it's more reasonable to use existing dominance analysis.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::global_eliminate_vsetvl_insn): Use dominance analysis.
(pass_vsetvl::init): Ditto.
(pass_vsetvl::done): Ditto.
---
gcc/config/riscv/riscv-vs
I just finished the V2 version of the LMUL cost model.
It turns out we don't need these redundant functions.
Remove them.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (get_all_predecessors): Remove.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
This patch supports dynamic LMUL cost modeling with
--param=riscv-autovec-lmul=dynamic.
Consider the following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c,
int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
int32_t *__res
RVV is a variable-length vector architecture but also has 256-bit VLS mode vectors.
This test is vectorized as:
f:
vsetivli zero,8,e32,m2,ta,ma
vle32.v v2,0(a0)
vmv.v.i v4,1
vle16.v v1,0(a1)
vmseq.vv v0,v2,v4
vsetvli zero,zero,e16,m1,ta,ma
vmse
This patch fixes:
FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects scan-tree-dump-times
vect "loop vectorized" 1
FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1
For RVV, "loop vectorized" appears 2 times instead of 1, because:
optimized: loop vectorized u
Like s390, add riscv to fix the FAIL.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-39.c: Add RISCV.
---
gcc/testsuite/gcc.dg/vect/bb-slp-39.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-39.c
b/gcc/testsuite/gcc.dg/vect/bb-sl
RVV doesn't explicitly enable the SAD optab, but we can vectorize it
since the loop vectorizer is able to recognize the SAD pattern for RVV during analysis.
The current scan check for an explicit SAD pattern looks odd;
it is more reasonable to check recognition of the SAD pattern during loop
vectorizer analysis.
Ot
RVV doesn't explicitly enable the DIV_POW2 optab, but we can vectorize it.
We should check pattern recognition instead of an explicit pattern check.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-sdiv-pow2-1.c: Fix dump check.
---
gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +-
1 file changed,
This test shows "vectorizing stmts using SLP" 4 times instead of 2 for RVV.
The reason is that RVV has 512-bit vectors.
Here is a comparison between RVV and ARM SVE:
https://godbolt.org/z/xc5KE5rPs
But I notice AMDGCN also has 512-bit vectors, so it seems this patch will cause a FAIL
on GCN?
Not sure whether GCN
Previously, in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635392.html
I used vect64 && vect128 to represent both RVV and AMDGCN. However, it caused
additional FAILs on ARM SVE.
I don't know why ARM SVE vect64 is set to true since their AdvSIMD is a 128-bit
vector and they don
As https://godbolt.org/z/hPsqahEa5 shows,
RVV fails the dump check since "vectorizing stmts using SLP" appears 3 times instead
of 2.
The root cause is this code in main:
if (a[0] != 1
|| a[1] != 2
|| a[2] != 3
|| a[3] != 4
|| a[4] != 7
|| a[5] != 0
|| a[6] != 0
PR target/112420
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr65518.c: Fix check for RVV.
---
gcc/testsuite/gcc.dg/vect/pr65518.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/pr65518.c
b/gcc/testsuite/gcc.dg/vect/pr65518.c
index 3
Like all other targets, add RISC-V to vect_cmdline_needed.
This patch fixes the following FAILs:
FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0
loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11c.c scan-tree-dump-times vect "vectorized 0
loops" 1
FAIL: gcc.dg/tree-ssa/gen
This test shows "vectorizing stmts using SLP" 4 times instead of 2 for RVV.
The reason is that RVV has 512-bit vectors.
Here is a comparison between RVV and ARM SVE:
https://godbolt.org/z/xc5KE5rPs
Confirmed that GCN also matches 4 SLP. This patch passes on both GCN and RVV.
Ok for trunk?
gcc/testsuite/Chang
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-sdiv-pow2-1.c: Recover scan check.
* lib/target-supports.exp: Remove riscv.
---
gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +-
gcc/testsuite/lib/target-supports.exp| 4 +---
2 files changed, 2 insertions(+), 4 deletions(-)
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-33.c: Rewrite the condition.
---
gcc/testsuite/gcc.dg/vect/bb-slp-33.c | 35 ---
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
b/gcc/testsuite/gcc.dg/vect/bb-slp-
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr97428.c: Add additional compile option for riscv.
---
gcc/testsuite/gcc.dg/vect/pr97428.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c
b/gcc/testsuite/gcc.dg/vect/pr97428.c
index ad6416096aa..60dd984cfd3
Our user vsetvl intrinsics are defined to just calculate the VL output,
which is the number of elements to be processed. Such intrinsics do not
have any side effects, so we should normalize them when they have the same ratio.
E.g. the __riscv_vsetvl_e8mf8 result is the same as __riscv_vsetvl_e64m1.
Nor
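The "same ratio" condition above can be illustrated with a small sketch. It assumes the RVV definition VLMAX = VLEN * LMUL / SEW, so two vsetvl configurations with the same SEW/LMUL ratio have the same VLMAX. The helper names and the fraction encoding of LMUL are hypothetical and are not taken from the patch.

```c
/* Hypothetical helpers, not GCC code.  LMUL is passed as a fraction
   lmul_num/lmul_den: mf8 is 1/8, m1 is 1/1, m8 is 8/1.  */

/* SEW/LMUL ratio, e.g. e8mf8 -> 8 / (1/8) = 64, e64m1 -> 64 / 1 = 64.  */
static unsigned
sew_lmul_ratio (unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return sew * lmul_den / lmul_num;
}

/* VLMAX = VLEN * LMUL / SEW, which is the same as VLEN / ratio.  */
static unsigned
vlmax (unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return vlen / sew_lmul_ratio (sew, lmul_num, lmul_den);
}
```

For example, with VLEN = 128 both e8mf8 and e64m1 have ratio 64 and therefore VLMAX = 2.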
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
The SELECT_VL result is not necessarily VF in a non-final iteration.
The current GIMPLE IR is wrong:
# vect_vec_iv_.21_25 = PHI <_24(4), { 0, 1, 2, ... }(3)>
...
_24 = vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... };
After this patch which is
When fixing the induction variable vectorization bug, I noticed an ICE
in the VSETVL pass:
0x178015b rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int,
char const*)
../../../../gcc/gcc/rtl.cc:770
0x1079cdd rhs_regno(rtx_def const*)
../../../../gcc/gcc/rtl.h:19
This patch just adapts the dynamic LMUL tests in preparation for the following patches.
Committed.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rv
When trying to use dynamic LMUL to compile benchmarks,
I noticed a bunch of ICEs.
This patch fixes those ICEs and adds tests.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Fix
ICE.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/
The ICE has been fixed by
Richard: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112450.
Add a test to avoid future regressions. Committed.
PR target/112450
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112450.c: New test.
---
.../gcc.target/riscv/rvv/autovec/pr112450.c
Since cond_copysign is now supported in match.pd (middle-end),
we don't need to support conditional copysign through the RTL combine pass.
Instead, we can support it directly with an explicit cond_copysign optab.
Conditional copysign tests are already available in the testsuite,
so there is no need to add tests.
gcc/Chan
Although current GCC doesn't ICE when I create an FP16 vec_init case
with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong,
since the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas vec_init
needs vfslide1down/vfslide1up.
It makes more sense to robustify the vec_init patterns
This patch is a small optimization for vector initialization.
Discovered while evaluating benchmarks.
Consider the following case:
void foo3 (int8_t *out, int8_t x, int8_t y)
{
v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x};
*(v16qi*)out = v;
}
Before this patch:
vse
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
1. Since the SELECT_VL result is not necessarily VF in a non-final iteration,
the current GIMPLE IR is wrong:
# vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
...
_35 = .SELECT_VL (ivtmp_33, VF);
_21 = vect_vec_iv_.8_22 + { VF, ... };
E.
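A scalar simulation may make the problem easier to see. The sketch below is not GCC code: `select_vl` is a hypothetical model of a target that returns fewer than VF elements in a non-final iteration (it balances the last two iterations), and `iv_mismatches` counts how many stored induction values are wrong when the vector IV is stepped by VF rather than by the length SELECT_VL actually returned.

```c
#include <stddef.h>

#define VF 4

/* Hypothetical SELECT_VL model: when between VF and 2*VF elements
   remain, split them across the last two iterations, so a non-final
   iteration may process fewer than VF elements.  */
static size_t
select_vl (size_t remaining)
{
  if (remaining >= 2 * VF)
    return VF;
  if (remaining > VF)
    return (remaining + 1) / 2;
  return remaining;
}

/* Store out[i] = i for i < n (n <= 64) using a vector IV, then count
   mismatches.  If step_by_vf is nonzero the IV is advanced by VF each
   iteration (the wrong IR above); otherwise it is advanced by the
   length returned by select_vl.  */
static size_t
iv_mismatches (size_t n, int step_by_vf)
{
  int out[64];
  int iv[VF];
  size_t done = 0, bad = 0;

  for (size_t k = 0; k < VF; k++)
    iv[k] = (int) k;

  while (done < n)
    {
      size_t len = select_vl (n - done);
      for (size_t k = 0; k < len; k++)
        out[done + k] = iv[k];
      int step = step_by_vf ? VF : (int) len;
      for (size_t k = 0; k < VF; k++)
        iv[k] += step;
      done += len;
    }

  for (size_t i = 0; i < n; i++)
    if (out[i] != (int) i)
      bad++;
  return bad;
}
```

With n = 6 this model processes 3 elements in the first iteration, so stepping the IV by VF = 4 makes the second iteration store 4, 5, 6 instead of 3, 4, 5; stepping by the selected length stores the expected values.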
See PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112469,
which has been fixed by Richard's patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635994.html
Add tests to avoid regressions. Committed.
PR target/112469
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autov
I realized that the init tests are wrong because of my previous mistakes.
Fixed them and committed.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Fix init test.
* gcc.target/riscv/rvv/autovec/vls/init-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/init-2.c: Ditto.
I noticed that the assembly check of init-2.c is wrong.
Committed.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/init-2.c: Fix vid.v check.
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/init-2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuit
Strided load/store has been approved.
Rebased on V3 and adapted for the middle-end IR change.
Will commit after the middle-end patch is approved.
gcc/ChangeLog:
* config/riscv/autovec.md (mask_len_strided_load_): New pattern.
(mask_len_strided_store_): Ditto.
* config/riscv/riscv-
This patch adds mask_len_strided_load/mask_len_strided_store.
The documentation has already been reviewed.
This patch adds OPTAB/IFN support as follows:
1. strided load
GIMPLE level:
v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
is expanded (by internal-fn.cc) into:
v = mask_len_strided_load
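As a rough scalar reading of the MASK_LEN_STRIDED_LOAD call above, the sketch below loads element i from ptr + i * stride for every i < len + bias whose mask bit is set. Several details are assumptions made for illustration only and are not confirmed by this snippet: the stride is taken to be in bytes, elements are int32_t, and inactive elements are left untouched in the destination. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model, not GCC code.  Assumes a byte stride and
   int32_t elements; inactive lanes keep their previous dest value.  */
static void
mask_len_strided_load_model (int32_t *dest, const void *ptr,
                             ptrdiff_t stride, const unsigned char *mask,
                             size_t len, ptrdiff_t bias)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);

  for (size_t i = 0; i < active; i++)
    if (mask[i])
      memcpy (&dest[i], (const char *) ptr + (ptrdiff_t) i * stride,
              sizeof (int32_t));
}
```

Under these assumptions, a stride of 2 * sizeof (int32_t) picks every second element of an int32_t array.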
This patch supports generating MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE IR
for invariant-stride memory accesses.
It's a special optimization for targets like RVV.
RVV has both indexed load/store and strided load/store.
In RVV, we always have the gather/scatter and strided optabs at the same time.
E.
This is a quite obvious patch which disallows using a register
with RVV mode as a load/store address.
PR target/112535
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimate_address_p): Disallow RVV
modes base address.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autove
This patch fixes ICE:
https://godbolt.org/z/z8T6o6qov
: In function 'b':
:2:6: error: missing definition
2 | void b() {
| ^
for SSA_NAME: loop_len_8 in statement:
_1 = -loop_len_8;
during GIMPLE pass: vect
:2:6: internal compiler error: verify_ssa failed
0x7f1b56331082 __libc_start_
When evaluating dynamic LMUL, I noticed we can do better on VLA SLP with duplicate
VLA shuffle indices.
Consider the following case:
void
foo (uint16_t *restrict a, uint16_t *restrict b, int n)
{
for (int i = 0; i < n; ++i)
{
a[i * 8] = b[i * 8 + 3] + 1;
a[i * 8 + 1] = b[i * 8 + 6
Fix segmentation fault on tuple move:
bbl loader
z ra 000102ac sp 003ffaf0 gp 0001c0b8
tp t0 000104a0 t1 000f t2
s0 s1 a0 003ffb30 a1 003ffb58
a2 000
This patch refactors RVV iterators for easier maintenance.
E.g.
(define_mode_iterator V [
RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN
> 32")
RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
(RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVV
PR target/112561
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_tuple_move): Fix bug.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112561.c: New test.
---
gcc/config/riscv/riscv-v.cc | 4
.../gcc.target/riscv/rvv/autovec/pr11
This optimization was discovered while working on the tuple move split bug fix patch.
Before this patch:
vsetivli zero,4,e16,mf2,ta,ma
lhu a3,96(a5)
vlseg8e16.v v1,(a5)
lw a4,%lo(e)(a2)
vsetvli a6,zero,e64,m2,ta,ma
addi a0,a7,8
vse16.v v
The slide1 attributes are wrong for SEW=64 VLS modes.
This patch fixes the following FAILs:
FAIL: gcc.c-torture/execute/pr89369.c -O2 execution test
FAIL: gcc.c-torture/execute/pr89369.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr8936
Since scalable vectorization is already enabled by default, this flag is redundant.
Also, we are starting full coverage testing with different compile options,
e.g. --param=riscv-autovec-preference=fixed-vlmax.
Remove it to avoid compile option confusion.
gcc/testsuite/ChangeLog:
* lib/target
This bug was discovered on PR112597, with -march=rv32gcv_zvl256b
--param=riscv-autovec-preference=fixed-vlmax
ICE:
bug.c:10:1: error: unrecognizable insn:
10 | }
| ^
(insn 10 9 11 2 (set (reg:V4SI 140)
(unspec:V4SI [
(unspec:V4BI [
(const_v
The current test value check is incorrect, which is exposed with
-march=rv64gcv_zvl256b.
Confirmed that it also aborts on x86:
[jzzhong@rios-cad121:/work/home/jzzhong/work/insn]$./a.out
--33.00,4078.00,45001776.00,63369904.00---
Aborted (core dumped)
Adapt the value check according to X86 r
This bug is exposed when testing on a zvl512b RV32 system.
The root cause is that RA reloads a DI CONST_VECTOR into vmv.v.x, which then ICEs.
So disallow DI CONST_VECTOR on RV32.
PR target/112598
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Disallow DI CONST_VECTOR
on RV32.
gc
The dump check is missing; add it.
Committed as it is obvious.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112438.c: Add missing dump check.
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/gcc/testsuite/g
This patch fixes the following FAILs on zvl512b:
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/
This patch fixes the following FAILs on zvl512b RV32 systems:
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c execution test
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c execution test
The root cause is that permutation indices = {0,3,7,0} use the vcompress
optimizat
This is an NFC patch to refine some unreasonable code I noticed.
Tested on zvl128b/zvl256b/zvl512b/zvl1024b no regression.
Committed.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Refine codes.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(modulo_sel_ind
This patch fixes the following FAILs on zvl1024b for both RV32 and RV64:
FAIL: gcc.c-torture/execute/990128-1.c -O2 execution test
FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none execution test
FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fuse-link
This patch fixes the following FAILs on zvl1024b for both RV32 and RV64:
FAIL: gcc.c-torture/execute/990128-1.c -O2 execution test
FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none execution test
FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fuse-li
Add a wrapper for vec_extract since my following patch will need to call it.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (emit_vec_extract): New function.
* config/riscv/riscv-v.cc (emit_vec_extract): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_move): Refine codes.
---
While fixing zvl1024b bugs, I noticed a special VLA SLP case
that can be better optimized:
v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... })
Before this patch, we use the generic approach (vrgather):
vid
vadd.vx
vrgather
vmsgeu
vrgather
With this patch, we use vec_extrac
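To see why this selector is a special case, note that { nunits - 1, nunits, nunits + 1, ... } picks the last element of op1 followed by the first nunits - 1 elements of op2. The scalar sketch below compares a generic two-input permute with that shortcut. The snippet is truncated after "vec_extrac", so pairing the extract with a slide-up by one is an assumption here, and both function names are hypothetical.

```c
#include <stddef.h>

/* Generic two-input permute model: an index below n selects from op1,
   otherwise from op2.  */
static void
vec_perm_model (int *dst, const int *op1, const int *op2,
                const size_t *sel, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = sel[i] < n ? op1[sel[i]] : op2[sel[i] - n];
}

/* Shortcut for sel = { n - 1, n, n + 1, ... }: take the last element
   of op1, then slide op2 up by one position and insert it at lane 0.  */
static void
extract_last_and_slide1up (int *dst, const int *op1, const int *op2,
                           size_t n)
{
  int last = op1[n - 1];

  for (size_t i = n - 1; i > 0; i--)
    dst[i] = op2[i - 1];
  dst[0] = last;
}
```

Both functions produce the same result for this selector; the second needs no index vector and no gather.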