This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret
Trunk GCC:
vsetvli a5,zero,e8,mf2,ta,ma
li a4,-32768
vid.v v1
Update in v2: Add dynamic lmul test.
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret
Trunk GCC:
vsetvli a5,zero,e8,mf2,ta,ma
li
This patch fixes a -70% performance drop from GCC-13.2 to GCC-14 with
-march=rv64gcv on real hardware.
The root cause is that an incorrect cost model causes inefficient vectorization,
which makes performance drop significantly.
So this patch does:
1. Adjust vector to scalar cost by introducing v to sca
This patch fixes the following FAILs:
Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/pr68532.c -O0 execution test
FAIL: gcc.c-torture/execute/pr68532.c -O1 execution test
FAIL: gcc.c-torture/execut
Add more dump checks to robustify the tests.
Committed.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: Add dump check.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: Ditto.
* gcc.target/r
Rebase in v3: Rebase to the trunk and commit it as it's approved by Robin.
Update in v2: Add dynamic lmul test.
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
As PR113404 mentioned: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113404
We have an ICE when we enable RVV in big-endian mode:
during RTL pass: expand
a-float-point-dynamic-frm-66.i:2:14: internal compiler error: in to_constant,
at poly-int.h:588
0xab4c2c poly_int<2u, unsigned short>::to_constant
Recently noticed there is an XPASS on RISC-V:
XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2
"vector operands from scalars"
XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not slp2 "vector operands from
scalars"
And checked both ARM SVE and RVV:
https://godbolt.org/
Noticed a regression recently:
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects
scan-tree-dump-times slp2 "optimized: basic block" 2
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized:
basic block" 2
Checked on both ARM SVE and RVV:
https://godbo
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-43.c: Add vect128.
---
gcc/testsuite/gcc.dg/vect/bb-slp-43.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-43.c
b/gcc/testsuite/gcc.dg/vect/bb-slp-43.c
index dad2d24262d..8aedb06bf72 100
This patch fixes a SPEC2017 cam4 mismatch issue caused by a missing
compatibility check
for conflicting vsetvl fusion.
Buggy assembler before this patch:
.L69:
vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8
V3: Rebase to trunk and commit it.
This patch fixes a SPEC2017 cam4 mismatch issue caused by a missing
compatibility check
for conflicting vsetvl fusion.
Buggy assembler before this patch:
.L69:
vsetvli a5,s1,e8,mf4,ta,ma -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128.
---
gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3
While running various benchmarks, I noticed we are missing vi variant support
for integer comparison.
That is, we can vectorize code into vadd.vi, but we can't vectorize into
vmseq.vi.
Consider the following case:
void
foo (int n, int **__restrict a)
{
int b;
int c;
int d;
for (b = 0; b < n; b+
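The preview cuts the test off above. As a rough sketch of the kind of loop
described (a hypothetical function, not the original test), a comparison
against a small immediate should be able to use vmseq.vi directly instead of
materializing the constant with li plus vmseq.vx:
/* Hypothetical example: the == 5 comparison is a candidate for vmseq.vi.  */
void
cmp_imm (int n, int *restrict a, int *restrict r)
{
  for (int i = 0; i < n; i++)
    r[i] = (a[i] == 5) ? 1 : 0;
}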
This patch fixes a memory hog found in the SPEC2017 wrf benchmark, caused by
RVV_VLMAX: RVV_VLMAX generates a brand-new rtx via gen_rtx_REG (Pmode,
X0_REGNUM)
every time we call it, that is, we are always generating garbage and
redundant
(reg:DI 0 zero) rtx.
After this patch fix, the memo
../../gcc/config/riscv/riscv.cc: In function 'void
riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)':
../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl'
[-Werror=unused-parameter]
4879 | tree fndecl,
|
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of
demanding sew_lmul.
But my previous typo made the VSETVL PASS fail to honor the RISC-V V spec.
Consider the following simple case:
int foo4 (void * in, void * out)
{
vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
v = __r
Noticed that on an AI benchmark, GCC has a 3% performance drop vs Clang.
It's because Clang/LLVM has a simplification that transforms vmv.v.x (avl = 1)
into vmv.s.x.
Since vmv.s.x has a more flexible vsetvl demand than vmv.v.x, it allows us
better chances to fuse vsetvl.
Consider this followi
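The quoted case is truncated above. As a rough sketch of the kind of code
affected (a hypothetical example using standard RVV intrinsics, not the
original benchmark):
#include <stdint.h>
#include "riscv_vector.h"

void
splat_one (int32_t x, int32_t *out)
{
  /* vmv.v.x with avl = 1 only writes element 0, so it can be treated
     like vmv.s.x, whose weaker vsetvl demand eases fusion.  */
  vint32m1_t v = __riscv_vmv_v_x_i32m1 (x, 1);
  __riscv_vse32_v_i32m1 (out, v, 1);
}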
This patch fixes the recent regression:
FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in
reg_or_sub
The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL PASS;
that is, the VSETVL PASS consumes over 33 GB of memory, which makes it impossible
to compile SPEC 2017 wrf on a laptop.
The root cause is these memory-wasting variables:
unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap
This patch adds a no-fusion compile option to disable phase 2 global fusion.
It helps us analyze compile time and debug.
Committed.
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum vsetvl_strategy_enum): Add
optim-no-fusion option.
* config/riscv/riscv-vsetvl.cc (pa
Noticed that full availability is computed every round of earliest fusion,
which is redundant.
Actually, we only need to compute it once in phase 3.
It's an NFC patch, tested with no regressions. Committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_vsetvl_def_data):
Remove redu
While looking into PR113469, I noticed LCM deletes a vsetvl incorrectly.
This patch adds dump information for all predecessors of the LCM-deleted vsetvl
block, for better debugging.
Tested with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): New function.
This patch fixes the recently noticed bug in RV32 glibc.
We incorrectly deleted a vsetvl:
...
and a4,a4,a3
vmv.v.i v1,0 ---> Missing vsetvl causes an illegal
instruction report.
vse8.v v1,0(a5)
The root cause is that the LATERIN computed by LCM is incorrect.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
Refine some code.
(pre_vsetvl::emit_vsetvl): Ditto.
---
gcc/config/riscv/riscv-vsetvl.cc | 69 +---
1 file changed, 27 insertions(+), 42 deletions(-)
diff --git a
The compile-time issue was discovered in SPEC 2017 wrf:
Use time and -ftime-report to analyze the profile data of SPEC 2017 wrf
compilation.
Before this patch (Lazy vsetvl):
scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%)
machine dep reorg
Due to recent middle-end loop vectorizer changes, these tests regressed, and
the changes are reasonable. Adapt the tests to fix the regressions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/shift-rv
This patch targets GCC-15.
Consider the following case:
unsigned int
single_loop_with_if_condition (unsigned int *restrict a, unsigned int *restrict
b,
unsigned int *restrict c, unsigned int loop_size)
{
unsigned int result = 0;
for (unsigned int i = 0; i < lo
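The body is truncated above. A plausible completion consistent with the
function's name and signature (this exact body is a guess, not the original)
is a conditional reduction:
unsigned int
single_loop_with_if_condition (unsigned int *restrict a, unsigned int *restrict b,
                               unsigned int *restrict c, unsigned int loop_size)
{
  unsigned int result = 0;
  for (unsigned int i = 0; i < loop_size; i++)
    if (a[i] > b[i])      /* hypothetical condition */
      result += c[i];
  return result;
}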
Hi, before this patch, a simple conversion case for RVV codegen:
foo:
ble a2,zero,.L8
addiw a5,a2,-1
li a4,6
bleu a5,a4,.L6
srliw a3,a2,3
slli a3,a3,3
add a3,a3,a0
mv a5,a0
mv a4,a1
vse
This patch fixes PR11153:
ble a1,zero,.L8
addiw a5,a1,-1
li a4,4
addi sp,sp,-16
mv a2,a0
sext.w a3,a1
bleu a5,a4,.L9
srliw a4,a3,2
slli a4,a4,4
mv a5,a0
add a4,a4,a0
After a recent RVV cost model tweak, I found this PR issue has been fixed.
Added a testcase and committed.
PR target/112387
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: New test.
---
.../vect/costmodel/riscv/rvv/pr112387.c | 19 +++
1
Following Richard's suggestions, we should not model address cost in the loop
vectorizer for select_vl or decrement IV, since other vectorization styles don't
do that,
to make the cost model comparison apples to apples.
This patch sets the COST from 2 to 1, which turns out to have better codegen
in various codegen
Since middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640595.html
will change vectorization code.
Adapt tests for this patch.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Adapt test.
---
gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr11298
Noticed that the current generic vector cost model makes PR112387 fail to
vectorize.
Adapt it to match the ARM SVE generic vector cost model, which fixes it.
Committed as an obvious fix.
PR target/112387
gcc/ChangeLog:
* config/riscv/riscv.cc: Adapt generic cost model to match ARM SVE.
gcc/
This patch fixes the following FAILs in "full coverage" testing:
Running target
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c -flto -ffat-lto-objects execution
Due to recent middle-end cost model changes, we can now do more VLA SLP.
Fix the following regressions:
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvand
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvand
XPASS: gcc.target/riscv/rvv/autovec/parti
This patch fixes 12 ICEs in "full coverage" testing:
Running target
riscv-sim/-march=rv64gc_zve32f/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/torture/pr96513.c -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftr
After recent fixes, almost all real FAILs in RV64 full coverage testing are
fixed.
So, it's reasonable to start RV32 vect testing now.
We will enable RV32 full coverage testing soon to see what else needs to be
fixed.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Enable
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add RV32.
---
gcc/testsuite/lib/target-supports.exp | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index bd38d72562d..370df10978d
For 'wv' instructions, e.g. vwadd.wv vd,vs2,vs1:
vs2 has the same EEW as vd.
vs1 has a smaller EEW than vd.
So, vs2 can overlap with vd, but vs1 can only overlap the highest-numbered part
of vd
when the LMUL of vs1 is greater than 1.
We already support overlap for vs1 LMUL >= 1.
But I forgot vs1 LMUL < 1, vs2 can
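To illustrate the operand EEWs involved, here is a hypothetical intrinsics
example (not from the patch itself):
#include <stddef.h>
#include "riscv_vector.h"

/* vwadd.wv vd,vs2,vs1: vs2 and vd both have EEW=32 here, so they may
   overlap; vs1 is the narrow EEW=16 source whose overlap with vd is
   restricted as described above.  */
vint32m2_t
widen_add (vint32m2_t vs2, vint16m1_t vs1, size_t vl)
{
  return __riscv_vwadd_wv_i32m2 (vs2, vs1, vl);
}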
Since
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2e7abd09621a4401d44f4513adf126bce4b4828b
we only allow VLSmodes with size <= TARGET_MIN_VLEN * TARGET_MAX_LMUL.
So with -march=rv64gcv and the default LMUL = 1 (rv64gcv implies a minimum
VLEN of 128), we don't have VLS modes of 256/512/1024-bit vectors.
Disable them in the vect tests, which fixes the
Fix this FAIL:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c scan-tree-dump-times
vect "Maximum lmul = 2" 1
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c | 2 +-
1
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Use builder.inner_mode
().
---
gcc/config/riscv/riscv-v.cc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index d1eb7a0a9a5..486f5deb296 1
Hi, this patch fixes the following regression FAILs on RVV:
XPASS: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects scan-tree-dump-not slp2
"vector operands from scalars"
XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not sl
Due to recent VLSmode changes (the change fixing an ICE and a run-time FAIL),
the dump check is the same as for ARM SVE now. So adapt the test for RISC-V.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-cond-1.c: Adapt for RISC-V.
---
gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++--
1 file changed, 2
This patch fixes bugs in the fusion of the following case:
li a5,-1
vmv.s.x v0,a5 -> demand any non-zero AVL
vsetvli a5, ...
Incorrect fusion after VSETVL PASS:
li a5,-1
vsetvli a5...
vmv.s.x v0, a5 --> a5 is modified as incorrect value.
We disallow this incorrect fusion above.
Full coverage
While trying to fix bugs in PR113097, I noticed the following situation where
we generate a redundant vsetvli:
_255 = SELECT_VL (3, POLY_INT_CST [4, 4]);
COND_LEN (..., _255)
Before this patch:
vsetivli a5, 3...
...
vadd.vv (use a5)
After this patch:
...
vadd.vv (use AVL = 3)
The reason we can do this i
This patch fixes the following ICE in full coverage testing of RV32.
Running target
riscv-sim/-march=rv32gc_zve32f/-mabi=ilp32d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic
FAIL: gcc.c-torture/compile/930120-1.c -O2 (internal compiler error: in
emit_move_insn, at expr.cc:4606)
FAIL: gcc.c-t
When working on evaluating x264 performance, I noticed the best LMUL for such
a case with -march=rv64gcv is LMUL = 2.
LMUL = 1:
x264_pixel_8x8:
add a4,a1,a2
addi a6,a0,16
vsetivli zero,4,e8,mf4,ta,ma
add a5,a4,a2
vle8.v v12,0(a6)
vle
Consider the following case:
foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi sp,sp,-128
addi a2,a2,%lo(.LANCHOR0)
mv a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v v8
vs8r.v v8,0(sp) ---> spill
.L
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Add one more ASM check.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
b/gcc/testsuite/
Currently, we compute RVV V_REGS liveness during better_main_loop_than_p, which
is not an appropriate
time to do that since, for example, when the code will finally pick an LMUL = 8
vectorization
factor, we compute liveness for LMUL = 8 multiple times, which is redundant.
Since we have leverage
Tweak some code in the dynamic LMUL cost model to make the computation more
predictable and accurate.
Tested on both RV32 and RV64 with no regressions.
Committed.
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Tweak
LMUL estimation.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Fix typo.
---
.../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.
Notice we have the following situation:
vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since
VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,f
Consider the following case:
int f[12][100];
void bad1(int v1, int v2)
{
for (int r = 0; r < 100; r += 4)
{
int i = r + 1;
f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]);
f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]);
f[0][r+2] = f[1][r+2] * (f[2][r+2]) -
Noticed that the current dynamic LMUL is not accurate for conversion code.
Refine it; with this change a current case goes from choosing LMUL = 4 to
LMUL = 8.
Tested with no regressions, committed.
Before this patch (LMUL = 4): After this patch (LMUL = 8):
lw a7,56(sp)
This patch fixes the following case of choosing an unexpectedly big LMUL, which
causes register spilling.
Before this patch, choosing LMUL = 4:
addi sp,sp,-160
addiw t1,a2,-1
li a5,7
bleu t1,a5,.L16
vsetivli zero,8,e64,m4,ta,ma
vmv.v.x v4,a0
The redundant dump check is fragile, easily changed, and not necessary.
Tested on both RV32/RV64 with no regressions.
Removed it and committed.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr
Committed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc: Move STMT_VINFO_TYPE (...) to
local.
---
gcc/config/riscv/riscv-vector-costs.cc | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/gcc/config/riscv/riscv-vector-costs.cc
b/gcc/config/riscv/riscv-
This patch fixes the following situation:
vl4re16.v v12,0(a5)
...
vl4re16.v v16,0(a3)
vs4r.v v12,0(a5)
...
vl4re16.v v4,0(a0)
vs4r.v v16,0(a3)
...
vsetvli a3,zero,e16,m4,ta,ma
...
vmv.v.x v8,t6
vmsgeu.vv v2,v16,v8
vsub.vv v16,v16,v8
vs4r.v v16,0(a5)
...
vs4r.v v4,0(a0)
v
In
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d1eacedc6d9ba9f5522f2c8d49ccfdf7939ad72d
I optimized the COND_LEN_xxx pattern with dummy len and dummy mask with a
too-simple solution, which
causes a redundant vsetvli in the following case:
vsetvli a5,a2,e8,m1,ta,ma
vle32.v v8,0(a0)
As PR113206 shows, the bug happens in the following situation:
li a4,32
...
vsetvli zero,a4,e8,m8,ta,ma
...
slliw a4,a3,24
sraiw a4,a4,24
bge a3,a1,.L8
sb a4,%lo(e)(a0)
vsetvli zero,a4,e8,m8,ta,ma --> a4 is pollu
As PR113206 and PR113209 show, the bug happens in the following situation:
li a4,32
...
vsetvli zero,a4,e8,m8,ta,ma
...
slliw a4,a3,24
sraiw a4,a4,24
bge a3,a1,.L8
sb a4,%lo(e)(a0)
vsetvli zero,a4,e8,m8,ta,ma --
Fix the indentation of some code to be 8-space aligned.
Committed.
gcc/ChangeLog:
* config/riscv/vector.md: Fix indent.
---
gcc/config/riscv/vector.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
in
Noticed a case with "Maximum lmul = 16", which is incorrect.
Correct LMUL estimation for MASK_LEN_LOAD/MASK_LEN_STORE.
Committed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (variable_vectorized_p): New
function.
(compute_nregs_for_mode): Refine LMUL.
(max_number_of
Consider the following case:
void
f (int *restrict a, int *restrict b, int *restrict c, int *restrict d, int n)
{
for (int i = 0; i < n; i++)
{
int tmp = b[i] + 15;
int tmp2 = tmp + b[i];
c[i] = tmp2 + b[i];
d[i] = tmp + tmp2 + b[i];
}
}
Current dynamic LMUL cos
1) We not only have vashl_optab, vashr_optab, and vlshr_optab, which vectorize
shifts with a vector shift amount,
that is, vectorization of 'a[i] >> x[i]' where the shift amount is loop-variant.
2) But we also have ashl_optab, ashr_optab, and lshr_optab, which can vectorize
shifts with a scalar shift amount,
that is
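A minimal sketch contrasting the two cases (hypothetical functions):
/* 1) Vector shift amount (vashr_optab): the amount x[i] varies per
      iteration, so it vectorizes to vsra.vv.  */
void
shift_variant (int n, int *restrict r, int *restrict a, int *restrict x)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] >> x[i];
}

/* 2) Scalar shift amount (ashr_optab): x is loop-invariant, so it can
      vectorize to vsra.vx.  */
void
shift_invariant (int n, int *restrict r, int *restrict a, int x)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] >> x;
}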
While working on fixing a bug, I noticed the following code has a redundant move:
#include "riscv_vector.h"
void
f (float x, float y, void *out)
{
float f[4] = { x, x, x, y };
vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4);
__riscv_vse32_v_f32m1 (out, v, 4);
}
Before this patch:
f:
vs
V2: Address comments from Robin.
While working on fixing a bug, I noticed the following code has a redundant move:
#include "riscv_vector.h"
void
f (float x, float y, void *out)
{
float f[4] = { x, x, x, y };
vfloat32m1_t v = __riscv_vle32_v_f32m1 (f, 4);
__riscv_vse32_v_f32m1 (out, v, 4);
}
This patch fixes a bug in the VSETVL PASS in the following situation:
Ignore curr info since prev info available with it:
prev_info: VALID (insn 8, bb 2)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=16, VLMUL=mf4, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, M
Obvious fix, committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc: Replace std::max by MAX.
---
gcc/config/riscv/riscv-vsetvl.cc | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7d748edc
As Robin suggested, remove the gimple_uid check, which is sufficient for our
needs.
Tested on both RV32/RV64 with no regressions. OK for trunk?
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop
invariant check.
---
gcc/config/riscv/riscv-vector-costs.cc | 2 +-
We have supported segment load/store intrinsics.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-functions.def (vleff): Move
comments to real place.
(vcreate): Ditto.
---
gcc/config/riscv/riscv-vector-builtins-functions.def | 4 +---
1 file chan
We have supported segment load/store intrinsics.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-functions.def (vleff): Move
comments.
(vundefined): Ditto.
---
gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++--
1 file changed, 2 inse
While working on overlap for widening instructions, I realized that we set
vwadd.wx/vfwadd.wf as earlyclobber, which is incorrect,
since according to the RVV ISA:
"The destination EEW equals the source EEW."
For vwadd.wx/vfwadd.wf both the source vector and dest vector operands have the
same EEW.
So, they should
While working on overlap for widening instructions, I realized that we set
vwadd.wx/vfwadd.wf as earlyclobber, which is incorrect,
since according to the RVV ISA:
"The destination EEW equals the source EEW."
vwadd.vx widens the first source operand (i.e. 2 * source EEW = dest EEW) while
vwadd.wx only w
This patch fixes 2 regressions (one is a bug regression, the other a performance
regression).
Both regressions come from comparing the ratio for the same AVL in the wrong
place.
1. BUG regression:
avl_single-84.c:
f0:
li a5,999424
add a1,a1,a5
li a4,299008
This patch leverages the same approach as vwcvt.
Before this patch:
.L5:
add a3,s0,s1
add a4,s6,s1
add a5,s7,s1
vsetvli zero,s0,e32,m4,ta,ma
vle32.v v16,0(s1)
vle32.v v12,0(a3)
mv s1,s2
vle32.v v8,0(a4)
vle32
Leverage the previous approach.
Before this patch:
.L5:
add a3,s0,s2
add a4,s6,s2
add a5,s7,s2
vsetvli zero,s0,e64,m8,ta,ma
vle8.v v4,0(s2)
vle8.v v3,0(a3)
mv s2,s1
vle8.v v2,0(a4)
vle8.v v1,0(a5)
nop
Background:
For RVV ISA vx instructions, for example vadd.vx,
when EEW = 64 on RV32 we can't directly use vadd.vx.
Instead, we need to use:
sw
sw
vlse
vadd.vv
However, in some special situations we can still use
vadd.vx directly for EEW = 64 && RV32,
that is, when the scalar is a known
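For instance (a hypothetical intrinsics example, not from the patch), a small
constant avoids the stack-based sequence:
#include <stddef.h>
#include "riscv_vector.h"

/* On RV32 a general int64_t scalar needs two GPRs, hence the
   sw/sw + vlse + vadd.vv sequence; a constant like 5 that is known to
   sign-extend from 32 bits can still emit vadd.vx directly.  */
vint64m1_t
add_small_imm (vint64m1_t v, size_t vl)
{
  return __riscv_vadd_vx_i64m1 (v, 5, vl);
}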
Committed as an obvious fix.
gcc/ChangeLog:
* config/riscv/riscv.md: Robustify the constraints.
---
gcc/config/riscv/riscv.md | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4c6f63677df
In a serious high-register-pressure case (appended in this patch):
We see vluxei8.v v0,(s1),v1,v0.t, which is not allowed,
since according to the RVV ISA:
+;; The destination vector register group for a masked vector instruction
cannot overlap the source mask register (v0),
+;; unless the destina
Since the destination of a reduction is not a vector register group, there
is no need to apply the overlap constraint.
Also confirmed with Clang:
the MIR in LLVM has earlyclobber:
early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed %17:vr,
%48:gpr, %0:gprnox0, 3, 0; example.c:59:24
The mi
Consider this example:
#include "riscv_vector.h"
void
foo6 (void *in, void *out)
{
vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4);
vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1);
vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4 (high_eew64);
vint32m4_t high
This patch fixes an ICE exposed in full coverage testing:
=== g++: Unexpected fails for
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=dynamic ===
FAIL: g++.dg/pr106219.C -std=gnu++14 (internal compiler error: in require, at
machmode.h:313)
FAIL: g++