Re: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5
Thanks for reporting it.

I think we may need to change it into:

+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_strided5 && vect_load_lanes } } } */

Could you verify whether it works for you?

Thanks.

juzhe.zh...@rivai.ai

From: Andrew Stubbs
Date: 2023-10-06 22:29
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com; richard.sandif...@arm.com
Subject: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

On 15/09/2023 10:16, Juzhe-Zhong wrote:
> This test failed in RISC-V:
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4
>
> Because this loop:
>
>   /* SLP with unrolling by 8. */
>   for (i = 0; i < N; i++)
>     {
>       out[i*5] = 8;
>       out[i*5 + 1] = 7;
>       out[i*5 + 2] = 81;
>       out[i*5 + 3] = 28;
>       out[i*5 + 4] = 18;
>     }
>
> is using vect_load_lanes with array size = 5
> instead of SLP.
>
> When we adjust the COST of LANES load store, then it will use SLP.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/slp-1.c: Add vect_strided5.
>
> ---
>  gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 82e4f6469fb..d4a13f12df6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -122,5 +122,5 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
> -
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_strided5 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_strided5 } } } */

This patch causes a test regression on amdgcn because vect_strided5 is true (because check_effective_target_vect_fully_masked is true), but the testcase still gives the message 4 times.

Perhaps because amdgcn uses masking and not vect_load_lanes?

Andrew
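The loop quoted above stores five constants per iteration into consecutive slots, which is why the group size of 5 matters for the strided/lanes discussion. A minimal scalar sketch of that access pattern follows, for readers who want to see the layout it produces. The names `fill_groups_of_five`, `groups_are_correct`, `N` and `GROUP` are hypothetical and chosen for this sketch; the element type is an assumption and is not taken from slp-1.c.

```c
#include <stddef.h>

#define N 8      /* hypothetical trip count, chosen for this sketch */
#define GROUP 5  /* five stores per iteration, as in the quoted loop */

/* Store the same five constants as the quoted loop into each group. */
static void fill_groups_of_five(unsigned short *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i * GROUP]     = 8;
        out[i * GROUP + 1] = 7;
        out[i * GROUP + 2] = 81;
        out[i * GROUP + 3] = 28;
        out[i * GROUP + 4] = 18;
    }
}

/* Return 1 when every group of five holds the expected constants, else 0. */
static int groups_are_correct(const unsigned short *out, size_t n)
{
    static const unsigned short expected[GROUP] = { 8, 7, 81, 28, 18 };

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < GROUP; j++)
            if (out[i * GROUP + j] != expected[j])
                return 0;
    return 1;
}
```

Whether a compiler vectorizes such a loop with SLP or with a lanes-style store is target dependent, which is the point the thread is debating; the sketch only fixes the memory layout being discussed.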
Re: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case
OK.

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-10-07 11:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case

From: xuli

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler times.
        * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
---
 .../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 +-
 .../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index c566f8a4751..2ec9487a6c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int cond)
   }
 }

-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
index d0e75258188..bcafce36895 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int cond)
   }
 }

-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times
Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
OK

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

From: Pan Li

For _Float16 types, add run test for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc

For float and double, add run test for:
* roundeven

The zfa extension is required for these run test cases, the simulation
target_board may look like below for rv64.

target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
        * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.

Signed-off-by: Pan Li
---
 .../riscv/rvv/autovec/unop/math-ceil-run-0.c | 39 +++
 .../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +++
 .../rvv/autovec/unop/math-nearbyint-run-0.c | 48 +++
 .../riscv/rvv/autovec/unop/math-rint-run-0.c | 48 +++
 .../riscv/rvv/autovec/unop/math-round-run-0.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-0.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-1.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-2.c | 39 +++
 .../riscv/rvv/autovec/unop/math-trunc-run-0.c | 39 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 +-
 10 files changed, 371 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
new file mode 100644
index 000..70cba3602bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1024.0, 1024.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
new file mode 100644
index 000..c542278c1f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include
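The input/expected pairs in the TEST_INIT lines of math-ceil-run-0.c can be cross-checked with a small scalar reference. The sketch below is not part of the patch and does not use test-math.h; `ceil_case`, `ref_ceil` and `count_ceil_mismatches` are hypothetical names. It works on `double` rather than `_Float16`, and it avoids `<math.h>` so that it builds without linking libm, which means it is only valid for values that fit in a `long long`.

```c
#include <stddef.h>

struct ceil_case { double input; double expected; };

/* Input/expected pairs copied from the TEST_INIT lines of math-ceil-run-0.c. */
static const struct ceil_case ceil_cases[] = {
    { 1.2, 2.0 },       { -1.2, -1.0 },     { 3.0, 3.0 },
    { 1023.5, 1024.0 }, { 1024.0, 1024.0 }, { 0.0, 0.0 },
    { -0.0, -0.0 },     { -1023.5, -1023.0 }, { -1024.0, -1024.0 },
};

/* Reference ceiling for values that fit in a long long.
   Assumption of this sketch: inputs are finite and small in magnitude.
   For inputs in (-1, 0) it returns +0.0 rather than -0.0. */
static double ref_ceil(double x)
{
    long long truncated = (long long)x;  /* conversion rounds toward zero */
    double result = (double)truncated;

    if (result == x)
        return x;            /* already integral; keeps the sign of zero */
    if (result < x)
        result += 1.0;       /* positive non-integers round up */
    return result;           /* negative non-integers: truncation is the ceiling */
}

/* Count table rows whose expected value disagrees with ref_ceil. */
static size_t count_ceil_mismatches(void)
{
    size_t bad = 0;

    for (size_t i = 0; i < sizeof ceil_cases / sizeof ceil_cases[0]; i++)
        if (ref_ceil(ceil_cases[i].input) != ceil_cases[i].expected)
            bad++;
    return bad;
}
```

Comparing with `!=` treats 0.0 and -0.0 as equal, so this check says nothing about the sign of zero.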
Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
These testcases cause multiple FAILs.

I think you should use:

/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */

juzhe.zh...@rivai.ai
Re: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
Also, I have reverted your commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=066a43ce72ab6559ba14af9628df19daa0b85cdf

Please test the patch and verify that it doesn't cause any FAILs if the toolchain doesn't have "zvfh_zfh".

juzhe.zh...@rivai.ai

From: juzhe.zh...@rivai.ai
Date: 2023-10-07 17:49
To: pan2.li; gcc-patches
CC: pan2.li; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

These testcases cause multiple FAILs:

I think you should
/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */

juzhe.zh...@rivai.ai
Re: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV
Hi, Jeff.

I have addressed your comments in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632239.html

I think it now looks reasonably good for long-term maintenance.

OK for trunk?

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-10-07 23:09
To: Juzhe-Zhong; gcc-patches
CC: rguenther; rdapp.gcc
Subject: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

On 10/7/23 05:45, Juzhe-Zhong wrote:
> This patch fixes the following dump FAILs:
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_SUB" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1
>
> For RVV, the expected dump IR is the COND_LEN_* pattern.
>
> Also, we are still failing at this check:
>
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_LEN_SUB"
>
> This is because of a known bug in GIMPLE_FOLD that Robin is working on.
>
> @Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug fix patch.
>
> Ok for trunk ?
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
>         * gcc.dg/vect/vect-cond-arith-4.c: Ditto.
>         * gcc.dg/vect/vect-cond-arith-5.c: Ditto.
>         * gcc.dg/vect/vect-cond-arith-6.c: Ditto.

Would it make more sense to adjust the regexp so that it matches the standard form as well as the LEN form? So, for example, we could have a regexp that matches both COND_ADD and COND_LEN_ADD.

Just wondering if that'll be better from a long-term maintenance standpoint.

Jeff
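Jeff's idea of one pattern covering both the standard and the LEN form can be illustrated outside the testsuite. The sketch below uses POSIX extended regular expressions from `<regex.h>`, which is an assumption of this example: the GCC testsuite evaluates its patterns with Tcl, so this shows the regex idea only and does not demonstrate how such a pattern behaves inside a dg-final directive. `line_matches` and `cond_add_pattern` are hypothetical names.

```c
#include <regex.h>
#include <stddef.h>

/* One pattern intended to accept both " = .COND_ADD" and " = .COND_LEN_ADD". */
static const char cond_add_pattern[] = " = \\.COND_(LEN_)?ADD";

/* Return 1 if LINE matches PATTERN (POSIX extended syntax), 0 if it does
   not, and -1 if the pattern fails to compile. */
static int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, `line_matches(cond_add_pattern, "_5 = .COND_LEN_ADD (m, a, b, c);")` is expected to report a match, as is the same call on a line containing `.COND_ADD`; the sample dump lines are made up for this sketch.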
Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
Hi, Richi and Robin.

It turns out that COND(_LEN)?_ADD doesn't work.

Is this patch OK? Or do you have another solution to change the dump check for RVV?

Thanks.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-08 09:33
To: gcc-patches
CC: rguenther; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR is the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_LEN_SUB"

This is because of a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug fix patch.

Ok for trunk ?

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
        * gcc.dg/vect/vect-cond-arith-4.c: Ditto.
        * gcc.dg/vect/vect-cond-arith-5.c: Ditto.
        * gcc.dg/vect/vect-cond-arith-6.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
 }

-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \
Re: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV
Yes. We do have and enable char -> long conversion (vsext.vf8/vzext.vf8).

Thanks for the comment; I will adapt the test as you suggested.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 15:31
To: Jeff Law
CC: Juzhe-Zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

On Sun, 8 Oct 2023, Jeff Law wrote:

>
>
> On 10/8/23 05:35, Juzhe-Zhong wrote:
> > RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this
> > case well.
> > So, adjust dump check for RVV.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
> I'd hoped to avoid a bunch of risc-v special casing in the generic part of the
> testsuite. Basically the more we have target specific conditionals rather
> than conditionals using properties, the more likely we are to keep revisiting
> this stuff over time and possibly for other architectures as well.
>
> What is it about risc-v's vector support that allows it to optimize this case?
> Is it the same property that allows us to handle the outer loop vectorization
> tests that you changed in another patch?

I suspect for VLA vectorization we can use direct conversion from
char to long long here? I also notice the testcase uses 'char', not
specifying its sign. So either of [sz]extVxyzDIVxyzQI is possibly
provided by RISCV? (or possibly via some intermediate types in
a multi-step conversion)

For non-VLA and with the single vector size restriction we'd need
unpacking.

So it might be better

  { target { vect_unpack || { vect_vla && vect_sext_char_longlong } } }

where I think neither vect_vla nor vect_sext_char_longlong exists.

Richard - didn't you run into similar things with SVE?

Richard.

> Neither an ACK nor NAK right now.
>
> Jeff
>
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
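Richard's remark that the testcase uses plain 'char' without specifying its sign matters because widening to long long either sign-extends or zero-extends depending on that choice. A minimal scalar sketch of the two conversions follows; `widen_signed` and `widen_unsigned` are hypothetical names and this is not the code of vect-multitypes-16.c. The reply above names vsext.vf8 and vzext.vf8 as the corresponding RVV instructions; whether a given compiler emits them for these loops is not something the sketch shows.

```c
#include <stddef.h>

/* Widen each signed 8-bit element to 64 bits (sign extension). */
static void widen_signed(const signed char *in, long long *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Widen each unsigned 8-bit element to 64 bits (zero extension). */
static void widen_unsigned(const unsigned char *in, long long *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}
```

The same bit pattern 0xFF widens to -1 through `widen_signed` and to 255 through `widen_unsigned`, which is why a test written with plain 'char' can exercise either conversion depending on the target's default signedness.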
Re: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes
>> But you gobble the "or .." into an existing -mstrict-align flag - are >> you sure all implementations are >> self-consistent with handling non-vector memory instructions and >> vector memory instructions here? >> At least the above wording doesn't seem to impose such requirement. RVV ISA: "Support for misaligned vector memory accesses is independent of an implementation’s support for misaligned scalar memory accesses." Support misalign vector memory access is independent on scalar memory access. I think this patch (using -mno-strict-align) is not appropriate, which means I need additional compile option. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-10-09 16:01 To: Juzhe-Zhong CC: gcc-patches; kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc Subject: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes On Sun, Oct 8, 2023 at 9:22 AM Juzhe-Zhong wrote: > > Previously, I removed the movmisalign pattern to fix the execution FAILs in > this commit: > https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520 > > I was thinking that RVV doesn't allow misaligned at the beginning so I > removed that pattern. > However, after deep investigation && reading RVV ISA again and experiment on > SPIKE, > I realized I was wrong. > > RVV ISA reference: > https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints > > "If an element accessed by a vector memory instruction is not naturally > aligned to the size of the element, > either the element is transferred successfully or an address misaligned > exception is raised on that element." But you gobble the "or .." into an existing -mstrict-align flag - are you sure all implementations are self-consistent with handling non-vector memory instructions and vector memory instructions here? At least the above wording doesn't seem to impose such requirement. > It's obvious that RVV ISA does allow misaligned vector load/store. 
> > And experiment and confirm on SPIKE: > > [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike > --isa=rv64gcv --varch=vlen:128,elen:64 > ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 > a.out > bbl loader > z ra 00010158 sp 003ffb40 gp > 00012c48 > tp t0 000110da t1 000f t2 > > s0 00013460 s1 a0 00012ef5 a1 > 00012018 > a2 00012a71 a3 000d a4 0004 a5 > 00012a71 > a6 00012a71 a7 00012018 s2 s3 > > s4 s5 s6 s7 > > s8 s9 sA sB > > t3 t4 t5 t6 > > pc 00010258 va/inst 020660a7 sr 80026620 > Store/AMO access fault! > > [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike > --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 > ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 > a.out > bbl loader > > We can see SPIKE can pass previous *FAILED* execution tests with specifying > --misaligned to SPIKE. > > So, to honor RVV ISA SPEC, we should add movmisalign pattern back base on the > investigations I have done since > it can improve multiple vectorization tests and fix dumple FAILs. 
> > This patch fixes these following dump FAILs: > > FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects > scan-tree-dump-not optimized "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized > "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects > scan-tree-dump-not optimized "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized > "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects > scan-tree-dump-not optimized "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized > "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects > scan-tree-dump-not optimi
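As a rough illustration of the kind of access under discussion, the sketch below updates 32-bit elements that start at an address which is not a multiple of 4. The function name and buffer layout are made up for this example; the scalar code uses memcpy so that it stays well-defined C on any host. Whether a compiler turns such a loop into misaligned vector loads and stores depends on the target and on the options in use, which is exactly the point being debated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add 1 to N 32-bit elements stored at P, where P
   may have any alignment.  memcpy keeps every access well defined even
   when P is not 4-byte aligned.  */
static void
add_one_unaligned (unsigned char *p, size_t n)
{
  assert (p != NULL);
  for (size_t i = 0; i < n; i++)
    {
      uint32_t v;
      memcpy (&v, p + i * sizeof v, sizeof v);   /* element load  */
      v += 1;
      memcpy (p + i * sizeof v, &v, sizeof v);   /* element store */
    }
}
```

Calling it on `buf + 1`, where `buf` is an aligned byte array, gives elements that are not naturally aligned to their size, which is the case the quoted ISA sentence describes.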
Re: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV
Thanks Richi.

I will try to figure out a better way to adapt the tests without adding riscv*-specific target variants.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 16:17
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

On Sun, 8 Oct 2023, Juzhe-Zhong wrote:

> Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds on outer loop
> vectorization.

How so?

I think this maybe goes with the other similar change.

That is, when we already have specific target checks adding riscv-*-*
looks sensible but when we don't we should figure if there's a
capability we can (add and) test instead.

> Fix these following XPASS FAILs:
>
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
> * gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-21.c: Ditto.
> > --- > gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +- > gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +- > gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +- > gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +- > 4 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c > b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c > index c7c2fa8a504..12179949e00 100644 > --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c > +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c > @@ -59,4 +59,4 @@ int main (void) >return 0; > } > > -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { ! {vect_unpack } } } } } */ > +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c > b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c > index ba904a6c03e..86554a98169 100644 > --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c > +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c > @@ -65,4 +65,4 @@ int main (void) >return 0; > } > > -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { ! {vect_unpack } } } } } */ > +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c > b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c > index 5cd4049d08c..624b54accf4 100644 > --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c > +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c > @@ -49,4 +49,4 @@ int main (void) >return 0; > } > > -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { ! {vect_unpack } } } } } */ > +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { { ! {vect_unpack } } && { ! 
{riscv_v } } } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c > b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c > index 72e53c2bfb0..b30a5d78819 100644 > --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c > +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c > @@ -59,4 +59,4 @@ int main (void) >return 0; > } > > -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { ! { vect_pack_trunc } } } } } */ > +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { > xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */ > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen
Remove these functions: +static void +emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode) +{ + rtx sll_ops[] = {op_0, op_1, op_2}; + insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode); + + emit_vlmax_insn (icode, BINARY_OP, sll_ops); +} + +static void +emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode) +{ + rtx srl_ops[] = {op_0, op_1, op_2}; + insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode); + + emit_vlmax_insn (icode, BINARY_OP, srl_ops); +} + +static void +emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode) +{ + rtx or_ops[] = {op_0, op_1, op_2}; + insn_code icode = code_for_pred (IOR, vec_mode); + + emit_vlmax_insn (icode, BINARY_OP, or_ops); +} + Instead, For sll, you should use : rtx tmp = expand_binop (Pmode, ashl_optab, op_1, gen_int_mode (8, Pmode), NULL_RTX, 0, OPTAB_DIRECT); For srl, you should use: rtx tmp = expand_binop (Pmode, lshiftrt_optab, op_1, gen_int_mode (8, Pmode), NULL_RTX, 0, OPTAB_DIRECT); For or, you should use: expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0, OPTAB_DIRECT); juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-09 16:51 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen From: Pan Li This patch would like to refine the code gen for the bswap16. We will have VEC_PERM_EXPR after rtl expand when invoking __builtin_bswap. It will generate about 9 instructions in loop as below, no matter it is bswap16, bswap32 or bswap64. .L2: 1 vle16.v v4,0(a0) 2 vmv.v.x v2,a7 3 vand.vv v2,v6,v2 4 sllia2,a5,1 5 vrgatherei16.vv v1,v4,v2 6 sub a4,a4,a5 7 vse16.v v1,0(a3) 8 add a0,a0,a2 9 add a3,a3,a2 bne a4,zero,.L2 But for bswap16 we may have a even simple code gen, which has only 7 instructions in loop as below. 
.L5 1 vle8.v v2,0(a5) 2 addia5,a5,32 3 vsrl.vi v4,v2,8 4 vsll.vi v2,v2,8 5 vor.vv v4,v4,v2 6 vse8.v v4,0(a4) 7 addia4,a4,32 bne a5,a6,.L5 Unfortunately, this way will make the insn in loop will grow up to 13 and 24 for bswap32 and bswap64. Thus, we will refine the code gen for the bswap16 only, and leave both the bswap32 and bswap64 as is. gcc/ChangeLog: * config/riscv/riscv-v.cc (emit_vec_sll_scalar): New help func impl for emit vsll.vi/vsll.vx (emit_vec_srl_scalar): Likewise for vsrl.vi/vsrl.vx. (emit_vec_or): Likewise for vor.vv. (shuffle_bswap_pattern): New func impl for shuffle bswap. (expand_vec_perm_const_1): Add shuffle bswap pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker. * gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-v.cc | 117 ++ .../riscv/rvv/autovec/unop/bswap16-0.c| 17 +++ .../riscv/rvv/autovec/unop/bswap16-run-0.c| 44 +++ .../riscv/rvv/autovec/vls/bswap16-0.c | 34 + .../gcc.target/riscv/rvv/autovec/vls/perm-4.c | 4 +- 5 files changed, 214 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 23633a2a74d..3e3b5f2e797 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -878,6 +878,33 @@ emit_vlmax_decompress_insn (rtx target, rtx op0, rtx op1, rtx mask) emit_vlmax_masked_gather_mu_insn (target, op1, sel, mask); } +static void +emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode) +{ + rtx sll_ops[] = {op_0, op_1, op_2}; + insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode); + + 
emit_vlmax_insn (icode, BINARY_OP, sll_ops); +} + +static void +emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode) +{ + rtx srl_ops[] = {op_0, op_1, op_2}; + insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode); + + emit_vlmax_insn (icode, BINARY_OP, srl_ops); +} + +static void +emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode) +{ + rtx or_ops[] = {op_0, op_1, op_2}; + insn_code icode = code_for_pred (IOR, vec_mode); + + emit_vlmax_insn (icode, BINARY_OP, or_ops); +} + /* Emit merge instruction. */ static machine_mode @@ -3030,6 +3057,94 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_bswap_pattern (struct expand_vec_perm_d *d) +{ + HOST_WIDE_INT diff; + unsigned i, size, step; + + if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff) +return false; + + step = diff + 1; + size = step * GET_MODE_UNIT_BITSIZE
Re: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
Thanks Robin. Could you send V3 to Richi ? And commit it if Richi is ok with that. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-10-09 18:26 To: Andreas Schwab; juzhe.zhong CC: rdapp.gcc; gcc-patches; rguenther; jeffreyalaw Subject: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV On 10/9/23 09:32, Andreas Schwab wrote: > On Okt 09 2023, juzhe.zh...@rivai.ai wrote: > >> Turns out COND(_LEN)?_ADD can't work. > > It should work though. Tcl regexps are a superset of POSIX EREs. > The problem is that COND(_LEN)?_ADD matches two times against COND_LEN_ADD and a scan-tree-dump-times 1 will fail. So for those checks in vect-cond-arith-6.c we either need to switch to scan-tree-dump or change the pattern to "\.(?:COND|COND_LEN)_ADD". Juzhe, something like the attached works for me. Regards Robin diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c index 1af0fe642a0..7d26dbedc5e 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c @@ -52,8 +52,8 @@ main (void) return 0; } -/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target vect_double_cond_arith } } } */ /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} 
"optimized" { target vect_double_cond_arith } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c index ec3d9db4202..f7daa13685c 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c @@ -54,8 +54,8 @@ main (void) return 0; } -/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ -/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ -/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c index 2aeebd44f83..a80c30a50b2 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c @@ -56,8 +56,8 @@ main (void) } /* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { target vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target 
vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target vect_double_cond_arith } } } */ -/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target vect_double_cond_arith } } } */ /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target vect_double_cond_arith } } } */
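The double count Robin describes can be modeled outside the testsuite. The sketch below assumes that the harness counts the elements of the list a Tcl `regexp -inline -all` call would return, that is, the whole match plus one element per capturing group for every match; that reading of the harness is an assumption, not something stated in the thread. The function name is hypothetical, and POSIX `regcomp`/`regexec` stand in for the Tcl engine.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical model: number of list elements a Tcl-style
   "regexp -inline -all" would yield for PATTERN on TEXT, i.e. for each
   match the full match plus one element per capturing group.
   Returns -1 if PATTERN does not compile.  */
static int
count_inline_all_elements (const char *pattern, const char *text)
{
  regex_t re;
  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  int elements = 0;
  const char *p = text;
  regmatch_t m;
  while (regexec (&re, p, 1, &m, p == text ? 0 : REG_NOTBOL) == 0)
    {
      elements += 1 + (int) re.re_nsub;
      size_t advance = (size_t) m.rm_eo;
      if (m.rm_eo == m.rm_so)
        {
          /* Empty match: step one character, or stop at end of text.  */
          if (p[advance] == '\0')
            break;
          advance++;
        }
      p += advance;
    }
  regfree (&re);
  return elements;
}
```

With the text `_1 = .COND_LEN_ADD (...)`, the pattern ` = \.COND(_LEN)?_ADD` has one capturing group and yields 2 elements for a single match, while the group-free alternation ` = \.COND_ADD| = \.COND_LEN_ADD` yields 1. In Tcl the same effect comes from the non-capturing form `(?:...)` suggested in the message; POSIX EREs have no such syntax, so the sketch uses a plain alternation instead.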
Re: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV
>> OK.

Thanks. Committed.

>> Note load/store-lanes is specifically pre-empting SLP if all
>> loads/stores of a SLP instance can support that. Not sure if this
>> heuristic is good for load/store lanes with high stride?

Yeah, I understand your concern. Hmm, I am not sure either.
But the RVV ISA defines lane loads/stores for 2 to 8 lanes, and LLVM already supports them.
I think we can fully support them and then let the RISC-V cost model decide whether they are profitable.

Also, I found that RVV can vectorize a TSVC case with stride = 5 lane_load/lane_store:

tsvc-s353.c:

-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v } } } } */

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632213.html

So, I think overall it is beneficial to support high-stride lane loads/stores, which can help us vectorize more cases.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 20:41
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

On Mon, 9 Oct 2023, Juzhe-Zhong wrote:

> Reference: https://godbolt.org/z/G9jzf5Grh
>
> RVV is able to vectorize this case using SLP. However, with
> -fno-vect-cost-model, RVV vectorizes it by vec_load_lanes with stride 6.

OK.

Note load/store-lanes is specifically pre-empting SLP if all
loads/stores of a SLP instance can support that. Not sure if this
heuristic is good for load/store lanes with high stride?

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
> > --- > gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c > b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c > index 7c7acd5bab6..96751faae7f 100644 > --- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c > +++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c > @@ -18,4 +18,4 @@ foo (void) > } > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > } } */ > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target { ! vect_strided6 } } } } */ > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen
LGTM now. Thanks. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-09 21:09 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen From: Pan Li Update in v2 * Remove emit helper functions. * Take expand_binop instead. Original log: This patch would like to refine the code gen for the bswap16. We will have VEC_PERM_EXPR after rtl expand when invoking __builtin_bswap. It will generate about 9 instructions in loop as below, no matter it is bswap16, bswap32 or bswap64. .L2: 1 vle16.v v4,0(a0) 2 vmv.v.x v2,a7 3 vand.vv v2,v6,v2 4 sllia2,a5,1 5 vrgatherei16.vv v1,v4,v2 6 sub a4,a4,a5 7 vse16.v v1,0(a3) 8 add a0,a0,a2 9 add a3,a3,a2 bne a4,zero,.L2 But for bswap16 we may have a even simple code gen, which has only 7 instructions in loop as below. .L5 1 vle8.v v2,0(a5) 2 addia5,a5,32 3 vsrl.vi v4,v2,8 4 vsll.vi v2,v2,8 5 vor.vv v4,v4,v2 6 vse8.v v4,0(a4) 7 addia4,a4,32 bne a5,a6,.L5 Unfortunately, this way will make the insn in loop will grow up to 13 and 24 for bswap32 and bswap64. Thus, we will refine the code gen for the bswap16 only, and leave both the bswap32 and bswap64 as is. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl for shuffle bswap. (expand_vec_perm_const_1): Add handling for shuffle bswap pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker. * gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test. 
Signed-off-by: Pan Li --- gcc/config/riscv/riscv-v.cc | 91 +++ .../riscv/rvv/autovec/unop/bswap16-0.c| 17 .../riscv/rvv/autovec/unop/bswap16-run-0.c| 44 + .../riscv/rvv/autovec/vls/bswap16-0.c | 34 +++ .../gcc.target/riscv/rvv/autovec/vls/perm-4.c | 4 +- 5 files changed, 188 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 23633a2a74d..c72e411f125 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3030,6 +3030,95 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_bswap_pattern (struct expand_vec_perm_d *d) +{ + HOST_WIDE_INT diff; + unsigned i, size, step; + + if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff) +return false; + + step = diff + 1; + size = step * GET_MODE_UNIT_BITSIZE (d->vmode); + + switch (size) +{ +case 16: + break; +case 32: +case 64: + /* We will have VEC_PERM_EXPR after rtl expand when invoking + __builtin_bswap. It will generate about 9 instructions in + loop as below, no matter it is bswap16, bswap32 or bswap64. +.L2: + 1 vle16.v v4,0(a0) + 2 vmv.v.x v2,a7 + 3 vand.vv v2,v6,v2 + 4 sllia2,a5,1 + 5 vrgatherei16.vv v1,v4,v2 + 6 sub a4,a4,a5 + 7 vse16.v v1,0(a3) + 8 add a0,a0,a2 + 9 add a3,a3,a2 +bne a4,zero,.L2 + + But for bswap16 we may have a even simple code gen, which + has only 7 instructions in loop as below. +.L5 + 1 vle8.v v2,0(a5) + 2 addia5,a5,32 + 3 vsrl.vi v4,v2,8 + 4 vsll.vi v2,v2,8 + 5 vor.vv v4,v4,v2 + 6 vse8.v v4,0(a4) + 7 addia4,a4,32 +bne a5,a6,.L5 + + Unfortunately, the instructions in loop will grow to 13 and 24 + for bswap32 and bswap64. 
Thus, we will leverage vrgather (9 insn) + for both the bswap64 and bswap32, but take shift and or (7 insn) + for bswap16. + */ +default: + return false; +} + + for (i = 0; i < step; i++) +if (!d->perm.series_p (i, step, diff - i, step)) + return false; + + if (d->testing_p) +return true; + + machine_mode vhi_mode; + poly_uint64 vhi_nunits = exact_div (GET_MODE_NUNITS (d->vmode), 2); + + if (!get_vector_mode (HImode, vhi_nunits).exists (&vhi_mode)) +return false; + + /* Step-1: Move op0 to src with VHI mode. */ + rtx src = gen_reg_rtx (vhi_mode); + emit_move_insn (src, gen_lowpart (vhi_mode, d->op0)); + + /* Step-2: Shift right 8 bits to dest. */ + rtx dest = expand_binop (vhi_mode, lshr_optab, src, gen_int_mode (8, Pmode), +NULL_RTX, 0, OPTAB_DIRECT); + + /* Step-3: Shift left 8 bits to src. */ + src = expand_binop (vhi_mode, ashl_optab, src, gen_int_mode (8, Pmode), + NULL_RTX, 0, OPTAB_DIRECT); + + /* Step-4: Logic Or dest and src to dest. */ + dest = expand_binop (vhi_mode, ior_optab, dest, src, +NULL_RTX, 0, OPTAB_DIRECT); + + /* Step-5: Move src to target with VQI mode. */ + emit_move_insn (d->target, gen_lowpart
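For readers who want to see why the shift-and-or sequence is a byte swap, here is a scalar C model of what the three vector instructions do to each 16-bit element. The helper names are invented for this sketch and are not part of GCC; the comments map each step to the instruction shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane.  */
static uint16_t
bswap16_shift_or (uint16_t x)
{
  uint16_t hi_to_lo = (uint16_t) (x >> 8);   /* models vsrl.vi v4,v2,8  */
  uint16_t lo_to_hi = (uint16_t) (x << 8);   /* models vsll.vi v2,v2,8  */
  return (uint16_t) (hi_to_lo | lo_to_hi);   /* models vor.vv  v4,v4,v2 */
}

/* Hypothetical loop of the shape the vectorizer would see.  */
static void
bswap16_array (uint16_t *out, const uint16_t *in, size_t n)
{
  assert (out != NULL && in != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = bswap16_shift_or (in[i]);
}
```

For a 16-bit element the two shifts each move one byte into the other byte's position, so a single OR completes the swap. For 32-bit and 64-bit elements more shift, mask and OR steps are needed per element, which matches the commit message's observation that the instruction count grows for bswap32 and bswap64.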
Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
Oh. I realize this patch increases the FAILs that I recently fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632247.html

This fails because RVV doesn't have vec_pack_trunc_optab (the loop vectorizer fails the first time but succeeds the second time), so RVV dumps FOLD_EXTRACT_LAST 4 times instead of 2 (ARM SVE dumps it 2 times because it has vec_pack_trunc_optab).

I think the root cause of RVV failing multiple "vect" tests is that we don't enable the vec_pack/vec_unpack/... stuff, yet we still succeed at vectorization and we want to enable those tests. (Mostly RVV just uses a different approach to vectorize them, which causes the dump FAILs, because of some changes I have previously made in the middle-end.)

So enabling "vec_pack" for RVV will fix some FAILs but introduce some other FAILs.

CC'ing Richi for more reasonable suggestions.

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-10 06:38
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On Tue, 10 Oct 2023, 钟居哲 wrote:

> Btw, could you rebase to the trunk and run regression again?

Full regression-testing takes roughly 40 hours here and I do not normally
update the tree midway through my work so as not to add variables and end
up chasing a moving target, especially with such an unstable state that we
have ended up with recently with the RISC-V port. Since I'm done with
this part I can refresh and schedule another run if you are curious as to
how it looks like from my side. For the C subset alone it'll take less.

Maciej
Re: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV
Great! I am going to wait for Richi's approval.

juzhe.zh...@rivai.ai

From: Andrew Stubbs
Date: 2023-10-10 17:40
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

On 10/10/2023 02:39, Juzhe-Zhong wrote:
> Here is the reference comparing dump IR between ARM SVE and RVV.
>
> https://godbolt.org/z/zqess8Gss
>
> We can see RVV has one more dump IR:
> optimized: basic block part vectorized using 128 byte vectors
> since RVV has 1024 bit vectors.
>
> The codegen is reasonably good.
>
> However, I saw GCN also has 1024 bit vector.
> This patch may cause this case FAIL in GCN port ?
>
> Hi, GCN folk, could you check this patch in GCN port for me ?

This patch *fixes* an existing test fail on GCN. :)

It's probably one of the many I've never had time to analyze (and
optimizing more than expected makes it low priority).

LGTM

Andrew
Re: [PATCH v2 0/4] RISC-V target attribute
LGTM on my side.

IMHO, we need to support an attribute (rvv_vector_bits) which depends on this patch, am I right?
If yes, will you support this feature in the GCC-14 release?

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-10-10 12:13
To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong
Subject: [PATCH v2 0/4] RISC-V target attribute

This patch set implements the target attribute for the RISC-V target. Similar
to other targets such as x86 or ARM, it lets the user apply some settings per
function without changing the global settings.

We support arch, tune and cpu first, and we will support other target
attributes later. This version DOES NOT include multi-version function support
yet; that is future work, probably for GCC 15.

The full proposal is in the RISC-V C-API document[1], which has been discussed
with the RISC-V LLVM community, so we have consistent syntax and semantics.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35

v2 changelog:
- Resolve awk multi-dimensional issue.
- Tweak code format
- Tweak testcases
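A minimal sketch of how such a per-function setting might look in user code is given below. The attribute string `"arch=rv64gcv"`, the GCC version guard and all identifiers are assumptions made for this example; the authoritative syntax is whatever the linked C-API proposal and the final patches define. On compilers or targets that do not match the guard, the macro expands to nothing so the file still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 64-bit RISC-V GCC new enough to accept
   target ("arch=...") on a function.  Elsewhere the macro is empty.  */
#if defined (__riscv) && defined (__riscv_xlen) && __riscv_xlen == 64 \
    && defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 14
# define WITH_VECTOR_ARCH __attribute__ ((target ("arch=rv64gcv")))
#else
# define WITH_VECTOR_ARCH
#endif

/* Hypothetical function compiled with the vector extension enabled,
   while the rest of the translation unit keeps the global -march.  */
WITH_VECTOR_ARCH
static void
add_arrays (int *dst, const int *a, const int *b, size_t n)
{
  assert (dst != NULL && a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}
```

The design point the cover letter makes is locality: only `add_arrays` is affected, so callers and other functions are still built with the global settings.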
Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
It's weird. Could you give me the FAILs report?

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-10 18:18
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On Mon, 9 Oct 2023, Maciej W. Rozycki wrote:

> > Btw, could you rebase to the trunk and run regression again?
>
> Full regression-testing takes roughly 40 hours here and I do not normally
> update the tree midway through my work so as not to add variables and end
> up chasing a moving target, especially with such an unstable state that we
> have ended up with recently with the RISC-V port. Since I'm done with
> this part I can refresh and schedule another run if you are curious as to
> how it looks like from my side. For the C subset alone it'll take less.

After 10 hours I have now got:

=== gcc Summary ===

# of expected passes            194576
# of unexpected failures        600
# of unexpected successes       11
# of expected failures          1631
# of unresolved testcases       120
# of unsupported tests          3828

as at commit cc5033721553 ("Fixes for profile count/probability
maintenance"), which is slightly better, but still far from your 92 FAILs.

NB I ran this testing with `--param=riscv-autovec-preference=scalable'; I
guess I could have mentioned it.

Maciej
Re: Re: [PATCH] RISC-V: Enable full coverage vect tests
Thanks. Committed.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-11 14:54
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enable full coverage vect tests

Hi Juzhe,

seems OK to me. We don't support most of the patterns directly but
as we can and want to vectorize them it makes sense to enable the tests.

Regards
Robin
Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
Hi, Maciej.

I have enabled all vectorization tests on RVV, which is committed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632598.html

But I have added this to every test:

+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v])

As you said, you think we don't need to add check_effective_target_riscv_v every time.
So, feel free to adjust it (remove check_effective_target_riscv_v) and send a patch.
But I hope you can adjust each set of tests carefully to keep everything consistent.

Thanks.

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-11 05:35
To: juzhe.zhong
CC: gcc-patches; jeffreyalaw; Robin Dapp; Kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On Tue, 10 Oct 2023, juzhe.zh...@rivai.ai wrote:

> It's weird. Could you give me the FAILs report?

I keep forgetting that I have a piece of code in my board description
files that makes the testsuite leave output files in place, which helps
much when debugging failures (although it's not a perfect solution for
test cases like those verified at different optimisation levels where the
output filename is reused and consequently subsequent outputs overwrite
earlier ones; something to improve perhaps).

Unfortunately the presence of output files confuses some test cases and
makes them fail; arguably a test case bug. None of the offending test
cases are directly related to RISC-V development, so I just ignore the
presence of these failures and only focus on regressions and progressions
between testsuite runs.

Here are fresh results with the testsuite output tree made tidy:

=== gcc Summary ===

# of expected passes            194602
# of unexpected failures        145
# of unexpected successes       11
# of expected failures          1631
# of unresolved testcases       120
# of unsupported tests          3828

It probably makes no sense to clutter the mailing list with my FAIL and
UNRESOLVED results; I can send them off-list if you find them useful.

Maciej
Re: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-11 16:49
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization

From: Pan Li

This patch would like to support the FP lrint/lrintf auto vectorization.

* long lrint (double) for rv64
* long lrintf (float) for rv32

Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lrintmn2 only acts on DF => DI for
rv64, and SF => SI for rv32.

Given we have code like:

void
test_lrint (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrint (in[i]);
}

Before this patch:
.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.d a5,fa5,dyn
  sd a5,-8(a0)
  ...
  bne a1,a4,.L3

After this patch:
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lrint2): New pattern for lrint/lrintf.
* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl
for expanding lrint.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl
for vfcvt.x.f.v.
(expand_vec_lrint): New function impl for expanding lrint.
* config/riscv/vector-iterators.md: New mode attr and iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for
CVT like test case.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.
Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 11 +++ gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 20 ++ gcc/config/riscv/vector-iterators.md | 69 +++ .../riscv/rvv/autovec/unop/math-lrint-0.c | 14 .../riscv/rvv/autovec/unop/math-lrint-1.c | 14 .../riscv/rvv/autovec/unop/math-lrint-run-0.c | 63 + .../riscv/rvv/autovec/unop/math-lrint-run-1.c | 63 + .../riscv/rvv/autovec/unop/test-math.h| 24 +++ .../gcc.target/riscv/rvv/autovec/vls/def.h| 9 +++ .../riscv/rvv/autovec/vls/math-lrint-0.c | 30 .../riscv/rvv/autovec/vls/math-lrint-1.c | 30 12 files changed, 348 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 53e9d34eea1..dc76a01d82c 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2239,6 +2239,7 @@ (define_expand "avg3_ceil" ;; - round/roundf ;; - trunc/truncf ;; - roundeven/roundevenf +;; - lrint/lrintf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2309,3 +2310,13 @@ (define_expand "roundeven2" DONE; } ) + +(define_expand "lrint2" + [(match_operand: 0 "register_operand") + (match_operand:V_VLS_FCONVERTL 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 43426a5326b..f6bd15b47b0 100644 --- a/gcc/config/riscv/riscv-protos.h +++ 
b/gcc/config/riscv/riscv-protos.h @@ -474,6 +474,7 @@ void expand_vec_rint (rtx, rtx, machine_mode, machine_mode); void expand_vec_round (rtx, rtx, machine_mode, machine_mode); void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode); void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode); +void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index c72e411f125..64f99d85d91 100644 ---
Re: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter
Refined the code in V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632619.html

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-11 17:03
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter

I just noticed a mistake I made that, luckily, was never exposed.

https://godbolt.org/z/c3jzrh7or

GCC is using a 32-bit index offset:

  vsll.vi v1,v1,2
  vsetvli zero,a5,e32,m1,ta,ma
  vluxei32.v v1,(a1),v1

This is wrong since v1 may overflow 32 bits after vsll.vi.

After this patch:

  vsext.vf2 v8,v4
  vsll.vi v8,v8,2
  vluxei64.v v8,(a1),v8

Same as Clang.

Regression passed. Ok for trunk?

gcc/ChangeLog:

	* config/riscv/autovec.md: Fix offset bug.
	* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function.
	* config/riscv/riscv-v.cc (expand_gather_scatter): Fix offset bug.
	(gather_scatter_valid_offset_p): New function.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.

--- gcc/config/riscv/autovec.md | 28 +-- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 16 +-- .../autovec/gather-scatter/offset_extend-1.c | 14 ++ 4 files changed, 42 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 41bff3a318f..07607bff71e 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -59,7 +59,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -74,7 +74,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -89,7 +89,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -104,7 +104,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -119,7 +119,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -134,7 +134,7 @@ (match_operand: 5 
"vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -153,7 +153,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -172,7 +172,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -187,7 +187,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands,
Re: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter
Oh, yes. Addressed the comment in V3: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632623.html

It now uses:

  if (inner_offsize < BITS_PER_WORD)

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-11 17:50
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter

Hi Juzhe,

good that you noticed it now, I should have caught that in the review back then...

One thing, though:

> + if (inner_offsize < GET_MODE_BITSIZE (GET_MODE (ptr)).to_constant ())

Shouldn't ptr always be Pmode, i.e. the bitsize == XLEN?

Rest LGTM.

Regards
 Robin
RISC-V: Support CORE-V XCVMAC and XCVALU extensions
../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in Functions' differ
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector Intrinsics' differ
../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' lacks menu item for `CORE-V Built-in Functions' despite being its Up target
../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector Intrinsics' differ
In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to hold all values of ‘enum rtx_code’
 #define RTX_CODE_BITSIZE 8
 ^
../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro ‘RTX_CODE_BITSIZE’
 ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
 ^~~~
make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
make[2]: *** Waiting for unfinished jobs
rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod cpp.pod gcov.pod lto-dump.pod
make[2]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
make[1]: *** [Makefile:4648: all-gcc] Error 2
make[1]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2

juzhe.zh...@rivai.ai
Re: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions
Please revert it. It blocks development on all targets.

juzhe.zh...@rivai.ai

From: Andrew Pinski
Date: 2023-10-12 09:03
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; Kito.cheng; kito.cheng; Robin Dapp
Subject: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions

On Wed, Oct 11, 2023 at 6:01 PM juzhe.zh...@rivai.ai wrote:
>
> ../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in Functions' differ
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector Intrinsics' differ
> ../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' lacks menu item for `CORE-V Built-in Functions' despite being its Up target
> ../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector Intrinsics' differ
> In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
> ../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to hold all values of ‘enum rtx_code’
> #define RTX_CODE_BITSIZE 8
> ^
> ../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro ‘RTX_CODE_BITSIZE’
> ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
> ^~~~
>
> make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
> make[2]: *** Waiting for unfinished jobs
> rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod cpp.pod gcov.pod lto-dump.pod
> make[2]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
> make[1]: *** [Makefile:4648: all-gcc] Error 2
> make[1]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
> make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2

This is also recorded as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 .
It breaks more than just RISC-V; it also depends on the version of texinfo that is installed.

Thanks,
Andrew

>
>
> juzhe.zh...@rivai.ai
Re: [PATCH v1] RISC-V: Support FP irintf auto vectorization
LGTM. Thanks.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-12 09:52
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP irintf auto vectorization

From: Pan Li

This patch adds support for FP irintf auto vectorization.

* int irintf (float)

Because the vectorizer only allows conversions between data types of the same size, the standard name lrintmn2 only acts on SF => SI.

Given we have code like:

void
test_irintf (int *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_irintf (in[i]);
}

Before this patch:

.L3:
  ...
  flw fa5,0(a1)
  fcvt.w.s a5,fa5,dyn
  sw a5,-4(a0)
  ...
  bne a1,a4,.L3

After this patch:

.L3:
  ...
  vle32.v v1,0(a1)
  vfcvt.x.f.v v1,v1
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3

The remaining cases, such as DF => SI and HF => SI, will be covered by the hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

	* config/riscv/autovec.md (lrint2): Rename from.
	(lrint2): Rename to.
	* config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test.

Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 9 ++- gcc/config/riscv/vector-iterators.md | 74 +-- .../riscv/rvv/autovec/unop/math-irint-0.c | 14 .../riscv/rvv/autovec/unop/math-irint-run-0.c | 63 .../riscv/rvv/autovec/vls/math-irint-0.c | 30 5 files changed, 149 insertions(+), 41 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-irint-0.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index dc76a01d82c..c3a51e22ceb 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2240,6 +2240,7 @@ (define_expand "avg3_ceil" ;; - trunc/truncf ;; - roundeven/roundevenf ;; - lrint/lrintf +;; - irintf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2311,12 +2312,12 @@ (define_expand "roundeven2" } ) -(define_expand "lrint2" - [(match_operand: 0 "register_operand") - (match_operand:V_VLS_FCONVERTL 1 "register_operand")] +(define_expand "lrint2" + [(match_operand:0 "register_operand") + (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")] "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" { -riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, mode); +riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, mode); DONE; } ) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index bb0c46ea30a..96ddd34c958 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -3281,8 +3281,8 @@ (define_mode_attr vnnconvert [ (V512DI "v512hf") ]) -;; L indicates convert to long -(define_mode_attr VLCONVERT [ +;; Convert to int, long and long long +(define_mode_attr V_I_L_LL_CONVERT [ (RVVM8SF "RVVM8SI") (RVVM4SF "RVVM4SI") (RVVM2SF "RVVM2SI") (RVVM1SF "RVVM1SI") (RVVMF2SF "RVVMF2SI") @@ -3298,7 +3298,7 
@@ (define_mode_attr VLCONVERT [ (V512DF "V512DI") ]) -(define_mode_attr vlconvert [ +(define_mode_attr v_i_l_ll_convert [ (RVVM8SF "rvvm8si") (RVVM4SF "rvvm4si") (RVVM2SF "rvvm2si") (RVVM1SF "rvvm1si") (RVVMF2SF "rvvmf2si") @@ -3314,40 +3314,40 @@ (define_mode_attr vlconvert [ (V512DF "v512di") ]) -(define_mode_iterator V_VLS_FCONVERTL [ - (RVVM8SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVM4SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVM2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVM1SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT && TARGET_MIN_VLEN > 32") - - (RVVM8DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - (RVVM4DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - (RVVM2DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - (RVVM1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - - (V1SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 &
Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization
LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-12 11:28
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP llrint auto vectorization

From: Pan Li

This patch adds support for FP llrint auto vectorization.

* long long llrint (double)

From the standard name's perspective this is the DF => DI conversion, which previous patches already cover. Thus, this patch only adds test cases.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t.
	* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test.

Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-llrint-0.c| 14 + .../rvv/autovec/unop/math-llrint-run-0.c | 63 +++ .../riscv/rvv/autovec/unop/test-math.h| 2 + .../riscv/rvv/autovec/vls/math-llrint-0.c | 30 + 4 files changed, 109 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c new file mode 100644 index 000..2d90d232ba1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "test-math.h" + +/* +** test_double_int64_t___builtin_llrint: +** ... +** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma +** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+ +** ...
+*/ +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c new file mode 100644 index 000..6b69f5568e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c @@ -0,0 +1,63 @@ +/* { dg-do run { target { riscv_v && rv64 } } } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ + +#include "test-math.h" + +#define ARRAY_SIZE 128 + +double in[ARRAY_SIZE]; +int64_t out[ARRAY_SIZE]; +int64_t ref[ARRAY_SIZE]; + +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint) +TEST_ASSERT (int64_t) + +TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llrint (1.2), 1) +TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llrint (-1.2), 2) +TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llrint (0.5), 3) +TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llrint (-0.5), 4) +TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llrint (0.1), 5) +TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llrint (-0.1), 6) +TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llrint (3.0), 7) +TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llrint (-3.0), 8) +TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llrint (4503599627370495.5), 9) +TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llrint (4503599627370497.0), 10) +TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llrint (-4503599627370495.5), 11) +TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llrint (-4503599627370496.0), 12) +TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llrint (-0.0), 13) +TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llrint (-0.0), 14) +TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llrint (9223372036854774784.0), 15) +TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, __builtin_llrint (9223372036854775808.0), 16) +TEST_INIT_CVT 
(double, -9223372036854775808.0, int64_t, __builtin_llrint (-9223372036854775808.0), 17) +TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, __builtin_llrint (-9223372036854777856.0), 18) +TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llrint (__builtin_inf ()), 19) +TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llrint (-__builtin_inf ()), 20) +TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fff, 21) + +int +main () +{ + RUN_TEST_CVT (double, int64_t, 1, __builtin_llrint, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 2, __builtin_llrint, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 3, __builtin_llrint, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 4, __builtin_llrint, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t
Re: [PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization
OK

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-12 16:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization

From: Pan Li

This patch adds support for FP lround/lroundf auto vectorization.

* long lround (double) for rv64
* long lroundf (float) for rv32

Because the vectorizer only allows conversions between data types of the same size, the standard name lroundmn2 only acts on DF => DI for rv64, and SF => SI for rv32.

Given we have code like:

void
test_lround (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lround (in[i]);
}

Before this patch:

.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.d a5,fa5,rmm
  sd a5,-8(a0)
  ...
  bne a1,a4,.L3

After this patch:

  frrm a6
  ...
  fsrmi 4 // RMM
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrm a6

The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

	* config/riscv/autovec.md (lround2): New pattern for lround/lroundf.
	* config/riscv/riscv-protos.h (enum insn_type): New enum value.
	(expand_vec_lround): New func decl for expanding lround.
	* config/riscv/riscv-v.cc (expand_vec_lround): New func impl for expanding lround.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.

Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 10 +++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 10 +++ .../riscv/rvv/autovec/unop/math-lround-0.c| 19 + .../riscv/rvv/autovec/unop/math-lround-1.c| 19 + .../rvv/autovec/unop/math-lround-run-0.c | 72 +++ .../rvv/autovec/unop/math-lround-run-1.c | 72 +++ .../riscv/rvv/autovec/vls/math-lround-0.c | 30 .../riscv/rvv/autovec/vls/math-lround-1.c | 30 9 files changed, 264 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index ebc51ea69fd..33b11723c21 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2321,3 +2321,13 @@ (define_expand "lrint2" DONE; } ) + +(define_expand "lround2" + [(match_operand:0 "register_operand") + (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_lround (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 8c9f7e0ab11..b7eeeb8f55d 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -302,6 +302,7 @@ enum insn_type : unsigned int UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P, UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P, UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P, + UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P, UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P, UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | 
FRM_RUP_P, UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P, @@ -475,6 +476,7 @@ void expand_vec_round (rtx, rtx, machine_mode, machine_mode); void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode); void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode); void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode); +void expand_vec_lround (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index a75eb59eb43..b61c745678b 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -4122,4 +4122,14 @@ expand_vec_lrint (rtx op_0, rtx op_1, machine_mode vec_fp_mode, emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_DYN, vec_fp_mode); } +v
Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
In tree-vect-slp.cc:
vect_get_and_check_slp_defs
711:

      tree type = TREE_TYPE (oprnd);
      dt = dts[i];
      if ((dt == vect_constant_def
           || dt == vect_external_def)
          && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
          && (TREE_CODE (type) == BOOLEAN_TYPE
              || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
                                                  type)))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "Build SLP failed: invalid type of def "
                             "for variable-length SLP %T\n", oprnd);
          return -1;
        }

Here the mask = -1 built in tree-vect-patterns.cc has BOOLEAN type, so it reaches this condition and SLP fails:
Build SLP failed: invalid type of def

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-12 17:44
To: 钟居哲
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

On Thu, 12 Oct 2023, ??? wrote:

> Thanks Richi for pointing it out.
>
> I found this patch can't make conditional gather load succeed on SLP.
>
> I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
>
> If there is no condition mask, in tree-vect-patterns.cc, I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow naturally.
>
> If there is a condition mask, in tree-vect-patterns.cc, I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments, same as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP flow naturally.
>
> Is it reasonable ?

What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments even when the mask is -1?

> > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-10-11 20:50 > To: Juzhe-Zhong > CC: gcc-patches; richard.sandiford > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] > On Wed, 11 Oct 2023, Juzhe-Zhong wrote: > > > This patch fixes this following FAILs in RISC-V regression: > > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump > > vect "Loop contains only SLP stmts" > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only > > SLP stmts" > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump > > vect "Loop contains only SLP stmts" > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only > > SLP stmts" > > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. > > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in > > tree-vect-patterns.cc if it is same > > situation as GATHER_LOAD (no conditional mask). > > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask > > argument is a dummy mask. > > > > gcc/ChangeLog: > > > > * tree-vect-slp.cc (vect_get_operand_map): > > (vect_build_slp_tree_1): > > (vect_build_slp_tree_2): > > * tree-vect-stmts.cc (vectorizable_load): > > > > --- > > gcc/tree-vect-slp.cc | 18 -- > > gcc/tree-vect-stmts.cc | 4 ++-- > > 2 files changed, 18 insertions(+), 4 deletions(-) > > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > index fa098f9ff4e..712c04ec278 100644 > > --- a/gcc/tree-vect-slp.cc > > +++ b/gcc/tree-vect-slp.cc > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned > > char swap = 0) > >case IFN_MASK_GATHER_LOAD: > > return arg1_arg4_map; > > > > + case IFN_MASK_LEN_GATHER_LOAD: > > + /* In tree-vect-patterns.cc, we will have these 2 situations: > > + > > + - Unconditional gather load transforms > > + into MASK_LEN_GATHER_LOAD with dummy mask which is -1. 
> > + > > + - Conditional gather load transforms > > + into MASK_LEN_GATHER_LOAD with real conditional mask.*/ > > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map > > + : nullptr; > > + > >case IFN_MASK_STORE: > > return arg3_arg2_map; > > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char > > *swap, > > > >if (cfn == CFN_MASK_LOAD > >|| cfn == CFN_GATHER_LOAD > > - || cfn == CFN_MASK_GATHER_LOAD) &g
Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Hi, Richi.

I restricted it to vect_external_def as you suggested. Then this condition made SLP fail:

-  if (mask_index >= 0
+  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
       && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                   &mask, NULL, &mask_dt, &mask_vectype))
     return false;

So I added 'internal_fn_len_index (ifn) < 0' so that MASK_LEN_GATHER_LOAD does not check the scalar mask.

Then it ICEs here:

vect_slp_analyze_node_operations

      if (child
          && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
              || SLP_TREE_DEF_TYPE (child) == vect_external_def)
          /* Perform usual caching, note code-generation still
             code-gens these nodes multiple times but we expect
             to CSE them later.  */
          && !visited_set.add (child))
        {
          visited_vec.safe_push (child);
          /* ??? After auditing more code paths make a "default"
             and push the vector type from NODE to all children
             if it is not already set.  */
          /* Compute the number of vectors to be generated.  */
          tree vector_type = SLP_TREE_VECTYPE (child);
          if (!vector_type)
            {
              /* For shifts with a scalar argument we don't need
                 to cost or code-generate anything.
                 ??? Represent this more explicitely.  */
              gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))   <-- this assert fails
                           == shift_vec_info_type)
                          && j == 1);
              continue;
            }

Could you help me with that?


juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711:
>
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>        || dt == vect_external_def)
>       && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>       && (TREE_CODE (type) == BOOLEAN_TYPE
>           || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                               type)))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "Build SLP failed: invalid type of def "
>                          "for variable-length SLP %T\n", oprnd);
>       return -1;
>     }
>
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this
> condition, then SLP failed:
> Build SLP failed: invalid type of def

I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.

> juzhe.zh...@rivai.ai
>
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>
> > Thanks Richi point it out.
> >
> > I found this patch can't make conditional gather load succeed on SLP.
> >
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> >
> > If no condition mask, in tree-vect-patterns.cc, I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow
> > naturally.
> >
> > If has condition mask, in tree-vect-patterns.cc, I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP
> > flow naturally.
> >
> > Is it reasonable ?
>
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>
> > juzhe.zh...@rivai.ai
> >
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >
> > > This patch fixes this following FAILs in RISC-V regression:

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Oh, I see.

This is where vect_constant_def fails to SLP:

tree-vect-slp.cc: vect_build_slp_tree_2, line 2354:

      if (oprnd_info->first_dt == vect_external_def
          || oprnd_info->first_dt == vect_constant_def)
        {
          slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
          SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
          oprnd_info->ops = vNULL;
          children.safe_push (invnode);
          continue;
        }

It seems that we handle vect_constant_def the same way as vect_external_def.
Is that why SLP fails?

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711:
>
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>        || dt == vect_external_def)
>       && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>       && (TREE_CODE (type) == BOOLEAN_TYPE
>           || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                               type)))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "Build SLP failed: invalid type of def "
>                          "for variable-length SLP %T\n", oprnd);
>       return -1;
>     }
>
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this
> condition, then SLP failed:
> Build SLP failed: invalid type of def

I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.

> juzhe.zh...@rivai.ai
>
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>
> > Thanks Richi point it out.
> >
> > I found this patch can't make conditional gather load succeed on SLP.
> >
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> >
> > If no condition mask, in tree-vect-patterns.cc, I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow
> > naturally.
> >
> > If has condition mask, in tree-vect-patterns.cc, I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP
> > flow naturally.
> >
> > Is it reasonable ?
>
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>
> > juzhe.zh...@rivai.ai
> >
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >
> > > This patch fixes this following FAILs in RISC-V regression:
> > >
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only
> > > SLP stmts"
> > >
> > > The root cause of these FAIL is that GCC SLP failed on
> > > MASK_LEN_GATHER_LOAD.
> > >
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in
> > > tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > >
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask
> > > argument is a dummy mask.
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-vect-slp.cc (vect_get_operand_map):

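For readers following the "dummy mask (-1)" argument in the quoted patch description, here is a minimal scalar sketch in plain C of why a mask+length gather with an all-ones mask and a full length behaves like an unmasked gather. The names gather_model and mask_len_gather_model are hypothetical and are not GCC functions. As assumptions of the sketch, the offset is modelled as an element index multiplied by scale rather than a scaled byte offset, and inactive lanes are simply left untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of an unmasked gather:
   out[i] = base[offset[i] * scale] for every lane.  */
static void
gather_model (int *out, const int *base, const size_t *offset,
              size_t scale, size_t nlanes)
{
  assert (out && base && offset);
  for (size_t i = 0; i < nlanes; i++)
    out[i] = base[offset[i] * scale];
}

/* Hypothetical scalar model of a mask+length gather: lane i is loaded
   only when i < len and mask[i] is set.  Other lanes keep their old
   value, which is an assumption of this sketch.  */
static void
mask_len_gather_model (int *out, const int *base, const size_t *offset,
                       size_t scale, const bool *mask, size_t len,
                       size_t nlanes)
{
  assert (out && base && offset && mask);
  for (size_t i = 0; i < nlanes && i < len; i++)
    if (mask[i])
      out[i] = base[offset[i] * scale];
}
```

With mask[i] true for every lane and len equal to the lane count, both functions write the same values, which is the case the thread calls a dummy mask.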
Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
In tree-vect-stmts.cc, vect_check_scalar_mask fails here:

  /* If the caller is not prepared for adjusting an external/constant
     SLP mask vector type fail.  */
  if (slp_node
      && !mask_node
      && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "SLP mask argument is not vectorized.\n");
      return false;
    }

If we allow vect_constant_def, should we adjust the constant SLP mask in the
caller, "vectorizable_load"?

But I don't know how to adjust that.

juzhe.zh...@rivai.ai
Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
The mask node is NULL because the caller:

  if (mask_index >= 0
      && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                  &mask, NULL, &mask_dt, &mask_vectype))
    return false;

passes NULL as mask_node.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> In tree-vect-stmts.cc
>
> vect_check_scalar_mask
>
> Failed here:
>
>   /* If the caller is not prepared for adjusting an external/constant
>      SLP mask vector type fail.  */
>   if (slp_node
>       && !mask_node

^^^ where's the mask_node?

>       && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "SLP mask argument is not vectorized.\n");
>       return false;
>     }
>
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the
> caller "vectorizable_load" ?
>
> But I don't know how to adjust that.

Re: [PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization
OK.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-13 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization

From: Pan Li

This patch would like to support the FP lfloor/lfloorf auto vectorization.

* long lfloor (double) for rv64
* long lfloorf (float) for rv32

Due to the limitation that only the same size of data type are allowed
in the vectorizer, the standard name lfloormn2 only act on DF => DI for
rv64, and SF => SI for rv32.

Given we have code like:

void
test_lfloor (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lfloor (in[i]);
}

Before this patch:
.L3:
  ...
  fld      fa5,0(a1)
  fcvt.l.d a5,fa5,rdn
  sd       a5,-8(a0)
  ...
  bne      a1,a4,.L3

After this patch:
  frrm     a6
  ...
  fsrmi    2 // RDN
.L3:
  ...
  vsetvli  a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli  zero,a2,e64,m1,ta,ma
  vse32.v  v1,0(a0)
  ...
  bne      a2,zero,.L3
  ...
  fsrm     a6

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered by
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

  * config/riscv/autovec.md (lfloor2): New pattern for lfloor/lfloorf.
  * config/riscv/riscv-protos.h (enum insn_type): New enum value.
  (expand_vec_lfloor): New func decl for expanding lfloor.
  * config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl
  for expanding lfloor.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.

Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 11 +++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 10 +++ .../riscv/rvv/autovec/unop/math-lfloor-0.c| 19 + .../riscv/rvv/autovec/unop/math-lfloor-1.c| 19 + .../rvv/autovec/unop/math-lfloor-run-0.c | 69 +++ .../rvv/autovec/unop/math-lfloor-run-1.c | 69 +++ .../riscv/rvv/autovec/vls/math-lfloor-0.c | 30 .../riscv/rvv/autovec/vls/math-lfloor-1.c | 30 9 files changed, 259 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 267691a0095..c5b1e52cbf9 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2242,6 +2242,7 @@ (define_expand "avg3_ceil" ;; - lrint/lrintf ;; - irintf ;; - lceil/lceilf +;; - lfloor/lfloorf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2342,3 +2343,13 @@ (define_expand "lceil2" DONE; } ) + +(define_expand "lfloor2" + [(match_operand:0 "register_operand") + (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index ab65ab19524..49bdcdf2f93 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -304,6 +304,7 @@ enum insn_type : unsigned int UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P, UNARY_OP_FRM_RMM = UNARY_OP | 
FRM_RMM_P, UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P, + UNARY_OP_FRM_RDN = UNARY_OP | FRM_RDN_P, UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P, UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P, UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P, @@ -479,6 +480,7 @@ void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode); void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode); void expand_vec_lround (rtx, rtx, machine_mode, machine_mode); void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode); +void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/r
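
As a reading aid for the lfloor patch above, the following is a small host-side reference for the loop in the commit message, written without libm so it builds anywhere. It is only a sketch: lfloor_ref and test_lfloor_ref are hypothetical names, and the helper assumes the input is finite and fits in a long. Out-of-range and NaN inputs are not modelled.

```c
#include <assert.h>

/* Round toward negative infinity and convert to long.
   Assumes x is finite and representable as a long.  */
static long
lfloor_ref (double x)
{
  long t = (long) x;                    /* C conversion truncates toward zero.  */
  return (x < (double) t) ? t - 1 : t;  /* Step down for negative fractions.  */
}

/* Scalar counterpart of the test_lfloor loop from the commit message.  */
static void
test_lfloor_ref (long *out, const double *in, unsigned count)
{
  assert (out && in);
  for (unsigned i = 0; i < count; i++)
    out[i] = lfloor_ref (in[i]);
}
```

A reference like this could be used to compare against the vectorized loop's output for in-range inputs.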
Re: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-13 10:22
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases

From: Pan Li

Leverage stdint-gcc.h for the int64_t types instead of typedef.
Or we may have conflict with stdint-gcc.h in somewhere else.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include
  stdint-gcc.h for int types.
  * gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
  * gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t
  typedef.

Signed-off-by: Pan Li
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c | 1 +
 .../gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c       | 1 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h     | 2 --
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
index 2d90d232ba1..4bf125f8cc8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
@@ -2,6 +2,7 @@
 /* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
+#include <stdint-gcc.h>
 #include "test-math.h"
 
 /*
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
index 6b69f5568e9..409175a8dff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { riscv_v && rv64 } } } */
 /* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
 
+#include <stdint-gcc.h>
 #include "test-math.h"
 
 #define ARRAY_SIZE 128
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
index 3867bc50a14..a1c9d55bd48 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
@@ -68,8 +68,6 @@
 #define FRM_RMM 4
 #define FRM_DYN 7
 
-typedef long long int64_t;
-
 static inline void
 set_rm (unsigned rm)
 {
-- 
2.34.1
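
To illustrate the conflict this patch avoids: on LP64 targets the standard header typically defines int64_t as long, so a separate "typedef long long int64_t;" in the same translation unit is a conflicting redefinition even though both types are 64 bits wide. The sketch below only inspects which underlying type the header chose. It uses the hosted <stdint.h> rather than stdint-gcc.h, needs C11 for _Generic, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Map the underlying type of an expression to a readable name.  */
#define INT64_UNDERLYING(x) \
  _Generic ((x), long: "long", long long: "long long", default: "other")

/* Report which C type this platform's header uses for int64_t.  */
static const char *
int64_underlying_name (void)
{
  return INT64_UNDERLYING ((int64_t) 0);
}

/* A hand-written typedef only agrees with the header when the names match.  */
static int
long_long_typedef_would_conflict (void)
{
  assert (sizeof (int64_t) * CHAR_BIT == 64);
  return strcmp (int64_underlying_name (), "long long") != 0;
}
```

Where long_long_typedef_would_conflict returns nonzero, including the header and also keeping the hand-written typedef would not compile, which is the situation the commit message warns about.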
Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Hi, Richi.

As you suggested, I keep the MASK_LEN_GATHER_LOAD (..., -1) format and support
SLP for it in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632846.html

Thanks.

juzhe.zh...@rivai.ai
Re: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV
Thanks. Committed.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-10-13 14:01
To: Juzhe-Zhong
CC: GCC Patches; Jeff Law; Richard Biener
Subject: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

LGTM

On Thu, 12 Oct 2023 at 22:45, Juzhe-Zhong wrote:

Like ARM SVE and GCN, add RVV.

gcc/testsuite/ChangeLog:

  * gcc.dg/vect/bb-slp-pr69907.c: Add RVV.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
index b348526b62f..f63b42a271a 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
@@ -22,5 +22,5 @@ void foo(unsigned *p1, unsigned short *p2)
 /* Disable for SVE because for long or variable-length vectors we don't
    get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
    because there we can vectorize the epilogue using mixed vector sizes.
-   Likewise for AMD GCN.  */
-/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { ! amdgcn*-*-* } } } } } */
+   Likewise for AMD GCN and RVV.  */
+/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { { ! amdgcn*-*-* } && { ! riscv_v } } } } } } */
-- 
2.36.3
Re: [PATCH v1] RISC-V: Add test for FP llround auto vectorization
OK juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-13 14:15 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Add test for FP llround auto vectorization From: Pan Li The below FP API are supported already by sharing the same standard name, as well as the machine mode. long long llround (double); This patch would like to add the test cases for ensuring the correctness. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-llround-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-llround-0.c: New test. Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-llround-0.c | 20 ++ .../rvv/autovec/unop/math-llround-run-0.c | 64 +++ .../riscv/rvv/autovec/vls/math-llround-0.c| 30 + 3 files changed, 114 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llround-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c new file mode 100644 index 000..4f8b4553a91 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include +#include "test-math.h" + +/* +** test_double_int64_t___builtin_llround: +** frrm\s+[atx][0-9]+ +** ... +** fsrmi\s+4 +** ... +** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma +** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+ +** ... 
+** fsrm\s+[atx][0-9]+ +** ret +*/ +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llround) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c new file mode 100644 index 000..c5b60847cc7 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c @@ -0,0 +1,64 @@ +/* { dg-do run { target { riscv_v && rv64 } } } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ + +#include +#include "test-math.h" + +#define ARRAY_SIZE 128 + +double in[ARRAY_SIZE]; +int64_t out[ARRAY_SIZE]; +int64_t ref[ARRAY_SIZE]; + +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llround) +TEST_ASSERT (int64_t) + +TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llround (1.2), 1) +TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llround (-1.2), 2) +TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llround (0.5), 3) +TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llround (-0.5), 4) +TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llround (0.1), 5) +TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llround (-0.1), 6) +TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llround (3.0), 7) +TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llround (-3.0), 8) +TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llround (4503599627370495.5), 9) +TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llround (4503599627370497.0), 10) +TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llround (-4503599627370495.5), 11) +TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llround (-4503599627370496.0), 12) +TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llround (-0.0), 13) +TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llround (-0.0), 14) +TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llround (9223372036854774784.0), 15) +TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, 
0x7fff, 16) +TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llround (-9223372036854775808.0), 17) +TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, 0x8000, 18) +TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llround (__builtin_inf ()), 19) +TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llround (-__builtin_inf ()), 20) +TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fff, 21) + +int +main () +{ + RUN_TEST_CVT (double, int64_t, 1, __builtin_llround, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 2, __builtin_llround, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 3, __builtin_llround, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 4, __builtin_llround, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 5, __builtin_llround, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int6
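
For the in-range rows of the llround test above, the expected values follow round-half-away-from-zero, which matches the "fsrmi 4" (RMM) in the expected assembly. The sketch below is a libm-free reference for that rounding. llround_ref is a hypothetical name, and the helper deliberately rejects NaN and out-of-range inputs, so the saturating rows of the test table are not modelled here.

```c
#include <assert.h>

/* Round to nearest, ties away from zero, then convert to long long.
   Assumes x is not NaN and lies well inside the long long range.  */
static long long
llround_ref (double x)
{
  assert (x == x && x > -9.2e18 && x < 9.2e18);
  long long t = (long long) x;    /* Truncates toward zero.  */
  double frac = x - (double) t;   /* Exact for in-range x.  */
  if (frac >= 0.5)
    return t + 1;
  if (frac <= -0.5)
    return t - 1;
  return t;
}
```

Working from the fractional part avoids the classic pitfall of computing x + 0.5, which rounds 0.49999999999999994 up to 1.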
Re: [PATCH v1] RISC-V: Add test for FP llceil auto vectorization
OK juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-13 15:20 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Add test for FP llceil auto vectorization From: Pan Li The below FP API are supported already by sharing the same standard name, as well as the machine mode. long long llceil (double); This patch would like to add the test cases for ensuring the correctness. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-llceil-0.c: New test. Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-llceil-0.c| 20 ++ .../rvv/autovec/unop/math-llceil-run-0.c | 64 +++ .../riscv/rvv/autovec/vls/math-llceil-0.c | 30 + 3 files changed, 114 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llceil-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c new file mode 100644 index 000..3480c3ea91d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include +#include "test-math.h" + +/* +** test_double_int64_t___builtin_llceil: +** frrm\s+[atx][0-9]+ +** ... +** fsrmi\s+3 +** ... +** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma +** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+ +** ... 
+** fsrm\s+[atx][0-9]+ +** ret +*/ +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llceil) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c new file mode 100644 index 000..5ccbe64ffb5 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c @@ -0,0 +1,64 @@ +/* { dg-do run { target { riscv_v && rv64 } } } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ + +#include +#include "test-math.h" + +#define ARRAY_SIZE 128 + +double in[ARRAY_SIZE]; +int64_t out[ARRAY_SIZE]; +int64_t ref[ARRAY_SIZE]; + +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llceil) +TEST_ASSERT (int64_t) + +TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llceil (1.2), 1) +TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llceil (-1.2), 2) +TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llceil (0.5), 3) +TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llceil (-0.5), 4) +TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llceil (0.1), 5) +TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llceil (-0.1), 6) +TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llceil (3.0), 7) +TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llceil (-3.0), 8) +TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llceil (4503599627370495.5), 9) +TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llceil (4503599627370497.0), 10) +TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llceil (-4503599627370495.5), 11) +TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llceil (-4503599627370496.0), 12) +TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llceil (-0.0), 13) +TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llceil (-0.0), 14) +TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llceil (9223372036854774784.0), 15) +TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, 0x7fff, 16) 
+TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llceil (-9223372036854775808.0), 17) +TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, 0x8000, 18) +TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llceil (__builtin_inf ()), 19) +TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llceil (-__builtin_inf ()), 20) +TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fff, 21) + +int +main () +{ + RUN_TEST_CVT (double, int64_t, 1, __builtin_llceil, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 2, __builtin_llceil, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 3, __builtin_llceil, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 4, __builtin_llceil, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 5, __builtin_llceil, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 6, __builtin_llceil, in, out, ref, AR
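For readers following the expected assembly in this patch: `fsrmi 3` selects the RISC-V round-up (RUP) rounding mode, and `vfcvt.x.f.v` then converts using that dynamic rounding mode, which is what makes the conversion behave like llceil. Below is a minimal scalar sketch of the in-range semantics that the run test checks. The helper names `ref_llceil` and `check_ref_llceil` are hypothetical (they are not part of test-math.h), and the saturating out-of-range cases, infinities and NaN are deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for llceil on in-range inputs:
   truncate toward zero, then step up by one if truncation dropped a
   positive fraction.  Out-of-range inputs, infinities and NaN are
   not handled here.  */
static int64_t
ref_llceil (double x)
{
  int64_t t = (int64_t) x;	/* C conversion rounds toward zero.  */
  return t + (x > (double) t);	/* Adjust when x lies above t.  */
}

/* Spot-check a few of the inputs used by the run test.  */
static void
check_ref_llceil (void)
{
  assert (ref_llceil (1.2) == 2);
  assert (ref_llceil (-1.2) == -1);
  assert (ref_llceil (-0.5) == 0);
  assert (ref_llceil (3.0) == 3);
  /* 2^52 - 0.5 is the largest double that still has a fraction.  */
  assert (ref_llceil (4503599627370495.5) == 4503599627370496);
}
```

Calling `check_ref_llceil ()` from a test driver exercises the same small and 2^52-boundary inputs as cases 1 to 9 of the run test, under the assumptions above.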
Re: [PATCH v1] RISC-V: Add test for FP iceil auto vectorization
Ok juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-13 16:06 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Add test for FP iceil auto vectorization From: Pan Li The below FP API are supported already by sharing the same standard name, as well as the machine mode. int iceil (float); This patch would like to add the test cases for ensuring the correctness. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-iceil-0.c: New test. Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-iceil-0.c | 19 ++ .../riscv/rvv/autovec/unop/math-iceil-run-0.c | 63 +++ .../riscv/rvv/autovec/vls/math-iceil-0.c | 30 + 3 files changed, 112 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-iceil-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c new file mode 100644 index 000..2d4a1d163d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "test-math.h" + +/* +** test_float_int___builtin_iceilf: +** frrm\s+[atx][0-9]+ +** ... +** fsrmi\s+3 +** ... +** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma +** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+ +** ... 
+** fsrm\s+[atx][0-9]+ +** ret +*/ +TEST_UNARY_CALL_CVT (float, int, __builtin_iceilf) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c new file mode 100644 index 000..714173a7f8b --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c @@ -0,0 +1,63 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ + +#include "test-math.h" + +#define ARRAY_SIZE 128 + +float in[ARRAY_SIZE]; +int out[ARRAY_SIZE]; +int ref[ARRAY_SIZE]; + +TEST_UNARY_CALL_CVT (float, int, __builtin_iceilf) +TEST_ASSERT (int) + +TEST_INIT_CVT (float, 1.2, int, __builtin_iceilf (1.2), 1) +TEST_INIT_CVT (float, -1.2, int, __builtin_iceilf (-1.2), 2) +TEST_INIT_CVT (float, 0.5, int, __builtin_iceilf (0.5), 3) +TEST_INIT_CVT (float, -0.5, int, __builtin_iceilf (-0.5), 4) +TEST_INIT_CVT (float, 0.1, int, __builtin_iceilf (0.1), 5) +TEST_INIT_CVT (float, -0.1, int, __builtin_iceilf (-0.1), 6) +TEST_INIT_CVT (float, 3.0, int, __builtin_iceilf (3.0), 7) +TEST_INIT_CVT (float, -3.0, int, __builtin_iceilf (-3.0), 8) +TEST_INIT_CVT (float, 8388607.5, int, __builtin_iceilf (8388607.5), 9) +TEST_INIT_CVT (float, 8388609.0, int, __builtin_iceilf (8388609.0), 10) +TEST_INIT_CVT (float, -8388607.5, int, __builtin_iceilf (-8388607.5), 11) +TEST_INIT_CVT (float, -8388609.0, int, __builtin_iceilf (-8388609.0), 12) +TEST_INIT_CVT (float, 0.0, int, __builtin_iceilf (-0.0), 13) +TEST_INIT_CVT (float, -0.0, int, __builtin_iceilf (-0.0), 14) +TEST_INIT_CVT (float, 2147483520.0, int, __builtin_iceilf (2147483520.0), 15) +TEST_INIT_CVT (float, 2147483648.0, int, 0x7fff, 16) +TEST_INIT_CVT (float, -2147483648.0, int, __builtin_iceilf (-2147483648.0), 17) +TEST_INIT_CVT (float, -2147483904.0, int, 0x8000, 18) +TEST_INIT_CVT (float, __builtin_inf (), int, __builtin_iceilf (__builtin_inff ()), 19) 
+TEST_INIT_CVT (float, -__builtin_inf (), int, __builtin_iceilf (-__builtin_inff ()), 20) +TEST_INIT_CVT (float, __builtin_nanf (""), int, 0x7fff, 21) + +int +main () +{ + RUN_TEST_CVT (float, int, 1, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 2, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 3, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 4, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 5, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 6, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 7, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 8, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 9, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 10, __builtin_iceilf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 11, __builtin_iceilf, in, out, r
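The float inputs in this run test appear to be chosen at the edges of single-precision representability, although the patch does not say so explicitly: 8388607.5 is 2^23 - 0.5 (the largest float that still has a fractional part), 2147483520.0 is 2^31 - 128 (the largest float below 2^31), and -2147483904.0 is -(2^31 + 256). A small sketch that checks these facts follows; `exact_in_float` and `check_float_boundaries` are hypothetical helper names used only for illustration.

```c
#include <assert.h>

/* Return 1 if the double V is exactly representable as a float,
   i.e. it survives a round trip through float unchanged.  */
static int
exact_in_float (double v)
{
  volatile float f = (float) v;	/* Force rounding to float precision.  */
  return (double) f == v;
}

/* Check the boundary values used by the iceil run test.  */
static void
check_float_boundaries (void)
{
  assert (exact_in_float (8388607.5));		/* 2^23 - 0.5.  */
  assert (!exact_in_float (8388608.5));		/* No fractions above 2^23.  */
  assert (exact_in_float (2147483520.0));	/* 2^31 - 128.  */
  assert (!exact_in_float (2147483584.0));	/* 2^31 - 64 is not a float.  */
  assert (exact_in_float (-2147483904.0));	/* -(2^31 + 256).  */
}
```

This is why an input such as 8388609.0 already has no fraction to round, while 2147483648.0 and -2147483904.0 fall outside the int range and are expected to saturate.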
Re: [PATCH v1] RISC-V: Add test for FP ifloor auto vectorization
OK juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-13 16:23 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Add test for FP ifloor auto vectorization From: Pan Li The below FP API are supported already by sharing the same standard name, as well as the machine mode. int ifloor (float); This patch would like to add the test cases for ensuring the correctness. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-ifloor-0.c: New test. Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-ifloor-0.c| 19 ++ .../rvv/autovec/unop/math-ifloor-run-0.c | 63 +++ .../riscv/rvv/autovec/vls/math-ifloor-0.c | 30 + 3 files changed, 112 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ifloor-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c new file mode 100644 index 000..b9ec415d690 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "test-math.h" + +/* +** test_float_int___builtin_ifloorf: +** frrm\s+[atx][0-9]+ +** ... +** fsrmi\s+2 +** ... +** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma +** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+ +** ... 
+** fsrm\s+[atx][0-9]+ +** ret +*/ +TEST_UNARY_CALL_CVT (float, int, __builtin_ifloorf) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c new file mode 100644 index 000..8ef4da0ea88 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c @@ -0,0 +1,63 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ + +#include "test-math.h" + +#define ARRAY_SIZE 128 + +float in[ARRAY_SIZE]; +int out[ARRAY_SIZE]; +int ref[ARRAY_SIZE]; + +TEST_UNARY_CALL_CVT (float, int, __builtin_ifloorf) +TEST_ASSERT (int) + +TEST_INIT_CVT (float, 1.2, int, __builtin_ifloorf (1.2), 1) +TEST_INIT_CVT (float, -1.2, int, __builtin_ifloorf (-1.2), 2) +TEST_INIT_CVT (float, 0.5, int, __builtin_ifloorf (0.5), 3) +TEST_INIT_CVT (float, -0.5, int, __builtin_ifloorf (-0.5), 4) +TEST_INIT_CVT (float, 0.1, int, __builtin_ifloorf (0.1), 5) +TEST_INIT_CVT (float, -0.1, int, __builtin_ifloorf (-0.1), 6) +TEST_INIT_CVT (float, 3.0, int, __builtin_ifloorf (3.0), 7) +TEST_INIT_CVT (float, -3.0, int, __builtin_ifloorf (-3.0), 8) +TEST_INIT_CVT (float, 8388607.5, int, __builtin_ifloorf (8388607.5), 9) +TEST_INIT_CVT (float, 8388609.0, int, __builtin_ifloorf (8388609.0), 10) +TEST_INIT_CVT (float, -8388607.5, int, __builtin_ifloorf (-8388607.5), 11) +TEST_INIT_CVT (float, -8388609.0, int, __builtin_ifloorf (-8388609.0), 12) +TEST_INIT_CVT (float, 0.0, int, __builtin_ifloorf (-0.0), 13) +TEST_INIT_CVT (float, -0.0, int, __builtin_ifloorf (-0.0), 14) +TEST_INIT_CVT (float, 2147483520.0, int, __builtin_ifloorf (2147483520.0), 15) +TEST_INIT_CVT (float, 2147483648.0, int, 0x7fff, 16) +TEST_INIT_CVT (float, -2147483648.0, int, __builtin_ifloorf (-2147483648.0), 17) +TEST_INIT_CVT (float, -2147483904.0, int, 0x8000, 18) +TEST_INIT_CVT (float, __builtin_inf (), int, __builtin_ifloorf (__builtin_inff 
()), 19) +TEST_INIT_CVT (float, -__builtin_inf (), int, __builtin_ifloorf (-__builtin_inff ()), 20) +TEST_INIT_CVT (float, __builtin_nanf (""), int, 0x7fff, 21) + +int +main () +{ + RUN_TEST_CVT (float, int, 1, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 2, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 3, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 4, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 5, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 6, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 7, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 8, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 9, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (float, int, 10, __builtin_ifloorf, in, out, ref, ARRAY_SIZE); + RUN
Re: [PATCH v1] RISC-V: Add test for FP llfloor auto vectorization
OK juzhe.zh...@rivai.ai From: pan2.li Date: 2023-10-13 17:49 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Add test for FP llfloor auto vectorization From: Pan Li The below FP API are supported already by sharing the same standard name, as well as the machine mode. long long llfloor (double); This patch would like to add the test cases for ensuring the correctness. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-llfloor-0.c: New test. Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-llfloor-0.c | 20 ++ .../rvv/autovec/unop/math-llfloor-run-0.c | 64 +++ .../riscv/rvv/autovec/vls/math-llfloor-0.c| 30 + 3 files changed, 114 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llfloor-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c new file mode 100644 index 000..4b10f966015 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include +#include "test-math.h" + +/* +** test_double_int64_t___builtin_llfloor: +** frrm\s+[atx][0-9]+ +** ... +** fsrmi\s+2 +** ... +** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma +** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+ +** ... 
+** fsrm\s+[atx][0-9]+ +** ret +*/ +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llfloor) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c new file mode 100644 index 000..22829132e96 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c @@ -0,0 +1,64 @@ +/* { dg-do run { target { riscv_v && rv64 } } } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ + +#include +#include "test-math.h" + +#define ARRAY_SIZE 128 + +double in[ARRAY_SIZE]; +int64_t out[ARRAY_SIZE]; +int64_t ref[ARRAY_SIZE]; + +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llfloor) +TEST_ASSERT (int64_t) + +TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llfloor (1.2), 1) +TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llfloor (-1.2), 2) +TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llfloor (0.5), 3) +TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llfloor (-0.5), 4) +TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llfloor (0.1), 5) +TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llfloor (-0.1), 6) +TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llfloor (3.0), 7) +TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llfloor (-3.0), 8) +TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llfloor (4503599627370495.5), 9) +TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llfloor (4503599627370497.0), 10) +TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llfloor (-4503599627370495.5), 11) +TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llfloor (-4503599627370496.0), 12) +TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llfloor (-0.0), 13) +TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llfloor (-0.0), 14) +TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llfloor (9223372036854774784.0), 15) +TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, 
0x7fff, 16) +TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llfloor (-9223372036854775808.0), 17) +TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, 0x8000, 18) +TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llfloor (__builtin_inf ()), 19) +TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llfloor (-__builtin_inf ()), 20) +TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fff, 21) + +int +main () +{ + RUN_TEST_CVT (double, int64_t, 1, __builtin_llfloor, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 2, __builtin_llfloor, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 3, __builtin_llfloor, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 4, __builtin_llfloor, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int64_t, 5, __builtin_llfloor, in, out, ref, ARRAY_SIZE); + RUN_TEST_CVT (double, int6
Re: Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.
Thanks Robin.

Committed.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-16 17:12
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

Hi Juzhe,

this LGTM. I was first concerned whether we would want to stop e.g. at LMUL = 1 and only continue with a specific flag but actually this should be done via the costs. If an implementation wants to penalize or incentivize some behavior it can always adjust the costs which should be sufficient.

Regards
 Robin
Re: [PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store
V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633120.html with some bug fixes.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-16 11:57
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

Consider the following case:

int bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
    {
      sum1 += x[2*i] - a;
      sum1 += x[2*i+1] * b;
      sum2 += x[2*i] - b;
      sum2 += x[2*i+1] * a;
    }
  return sum1 + sum2;
}

Before this patch:

bar:
        ble a3,zero,.L5
        csrr t0,vlenb
        csrr a6,vlenb
        slli t1,t0,3
        vsetvli a5,zero,e32,m4,ta,ma
        sub sp,sp,t1
        vid.v v20
        vmv.v.x v12,a1
        vand.vi v4,v20,1
        vmv.v.x v16,a2
        vmseq.vi v4,v4,1
        slli t3,a6,2
        vsetvli zero,a5,e32,m4,ta,ma
        vmv1r.v v0,v4
        viota.m v8,v4
        add a7,t3,sp
        vsetvli a5,zero,e32,m4,ta,mu
        vand.vi v28,v20,-2
        vadd.vi v4,v28,1
        vs4r.v v20,0(a7)        -> spill
        vrgather.vv v24,v12,v8
        vrgather.vv v20,v16,v8
        vrgather.vv v24,v16,v8,v0.t
        vrgather.vv v20,v12,v8,v0.t
        vs4r.v v4,0(sp)         -> spill
        slli a3,a3,1
        addi t4,a6,-1
        neg t1,a6
        vmv4r.v v0,v20
        vmv.v.i v4,0
        j .L4
.L13:
        vsetvli a5,zero,e32,m4,ta,ma
.L4:
        mv a7,a3
        mv a4,a3
        bleu a3,a6,.L3
        csrr a4,vlenb
.L3:
        vmv.v.x v8,t4
        vl4re32.v v12,0(sp)     -> spill
        vand.vv v20,v28,v8
        vand.vv v8,v12,v8
        vsetvli zero,a4,e32,m4,ta,ma
        vle32.v v16,0(a0)
        vsetvli a5,zero,e32,m4,ta,ma
        add a3,a3,t1
        vrgather.vv v12,v16,v20
        add a0,a0,t3
        vrgather.vv v20,v16,v8
        vsub.vv v12,v12,v0
        vsetvli zero,a4,e32,m4,tu,ma
        vadd.vv v4,v4,v12
        vmacc.vv v4,v24,v20
        bgtu a7,a6,.L13
        csrr a1,vlenb
        slli a1,a1,2
        add a1,a1,sp
        li a4,-1
        csrr t0,vlenb
        vsetvli a5,zero,e32,m4,ta,ma
        vl4re32.v v12,0(a1)     -> spill
        vmv.v.i v8,0
        vmul.vx v0,v12,a4
        li a2,0
        slli t1,t0,3
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmseq.vv v0,v0,v8
        vand.vi v12,v12,1
        vmerge.vvm v16,v8,v4,v0
        vmseq.vv v12,v12,v8
        vmv.s.x v1,a2
        vmv1r.v v0,v12
        vredsum.vs v16,v16,v1
        vmerge.vvm v8,v8,v4,v0
        vmv.x.s a0,v16
        vredsum.vs v8,v8,v1
        vmv.x.s a5,v8
        add sp,sp,t1
        addw a0,a0,a5
        jr ra
.L5:
        li a0,0
        ret

We can see that there are multiple horrible register spills.

The root cause of this issue is that, for a scalar IR load:

  _5 = *_4;

we didn't check whether it is a contiguous load/store or a gather/scatter load/store, since it will be translated into either:

  1. MASK_LEN_GATHER_LOAD (..., perm indices).
  2. Contiguous load/store + VEC_PERM (..., perm indices)

It's obvious that in either situation we end up consuming one vector register group (the perm indices) that we didn't count before.

So in this case we pick LMUL = 4, which is an incorrect choice for the dynamic LMUL cost model.

The key of this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
      ...
    }

Add one more register consumption if it is not an adjacent load/store.

After this patch, it picks LMUL = 2, which is optimal:

bar:
        ble a3,zero,.L4
        csrr a6,vlenb
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.x v6,a2
        srli a2,a6,1
        vmv.v.x v4,a1
        vid.v v12
        slli a3,a3,1
        vand.vi v0,v12,1
        addi t1,a2,-1
        vmseq.vi v0,v0,1
        slli a6,a6,1
        vsetvli zero,a5,e32,m2,ta,ma
        neg a7,a2
        viota.m v2,v0
        vsetvli a5,zero,e32,m2,ta,mu
        vrgather.vv v16,v4,v2
        vrgather.vv v14,v6,v2
        vrgather.vv v16,v6,v2,v0.t
        vrgather.vv v14,v4,v2,v0.t
        vand.vi v18,v12,-2
        vmv.v.i v2,0
        vadd.vi v20,v18,1
.L3:
        minu a4,a3,a2
        vsetvli zero,a4,e32,m2,ta,ma
        vle32.v v8,0(a0)
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.x v4,t1
        vand.vv v10,v18,v4
        vrgather.vv v6,v8,v10
        vsub.vv v6,v6,v14
        vsetvli zero,a4,e32,m2,tu,ma
        vadd.vv v2,v2,v6
        vsetvli a1,zero,e32,m2,ta,ma
        vand.vv v4,v20,v4
        vrgather.vv v6,v8,v4
        vsetvli zero,a4,e32,m2,tu,ma
        mv a4,a3
        add a0,a0,a6
        add a3,a3,a7
        vmacc.vv v2,v16,v6
        bgtu a4,a2,.L3
        vsetvli a1,zero,e32,m2,ta,ma
        vand.vi v0,v12,1
        vmv.v.i v4,0
        li a3,-1
        vmseq.vv v0,v0,v4
        vmv.s.x v1,zero
        vmerge.vvm v6,v4,v2,v0
        vredsum.vs v6,v6,v1
        vmul.vx v0,v12,a3
        vadd.vi v0,v0,-1
        van
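The register-pressure argument in this message can be illustrated with a deliberately simplified model. It assumes 32 vector registers, that every live vector value occupies one group of LMUL registers, and that a non-adjacent access needs one extra group for its permutation indices. `pick_lmul` and its parameters are hypothetical names, and the live-value counts used with it are made up for illustration; this is not the code in riscv-vector-costs.cc.

```c
#include <assert.h>

/* Toy model: pick the largest LMUL in {8, 4, 2, 1} whose live register
   groups still fit into the 32 RVV vector registers.  LIVE_VALUES is
   the number of simultaneously live vector values; a nonzero
   NON_ADJACENT adds one group for the permutation indices.  */
static int
pick_lmul (int live_values, int non_adjacent)
{
  int groups = live_values + (non_adjacent ? 1 : 0);
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (groups * lmul <= 32)
      return lmul;
  return 1;
}
```

With eight live values, this model picks LMUL = 4 when the index group is ignored and LMUL = 2 once it is counted, which mirrors the m4 to m2 change described above. The real cost model works on live ranges per program point rather than a single count.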
[PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
Hi, Richard.

>> Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
>> it solve some of these problems?

Yes, IFN_COND_LEN makes sense for RVV. We have the vmerge instruction, which depends on VL/AVL.

I should mention that my internal RVV GCC has IFN_LEN_VCOND_MASK, which simplifies

COND_LEN_ADD (mask, a, 0, b, len, bias) into LEN_VCOND_MASK (mask, a, b, len, bias)

I think upstream GCC could consider this approach.

Thanks.

juzhe.zh...@rivai.ai
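As a scalar illustration of the simplification mentioned in this message: assuming COND_LEN_ADD computes the sum for active lanes below len + bias and takes the else value otherwise, adding zero reduces to a length-limited select. The function names and the fixed lane count below are hypothetical, and LEN_VCOND_MASK is modelled only from the description in this message, since it is an internal function of the author's tree.

```c
#include <assert.h>

#define LANES 8	/* Hypothetical fixed vector length for the model.  */

/* Scalar model of COND_LEN_ADD (mask, a, b, els, len, bias).  */
static void
cond_len_add (int *out, const int *mask, const int *a, const int *b,
	      const int *els, int len, int bias)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (i < len + bias && mask[i]) ? a[i] + b[i] : els[i];
}

/* Scalar model of LEN_VCOND_MASK (mask, a, els, len, bias).  */
static void
len_vcond_mask (int *out, const int *mask, const int *a, const int *els,
		int len, int bias)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (i < len + bias && mask[i]) ? a[i] : els[i];
}

/* Return 1 if both forms agree when the addend is all zeros.  */
static int
forms_agree (const int *mask, const int *a, const int *els, int len)
{
  int zero[LANES] = { 0 }, r1[LANES], r2[LANES];
  cond_len_add (r1, mask, a, zero, els, len, 0);
  len_vcond_mask (r2, mask, a, els, len, 0);
  for (int i = 0; i < LANES; i++)
    if (r1[i] != r2[i])
      return 0;
  return 1;
}
```

The model uses integer lanes on purpose, so that adding zero is an exact identity and the two forms can be compared element by element.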
Re: Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Hi, Richard. >> slp_op and mask_vectype are only initialised when mask_index >= 0. >>Shouldn't this code be under mask_index >= 0 too? >>Also, when do we encounter mismatched mask_vectypes? Presumably the SLP >>node has a known vectype by this point. I think a comment would be useful. I have addressed the comment, and I think we won't encounter mismatched mask_vectypes. So I changed the code in V4 as follows: + if (mask_index >= 0 && slp_node) + { + bool match_p + = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype); + gcc_assert (match_p); + } https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633209.html This asserts that we always match mask_vectype. Tested on RISC-V; bootstrap and regtest on x86 passed. Could you confirm it? juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2023-10-17 05:34 To: Juzhe-Zhong CC: gcc-patches; rguenther Subject: Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] Juzhe-Zhong writes: > This patch fixes this following FAILs in RISC-V regression: > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump > vect "Loop contains only SLP stmts" > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP > stmts" > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump > vect "Loop contains only SLP stmts" > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP > stmts" > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. > > We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD: > > 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, > condtional mask). > >This situation we just need to leverage the current MASK_GATHER_LOAD which > can achieve SLP MASK_LEN_GATHER_LOAD. > > 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, > zero, -1) > >Current SLP check will failed on dummy mask -1, so we relax the check in > tree-vect-slp.cc and allow it to be materialized.
> > Consider this following case: > > void __attribute__((noipa)) > f (int *restrict y, int *restrict x, int *restrict indices, int n) > { > for (int i = 0; i < n; ++i) > { > y[i * 2] = x[indices[i * 2]] + 1; > y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2; > } > } > > https://godbolt.org/z/WG3M3n7Mo > > GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES: > > f: > ble a3,zero,.L5 > .L3: > vsetvli a5,a3,e8,mf4,ta,ma > vsetvli zero,a5,e32,m1,ta,ma > vlseg2e32.v v6,(a2) > vsetvli a4,zero,e64,m2,ta,ma > vsext.vf2 v2,v6 > vsll.vi v2,v2,2 > vsetvli zero,a5,e32,m1,ta,ma > vluxei64.v v1,(a1),v2 > vsetvli a4,zero,e64,m2,ta,ma > vsext.vf2 v2,v7 > vsetvli zero,zero,e32,m1,ta,ma > vadd.vi v4,v1,1 > vsetvli zero,zero,e64,m2,ta,ma > vsll.vi v2,v2,2 > vsetvli zero,a5,e32,m1,ta,ma > vluxei64.v v2,(a1),v2 > vsetvli a4,zero,e32,m1,ta,ma > sllia6,a5,3 > vadd.vi v5,v2,2 > sub a3,a3,a5 > vsetvli zero,a5,e32,m1,ta,ma > vsseg2e32.v v4,(a0) > add a2,a2,a6 > add a0,a0,a6 > bne a3,zero,.L3 > .L5: > ret > > After this patch: > > f: > ble a3,zero,.L5 > li a5,1 > csrr t1,vlenb > slli a5,a5,33 > srli a7,t1,2 > addi a5,a5,1 > slli a3,a3,1 > neg t3,a7 > vsetvli a4,zero,e64,m1,ta,ma > vmv.v.x v4,a5 > .L3: > minu a5,a3,a7 > vsetvli zero,a5,e32,m1,ta,ma > vle32.v v1,0(a2) > vsetvli a4,zero,e64,m2,ta,ma > vsext.vf2 v2,v1 > vsll.vi v2,v2,2 > vsetvli zero,a5,e32,m1,ta,ma > vluxei64.v v2,(a1),v2 > vsetvli a4,zero,e32,m1,ta,ma > mv a6,a3 > vadd.vv v2,v2,v4 > vsetvli zero,a5,e32,m1,ta,ma > vse32.v v2,0(a0) > add a2,a2,t1 > add a0,a0,t1 > add a3,a3,t3 > bgtu a6,a7,.L3 > .L5: > ret > > Note that I found we are missing conditional mask gather_load SLP test, > Append a test for it in this patch. Yeah, we're missing a target-independent test. I'm afraid I used aarch64-specific tests for a lot of this stuff, since (a) I wanted to check the quality of the asm output and (b) it's very hard to write gcc.dg/vect tests that don't fail on some target or other. Thanks for picking this up. 
> > Tested on RISC-V and Bootstrap && Regression on X86 passed. > > Ok for trunk ? > > gcc/ChangeLog: > > * tree-vect-slp.cc (vect_get_operand_map): Add M
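The two MASK_LEN_GATHER_LOAD situations described in the quoted patch (a real conditional mask versus the dummy all-ones mask -1) can be modelled in scalar C. This is only a sketch under stated assumptions: the helper name, the fixed int element type, and the choice to leave inactive lanes untouched are all hypothetical, and the zero-extension operand is omitted.

```c
#include <assert.h>

/* Scalar model of a masked, length-limited gather load of ints.
   Lane i is loaded from BASE + OFFSET[i] * SCALE (in bytes) when
   i < LEN + BIAS and MASK[i] is nonzero; other lanes keep their old
   contents in this model.  */
static void
mask_len_gather_load (int *dest, const int *base, const long *offset,
		      long scale, const int *mask, int len, int bias,
		      int lanes)
{
  for (int i = 0; i < lanes; i++)
    if (i < len + bias && mask[i])
      dest[i] = *(const int *) ((const char *) base + offset[i] * scale);
}
```

Passing an all-ones mask gives the unconditional case (situation 2), where only the length limit decides which lanes are loaded. That is consistent with the patch relaxing the SLP check for the dummy mask, though the actual matching logic lives in tree-vect-slp.cc and is not reproduced here.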
Re: [PATCH] RISC-V: Fix failed testcase when use -cmodel=medany
OK

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 17:57
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Fix failed testcase when use -cmodel=medany

This little patch fixes a failed testcase when using -mcmodel=medany.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/cpymem-1.c: Split check.

---
 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
index 9bb4904e8e9..549d6648104 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
@@ -50,7 +50,7 @@ void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
 Use extern here so that we get a known alignment, lest
 DATA_ALIGNMENT force us to make the scan pattern accomodate code
 for different alignments depending on word size.
-** f3:
+** f3: { target { any-opts "-mcmodel=medlow" } }
 **lui\s+[ta][0-7],%hi\(a_a\)
 **lui\s+[ta][0-7],%hi\(a_b\)
 **addi\s+a4,[ta][0-7],%lo\(a_b\)
@@ -61,6 +61,16 @@ void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
 **ret
 */
 
+/*
+** f3: { target { any-opts "-mcmodel=medany" } }
+**lla\s+[ta][0-7],a_b
+**vsetivli\s+zero,16,e32,m4,ta,ma
+**vle32.v\s+v\d+,0\([ta][0-7]\)
+**lla\s+[ta][0-7],a_a
+**vse32\.v\s+v\d+,0\([ta][0-7]\)
+**ret
+*/
+
 extern struct { __INT32_TYPE__ a[16]; } a_a, a_b;
 
 void f3 ()
-- 
2.36.3
Re: [PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]
Committed. juzhe.zh...@rivai.ai From: Juzhe-Zhong Date: 2023-10-17 15:30 To: gcc-patches CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong Subject: [PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832] Last time, Robin mentioned that dynamic LMUL causes an ICE in SPEC: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html which is caused by an assertion failure. When we enable more tests in rvv.exp with dynamic LMUL, the issue can be reproduced, and it has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832 This patch enables more tests in rvv.exp and fixes the bug. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests. --- gcc/config/riscv/riscv-vector-costs.cc | 19 +-- gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 10 -- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc index 33061efb1d0..af87388a1e4 100644 --- a/gcc/config/riscv/riscv-vector-costs.cc +++ b/gcc/config/riscv/riscv-vector-costs.cc @@ -154,6 +154,14 @@ compute_local_program_points ( } } +static machine_mode +get_biggest_mode (machine_mode mode1, machine_mode mode2) +{ + unsigned int mode1_size = GET_MODE_BITSIZE (mode1).to_constant (); + unsigned int mode2_size = GET_MODE_BITSIZE (mode2).to_constant (); + return mode1_size >= mode2_size ? mode1 : mode2; +} + /* Compute local live ranges of each vectorized variable.
Note that we only compute local live ranges (within a block) since local live ranges information is accurate enough for us to determine @@ -201,12 +209,12 @@ compute_local_live_ranges ( { unsigned int point = program_point.point; gimple *stmt = program_point.stmt; - machine_mode mode = biggest_mode; tree lhs = gimple_get_lhs (stmt); if (lhs != NULL_TREE && is_gimple_reg (lhs) && !POINTER_TYPE_P (TREE_TYPE (lhs))) { - mode = TYPE_MODE (TREE_TYPE (lhs)); + biggest_mode = get_biggest_mode (biggest_mode, +TYPE_MODE (TREE_TYPE (lhs))); bool existed_p = false; pair &live_range = live_ranges->get_or_insert (lhs, &existed_p); @@ -225,7 +233,9 @@ compute_local_live_ranges ( the future. */ if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE (var))) { - mode = TYPE_MODE (TREE_TYPE (var)); + biggest_mode + = get_biggest_mode (biggest_mode, + TYPE_MODE (TREE_TYPE (var))); bool existed_p = false; pair &live_range = live_ranges->get_or_insert (var, &existed_p); @@ -238,9 +248,6 @@ compute_local_live_ranges ( live_range = pair (0, point); } } - if (GET_MODE_SIZE (mode).to_constant () - > GET_MODE_SIZE (biggest_mode).to_constant ()) - biggest_mode = mode; } if (dump_enabled_p ()) for (hash_map::iterator iter = live_ranges->begin (); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp index ff76e17d0e6..674ba0d72b4 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp @@ -58,10 +58,12 @@ set AUTOVEC_TEST_OPTS [list \ {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \ {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \ {-ftree-vectorize -O3 --param riscv-autovec-lmul=m8} \ + {-ftree-vectorize -O3 --param riscv-autovec-lmul=dynamic} \ {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \ {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \ {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} \ - {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} ] + {-ftree-vectorize -O2 
--param riscv-autovec-lmul=m8} \ + {-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic} ] foreach op $AUTOVEC_TEST_OPTS { dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/partial/*.\[cS\]]] \ "" "$op" @@ -104,18 +106,22 @@ set AUTOVEC_TEST_OPTS [list \ {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \ {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \ {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \ + {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=dynamic -ffast-math} \ {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \ {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \ {-ftree-vectorize -O2 --param riscv-autovec-preference=
Re: [PATCH V2 03/14] RISC-V: P3: Refactor vector_infos_manager
+ demand_system dem;
+ auto_vec vector_block_infos;
+
+ /* data for avl reaching defintion. */
+ sbitmap avl_regs;
+ sbitmap *avl_def_in;
+ sbitmap *avl_def_out;
+ sbitmap *reg_def_loc;
+
+ /* data for vsetvl info reaching defintion. */
+ vsetvl_info unknow_info;
+ auto_vec vsetvl_def_exprs;
+ sbitmap *vsetvl_def_in;
+ sbitmap *vsetvl_def_out;
+
+ /* data for lcm */
+ auto_vec exprs;
+ sbitmap *avloc;
+ sbitmap *avin;
+ sbitmap *avout;
+ sbitmap *kill;
+ sbitmap *antloc;
+ sbitmap *transp;
+ sbitmap *insert;
+ sbitmap *del;
+ struct edge_list *edges;
+
+ auto_vec delete_list;

Please add the "m_" prefix to all of them.

earliest_fusion_worthwhile_p -> successors_probability_equal_p

calculate_dominance_info (CDI_POST_DOMINATORS); ---> remove
free_dominance_info (CDI_POST_DOMINATORS); ---> remove

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 03/14] RISC-V: P3: Refactor vector_infos_manager

This sub-patch refactors vector_infos_manager into a pre_vsetvl class, which is responsible for the entire lazy vsetvl job. There is no need to introduce a separate vsetvl infos manager, because the vsetvl infos are modified by the optimization code.

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (vector_infos_manager::vector_infos_manager): Removed.
	(class pre_vsetvl): New class.
	(vector_infos_manager::create_expr): Removed.
	(vector_infos_manager::get_expr_id): Removed.
	(vector_infos_manager::all_same_ratio_p): Removed.
	(vector_infos_manager::all_avail_in_compatible_p): Removed.
	(vector_infos_manager::all_same_avl_p): Removed.
	(vector_infos_manager::expr_set_num): Removed.
	(vector_infos_manager::release): Removed.
	(vector_infos_manager::create_bitmap_vectors): Removed.
	(vector_infos_manager::free_bitmap_vectors): Removed.
	(vector_infos_manager::dump): Removed.
	* config/riscv/riscv-vsetvl.h (class vector_infos_manager): Removed.
--- gcc/config/riscv/riscv-vsetvl.cc | 632 +-- gcc/config/riscv/riscv-vsetvl.h | 75 2 files changed, 257 insertions(+), 450 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index be40b6fdf4c..c219ad178bb 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -2390,402 +2390,284 @@ public: } }; -vector_infos_manager::vector_infos_manager () +class pre_vsetvl { - vector_edge_list = nullptr; - vector_kill = nullptr; - vector_del = nullptr; - vector_insert = nullptr; - vector_antic = nullptr; - vector_transp = nullptr; - vector_comp = nullptr; - vector_avin = nullptr; - vector_avout = nullptr; - vector_antin = nullptr; - vector_antout = nullptr; - vector_earliest = nullptr; - vector_insn_infos.safe_grow_cleared (get_max_uid ()); - vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun)); - if (!optimize) -{ - basic_block cfg_bb; - rtx_insn *rinsn; - FOR_ALL_BB_FN (cfg_bb, cfun) - { - vector_block_infos[cfg_bb->index].local_dem = vector_insn_info (); - vector_block_infos[cfg_bb->index].reaching_out = vector_insn_info (); - FOR_BB_INSNS (cfg_bb, rinsn) - vector_insn_infos[INSN_UID (rinsn)].parse_insn (rinsn); - } -} - else -{ - for (const bb_info *bb : crtl->ssa->bbs ()) - { - vector_block_infos[bb->index ()].local_dem = vector_insn_info (); - vector_block_infos[bb->index ()].reaching_out = vector_insn_info (); - for (insn_info *insn : bb->real_insns ()) - vector_insn_infos[insn->uid ()].parse_insn (insn); - vector_block_infos[bb->index ()].probability = profile_probability (); - } -} -} - -void -vector_infos_manager::create_expr (vector_insn_info &info) -{ - for (size_t i = 0; i < vector_exprs.length (); i++) -if (*vector_exprs[i] == info) - return; - vector_exprs.safe_push (&info); -} - -size_t -vector_infos_manager::get_expr_id (const vector_insn_info &info) const -{ - for (size_t i = 0; i < vector_exprs.length (); i++) -if (*vector_exprs[i] == info) - return i; - 
gcc_unreachable (); -} - -auto_vec -vector_infos_manager::get_all_available_exprs ( - const vector_insn_info &info) const -{ - auto_vec available_list; - for (size_t i = 0; i < vector_exprs.length (); i++) -if (info.available_p (*vector_exprs[i])) - available_list.safe_push (i); - return available_list; -} - -bool -vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const -{ - if (bitmap_empty_p (bitdata)) -return false; - - int ratio = -1; - unsigned int bb_index; - sbitmap_iterator sbi; - - EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi) -{ - if (ratio == -1) - ratio = vector_exprs[bb_index]->get_ratio (); - else if (vector_exprs[bb_index]->get_ratio () != ratio) - return false; -} - return true; -} - -/* Return TRUE if the incoming vector configurat
Re: [PATCH V2 04/14] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
LGTM for this patch.

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 04/14] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl

This sub-patch moves the methods that optimize vsetvl infos into class
pre_vsetvl.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info): Removed.
        (pass_vsetvl::get_block_info): Removed.
        (pass_vsetvl::update_vector_info): Removed.
        (pass_vsetvl::update_block_info): Removed.
        (pass_vsetvl::simple_vsetvl): Removed.
        (pass_vsetvl::lazy_vsetvl): Removed.
        (pass_vsetvl::execute): Removed.
        (make_pass_vsetvl): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 228 ---
 1 file changed, 87 insertions(+), 141 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index c219ad178bb..3f07fde782f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2684,54 +2684,8 @@ const pass_data pass_data_vsetvl = {
 class pass_vsetvl : public rtl_opt_pass
 {
 private:
-  vector_infos_manager *m_vector_manager;
-
-  const vector_insn_info &get_vector_info (const rtx_insn *) const;
-  const vector_insn_info &get_vector_info (const insn_info *) const;
-  const vector_block_info &get_block_info (const basic_block) const;
-  const vector_block_info &get_block_info (const bb_info *) const;
-  vector_block_info &get_block_info (const basic_block);
-  vector_block_info &get_block_info (const bb_info *);
-  void update_vector_info (const insn_info *, const vector_insn_info &);
-  void update_block_info (int, profile_probability, const vector_insn_info &);
-
-  void simple_vsetvl (void) const;
-  void lazy_vsetvl (void);
-
-  /* Phase 1. */
-  void compute_local_backward_infos (const bb_info *);
-
-  /* Phase 2. */
-  bool need_vsetvl (const vector_insn_info &, const vector_insn_info &) const;
-  void transfer_before (vector_insn_info &, insn_info *) const;
-  void transfer_after (vector_insn_info &, insn_info *) const;
-  void emit_local_forward_vsetvls (const bb_info *);
-
-  /* Phase 3. */
-  bool earliest_fusion (void);
-  void vsetvl_fusion (void);
-
-  /* Phase 4. */
-  void prune_expressions (void);
-  void compute_local_properties (void);
-  bool can_refine_vsetvl_p (const basic_block, const vector_insn_info &) const;
-  void refine_vsetvls (void) const;
-  void cleanup_vsetvls (void);
-  bool commit_vsetvls (void);
-  void pre_vsetvl (void);
-
-  /* Phase 5. */
-  rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) const;
-  void local_eliminate_vsetvl_insn (const bb_info *) const;
-  bool global_eliminate_vsetvl_insn (const bb_info *) const;
-  void ssa_post_optimization (void) const;
-
-  /* Phase 6. */
-  void df_post_optimization (void) const;
-
-  void init (void);
-  void done (void);
-  void compute_probabilities (void);
+  void simple_vsetvl ();
+  void lazy_vsetvl ();
 
 public:
   pass_vsetvl (gcc::context *ctxt) : rtl_opt_pass (pass_data_vsetvl, ctxt) {}
@@ -2741,69 +2695,11 @@ public:
   virtual unsigned int execute (function *) final override;
 }; // class pass_vsetvl
 
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const rtx_insn *i) const
-{
-  return m_vector_manager->vector_insn_infos[INSN_UID (i)];
-}
-
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const insn_info *i) const
-{
-  return m_vector_manager->vector_insn_infos[i->uid ()];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-void
-pass_vsetvl::update_vector_info (const insn_info *i,
-                                 const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
-}
-
 void
-pass_vsetvl::update_block_info (int index, profile_probability prob,
-                                const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_block_infos[index].probability = prob;
-  if (m_vector_manager->vector_block_infos[index].local_dem
-      == m_vector_manager->vector_block_infos[index].reaching_out)
-    m_vector_manager->vector_block_infos[index].local_dem = new_info;
-  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
-}
-
-/* Simple m_vsetvl_insert vsetvl for optimize == 0. */
-void
-pass_vsetvl::simple_vsetvl (void) const
+pass_vsetvl::simple_vsetvl ()
 {
Re: [PATCH V2 11/14] RISC-V: P11: Adjust vector_block_info to vsetvl_block_info class
+  const vsetvl_info &get_header_info () const
+  {
+    gcc_assert (!empty_p ());
+    return infos.is_empty () ? m_info : infos[0];
+  }

Change it to get_entry_info (to be consistent with the mode-switching
naming, which also uses LCM).

+  const vsetvl_info &get_footer_info () const
+  {
+    gcc_assert (!empty_p ());
+    return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }

Change it to get_exit_info (to be consistent with the mode-switching
naming, which also uses LCM).

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 11/14] RISC-V: P11: Adjust vector_block_info to vsetvl_block_info class

This sub-patch adjusts the vector_block_info code and renames it to
vsetvl_block_info.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (class vsetvl_block_info): New.
        * config/riscv/riscv-vsetvl.h (struct vector_block_info): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 55 +++-
 gcc/config/riscv/riscv-vsetvl.h  | 14
 2 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b5ed1ea774a..d91b0272d9f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -85,7 +85,6 @@ along with GCC; see the file COPYING3. If not see
 #include "predict.h"
 #include "profile-count.h"
 #include "gcse.h"
-#include "riscv-vsetvl.h"
 
 using namespace rtl_ssa;
 using namespace riscv_vector;
@@ -1218,6 +1217,60 @@ public:
   }
 };
 
+class vsetvl_block_info
+{
+public:
+  /* The static execute probability of the demand info. */
+  profile_probability probability;
+
+  auto_vec infos;
+  vsetvl_info m_info;
+  bb_info *m_bb;
+
+  bool full_available;
+
+  vsetvl_block_info () : m_bb (nullptr), full_available (false)
+  {
+    infos.safe_grow_cleared (0);
+    m_info.set_empty ();
+  }
+  vsetvl_block_info (const vsetvl_block_info &other)
+    : probability (other.probability), infos (other.infos.copy ()),
+      m_info (other.m_info), m_bb (other.m_bb)
+  {}
+
+  vsetvl_info &get_header_info ()
+  {
+    gcc_assert (!empty_p ());
+    return infos.is_empty () ? m_info : infos[0];
+  }
+  vsetvl_info &get_footer_info ()
+  {
+    gcc_assert (!empty_p ());
+    return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }
+  const vsetvl_info &get_header_info () const
+  {
+    gcc_assert (!empty_p ());
+    return infos.is_empty () ? m_info : infos[0];
+  }
+  const vsetvl_info &get_footer_info () const
+  {
+    gcc_assert (!empty_p ());
+    return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }
+
+  bool empty_p () const { return infos.is_empty () && !has_info (); }
+  bool has_info () const { return !m_info.empty_p (); }
+  void set_info (const vsetvl_info &info)
+  {
+    gcc_assert (infos.is_empty ());
+    m_info = info;
+    m_info.set_bb (m_bb);
+  }
+  void set_empty_info () { m_info.set_empty (); }
+};
+
 class demand_system
 {
 private:
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 96e36403af7..16c84e0684b 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -55,19 +55,5 @@ enum def_type
   CLOBBER_DEF = 1 << 4
 };
 
-struct vector_block_info
-{
-  /* The local_dem vector insn_info of the block. */
-  vector_insn_info local_dem;
-
-  /* The reaching_out vector insn_info of the block. */
-  vector_insn_info reaching_out;
-
-  /* The static execute probability of the demand info. */
-  profile_probability probability;
-
-  vector_block_info () = default;
-};
-
 } // namespace riscv_vector
 #endif
--
2.36.3
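For readers following the naming discussion in this review, here is a minimal sketch of what the requested rename could look like. It is not the GCC code: toy_info and toy_block_info are hypothetical stand-ins for vsetvl_info and vsetvl_block_info, std::vector replaces auto_vec, and assert replaces gcc_assert. It only illustrates that the entry/exit accessors fall back to the block-level info when the per-instruction list is empty.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for vsetvl_info: an id below zero means "empty".
struct toy_info
{
  int id = -1;
  bool empty_p () const { return id < 0; }
};

// Hypothetical, simplified model of vsetvl_block_info using the accessor
// names suggested in the review (entry/exit instead of header/footer).
class toy_block_info
{
public:
  std::vector<toy_info> infos; // per-instruction infos, in program order
  toy_info m_info;             // block-level info, used when INFOS is empty

  bool has_info () const { return !m_info.empty_p (); }
  bool empty_p () const { return infos.empty () && !has_info (); }

  // Info seen at block entry (get_header_info in the patch).
  const toy_info &get_entry_info () const
  {
    assert (!empty_p ());
    return infos.empty () ? m_info : infos.front ();
  }

  // Info in effect at block exit (get_footer_info in the patch).
  const toy_info &get_exit_info () const
  {
    assert (!empty_p ());
    return infos.empty () ? m_info : infos.back ();
  }
};
```

With only m_info set, both accessors return it; once infos is non-empty, they return its first and last elements respectively.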
Re: [PATCH V2 05/14] RISC-V: P5: combine phase 1 and 2
LGTM on the algorithm of the local analysis.

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 05/14] RISC-V: P5: combine phase 1 and 2

This sub-patch combines phases 1 and 2 to use the new demand system and
delays the insertion of vsetvl insns until phase 4.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): New.
        (pass_vsetvl::compute_local_backward_infos): Removed.
        (pass_vsetvl::need_vsetvl): Removed.
        (pass_vsetvl::transfer_before): Removed.
        (pass_vsetvl::transfer_after): Removed.
        (pass_vsetvl::emit_local_forward_vsetvls): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 269 ++-
 1 file changed, 123 insertions(+), 146 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 3f07fde782f..33bdcec04d8 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2669,6 +2669,129 @@ public:
   }
 };
 
+void
+pre_vsetvl::fuse_local_vsetvl_info ()
+{
+  reg_def_loc
+    = sbitmap_vector_alloc (last_basic_block_for_fn (cfun), GP_REG_LAST + 1);
+  bitmap_vector_clear (reg_def_loc, last_basic_block_for_fn (cfun));
+  bitmap_ones (reg_def_loc[ENTRY_BLOCK_PTR_FOR_FN (cfun)->index]);
+
+  for (bb_info *bb : crtl->ssa->bbs ())
+    {
+      auto &block_info = get_block_info (bb);
+      block_info.m_bb = bb;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, " Try fuse basic block %d\n", bb->index ());
+        }
+      auto_vec infos;
+      for (insn_info *insn : bb->real_nondebug_insns ())
+        {
+          vsetvl_info curr_info = vsetvl_info (insn);
+          if (curr_info.valid_p () || curr_info.unknown_p ())
+            infos.safe_push (curr_info);
+
+          /* Collecting GP registers modified by the current bb.
*/ + if (insn->is_real ()) + for (def_info *def : insn->defs ()) + if (def->is_reg () && GP_REG_P (def->regno ())) + bitmap_set_bit (reg_def_loc[bb->index ()], def->regno ()); + } + + vsetvl_info prev_info = vsetvl_info (); + prev_info.set_empty (); + for (auto &curr_info : infos) + { + if (prev_info.empty_p ()) + prev_info = curr_info; + else if ((curr_info.unknown_p () && prev_info.valid_p ()) +|| (curr_info.valid_p () && prev_info.unknown_p ())) + { + block_info.infos.safe_push (prev_info); + prev_info = curr_info; + } + else if (curr_info.valid_p () && prev_info.valid_p ()) + { + if (dem.available_with (prev_info, curr_info)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, +"Ignore curr info since prev info " +"available with it:\n"); + fprintf (dump_file, " prev_info: "); + prev_info.dump (dump_file, ""); + fprintf (dump_file, " curr_info: "); + curr_info.dump (dump_file, ""); + fprintf (dump_file, "\n"); + } + if (!curr_info.use_by_non_rvv_insn_p () + && vsetvl_insn_p (curr_info.get_insn ()->rtl ())) + delete_list.safe_push (curr_info); + + if (curr_info.get_read_vl_insn ()) + prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ()); + } + else if (dem.compatible_with (prev_info, curr_info)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Fuse curr info since prev info " + "compatible with it:\n"); + fprintf (dump_file, " prev_info: "); + prev_info.dump (dump_file, ""); + fprintf (dump_file, " curr_info: "); + curr_info.dump (dump_file, ""); + } + dem.merge_with (prev_info, curr_info); + if (curr_info.get_read_vl_insn ()) + prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ()); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, " prev_info after fused: "); + prev_info.dump (dump_file, ""); + fprintf (dump_file, "\n"); + } + } + else + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, +"Cannot fuse uncompatible infos:\n"); + fprintf (dump_file, " 
prev_info: "); + prev_info.dump (dump_file, " "); + fprintf (dump_file, " curr_info: "); + curr_info.dump (dump_file, " "); + } + block_info.infos.safe_push (prev_info); + prev_info = curr_info; + } + } + } + + if (prev_info.valid_p () || prev_info.unknown_p ()) + block_info.infos.safe_push (prev_info); +} + + avl_regs = sbitmap_alloc (GP_REG_LA
Re: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow
compute_vsetvl_lcm_data -> compute_lcm_local_properties

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow

This sub-patch adds some helper functions for computing reaching
definition data, plus three computational functions for different
objects. These three functions are used by phases 2 and 3.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
        (compute_reaching_defintion): New.
        (pre_vsetvl::compute_avl_def_data): New.
        (pre_vsetvl::compute_vsetvl_def_data): New.
        (pre_vsetvl::compute_vsetvl_lcm_data): New.

---
 gcc/config/riscv/riscv-vsetvl.cc | 468 +++
 1 file changed, 468 insertions(+)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 33bdcec04d8..b1269e8cf4f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -103,6 +103,121 @@ along with GCC; see the file COPYING3. If not see
 using namespace rtl_ssa;
 using namespace riscv_vector;
 
+/* Set the bitmap DST to the union of SRC of predecessors of
+   basic block B.
+   It's a bit different from bitmap_union_of_preds in cfganal.cc. This function
+   takes into account the case where pred is ENTRY basic block. The main reason
+   for this difference is to make it easier to insert some special value into
+   the ENTRY base block. For example, vsetvl_info with a status of UNKNOW.
*/ +static void +bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b) +{ + unsigned int set_size = dst->size; + edge e; + unsigned ix; + + for (ix = 0; ix < EDGE_COUNT (b->preds); ix++) +{ + e = EDGE_PRED (b, ix); + bitmap_copy (dst, src[e->src->index]); + break; +} + + if (ix == EDGE_COUNT (b->preds)) +bitmap_clear (dst); + else +for (ix++; ix < EDGE_COUNT (b->preds); ix++) + { + unsigned int i; + SBITMAP_ELT_TYPE *p, *r; + + e = EDGE_PRED (b, ix); + p = src[e->src->index]->elms; + r = dst->elms; + for (i = 0; i < set_size; i++) + *r++ |= *p++; + } +} + +/* Compute the reaching defintion in and out based on the gen and KILL + informations in each Base Blocks. + This function references the compute_avaiable implementation in lcm.cc */ +static void +compute_reaching_defintion (sbitmap *gen, sbitmap *kill, sbitmap *in, + sbitmap *out) +{ + edge e; + basic_block *worklist, *qin, *qout, *qend, bb; + unsigned int qlen; + edge_iterator ei; + + /* Allocate a worklist array/queue. Entries are only added to the + list if they were not already on the list. So the size is + bounded by the number of basic blocks. */ + qin = qout = worklist += XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS); + + /* Put every block on the worklist; this is necessary because of the + optimistic initialization of AVOUT above. Use reverse postorder + to make the forward dataflow problem require less iterations. */ + int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS); + int n = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, false); + for (int i = 0; i < n; ++i) +{ + bb = BASIC_BLOCK_FOR_FN (cfun, rpo[i]); + *qin++ = bb; + bb->aux = bb; +} + free (rpo); + + qin = worklist; + qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS]; + qlen = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS; + + /* Mark blocks which are successors of the entry block so that we + can easily identify them below. 
*/ + FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs) +e->dest->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun); + + /* Iterate until the worklist is empty. */ + while (qlen) +{ + /* Take the first entry off the worklist. */ + bb = *qout++; + qlen--; + + if (qout >= qend) + qout = worklist; + + /* Do not clear the aux field for blocks which are successors of the + ENTRY block. That way we never add then to the worklist again. */ + if (bb->aux != ENTRY_BLOCK_PTR_FOR_FN (cfun)) + bb->aux = NULL; + + bitmap_union_of_preds_with_entry (in[bb->index], out, bb); + + if (bitmap_ior_and_compl (out[bb->index], gen[bb->index], in[bb->index], + kill[bb->index])) + /* If the out state of this block changed, then we need +to add the successors of this block to the worklist +if they are not already on the worklist. */ + FOR_EACH_EDGE (e, ei, bb->succs) + if (!e->dest->aux && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)) + { + *qin++ = e->dest; + e->dest->aux = e; + qlen++; + + if (qin >= qend) + qin = worklist; + } +} + + clear_aux_for_edges (); + clear_aux_for_blocks (); + free (worklist); +} + stat
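As a reading aid for the worklist loop quoted above, the following is a minimal, self-contained sketch of a forward reaching-definitions solver, under stated assumptions. It is not the patch's implementation: toy_cfg, bitset_t and compute_reaching are hypothetical names, std::vector<bool> stands in for sbitmap, and the ENTRY-block special case and reverse-postorder seeding are left out. It only shows the two equations being iterated, IN(b) = union of OUT(p) over predecessors p, and OUT(b) = GEN(b) | (IN(b) & ~KILL(b)).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

using bitset_t = std::vector<bool>;

// Hypothetical toy CFG: blocks are numbered 0..n-1 and edges are stored as
// predecessor and successor lists.
struct toy_cfg
{
  std::vector<std::vector<int>> preds;
  std::vector<std::vector<int>> succs;
};

// Iterate IN/OUT to a fixed point with a worklist, re-queueing the successors
// of a block only when its OUT set changes.
void
compute_reaching (const toy_cfg &cfg, const std::vector<bitset_t> &gen,
                  const std::vector<bitset_t> &kill,
                  std::vector<bitset_t> &in, std::vector<bitset_t> &out)
{
  const std::size_t nblocks = cfg.preds.size ();
  assert (gen.size () == nblocks && kill.size () == nblocks);
  const std::size_t nbits = nblocks ? gen[0].size () : 0;

  in.assign (nblocks, bitset_t (nbits, false));
  out.assign (nblocks, bitset_t (nbits, false));

  // Start with every block queued once.
  std::deque<int> worklist;
  std::vector<bool> queued (nblocks, true);
  for (std::size_t b = 0; b < nblocks; ++b)
    worklist.push_back (static_cast<int> (b));

  while (!worklist.empty ())
    {
      const int b = worklist.front ();
      worklist.pop_front ();
      queued[b] = false;

      // IN(b) is the union of OUT over all predecessors.
      bitset_t new_in (nbits, false);
      for (int p : cfg.preds[b])
        for (std::size_t i = 0; i < nbits; ++i)
          if (out[p][i])
            new_in[i] = true;
      in[b] = new_in;

      // OUT(b) = GEN(b) | (IN(b) & ~KILL(b)).
      bitset_t new_out (nbits, false);
      for (std::size_t i = 0; i < nbits; ++i)
        new_out[i] = gen[b][i] || (in[b][i] && !kill[b][i]);

      if (new_out != out[b])
        {
          out[b] = new_out;
          for (int s : cfg.succs[b])
            if (!queued[s])
              {
                queued[s] = true;
                worklist.push_back (s);
              }
        }
    }
}
```

The queued flag plays the role that bb->aux plays in the quoted code: it keeps a block from being added to the worklist while it is already pending.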
Re: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow
Please copy and paste the original comments:

-/* Compute the local properties of each recorded expression.
-
-   Local properties are those that are defined by the block, irrespective of
-   other blocks.
-
-   An expression is transparent in a block if its operands are not modified
-   in the block.
-
-   An expression is computed (locally available) in a block if it is computed
-   at least once and expression would contain the same value if the
-   computation was moved to the end of the block.
-
-   An expression is locally anticipatable in a block if it is computed at
-   least once and expression would contain the same value if the computation
-   was moved to the beginning of the block. */
-void
-pass_vsetvl::compute_local_properties (void)
-{
-  /* - If T is locally available at the end of a block, then T' must be
-       available at the end of the same block. Since some optimization has
-       occurred earlier, T' might not be locally available, however, it must
-       have been previously computed on all paths. As a formula, T at AVLOC(B)
-       implies that T' at AVOUT(B).
-       An "available occurrence" is one that is the last occurrence in the
-       basic block and the operands are not modified by following statements in
-       the basic block [including this insn].
-
-     - If T is locally anticipated at the beginning of a block, then either
-       T', is locally anticipated or it is already available from previous
-       blocks. As a formula, this means that T at ANTLOC(B) implies that T' at
-       ANTLOC(B) at AVIN(B).
-       An "anticipatable occurrence" is one that is the first occurrence in the
-       basic block, the operands are not modified in the basic block prior
-       to the occurrence and the output is not used between the start of
-       the block and the occurrence. */

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow

This sub-patch adds some helper functions for computing reaching
definition data, plus three computational functions for different
objects. These three functions are used by phases 2 and 3.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
        (compute_reaching_defintion): New.
        (pre_vsetvl::compute_avl_def_data): New.
        (pre_vsetvl::compute_vsetvl_def_data): New.
        (pre_vsetvl::compute_vsetvl_lcm_data): New.

---
 gcc/config/riscv/riscv-vsetvl.cc | 468 +++
 1 file changed, 468 insertions(+)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 33bdcec04d8..b1269e8cf4f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -103,6 +103,121 @@ along with GCC; see the file COPYING3. If not see
 using namespace rtl_ssa;
 using namespace riscv_vector;
 
+/* Set the bitmap DST to the union of SRC of predecessors of
+   basic block B.
+   It's a bit different from bitmap_union_of_preds in cfganal.cc. This function
+   takes into account the case where pred is ENTRY basic block. The main reason
+   for this difference is to make it easier to insert some special value into
+   the ENTRY base block. For example, vsetvl_info with a status of UNKNOW.
*/ +static void +bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b) +{ + unsigned int set_size = dst->size; + edge e; + unsigned ix; + + for (ix = 0; ix < EDGE_COUNT (b->preds); ix++) +{ + e = EDGE_PRED (b, ix); + bitmap_copy (dst, src[e->src->index]); + break; +} + + if (ix == EDGE_COUNT (b->preds)) +bitmap_clear (dst); + else +for (ix++; ix < EDGE_COUNT (b->preds); ix++) + { + unsigned int i; + SBITMAP_ELT_TYPE *p, *r; + + e = EDGE_PRED (b, ix); + p = src[e->src->index]->elms; + r = dst->elms; + for (i = 0; i < set_size; i++) + *r++ |= *p++; + } +} + +/* Compute the reaching defintion in and out based on the gen and KILL + informations in each Base Blocks. + This function references the compute_avaiable implementation in lcm.cc */ +static void +compute_reaching_defintion (sbitmap *gen, sbitmap *kill, sbitmap *in, + sbitmap *out) +{ + edge e; + basic_block *worklist, *qin, *qout, *qend, bb; + unsigned int qlen; + edge_iterator ei; + + /* Allocate a worklist array/queue. Entries are only added to the + list if they were not already on the list. So the size is + bounded by the number of basic blocks. */ + qin = qout = worklist += XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS); + + /* Put every block on the worklist; this is necessary because of the + optimistic initialization of AVOUT above. Use reverse postorder + to make the forward dataflow problem require less iterations.
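The comment quoted in this reply defines three local properties (transparent, locally available, locally anticipatable). A small sketch may make those definitions concrete. It is an illustration under simplifying assumptions, not the GCC implementation: toy_stmt, toy_local_props and compute_toy_local_properties are hypothetical names, a block is modelled as a list of statements that either evaluate an expression or modify one variable, and the "[including this insn]" and "output is not used" refinements from the comment are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy statement: evaluates expression EXPR when EXPR >= 0,
// otherwise (EXPR == -1) modifies variable VAR.
struct toy_stmt
{
  int expr;
  int var;
};

struct toy_local_props
{
  std::vector<bool> transp; // no operand is modified anywhere in the block
  std::vector<bool> avloc;  // computed, operands untouched afterwards
  std::vector<bool> antloc; // computed, operands untouched beforehand
};

// OPERANDS[e] lists the variables read by expression e.
toy_local_props
compute_toy_local_properties (const std::vector<toy_stmt> &block,
                              const std::vector<std::set<int>> &operands)
{
  const std::size_t n = operands.size ();
  toy_local_props p;
  p.transp.assign (n, true);
  p.avloc.assign (n, false);
  p.antloc.assign (n, false);
  std::vector<bool> seen (n, false);

  for (const toy_stmt &s : block)
    {
      if (s.expr >= 0)
        {
          const std::size_t e = static_cast<std::size_t> (s.expr);
          assert (e < n);
          // First occurrence with operands still unmodified: the value
          // could equally be computed at the start of the block.
          if (!seen[e] && p.transp[e])
            p.antloc[e] = true;
          seen[e] = true;
          // Available until a later statement modifies an operand.
          p.avloc[e] = true;
        }
      else
        for (std::size_t e = 0; e < n; ++e)
          if (operands[e].count (s.var) != 0)
            {
              p.transp[e] = false;
              p.avloc[e] = false;
            }
    }
  return p;
}
```

For a block that evaluates e0, then modifies one operand of e0 and one operand of e1, then evaluates e1, this marks e0 as anticipatable but not available, and e1 as available but not anticipatable; neither is transparent.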
Re: [PATCH V2 07/14] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
LGTM.

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 07/14] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class

This patch moves the code for phases 2 and 3 from pass_vsetvl to the
pre_vsetvl class.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): New.
        (pre_vsetvl::pre_global_vsetvl_info): New.
        (pass_vsetvl::prune_expressions): Removed.
        (pass_vsetvl::compute_local_properties): Removed.
        (pass_vsetvl::earliest_fusion): Removed.
        (pass_vsetvl::vsetvl_fusion): Removed.
        (pass_vsetvl::pre_vsetvl): Removed.
        (pass_vsetvl::compute_probabilities): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 829 +++
 1 file changed, 398 insertions(+), 431 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b1269e8cf4f..a112895a283 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3260,6 +3260,404 @@ pre_vsetvl::fuse_local_vsetvl_info ()
     }
 }
 
+bool
+pre_vsetvl::earliest_fuse_vsetvl_info ()
+{
+  compute_avl_def_data ();
+  compute_vsetvl_def_data ();
+  compute_vsetvl_lcm_data ();
+
+  unsigned num_exprs = exprs.length ();
+  struct edge_list *edges = create_edge_list ();
+  unsigned num_edges = NUM_EDGES (edges);
+  sbitmap *antin
+    = sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+  sbitmap *antout
+    = sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+
+  sbitmap *earliest = sbitmap_vector_alloc (num_edges, num_exprs);
+
+  compute_available (avloc, kill, avout, avin);
+  compute_antinout_edge (antloc, transp, antin, antout);
+  compute_earliest (edges, num_exprs, antin, antout, avout, kill, earliest);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\n Compute LCM earliest insert data:\n\n");
+      fprintf (dump_file, "Expression List (%u):\n", num_exprs);
+      for (unsigned i = 0; i < num_exprs;
i++) + { + const auto &info = *exprs[i]; + fprintf (dump_file, " Expr[%u]: ", i); + info.dump (dump_file, ""); + } + fprintf (dump_file, "\nbitmap data:\n"); + for (const bb_info *bb : crtl->ssa->bbs ()) + { + unsigned int i = bb->index (); + fprintf (dump_file, " BB %u:\n", i); + fprintf (dump_file, "avloc: "); + dump_bitmap_file (dump_file, avloc[i]); + fprintf (dump_file, "kill: "); + dump_bitmap_file (dump_file, kill[i]); + fprintf (dump_file, "antloc: "); + dump_bitmap_file (dump_file, antloc[i]); + fprintf (dump_file, "transp: "); + dump_bitmap_file (dump_file, transp[i]); + + fprintf (dump_file, "avin: "); + dump_bitmap_file (dump_file, avin[i]); + fprintf (dump_file, "avout: "); + dump_bitmap_file (dump_file, avout[i]); + fprintf (dump_file, "antin: "); + dump_bitmap_file (dump_file, antin[i]); + fprintf (dump_file, "antout: "); + dump_bitmap_file (dump_file, antout[i]); + } + fprintf (dump_file, "\n"); + fprintf (dump_file, " earliest:\n"); + for (unsigned ed = 0; ed < num_edges; ed++) + { + edge eg = INDEX_EDGE (edges, ed); + + if (bitmap_empty_p (earliest[ed])) + continue; + fprintf (dump_file, "Edge(bb %u -> bb %u): ", eg->src->index, +eg->dest->index); + dump_bitmap_file (dump_file, earliest[ed]); + } + fprintf (dump_file, "\n"); +} + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "Fused global info result:\n"); +} + + bool changed = false; + for (unsigned ed = 0; ed < num_edges; ed++) +{ + sbitmap e = earliest[ed]; + if (bitmap_empty_p (e)) + continue; + + unsigned int expr_index; + sbitmap_iterator sbi; + EXECUTE_IF_SET_IN_BITMAP (e, 0, expr_index, sbi) + { + vsetvl_info &curr_info = *exprs[expr_index]; + if (!curr_info.valid_p ()) + continue; + + edge eg = INDEX_EDGE (edges, ed); + if (eg->probability == profile_probability::never ()) + continue; + if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun) + || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) + continue; + + vsetvl_block_info &src_block_info = get_block_info (eg->src); + 
vsetvl_block_info &dest_block_info = get_block_info (eg->dest); + + if (src_block_info.probability + == profile_probability::uninitialized ()) + continue; + + if (src_block_info.empty_p ()) + { + vsetvl_info new_curr_info = curr_info; + new_curr_info.set_bb (crtl->ssa->bb (eg->dest)); + bool has_compatible_
Re: [PATCH V2 08/14] RISC-V: P8: Unified insert and delete of vsetvl insn into Phase 4
LGTM. juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-10-17 19:34 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH V2 08/14] RISC-V: P8: Unified insert and delete of vsetvl insn into Phase 4 This sub-patch move the modification of rtl codes from pass_vsetvl into pre_vsetvl class. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): New. (pass_vsetvl::can_refine_vsetvl_p): Removed. (pass_vsetvl::refine_vsetvls): Removed. (pass_vsetvl::cleanup_vsetvls): Removed. (pass_vsetvl::commit_vsetvls): Removed. --- gcc/config/riscv/riscv-vsetvl.cc | 389 +++ 1 file changed, 134 insertions(+), 255 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index a112895a283..5d84d290e9e 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -3658,6 +3658,140 @@ pre_vsetvl::pre_global_vsetvl_info () } } +void +pre_vsetvl::emit_vsetvl () +{ + bool need_commit = false; + + for (const bb_info *bb : crtl->ssa->bbs ()) +{ + for (const auto &curr_info : get_block_info (bb).infos) + { + insn_info *insn = curr_info.get_insn (); + if (curr_info.ignore_p ()) + { + if (vsetvl_insn_p (insn->rtl ())) + eliminate_insn (insn->rtl ()); + continue; + } + else if (curr_info.valid_p ()) + { + if (vsetvl_insn_p (insn->rtl ())) + { + const vsetvl_info temp = vsetvl_info (insn); + if (!(curr_info == temp)) + { + if (dump_file) + { + fprintf (dump_file, "\n Change vsetvl info from: "); + temp.dump (dump_file, ""); + fprintf (dump_file, " to: "); + curr_info.dump (dump_file, ""); + } + change_vsetvl_insn (insn, curr_info); + } + } + else + { + if (dump_file) + { + fprintf (dump_file, +"\n Insert vsetvl info before insn %d: ", +insn->uid ()); + curr_info.dump (dump_file, ""); + } + insert_vsetvl (EMIT_BEFORE, insn->rtl (), curr_info); + } + } + } +} + + for (const vsetvl_info &item : delete_list) +{ + gcc_assert (vsetvl_insn_p (item.get_insn ()->rtl ())); + 
eliminate_insn (item.get_insn ()->rtl ()); +} + + /* Insert vsetvl as LCM suggest. */ + for (int ed = 0; ed < NUM_EDGES (edges); ed++) +{ + edge eg = INDEX_EDGE (edges, ed); + sbitmap i = insert[ed]; + if (bitmap_count_bits (i) < 1) + continue; + + if (bitmap_count_bits (i) > 1) + /* For code with infinite loop (e.g. pr61634.c), The data flow is +completely wrong. */ + continue; + + gcc_assert (bitmap_count_bits (i) == 1); + unsigned expr_index = bitmap_first_set_bit (i); + const vsetvl_info &info = *exprs[expr_index]; + gcc_assert (info.valid_p ()); + if (dump_file) + { + fprintf (dump_file, +"\n Insert vsetvl info at edge(bb %u -> bb %u): ", +eg->src->index, eg->dest->index); + info.dump (dump_file, ""); + } + rtl_profile_for_edge (eg); + start_sequence (); + + insn_info *insn = info.get_insn (); + insert_vsetvl (EMIT_DIRECT, insn->rtl (), info); + rtx_insn *rinsn = get_insns (); + end_sequence (); + default_rtl_profile (); + + /* We should not get an abnormal edge here. */ + gcc_assert (!(eg->flags & EDGE_ABNORMAL)); + need_commit = true; + insert_insn_on_edge (rinsn, eg); +} + + /* Insert vsetvl info that was not deleted after lift up. 
*/ + for (const bb_info *bb : crtl->ssa->bbs ()) +{ + const vsetvl_block_info &block_info = get_block_info (bb); + if (!block_info.has_info ()) + continue; + + const vsetvl_info &footer_info = block_info.get_footer_info (); + insn_info *insn = footer_info.get_insn (); + + if (footer_info.ignore_p ()) + continue; + + edge eg; + edge_iterator eg_iterator; + FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs) + { + gcc_assert (!(eg->flags & EDGE_ABNORMAL)); + if (dump_file) + { + fprintf ( + dump_file, + "\n Insert missed vsetvl info at edge(bb %u -> bb %u): ", + eg->src->index, eg->dest->index); + footer_info.dump (dump_file, ""); + } + start_sequence (); + insert_vsetvl (EMIT_DIRECT, insn->rtl (), footer_info); + rtx_insn *rinsn = get_insns (); + end_sequence (); + default_rtl_profile (); + insert_insn_on_edge (rinsn, eg); + need_commit = true; + } +} + + if (need_commit) +commit_edge_insertions (); +} + + const pass_data pass_data_vsetvl = { RTL_PASS, /* type */ "vsetvl", /* name */ @@ -3790,261 +3924,6 @@ make_pass_vsetvl (gcc::context *ctxt) return new pass_vsetvl (ctxt); } - -/* Return true if VSETVL in the block can be refin
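The edge-insertion loop in pre_vsetvl::emit_vsetvl quoted above applies a simple rule to each CFG edge: skip the edge when LCM's insert set is empty, skip it as well when more than one expression is set (the patch comment treats that as broken data flow, e.g. code with an infinite loop), and otherwise insert the single selected expression. A minimal stand-alone sketch of that rule follows. The 32-bit mask standing in for GCC's sbitmap and the name pick_edge_expr are assumptions made for illustration; they are not GCC APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for one row of LCM's insert sbitmap: bit N set
   means "expression N should be inserted on this edge".  */
typedef uint32_t insert_set;

/* Return the index of the single expression to insert on an edge, or -1
   when nothing should be inserted.  This mirrors the three cases in the
   quoted loop: count < 1 -> skip, count > 1 -> skip, count == 1 -> insert.  */
int
pick_edge_expr (insert_set bits)
{
  int count = 0;
  int first = -1;

  for (int i = 0; i < 32; i++)
    if (bits & (UINT32_C (1) << i))
      {
        if (first < 0)
          first = i;
        count++;
      }

  if (count != 1)
    return -1;
  return first;
}
```

In the real pass the returned index selects an entry of exprs[], and the vsetvl built from it is queued with insert_insn_on_edge; the sketch only models the selection step.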
Re: [PATCH V2 09/14] RISC-V: P9: Cleanup post optimize phase
LGTM. juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-10-17 19:34 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH V2 09/14] RISC-V: P9: Cleanup post optimize phase This sub-patch deletes partial post optimize code(which implement in the main phase) and move the remain cleanup code to pre_vsetvl class. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::cleaup): New. (pre_vsetvl::remove_avl_operand): New. (pre_vsetvl::remove_unused_dest_operand): New. (pass_vsetvl::get_vsetvl_at_end): Removed. (local_avl_compatible_p): Removed. (pass_vsetvl::local_eliminate_vsetvl_insn): Removed. (get_first_vsetvl_before_rvv_insns): Removed. (pass_vsetvl::global_eliminate_vsetvl_insn): Removed. (pass_vsetvl::ssa_post_optimization): Removed. (has_no_uses): Removed. (pass_vsetvl::df_post_optimization): Removed. (pass_vsetvl::init): Removed. (pass_vsetvl::done): Removed. (pass_vsetvl::lazy_vsetvl): Removed. --- gcc/config/riscv/riscv-vsetvl.cc | 675 --- 1 file changed, 76 insertions(+), 599 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 5d84d290e9e..ac636623b3f 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -3791,6 +3791,82 @@ pre_vsetvl::emit_vsetvl () commit_edge_insertions (); } +void +pre_vsetvl::cleaup () +{ + remove_avl_operand (); + remove_unused_dest_operand (); +} + +void +pre_vsetvl::remove_avl_operand () +{ + for (const bb_info *bb : crtl->ssa->bbs ()) +for (insn_info *insn : bb->real_nondebug_insns ()) + { + rtx_insn *rinsn = insn->rtl (); + /* Erase the AVL operand from the instruction. */ + if (!has_vl_op (rinsn) || !REG_P (get_vl (rinsn))) + continue; + rtx avl = get_vl (rinsn); + if (count_regno_occurrences (rinsn, REGNO (avl)) == 1) + { + /* Get the list of uses for the new instruction. */ + auto attempt = crtl->ssa->new_change_attempt (); + insn_change change (insn); + /* Remove the use of the substituted value. 
*/ + access_array_builder uses_builder (attempt); + uses_builder.reserve (insn->num_uses () - 1); + for (use_info *use : insn->uses ()) + if (use != find_access (insn->uses (), REGNO (avl))) + uses_builder.quick_push (use); + use_array new_uses = use_array (uses_builder.finish ()); + change.new_uses = new_uses; + change.move_range = insn->ebb ()->insn_range (); + rtx pat; + if (fault_first_load_p (rinsn)) + pat = simplify_replace_rtx (PATTERN (rinsn), avl, const0_rtx); + else + { + rtx set = single_set (rinsn); + rtx src = simplify_replace_rtx (SET_SRC (set), avl, const0_rtx); + pat = gen_rtx_SET (SET_DEST (set), src); + } + bool ok = change_insn (crtl->ssa, change, insn, pat); + gcc_assert (ok); + } + } +} + +void +pre_vsetvl::remove_unused_dest_operand () +{ + df_analyze (); + hash_set to_delete; + basic_block cfg_bb; + rtx_insn *rinsn; + FOR_ALL_BB_FN (cfg_bb, cfun) +{ + FOR_BB_INSNS (cfg_bb, rinsn) + { + if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn)) + { + rtx vl = get_vl (rinsn); + vsetvl_info info = vsetvl_info (rinsn); + if (has_no_uses (cfg_bb, rinsn, REGNO (vl))) + { + if (!info.has_vlmax_avl ()) + { + rtx new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info, + NULL_RTX); + validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, +false); + } + } + } + } +} +} const pass_data pass_data_vsetvl = { RTL_PASS, /* type */ @@ -3923,602 +3999,3 @@ make_pass_vsetvl (gcc::context *ctxt) { return new pass_vsetvl (ctxt); } - -/* Some instruction can not be accessed in RTL_SSA when we don't re-init - the new RTL_SSA framework but it is definetely at the END of the block. - - Here we optimize the VSETVL is hoisted by LCM: - - Before LCM: - bb 1: - vsetvli a5,a2,e32,m1,ta,mu - bb 2: - vsetvli zero,a5,e32,m1,ta,mu - ... - - After LCM: - bb 1: - vsetvli a5,a2,e32,m1,ta,mu - LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate - bb 2: - ... 
- */ -rtx_insn * -pass_vsetvl::get_vsetvl_at_end (const bb_info *bb, vector_insn_info *dem) const -{ - rtx_insn *end_vsetvl = BB_END (bb->cfg_bb ()); - if (end_vsetvl && NONDEBUG_INSN_P (end_vsetvl)) -{ - if (JUMP_P (end_vsetvl)) - end_vsetvl = PREV_INSN (end_vsetvl); - - if (NONDEBUG_INSN_P (end_vsetvl) - && vsetvl_discard_result_insn_p (end_vsetvl)) - { - /* Only handle single succ. here, multiple succ. is much - more complicated. */ - if (single_succ_p (bb->cfg_bb ())) - { - edge e = single_succ_edge (bb->cfg_bb ()); - *dem = get_block_info (e->dest).local_dem; - return end_vsetvl; - } - } -} - return nullptr; -} - -/* This predicator should only used w
Re: [PATCH V2 12/14] RISC-V: P12: Delete riscv-vsetvl.h
OK

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 12/14] RISC-V: P12: Delete riscv-vsetvl.h

This sub-patch deletes the unused header file riscv-vsetvl.h, since we no
longer need to export any functions.

gcc/ChangeLog:

        * config/riscv/t-riscv: Removed riscv-vsetvl.h
        * config/riscv/riscv-vsetvl.h: Removed.

---
 gcc/config/riscv/riscv-vsetvl.h | 59 -
 gcc/config/riscv/t-riscv        |  2 +-
 2 files changed, 1 insertion(+), 60 deletions(-)
 delete mode 100644 gcc/config/riscv/riscv-vsetvl.h

diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
deleted file mode 100644
index 16c84e0684b..000
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ /dev/null
@@ -1,59 +0,0 @@
-/* VSETVL pass header for RISC-V 'V' Extension for GNU compiler.
-   Copyright (C) 2022-2023 Free Software Foundation, Inc.
-   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 3, or(at your option)
-any later version.
-
-GCC is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3. If not see
-<http://www.gnu.org/licenses/>. */
-
-#ifndef GCC_RISCV_VSETVL_H
-#define GCC_RISCV_VSETVL_H
-
-namespace riscv_vector {
-
-/* Classification of vsetvl instruction. */
-enum vsetvl_type
-{
-  VSETVL_NORMAL,
-  VSETVL_VTYPE_CHANGE_ONLY,
-  VSETVL_DISCARD_RESULT,
-  NUM_VSETVL_TYPE
-};
-
-enum emit_type
-{
-  /* emit_insn directly. */
-  EMIT_DIRECT,
-  EMIT_BEFORE,
-  EMIT_AFTER,
-};
-
-enum def_type
-{
-  REAL_SET = 1 << 0,
-  PHI_SET = 1 << 1,
-  BB_HEAD_SET = 1 << 2,
-  BB_END_SET = 1 << 3,
-  /* ??? TODO: In RTL_SSA framework, we have REAL_SET,
-     PHI_SET, BB_HEAD_SET, BB_END_SET and
-     CLOBBER_DEF def_info types. Currently,
-     we conservatively do not optimize clobber
-     def since we don't see the case that we
-     need to optimize it. */
-  CLOBBER_DEF = 1 << 4
-};
-
-} // namespace riscv_vector
-#endif
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index f137e1f17ef..dd17056fe82 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -64,7 +64,7 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
   $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
   insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
-  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
+  predict.h profile-count.h \
   $(srcdir)/config/riscv/riscv-vsetvl.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vsetvl.cc
--
2.36.3
Re: [PATCH V2 13/14] RISC-V: P13: Reorganize functions used to modify RTL
OK juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-10-17 19:34 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH V2 13/14] RISC-V: P13: Reorganize functions used to modify RTL This sub-patch reoriganize the functions that used to modify RTL. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (has_no_uses): Moved. (validate_change_or_fail): Moved. (gen_vsetvl_pat): Removed. (emit_vsetvl_insn): Removed. (eliminate_insn): Removed. (change_insn): Removed. (change_vsetvl_insn): New. (pre_vsetvl::emit_vsetvl): New. (pre_vsetvl::remove_avl_operand): Adjust. (pre_vsetvl::remove_unused_dest_operand): Adjust. (pass_vsetvl::simple_vsetvl): Adjust. --- gcc/config/riscv/riscv-vsetvl.cc | 443 --- 1 file changed, 176 insertions(+), 267 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index d91b0272d9f..78816cbee15 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -680,6 +680,30 @@ get_bb_index (unsigned expr_id, unsigned num_bb) return expr_id % num_bb; } +/* Return true if the SET result is not used by any instructions. */ +static bool +has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno) +{ + if (bitmap_bit_p (df_get_live_out (cfg_bb), regno)) +return false; + + rtx_insn *iter; + for (iter = NEXT_INSN (rinsn); iter && iter != NEXT_INSN (BB_END (cfg_bb)); + iter = NEXT_INSN (iter)) +if (df_find_use (iter, regno_reg_rtx[regno])) + return false; + + return true; +} + +/* Change insn and Assert the change always happens. */ +static void +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group) +{ + bool change_p = validate_change (object, loc, new_rtx, in_group); + gcc_assert (change_p); +} + /* This flags indicates the minimum demand of the vl and vtype values by the RVV instruction. 
For example, DEMAND_RATIO_P indicates that this RVV instruction only needs the SEW/LMUL ratio to remain the same, and does not @@ -1126,6 +1150,28 @@ public: } } + /* Returns the corresponding vsetvl rtx pat. */ + rtx get_vsetvl_pat (bool ignore_vl = false) const + { +rtx avl = get_avl (); +/* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s, + set the value of avl to (const_int 0) so that VSETVL PASS will + insert vsetvl correctly.*/ +if (!get_avl ()) + avl = GEN_INT (0); +rtx sew = gen_int_mode (get_sew (), Pmode); +rtx vlmul = gen_int_mode (get_vlmul (), Pmode); +rtx ta = gen_int_mode (get_ta (), Pmode); +rtx ma = gen_int_mode (get_ma (), Pmode); + +if (change_vtype_only_p ()) + return gen_vsetvl_vtype_change_only (sew, vlmul, ta, ma); +else if (has_reg_vl () && !ignore_vl) + return gen_vsetvl (Pmode, get_vl (), avl, sew, vlmul, ta, ma); +else + return gen_vsetvl_discard_result (Pmode, avl, sew, vlmul, ta, ma); + } + bool operator== (const vsetvl_info &other) const { gcc_assert (!uninit_p () && !other.uninit_p () @@ -1938,199 +1984,6 @@ public: } }; -/* Emit vsetvl instruction. 
*/ -static rtx -gen_vsetvl_pat (enum vsetvl_type insn_type, const vsetvl_info &info, rtx vl) -{ - rtx avl = info.get_avl (); - /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s, - set the value of avl to (const_int 0) so that VSETVL PASS will - insert vsetvl correctly.*/ - if (!info.get_avl ()) -avl = GEN_INT (0); - rtx sew = gen_int_mode (info.get_sew (), Pmode); - rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode); - rtx ta = gen_int_mode (info.get_ta (), Pmode); - rtx ma = gen_int_mode (info.get_ma (), Pmode); - - if (insn_type == VSETVL_NORMAL) -{ - gcc_assert (vl != NULL_RTX); - return gen_vsetvl (Pmode, vl, avl, sew, vlmul, ta, ma); -} - else if (insn_type == VSETVL_VTYPE_CHANGE_ONLY) -return gen_vsetvl_vtype_change_only (sew, vlmul, ta, ma); - else -return gen_vsetvl_discard_result (Pmode, avl, sew, vlmul, ta, ma); -} - -static rtx -gen_vsetvl_pat (rtx_insn *rinsn, const vsetvl_info &info, rtx vl = NULL_RTX) -{ - rtx new_pat; - vsetvl_info new_info = info; - /* For vmv.x.s, use 0 for avl. */ - if (!info.get_avl ()) -{ - new_info.set_avl (const0_rtx); - new_info.set_avl_def (nullptr); -} - - if (vl) -new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl); - else -{ - if (vsetvl_insn_p (rinsn) && !info.change_vtype_only_p ()) - new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn)); - else if (info.change_vtype_only_p () -|| INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only) - new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX); - else - new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, new_info, NULL_RTX); -} - return new_pat; -} - -static void -emit_vsetvl_insn (enum vsetvl_type insn_type, enum emit_type em
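The new vsetvl_info::get_vsetvl_pat quoted in this patch picks one of three vsetvl forms from the info itself, where the removed gen_vsetvl_pat took an explicit insn_type argument. A minimal sketch of that decision is given below, assuming three boolean inputs that stand for change_vtype_only_p (), has_reg_vl () and the ignore_vl parameter. The enum and function names are hypothetical; the sketch only models the control flow and does not generate RTL.

```c
#include <stdbool.h>

/* Hypothetical labels for the three generators called in get_vsetvl_pat:
   gen_vsetvl_vtype_change_only, gen_vsetvl and gen_vsetvl_discard_result.  */
enum vsetvl_form
{
  FORM_VTYPE_CHANGE_ONLY,
  FORM_NORMAL,
  FORM_DISCARD_RESULT
};

/* Same order of tests as in the quoted get_vsetvl_pat: a vtype-only change
   wins first, then a vsetvl that writes a VL register (unless the caller
   asked to ignore it), and otherwise the result-discarding form.  */
enum vsetvl_form
select_vsetvl_form (bool change_vtype_only, bool has_reg_vl, bool ignore_vl)
{
  if (change_vtype_only)
    return FORM_VTYPE_CHANGE_ONLY;
  else if (has_reg_vl && !ignore_vl)
    return FORM_NORMAL;
  else
    return FORM_DISCARD_RESULT;
}
```

Read this way, passing ignore_vl = true appears to take over the role of the old VSETVL_DISCARD_RESULT request: the VL destination is dropped even when the info has one.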
Re: [PATCH V2 14/14] RISC-V: P14: Adjust and add testcases
OK juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-10-17 19:35 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH V2 14/14] RISC-V: P14: Adjust and add testcases This sub-patch adjust some testcases and add some bugfix testcases. PR target/111037 PR target/111234 PR target/111725 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalar_move-1.c: Adjust. * gcc.target/riscv/rvv/vsetvl/avl_single-23.c: Adjust. * gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Adjust. * gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Adjust. * gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Adjust. * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adjust. * gcc.target/riscv/rvv/vsetvl/pr109743-2.c: Adjust. * gcc.target/riscv/rvv/vsetvl/pr109773-1.c: Adjust. * gcc.target/riscv/rvv/base/pr111037-1.c: Moved to... * gcc.target/riscv/rvv/vsetvl/pr111037-1.c: ...here. * gcc.target/riscv/rvv/base/pr111037-2.c: Moved to... * gcc.target/riscv/rvv/vsetvl/pr111037-2.c: ...here. * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vsetvl-13.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust. * gcc.target/riscv/rvv/vsetvl/avl_single-104.c: New test. * gcc.target/riscv/rvv/vsetvl/avl_single-105.c: New test. * gcc.target/riscv/rvv/vsetvl/pr111037-3.c: New test. * gcc.target/riscv/rvv/vsetvl/pr111037-4.c: New test. 
--- .../gcc.target/riscv/rvv/base/scalar_move-1.c | 2 +- .../riscv/rvv/vsetvl/avl_single-104.c | 35 +++ .../riscv/rvv/vsetvl/avl_single-105.c | 23 .../riscv/rvv/vsetvl/avl_single-23.c | 7 ++-- .../riscv/rvv/vsetvl/avl_single-46.c | 3 +- .../riscv/rvv/vsetvl/avl_single-89.c | 8 ++--- .../riscv/rvv/vsetvl/avl_single-95.c | 2 +- .../riscv/rvv/vsetvl/imm_bb_prop-1.c | 7 ++-- .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c | 2 +- .../gcc.target/riscv/rvv/vsetvl/pr109773-1.c | 2 +- .../riscv/rvv/{base => vsetvl}/pr111037-1.c | 0 .../riscv/rvv/{base => vsetvl}/pr111037-2.c | 0 .../gcc.target/riscv/rvv/vsetvl/pr111037-3.c | 16 + .../gcc.target/riscv/rvv/vsetvl/pr111037-4.c | 16 + .../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 +++--- .../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 +++--- .../riscv/rvv/vsetvl/vlmax_conflict-12.c | 1 - .../riscv/rvv/vsetvl/vlmax_conflict-3.c | 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c | 4 +-- .../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c | 4 ++- .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c | 2 +- 21 files changed, 125 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-1.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-2.c (100%) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c index 18349132a88..c833d8989e9 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c @@ -46,8 +46,8 @@ int32_t foo3 (int32_t *base, size_t vl) ** vl1re32\.v\tv[0-9]+,0\([a-x0-9]+\) ** vsetvli\tzero,[a-x0-9]+,e32,m1,t[au],m[au] ** 
vadd.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+ -** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au] ** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+ +** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au] ** vmv.v.x\tv[0-9]+,\s*[a-x0-9]+ ** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+ ** ret diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c new file mode 100644 index 000..fb3577dcb98 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize" } */ + +#include "riscv_vector.h" + +void +foo (int cond, int vl, int *in, int *out, int n) +{ + if (cond > 30) +{ + vint32m1_t v = __riscv_vle32_v_i32m1 ((int32_t *) in, vl); + __riscv_vse32_v_i32m1 ((int32_t *) out, v, vl); +} + else if (cond < 10) +{ + vint8mf4_t v = __riscv_vle8_v_i8mf4 ((int8_t *) in, vl); + v = __riscv_vle8_v_i8mf4_tu (v, (int8_t *) in + 10, vl); + __riscv_vse8_v_i8mf4 ((int8_t *) out, v, vl); +} + else +{ + vl = vl * 2; +} + + for (int i = 0; i
Re: [PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
Please disregard this patch. The code example in the commit log is wrong; it is fixed in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633420.html

Thanks.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-18 18:21
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

Confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

But it generates horrible register spilling.

The root cause is that we didn't hoist the vmv.v.x out of the loop, which
increases the register pressure of the SLP loop.

So, change the CONST_VECTOR move into a vec_duplicate splitter so that we
can gain better optimizations:

1. Better LICM.
2. More opportunities for transforming 'vv' into 'vx' in the future.

Before this patch:

f3:
        ble a4,zero,.L8
        csrr t0,vlenb
        slli t1,t0,4
        csrr a6,vlenb
        sub sp,sp,t1
        csrr a5,vlenb
        slli a6,a6,3
        slli a5,a5,2
        add a6,a6,sp
        vsetvli a7,zero,e16,m8,ta,ma
        slli a4,a4,3
        vid.v v8
        addi t6,a5,-1
        vand.vi v8,v8,-2
        neg t5,a5
        vs8r.v v8,0(sp)
        vadd.vi v8,v8,1
        vs8r.v v8,0(a6)
        j .L4
.L12:
        vsetvli a7,zero,e16,m8,ta,ma
.L4:
        csrr t0,vlenb
        slli t0,t0,3
        vl8re16.v v16,0(sp)
        add t0,t0,sp
        vmv.v.x v8,t6
        mv t1,a4
        vand.vv v24,v16,v8
        mv a6,a4
        vl8re16.v v16,0(t0)
        vand.vv v8,v16,v8
        bleu a4,a5,.L3
        mv a6,a5
.L3:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v v20,0(a2)
        vle8.v v16,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v24
        vadd.vv v4,v16,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a0)
        vle8.v v20,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v8
        vadd.vv v4,v4,v16
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a1)
        add a4,a4,t5
        add a0,a0,a5
        add a3,a3,a5
        add a1,a1,a5
        add a2,a2,a5
        bgtu t1,a5,.L12
        csrr t0,vlenb
        slli t1,t0,4
        add sp,sp,t1
        jr ra
.L8:
        ret

After this patch:

bar:
        ble a3,zero,.L5
        csrr a5,vlenb
        csrr t1,vlenb
        srli a5,a5,1
        srli a7,t1,1
        addi a5,a5,-1
        vsetvli a4,zero,e32,m2,ta,ma
        slli a3,a3,1
        vmv.v.x v2,a5
        vid.v v18
        vmv.v.x v6,a1
        vand.vi v10,v18,-2
        vand.vi v0,v18,1
        vadd.vi v16,v10,1
        vmseq.vi v0,v0,1
        vand.vv v10,v10,v2
        vand.vv v16,v16,v2
        slli t1,t1,1
        vsetvli zero,a4,e32,m2,ta,ma
        neg t3,a7
        viota.m v4,v0
        vsetvli a4,zero,e32,m2,ta,mu
        vmv.v.x v8,a2
        vrgather.vv v14,v6,v4
        vrgather.vv v12,v8,v4
        vmv.v.i v2,0
        vrgather.vv v14,v8,v4,v0.t
        vrgather.vv v12,v6,v4,v0.t
.L4:
        mv a2,a3
        mv a5,a3
        bleu a3,a7,.L3
        mv a5,a7
.L3:
        vsetvli zero,a5,e32,m2,ta,ma
        vle32.v v6,0(a0)
        vsetvli a6,zero,e32,m2,ta,ma
        add a3,a3,t3
        vrgather.vv v4,v6,v10
        vrgather.vv v8,v6,v16
        vsub.vv v4,v4,v12
        add a0,a0,t1
        vsetvli zero,a5,e32,m2,tu,ma
        vadd.vv v2,v2,v4
        vmacc.vv v2,v14,v8
        bgtu a2,a7,.L4
        li a5,-1
        vsetvli a6,zero,e32,m2,ta,ma
        li a4,0
        vmv.v.i v4,0
        vmul.vx v0,v18,a5
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmseq.vv v0,v0,v4
        vand.vi v18,v18,1
        vmerge.vvm v6,v4,v2,v0
        vmseq.vv v18,v18,v4
        vmv.s.x v1,a4
        vmv1r.v v0,v18
        vredsum.vs v6,v6,v1
        vmerge.vvm v4,v4,v2,v0
        vmv.x.s a0,v6
        vredsum.vs v4,v4,v1
        vmv.x.s a5,v4
        addw a0,a0,a5
        ret
.L5:
        li a0,0
        ret

Note that this patch triggers multiple FAILs:

FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test

They all fail because of bugs in the VSETVL pass:

   10dd4: 0c707057  vsetvli zero,zero,e8,mf2,ta,ma
   10dd8: 5e06b8d7  vmv.v.i v17,13
   10ddc: 9ed030d7
Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
More details on the VSETVL bug:

Loop:
   10ddc: 9ed030d7  vmv1r.v v1,v13
   10de0: b21040d7  vncvt.x.x.w v1,v1
   10de4: 5e0785d7  vmv.v.v v11,v15
   10de8: b700a5d7  vmacc.vv v11,v1,v16
   10dec: a6e8a0d7  vmadd.vv v1,v17,v14
   10df0: 26b7b5d7  vand.vi v11,v11,15
   10df4: 0c75f7d7  vsetvli a5,a1,e8,mf2,ta,ma
   10df8: 0c707557  vsetvli a0,zero,e8,mf2,ta,ma
   10dfc: 2617b0d7  vand.vi v1,v1,15
   10e00: 0c75f057  vsetvli zero,a1,e8,mf2,ta,ma
   10e04: 8d9d      sub a1,a1,a5
   10e06: 020705a7  vse8.v v11,(a4)
   10e0a: 0c77f057  vsetvli zero,a5,e8,mf2,ta,ma
   10e0e: 020685a7  vse8.v v11,(a3)
   10e12: 020600a7  vse8.v v1,(a2)
   10e16: 973e      add a4,a4,a5
   10e18: 0c807557  vsetvli a0,zero,e16,m1,ta,ma
   10e1c: 96be      add a3,a3,a5
   10e1e: 963e      add a2,a2,a5
   10e20: 02d606d7  vadd.vv v13,v13,v12
   10e24: fdc5      bnez a1,10ddc

The vncvt.x.x.w executes under the e16,m1 vtype (the vsetvli at 10e18 is
the last one executed before the branch back to 10ddc), but it should
execute under e8,mf2.

This issue is fixed by the recent refactoring patch.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-18 18:25
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

Confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

But it generates horrible register spilling.

The root cause is that we didn't hoist the vmv.v.x out of the loop, which
increases the register pressure of the SLP loop.

So, change the CONST_VECTOR move into a vec_duplicate splitter so that we
can gain better optimizations:

1. Better LICM.
2. More opportunities for transforming 'vv' into 'vx' in the future.

Before this patch:

f3:
        ble a4,zero,.L8
        csrr t0,vlenb
        slli t1,t0,4
        csrr a6,vlenb
        sub sp,sp,t1
        csrr a5,vlenb
        slli a6,a6,3
        slli a5,a5,2
        add a6,a6,sp
        vsetvli a7,zero,e16,m8,ta,ma
        slli a4,a4,3
        vid.v v8
        addi t6,a5,-1
        vand.vi v8,v8,-2
        neg t5,a5
        vs8r.v v8,0(sp)
        vadd.vi v8,v8,1
        vs8r.v v8,0(a6)
        j .L4
.L12:
        vsetvli a7,zero,e16,m8,ta,ma
.L4:
        csrr t0,vlenb
        slli t0,t0,3
        vl8re16.v v16,0(sp)
        add t0,t0,sp
        vmv.v.x v8,t6
        mv t1,a4
        vand.vv v24,v16,v8
        mv a6,a4
        vl8re16.v v16,0(t0)
        vand.vv v8,v16,v8
        bleu a4,a5,.L3
        mv a6,a5
.L3:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v v20,0(a2)
        vle8.v v16,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v24
        vadd.vv v4,v16,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a0)
        vle8.v v20,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v8
        vadd.vv v4,v4,v16
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a1)
        add a4,a4,t5
        add a0,a0,a5
        add a3,a3,a5
        add a1,a1,a5
        add a2,a2,a5
        bgtu t1,a5,.L12
        csrr t0,vlenb
        slli t1,t0,4
        add sp,sp,t1
        jr ra
.L8:
        ret

After this patch:

f3:
        ble a4,zero,.L6
        csrr a6,vlenb
        csrr a5,vlenb
        slli a6,a6,2
        slli a5,a5,2
        addi a6,a6,-1
        slli a4,a4,3
        neg t5,a5
        vsetvli t1,zero,e16,m8,ta,ma
        vmv.v.x v24,a6
        vid.v v8
        vand.vi v8,v8,-2
        vadd.vi v16,v8,1
        vand.vv v8,v8,v24
        vand.vv v16,v16,v24
.L4:
        mv t1,a4
        mv a6,a4
        bleu a4,a5,.L3
        mv a6,a5
.L3:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v v28,0(a2)
        vle8.v v24,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v28,v8
        vadd.vv v4,v24,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a0)
        vle8.v v28,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v28,v16
        vadd.vv v4,v4,v24
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a1)
        add a4,a4,t5
        add a0,a0,a5
        add a3,a3,a5
        add a1,a1,a5
        add a2,a2,a5
        bgtu t1,a5,.L4
.L6:
        ret

Note that this patch triggers multiple FAILs:

FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_loa
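On the vtype mismatch described at the top of this message: e16,m1 and e8,mf2 share the same SEW/LMUL ratio, so VLMAX is identical under both settings and only the element width that vncvt.x.x.w operates on differs. The arithmetic can be checked with the small sketch below, which uses the RVV relation VLMAX = VLEN * LMUL / SEW. The function names and the choice of VLEN = 128 in the usage note are assumptions made for illustration.

```c
/* LMUL is passed as a fraction lmul_num/lmul_den, so m1 is 1/1 and
   mf2 is 1/2.  All names here are hypothetical helpers.  */

/* SEW/LMUL ratio of a vtype setting.  */
unsigned
sew_lmul_ratio (unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return sew * lmul_den / lmul_num;
}

/* VLMAX = VLEN * LMUL / SEW for a given VLEN in bits.  */
unsigned
vlmax (unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return vlen * lmul_num / (sew * lmul_den);
}
```

With VLEN = 128, both vlmax (128, 16, 1, 1) and vlmax (128, 8, 1, 2) give 8 elements, and both ratios are 16. That is consistent with the failure showing up as wrong results in execution tests rather than as a wrong vector length.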
[PATCH] RISC-V: Add popcount fallback expander.
LGTM popcount patch. juzhe.zh...@rivai.ai
Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Hi, this patch fixes an issue in V4.

Previously, Richard S commented:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html

> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes? Presumably the SLP
> node has a known vectype by this point. I think a comment would be useful.

Since I didn't encounter any mismatched case in the RISC-V and x86
regression runs, I handled this in the V4 patch by adding an assertion:

+      if (mask_index >= 0 && slp_node)
+        {
+          bool match_p
+            = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
+          gcc_assert (match_p);
+        }

However, an ICE showed up today in the RISC-V regression run:

FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, at tree-vect-stmts.cc:9885)
FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)

This happens because mask_vectype is a boolean type and the mask is an
external def, in which case vect_maybe_update_slp_op_vectype returns
false. V5 therefore replaces the assertion with a bail-out:

+      if (mask_index >= 0 && slp_node
+          && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+        {
+          /* We don't vectorize the boolean type external SLP mask. */
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "incompatible vector types for invariants\n");
+          return false;
+        }

Bootstrap and regression on x86 passed.

Thanks.
juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-18 20:36
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

This patch fixes the following FAILs in the RISC-V regression run:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC fails to SLP MASK_LEN_GATHER_LOAD.

There are two situations in which scalar code is recognized as
MASK_LEN_GATHER_LOAD:

1. Conditional gather load:
   MASK_LEN_GATHER_LOAD (base, offset, scale, zero, conditional mask).

   For this case we just need to reuse the existing MASK_GATHER_LOAD
   handling, which is enough to SLP MASK_LEN_GATHER_LOAD.

2. Unconditional gather load:
   MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)

   The current SLP check fails on the dummy mask -1, so we relax the
   check in tree-vect-slp.cc and allow the mask to be materialized.
Consider the following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

https://godbolt.org/z/WG3M3n7Mo

Before this patch, GCC is unable to SLP this loop and uses
VEC_LOAD_LANES/VEC_STORE_LANES instead:

f:
        ble a3,zero,.L5
.L3:
        vsetvli a5,a3,e8,mf4,ta,ma
        vsetvli zero,a5,e32,m1,ta,ma
        vlseg2e32.v v6,(a2)
        vsetvli a4,zero,e64,m2,ta,ma
        vsext.vf2 v2,v6
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v v1,(a1),v2
        vsetvli a4,zero,e64,m2,ta,ma
        vsext.vf2 v2,v7
        vsetvli zero,zero,e32,m1,ta,ma
        vadd.vi v4,v1,1
        vsetvli zero,zero,e64,m2,ta,ma
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v v2,(a1),v2
        vsetvli a4,zero,e32,m1,ta,ma
        slli a6,a5,3
        vadd.vi v5,v2,2
        sub a3,a3,a5
        vsetvli zero,a5,e32,m1,ta,ma
        vsseg2e32.v v4,(a0)
        add a2,a2,a6
        add a0,a0,a6
        bne a3,zero,.L3
.L5:
        ret

After this patch:

f:
        ble a3,zero,.L5
        li a5,1
        csrr t1,vlenb
        slli a5,a5,33
        srli a7,t1,2
        addi a5,a5,1
        slli a3,a3,1
        neg t3,a7
        vsetvli a4,zero,e64,m1,ta,ma
        vmv.v.x v4,a5
.L3:
        minu a5,a3,a7
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a2)
        vsetvli a4,zero,e64,m2,ta,ma
        vsext.vf2 v2,v1
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v v2,(a1),v2
        vsetvli a4,zero,e32,m1,ta,ma
        mv a6,a3
        vadd.vv v2,v2,v4
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v2,0(a0)
        add a2,a2,t1
        add a0,a0,t1
        add a3,a3,t3
        bgtu a6,a7,.L3
.L5:
        ret

Note that I found we are missing a conditional-mask gather_load SLP test,
so this patch appends one.

Tested on RISC-V; bootstrap and regression on x86 passed.

OK for trunk?

gcc/ChangeLog:

        * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
        (vect_get_and_check_slp_defs): Ditto.
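For anyone who wants to sanity-check the testcase from the commit message outside the GCC testsuite, here is a minimal harness sketch. The function f is the loop from the message with the GCC-specific noipa attribute dropped for portability; gather_example_ok and its input data are hypothetical additions. Each iteration writes two adjacent outputs that differ only in the added constant, which matches the interleaved {1, 2, 1, 2, ...} addend the "after" code appears to build with li/slli/addi (2^33 + 1 viewed as two 32-bit lanes).

```c
/* The loop from the commit message (noipa attribute omitted here).  */
void
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

/* Hypothetical helper: run f on a tiny input and compare against values
   worked out by hand.  Returns 1 when every output matches, else 0.  */
int
gather_example_ok (void)
{
  int x[4] = { 10, 20, 30, 40 };
  int indices[4] = { 3, 0, 2, 1 };
  int y[4] = { 0, 0, 0, 0 };
  /* y[0] = x[3] + 1, y[1] = x[0] + 2, y[2] = x[2] + 1, y[3] = x[1] + 2.  */
  const int expected[4] = { 41, 12, 31, 22 };

  f (y, x, indices, 2);

  for (int i = 0; i < 4; i++)
    if (y[i] != expected[i])
      return 0;
  return 1;
}
```

Calling gather_example_ok () from a main function and building once with and once without vectorization is one way to compare the two code paths.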
Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
Ping. OK for trunk?

juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-05-26 19:35
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; kito.cheng; pan2.li; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

From: Juzhe-Zhong

This patch supports the FMA auto-vectorization pattern.

1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md that passes incorrect information to the
   VSETVL pass when testing ternop-3.c.

gcc/ChangeLog:

        * config/riscv/autovec.md (fma4): New pattern.
        (*fma): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (emit_vlmax_ternary_insn): New function.
        * config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
        * config/riscv/vector.md: Fix vimuladd instruction bug.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add ternary tests
        * gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.
--- gcc/config/riscv/autovec.md | 65 +++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 20 gcc/config/riscv/vector.md| 2 +- .../riscv/rvv/autovec/ternop/ternop-1.c | 28 + .../riscv/rvv/autovec/ternop/ternop-2.c | 34 ++ .../riscv/rvv/autovec/ternop/ternop-3.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop_run-1.c | 84 ++ .../riscv/rvv/autovec/ternop/ternop_run-2.c | 104 ++ .../riscv/rvv/autovec/ternop/ternop_run-3.c | 104 ++ gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 2 + 11 files changed, 477 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 7fe4d94de39..04825df1210 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -373,3 +373,68 @@ DONE; } ) + +;; = +;; == Ternary arithmetic +;; = + +;; - +;; [INT] VMACC and VMADD +;; - +;; Includes: +;; - vmacc +;; - vmadd +;; - + +;; We can't expand FMA for the following reasons: +;; 1. Before RA, we don't know which multiply-add instruction is the ideal one. +;;The vmacc is the ideal instruction when operands[3] overlaps operands[0]. +;;The vmadd is the ideal instruction when operands[1|2] overlaps operands[0]. +;; 2. According to vector.md, the multiply-add patterns has 'merge' operand which +;;is the operands[5]. Since operands[5] should overlap operands[0], this operand +;;should be allocated the same regno as operands[1|2|3]. +;; 3. The 'merge' operand is always a real merge operand and we don't allow undefined +;;operand. +;; 4. 
The operation of FMA pattern needs VLMAX vsetlvi which needs a VL operand. +;; +;; In this situation, we design the codegen of FMA as follows: +;; 1. clobber a scratch in the expand pattern of FMA. +;; 2. Let's RA decide which input operand (operands[1|2|3]) overlap operands[0]. +;; 3. Generate instructions (vmacc or vmadd) according to the register allocation +;;result after reload_completed. +(define_expand "fma4" + [(parallel +[(set (match_operand:VI 0 "register_operand" "=vr") + (plus:VI + (mult:VI + (match_operand:VI 1 "register_operand" " vr") + (match_operand:VI 2 "register_operand" " vr")) + (match_operand:VI 3 "register_operand" " vr"))) + (clobber (match_scratch:SI 4))])] + "TARGET_VECTOR" + {}) + +(define_insn_and_split "*fma" + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") + (plus:VI + (mult:VI + (match_operand:VI 1 "register_operand
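As a side note on why the choice between vmacc and vmadd can wait until after register allocation: both compute a * b + c and differ only in which input register is overwritten. The scalar models below are a sketch of the RVV instruction semantics, not code from the patch; the model_* names are made up, and unsigned arithmetic is used only to keep the C well defined on overflow.

```c
#include <stdint.h>

/* Scalar model of vmacc.vv vd, vs1, vs2:
   vd[i] = vs1[i] * vs2[i] + vd[i].  The addend register is overwritten.  */
static void
model_vmacc (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, int n)
{
  for (int i = 0; i < n; i++)
    vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* Scalar model of vmadd.vv vd, vs1, vs2:
   vd[i] = vs1[i] * vd[i] + vs2[i].  A multiplicand register is
   overwritten.  */
static void
model_vmadd (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, int n)
{
  for (int i = 0; i < n; i++)
    vd[i] = vs1[i] * vd[i] + vs2[i];
}
```

If the destination ends up sharing a register with the addend, the vmacc form fits; if it shares one with a multiplicand, the vmadd form fits. Either way the result is the same a * b + c.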
Re: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
This is an existing bug in GCC 13. I think I should split this into 2 patches.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-05-29 11:17
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; rdapp.gcc; jeffreyalaw; pan2.li
Subject: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

LGTM, but with one question.

On Fri, May 26, 2023 at 7:36 PM wrote:
>
> From: Juzhe-Zhong
>
> This patch support FMA auto-vectorization pattern.
> 1. Let's RA decide vmacc or vmadd.
> 2. Fix bug of vector.md which generate incorrect information to VSETVL
>    PASS when testing ternop-3.c.

Does this bug also appear in GCC 13, or is it a new bug introduced on trunk?
Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization
This patch is fixing VSETVL PASS bug. Ok for trunk ? juzhe.zh...@rivai.ai From: juzhe.zhong Date: 2023-05-26 11:01 To: gcc-patches CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li; Juzhe-Zhong Subject: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization From: Juzhe-Zhong Fix bug reported here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974 PR target/109974 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (source_equal_p): Fix ICE. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr109974.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 30 ++- .../gcc.target/riscv/rvv/vsetvl/pr109974.c| 17 +++ 2 files changed, 46 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 9847d649d1d..fe55f4ccd30 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -1138,7 +1138,35 @@ source_equal_p (insn_info *insn1, insn_info *insn2) return false; if (!rtx_equal_p (SET_SRC (single_set1), SET_SRC (single_set2))) return false; - gcc_assert (insn1->uses ().size () == insn2->uses ().size ()); + /* RTL_SSA uses include REG_NOTE. Consider this following case: + + insn1 RTL: + (insn 41 39 42 4 (set (reg:DI 26 s10 [orig:159 loop_len_46 ] [159]) + (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201]) + (reg:DI 14 a4 [276]))) 408 {*umindi3} + (expr_list:REG_EQUAL (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201]) + (const_int 2 [0x2])) + (nil))) + The RTL_SSA uses of this instruction has 2 uses: + 1. (reg:DI 15 a5 [orig:201 _149 ] [201]) - twice. + 2. (reg:DI 14 a4 [276]) - once. 
+ + insn2 RTL: + (insn 38 353 351 4 (set (reg:DI 27 s11 [orig:160 loop_len_47 ] [160]) + (umin:DI (reg:DI 15 a5 [orig:199 _146 ] [199]) + (reg:DI 14 a4 [276]))) 408 {*umindi3} + (expr_list:REG_EQUAL (umin:DI (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200]) + (const_int 2 [0x2])) + (nil))) + The RTL_SSA uses of this instruction has 3 uses: + 1. (reg:DI 15 a5 [orig:199 _146 ] [199]) - once + 2. (reg:DI 14 a4 [276]) - once + 3. (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200]) - once + + Return false when insn1->uses ().size () != insn2->uses ().size () + */ + if (insn1->uses ().size () != insn2->uses ().size ()) +return false; for (size_t i = 0; i < insn1->uses ().size (); i++) if (insn1->uses ()[i] != insn2->uses ()[i]) return false; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c new file mode 100644 index 000..06a8562ebab --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv_zbb -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3" } */ + +#include + +void +func (int8_t *__restrict x, int64_t *__restrict y, int n) +{ + for (int i = 0, j = 0; i < n; i++, j +=2 ) + { +x[i + 0] += 1; +y[j + 0] += 1; +y[j + 1] += 2; + } +} + +/* { dg-final { scan-assembler {vsetvli} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ -- 2.36.3
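The shape of the fix above (return false on a size mismatch instead of asserting equal sizes) can be illustrated with a small stand-alone model. This is only a sketch: struct use_list and uses_equal_p are hypothetical stand-ins for the RTL-SSA use lists, reduced to arrays of register numbers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTL-SSA use list: an array of register
   numbers plus its length.  */
struct use_list
{
  const int *regs;
  size_t size;
};

/* Return true only if both lists have the same length and the same entries
   in the same order.  Lists of different length are simply "not equal";
   checking the length first also keeps the loop below in bounds.  */
static bool
uses_equal_p (struct use_list a, struct use_list b)
{
  if (a.size != b.size)
    return false;
  for (size_t i = 0; i < a.size; i++)
    if (a.regs[i] != b.regs[i])
      return false;
  return true;
}
```

With the two instructions from the comment, one list holds registers 15 and 14 and the other holds 15, 14 and 28, so the comparison reports "not equal" rather than aborting.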
Re: Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization
Yes.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-05-29 12:36
To: juzhe.zh...@rivai.ai
CC: Kito.cheng; Robin Dapp; gcc-patches; jeffreyalaw; palmer; palmer; pan2.li
Subject: Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization

OK. Just to make sure: this only appears on trunk, right?

On Mon, May 29, 2023 at 12:19, juzhe.zh...@rivai.ai wrote:
> This patch is fixing VSETVL PASS bug. Ok for trunk ?
Re: [PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support
Hi, this patch is same implementation as FMA which has been merged. Ok for trunk? juzhe.zh...@rivai.ai From: juzhe.zhong Date: 2023-05-29 14:53 To: gcc-patches CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong Subject: [PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support From: Juzhe-Zhong Like FMA, Add FNMA (VNMSAC or VNMSUB) auto-vectorization support. gcc/ChangeLog: * config/riscv/autovec.md (fnma4): New pattern. (*fnma): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test. --- gcc/config/riscv/autovec.md | 45 .../riscv/rvv/autovec/ternop/ternop-4.c | 28 + .../riscv/rvv/autovec/ternop/ternop-5.c | 34 ++ .../riscv/rvv/autovec/ternop/ternop-6.c | 33 ++ .../riscv/rvv/autovec/ternop/ternop_run-4.c | 84 ++ .../riscv/rvv/autovec/ternop/ternop_run-5.c | 104 ++ .../riscv/rvv/autovec/ternop/ternop_run-6.c | 104 ++ 7 files changed, 432 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index eff3e484fb4..a1028d71467 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -606,3 +606,48 @@ } [(set_attr "type" "vimuladd") (set_attr "mode" "")]) + +;; - +;; 
[INT] VNMSAC and VNMSUB +;; - +;; Includes: +;; - vnmsac +;; - vnmsub +;; - + +(define_expand "fnma4" + [(parallel +[(set (match_operand:VI 0 "register_operand" "=vr") + (minus:VI + (match_operand:VI 3 "register_operand" " vr") + (mult:VI + (match_operand:VI 1 "register_operand" " vr") + (match_operand:VI 2 "register_operand" " vr" + (clobber (match_scratch:SI 4))])] + "TARGET_VECTOR" + {}) + +(define_insn_and_split "*fnma" + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") + (minus:VI + (match_operand:VI 3 "register_operand" " vr, 0, vr") + (mult:VI + (match_operand:VI 1 "register_operand" " %0, vr, vr") + (match_operand:VI 2 "register_operand" " vr, vr, vr" + (clobber (match_scratch:SI 4 "=r,r,r"))] + "TARGET_VECTOR" + "#" + "&& reload_completed" + [(const_int 0)] + { +PUT_MODE (operands[4], Pmode); +riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); +if (which_alternative == 2) + emit_insn (gen_rtx_SET (operands[0], operands[3])); +rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (mode), +riscv_vector::RVV_TERNOP, ops, operands[4]); +DONE; + } + [(set_attr "type" "vimuladd") + (set_attr "mode" "")]) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c new file mode 100644 index 000..22d11de89a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */ + +#include + +#define TEST_TYPE(TYPE) \ + __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \ + TYPE *__restrict a, \ + TYPE *__restrict b, int n) \ + { \ +for (int i = 0; i < n; i++) \ + dst[i] += -(a[i] *
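To make the third alternative of the "*fnma" split easier to follow (the one where the destination overlaps none of the inputs and operands[3] is first copied into operands[0]), here is a scalar sketch. The function names are made up, the vnmsac formula is taken from the RVV instruction semantics, and which instruction the backend finally emits is not modeled here.

```c
#include <stdint.h>

/* Scalar model of vnmsac.vv vd, vs1, vs2:
   vd[i] = -(vs1[i] * vs2[i]) + vd[i].  */
static void
model_vnmsac (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, int n)
{
  for (int i = 0; i < n; i++)
    vd[i] = vd[i] - vs1[i] * vs2[i];
}

/* Model of the no-overlap alternative: copy the minuend c into dst first
   (the extra SET in the split), then update dst in place.  */
static void
fnma_no_overlap (uint32_t *dst, const uint32_t *a, const uint32_t *b,
		 const uint32_t *c, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = c[i];
  model_vnmsac (dst, a, b, n);
}
```

The copy costs one extra move, which is presumably why that alternative is marked with "?" in the constraints so that the register allocator prefers the overlapping ones.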
Re: [PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support
OK for trunk?

juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-05-29 12:35
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

From: Juzhe-Zhong

We can't yet support floating-point operations that depend on FRM (for example, vfadd support is blocked), because the RVV intrinsic doc has not been updated and we can't support mode switching for them.

However, we can support floating-point to integer conversion now: the 'rtz' conversions are independent of FRM, so no mode-switching support is needed.

gcc/ChangeLog:

	* config/riscv/autovec.md (2): New pattern.
	* config/riscv/iterators.md: New attribute.
	* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
	* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
	* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
	* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.
--- gcc/config/riscv/autovec.md | 23 gcc/config/riscv/iterators.md | 4 +- gcc/config/riscv/vector-iterators.md | 5 ++ .../rvv/autovec/conversions/vfcvt_rtz-run.c | 52 +++ .../autovec/conversions/vfcvt_rtz-rv32gcv.c | 6 +++ .../autovec/conversions/vfcvt_rtz-rv64gcv.c | 6 +++ .../autovec/conversions/vfcvt_rtz-template.h | 15 ++ 7 files changed, 110 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index b24867ae4d0..3989ffb26ee 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -478,6 +478,29 @@ DONE; }) +;; = +;; == Conversions +;; = + +;; - +;; [INT<-FP] Conversions +;; - +;; Includes: +;; - vfcvt.rtz.xu.f.v +;; - vfcvt.rtz.x.f.v +;; - + +(define_expand "2" + [(set (match_operand: 0 "register_operand") + (any_fix: + (match_operand:VF 1 "register_operand")))] + "TARGET_VECTOR" +{ + insn_code icode = code_for_pred (, mode); + riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands); + DONE; +}) + ;; = ;; == Unary arithmetic ;; = diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index 8afe98e4410..d374a10810c 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -225,7 +225,9 @@ (ss_minus "sssub") (us_minus "ussub") (sign_extend "extend") - (zero_extend "zero_extend")]) + (zero_extend "zero_extend") + (fix "fix_trunc") + (unsigned_fix "fixuns_trunc")]) ;; code attributes (define_code_attr or_optab [(ior "ior") diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 70fb5b80b1b..937ec3c7f67 100644 --- 
a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -1208,6 +1208,11 @@ (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI") ]) +(define_mode_attr vconvert [ + (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si") + (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di") +]) + (define_mode_attr VNCONVERT [ (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI") (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF") diff --git a/gcc/testsuite/gcc.target/riscv/r
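The reason the 'rtz' conversions need no FRM handling can be seen from the scalar source such a pattern would vectorize: a C cast from float to an integer type always discards the fractional part, that is, it rounds toward zero. The loop below is a sketch of that kind of source; it is not taken from the patch's test files, and the function name is made up.

```c
#include <stdint.h>

/* Scalar reference loop: the cast truncates toward zero, so the result
   does not depend on the dynamic floating-point rounding mode.  */
void
convert_f32_to_i32 (int32_t *restrict dst, const float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int32_t) src[i];
}
```

For example, 2.7f becomes 2 and -2.7f becomes -2.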
Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
>> /* Return true if MODE is true VLS mode.  */
>> bool
>> vls_mode_p (machine_mode mode)
>> {
>>   switch (mode)
>>     {
>>     case E_V4SImode:
>>     case E_V2DImode:
>>     case E_V8HImode:
>>     case E_V16QImode:
>>       return true;
>>     default:
>>       return false;
>>     }
>> }

To be consistent, you should put these into riscv-vector-switch.def, which also makes the function easier to extend. Rename it to riscv_v_ext_vls_mode_p and change it like this:

bool
riscv_v_ext_vls_mode_p (machine_mode mode)
{
#define VLS_ENTRY(MODE, REQUIREMENT, ...)                                      \
  case MODE##mode:                                                             \
    return REQUIREMENT;
  switch (mode)
    {
#include "riscv-vector-switch.def"
    default:
      return false;
    }
  return false;
}

Then in riscv-vector-switch.def:

VLS_ENTRY (V4SI...
VLS_ENTRY (V2DI..
...

In the future, we can add more VLS modes in riscv-vector-switch.def.

>>(define_insn_and_split "3"
>>  [(set (match_operand:VLS 0 "register_operand" "=vr")
>>	(any_int_binop_no_shift:VLS
>>	  (match_operand:VLS 1 "register_operand" "vr")
>>	  (match_operand:VLS 2 "register_operand" "vr")))]
>>  "TARGET_VECTOR"
>>  "#"
>>  "reload_completed"
>>  [(const_int 0)]
>>+{
>>  machine_mode vla_mode = riscv_vector::minimal_vla_mode (mode);
>>  riscv_vector::vls_insn_expander (
>>    code_for_pred (, vla_mode), riscv_vector::RVV_BINOP,
>>    operands, mode, vla_mode);
>>  DONE;
>>})

This pattern works for the current VLS modes because their NUNITS are within 0~31. If we add more VLS modes such as V32QImode or V64QImode, it won't work. I am OK with this, but I wanted to remind you early.
>> # VLS test
>>gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vls/*.\[cS\]]] \
>>	"" $CFLAGS

Add tests with -march=rv64gcv_zvl256b to see whether your testcase can generate an LMUL = mf2 vsetvli, and with -march=rv64gcv_zvl2048 to make sure your testcase will not go into the VLS modes (2048 * 1 / 8 > 128).

For the VSETVL part, I didn't see you define the sew/vlmul ...ratio attributes for VLS modes. I wonder how these VLS modes emit the correct VSETVL? For example, in vector.md:

(define_attr "sew" ""
  (cond [(eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI,\
			  VNx1BI,VNx2BI,VNx4BI,VNx8BI,VNx16BI,VNx32BI,VNx64BI,\
			  VNx128QI,VNx128BI,VNx2x64QI,VNx2x32QI,VNx3x32QI,VNx4x32QI,\
			  VNx2x16QI,VNx3x16QI,VNx4x16QI,VNx5x16QI,VNx6x16QI,VNx7x16QI,VNx8x16QI,\
			  VNx2x8QI,VNx3x8QI,VNx4x8QI,VNx5x8QI,VNx6x8QI,VNx7x8QI,VNx8x8QI,\
			  VNx2x4QI,VNx3x4QI,VNx4x4QI,VNx5x4QI,VNx6x4QI,VNx7x4QI,VNx8x4QI,\
			  VNx2x2QI,VNx3x2QI,VNx4x2QI,VNx5x2QI,VNx6x2QI,VNx7x2QI,VNx8x2QI,\
			  VNx2x1QI,VNx3x1QI,VNx4x1QI,VNx5x1QI,VNx6x1QI,VNx7x1QI,VNx8x1QI")
	 (const_int 8)
	 (eq_attr "mode" "VNx1HI,VNx2HI,VNx4HI,VNx8HI,VNx16HI,VNx32HI,VNx64HI,\
			  VNx2x32HI,VNx2x16HI,VNx3x16HI,VNx4x16HI,\
			  VNx2x8HI,VNx3x8HI,VNx4x8HI,VNx5x8HI,VNx6x8HI,VNx7x8HI,VNx8x8HI,\
			  VNx2x4HI,VNx3x4HI,VNx4x4HI,VNx5x4HI,VNx6x4HI,VNx7x4HI,VNx8x4HI,\
			  VNx2x2HI,VNx3x2HI,VNx4x2HI,VNx5x2HI,VNx6x2HI,VNx7x2HI,VNx8x2HI,\
			  VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI")
	 (const_int 16)
	 (eq_attr "mode" "VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,\
			  VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,\
			  VNx2x16SI,VNx2x8SI,VNx3x8SI,VNx4x8SI,\
			  VNx2x4SI,VNx3x4SI,VNx4x4SI,VNx5x4SI,VNx6x4SI,VNx7x4SI,VNx8x4SI,\
			  VNx2x2SI,VNx3x2SI,VNx4x2SI,VNx5x2SI,VNx6x2SI,VNx7x2SI,VNx8x2SI,\
			  VNx2x1SI,VNx3x1SI,VNx4x1SI,VNx5x1SI,VNx6x1SI,VNx7x1SI,VNx8x1SI,\
			  VNx2x16SF,VNx2x8SF,VNx3x8SF,VNx4x8SF,\
			  VNx2x4SF,VNx3x4SF,VNx4x4SF,VNx5x4SF,VNx6x4SF,VNx7x4SF,VNx8x4SF,\
			  VNx2x2SF,VNx3x2SF,VNx4x2SF,VNx5x2SF,VNx6x2SF,VNx7x2SF,VNx8x2SF,\
			  VNx2x1SF,VNx3x1SF,VNx4x1SF,VNx5x1SF,VNx6x1SF,VNx7x1SF,VNx8x1SF")
	 (const_int 32)
	 (eq_attr "mode" "VNx1DI,VNx2DI,VNx4DI,VNx8DI,VNx16DI,\
			  VNx1DF,VNx2DF,VNx4DF,VNx8DF,VNx16DF,\
			  VNx2x8DI,VNx2x4DI,VNx3x4DI,VNx4x4DI,\
			  VNx2x2DI,VNx3x2DI,VNx4x2DI,VNx5x2DI,VNx6x2DI,VNx7x2DI,VNx8x2DI,\
			  VNx2x1DI,VNx3x1DI,VNx4x1DI,VNx5x1DI,VNx6x1DI,VNx7x1DI,VNx8x1DI,\
			  VNx2x8DF,VNx2x4DF,VNx3x4DF,VNx4x4DF,\
			  VNx2x2DF,VNx3x2DF,VNx4x2DF,VNx5x2DF,VNx6x2DF,VNx7x2DF,VNx8x2DF,\
			  VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF")
	 (const_int 64)]
	(const_int INVALID_ATTRIBUTE)))

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-05-30 14:06
To: gcc-patches; palmer; kito.cheng; juzh
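The VLS_ENTRY suggestion earlier in this message is an instance of the "X-macro" technique: keep one table of entries and expand it into case labels. The stand-alone sketch below shows the idea without GCC's headers. Everything in it is hypothetical (the vls_mode enum, the have_min_vlen_128 flag, and a list macro used in place of a real #include of a .def file); it only demonstrates the mechanism.

```c
#include <stdbool.h>

/* Hypothetical mode enum and requirement flag, standing in for GCC's
   machine_mode and target macros.  */
enum vls_mode { V4SI, V2DI, V8HI, V16QI, V32QI, OTHER };
static bool have_min_vlen_128 = true;

/* The single table.  In the suggestion above it lives in a .def file that
   is pulled in with #include; a list macro keeps this sketch in one file.  */
#define VLS_MODE_LIST(ENTRY)		\
  ENTRY (V4SI, have_min_vlen_128)	\
  ENTRY (V2DI, have_min_vlen_128)	\
  ENTRY (V8HI, have_min_vlen_128)	\
  ENTRY (V16QI, have_min_vlen_128)

static bool
vls_mode_p (enum vls_mode mode)
{
  /* Each table entry becomes one case label with its own requirement.  */
#define VLS_ENTRY(MODE, REQUIREMENT)	\
  case MODE:				\
    return REQUIREMENT;
  switch (mode)
    {
      VLS_MODE_LIST (VLS_ENTRY)
    default:
      return false;
    }
#undef VLS_ENTRY
}
```

Adding a mode then means adding one line to the table; the predicate and any other expansion of the same table pick it up automatically.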
Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
Ok. LGTM as long as you change the patch as I suggested. Thanks. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-05-30 14:51 To: juzhe.zh...@rivai.ai CC: gcc-patches; palmer; kito.cheng; jeffreyalaw; Robin Dapp; pan2.li Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V > >> /* Return true if MODE is true VLS mode. */ > >> bool > >> vls_mode_p (machine_mode mode) > >> { > >> switch (mode) > >> { > >> case E_V4SImode: > >> case E_V2DImode: > >> case E_V8HImode: > >> case E_V16QImode: > >> return true; > >> default: > >> return false; > >> } > >> } > > To be consistent, you should put these into riscv-vector-switching.def. > It can make the function easier extend,change it like this: > change name into riscv_v_ext_vls_mode_p > > bool > riscv_v_ext_vls_mode_p (machine_mode mode) > { > #define VLS_ENTRY(MODE, REQUIREMENT, ...) > \ > case MODE##mode: > \ > return REQUIREMENT; > switch (mode) > { > #include "riscv-vector-switch.def" > default: > return false; > } > return false; > } > > Then in riscv-vector-switch.def > VLS_ENTRY (V4SI... > VLS_ENTRY (V2DI.. > ... > In the future, we extend more VLS modes in riscv-vector-switch.def Good point, we should make this more consistent :) > >>(define_insn_and_split "3" > >> [(set (match_operand:VLS 0 "register_operand" "=vr") > >> (any_int_binop_no_shift:VLS > >> (match_operand:VLS 1 "register_operand" "vr") > >> (match_operand:VLS 2 "register_operand" "vr")))] > >> "TARGET_VECTOR" > >> "#" > >> "reload_completed" > >> [(const_int 0)] > >>+{ > >> machine_mode vla_mode = riscv_vector::minimal_vla_mode (mode); > >> riscv_vector::vls_insn_expander ( > >>code_for_pred (, vla_mode), riscv_vector::RVV_BINOP, > >>operands, mode, vla_mode); > >> DONE; > >>}) > > This pattern can work for current VLS modes so far since they are within > 0~31, if we add more VLSmodes such as V32QImode, V64QImode, > it can't work . I am ok with this, but I should remind you early. 
Yeah, I know the problem. My thought is that we will have another set of VLS patterns for modes with NUNITS >= 32, which will require a GPR clobber.

> Add tests with -march=rv64gcv_zvl256b to see whether your testcase can
> generate LMUL = mf2 vsetvli
>
> and -march=rv64gcv_zvl2048 make sure your testcase will not go into the VLS
> modes (2048 * 1 / 8 > 128)

I guess I should write a loop to test those combinations instead of using separate files that differ only in their options.

> For VSETVL part, I didn't see you define attribute sew/vlmul ...ratio for VLS
> modes.
>
> I wonder how these VLS modes emit correct VSETVL?

That's the magic I made here: I split the pattern after RA but before the vsetvli pass, convert all operands to VLA mode, and use the VLA pattern, so we don't need to modify a single line of the vsetvli code.
Re: [PATCH] VECT: Add SELECT_VL support
Hi, this patch passes bootstrap. OK for trunk?

Thanks.

juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-05-25 23:26
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH] VECT: Add SELECT_VL support

From: Ju-Zhe Zhong

This patch adds middle-end support for SELECT_VL, which allows a target to apply target-dependent optimization to the length calculation.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves the same as LLVM's "get_vector_length", with the following properties:

1. Only applies to a single rgroup.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows a non-final iteration to process a number of elements other than VF.

Code:

# void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i -_36 = MIN_EXPR ; +_36 = (MIN_EXPR | SELECT_VL) ;
...
vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
...

@@ -551,9 +551,14 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   /* Create decrement IV.
*/ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi, insert_after, &index_before_incr, &index_after_incr); - gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR, - index_before_incr, - nitems_step)); + tree len = NULL_TREE; + if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) + len = gimple_build (header_seq, IFN_SELECT_VL, iv_type, + index_before_incr, nitems_step); + else + len = gimple_build (header_seq, MIN_EXPR, iv_type, index_before_incr, + nitems_step); + gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len)); *iv_step = step; return index_after_incr; } diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 5b7a0da0034..f67340976c8 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), using_partial_vectors_p (false), using_decrementing_iv_p (false), +using_select_vl_p (false), epil_using_partial_vectors_p (false), partial_load_store_bias (0), peeling_for_gaps (false), @@ -2737,6 +2738,14 @@ start_over: LOOP_VINFO_VECT_FACTOR (loop_vinfo LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; + /* If we're using decrement IV and SELECT_VL is supported by the target. + Use output of SELECT_VL to adjust IV of loop control and data reference. + Note: We only use SELECT_VL on single-rgroup control. */ + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) + && LOOP_VINFO_LENS (loop_vinfo).length () == 1 + && !slp) +LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true; + /* If we're vectorizing an epilogue loop, the vectorized loop either needs to be able to handle fewer than VF scalars, or needs to have a lower VF than the main loop. 
*/ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 127b987cd62..8e8b0f71a4a 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -3147,6 +3147,61 @@ vect_get_data_ptr_increment (vec_info *vinfo, return iv_step; } +/* Prepare the pointer IVs which needs to be updated by a variable amount. + Such variable amount is the outcome of .SELECT_VL. In this case, we can + allow each iteration process the flexible number of elements as long as + the number <= vf elments. + + Return data reference according to SELECT_VL. + If new statements are needed, insert them before GSI. */ + +static tree +get_select_vl_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info, + tree aggr_type, class loop *at_loop, tree offset, + tree *dummy, gimple_stmt_iterator *gsi, + bool simd_lane_access_p, vec_loop_lens *loop_lens, + dr_vec_info *dr_info, + vect_memory_access_type memory_access_type) +{ + loop_vec_info loop_vinfo = dyn_cast (vinfo); + tree step = vect_dr_behavior (vinfo, dr_info)->step; + + /* TODO: We don't support gather/scatter or load_lanes/store_lanes for pointer + IVs are updated by variable amount but we will support them in the future. + */ + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER + && memory_access_type != VMAT_LOAD_STORE_LANES); + + /* When we support SELECT_VL pattern, we dynamic adjust + the memory address by .SELECT_VL result. + + The result of .SELECT_VL is the number of elements to + be processed of each iteration. So the memory address + adjustment operation should be: + + bytesize = GET_MODE_SIZE (element_mode (aggr_type)); + addr = addr + .SELECT_VL (ARG..) * bytesize; + */ + gimple *ptr_incr; + tree loop_len += vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, aggr_type, 0, 0); + tree len_type = TREE_TYPE (loop_len); + poly_uint64 bytesize = GET_MODE_SIZE (
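The effect of SELECT_VL on the loop control IV and the data reference IVs can be sketched in scalar C. This is an illustration under stated assumptions, not GCC's generated code: VF is fixed at 4, select_vl is a made-up stand-in for .SELECT_VL, and its policy (evening out the last two iterations) is just one choice a target might make; the only property relied on is that the result is between 1 and min (remain, VF).

```c
#include <stddef.h>

#define VF 4	/* assumed vectorization factor for this sketch */

/* Hypothetical stand-in for .SELECT_VL.  For remain > 0 it returns a
   length in [1, min (remain, vf)]; unlike MIN_EXPR it may return less
   than vf in a non-final iteration.  */
static size_t
select_vl (size_t remain, size_t vf)
{
  if (remain >= 2 * vf)
    return vf;
  if (remain > vf)
    return (remain + 1) / 2;
  return remain;
}

/* z[i] = x[i] + y[i], strip-mined with a decrementing IV.  Returns the
   number of "vector" iterations executed.  */
static size_t
vvaddint32 (size_t n, const int *x, const int *y, int *z)
{
  size_t iters = 0;
  for (size_t remain = n; remain > 0; iters++)
    {
      size_t len = select_vl (remain, VF);
      for (size_t i = 0; i < len; i++)	/* stands in for one vector op */
	z[i] = x[i] + y[i];
      /* The pointer IVs advance by the variable amount len (counted in
	 elements here, len * sizeof (int) in bytes).  */
      x += len;
      y += len;
      z += len;
      remain -= len;	/* the decrementing loop control IV */
    }
  return iters;
}
```

Because len is only known at run time, the pointers cannot be bumped by a constant VF * sizeof (int); that is the reason the data reference IVs have to be updated from the SELECT_VL result.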
Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
>> why is the conversion after register allocation always
>> safe?

I worry about this issue too. I just noticed:

+	case MEM:
+	  operands[i] = change_address (operands[i], vla_mode, NULL_RTX);

I am not sure whether it is safe.

>> Couldn't we "lower" the fixed-length vectors to VLA at some point and
>> how does everything relate to fixed-vlmax?

I can explain why we need this patch (I call it fixed-vlmin). Take a look at this example:
https://godbolt.org/z/3jYqoM84h

This is how LLVM works. In this example, you can see that GCC needs --param=riscv-autovec-preference=fixed-vlmax -march=rv64gcv (the same as mrvv-vector-bits=128), whereas LLVM doesn't need the vector length to be specified.

The benefits:
1. We can vectorize this example without specifying the actual vector length.
2. GCC's codegen can only run on a CPU with vector length = 128, whereas LLVM's can run on any RVV CPU with vector length >= 128.

Thanks.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-05-30 15:27
To: Kito Cheng; gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; pan2.li
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

Hi Kito,

> GNU vector extensions is widly used around this world, and this patch
> enable that with RISC-V vector extensions, this can help people
> leverage existing code base with RVV, and also can write vector programs in a
> familiar way.
>
> The idea of VLS code gen support is emulate VLS operation by VLA operation
> with
> specific length.
>
> Key design point is we defer the mode conversion (From VLS to VLA mode) after
> register allocation, it come with several advantages:
> - VLS pattern is much friendly for most optimization pass like combine.
> - Register allocator can spill/restore exact size of VLS type instead of
>   whole register.
>
> This is compatible with VLA vectorization.
>
> Only support move and binary part of operation patterns.

On a high level: Why do we need to do it this way and not any other way? :)
Some more comments/explanations would definitely help, i.e. prior art on aarch64, what exactly is easier for combine and friends now (no undef and so on) and, importantly, why is the conversion after register allocation always safe? Couldn't we "lower" the fixed-length vectors to VLA at some point, and how does everything relate to fixed-vlmax? Essentially this is a "separate" backend similar to ARM NEON, but we share most of the things and possibly grow it in the future? What would the alternative be?

That said, couldn't we reuse the existing binop tests? If you don't like them, change the existing ones as well and reuse them?

> +/* Return the minimal containable VLA mode for MODE.  */
> +
> +machine_mode
> +minimal_vla_mode (machine_mode mode)
> +{
> +  gcc_assert (GET_MODE_NUNITS (mode).is_constant ());
> +  unsigned type_size = GET_MODE_NUNITS (mode).to_constant ();

Couldn't you use .require () right away? Same in some other hunks.

Regards
 Robin
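For context on what "GNU vector extensions" code looks like from the user's side, here is a minimal sketch of a fixed-length 16-byte vector type and an element-wise add. It is illustrative only; the type and function names are made up, and the assumption is simply that a type like this corresponds to one of the 16-byte modes (such as V4SImode) discussed in this thread.

```c
/* GNU vector extension: a fixed-length vector of four 32-bit ints
   (16 bytes in total).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise addition written with ordinary operators; no intrinsics
   and no vector-length option are needed in the source.  */
v4si
add_v4si (v4si a, v4si b)
{
  return a + b;
}
```

Code like this is written once against a fixed vector size, which is why it matters whether the compiler can map it to RVV without the user pinning the hardware vector length.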
Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
Hi, Richi. >> but ideally the user would be able to specify -mrvv-size=32 for an >> implementation with 32 byte vectors and then vector lowering would make use >> of vectors up to 32 bytes? Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on GNU vectors. You can take a look this example: https://godbolt.org/z/3jYqoM84h GCC need to specify the mrvv size to enable GNU vectors and the codegen only can run on CPU with vector-length = 128bit. However, LLVM doesn't need to specify the vector length, and the codegen can run on any CPU with RVV vector-length >= 128 bits. This is what this patch want to do. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-30 15:13 To: Kito Cheng CC: gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; rdapp.gcc; pan2.li Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V On Tue, May 30, 2023 at 8:07 AM Kito Cheng via Gcc-patches wrote: > > GNU vector extensions is widly used around this world, and this patch > enable that with RISC-V vector extensions, this can help people > leverage existing code base with RVV, and also can write vector programs in a > familiar way. > > The idea of VLS code gen support is emulate VLS operation by VLA operation > with > specific length. In the patch you added fixed 16 bytes vector modes, correct? I've never looked at how ARM deals with the GNU vector extensions but I suppose they get mapped to NEON and not SVE so basically behave the same way here. But I do wonder about the efficiency for RVV where there doesn't exist a complementary fixed-length ISA. Shouldn't vector lowering (tree-vect-generic.cc) be enhanced to support lowering fixed-length vectors to variable length ones with (variable) fixed length instead? From your patch I second-guess the RVV specification requires 16 byte vectors to be available (or will your patch split the insns?) 
but ideally the user would be able to specify -mrvv-size=32 for an implementation with 32 byte vectors and then vector lowering would make use of vectors up to 32 bytes? Also vector lowering will split smaller vectors not equal to the fixed size to scalars unless you add all fixed length modes smaller than 16 bytes as well. > Key design point is we defer the mode conversion (From VLS to VLA mode) after > register allocation, it come with several advantages: > - VLS pattern is much friendly for most optimization pass like combine. > - Register allocator can spill/restore exact size of VLS type instead of > whole register. > > This is compatible with VLA vectorization. > > Only support move and binary part of operation patterns. > > gcc/ChangeLog: > > * config/riscv/riscv-modes.def: Introduce VLS modes. > * config/riscv/riscv-protos.h (riscv_vector::minimal_vls_mode): New. > (riscv_vector::vls_insn_expander): New. > (riscv_vector::vls_mode_p): New. > * config/riscv/riscv-v.cc (riscv_vector::minimal_vls_mode): New. > (riscv_vector::vls_mode_p): New. > (riscv_vector::vls_insn_expander): New. > (riscv_vector::update_vls_mode): New. > * config/riscv/riscv.cc (riscv_v_ext_mode_p): New. > (riscv_v_adjust_nunits): Handle VLS type. > (riscv_hard_regno_nregs): Ditto. > (riscv_hard_regno_mode_ok): Ditto. > (riscv_regmode_natural_size): Ditto. > * config/riscv/vector-iterators.md (VLS): New. > (VM): Handle VLS type. > (vel): Ditto. > * config/riscv/vector.md: Include vector-vls.md. > * config/riscv/vector-vls.md: New file. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/rvv.exp: Add vls folder. > * gcc.target/riscv/rvv/vls/binop-template.h: New test. > * gcc.target/riscv/rvv/vls/binop-v.c: New test. > * gcc.target/riscv/rvv/vls/binop-zve32x.c: New test. > * gcc.target/riscv/rvv/vls/binop-zve64x.c: New test. > * gcc.target/riscv/rvv/vls/move-template.h: New test. > * gcc.target/riscv/rvv/vls/move-v.c: New test. > * gcc.target/riscv/rvv/vls/move-zve32x.c: New test. 
> * gcc.target/riscv/rvv/vls/move-zve64x.c: New test. > * gcc.target/riscv/rvv/vls/load-store-template.h: New test. > * gcc.target/riscv/rvv/vls/load-store-v.c: New test. > * gcc.target/riscv/rvv/vls/load-store-zve32x.c: New test. > * gcc.target/riscv/rvv/vls/load-store-zve64x.c: New test. > * gcc.target/riscv/rvv/vls/vls-types.h: New test. > --- > gcc/config/riscv/riscv-modes.def | 3 + > gcc/config/riscv/riscv-protos.h | 4 ++ > gcc/config/riscv/riscv-v.cc | 67 +++ > gcc/config/riscv/riscv.cc | 27 +++- &g
Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
In the future, we will definitely mixing VLA and VLS-vlmin together in a codegen and it will not cause any issues. For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am not sure since my SELECT_VL patch is not finished, I will check if can work when I am working in SELECT_VL patch). >> In general I don't have a good overview of which optimizations we gain by >> such an approach or rather which ones are prevented by VLA altogether? These patches VLS modes can help for SLP auto-vectorization. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-05-30 17:05 To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V >>> but ideally the user would be able to specify -mrvv-size=32 for an >>> implementation with 32 byte vectors and then vector lowering would make use >>> of vectors up to 32 bytes? > > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on > GNU vectors. > You can take a look this example: > https://godbolt.org/z/3jYqoM84h <https://godbolt.org/z/3jYqoM84h> > > GCC need to specify the mrvv size to enable GNU vectors and the codegen only > can run on CPU with vector-length = 128bit. > However, LLVM doesn't need to specify the vector length, and the codegen can > run on any CPU with RVV vector-length >= 128 bits. > > This is what this patch want to do. > > Thanks. I think Richard's question was rather if it wasn't better to do it more generically and lower vectors to what either the current cpu or what the user specified rather than just 16-byte vectors (i.e. indeed a fixed vlmin and not a fixed vlmin == fixed vlmax). This patch assumes everything is fixed for optimization purposes and then switches over to variable-length when nothing can be changed anymore. That is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime? 
We would need to make sure that no pass after reload makes use of VLA properties at all. In general I don't have a good overview of which optimizations we gain by such an approach or rather which ones are prevented by VLA altogether? What's the idea for the future? Still use LEN_LOAD et al. (and masking) with "fixed vlmin"? Wouldn't we select different IVs with this patch than what we would have for pure VLA? Regards Robin
Re: Re: decremnt IV patch create fails on PowerPC
Ok. It seems that for this conditions: + /* If we're vectorizing a loop that uses length "controls" and + can iterate more than once, we apply decrementing IV approach + in loop control. */ + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0 + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo), + LOOP_VINFO_VECT_FACTOR (loop_vinfo +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; I should add direct_supportted_p (SELECT_VL...) to this is that right? I have send SELECT_VL patch. I will add this in next SELECT_VL patch. Let's wait Richard's more comments. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-30 17:22 To: juzhe.zh...@rivai.ai CC: gcc-patches; richard.sandiford; linkw Subject: Re: Re: decremnt IV patch create fails on PowerPC On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote: > Hi, Richi. Thanks for your analysis and helps. > > >> We could simply retain the original > >> incrementing IV for loop control and add the decrementing > >> IV for computing LEN in addition to that and leave IVOPTs > >> sorting out to eventually merge them (or not). > > I am not sure how to do that. Could you give me more informations? > > I somehow understand your concern is that variable amount of IV will make > IVOPT fails. > > I have seen similar situation in LLVM (when apply variable IV, > they failed to interleave the vectorize code). I am not sure whether they > are the same reason for that. > > For RVV, we not only want decrement IV style in vectorization but also > we want to apply SELECT_VL in single-rgroup which is most happen cases (LLVM > also only apply get_vector_length in single vector length). > > >>You can do some testing with a cross compiler, alternatively > >>there are powerpc machines in the GCC compile farm. > > It seems that Power is ok with decrement IV since most cases are improved. 
Well, but Power never will have SELECT_VL so at least for !SELECT_VL targets you should avoid having an IV with variable decrement. As I said it should be easy to rewrite decrement IV to use a constant increment (when not using SELECT_VL) and testing the pre-decrement value in the exit test. Richard. > I think Richard may help to explain decrement IV more clearly. > > Thanks > > > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-05-26 14:46 > To: ??? > CC: gcc-patches; richard.sandiford; linkw > Subject: Re: decremnt IV patch create fails on PowerPC > On Fri, 26 May 2023, ??? wrote: > > > Yesterday's patch has been approved (decremnt IV support): > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html > > > > However, it creates fails on PowerPC: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 > > > > I am really sorry for causing inconvinience. > > > > I wonder as we disccussed: > > + /* If we're vectorizing a loop that uses length "controls" and > > + can iterate more than once, we apply decrementing IV approach > > + in loop control. */ > > + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) > > + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () > > + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0 > > + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo), > > + LOOP_VINFO_VECT_FACTOR (loop_vinfo > > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; > > > > This conditions can not disable decrement IV on PowerPC. > > Should I add a target hook for it? > > No. I've put some analysis in the PR. To me the question is > why (without that SELECT_VL case) we need a decrementing IV > _for the loop control_? We could simply retain the original > incrementing IV for loop control and add the decrementing > IV for computing LEN in addition to that and leave IVOPTs > sorting out to eventually merge them (or not). 
> > Alternatively avoid the variable decrement as I wrote in the > PR and do the exit test based on the previous IV value. > > But as said all this won't work for the SELECT_VL case, but > then it's availability is something to key off rather than a > new target hook? > > > The patch I can only do bootstrap and regression on X86. > > I didn't have an environment to test PowerPC. I am really sorry. > > You can do some testing with a cross compiler, alternatively > there are powerpc machines in the GCC compile farm. > > Richard. > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
>> For the future it would be then good to have the vectorizer >>re-vectorize loops with >>VLS vector uses to VLA style? Not really, this patch is just using a magic convert VLS vector into VLA stype since it can avoid defining the RVV patterns with VLS modes and avoid a lot of work. There is no benefits in case of convert VLS into VLS And I don't even consider it's safe. especially this code: + case MEM: + operands[i] = change_address (operands[i], vla_mode, NULL_RTX); I feel it is unsafe code. Actually, my original plan is to define new RVV patterns with new VLS modes (The patterns are same as VLA patterns, just modes are different). Then emit codegen this VLS RVV patterns. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-30 17:29 To: juzhe.zh...@rivai.ai CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai wrote: > > In the future, we will definitely mixing VLA and VLS-vlmin together in a > codegen and it will not cause any issues. > For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am > not sure since my SELECT_VL patch is not > finished, I will check if can work when I am working in SELECT_VL patch). For the future it would be then good to have the vectorizer re-vectorize loops with VLS vector uses to VLA style? I think there's a PR with a draft patch from a few years ago attached (from me) somewhere. Currently the vectorizer will give up when seeing vector operations in a loop but ideally those should simply be SLPed. > >> In general I don't have a good overview of which optimizations we gain by > >> such an approach or rather which ones are prevented by VLA altogether? > These patches VLS modes can help for SLP auto-vectorization. 
> > ____ > juzhe.zh...@rivai.ai > > > From: Robin Dapp > Date: 2023-05-30 17:05 > To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng > CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li > Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V > >>> but ideally the user would be able to specify -mrvv-size=32 for an > >>> implementation with 32 byte vectors and then vector lowering would make > >>> use > >>> of vectors up to 32 bytes? > > > > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization > > on GNU vectors. > > You can take a look this example: > > https://godbolt.org/z/3jYqoM84h <https://godbolt.org/z/3jYqoM84h> > > > > GCC need to specify the mrvv size to enable GNU vectors and the codegen > > only can run on CPU with vector-length = 128bit. > > However, LLVM doesn't need to specify the vector length, and the codegen > > can run on any CPU with RVV vector-length >= 128 bits. > > > > This is what this patch want to do. > > > > Thanks. > I think Richard's question was rather if it wasn't better to do it more > generically and lower vectors to what either the current cpu or what the > user specified rather than just 16-byte vectors (i.e. indeed a fixed > vlmin and not a fixed vlmin == fixed vlmax). > > This patch assumes everything is fixed for optimization purposes and then > switches over to variable-length when nothing can be changed anymore. That > is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime? > We would need to make sure that no pass after reload makes use of VLA > properties at all. > > In general I don't have a good overview of which optimizations we gain by > such an approach or rather which ones are prevented by VLA altogether? > What's the idea for the future? Still use LEN_LOAD et al. (and masking) > with "fixed vlmin"? Wouldn't we select different IVs with this patch than > what we would have for pure VLA? > > Regards > Robin >
Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
I think I prefer doing VLS mode like these: This is current VLA patterns: (define_insn "@pred_" [(set (match_operand:VI 0 "register_operand" "=vd, vd, vr, vr, vd, vd, vr, vr, vd, vd, vr, vr") (if_then_else:VI (unspec: [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1, Wc1, vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1") (match_operand 5 "vector_length_operand"" rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK") (match_operand 6 "const_int_operand"" i, i, i, i, i, i, i, i, i, i, i, i") (match_operand 7 "const_int_operand"" i, i, i, i, i, i, i, i, i, i, i, i") (match_operand 8 "const_int_operand"" i, i, i, i, i, i, i, i, i, i, i, i") (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (any_int_binop:VI (match_operand:VI 3 "" "") (match_operand:VI 4 "" "")) (match_operand:VI 2 "vector_merge_operand" "vu,0,vu,0,vu,0,vu,0,vu,0,vu,0")))] "TARGET_VECTOR" "@ v.vv\t%0,%3,%4%p1 v.vv\t%0,%3,%4%p1 v.vv\t%0,%3,%4%p1 v.vv\t%0,%3,%4%p1 v\t%0,%p1 v\t%0,%p1 v\t%0,%p1 v\t%0,%p1 v\t%0,%p1 v\t%0,%p1 v\t%0,%p1 v\t%0,%p1" [(set_attr "type" "") (set_attr "mode" "")]) (define_mode_iterator VI [ (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128") (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128") (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128") (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64") (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128") ]) You can see there is no VLS modes in "VI". 
Now to support VLS, I think we should extend "VI" iterator: (define_mode_iterator VI [ (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128") (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128") (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128") (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64") (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128") V4SI V2DI V8HI V16QI ]) Then codegen directly to this VLS patterns without any conversion. This is the safe way to deal with VLS patterns. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-30 17:29 To: juzhe.zh...@rivai.ai CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai wrote: > > In the future, we will definitely mixing VLA and VLS-vlmin together in a > codegen and it will not cause any issues. > For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am > not sure since my SELECT_VL patch is not > finished, I will check if can work when I am working in SELECT_VL patch). For the future it would be then good to have the vectorizer re-vectorize loops with VLS vector uses to VLA style? I think there's a PR with a draft patch from a few years ago attached (from me) somewhere. Currently the vectorizer will give up when seeing vector operations in a loop but ideally those should simply be SLPed. > >> In general I don't have a good overview of which optimizations we gain by > >> such an approach or rather which ones are prevented by VLA altogether? > These patches VLS modes can help for SLP auto-vectorization. 
> > > juzhe.zh...@rivai.ai > > > From: Robin Dapp > Date: 2023-05-30 17:05 > To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng > CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li > Subject: Re: [PATCH] RISC-V: Basic V
Re: Re: decremnt IV patch create fails on PowerPC
>> No, since powerpc is fine with decrementing VL it should also use it. >>Instead you should make sure to produce SCEV analyzable IVs when >>possible (when SELECT_VL is not or cannot be used). Ok. Would you mind giving me the guideline how to rewrite the decrement IV? Since I am not familiar with SCEV and I am not sure how to do that SCEV can analysis the decrement IV. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-30 17:50 To: juzhe.zh...@rivai.ai CC: gcc-patches; richard.sandiford; linkw Subject: Re: Re: decremnt IV patch create fails on PowerPC On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote: > Ok. > > It seems that for this conditions: > > + /* If we're vectorizing a loop that uses length "controls" and > + can iterate more than once, we apply decrementing IV approach > + in loop control. */ > + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) > + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () > + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0 > + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo), > + LOOP_VINFO_VECT_FACTOR (loop_vinfo > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; > > I should add direct_supportted_p (SELECT_VL...) to this is that right? No, since powerpc is fine with decrementing VL it should also use it. Instead you should make sure to produce SCEV analyzable IVs when possible (when SELECT_VL is not or cannot be used). Richard. > I have send SELECT_VL patch. I will add this in next SELECT_VL patch. > > Let's wait Richard's more comments. > > Thanks. > > > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-05-30 17:22 > To: juzhe.zh...@rivai.ai > CC: gcc-patches; richard.sandiford; linkw > Subject: Re: Re: decremnt IV patch create fails on PowerPC > On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote: > > > Hi, Richi. Thanks for your analysis and helps. 
> > > > >> We could simply retain the original > > >> incrementing IV for loop control and add the decrementing > > >> IV for computing LEN in addition to that and leave IVOPTs > > >> sorting out to eventually merge them (or not). > > > > I am not sure how to do that. Could you give me more informations? > > > > I somehow understand your concern is that variable amount of IV will make > > IVOPT fails. > > > > I have seen similar situation in LLVM (when apply variable IV, > > they failed to interleave the vectorize code). I am not sure whether they > > are the same reason for that. > > > > For RVV, we not only want decrement IV style in vectorization but also > > we want to apply SELECT_VL in single-rgroup which is most happen cases > > (LLVM also only apply get_vector_length in single vector length). > > > > >>You can do some testing with a cross compiler, alternatively > > >>there are powerpc machines in the GCC compile farm. > > > > It seems that Power is ok with decrement IV since most cases are improved. > > Well, but Power never will have SELECT_VL so at least for !SELECT_VL > targets you should avoid having an IV with variable decrement. As > I said it should be easy to rewrite decrement IV to use a constant > increment (when not using SELECT_VL) and testing the pre-decrement > value in the exit test. > > Richard. > > I think Richard may help to explain decrement IV more clearly. > > > > Thanks > > > > > > juzhe.zh...@rivai.ai > > > > From: Richard Biener > > Date: 2023-05-26 14:46 > > To: ??? > > CC: gcc-patches; richard.sandiford; linkw > > Subject: Re: decremnt IV patch create fails on PowerPC > > On Fri, 26 May 2023, ??? wrote: > > > > > Yesterday's patch has been approved (decremnt IV support): > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html > > > > > > However, it creates fails on PowerPC: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 > > > > > > I am really sorry for causing inconvinience. 
> > > > > > I wonder as we disccussed: > > > + /* If we're vectorizing a loop that uses length "controls" and > > > + can iterate more than once, we apply decrementing IV approach > > > + in loop control. */ > > > + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) > > > + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () > > > + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0 > > &
Re: Re: decremnt IV patch create fails on PowerPC
>> No, I said the current scheme does sth along
>>
>> do {
>>   remain -= MIN (vf, remain);
>> } while (remain != 0);
>>
>> and I suggest to instead do
>>
>> do {
>>   old_remain = remain;
>>   len = MIN (vf, remain);
>>   remain -= vf;
>> } while (old_remain >= vf);
>>
>> basically since only the last iteration will have len < vf we can
>> ignore that remain -= vf will underflow there if we appropriately
>> rewrite the exit test to use the pre-decrement value.

Oh, I understand you now. I will definitely give it a try and send a patch. Thank you so much.

By the way, could you take a look at the SELECT_VL patch? I guess you want to defer it to Richard, and I will wait, but I still think your comments are very important.

Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-05-30 18:00
To: Kewen.Lin
CC: juzhe.zh...@rivai.ai; gcc-patches; richard.sandiford
Subject: Re: decremnt IV patch create fails on PowerPC

On Tue, 30 May 2023, Kewen.Lin wrote:
> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> > Ok.
> >
> > It seems that for this conditions:
> >
> > + /* If we're vectorizing a loop that uses length "controls" and
> > +    can iterate more than once, we apply decrementing IV approach
> > +    in loop control.  */
> > + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +     && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +     && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +     && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +          && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > +                       LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> > +   LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> >
> >
> > I should add direct_supportted_p (SELECT_VL...) to this is that right?
>
> I guess no, with this condition any targets without SELECT_VL are unable
> to leverage the new decrement scheme for lengths, as your reply in PR109971
> you didn't meant to disable it. IIUC, what Richi suggested is to introduce
> one new IV just like the previous one which has non-variable step, then it's
> SCEV-ed and some analysis based on it can do a good job.

No, I said the current scheme does sth along

do {
  remain -= MIN (vf, remain);
} while (remain != 0);

and I suggest to instead do

do {
  old_remain = remain;
  len = MIN (vf, remain);
  remain -= vf;
} while (old_remain >= vf);

basically since only the last iteration will have len < vf we can
ignore that remain -= vf will underflow there if we appropriately
rewrite the exit test to use the pre-decrement value.

> Since this is mainly for targets without SELECT_VL capability, I can follow
> up this if you don't mind.
>
> BR,
> Kewen
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Re: Re: decremnt IV patch create fails on PowerPC
Hi, Richi. I have send patch by following your suggestion and change the decrement IV follow: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html It works well in RVV. Could you take a look at it? If it's ok, I will send patch of SELECT_VL base on this. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-30 17:50 To: juzhe.zh...@rivai.ai CC: gcc-patches; richard.sandiford; linkw Subject: Re: Re: decremnt IV patch create fails on PowerPC On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote: > Ok. > > It seems that for this conditions: > > + /* If we're vectorizing a loop that uses length "controls" and > + can iterate more than once, we apply decrementing IV approach > + in loop control. */ > + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) > + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () > + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0 > + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo), > + LOOP_VINFO_VECT_FACTOR (loop_vinfo > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true; > > I should add direct_supportted_p (SELECT_VL...) to this is that right? No, since powerpc is fine with decrementing VL it should also use it. Instead you should make sure to produce SCEV analyzable IVs when possible (when SELECT_VL is not or cannot be used). Richard. > I have send SELECT_VL patch. I will add this in next SELECT_VL patch. > > Let's wait Richard's more comments. > > Thanks. > > > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-05-30 17:22 > To: juzhe.zh...@rivai.ai > CC: gcc-patches; richard.sandiford; linkw > Subject: Re: Re: decremnt IV patch create fails on PowerPC > On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote: > > > Hi, Richi. Thanks for your analysis and helps. 
> > > > >> We could simply retain the original > > >> incrementing IV for loop control and add the decrementing > > >> IV for computing LEN in addition to that and leave IVOPTs > > >> sorting out to eventually merge them (or not). > > > > I am not sure how to do that. Could you give me more informations? > > > > I somehow understand your concern is that variable amount of IV will make > > IVOPT fails. > > > > I have seen similar situation in LLVM (when apply variable IV, > > they failed to interleave the vectorize code). I am not sure whether they > > are the same reason for that. > > > > For RVV, we not only want decrement IV style in vectorization but also > > we want to apply SELECT_VL in single-rgroup which is most happen cases > > (LLVM also only apply get_vector_length in single vector length). > > > > >>You can do some testing with a cross compiler, alternatively > > >>there are powerpc machines in the GCC compile farm. > > > > It seems that Power is ok with decrement IV since most cases are improved. > > Well, but Power never will have SELECT_VL so at least for !SELECT_VL > targets you should avoid having an IV with variable decrement. As > I said it should be easy to rewrite decrement IV to use a constant > increment (when not using SELECT_VL) and testing the pre-decrement > value in the exit test. > > Richard. > > I think Richard may help to explain decrement IV more clearly. > > > > Thanks > > > > > > juzhe.zh...@rivai.ai > > > > From: Richard Biener > > Date: 2023-05-26 14:46 > > To: ??? > > CC: gcc-patches; richard.sandiford; linkw > > Subject: Re: decremnt IV patch create fails on PowerPC > > On Fri, 26 May 2023, ??? wrote: > > > > > Yesterday's patch has been approved (decremnt IV support): > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html > > > > > > However, it creates fails on PowerPC: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 > > > > > > I am really sorry for causing inconvinience. 
> > > > > > I wonder as we disccussed: > > > + /* If we're vectorizing a loop that uses length "controls" and > > > + can iterate more than once, we apply decrementing IV approach > > > + in loop control. */ > > > + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) > > > + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () > > > + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0 > > > + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > > > +&
Re: Re: [PATCH] VECT: Change flow of decrement IV
Before this patch:

foo:
        ble     a2,zero,.L5
        csrr    a3,vlenb
        srli    a4,a3,2
.L3:
        minu    a5,a2,a4
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v2,0(a1)
        vle32.v v1,0(a0)
        vsetvli t1,zero,e32,m1,ta,ma
        vadd.vv v1,v1,v2
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v1,0(a0)
        add     a1,a1,a3
        add     a0,a0,a3
        sub     a2,a2,a5
        bne     a2,zero,.L3
.L5:
        ret

After this patch:

foo:
        ble     a2,zero,.L5
        csrr    a3,vlenb
        srli    a4,a3,2
        neg     a7,a4           -->>> additional instruction
.L3:
        minu    a5,a2,a4
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v2,0(a1)
        vle32.v v1,0(a0)
        vsetvli t1,zero,e32,m1,ta,ma
        mv      a6,a2           -->>> additional instruction
        vadd.vv v1,v1,v2
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v1,0(a0)
        add     a1,a1,a3
        add     a0,a0,a3
        add     a2,a2,a7
        bgtu    a6,a4,.L3
.L5:
        ret

There is one more instruction in the preheader (neg) and one more instruction in the loop (mv).
But I think that is OK for RVV, since we will definitely be using SELECT_VL and then this issue will go away,
as long as this flow is better for Power (SCEV).

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Follow Richi's suggestion, I change current decrement IV flow from:
>
> do {
>   remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>   old_remain = remain;
>   len = MIN (vf, remain);
>   remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> ALL tests (decrement IV) of RVV are passed.

How does it affect RVV code quality? I thought you specifically chose
the previous approach because code quality was better that way.

Richard
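For reference, the two loop shapes quoted above can be compared with a small C sketch. This is only an illustration under stated assumptions, not the vectorizer's implementation: the function names and the `lens`/`cap` recording parameters are invented for the example, and `remain` and `vf` are modelled as plain `unsigned` values. One deliberate difference from the quoted pseudo-code: the exit test here is `old_remain > vf`, which is what the `bgtu a6,a4,.L3` in the "after" assembly does. With `>=`, a trip count that is an exact multiple of `vf` would run one extra iteration with `len == 0`.

```c
#include <assert.h>
#include <stddef.h>

/* Current scheme: the IV is decremented by a variable amount
   (MIN (vf, remain)), so its step is not a loop invariant.
   Records the length used in each iteration and returns the
   number of iterations.  */
size_t
lens_variable_step (unsigned remain, unsigned vf, unsigned *lens, size_t cap)
{
  assert (vf > 0 && remain > 0);
  size_t n = 0;
  do
    {
      unsigned len = remain < vf ? remain : vf;
      if (n < cap)
        lens[n] = len;
      n++;
      remain -= len;
    }
  while (remain != 0);
  return n;
}

/* Suggested scheme: the IV is decremented by the constant vf and the
   exit test uses the pre-decrement value.  The final "remain -= vf"
   may wrap around, which is harmless because remain is not read again
   (unsigned wraparound is well defined in C).  */
size_t
lens_constant_step (unsigned remain, unsigned vf, unsigned *lens, size_t cap)
{
  assert (vf > 0 && remain > 0);
  size_t n = 0;
  unsigned old_remain;
  do
    {
      old_remain = remain;
      unsigned len = remain < vf ? remain : vf;
      if (n < cap)
        lens[n] = len;
      n++;
      remain -= vf;
    }
  while (old_remain > vf);
  return n;
}
```

Both functions produce the same sequence of lengths; for example 58 elements with `vf` = 16 give 16, 16, 16, 10 in four iterations. The difference is only that the second IV has a constant step, which is the property SCEV needs.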
Re: Re: [PATCH] VECT: Change flow of decrement IV
>> How does it affect RVV code quality?  I thought you specifically chose
>> the previous approach because code quality was better that way.

Yes, the previous way is better for RVV. But as I said, we will definitely use SELECT_VL, and with SELECT_VL we will be using remain - step (with step produced by SELECT_VL).

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Follow Richi's suggestion, I change current decrement IV flow from:
>
> do {
>    remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>    old_remain = remain;
>    len = MIN (vf, remain);
>    remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> ALL tests (decrement IV) of RVV are passed.

How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.

Richard
Re: Re: [PATCH] VECT: Change flow of decrement IV
Hi, all.

I have posted several investigations:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html

It turns out that when "niters is a constant value and vf is a constant value",
this patch allows SCEV/IVOPTS to optimize a lot for RVV too (take the testcase from IBM's testsuite for example), and I think this patch can fix IBM's cunroll issue.
Even though it will produce a 'mv' instruction in some other cases for RVV, I think Gain > Pain overall.

Actually, for the current flow:

step = MIN ()
...
remain = remain - step.

I don't know how difficult it would be to extend SCEV/IVOPTS to fix this issue.
So, could you make a decision for this patch?

I wonder whether we should apply the approach of this patch (the code can be refined after it has been well reviewed) or
we should extend SCEV/IVOPTS?

Thanks.

juzhe.zh...@rivai.ai

From: 钟居哲
Date: 2023-05-30 23:05
To: rguenther
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV

More information on Power's testcase:

Before this patch:

test_npeel_int16_t:
        lui a4,%hi(.LANCHOR0+130)
        lui a3,%hi(.LANCHOR1)
        addi a3,a3,%lo(.LANCHOR1)
        addi a4,a4,%lo(.LANCHOR0+130)
        li a5,58
        li a2,16
        vsetivli zero,16,e16,m1,ta,ma
        vl1re16.v v3,0(a3)
        vid.v v1
.L5:
        minu a3,a5,a2
        vsetvli zero,a3,e16,m1,ta,ma
        sub a5,a5,a3
        vse16.v v1,0(a4)
        vsetivli zero,16,e16,m1,ta,ma
        addi a4,a4,32
        vadd.vv v1,v1,v3
        bne a5,zero,.L5
        ret

After this patch:

test_npeel_int16_t:
        lui a5,%hi(.LANCHOR0)
        addi a5,a5,%lo(.LANCHOR0)
        li a1,16
        vsetivli zero,16,e16,m1,ta,ma
        addi a2,a5,130
        vid.v v1
        addi a3,a5,162
        vadd.vx v4,v1,a1
        addi a4,a5,194
        li a1,32
        vadd.vx v3,v1,a1
        vse16.v v1,0(a2)
        vse16.v v4,0(a3)
        vse16.v v3,0(a4)
        addi a5,a5,226
        li a1,48
        vadd.vx v2,v1,a1
        vsetivli zero,10,e16,m1,ta,ma
        vse16.v v2,0(a5)
        ret

Obviously, Power's testcase previously could not be unrolled on the RVV side, but after this patch it can.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

On Tue, 30 May 2023, juzhe.zhong wrote:

> This patch will generate the number of rgroup "mov" instructions inside the
> loop. This is unacceptable. For example, if number of rgroups=3, will be 3 more
> instruction in loop. If this patch is necessary, I think I should find a way
> to fix it.

That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.

> Replied Message
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
>
> "juzhe.zh...@rivai.ai" writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > sub a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4 -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2 -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in preheader and 1 more instruction in loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL so
> > this issue will gone.
>
> But what about cases where you won't be using SELECT_VL, such as SLP?
>
> Richard
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
Re: Re: [PATCH] VECT: Change flow of decrement IV
Hi, Richi. >> Note with SELECT_VL all bets will be off since as I understand the >> value it gives can vary from iteration to iteration (but we know >> a lower and maybe an upper bound?) Yes, in RVV side, the SELECT_VL output can be in range of [ceil(avl/2), vlmax], can be any value between the range depending on the hardware implementation. >> So I think we should patch this up in the vectorizer itself like with >> your patch. I'm going to wait for Richards input though since he >> seems to disagree. According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971, Kewen is happy with this patch, turns out this patch can fix power's issue. So, Let's wait for Richard's comments. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-31 14:41 To: juzhe.zh...@rivai.ai CC: richard.sandiford; gcc-patches; linkw Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote: > Hi?all. I have posted my several investigations: > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html > > Turns out when "niters is a constant value and vf is a constant value" > This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take tesecase > from IBM's testsuite for example) and I think this patch can fix IBM's > cunroll issue. > Even though it will produce a 'mv' instruction in some ohter cases for RVV, I > think Gain > Pain overal. > > Actually, for current flow: > > step = MIN () > ... > remain = remain - step. > > I don't know how difficult to extend SCEV/IVOPTS to fix this issue. > So, could you make a decision for this patch? > > I wonder whether we should apply the approach of this patch (the codes can be > refined after well reviewed) or > we should extend SCEV/IVOPTS ? 
I don't think we can do anything in SCEV for this which means we'd need to special-case this in niter analysis, in IVOPTs and any other passes that might be affected (and not fixed by handling it in niter analysis). While improving niter analysis would be good (the user could write this pattern as well) I do not have time to try implementing that (I have no idea how ugly or robust it is going to be). So I think we should patch this up in the vectorizer itself like with your patch. I'm going to wait for Richards input though since he seems to disagree. Note with SELECT_VL all bets will be off since as I understand the value it gives can vary from iteration to iteration (but we know a lower and maybe an upper bound?) Thanks, Richard. > Thanks. > > > juzhe.zh...@rivai.ai > > From: ??? > Date: 2023-05-30 23:05 > To: rguenther > CC: richard.sandiford; gcc-patches; linkw > Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV > More information of power's testcase: > > Before this patch: > test_npeel_int16_t: > lui a4,%hi(.LANCHOR0+130) > lui a3,%hi(.LANCHOR1) > addi a3,a3,%lo(.LANCHOR1) > addi a4,a4,%lo(.LANCHOR0+130) > li a5,58 > li a2,16 > vsetivli zero,16,e16,m1,ta,ma > vl1re16.v v3,0(a3) > vid.v v1 > .L5: > minu a3,a5,a2 > vsetvli zero,a3,e16,m1,ta,ma > sub a5,a5,a3 > vse16.v v1,0(a4) > vsetivli zero,16,e16,m1,ta,ma > addi a4,a4,32 > vadd.vv v1,v1,v3 > bne a5,zero,.L5 > ret > > After this patch: > test_npeel_int16_t: > lui a5,%hi(.LANCHOR0) > addi a5,a5,%lo(.LANCHOR0) > li a1,16 > vsetivli zero,16,e16,m1,ta,ma > addi a2,a5,130 > vid.v v1 > addi a3,a5,162 > vadd.vx v4,v1,a1 > addi a4,a5,194 > li a1,32 > vadd.vx v3,v1,a1 > vse16.v v1,0(a2) > vse16.v v4,0(a3) > vse16.v v3,0(a4) > addi a5,a5,226 > li a1,48 > vadd.vx v2,v1,a1 > vsetivli zero,10,e16,m1,ta,ma > vse16.v v2,0(a5) > ret > > It's obvious, previously, power's testcase in RVV side can not unroll, but > after this patch, in RVV side, it can unroll now. 
> > > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-05-30 20:33 > To: juzhe.zhong > CC: Richard Sandiford; gcc-patches; linkw > Subject: Re: [PATCH] VECT: Change flow of decrement IV > On Tue, 30 May 2023, juzhe.zhong wrote: > > > This patch will generate the number of rgroup ?mov? instructions inside the > > loop. This is unacceptable. For example?if number of rgroups=3? will be 3 > > more > > instruction in loop. If this patch is necessary? I think I should find a way > > to fix it. > > That's odd, you only need to adjust the IV which is used in the exit test, > not all the others. > > > Replied Message > > From >
Re: Re: [PATCH] VECT: Change flow of decrement IV
Hi, Richard. >> I don't object though. It just feels like we're giving up easily. >> And that's a bit frustrating, since this potential problem was flagged >> ahead of time. I can take a look at it. Would you mind giving me some hints? Should I do this in which PASS ? "ivopts" PASS? Is that right that we can enhance analysis when we see the statement as follows: remain = remain - step and step is coming from a MIN_EXPR (remain, vf). Then what we need to do? Thanks. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2023-05-31 15:28 To: Richard Biener CC: juzhe.zhong\@rivai.ai; gcc-patches; linkw Subject: Re: [PATCH] VECT: Change flow of decrement IV Richard Biener writes: > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote: > >> Hi?all. I have posted my several investigations: >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html >> >> Turns out when "niters is a constant value and vf is a constant value" >> This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take tesecase >> from IBM's testsuite for example) and I think this patch can fix IBM's >> cunroll issue. >> Even though it will produce a 'mv' instruction in some ohter cases for RVV, >> I think Gain > Pain overal. >> >> Actually, for current flow: >> >> step = MIN () >> ... >> remain = remain - step. >> >> I don't know how difficult to extend SCEV/IVOPTS to fix this issue. >> So, could you make a decision for this patch? >> >> I wonder whether we should apply the approach of this patch (the codes can >> be refined after well reviewed) or >> we should extend SCEV/IVOPTS ? > > I don't think we can do anything in SCEV for this which means we'd > need to special-case this in niter analysis, in IVOPTs and any other > passes that might be affected (and not fixed by handling it in niter > analysis). 
While improving niter analysis would be good (the user > could write this pattern as well) I do not have time to try > implementing that (I have no idea how ugly or robust it is going to be). > > So I think we should patch this up in the vectorizer itself like with > your patch. I'm going to wait for Richards input though since he > seems to disagree. I think my main disagreement is that the IV phi can be analysed as a SCEV with sufficient work (realising that the MIN result is always VF when the latch is executed). That SCEV might be useful “as is” for things like IVOPTS, without specific work in those passes. (Although perhaps not too useful, since most other IVs will be upcounting.) I don't object though. It just feels like we're giving up easily. And that's a bit frustrating, since this potential problem was flagged ahead of time. > Note with SELECT_VL all bets will be off since as I understand the > value it gives can vary from iteration to iteration (but we know > a lower and maybe an upper bound?) Right. All IVs will have a variable step for SELECT_VL. Thanks, Richard
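To illustrate why an IV controlled by SELECT_VL has no constant step, here is a toy C model. Everything in it is an assumption made for illustration: model_select_vl is a hypothetical stand-in, not a GCC or hardware interface, and it picks just one value inside the [ceil(avl/2), vlmax] range mentioned earlier in the thread for the case vlmax < avl < 2*vlmax. Real hardware may choose any other value in that range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one permitted SELECT_VL behaviour.  */
static unsigned
model_select_vl (unsigned avl, unsigned vlmax)
{
  if (avl <= vlmax)
    return avl;			/* Everything fits in one vector.  */
  if (avl < 2 * vlmax)
    return (avl + 1) / 2;	/* ceil (avl / 2): balance the last two.  */
  return vlmax;
}

/* Decrementing IV whose step is whatever the model returns.  Records
   each step in steps[] and returns the number of iterations.  */
static size_t
select_vl_loop (unsigned remain, unsigned vlmax, unsigned *steps)
{
  assert (remain > 0 && vlmax > 0);
  size_t n = 0;
  do
    {
      unsigned step = model_select_vl (remain, vlmax);
      steps[n++] = step;
      remain -= step;
    }
  while (remain != 0);
  return n;
}
```

With remain = 58 and vlmax = 16 this model steps by 16, 16, 13, 13, so the step differs between iterations that all take the latch, which is the property that rules out a simple constant-step evolution.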
Re: Re: [PATCH] VECT: Change flow of decrement IV
>> I'm just saying that to go forward the vectorizer change looks >>more promising (also considering the pace RISC-V people are working at >>...) Yeah, RVV needs a lot of middle-end support: SELECT_VL, LEN_MASK_LOAD/LEN_MASK_STORE,.etc LEN_ADD for RVV reduction support like COND_ADD for ARM SVE...etc SELECT_VL is still pending. Without support in middle-end, GCC can not support powerful auto-vectorization (Performance will be much worse than RVV LLVM). And unfortunately, I am the only guy working on middle-end support of RVV auto-vectorization. :) I think we can make this patch merged and record the enhancement of SCEV in bugzilla to see we can improve that in the future. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-05-31 15:38 To: Richard Sandiford CC: juzhe.zh...@rivai.ai; gcc-patches; linkw Subject: Re: [PATCH] VECT: Change flow of decrement IV On Wed, 31 May 2023, Richard Sandiford wrote: > Richard Biener writes: > > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote: > > > >> Hi?all. I have posted my several investigations: > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html > >> > >> Turns out when "niters is a constant value and vf is a constant value" > >> This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take tesecase > >> from IBM's testsuite for example) and I think this patch can fix IBM's > >> cunroll issue. > >> Even though it will produce a 'mv' instruction in some ohter cases for > >> RVV, I think Gain > Pain overal. > >> > >> Actually, for current flow: > >> > >> step = MIN () > >> ... > >> remain = remain - step. > >> > >> I don't know how difficult to extend SCEV/IVOPTS to fix this issue. > >> So, could you make a decision for this patch? 
> >> > >> I wonder whether we should apply the approach of this patch (the codes can > >> be refined after well reviewed) or > >> we should extend SCEV/IVOPTS ? > > > > I don't think we can do anything in SCEV for this which means we'd > > need to special-case this in niter analysis, in IVOPTs and any other > > passes that might be affected (and not fixed by handling it in niter > > analysis). While improving niter analysis would be good (the user > > could write this pattern as well) I do not have time to try > > implementing that (I have no idea how ugly or robust it is going to be). > > > > So I think we should patch this up in the vectorizer itself like with > > your patch. I'm going to wait for Richards input though since he > > seems to disagree. > > I think my main disagreement is that the IV phi can be analysed > as a SCEV with sufficient work (realising that the MIN result is > always VF when the latch is executed). That SCEV might be useful > ?as is? for things like IVOPTS, without specific work in those passes. > (Although perhaps not too useful, since most other IVs will be upcounting.) I think we'd need another API for SCEV there then, analyze_scalar_evolution_for_latch () so we can disregard the value on the exit edges then. That means we'd still need to touch all users and decide whether it's safe to use that or not. > I don't object though. It just feels like we're giving up easily. > And that's a bit frustrating, since this potential problem was flagged > ahead of time. Well, I expect that massaging SCEV and niter analysis will take up quite some developer time while avoiding the situation in the vectorizer is possible (and would fix the observed regressions). We can always improve later here and I'd suggest to file an enhancement bugreport with a simple C testcase using this kind of iteration. I'm just saying that to go forward the vectorizer change looks more promising (also considering the pace RISC-V people are working at ...) Richard. 
> > Note with SELECT_VL all bets will be off since as I understand the > > value it gives can vary from iteration to iteration (but we know > > a lower and maybe an upper bound?) > > Right. All IVs will have a variable step for SELECT_VL. > > Thanks, > Richard >
Re: Re: [PATCH] VECT: Change flow of decrement IV
Oh, it's correct fix. Thanks for catching this. juzhe.zh...@rivai.ai From: Kewen.Lin Date: 2023-05-31 15:38 To: juzhe.zh...@rivai.ai CC: richard.sandiford; gcc-patches; rguenther Subject: Re: [PATCH] VECT: Change flow of decrement IV > Hi, Richi. > >>> Note with SELECT_VL all bets will be off since as I understand the >>> value it gives can vary from iteration to iteration (but we know >>> a lower and maybe an upper bound?) > Yes, in RVV side, the SELECT_VL output can be in range of [ceil(avl/2), > vlmax], > can be any value between the range depending on the hardware implementation. > >>> So I think we should patch this up in the vectorizer itself like with >>> your patch. I'm going to wait for Richards input though since he >>> seems to disagree. > > According tohttps://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971, > <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971,> > Kewen is happy with this patch, turns out this patch can fix power's issue. Yeah, the exposed degradation and failures can be fixed by this patch. I'd expect both approaches (this patch or extending niter analysis and others) should work for the exposed issues. A new finding is that my SPEC2017 rerun with this patch exposed some verification failures, I made a regression test on Power10, it showed a few failures too (mainly from fortran). By looking into one of them (case gfortran.dg/array_alloc_2.f90), I think the patch needs some adjustment on chosen code according to exit_edge->flags like: diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index ef28711c58f..5d518460b6d 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -892,8 +892,9 @@ vect_set_loop_condition_partial_vectors (class loop *loop, if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) { gcc_assert (compare_step); - cond_stmt = gimple_build_cond (GT_EXPR, test_ctrl, compare_step, - NULL_TREE, NULL_TREE); + tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? 
LE_EXPR : GT_EXPR; + cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE, + NULL_TREE); gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); } else I'm running regression testing again based on this adjustment, will see if it can fix all exposed failures. BR, Kewen > So, Let's wait for Richard's comments. > > Thanks. > ------ > juzhe.zh...@rivai.ai > > > *From:* Richard Biener <mailto:rguent...@suse.de> > *Date:* 2023-05-31 14:41 > *To:* juzhe.zh...@rivai.ai <mailto:juzhe.zh...@rivai.ai> > *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; gcc-patches > <mailto:gcc-patches@gcc.gnu.org>; linkw <mailto:li...@linux.ibm.com> > *Subject:* Re: Re: [PATCH] VECT: Change flow of decrement IV > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote: > > > Hi?all. I have posted my several investigations: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html > > > > Turns out when "niters is a constant value and vf is a constant value" > > This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take > tesecase from IBM's testsuite for example) and I think this patch can fix > IBM's cunroll issue. > > Even though it will produce a 'mv' instruction in some ohter cases for > RVV, I think Gain > Pain overal. > > > > Actually, for current flow: > > > > step = MIN () > > ... > > remain = remain - step. > > > > I
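Kewen's adjustment picks the comparison code from the polarity of the exit edge. A minimal C model of that choice is sketched below. It assumes exit_on_true stands in for "exit_edge->flags & EDGE_TRUE_VALUE"; the enum and helper names are invented for illustration and are not GCC APIs.

```c
#include <stdbool.h>

typedef enum { CMP_LE, CMP_GT } cmp_code;

/* If the exit edge is taken when the condition is true, the condition
   must describe "stop" (old_remain <= vf); otherwise it must describe
   "continue" (old_remain > vf).  */
static cmp_code
pick_exit_code (bool exit_on_true)
{
  return exit_on_true ? CMP_LE : CMP_GT;
}

static bool
eval_cmp (cmp_code code, unsigned lhs, unsigned rhs)
{
  return code == CMP_LE ? lhs <= rhs : lhs > rhs;
}

/* Whether the loop leaves after an iteration that started with
   old_remain items, for either edge polarity.  */
static bool
loop_exits (bool exit_on_true, unsigned old_remain, unsigned vf)
{
  bool cond = eval_cmp (pick_exit_code (exit_on_true), old_remain, vf);
  return exit_on_true ? cond : !cond;
}
```

Both polarities give the same exit decision in this model, whereas always emitting GT_EXPR would invert the decision for loops whose exit is on the true edge.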
Re: Re: [PATCH] VECT: Change flow of decrement IV
Thanks Richard. Seems that this patch's approach is ok to trunk? Maybe the only thing we should do is to wait Kewen's testing feedback, am I right ? Thanks. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2023-05-31 17:01 To: Richard Biener via Gcc-patches CC: Richard Biener; juzhe.zhong\@rivai.ai; linkw Subject: Re: [PATCH] VECT: Change flow of decrement IV Richard Biener via Gcc-patches writes: > On Wed, 31 May 2023, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote: >> > >> >> Hi?all. I have posted my several investigations: >> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html >> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html >> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html >> >> >> >> Turns out when "niters is a constant value and vf is a constant value" >> >> This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take >> >> tesecase from IBM's testsuite for example) and I think this patch can fix >> >> IBM's cunroll issue. >> >> Even though it will produce a 'mv' instruction in some ohter cases for >> >> RVV, I think Gain > Pain overal. >> >> >> >> Actually, for current flow: >> >> >> >> step = MIN () >> >> ... >> >> remain = remain - step. >> >> >> >> I don't know how difficult to extend SCEV/IVOPTS to fix this issue. >> >> So, could you make a decision for this patch? >> >> >> >> I wonder whether we should apply the approach of this patch (the codes >> >> can be refined after well reviewed) or >> >> we should extend SCEV/IVOPTS ? >> > >> > I don't think we can do anything in SCEV for this which means we'd >> > need to special-case this in niter analysis, in IVOPTs and any other >> > passes that might be affected (and not fixed by handling it in niter >> > analysis). 
While improving niter analysis would be good (the user >> > could write this pattern as well) I do not have time to try >> > implementing that (I have no idea how ugly or robust it is going to be). >> > >> > So I think we should patch this up in the vectorizer itself like with >> > your patch. I'm going to wait for Richards input though since he >> > seems to disagree. >> >> I think my main disagreement is that the IV phi can be analysed >> as a SCEV with sufficient work (realising that the MIN result is >> always VF when the latch is executed). That SCEV might be useful >> “as is” for things like IVOPTS, without specific work in those passes. >> (Although perhaps not too useful, since most other IVs will be upcounting.) > > I think we'd need another API for SCEV there then, > analyze_scalar_evolution_for_latch () so we can disregard the > value on the exit edges then. That means we'd still need to touch > all users and decide whether it's safe to use that or not. I'd expect the phi for the IV with the constant step to have the same value as the phi for the IV with a MIN step. I realise that the phi isn't the thing that matters for niters, but I'd expect IVOPTS to consider both the phi and the adjusted value to be candidates. Only the phi can be a candidate with the MIN step, but it feels like it should still be a candidate, even with current interfaces. You know this stuff much better than I do though, so I'm almost certainly oversimplifying/overlooking things. Like I say, I don't object to the vectoriser change, so please don't go down a rabbit hole on my account. :) Thanks, Richard
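The observation quoted in this message, that the MIN result is always VF whenever the latch is executed, can be checked on the original flow with a small C sketch. The helper name is hypothetical and the code only models the scalar control flow, not anything SCEV does.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Run the original decrement flow and return true if, on every
   iteration that goes round the latch (i.e. does not exit), the MIN
   result was exactly vf.  */
static bool
min_is_vf_on_latch (unsigned remain, unsigned vf)
{
  assert (remain > 0 && vf > 0);
  for (;;)
    {
      unsigned step = MIN (vf, remain);
      remain -= step;
      if (remain == 0)
	return true;		/* Exit edge taken: step may be below vf.  */
      if (step != vf)
	return false;		/* Latch taken with a partial step.  */
    }
}
```

The function returns true for every nonzero input, because something can only remain after the subtraction when remain was larger than vf, in which case the MIN picked vf. Along the latch the phi therefore decreases by exactly vf, the same as the phi of the constant-step IV.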
Re: [PATCH V2] VECT: Change flow of decrement IV
Bootstrapped and regression-tested on x86 with no unexpected differences.

Looking forward to Kewen's test report for this patch.

Thanks.

juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-05-31 23:08
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V2] VECT: Change flow of decrement IV

From: Ju-Zhe Zhong

Follow Richi's suggestion, I change current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);

to enhance SCEV.

Include fixes from kewen.

This patch will need to wait for Kewen's test feedback.

Testing on X86 is on-going

Co-Authored by: Kewen Lin

gcc/ChangeLog:

        * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
        (vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
                                  gimple_stmt_iterator loop_cond_gsi,
                                  rgroup_controls *rgc, tree niters,
                                  tree niters_skip, bool might_wrap_p,
-                                 tree *iv_step)
+                                 tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
          ...
          vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
          ...
-         ivtmp_35 = ivtmp_9 - _36;
+         ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
          ...
-         if (ivtmp_35 != 0)
+         if (ivtmp_9 > POLY_INT_CST [4, 4])
            goto ; [83.33%]
          else
            goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
       tree step = rgc->controls.length () == 1 ? rgc->controls[0]
                                                : make_ssa_name (iv_type);
       /* Create decrement IV.  */
-      create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-                 insert_after, &index_before_incr, &index_after_incr);
+      create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+                 &incr_gsi, insert_after, &index_before_incr,
+                 &index_after_incr);
       gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
                                                             index_before_incr,
                                                             nitems_step));
       *iv_step = step;
-      return index_after_incr;
+      *compare_step = nitems_step;
+      return index_before_incr;
     }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
      arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
                                                       &preheader_seq, &header_seq,
                                                       loop_cond_gsi, rgc, niters,
                                                       niters_skip, might_wrap_p,
-                                                      &iv_step);
+                                                      &iv_step, &compare_step);
             iv_rgc = rgc;
           }
 
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-                                        NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+    {
+      gcc_assert (compare_step);
+      tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+      cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+                                     NULL_TREE);
+    }
+  else
+    {
+      tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+      tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+      cond_stmt
+        = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+    }
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3
Re: Re: [PATCH V2] VECT: Change flow of decrement IV
Thanks, Kewen.

I have sent the V3 patch. Could you comment on that? I want to make sure you do support that patch.

Thanks.

juzhe.zh...@rivai.ai

From: Kewen.Lin
Date: 2023-06-01 12:32
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V2] VECT: Change flow of decrement IV

Hi Juzhe,

on 2023/6/1 08:31, juzhe.zh...@rivai.ai wrote:
> Bootstrapped and Regression on X86 no surprise different.
>
> Looking forward Kewen's test report for this patch.
>

This patch can be bootstrapped and regress-tested on powerpc64-linux-gnu P9
and powerpc64le-linux-gnu P9/P10.  Also, the SPEC2017 int/fp benchmarks build and run
successfully with it on powerpc64le-linux-gnu P10 (with an explicit parameter
--param=vect-partial-vector-usage=2).  It fixes the 510.parest_r -5% degradation,
and it sped up 525.x264_r +1%, 521.wrf_r +2.03%, 544.nab_r +1.27% and
549.fotonik3d_r +3.22%, but it degraded 503.bwaves_r -4%.  We have some heuristics
on load and load pct. for 503.bwaves_r on Power, and I suspect it's related.
Considering that vect-partial-vector-usage=2 isn't the default on Power, and that
this patch fixes the exposed failures and the parest_r degradation, I think the
bwaves_r degradation should not block this.  I'll have a further look at the
bwaves_r degradation later, and open a PR if it's an actual issue rather than just
the costing heuristics having no effect.

btw, it would be better to add one PR marker line to associate this with
PR109971, something like:

        PR tree-optimization/109971

Thanks!

BR,
Kewen

> Thanks.
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>
> *Date:* 2023-05-31 23:08
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; rguenther
> <mailto:rguent...@suse.de>; linkw <mailto:li...@linux.ibm.com>; Ju-Zhe Zhong
> <mailto:juzhe.zh...@rivai.ai>
> *Subject:* [PATCH V2] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong
>
> Follow Richi's suggestion, I change current decrement IV flow from:
>
> do {
>    remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>    old_remain = remain;
>    len = MIN (vf, remain);
>    remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> Include fixes from kewen.
>
>
> This patch will need to wait for Kewen's test feedback.
>
> Testing on X86 is on-going
>
> Co-Authored by: Kewen Lin
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly):
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop,
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop,
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B
Re: [PATCH V3] VECT: Change flow of decrement IV
This patch is no different from V2; it only adds PR tree-optimization/109971, as Kewen suggested.

Already bootstrapped and regression-tested on X86, with no differences.

OK for trunk?

juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-06-01 12:36
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Change flow of decrement IV

From: Ju-Zhe Zhong

Following Richi's suggestion, I change the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);

to enhance SCEV.

Includes fixes from Kewen.

This patch will need to wait for Kewen's test feedback.

Testing on X86 is ongoing.

Co-Authored by: Kewen Lin

  PR tree-optimization/109971

gcc/ChangeLog:

  * tree-vect-loop-manip.cc (vect_set_loop_controls_directly):
  Change decrement IV flow.
  (vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
                                  gimple_stmt_iterator loop_cond_gsi,
                                  rgroup_controls *rgc, tree niters,
                                  tree niters_skip, bool might_wrap_p,
-                                 tree *iv_step)
+                                 tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
          ...
          vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
          ...
-         ivtmp_35 = ivtmp_9 - _36;
+         ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
          ...
-         if (ivtmp_35 != 0)
+         if (ivtmp_9 > POLY_INT_CST [4, 4])
            goto ; [83.33%]
          else
            goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
       tree step = rgc->controls.length () == 1 ? rgc->controls[0]
                                                : make_ssa_name (iv_type);
       /* Create decrement IV.  */
-      create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-                 insert_after, &index_before_incr, &index_after_incr);
+      create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+                 &incr_gsi, insert_after, &index_before_incr,
+                 &index_after_incr);
       gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
                                                             index_before_incr,
                                                             nitems_step));
       *iv_step = step;
-      return index_after_incr;
+      *compare_step = nitems_step;
+      return index_before_incr;
     }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
      arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
                                                      &preheader_seq, &header_seq,
                                                      loop_cond_gsi, rgc, niters,
                                                      niters_skip, might_wrap_p,
-                                                     &iv_step);
+                                                     &iv_step, &compare_step);
 
           iv_rgc = rgc;
         }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-                                        NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+    {
+      gcc_assert (compare_step);
+      tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+      cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+                                     NULL_TREE);
+    }
+  else
+    {
+      tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+      tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+      cond_stmt
+        = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+    }
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3
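[Editor's illustration, not part of the original mail.] The two loop shapes from the commit message can be compared with a small scalar model. This is only a sketch under stated assumptions: old_flow and new_flow are hypothetical helper names, the element counts are plain unsigned integers rather than GCC trees, and both helpers assume remain >= 1 and vf >= 1. The exit test in new_flow uses `>` to match the GT_EXPR / `ivtmp_9 > POLY_INT_CST [4, 4]` condition in the diff; with the `>=` written in the commit message, the loop would run one extra zero-length trip whenever remain is an exact multiple of vf.

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old flow: the IV step is MIN (vf, remain), so it changes on the last trip.
   Returns the trip count; *processed receives the total element count.  */
static unsigned
old_flow (unsigned remain, unsigned vf, unsigned *processed)
{
  unsigned trips = 0;
  *processed = 0;
  do
    {
      unsigned len = MIN (vf, remain);
      *processed += len;
      remain -= len;
      trips++;
    }
  while (remain != 0);
  return trips;
}

/* New flow: the IV always steps by the constant vf, and the exit test looks
   at the value the IV had before the decrement.  */
static unsigned
new_flow (unsigned remain, unsigned vf, unsigned *processed)
{
  unsigned trips = 0;
  unsigned old_remain;
  *processed = 0;
  do
    {
      old_remain = remain;
      unsigned len = MIN (vf, remain);
      *processed += len;
      remain -= vf;   /* May wrap on the final trip; the value is unused then.  */
      trips++;
    }
  while (old_remain > vf);
  return trips;
}
```

In this model both flows process the same number of elements and take (n - 1) / vf + 1 trips, which is the trip count quoted in the comment at the end of the diff; the difference is only that the new IV has a constant step, which is the property the commit message says helps SCEV.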
Re: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.
I plan to implement BF16 vector support in GCC, but I am still waiting for the ISA to be ratified, since GCC policy doesn't allow un-ratified ISA extensions.

Currently, we are working on INT8, INT16, INT32, INT64, FP16, FP32 and FP64 auto-vectorization. Adding BF16 should be very simple in the current vector framework in GCC.

Thanks.

juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2023-06-01 14:57
To: juzhe.zh...@rivai.ai
Subject: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

FYI.

-----Original Message-----
From: Gcc-patches On Behalf Of Jin Ma via Gcc-patches
Sent: Thursday, June 1, 2023 2:51 PM
To: gcc-patches@gcc.gnu.org
Cc: shi...@iscas.ac.cn; kito.ch...@gmail.com; Jin Ma
Subject: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

Hi,

Are there any new developments on Zfb? Are there any plans to implement the Zvfbfmin and Zvfbfwma extensions? I see that Zfb is being reviewed in LLVM; maybe we should do the same in GCC.

Ref:
https://reviews.llvm.org/D151313
https://reviews.llvm.org/D150929
Re: [PATCH] RISC-V: Introduce vfloat16m{f}*_t and their machine mode.
LGTM. We are waiting for FP16 vector support before starting floating-point auto-vectorization.

Thanks so much.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-01 15:17
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH] RISC-V: Introduce vfloat16m{f}*_t and their machine mode.

From: Pan Li

This patch introduces the built-in types vfloat16m{f}*_t, as well as their machine modes VNx*HF. They depend on the zvfhmin or zvfh architecture extension. When zvfhmin or zvfh is given, the macro TARGET_VECTOR_ELEN_FP_16 will be true. A subsequent patch will implement the zvfhmin extension based on this.

Signed-off-by: Pan Li

gcc/ChangeLog:

  * common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
  and zvfh.
  * config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
  (main): Disable FP16 tuple.
  * config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
  (TARGET_VECTOR_ELEN_FP_16): Ditto.
  * config/riscv/riscv-vector-builtins.cc (check_required_extensions):
  Add FP16.
  * config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
  (vfloat16mf2_t): Ditto.
  (vfloat16m1_t): Ditto.
  (vfloat16m2_t): Ditto.
  (vfloat16m4_t): Ditto.
  (vfloat16m8_t): Ditto.
  * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16): New
  macro.
  * config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16 machine
  mode based on TARGET_VECTOR_ELEN_FP_16.
--- gcc/common/config/riscv/riscv-common.cc| 2 ++ gcc/config/riscv/genrvv-type-indexer.cc| 7 +-- gcc/config/riscv/riscv-opts.h | 4 gcc/config/riscv/riscv-vector-builtins.cc | 2 ++ gcc/config/riscv/riscv-vector-builtins.def | 20 +++ gcc/config/riscv/riscv-vector-builtins.h | 1 + gcc/config/riscv/riscv-vector-switch.def | 23 ++ 7 files changed, 49 insertions(+), 10 deletions(-) diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index e6ed3df9ea6..3247d526c0a 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -1248,6 +1248,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] = {"zve64x", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64}, {"zve64f", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32}, {"zve64d", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64}, + {"zvfhmin", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16}, + {"zvfh", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16}, {"zvl32b",&gcc_options::x_riscv_zvl_flags, MASK_ZVL32B}, {"zvl64b",&gcc_options::x_riscv_zvl_flags, MASK_ZVL64B}, diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc index 18e1b375396..8fc93ceaab4 100644 --- a/gcc/config/riscv/genrvv-type-indexer.cc +++ b/gcc/config/riscv/genrvv-type-indexer.cc @@ -54,7 +54,7 @@ valid_type (unsigned sew, int lmul_log2, bool float_p) case 8: return lmul_log2 >= -3 && !float_p; case 16: - return lmul_log2 >= -2 && !float_p; + return lmul_log2 >= -2; case 32: return lmul_log2 >= -1; case 64: @@ -73,6 +73,9 @@ valid_type (unsigned sew, int lmul_log2, unsigned nf, bool float_p) if (nf > 8 || nf < 1) return false; + if (sew == 16 && nf != 1 && float_p) // Disable FP16 tuple in temporarily. 
+return false; + switch (lmul_log2) { case 1: @@ -342,7 +345,7 @@ main (int argc, const char **argv) fprintf (fp, ")\n"); } // Build for vfloat - for (unsigned sew : {32, 64}) + for (unsigned sew : {16, 32, 64}) for (int lmul_log2 : {-3, -2, -1, 0, 1, 2, 3}) for (unsigned nf : {1, 2, 3, 4, 5, 6, 7, 8}) { diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h index 5f387d0e393..208a557b8ff 100644 --- a/gcc/config/riscv/riscv-opts.h +++ b/gcc/config/riscv/riscv-opts.h @@ -154,6 +154,8 @@ enum riscv_entity #define MASK_VECTOR_ELEN_64(1 << 1) #define MASK_VECTOR_ELEN_FP_32 (1 << 2) #define MASK_VECTOR_ELEN_FP_64 (1 << 3) +/* Align the bit index to riscv-vector-builtins.h. */ +#define MASK_VECTOR_ELEN_FP_16 (1 << 6) #define TARGET_VECTOR_ELEN_32 \ ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_32) != 0) @@ -163,6 +165,8 @@ enum riscv_entity ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_32) != 0) #define TARGET_VECTOR_ELEN_FP_64 \ ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_64) != 0) +#define TARGET_VECTOR_ELEN_FP_16 \ + ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_16) != 0) #define MASK_ZVL32B(1 << 0) #define MASK_ZVL64B(1 << 1) diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc index 9fea70709fd..43bf6d8f262 100644 --- a/gcc
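[Editor's illustration, not part of the original mail.] The effect of the flag-table and valid_type hunks can be modelled in a few lines of C. This is a minimal sketch, not the GCC code: the three mask values are copied from the quoted riscv-opts.h hunk, while elen_flag_for_ext, target_vector_elen_fp_16, valid_sew16_before and valid_sew16_after are hypothetical helpers that stand in for the riscv_ext_flag_table lookup, the TARGET_VECTOR_ELEN_FP_16 macro, and the `case 16:` arm of valid_type before and after the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Bit values as they appear in the quoted riscv-opts.h hunk.  */
#define MASK_VECTOR_ELEN_FP_32 (1 << 2)
#define MASK_VECTOR_ELEN_FP_64 (1 << 3)
#define MASK_VECTOR_ELEN_FP_16 (1 << 6)

/* Hypothetical stand-in for the table lookup: both zvfhmin and zvfh map
   onto the same FP_16 bit, as in the riscv-common.cc hunk.  */
static int
elen_flag_for_ext (const char *ext)
{
  if (strcmp (ext, "zve64f") == 0)
    return MASK_VECTOR_ELEN_FP_32;
  if (strcmp (ext, "zve64d") == 0)
    return MASK_VECTOR_ELEN_FP_64;
  if (strcmp (ext, "zvfhmin") == 0 || strcmp (ext, "zvfh") == 0)
    return MASK_VECTOR_ELEN_FP_16;
  return 0;
}

/* Same test as the TARGET_VECTOR_ELEN_FP_16 macro, on an explicit flag word.  */
static bool
target_vector_elen_fp_16 (int flags)
{
  return (flags & MASK_VECTOR_ELEN_FP_16) != 0;
}

/* The SEW == 16 rule of valid_type before the patch: integer types only.  */
static bool
valid_sew16_before (int lmul_log2, bool float_p)
{
  return lmul_log2 >= -2 && !float_p;
}

/* After the patch: the same LMUL bound, floating-point allowed.  */
static bool
valid_sew16_after (int lmul_log2, bool float_p)
{
  (void) float_p;
  return lmul_log2 >= -2;
}
```

Over the LMUL range that main iterates (lmul_log2 from -3 to 3), the relaxed rule accepts exactly six floating-point cases, which lines up with the six vfloat16 types named in the ChangeLog (mf4, mf2, m1, m2, m4, m8).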
Re: Re: [PATCH V3] VECT: Change flow of decrement IV
Thanks Kewen. Let's wait for Richard and Richi.

juzhe.zh...@rivai.ai

From: Kewen.Lin
Date: 2023-06-01 13:24
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V3] VECT: Change flow of decrement IV

Hi,

on 2023/6/1 13:00, juzhe.zh...@rivai.ai wrote:
> This patch is no difference from V2.

I support this patch based on the testing and SPEC2017 evaluation results on Power (see my comments on patch v2).

> Just add PR tree-optimization/109971 as Kewen's suggested.

Thanks for adding that. I was expecting you to add it when committing, not really requesting a new version. :)

By the way, the PR marker(s) will trigger scripts that post some commit info (commit link, commit log) to the specified PR(s), so people can easily find the connections between PRs and the commits that fix them or make progress on them.

BR,
Kewen

>
> Already bootstrapped and Regression on X86 no difference.
>
> Ok for trunk ?
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>
> *Date:* 2023-06-01 12:36
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; rguenther <mailto:rguent...@suse.de>; linkw <mailto:li...@linux.ibm.com>; Ju-Zhe Zhong <mailto:juzhe.zh...@rivai.ai>
> *Subject:* [PATCH V3] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong
>
> Follow Richi's suggestion, I change current decrement IV flow from:
>
> do {
>    remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>    old_remain = remain;
>    len = MIN (vf, remain);
>    remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> Include fixes from kewen.
>
>
> This patch will need to wait for Kewen's test feedback.
>
> Testing on X86 is on-going
>
> Co-Authored by: Kewen Lin
>
> PR tree-optimization/109971
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly):
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
> > --- > gcc/tree-vect-loop-manip.cc | 36 +--- > 1 file changed, 25 insertions(+), 11 deletions(-) > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index acf3642ceb2..3f735945e67 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, > loop_vec_info loop_vinfo, > gimple_stmt_iterator loop_cond_gsi, > rgroup_controls *rgc, tree niters, > tree niters_skip, bool might_wrap_p, > - tree *iv_step) > + tree *iv_step, tree *compare_step) > { >tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo); >tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); > @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, > loop_vec_info loop_vinfo, >... >vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0); >... > -ivtmp_35 = ivtmp_9 - _36; > +ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4]; >... > -if (ivtmp_35 != 0) > +if (ivtmp_9 > POLY_INT_CST [4, 4]) > goto ; [83.33%] >else > goto ; [16.67%] > @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, > loop_vec_info loop_vinfo, >tree step = rgc->controls.length () == 1 ? rgc->controls[0] >: make_ssa_name (iv
Re: [PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
Hi, please disregard this patch and go directly to the V2 patch with the same title. That is the last patch in my fine-tuning of integer widening auto-vectorization.

Thanks.

juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-06-01 15:31
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

From: Juzhe-Zhong

This patch enhances the vwmul.vv combine optimizations.

Consider the following code:

void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
                      int16_t *__restrict dst3, int16_t *__restrict dst4,
                      int8_t *__restrict a, int8_t *__restrict b,
                      int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

In such a complicated case, an operand is not used just once; it is used by multiple statements. The GCC combine pass will iterate over the combinations of the operands:

First round -> combine one of the operands and change vsext + vmul into vwmul.wv
Second round -> combine the other operand and change vwmul.wv into vwmul.vv

Note that adding the pseudo vwmul.wv pattern makes the vwmulsu.vv testcase fail, since GCC prefers this pattern order:
(mul: (zero_extend) (sign_extend))

So the operand order of the vwmulsu.vv instruction is changed.

gcc/ChangeLog:

  * config/riscv/vector.md: Shift zero_extend and sign_extend order.
  * config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
  * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
  * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
--- gcc/config/riscv/autovec-opt.md | 56 +++ gcc/config/riscv/vector.md| 9 +-- .../riscv/rvv/autovec/widen/widen-7.c | 27 + .../rvv/autovec/widen/widen-complicate-3.c| 32 +++ .../riscv/rvv/autovec/widen/widen_run-7.c | 34 +++ 5 files changed, 154 insertions(+), 4 deletions(-) create mode 100644 gcc/config/riscv/autovec-opt.md create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md new file mode 100644 index 000..5b7dc9bef8c --- /dev/null +++ b/gcc/config/riscv/autovec-opt.md @@ -0,0 +1,56 @@ +;; Machine description for optimization of RVV auto-vectorization. +;; Copyright (C) 2023 Free Software Foundation, Inc. +;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd. + +;; This file is part of GCC. + +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. + +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. + +;; We don't have vwmul.wv instruction like vwadd.wv in RVV. +;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance +;; optimization of instructions combine. 
+(define_insn_and_split "@pred_single_widen_mul" + [(set (match_operand:VWEXTI 0 "register_operand" "=&vr,&vr") + (if_then_else:VWEXTI + (unspec: + [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") + (match_operand 5 "vector_length_operand" " rK, rK") + (match_operand 6 "const_int_operand" "i,i") + (match_operand 7 "const_int_operand" "i,i") + (match_operand 8 "const_int_operand" "i,i") + (reg:SI VL_REGNUM) + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) + (mult:VWEXTI + (any_extend:VWEXTI + (match_operand: 4 "register_operand" " vr, vr")) + (match_operand:VWEXTI 3 "register_operand" " vr, vr")) + (match_operand:VWEXTI 2 "vector_merge_operand" " vu,0")))] + "TARGET_VECTOR" + &quo
Re: Re: [PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid
Oh, yes. Thanks for catching this! I will send V2 soon.

juzhe.zh...@rivai.ai

From: KuanLin Chen
Date: 2023-06-02 09:26
To: gcc-patches; juzhe.zhong
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

Hi Juzhe,

I think fault_load_def::get_name should remove "instance.pred == PRED_TYPE_mu", right?

Wrote on Fri, Jun 2, 2023 at 7:05 AM:
>
> From: Juzhe-Zhong
>
> Base on these:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
>
> Add _mu C++ overloaded intrinsics for load && viota && vid.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded
> intrinsics.
>
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index a8113f6602b..498c6ba042e 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -164,7 +164,7 @@ public:
>    {
>      if (STORE_P || LST_TYPE == LST_INDEXED)
>        return true;
> -    return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +    return pred != PRED_TYPE_none;
>    }
>
>    rtx expand (function_expander &e) const override
> @@ -963,7 +963,7 @@ public:
>    bool can_be_overloaded_p (enum predication_type_index pred) const override
>    {
>      return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
> -           || pred == PRED_TYPE_tumu;
> +           || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
>    }
>
>    rtx expand (function_expander &e) const override
> @@ -979,7 +979,7 @@ public:
>    bool can_be_overloaded_p (enum predication_type_index pred) const override
>    {
>      return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
> -           || pred == PRED_TYPE_tumu;
> +           || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
>    }
>
>    rtx expand (function_expander &e) const override
> @@ -1749,7 +1749,7 @@ public:
>
>    bool can_be_overloaded_p (enum predication_type_index pred) const override
>    {
> -    return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +    return pred != PRED_TYPE_none;
>    }
>
>    rtx expand (function_expander &e) const override
> @@ -1794,7 +1794,7 @@ public:
>
>    bool can_be_overloaded_p (enum predication_type_index pred) const override
>    {
> -    return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +    return pred != PRED_TYPE_none;
>    }
>
>    rtx expand (function_expander &e) const override
> --
> 2.36.1
>
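[Editor's illustration, not part of the original mail.] The net effect of the five hunks is easier to see as a before/after pair of predicates. The sketch below is a simplified model with assumed names: enum pred_type is a hypothetical, reduced stand-in for GCC's predication_type_index that lists only the values visible in the quoted diff, the helper functions are invented for illustration, and the early `STORE_P || LST_TYPE == LST_INDEXED` return of the first hunk is left out.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for predication_type_index; only the
   values that appear in the quoted diff are listed.  */
enum pred_type { PRED_NONE, PRED_TU, PRED_TUM, PRED_TUMU, PRED_MU };

/* Hunks that tested "!= none && != mu" (the load-like classes).  */
static bool
load_overloadable_before (enum pred_type pred)
{
  return pred != PRED_NONE && pred != PRED_MU;
}

static bool
load_overloadable_after (enum pred_type pred)
{
  return pred != PRED_NONE;
}

/* Hunks that list tu/tum/tumu explicitly (presumably viota and vid,
   going by the subject line).  */
static bool
viota_vid_overloadable_before (enum pred_type pred)
{
  return pred == PRED_TU || pred == PRED_TUM || pred == PRED_TUMU;
}

static bool
viota_vid_overloadable_after (enum pred_type pred)
{
  return pred == PRED_TU || pred == PRED_TUM || pred == PRED_TUMU
         || pred == PRED_MU;
}
```

In this model the only input whose answer changes is PRED_MU, which flips from false to true in both groups; every other predication type keeps its previous result.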
Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
Hi, Robin.

>> I like the code examples in general but find them hard to read
>> at lengths > 5-10 or so. Could we condense this a bit?

OK. Do I need to send a V2, or can I condense the commit log when merging the patch?

>> I'm a bit wary about getting the costs
>> right for combine patterns but we can deal with this later.

No, you don't need to worry about combining extensions, and I don't think we need costs to tune the combining of extensions.

For vmv.v.x + vadd.vv ==> vadd.vx, we can't claim that vadd.vx is better, since it increases scalar register pressure. So for that kind of combining, I would like to take another approach and combine the pattern carefully, with an accurate register pressure calculation.

For this patch, however, vext.vf2 + vext.vf2 + vadd ==> vwadd.vv is always better; I don't think it is possible for vwadd.vv to be worse.

Thanks.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-06-02 15:01
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

Hi Juzhe,

> ...
>         vsetvli    zero,t1,e8,m1,ta,ma
>         vle8.v     v1,0(a4)
>         vsetvli    t3,zero,e16,m2,ta,ma
>         vsext.vf2  v6,v1
>         vsetvli    zero,t1,e8,m1,ta,ma
>         vle8.v     v1,0(a5)
>         vsetvli    t3,zero,e16,m2,ta,ma
>         add        t0,a0,t4
>         vzext.vf2  v4,v1
>         vmul.vv    v2,v4,v6
>         vsetvli    zero,t1,e16,m2,ta,ma
>         vse16.v    v2,0(t0)
>         vle8.v     v1,0(a6)
>         vsetvli    t3,zero,e16,m2,ta,ma
>         add        t0,a1,t4
>         vzext.vf2  v2,v1
>         vmul.vv    v4,v2,v4
>         vsetvli    zero,t1,e16,m2,ta,ma
>         vse16.v    v4,0(t0)
>         vsetvli    t3,zero,e16,m2,ta,ma
>         add        t0,a2,t4
>         vmul.vv    v2,v2,v6
>         vsetvli    zero,t1,e16,m2,ta,ma
>         vse16.v    v2,0(t0)
>         add        t0,a3,t4
>         vle8.v     v1,0(a7)
>         vsetvli    t3,zero,e16,m2,ta,ma
>         sub        t6,t6,t1
>         vsext.vf2  v2,v1
>         vmul.vv    v2,v2,v6
>         vsetvli    zero,t1,e16,m2,ta,ma
>         vse16.v    v2,0(t0)
> ...
>
> After this patch:
> ...
>         vsetvli    zero,t1,e8,mf2,ta,ma
>         vle8.v     v1,0(a4)
>         vle8.v     v3,0(a5)
>         vsetvli    t6,zero,e8,mf2,ta,ma
>         add        t0,a0,t3
>         vwmulsu.vv v2,v1,v3
>         vsetvli    zero,t1,e16,m1,ta,ma
>         vse16.v    v2,0(t0)
>         vle8.v     v2,0(a6)
>         vsetvli    t6,zero,e8,mf2,ta,ma
>         add        t0,a1,t3
>         vwmulu.vv  v4,v3,v2
>         vsetvli    zero,t1,e16,m1,ta,ma
>         vse16.v    v4,0(t0)
>         vsetvli    t6,zero,e8,mf2,ta,ma
>         add        t0,a2,t3
>         vwmulsu.vv v3,v1,v2
>         vsetvli    zero,t1,e16,m1,ta,ma
>         vse16.v    v3,0(t0)
>         add        t0,a3,t3
>         vle8.v     v3,0(a7)
>         vsetvli    t6,zero,e8,mf2,ta,ma
>         sub        t4,t4,t1
>         vwmul.vv   v2,v1,v3
>         vsetvli    zero,t1,e16,m1,ta,ma
>         vse16.v    v2,0(t0)
> ...

I like the code examples in general but find them hard to read at lengths > 5-10 or so. Could we condense this a bit?

> +(include "autovec-opt.md")

ACK for this. We discussed before not cluttering the regular autovec.md too much with combine-targeted patterns, so I'm in favor of the separate file.

In total, this looks good to me. I'm a bit wary about getting the costs right for combine patterns, but we can deal with this later.

Regards
 Robin
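[Editor's illustration, not part of the original mail.] As a rough per-element model of why the vsext.vf2 + vzext.vf2 + vmul.vv sequence in the first listing can be replaced by a single vwmulsu.vv, the two computations can be written as scalar C. This is a sketch under stated assumptions: the function names are hypothetical, the 8-bit to 16-bit element widths are read off the e8/e16 vsetvli settings in the listings, and the signed-by-unsigned operand semantics of vwmulsu are taken from the RVV specification rather than from this thread.

```c
#include <stdint.h>

/* vsext.vf2 + vzext.vf2 + vmul.vv: widen both 8-bit inputs to 16 bits
   first, then multiply at 16 bits.  */
static int16_t
mul_after_extend (int8_t a, uint8_t b)
{
  int16_t wa = (int16_t) a;   /* sign-extend */
  int16_t wb = (int16_t) b;   /* zero-extend: 0..255 fits in int16_t */
  return (int16_t) (wa * wb);
}

/* vwmulsu.vv: one widening multiply, signed 8-bit by unsigned 8-bit,
   producing a 16-bit result.  */
static int16_t
widen_mul_su (int8_t a, uint8_t b)
{
  return (int16_t) ((int) a * (int) b);
}
```

The product of a signed 8-bit and an unsigned 8-bit value always lies between -128 * 255 = -32640 and 127 * 255 = 32385, so it fits in 16 bits and the two forms agree for every input pair. That is consistent with the point made in the reply above that replacing the extend-then-operate sequence with the widening instruction loses nothing.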
Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
Thanks. I will wait for final approval from Jeff or Kito.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-06-02 15:18
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

>>> I like the code examples in general but find them hard to read
>>> at lengths > 5-10 or so. Could we condense this a bit?
> Ok, Do I need to send V2 ? Or condense the commit log when merged the patch?

Sure, just condense a bit. No need for V2.

Regards
 Robin