from:"juzhe.zh...@rivai.ai"

Re: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-06 Thread juzhe.zh...@rivai.ai

Thanks for reporting it.

I think we may need to change it into:
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { vect_strided5 && vect_load_lanes } } } } */

Could you verify whether it works for you?

Thanks.


juzhe.zh...@rivai.ai
 
From: Andrew Stubbs
Date: 2023-10-06 22:29
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com; richard.sandif...@arm.com
Subject: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5
On 15/09/2023 10:16, Juzhe-Zhong wrote:
> This test failed on RISC-V:
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4
> 
> Because this loop:
>   /* SLP with unrolling by 8.  */
>   for (i = 0; i < N; i++)
>     {
>       out[i*5] = 8;
>       out[i*5 + 1] = 7;
>       out[i*5 + 2] = 81;
>       out[i*5 + 3] = 28;
>       out[i*5 + 4] = 18;
>     }
> 
> is vectorized using vect_load_lanes with array size = 5 instead of SLP.
> 
> Once we adjust the cost of lanes load/store, it will use SLP.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/slp-1.c: Add vect_strided5.
> 
> ---
>   gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 82e4f6469fb..d4a13f12df6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -122,5 +122,5 @@ int main (void)
>   }
>   
>   /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
> -
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_strided5 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_strided5 } } } */
 
This patch causes a test regression on amdgcn because vect_strided5 is 
true (because check_effective_target_vect_fully_masked is true), but the 
testcase still gives the message 4 times. Perhaps because amdgcn uses 
masking and not vect_load_lanes?
 
Andrew
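
For readers following the thread, here is a minimal sketch of the store pattern under discussion, written as a standalone C function. The name fill_stride5, the parameter n, and the unsigned short element type are assumptions made for illustration; only the five stored constants come from the loop quoted above. Each iteration writes a group of five adjacent elements, and whether the vectorizer handles that group with SLP or with a lanes-style access decides how often "vectorizing stmts using SLP" appears in the dump.

```c
#include <stddef.h>

/* Hypothetical helper (not part of slp-1.c): write n groups of five
   adjacent elements, mirroring the loop quoted in the patch description.
   The element type is an assumption.  */
void
fill_stride5 (unsigned short *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[i * 5] = 8;
      out[i * 5 + 1] = 7;
      out[i * 5 + 2] = 81;
      out[i * 5 + 3] = 28;
      out[i * 5 + 4] = 18;
    }
}
```

Calling fill_stride5 on a buffer of at least 5 * n elements reproduces the access pattern; the caller is responsible for sizing the buffer.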

Re: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case

2023-10-06 Thread juzhe.zh...@rivai.ai

OK.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-10-07 11:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case
From: xuli 
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler times.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
---
.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c   | 10 +-
.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c   | 10 +-
2 files changed, 10 insertions(+), 10 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index c566f8a4751..2ec9487a6c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int cond)
   }
}
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
index d0e75258188..bcafce36895 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int cond)
   }
}
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times

Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-06 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
From: Pan Li 
 
For _Float16 types, add run test for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc
 
For float and double, add run test for:
* roundeven
 
The zfa extension is required for these run test cases; for rv64, the
simulation target_board may look like the following.
 
target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-ceil-run-0.c  | 39 +++
.../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +++
.../rvv/autovec/unop/math-nearbyint-run-0.c   | 48 +++
.../riscv/rvv/autovec/unop/math-rint-run-0.c  | 48 +++
.../riscv/rvv/autovec/unop/math-round-run-0.c | 39 +++
.../rvv/autovec/unop/math-roundeven-run-0.c   | 39 +++
.../rvv/autovec/unop/math-roundeven-run-1.c   | 39 +++
.../rvv/autovec/unop/math-roundeven-run-2.c   | 39 +++
.../riscv/rvv/autovec/unop/math-trunc-run-0.c | 39 +++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 +-
10 files changed, 371 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
new file mode 100644
index 000..70cba3602bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1024.0, 1024.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
new file mode 100644
index 000..c542278c1f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include
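
The run tests above follow a fill, apply, compare pattern. The sketch below shows that pattern in plain C under several assumptions: the real macros (TEST_INIT, TEST_UNARY_CALL, RUN_TEST) live in test-math.h, which is not shown in this thread, so run_unary_case and simple_ceil are hypothetical stand-ins, and float is used instead of _Float16 so the code builds on any host. The input/expected pairs used in the usage note are taken from the TEST_INIT lines in the patch.

```c
#include <stddef.h>

#define ARRAY_SIZE 128

typedef float (*unary_fn) (float);

/* Hypothetical stand-in for the pattern the run tests follow:
   fill the input with one value, apply FN to every element, and
   check every result against EXPECTED.  Returns 1 on success.  */
int
run_unary_case (unary_fn fn, float input, float expected)
{
  float in[ARRAY_SIZE];
  float out[ARRAY_SIZE];

  for (size_t i = 0; i < ARRAY_SIZE; i++)
    in[i] = input;

  /* In the real tests this loop calls a builtin such as
     __builtin_ceilf16 directly so that it can be auto-vectorized;
     the function pointer here is only for illustration.  */
  for (size_t i = 0; i < ARRAY_SIZE; i++)
    out[i] = fn (in[i]);

  for (size_t i = 0; i < ARRAY_SIZE; i++)
    if (out[i] != expected)
      return 0;

  return 1;
}

/* Simple scalar ceiling, valid only for values that fit in a long.
   Used instead of ceilf so the sketch does not depend on libm.
   Note that it returns +0.0 for -0.0, which still compares equal.  */
float
simple_ceil (float x)
{
  long t = (long) x;            /* truncates toward zero */
  return (x > (float) t) ? (float) (t + 1) : (float) t;
}
```

For example, run_unary_case (simple_ceil, 1.2f, 2.0f) and run_unary_case (simple_ceil, -1023.5f, -1023.0f) correspond to cases 1 and 8 of the ceil test.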

Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-07 Thread juzhe.zh...@rivai.ai

These testcases cause multiple FAILs.

I think you should use:
/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */



juzhe.zh...@rivai.ai
 

Re: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-07 Thread juzhe.zh...@rivai.ai

Also, I have reverted your commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=066a43ce72ab6559ba14af9628df19daa0b85cdf

Please test the patch and verify that it doesn't cause any FAILs when the 
toolchain doesn't have "zvfh_zfh".




juzhe.zh...@rivai.ai
 

Re: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread juzhe.zh...@rivai.ai

Hi, Jeff.

I have addressed your comments in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632239.html 

I think it now looks reasonably good for long-term maintenance.

OK for trunk?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-10-07 23:09
To: Juzhe-Zhong; gcc-patches
CC: rguenther; rdapp.gcc
Subject: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV
 
 
On 10/7/23 05:45, Juzhe-Zhong wrote:
> This patch fixes the following dump FAILs:
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
> vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
> \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_SUB" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_SUB" 1
> 
> For RVV, the expected dump IR is the COND_LEN_* pattern.
> 
> Also, we are still failing at this check:
> 
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
> \\.COND_LEN_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_LEN_SUB"
> 
> This is because of a known bug in GIMPLE_FOLD that Robin is working on.
> 
> @Robin: Please make sure vect-cond-arith-2.c passes with this patch and your 
> bug fix patch.
> 
> Ok for trunk ?
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
> * gcc.dg/vect/vect-cond-arith-4.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-5.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-6.c: Ditto.
Would it make more sense to adjust the regexp so that it matched the 
standard form as well as the LEN form?  So for example we could have a 
regexp that matched COND_ADD and COND_LEN_ADD.
 
Just wondering if that'll be better from a long term maintenance standpoint.
 
Jeff
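
As an illustration of the idea in the message above, the sketch below checks one optional-group pattern against both forms using POSIX extended regular expressions. This is not how the testsuite evaluates patterns (DejaGnu uses Tcl regular expressions, and a later message in this thread reports that the combined form did not work there), so treat it only as a demonstration of the matching idea. The helper line_matches, the variable cond_add_pattern, and the sample dump lines in the usage note are all hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE matches PATTERN (POSIX extended syntax), 0 if it
   does not, and -1 if the pattern fails to compile.  */
int
line_matches (const char *pattern, const char *line)
{
  regex_t re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* One pattern intended to cover both the COND_ADD and the
   COND_LEN_ADD form; the "LEN_" part is an optional group.  */
const char *const cond_add_pattern = " = \\.COND_(LEN_)?ADD";
```

With this pattern, a made-up line such as "_5 = .COND_ADD (...)" and a made-up line such as "_5 = .COND_LEN_ADD (...)" both match, while a ".COND_SUB" line does not.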

Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai

Hi, Richi and Robin.

It turns out that COND(_LEN)?_ADD doesn't work.

Is this patch OK? Or do you have another way to change the dump check for 
RVV?

Thanks.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-08 09:33
To: gcc-patches
CC: rguenther; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_SUB" 1
 
For RVV, the expected dump IR is the COND_LEN_* pattern.
 
Also, we are still failing at this check:
 
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
\\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_LEN_SUB"
 
This is because of a known bug in GIMPLE_FOLD that Robin is working on.
 
@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your 
bug fix patch.
 
Ok for trunk ?
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
 
---
gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8 
gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8 
gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8 
4 files changed, 14 insertions(+), 14 deletions(-)
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
}
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \

Re: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai

Yes. We do have, and enable, the char -> long conversion (vsext.vf8/vzext.vf8).

Thanks for the comment; I will adapt the test as you suggested.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 15:31
To: Jeff Law
CC: Juzhe-Zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV
On Sun, 8 Oct 2023, Jeff Law wrote:

> 
> 
> On 10/8/23 05:35, Juzhe-Zhong wrote:
> > RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this
> > case well.
> > So, adjust dump check for RVV.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  * gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
> I'd hoped to avoid a bunch of risc-v special casing in the generic part of the
> testsuite.  Basically the more we have target specific conditionals rather
> than conditionals using properties, the more likely we are to keep revisiting
> this stuff over time and possibly for other architectures as well.
> 
> What is it about risc-v's vector support that allows it to optimize this case?
> Is it the same property that allows us to handle the outer loop vectorization
> tests that you changed in another patch?

I suspect for VLA vectorization we can use direct conversion from
char to long long here?  I also notice the testcase uses 'char',
not specifying its sign.  So either of [sz]extVxyzDIVxyzQI is possibly
provided by RISCV?  (or possibly via some intermediate types in a
multi-step conversion)

For non-VLA and with the single vector size restriction we'd need
unpacking.

So it might be better

{ target { vect_unpack || { vect_vla && vect_sext_char_longlong } } }

where I think neither vect_vla nor vect_sext_char_longlong exists.
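
For reference, the kind of loop being discussed looks roughly like the C sketch below. It is only an illustration, not the actual contents of vect-multitypes-16.c, and the function name is hypothetical. The element type is written as signed char so the widening is a sign extension on every target, which avoids the ambiguity of plain char noted above.

```c
#include <stddef.h>

/* Widen each 8-bit element straight to 64 bits.  With "signed char"
   the conversion is always a sign extension; plain "char" would leave
   the choice between sign and zero extension to the target ABI.  */
void
widen_schar_to_longlong (long long *out, const signed char *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i];
}
```

A target with a direct 8-bit to 64-bit vector extension can handle this in one conversion step; otherwise the vectorizer needs unpacking or a multi-step conversion through intermediate types.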

Richard - didn't you run into similar things with SVE?

Richard.

> Neither an ACK nor NAK right now.
> 
> Jeff
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes

2023-10-09 Thread juzhe.zh...@rivai.ai

>> But you gobble the "or .." into an existing -mstrict-align flag - are
>> you sure all implementations are
>> self-consistent with handling non-vector memory instructions and
>> vector memory instructions here?
>> At least the above wording doesn't seem to impose such requirement.

RVV ISA:
"Support for misaligned vector memory accesses is independent of an 
implementation’s support for misaligned scalar memory accesses."
So support for misaligned vector memory accesses is independent of support for 
misaligned scalar memory accesses.
I think this patch (reusing -mno-strict-align) is not appropriate, which means I 
need an additional compile option.
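
As a small illustration of the "naturally aligned" wording in the ISA text quoted in this thread, the check can be sketched in C as below. The helper name is hypothetical and the element size is assumed to be a power of two; this only describes the address property, not what any particular implementation does with a misaligned access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if ADDR is a multiple of ELEM_SIZE, i.e. an element of
   that size stored at ADDR is naturally aligned.  ELEM_SIZE is assumed
   to be a power of two.  */
static bool
is_naturally_aligned (const void *addr, size_t elem_size)
{
  return ((uintptr_t) addr & (elem_size - 1)) == 0;
}
```

An element for which this returns false is the case where the spec allows either a successful transfer or an address-misaligned exception.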




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 16:01
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes
On Sun, Oct 8, 2023 at 9:22 AM Juzhe-Zhong  wrote:
>
> Previously, I removed the movmisalign pattern to fix the execution FAILs in 
> this commit:
> https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
>
> At first I thought that RVV doesn't allow misaligned accesses, so I 
> removed that pattern.
> However, after deeper investigation, re-reading the RVV ISA, and experimenting on 
> SPIKE,
> I realized I was wrong.
>
> RVV ISA reference: 
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
>
> "If an element accessed by a vector memory instruction is not naturally 
> aligned to the size of the element,
>  either the element is transferred successfully or an address misaligned 
> exception is raised on that element."
 
But you gobble the "or .." into an existing -mstrict-align flag - are
you sure all implementations are
self-consistent with handling non-vector memory instructions and
vector memory instructions here?
At least the above wording doesn't seem to impose such requirement.
 
> It's obvious that the RVV ISA does allow misaligned vector load/store.
>
> I experimented and confirmed this on SPIKE:
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike
>  --isa=rv64gcv --varch=vlen:128,elen:64 
> ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64
>   a.out
> bbl loader
> z   ra 00010158 sp 003ffb40 gp 
> 00012c48
> tp  t0 000110da t1 000f t2 
> 
> s0 00013460 s1  a0 00012ef5 a1 
> 00012018
> a2 00012a71 a3 000d a4 0004 a5 
> 00012a71
> a6 00012a71 a7 00012018 s2  s3 
> 
> s4  s5  s6  s7 
> 
> s8  s9  sA  sB 
> 
> t3  t4  t5  t6 
> 
> pc 00010258 va/inst 020660a7 sr 80026620
> Store/AMO access fault!
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike
>  --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 
> ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64
>   a.out
> bbl loader
>
> We can see that SPIKE passes the previously *FAILED* execution tests when 
> --misaligned is specified.
>
> So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the 
> investigation I have done, since
> it can improve multiple vectorization tests and fix dump FAILs.
>
> This patch fixes the following dump FAILs:
>
> FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimi

Re: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai

Thanks Richi.

I will try to figure out a better way to adapt the tests without adding 
riscv*-specific target variants.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 16:17
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV
On Sun, 8 Oct 2023, Juzhe-Zhong wrote:
 
> Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds at outer loop 
> vectorization.
 
How so?  I think this maybe goes with the other similar change.
 
That is, when we already have specific target checks adding riscv-*-* 
looks sensible but when we don't we should figure if there's a capability
we can (add and) test instead.
 
> Fix these following XPASS FAILs:
> 
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
> * gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-21.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> index c7c2fa8a504..12179949e00 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> @@ -59,4 +59,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> index ba904a6c03e..86554a98169 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> index 5cd4049d08c..624b54accf4 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> @@ -49,4 +49,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> index 72e53c2bfb0..b30a5d78819 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> @@ -59,4 +59,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! { vect_pack_trunc } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen

2023-10-09 Thread juzhe.zh...@rivai.ai

Remove these functions:

+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+

Instead, 

For sll, you should use :
rtx tmp
= expand_binop (Pmode, ashl_optab, op_1,
gen_int_mode (8, Pmode), NULL_RTX, 0,
OPTAB_DIRECT);

For srl, you should use:
rtx tmp
= expand_binop (Pmode, lshiftrt_optab, op_1,
gen_int_mode (8, Pmode), NULL_RTX, 0,
OPTAB_DIRECT);


For or, you should use:
expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0,
   OPTAB_DIRECT);



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-09 16:51
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen
From: Pan Li 
 
This patch would like to refine the code gen for the bswap16.
 
We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It generates about 9 instructions in the
loop, as below, no matter whether it is bswap16, bswap32 or bswap64.
 
  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub a4,a4,a5
7 vse16.v v1,0(a3)
8 add a0,a0,a2
9 add a3,a3,a2
  bne a4,zero,.L2
 
But for bswap16 we can have an even simpler code gen, which
has only 7 instructions in the loop, as below.
 
  .L5:
1 vle8.v  v2,0(a5)
2 addi a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi a4,a4,32
  bne a5,a6,.L5
 
Unfortunately, this approach makes the instruction count in the loop grow to
13 and 24 for bswap32 and bswap64. Thus, we refine the code
gen for bswap16 only, and leave both bswap32 and bswap64
as is.
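
The shift-and-or form of bswap16 described above corresponds to the scalar C sketch below. It is only meant to show the arithmetic that the vsrl/vsll/vor sequence performs on each 16-bit element; the function names are hypothetical and this is not the code in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of X: move the high byte down, the low byte up,
   and combine them.  The cast drops the bits shifted past bit 15.  */
static inline uint16_t
bswap16_shift_or (uint16_t x)
{
  return (uint16_t) ((x >> 8) | (x << 8));
}

/* Element-wise loop of the kind the auto-vectorizer would see.  */
void
bswap16_array (uint16_t *dst, const uint16_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = bswap16_shift_or (src[i]);
}
```

The same trick for 32-bit or 64-bit elements needs more shifts, masks and ors per element, which is consistent with the larger instruction counts quoted for bswap32 and bswap64.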
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (emit_vec_sll_scalar): New help func
impl for emit vsll.vi/vsll.vx
(emit_vec_srl_scalar): Likewise for vsrl.vi/vsrl.vx.
(emit_vec_or): Likewise for vor.vv.
(shuffle_bswap_pattern): New func impl for shuffle bswap.
(expand_vec_perm_const_1): Add shuffle bswap pattern.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 117 ++
.../riscv/rvv/autovec/unop/bswap16-0.c|  17 +++
.../riscv/rvv/autovec/unop/bswap16-run-0.c|  44 +++
.../riscv/rvv/autovec/vls/bswap16-0.c |  34 +
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c |   4 +-
5 files changed, 214 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..3e3b5f2e797 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -878,6 +878,33 @@ emit_vlmax_decompress_insn (rtx target, rtx op0, rtx op1, 
rtx mask)
   emit_vlmax_masked_gather_mu_insn (target, op1, sel, mask);
}
+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+
/* Emit merge instruction.  */
static machine_mode
@@ -3030,6 +3057,94 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
}
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE

Re: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai

Thanks Robin. Could you send V3 to Richi? Please commit it if Richi is OK with 
it.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-09 18:26
To: Andreas Schwab; juzhe.zhong
CC: rdapp.gcc; gcc-patches; rguenther; jeffreyalaw
Subject: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
On 10/9/23 09:32, Andreas Schwab wrote:
> On Okt 09 2023, juzhe.zh...@rivai.ai wrote:
> 
>> Turns out COND(_LEN)?_ADD can't work.
> 
> It should work though.  Tcl regexps are a superset of POSIX EREs.
> 
 
The problem is that COND(_LEN)?_ADD matches two times against
COND_LEN_ADD and a scan-tree-dump-times 1 will fail.  So for those
checks in vect-cond-arith-6.c we either need to switch to
scan-tree-dump or change the pattern to "\.(?:COND|COND_LEN)_ADD".
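
The double count can be illustrated with a small C sketch. It assumes the harness takes its count from the length of the list that Tcl's regexp -inline -all returns, where every match contributes the whole match plus one element per capturing group; that is an assumption about the test harness, not something shown in this thread. The helper name is hypothetical. POSIX EREs have no (?:...) syntax, so the group-free variant is written here as a top-level alternation.

```c
#include <regex.h>
#include <stddef.h>

/* Emulate the element count of a Tcl-style "regexp -inline -all":
   each match adds 1 (the whole match) plus one entry per capturing
   group in PATTERN.  Returns -1 if PATTERN does not compile.  */
static int
count_inline_all_elements (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int elements = 0;
  const char *p = text;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  while (regexec (&re, p, 1, &m, 0) == 0)
    {
      elements += 1 + (int) re.re_nsub;
      if (m.rm_eo > m.rm_so)
        p += m.rm_eo;           /* Continue after a non-empty match.  */
      else if (p[m.rm_eo] == '\0')
        break;                  /* Empty match at end of text.  */
      else
        p += m.rm_eo + 1;       /* Step past an empty match.  */
    }

  regfree (&re);
  return elements;
}
```

Under that assumption, a pattern with one capturing group yields two elements for a single occurrence of .COND_LEN_ADD, while a pattern without capturing groups yields one.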
 
Juzhe, something like the attached works for me.
 
Regards
Robin
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe642a0..7d26dbedc5e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -52,8 +52,8 @@ main (void)
   return 0;
}
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target 
vect_double_cond_arith } } } */
/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
vect_double_cond_arith } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
index ec3d9db4202..f7daa13685c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
@@ -54,8 +54,8 @@ main (void)
   return 0;
}
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
index 2aeebd44f83..a80c30a50b2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
@@ -56,8 +56,8 @@ main (void)
}
/* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { 
target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target 
vect_double_cond_arith } } } */
/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
vect_double_cond_arith } } } */

Re: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai

>> OK.  
Thanks.  Committed.

>> Note load/store-lanes is specifically pre-empting SLP if all
>> loads/stores of an SLP instance can support that.  Not sure if this
>> heuristic is good for load/store lanes with high stride?

Yeah, I understand your concern. 
I am not sure either.
But the RVV ISA defines lanes load/store for 2 to 8 lanes, and LLVM already supports them.
I think we can fully support them, and then let the RISC-V cost model decide whether 
it is profitable or not.

Also, I found RVV can vectorize a TSVC case with stride = 5 
lane_load/lane_store:

tsvc-s353.c:

-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v 
} } } } */

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632213.html

So, overall, I think it is beneficial to support high-stride lane load/store, 
which can help us vectorize more cases.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 20:41
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for 
RVV
On Mon, 9 Oct 2023, Juzhe-Zhong wrote:
 
> Reference: https://godbolt.org/z/G9jzf5Grh
> 
> RVV is able to vectorize this case using SLP. However, with 
> -fno-vect-cost-model, RVV vectorizes it using vec_load_lanes with stride 6.
 
OK.  Note load/store-lanes is specifically pre-empting SLP if all
loads/stores of an SLP instance can support that.  Not sure if this
heuristic is good for load/store lanes with high stride?
 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c 
> b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> index 7c7acd5bab6..96751faae7f 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> @@ -18,4 +18,4 @@ foo (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { ! vect_strided6 } } } } */
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen

2023-10-09 Thread juzhe.zh...@rivai.ai

LGTM now.

Thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-09 21:09
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen
From: Pan Li 
 
Update in v2
 
* Remove emit helper functions.
* Take expand_binop instead.
 
Original log:
 
This patch would like to refine the code gen for the bswap16.
 
We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It generates about 9 instructions in the
loop, as below, no matter whether it is bswap16, bswap32 or bswap64.
 
  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub a4,a4,a5
7 vse16.v v1,0(a3)
8 add a0,a0,a2
9 add a3,a3,a2
  bne a4,zero,.L2
 
But for bswap16 we can have an even simpler code gen, which
has only 7 instructions in the loop, as below.
 
  .L5:
1 vle8.v  v2,0(a5)
2 addi a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi a4,a4,32
  bne a5,a6,.L5
 
Unfortunately, this approach makes the instruction count in the loop grow to
13 and 24 for bswap32 and bswap64. Thus, we refine the code
gen for bswap16 only, and leave both bswap32 and bswap64
as is.
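
The byte-swap shuffle that the new pattern looks for is a permutation that reverses the elements inside each group of step consecutive elements, for example {1, 0, 3, 2, ...} for bswap16 on byte elements. A minimal C sketch of that check over a plain index array is given below; it is a simplified stand-in for the series-based test in the patch, and the function name and array representation are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if PERM (N indices) reverses every group of STEP
   consecutive elements, e.g. {1,0,3,2} for STEP == 2 or
   {3,2,1,0,7,6,5,4} for STEP == 4.  */
static bool
is_bswap_selector (const unsigned *perm, size_t n, unsigned step)
{
  if (step < 2 || n % step != 0)
    return false;

  for (size_t j = 0; j < n; j++)
    {
      size_t group_base = j / step * step;  /* First index of the group.  */
      size_t i = j % step;                  /* Position inside the group.  */
      if (perm[j] != group_base + (step - 1 - i))
        return false;
    }
  return true;
}
```

With byte elements, step 2 corresponds to bswap16, the only case the patch then expands with shifts and an or.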
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
for shuffle bswap.
(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 91 +++
.../riscv/rvv/autovec/unop/bswap16-0.c| 17 
.../riscv/rvv/autovec/unop/bswap16-run-0.c| 44 +
.../riscv/rvv/autovec/vls/bswap16-0.c | 34 +++
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c |  4 +-
5 files changed, 188 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..c72e411f125 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3030,6 +3030,95 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
}
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE (d->vmode);
+
+  switch (size)
+{
+case 16:
+  break;
+case 32:
+case 64:
+  /* We will have VEC_PERM_EXPR after rtl expand when invoking
+ __builtin_bswap. It will generate about 9 instructions in
+ loop as below, no matter it is bswap16, bswap32 or bswap64.
+.L2:
+ 1 vle16.v v4,0(a0)
+ 2 vmv.v.x v2,a7
+ 3 vand.vv v2,v6,v2
+ 4 slli a2,a5,1
+ 5 vrgatherei16.vv v1,v4,v2
+ 6 sub a4,a4,a5
+ 7 vse16.v v1,0(a3)
+ 8 add a0,a0,a2
+ 9 add a3,a3,a2
+bne a4,zero,.L2
+
+ But for bswap16 we may have a even simple code gen, which
+ has only 7 instructions in loop as below.
+.L5
+ 1 vle8.v  v2,0(a5)
+ 2 addi a5,a5,32
+ 3 vsrl.vi v4,v2,8
+ 4 vsll.vi v2,v2,8
+ 5 vor.vv  v4,v4,v2
+ 6 vse8.v  v4,0(a4)
+ 7 addi a4,a4,32
+bne a5,a6,.L5
+
+ Unfortunately, the instructions in loop will grow to 13 and 24
+ for bswap32 and bswap64. Thus, we will leverage vrgather (9 insn)
+ for both the bswap64 and bswap32, but take shift and or (7 insn)
+ for bswap16.
+   */
+default:
+  return false;
+}
+
+  for (i = 0; i < step; i++)
+if (!d->perm.series_p (i, step, diff - i, step))
+  return false;
+
+  if (d->testing_p)
+return true;
+
+  machine_mode vhi_mode;
+  poly_uint64 vhi_nunits = exact_div (GET_MODE_NUNITS (d->vmode), 2);
+
+  if (!get_vector_mode (HImode, vhi_nunits).exists (&vhi_mode))
+return false;
+
+  /* Step-1: Move op0 to src with VHI mode.  */
+  rtx src = gen_reg_rtx (vhi_mode);
+  emit_move_insn (src, gen_lowpart (vhi_mode, d->op0));
+
+  /* Step-2: Shift right 8 bits to dest.  */
+  rtx dest = expand_binop (vhi_mode, lshr_optab, src, gen_int_mode (8, Pmode),
+NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-3: Shift left 8 bits to src.  */
+  src = expand_binop (vhi_mode, ashl_optab, src, gen_int_mode (8, Pmode),
+   NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-4: Logic Or dest and src to dest.  */
+  dest = expand_binop (vhi_mode, ior_optab, dest, src,
+NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-5: Move src to target with VQI mode.  */
+  emit_move_insn (d->target, gen_lowpart

Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-09 Thread juzhe.zh...@rivai.ai

Oh, I realize this patch brings back a FAIL that I recently fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632247.html 

That test fails because RVV doesn't have vec_pack_trunc_optab: the loop vectorizer 
fails on its first attempt but succeeds on the second, 
so RVV dumps FOLD_EXTRACT_LAST 4 times instead of 2 (ARM SVE dumps it 2 times 
because it has vec_pack_trunc_optab).

I think the root cause of RVV failing multiple "vect" tests is that we 
don't enable the vec_pack/vec_unpack/... patterns, 
yet we still succeed at vectorization and want to enable those tests. 
(Mostly we just vectorize with a different approach, which causes dump FAILs, because 
of some changes I made previously in the middle-end.)

So enabling "vec_pack" for RVV will fix some FAILs but introduce some other 
FAILs.

CC to Richi to see more reasonable suggestions.



juzhe.zh...@rivai.ai
 
From: Maciej W. Rozycki
Date: 2023-10-10 06:38
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: 回复: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
On Tue, 10 Oct 2023, 钟居哲 wrote:
 
> Btw, could you rebase to the trunk and run regression again?
 
Full regression-testing takes roughly 40 hours here and I do not normally
update the tree midway through my work so as not to add variables and end 
up chasing a moving target, especially with such an unstable state that we 
have ended up with recently with the RISC-V port.  Since I'm done with 
this part I can refresh and schedule another run if you are curious as to 
how it looks like from my side.  For the C subset alone it'll take less.
 
  Maciej

Re: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-10 Thread juzhe.zh...@rivai.ai

Great! I am going to wait for Richi's approval.



juzhe.zh...@rivai.ai
 
From: Andrew Stubbs
Date: 2023-10-10 17:40
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV
On 10/10/2023 02:39, Juzhe-Zhong wrote:
> Here is the reference comparing dump IR between ARM SVE and RVV.
> 
> https://godbolt.org/z/zqess8Gss
> 
> We can see RVV has one more dump IR:
> optimized: basic block part vectorized using 128 byte vectors
> since RVV has 1024 bit vectors.
> 
> The codegen is reasonably good.
> 
> However, I see that GCN also has 1024-bit vectors.
> Might this patch cause this case to FAIL on the GCN port?
> 
> Hi, GCN folks, could you check this patch on the GCN port for me?
 
This patch *fixes* an existing test fail on GCN. :)
 
It's probably one of the many I've never had time to analyze (and 
optimizing more than expected makes it low priority).
 
LGTM
 
Andrew

Re: [PATCH v2 0/4] RISC-V target attribute

2023-10-10 Thread juzhe.zh...@rivai.ai

LGTM on my side.
IMHO, we need to support attribute (rvv_vector_bits) which depend on this 
patch, am I right?

If yes, will you support this feature in GCC-14 release?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-10-10 12:13
To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong
Subject: [PATCH v2 0/4] RISC-V target attribute
This patch set implements the target attribute for the RISC-V target. Similar 
to other targets such as x86 or ARM, it lets the user apply some settings per 
function without changing the global settings.
 
We support arch, tune and cpu first, and will support other target attributes 
later. This version DOES NOT include multi-version function support yet; that 
is future work, probably for GCC 15.
 
The full proposal is in the RISC-V C-API document[1] and has been discussed with 
the RISC-V LLVM community, so we have consistent syntax and semantics. 
 
[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
 
v2 changelog:
- Resolve awk multi-dimensional issue.
- Tweak code format
- Tweak testcases

Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-10 Thread juzhe.zh...@rivai.ai

It's weird. Could you give me the FAILs report?

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-10 18:18
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: 回复: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
On Mon, 9 Oct 2023, Maciej W. Rozycki wrote:

> > Btw, could you rebase to the trunk and run regression again?
> 
>  Full regression-testing takes roughly 40 hours here and I do not normally
> update the tree midway through my work so as not to add variables and end 
> up chasing a moving target, especially with such an unstable state that we 
> have ended up with recently with the RISC-V port.  Since I'm done with 
> this part I can refresh and schedule another run if you are curious as to 
> how it looks like from my side.  For the C subset alone it'll take less.

After 10 hours I have now got:

=== gcc Summary ===

# of expected passes 194576
# of unexpected failures 600
# of unexpected successes 11
# of expected failures 1631
# of unresolved testcases 120
# of unsupported tests 3828

as at commit cc5033721553 ("Fixes for profile count/probability 
maintenance"), which is slightly better, but still far from your 92 FAILs.  
NB I ran this testing with `--param=riscv-autovec-preference=scalable'; I 
guess I could have mentioned it.

  Maciej

Re: Re: [PATCH] RISC-V: Enable full coverage vect tests

2023-10-11 Thread juzhe.zh...@rivai.ai

Thanks. Committed.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-11 14:54
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enable full coverage vect tests
Hi Juzhe,

seems OK to me.  We don't support most of the patterns directly
but as we can and want to vectorize them it makes sense to enable
the tests.

Regards
Robin

Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-11 Thread juzhe.zh...@rivai.ai

Hi, Maciej.

I have enabled all vectorization tests on RVV; the patch is committed:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632598.html 

But I have guarded every test with:
+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v])
As you said, you think we don't need to add check_effective_target_riscv_v 
every time.

So, feel free to adjust it (remove check_effective_target_riscv_v) and send a 
patch. 
But I hope you can adjust each set of tests carefully to keep everything 
consistent.

Thanks.


juzhe.zh...@rivai.ai
 
From: Maciej W. Rozycki
Date: 2023-10-11 05:35
To: juzhe.zhong
CC: gcc-patches; jeffreyalaw; Robin Dapp; Kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
On Tue, 10 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> It's weird. Could you give me the FAILs report?
 
I keep forgetting that I have a piece of code in my board description 
files that makes the testsuite leave output files in place, which helps 
much when debugging failures (although it's not a perfect solution for 
test cases like those verified at different optimisation levels where the 
output filename is reused and consequently subsequent outputs overwrite 
earlier ones; something to improve perhaps).  Unfortunately the presence 
of output files confuses some test cases and makes them fail; arguably a 
test case bug.  None of the offending test cases are directly related to 
RISC-V development, so I just ignore the presence of these failures and 
only focus on regressions and progressions between testsuite runs.
 
Here are fresh results with the testsuite output tree made tidy:
 
=== gcc Summary ===
 
# of expected passes 194602
# of unexpected failures 145
# of unexpected successes 11
# of expected failures 1631
# of unresolved testcases 120
# of unsupported tests 3828
 
It probably makes no sense to clutter the mailing list with my FAIL and 
UNRESOLVED results; I can send them off-list if you find them useful.
 
  Maciej

Re: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization

2023-10-11 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-11 16:49
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization
From: Pan Li 
 
This patch would like to support the FP lrint/lrintf auto vectorization.
 
* long lrint (double) for rv64
* long lrintf (float) for rv32
 
Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lrintmn2 only acts on DF => DI for
rv64, and SF => SI for rv32.
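
To make the same-size restriction concrete, here is a small sketch that
reports, for whatever target it is compiled for, which of the two scalar
conversions keeps the element width unchanged. The macro and function names
are made up for this illustration and are not part of the patch.

```c
#include <stddef.h>

/* Nonzero when source and destination elements have the same size,
   which is the case the message says the standard pattern name handles.  */
#define SAME_WIDTH(from, to) (sizeof (from) == sizeof (to))

/* lrint: double -> long.  Same width on LP64 (rv64), not on ILP32.  */
int
lrint_is_same_width (void)
{
  return SAME_WIDTH (double, long);
}

/* lrintf: float -> long.  Same width on ILP32 (rv32), not on LP64.  */
int
lrintf_is_same_width (void)
{
  return SAME_WIDTH (float, long);
}
```

On a given ABI at most one of the two returns nonzero, which matches the
DF => DI for rv64 and SF => SI for rv32 split described above.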
 
Given we have code like:
 
void
test_lrint (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrint (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld  fa5,0(a1)
  fcvt.l.d a5,fa5,dyn
  sd   a5,-8(a0)
  ...
  bne  a1,a4,.L3
 
After this patch:
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
 
The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will
be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lrint2): New pattern
for lrint/lrintf.
* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl
for expanding lrint.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl
for vfcvt.x.f.v.
(expand_vec_lrint): New function impl for expanding lrint.
* config/riscv/vector-iterators.md: New mode attr and iterator.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for
CVT like test case.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 +++
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 20 ++
gcc/config/riscv/vector-iterators.md  | 69 +++
.../riscv/rvv/autovec/unop/math-lrint-0.c | 14 
.../riscv/rvv/autovec/unop/math-lrint-1.c | 14 
.../riscv/rvv/autovec/unop/math-lrint-run-0.c | 63 +
.../riscv/rvv/autovec/unop/math-lrint-run-1.c | 63 +
.../riscv/rvv/autovec/unop/test-math.h| 24 +++
.../gcc.target/riscv/rvv/autovec/vls/def.h|  9 +++
.../riscv/rvv/autovec/vls/math-lrint-0.c  | 30 
.../riscv/rvv/autovec/vls/math-lrint-1.c  | 30 
12 files changed, 348 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 53e9d34eea1..dc76a01d82c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2239,6 +2239,7 @@ (define_expand "avg3_ceil"
;; - round/roundf
;; - trunc/truncf
;; - roundeven/roundevenf
+;; - lrint/lrintf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2309,3 +2310,13 @@ (define_expand "roundeven2"
 DONE;
   }
)
+
+(define_expand "lrint2"
+  [(match_operand: 0 "register_operand")
+   (match_operand:V_VLS_FCONVERTL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 43426a5326b..f6bd15b47b0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -474,6 +474,7 @@ void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c72e411f125..64f99d85d91 100644
---

Re: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread juzhe.zh...@rivai.ai

Refined the code in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632619.html 



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-11 17:03
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter
I just realized I made a mistake that, luckily, had not been exposed.
 
https://godbolt.org/z/c3jzrh7or
 
GCC is using 32 bit index offset:
 
vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v  v1,(a1),v1
 
This is wrong since v1 may overflow 32 bits after vsll.vi.
 
After this patch:
 
vsext.vf2 v8,v4
vsll.vi v8,v8,2
vluxei64.v v8,(a1),v8
 
Same as Clang.
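
The overflow can be reproduced with plain integer arithmetic. The sketch
below models the two sequences for a single element: shifting inside 32 bits
versus widening to 64 bits before scaling by 4. The function names are
invented for this example, and it only models the arithmetic, not the actual
RVV instructions.

```c
#include <stdint.h>

/* Scale the index by 4 while staying in 32 bits, like vsll.vi on
   e32 elements: the high bits are silently dropped.  */
uint32_t
byte_offset_narrow (int32_t index)
{
  return (uint32_t) index << 2;
}

/* Sign-extend to 64 bits first (like vsext.vf2), then scale:
   the full byte offset is kept.  */
int64_t
byte_offset_wide (int32_t index)
{
  return (int64_t) index * 4;
}
```

For a small index such as 5 both return 20. For index = 1 << 30 the narrow
version wraps to 0 while the wide version yields 4294967296, so the 32-bit
form would address the wrong element.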
 
Regression passed. Ok for trunk ?
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: Fix offset bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix offset bug.
(gather_scatter_valid_offset_p): New function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.
 
---
gcc/config/riscv/autovec.md   | 28 +--
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 16 +--
.../autovec/gather-scatter/offset_extend-1.c  | 14 ++
4 files changed, 42 insertions(+), 17 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 41bff3a318f..07607bff71e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -59,7 +59,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -74,7 +74,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -89,7 +89,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -104,7 +104,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -119,7 +119,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -134,7 +134,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -153,7 +153,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -172,7 +172,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -187,7 +187,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands,

Re: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread juzhe.zh...@rivai.ai

Oh. Yes.

Addressed the comment:
V3: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632623.html 

Use if (inner_offsize < BITS_PER_WORD)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-11 17:50
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter
Hi Juzhe,
 
good that you noticed it now; I should have caught that
in the review back then...
 
One thing, though:
 
> +  if (inner_offsize < GET_MODE_BITSIZE (GET_MODE (ptr)).to_constant ())
 
Shouldn't ptr always be Pmode i.e. the bitsize == XLEN?
 
Rest LGTM.
 
Regards
Robin

RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread juzhe.zh...@rivai.ai

../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector 
Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in 
Functions' differ
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in 
Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector 
Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in 
sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in 
Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector 
Intrinsics' differ
../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' 
lacks menu item for `CORE-V Built-in Functions' despite being its Up target
../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in 
Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector 
Intrinsics' differ
In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to hold 
all values of ‘enum rtx_code’
 #define RTX_CODE_BITSIZE 8
  ^
../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro ‘RTX_CODE_BITSIZE’
   ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
 ^~~~

make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
make[2]: *** Waiting for unfinished jobs
rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod cpp.pod 
gcov.pod lto-dump.pod
make[2]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
make[1]: *** [Makefile:4648: all-gcc] Error 2
make[1]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2



juzhe.zh...@rivai.ai

Re: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread juzhe.zh...@rivai.ai

Please revert it. It blocks development on all targets.



juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2023-10-12 09:03
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; Kito.cheng; kito.cheng; Robin Dapp
Subject: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions
On Wed, Oct 11, 2023 at 6:01 PM juzhe.zh...@rivai.ai
 wrote:
>
> ../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector 
> Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX 
> Built-in Functions' differ
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in 
> Functions' is next for `CORE-V Built-in Functions' in menu but not in 
> sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector 
> Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in 
> sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in 
> Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector 
> Intrinsics' differ
> ../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' 
> lacks menu item for `CORE-V Built-in Functions' despite being its Up target
> ../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in 
> Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V 
> Vector Intrinsics' differ
> In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
> ../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to 
> hold all values of ‘enum rtx_code’
>  #define RTX_CODE_BITSIZE 8
>   ^
> ../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro 
> ‘RTX_CODE_BITSIZE’
>ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
>  ^~~~
>
> make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
> make[2]: *** Waiting for unfinished jobs
> rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod 
> cpp.pod gcov.pod lto-dump.pod
> make[2]: Leaving directory 
> '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
> make[1]: *** [Makefile:4648: all-gcc] Error 2
> make[1]: Leaving directory 
> '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
> make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2
 
This is also recorded as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 . It breaks more
than just RISCV; it depends on the version of texinfo that is
installed too.
 
Thanks,
Andrew
 
>
> 
> juzhe.zh...@rivai.ai

Re: [PATCH v1] RISC-V: Support FP irintf auto vectorization

2023-10-11 Thread juzhe.zh...@rivai.ai

LGTM. Thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-12 09:52
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP irintf auto vectorization
From: Pan Li 
 
This patch would like to support the FP irintf auto vectorization.
 
* int irintf (float)
 
Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lrintmn2 only acts on SF => SI.
 
Given we have code like:
 
void
test_irintf (int *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_irintf (in[i]);
}
 
Before this patch:
.L3:
  ...
  flw  fa5,0(a1)
  fcvt.w.s a5,fa5,dyn
  sw   a5,-4(a0)
  ...
  bne  a1,a4,.L3
 
After this patch:
.L3:
  ...
  vle32.v v1,0(a1)
  vfcvt.x.f.v v1,v1
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
 
The remaining cases, such as DF => SI and HF => SI, will be covered by the hook
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lrint2): Rename from.
(lrint2): Rename to.
* config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  9 ++-
gcc/config/riscv/vector-iterators.md  | 74 +--
.../riscv/rvv/autovec/unop/math-irint-0.c | 14 
.../riscv/rvv/autovec/unop/math-irint-run-0.c | 63 
.../riscv/rvv/autovec/vls/math-irint-0.c  | 30 
5 files changed, 149 insertions(+), 41 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-irint-0.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index dc76a01d82c..c3a51e22ceb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2240,6 +2240,7 @@ (define_expand "avg3_ceil"
;; - trunc/truncf
;; - roundeven/roundevenf
;; - lrint/lrintf
+;; - irintf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2311,12 +2312,12 @@ (define_expand "roundeven2"
   }
)
-(define_expand "lrint2"
-  [(match_operand: 0 "register_operand")
-   (match_operand:V_VLS_FCONVERTL 1 "register_operand")]
+(define_expand "lrint2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
   "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
-riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
   }
)
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index bb0c46ea30a..96ddd34c958 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3281,8 +3281,8 @@ (define_mode_attr vnnconvert [
   (V512DI "v512hf")
])
-;; L indicates convert to long
-(define_mode_attr VLCONVERT [
+;; Convert to int, long and long long
+(define_mode_attr V_I_L_LL_CONVERT [
   (RVVM8SF "RVVM8SI") (RVVM4SF "RVVM4SI") (RVVM2SF "RVVM2SI")
   (RVVM1SF "RVVM1SI") (RVVMF2SF "RVVMF2SI")
@@ -3298,7 +3298,7 @@ (define_mode_attr VLCONVERT [
   (V512DF "V512DI")
])
-(define_mode_attr vlconvert [
+(define_mode_attr v_i_l_ll_convert [
   (RVVM8SF "rvvm8si") (RVVM4SF "rvvm4si") (RVVM2SF "rvvm2si")
   (RVVM1SF "rvvm1si") (RVVMF2SF "rvvmf2si")
@@ -3314,40 +3314,40 @@ (define_mode_attr vlconvert [
   (V512DF "v512di")
])
-(define_mode_iterator V_VLS_FCONVERTL [
-  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVM4SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVM2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT && TARGET_MIN_VLEN > 
32")
-
-  (RVVM8DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-  (RVVM4DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-  (RVVM1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-
-  (V1SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 &

Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization

2023-10-11 Thread juzhe.zh...@rivai.ai

LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-12 11:28
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP llrint auto vectorization
From: Pan Li 
 
This patch would like to support the FP llrint auto vectorization.
 
* long long llrint (double)
 
This will be the CVT from DF => DI from the standard name's perspective,
which has been covered in previous PATCH(es). Thus, this patch only adds
some test cases.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-llrint-0.c| 14 +
.../rvv/autovec/unop/math-llrint-run-0.c  | 63 +++
.../riscv/rvv/autovec/unop/test-math.h|  2 +
.../riscv/rvv/autovec/vls/math-llrint-0.c | 30 +
4 files changed, 109 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
new file mode 100644
index 000..2d90d232ba1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_int64_t___builtin_llrint:
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+*/
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
new file mode 100644
index 000..6b69f5568e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
@@ -0,0 +1,63 @@
+/* { dg-do run { target { riscv_v && rv64 } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+int64_t out[ARRAY_SIZE];
+int64_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint)
+TEST_ASSERT (int64_t)
+
+TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llrint (1.2), 1)
+TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llrint (-1.2), 2)
+TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llrint (0.5), 3)
+TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llrint (-0.5), 4)
+TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llrint (0.1), 5)
+TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llrint (-0.1), 6)
+TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llrint (3.0), 7)
+TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llrint (-3.0), 8)
+TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llrint 
(4503599627370495.5), 9)
+TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llrint 
(4503599627370497.0), 10)
+TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llrint 
(-4503599627370495.5), 11)
+TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llrint 
(-4503599627370496.0), 12)
+TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llrint (-0.0), 13)
+TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llrint (-0.0), 14)
+TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llrint 
(9223372036854774784.0), 15)
+TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, __builtin_llrint 
(9223372036854775808.0), 16)
+TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llrint 
(-9223372036854775808.0), 17)
+TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, __builtin_llrint 
(-9223372036854777856.0), 18)
+TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llrint 
(__builtin_inf ()), 19)
+TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llrint 
(-__builtin_inf ()), 20)
+TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (double, int64_t, 1, __builtin_llrint, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 2, __builtin_llrint, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 3, __builtin_llrint, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 4, __builtin_llrint, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t

Re: [PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization

2023-10-12 Thread juzhe.zh...@rivai.ai

OK




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-12 16:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization
From: Pan Li 
 
This patch would like to support the FP lround/lroundf auto vectorization.
 
* long lround (double) for rv64
* long lroundf (float) for rv32
 
Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lroundmn2 only acts on DF => DI for
rv64, and SF => SI for rv32.
 
Given we have code like:
 
void
test_lround (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lround (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld  fa5,0(a1)
  fcvt.l.d a5,fa5,rmm
  sd   a5,-8(a0)
  ...
  bne  a1,a4,.L3
 
After this patch:
  frrm a6
  ...
  fsrmi4 // RMM
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrm a6
 
The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will
be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
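
The difference between this patch and the lrint one is the rounding mode:
lround uses RMM (round to nearest, ties away from zero), while lrint uses the
dynamic mode, which is round to nearest even by default. The sketch below
emulates the two tie-breaking rules in portable C so the difference can be
seen on a few inputs. It is not how GCC or the hardware implements them; the
function names are hypothetical, and it assumes inputs whose truncated value
fits in a long.

```c
/* Round to nearest, ties away from zero (the lround / RMM behaviour).  */
long
round_ties_away (double x)
{
  long t = (long) x;             /* truncate toward zero */
  double frac = x - (double) t;  /* fractional part, same sign as x */
  if (frac >= 0.5)
    return t + 1;
  if (frac <= -0.5)
    return t - 1;
  return t;
}

/* Round to nearest, ties to even (lrint under the default rounding mode).  */
long
round_ties_even (double x)
{
  long t = (long) x;
  double frac = x - (double) t;
  int odd = (t % 2) != 0;        /* true for odd t, including negatives */
  if (frac > 0.5 || (frac == 0.5 && odd))
    return t + 1;
  if (frac < -0.5 || (frac == -0.5 && odd))
    return t - 1;
  return t;
}
```

For 2.5 the first helper gives 3 and the second gives 2; for -0.5 they give
-1 and 0 respectively. Inputs that are not exactly halfway, such as 1.2,
round the same way in both.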
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lround2): New
pattern for lround/lroundf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lround): New func decl for expanding lround.
* config/riscv/riscv-v.cc (expand_vec_lround): New func impl
for expanding lround.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 10 +++
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 10 +++
.../riscv/rvv/autovec/unop/math-lround-0.c| 19 +
.../riscv/rvv/autovec/unop/math-lround-1.c| 19 +
.../rvv/autovec/unop/math-lround-run-0.c  | 72 +++
.../rvv/autovec/unop/math-lround-run-1.c  | 72 +++
.../riscv/rvv/autovec/vls/math-lround-0.c | 30 
.../riscv/rvv/autovec/vls/math-lround-1.c | 30 
9 files changed, 264 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ebc51ea69fd..33b11723c21 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2321,3 +2321,13 @@ (define_expand "lrint2"
 DONE;
   }
)
+
+(define_expand "lround2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8c9f7e0ab11..b7eeeb8f55d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+  UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -475,6 +476,7 @@ void expand_vec_round (rtx, rtx, machine_mode, 
machine_mode);
void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a75eb59eb43..b61c745678b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4122,4 +4122,14 @@ expand_vec_lrint (rtx op_0, rtx op_1, machine_mode 
vec_fp_mode,
   emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_DYN, vec_fp_mode);
}
+v

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai

In tree-vect-slp.cc:
vect_get_and_check_slp_defs
711: 

  tree type = TREE_TYPE (oprnd);
  dt = dts[i];
  if ((dt == vect_constant_def
   || dt == vect_external_def)
  && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
  && (TREE_CODE (type) == BOOLEAN_TYPE
  || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
  type)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: invalid type of def "
 "for variable-length SLP %T\n", oprnd);
  return -1;
}

Here the mask = -1 built in tree-vect-patterns.cc has BOOLEAN type, so it 
reaches this condition and SLP fails:
Build SLP failed: invalid type of def
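
To make the rejection easier to see, here is a minimal sketch of the quoted
condition as a plain C predicate. It is only a model, not the GCC code: the
enum, the function name and the boolean parameters are hypothetical stand-ins
for dt, GET_MODE_SIZE (...).is_constant (), TREE_CODE (type) == BOOLEAN_TYPE
and can_duplicate_and_interleave_p.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vectorizer's definition kinds.  */
enum def_kind { DEF_INTERNAL, DEF_EXTERNAL, DEF_CONSTANT };

/* Simplified model of the quoted check.  Returns true when the SLP build
   would be rejected for this operand.
   variable_length:    the vector mode size is not a compile-time constant.
   is_boolean:         the operand has BOOLEAN_TYPE.
   can_dup_interleave: models can_duplicate_and_interleave_p.  */
static bool
slp_rejects_operand (enum def_kind dt, bool variable_length,
                     bool is_boolean, bool can_dup_interleave)
{
  return (dt == DEF_CONSTANT || dt == DEF_EXTERNAL)
         && variable_length
         && (is_boolean || !can_dup_interleave);
}
```

With a variable-length mode such as RVV, a constant boolean mask is rejected
by this model even when can_dup_interleave is true, which matches the failure
described above.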




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:44
To: 钟居哲
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, ??? wrote:
 
> Thanks Richi point it out.
> 
> I found this patch can't make conditional gather load succeed on SLP.
> 
> I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> 
> If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD 
> (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow 
> naturally.
> 
> If has condition mask, in tree-vect-patterns.cc,  I build 
> MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP 
> flow naturally.
> 
> Is it reasonable ?
 
What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
even when the mask is -1?
 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-11 20:50
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes this following FAILs in RISC-V regression:
> > 
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > 
> > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > 
> > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > tree-vect-patterns.cc if it is same
> > situation as GATHER_LOAD (no conditional mask).
> > 
> > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > argument is a dummy mask.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_operand_map):
> > (vect_build_slp_tree_1):
> > (vect_build_slp_tree_2):
> > * tree-vect-stmts.cc (vectorizable_load):
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 18 --
> >  gcc/tree-vect-stmts.cc |  4 ++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index fa098f9ff4e..712c04ec278 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned 
> > char swap = 0)
> >case IFN_MASK_GATHER_LOAD:
> >  return arg1_arg4_map;
> >  
> > +   case IFN_MASK_LEN_GATHER_LOAD:
> > + /* In tree-vect-patterns.cc, we will have these 2 situations:
> > +
> > + - Unconditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > +
> > + - Conditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > +   : nullptr;
> > +
> >case IFN_MASK_STORE:
> >  return arg3_arg2_map;
> >  
> > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> > *swap,
> >  
> >if (cfn == CFN_MASK_LOAD
> >|| cfn == CFN_GATHER_LOAD
> > -   || cfn == CFN_MASK_GATHER_LOAD)
&g

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai

Hi, Richi.

I restricted the check to vect_external_def, as you suggested.

Then this condition made SLP fail:

-  if (mask_index >= 0
+  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
  &mask, NULL, &mask_dt, &mask_vectype))
return false;

So I added 'internal_fn_len_index (ifn) < 0' so that MASK_LEN_GATHER_LOAD
skips the scalar mask check.

Then it ICEs here:

vect_slp_analyze_node_operations
if (child
  && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
  || SLP_TREE_DEF_TYPE (child) == vect_external_def)
  /* Perform usual caching, note code-generation still
 code-gens these nodes multiple times but we expect
 to CSE them later.  */
  && !visited_set.add (child))
{
  visited_vec.safe_push (child);
  /* ???  After auditing more code paths make a "default"
 and push the vector type from NODE to all children
 if it is not already set.  */
  /* Compute the number of vectors to be generated.  */
  tree vector_type = SLP_TREE_VECTYPE (child);
  if (!vector_type)
{
  /* For shifts with a scalar argument we don't need
 to cost or code-generate anything.
 ???  Represent this more explicitely.  */
  gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))
   == shift_vec_info_type)
  && j == 1);   /* <--- this assert failed */
  continue;
}

Could you help me with that?


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> condition, then SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow 
> > naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP 
> > flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai

Oh. I see.

This is what makes vect_constant_def fail to SLP:

tree-vect-slp.cc:
vect_build_slp_tree_2
line 2354:

  if (oprnd_info->first_dt == vect_external_def
  || oprnd_info->first_dt == vect_constant_def)
{
  slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
  SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
  oprnd_info->ops = vNULL;
  children.safe_push (invnode);
  continue;
}

It seems that we handle vect_constant_def the same as vect_external_def.
Is that why SLP fails?



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> condition, then SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow 
> > naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP 
> > flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on 
> > > MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > > argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> >

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai

In tree-vect-stmts.cc

vect_check_scalar_mask

Failed here:

  /* If the caller is not prepared for adjusting an external/constant
 SLP mask vector type fail.  */
  if (slp_node
  && !mask_node
  && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "SLP mask argument is not vectorized.\n");
  return false;
}

If we allow vect_constant_def, should we adjust the constant SLP mask in the
caller, vectorizable_load?

I don't know how to adjust that, though.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> condition, then SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow 
> > naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP 
> > flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on 
> > > MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > > argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> &g

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai

The mask node is NULL because the caller:

  if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
  &mask, NULL, &mask_dt, &mask_vectype))
return false;

passes NULL for mask_node.
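
A rough model of that interaction, as a sketch only: the types and names
below are hypothetical, and the function mirrors just the early-out quoted
from vect_check_scalar_mask, not the rest of the real function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SLP child definition kinds.  */
enum mask_def_kind { MASK_DEF_INTERNAL, MASK_DEF_EXTERNAL, MASK_DEF_CONSTANT };

struct slp_node_model { enum mask_def_kind def_type; };

/* Models the early-out: with an SLP node present, a caller that passes
   NULL for mask_node_out is "not prepared" to adjust an external or
   constant mask, so only an internal (vectorized) mask is accepted.  */
static bool
check_mask_model (bool have_slp_node, struct slp_node_model *mask_child,
                  struct slp_node_model **mask_node_out)
{
  if (have_slp_node
      && mask_node_out == NULL
      && mask_child->def_type != MASK_DEF_INTERNAL)
    return false;   /* "SLP mask argument is not vectorized."  */

  if (mask_node_out != NULL)
    *mask_node_out = mask_child;
  return true;
}
```

In this model a constant mask fails whenever the caller passes NULL, and
succeeds once the caller supplies a place to receive the mask node.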


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>  SLP mask vector type fail.  */
>   if (slp_node
>   && !mask_node
 
^^^
 
where's the mask_node?
 
>   && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "SLP mask argument is not vectorized.\n");
>   return false;
> }
> 
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the 
> caller "vectorizable_load" ?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP 
> > > flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments 
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD 
> > > SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RI

Re: [PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization

2023-10-12 Thread juzhe.zh...@rivai.ai

OK.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization
From: Pan Li 
 
This patch supports auto-vectorization of the FP lfloor/lfloorf functions.
 
* long lfloor (double) for rv64
* long lfloorf (float) for rv32
 
Because the vectorizer only allows conversions between data types of the
same size, the standard name lfloormn2 only acts on DF => DI for rv64, and
SF => SI for rv32.
 
Given we have code like:
 
void
test_lfloor (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lfloor (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.d a5,fa5,rdn
  sd  a5,-8(a0)
  ...
  bne a1,a4,.L3
 
After this patch:
  frrm    a6
  ...
  fsrmi   2 // RDN
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrm    a6
 
The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will
be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
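 
As a scalar reference for what the vectorized loop has to compute, the
sketch below rounds toward negative infinity, which is the effect of the RDN
rounding mode selected before the conversion in the sequence above. The
helper names are hypothetical and not part of the patch, and the code
assumes finite inputs that fit in long.

```c
/* Hypothetical reference helper: floor-to-long without libm.
   Assumes x is finite and representable as long.  */
static long
ref_lfloor (double x)
{
  long r = (long) x;        /* C conversion truncates toward zero.  */
  if ((double) r > x)       /* Negative non-integers need one step down.  */
    r -= 1;
  return r;
}

/* Same shape as test_lfloor, using the reference helper.  */
static void
ref_lfloor_loop (long *out, const double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = ref_lfloor (in[i]);
}
```

Comparing the output of such a loop against the vectorized one is one way a
run test could check the results.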
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lfloor2): New
pattern for lfloor/lfloorf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lfloor): New func decl for expanding lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl
for expanding lfloor.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 +++
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 10 +++
.../riscv/rvv/autovec/unop/math-lfloor-0.c| 19 +
.../riscv/rvv/autovec/unop/math-lfloor-1.c| 19 +
.../rvv/autovec/unop/math-lfloor-run-0.c  | 69 +++
.../rvv/autovec/unop/math-lfloor-run-1.c  | 69 +++
.../riscv/rvv/autovec/vls/math-lfloor-0.c | 30 
.../riscv/rvv/autovec/vls/math-lfloor-1.c | 30 
9 files changed, 259 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 267691a0095..c5b1e52cbf9 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2242,6 +2242,7 @@ (define_expand "avg3_ceil"
;; - lrint/lrintf
;; - irintf
;; - lceil/lceilf
+;; - lfloor/lfloorf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2342,3 +2343,13 @@ (define_expand "lceil2"
 DONE;
   }
)
+
+(define_expand "lfloor2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ab65ab19524..49bdcdf2f93 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -304,6 +304,7 @@ enum insn_type : unsigned int
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
+  UNARY_OP_FRM_RDN = UNARY_OP | FRM_RDN_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -479,6 +480,7 @@ void expand_vec_roundeven (rtx, rtx, machine_mode, 
machine_mode);
void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/r

Re: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases

2023-10-12 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 10:22
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases
From: Pan Li 
 
Use stdint-gcc.h for the int64_t type instead of a local typedef.
Otherwise the typedef may conflict with stdint-gcc.h elsewhere.
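 
A small sketch of the idea, assuming the portable <stdint.h> header rather
than stdint-gcc.h so it builds outside the GCC testsuite. The helper name is
hypothetical. The point is that the header owns the int64_t definition: a
local "typedef long long int64_t;" would clash on targets where the header
defines int64_t as long.

```c
#include <stdint.h>

/* Take int64_t from the header instead of declaring it locally.  */
static int64_t
widen_and_add (int32_t a, int32_t b)
{
  /* Widen before adding so the sum cannot overflow 32 bits.  */
  return (int64_t) a + (int64_t) b;
}
```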
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include
stdint-gcc.h for int types.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t
typedef.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c | 1 +
.../gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c   | 1 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h | 2 --
3 files changed, 2 insertions(+), 2 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
index 2d90d232ba1..4bf125f8cc8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
@@ -2,6 +2,7 @@
/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
/* { dg-final { check-function-bodies "**" "" } } */
+#include <stdint-gcc.h>
#include "test-math.h"
/*
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
index 6b69f5568e9..409175a8dff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
@@ -1,6 +1,7 @@
/* { dg-do run { target { riscv_v && rv64 } } } */
/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+#include <stdint-gcc.h>
#include "test-math.h"
#define ARRAY_SIZE 128
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
index 3867bc50a14..a1c9d55bd48 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
@@ -68,8 +68,6 @@
#define FRM_RMM 4
#define FRM_DYN 7
-typedef long long int64_t;
-
static inline void
set_rm (unsigned rm)
{
-- 
2.34.1

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai

Hi, Richi. 

As you suggested, I kept the MASK_LEN_GATHER_LOAD (..., -1) format and added
SLP support for it in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632846.html 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>  SLP mask vector type fail.  */
>   if (slp_node
>   && !mask_node
 
^^^
 
where's the mask_node?
 
>   && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "SLP mask argument is not vectorized.\n");
>   return false;
> }
> 
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the 
> caller "vectorizable_load" ?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP 
> > > flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments 
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD 
> > > SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-obje

Re: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

2023-10-12 Thread juzhe.zh...@rivai.ai

Thanks. Committed.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-10-13 14:01
To: Juzhe-Zhong
CC: GCC Patches; Jeff Law; Richard Biener
Subject: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV
LGTM 

On Thu, Oct 12, 2023 at 22:45, Juzhe-Zhong wrote:
Like ARM SVE and GCN, add RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr69907.c: Add RVV.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
index b348526b62f..f63b42a271a 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
@@ -22,5 +22,5 @@ void foo(unsigned *p1, unsigned short *p2)
 /* Disable for SVE because for long or variable-length vectors we don't
get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
because there we can vectorize the epilogue using mixed vector sizes.
-   Likewise for AMD GCN.  */
-/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a 
load is not supported" "slp1" { target { { ! aarch64*-*-* } && { ! amdgcn*-*-* 
} } } } } */
+   Likewise for AMD GCN and RVV.  */
+/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a 
load is not supported" "slp1" { target { { ! aarch64*-*-* } && { { ! 
amdgcn*-*-* } && { ! riscv_v } } } } } } */
-- 
2.36.3

Re: [PATCH v1] RISC-V: Add test for FP llround auto vectorization

2023-10-12 Thread juzhe.zh...@rivai.ai

OK




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 14:15
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test for FP llround auto vectorization
From: Pan Li 
 
The FP API below is already supported, because it shares the same
standard name and machine mode.
 
long long llround (double);
 
This patch adds test cases to ensure correctness.
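
As a scalar reference for what these tests expect, here is a minimal sketch of the llround rule (round to nearest, ties away from zero) for in-range inputs. The helper name ref_llround is hypothetical and not part of the patch. Out-of-range and NaN inputs, which the run test also exercises, are rejected here because standard C leaves them unspecified. Ties away from zero corresponds to the RISC-V RMM rounding mode (encoding 4), which is consistent with the "fsrmi 4" in the expected assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not part of the patch:
   round to nearest, ties away from zero, for inputs in [-2^63, 2^63).  */
int64_t
ref_llround (double x)
{
  /* This comparison is also false for NaN and infinities.  */
  assert (x >= -0x1p63 && x < 0x1p63);

  int64_t t = (int64_t) x;        /* Truncates toward zero.  */
  double frac = x - (double) t;   /* Exact: zero once |x| >= 2^52.  */

  if (frac >= 0.5)
    t++;
  else if (frac <= -0.5)
    t--;
  return t;
}
```

For example, ref_llround (0.5) gives 1 and ref_llround (-0.5) gives -1, whereas round-to-nearest-even would give 0 for both.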
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-llround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llround-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-llround-0.c   | 20 ++
.../rvv/autovec/unop/math-llround-run-0.c | 64 +++
.../riscv/rvv/autovec/vls/math-llround-0.c| 30 +
3 files changed, 114 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llround-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c
new file mode 100644
index 000..4f8b4553a91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-0.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+#include "test-math.h"
+
+/*
+** test_double_int64_t___builtin_llround:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+4
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ret
+*/
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llround)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c
new file mode 100644
index 000..c5b60847cc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c
@@ -0,0 +1,64 @@
+/* { dg-do run { target { riscv_v && rv64 } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include 
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+int64_t out[ARRAY_SIZE];
+int64_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llround)
+TEST_ASSERT (int64_t)
+
+TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llround (1.2), 1)
+TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llround (-1.2), 2)
+TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llround (0.5), 3)
+TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llround (-0.5), 4)
+TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llround (0.1), 5)
+TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llround (-0.1), 6)
+TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llround (3.0), 7)
+TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llround (-3.0), 8)
+TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llround 
(4503599627370495.5), 9)
+TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llround 
(4503599627370497.0), 10)
+TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llround 
(-4503599627370495.5), 11)
+TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llround 
(-4503599627370496.0), 12)
+TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llround (-0.0), 13)
+TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llround (-0.0), 14)
+TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llround 
(9223372036854774784.0), 15)
+TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, 0x7fffffffffffffff, 16)
+TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llround 
(-9223372036854775808.0), 17)
+TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, 0x8000000000000000, 18)
+TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llround 
(__builtin_inf ()), 19)
+TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llround 
(-__builtin_inf ()), 20)
+TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fffffffffffffff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (double, int64_t, 1, __builtin_llround, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 2, __builtin_llround, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 3, __builtin_llround, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 4, __builtin_llround, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 5, __builtin_llround, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int6

Re: [PATCH v1] RISC-V: Add test for FP llceil auto vectorization

2023-10-13 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 15:20
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test for FP llceil auto vectorization
From: Pan Li 
 
The FP API below is already supported, because it shares the same
standard name and machine mode.
 
long long llceil (double);
 
This patch adds test cases to ensure correctness.
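
The expected assembly in this series differs mainly in the immediate passed to fsrmi. The sketch below spells out the RISC-V frm encodings and how the three kinds of operation tested in this thread map onto them. The enum and function names are hypothetical and chosen for illustration only; the encodings themselves come from the RISC-V F extension.

```c
#include <assert.h>

/* RISC-V floating-point rounding mode (frm) encodings.  */
enum frm
{
  FRM_RNE = 0,  /* Round to nearest, ties to even.  */
  FRM_RTZ = 1,  /* Round toward zero.  */
  FRM_RDN = 2,  /* Round down (toward -inf).  */
  FRM_RUP = 3,  /* Round up (toward +inf).  */
  FRM_RMM = 4   /* Round to nearest, ties to max magnitude.  */
};

/* Hypothetical tags for the operations covered by these tests.  */
enum round_op { OP_ROUND, OP_CEIL, OP_FLOOR };

/* Return the static rounding mode one would expect fsrmi to set.  */
enum frm
frm_for_op (enum round_op op)
{
  switch (op)
    {
    case OP_ROUND:
      return FRM_RMM;
    case OP_CEIL:
      return FRM_RUP;
    case OP_FLOOR:
      return FRM_RDN;
    }
  assert (!"unknown rounding operation");
  return FRM_RNE;
}
```

So a ceil-style conversion such as llceil is expected to show "fsrmi 3", while the round and floor variants show 4 and 2 respectively.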
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llceil-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-llceil-0.c| 20 ++
.../rvv/autovec/unop/math-llceil-run-0.c  | 64 +++
.../riscv/rvv/autovec/vls/math-llceil-0.c | 30 +
3 files changed, 114 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llceil-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c
new file mode 100644
index 000..3480c3ea91d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+#include "test-math.h"
+
+/*
+** test_double_int64_t___builtin_llceil:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+3
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ret
+*/
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llceil)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c
new file mode 100644
index 000..5ccbe64ffb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c
@@ -0,0 +1,64 @@
+/* { dg-do run { target { riscv_v && rv64 } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include 
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+int64_t out[ARRAY_SIZE];
+int64_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llceil)
+TEST_ASSERT (int64_t)
+
+TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llceil (1.2), 1)
+TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llceil (-1.2), 2)
+TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llceil (0.5), 3)
+TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llceil (-0.5), 4)
+TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llceil (0.1), 5)
+TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llceil (-0.1), 6)
+TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llceil (3.0), 7)
+TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llceil (-3.0), 8)
+TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llceil 
(4503599627370495.5), 9)
+TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llceil 
(4503599627370497.0), 10)
+TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llceil 
(-4503599627370495.5), 11)
+TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llceil 
(-4503599627370496.0), 12)
+TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llceil (-0.0), 13)
+TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llceil (-0.0), 14)
+TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llceil 
(9223372036854774784.0), 15)
+TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, 0x7fffffffffffffff, 16)
+TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llceil 
(-9223372036854775808.0), 17)
+TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, 0x8000000000000000, 18)
+TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llceil 
(__builtin_inf ()), 19)
+TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llceil 
(-__builtin_inf ()), 20)
+TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fffffffffffffff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (double, int64_t, 1, __builtin_llceil, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 2, __builtin_llceil, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 3, __builtin_llceil, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 4, __builtin_llceil, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 5, __builtin_llceil, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 6, __builtin_llceil, in, out, ref, 
AR

Re: [PATCH v1] RISC-V: Add test for FP iceil auto vectorization

2023-10-13 Thread juzhe.zh...@rivai.ai

Ok



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 16:06
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test for FP iceil auto vectorization
From: Pan Li 
 
The FP API below is already supported, because it shares the same
standard name and machine mode.
 
int iceil (float);
 
This patch adds test cases to ensure correctness.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-iceil-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-iceil-0.c | 19 ++
.../riscv/rvv/autovec/unop/math-iceil-run-0.c | 63 +++
.../riscv/rvv/autovec/vls/math-iceil-0.c  | 30 +
3 files changed, 112 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-iceil-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c
new file mode 100644
index 000..2d4a1d163d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_int___builtin_iceilf:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+3
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ret
+*/
+TEST_UNARY_CALL_CVT (float, int, __builtin_iceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c
new file mode 100644
index 000..714173a7f8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c
@@ -0,0 +1,63 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+int out[ARRAY_SIZE];
+int ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (float, int, __builtin_iceilf)
+TEST_ASSERT (int)
+
+TEST_INIT_CVT (float, 1.2, int, __builtin_iceilf (1.2), 1)
+TEST_INIT_CVT (float, -1.2, int, __builtin_iceilf (-1.2), 2)
+TEST_INIT_CVT (float, 0.5, int, __builtin_iceilf (0.5), 3)
+TEST_INIT_CVT (float, -0.5, int, __builtin_iceilf (-0.5), 4)
+TEST_INIT_CVT (float, 0.1, int, __builtin_iceilf (0.1), 5)
+TEST_INIT_CVT (float, -0.1, int, __builtin_iceilf (-0.1), 6)
+TEST_INIT_CVT (float, 3.0, int, __builtin_iceilf (3.0), 7)
+TEST_INIT_CVT (float, -3.0, int, __builtin_iceilf (-3.0), 8)
+TEST_INIT_CVT (float, 8388607.5, int, __builtin_iceilf (8388607.5), 9)
+TEST_INIT_CVT (float, 8388609.0, int, __builtin_iceilf (8388609.0), 10)
+TEST_INIT_CVT (float, -8388607.5, int, __builtin_iceilf (-8388607.5), 11)
+TEST_INIT_CVT (float, -8388609.0, int, __builtin_iceilf (-8388609.0), 12)
+TEST_INIT_CVT (float, 0.0, int, __builtin_iceilf (-0.0), 13)
+TEST_INIT_CVT (float, -0.0, int, __builtin_iceilf (-0.0), 14)
+TEST_INIT_CVT (float, 2147483520.0, int, __builtin_iceilf (2147483520.0), 15)
+TEST_INIT_CVT (float, 2147483648.0, int, 0x7fffffff, 16)
+TEST_INIT_CVT (float, -2147483648.0, int, __builtin_iceilf (-2147483648.0), 17)
+TEST_INIT_CVT (float, -2147483904.0, int, 0x80000000, 18)
+TEST_INIT_CVT (float, __builtin_inf (), int, __builtin_iceilf (__builtin_inff 
()), 19)
+TEST_INIT_CVT (float, -__builtin_inf (), int, __builtin_iceilf 
(-__builtin_inff ()), 20)
+TEST_INIT_CVT (float, __builtin_nanf (""), int, 0x7fffffff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (float, int, 1, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 2, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 3, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 4, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 5, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 6, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 7, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 8, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 9, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 10, __builtin_iceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 11, __builtin_iceilf, in, out, r

Re: [PATCH v1] RISC-V: Add test for FP ifloor auto vectorization

2023-10-13 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 16:23
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test for FP ifloor auto vectorization
From: Pan Li 
 
The FP API below is already supported, because it shares the same
standard name and machine mode.
 
int ifloor (float);
 
This patch adds test cases to ensure correctness.
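
To make the edge cases in the run test easier to follow, here is a minimal scalar sketch of a float-to-int conversion that rounds toward minus infinity and saturates out-of-range inputs. It assumes the RISC-V conversion rules (out-of-range values clamp to the nearest representable integer, and NaN converts to the largest positive integer). The name sat_ifloorf is hypothetical; it is only a model, not how GCC or the hardware implements the conversion.

```c
#include <limits.h>

/* Hypothetical model, not part of the patch: floor-style float -> int
   conversion with saturation, assuming RISC-V conversion rules.  */
int
sat_ifloorf (float x)
{
  if (x != x)                 /* NaN converts to the largest integer.  */
    return INT_MAX;
  if (x >= 0x1p31f)           /* Too large (includes +inf).  */
    return INT_MAX;
  if (x < -0x1p31f)           /* Too small (includes -inf).  */
    return INT_MIN;

  int t = (int) x;            /* In range, truncates toward zero.  */
  if ((float) t > x)          /* Truncation rounded a negative value up.  */
    t--;
  return t;
}
```

The difference from plain truncation shows up only for negative non-integers: sat_ifloorf (-0.5f) is -1, while (int) -0.5f is 0.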
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ifloor-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-ifloor-0.c| 19 ++
.../rvv/autovec/unop/math-ifloor-run-0.c  | 63 +++
.../riscv/rvv/autovec/vls/math-ifloor-0.c | 30 +
3 files changed, 112 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ifloor-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c
new file mode 100644
index 000..b9ec415d690
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_int___builtin_ifloorf:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+2
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ret
+*/
+TEST_UNARY_CALL_CVT (float, int, __builtin_ifloorf)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c
new file mode 100644
index 000..8ef4da0ea88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c
@@ -0,0 +1,63 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+int out[ARRAY_SIZE];
+int ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (float, int, __builtin_ifloorf)
+TEST_ASSERT (int)
+
+TEST_INIT_CVT (float, 1.2, int, __builtin_ifloorf (1.2), 1)
+TEST_INIT_CVT (float, -1.2, int, __builtin_ifloorf (-1.2), 2)
+TEST_INIT_CVT (float, 0.5, int, __builtin_ifloorf (0.5), 3)
+TEST_INIT_CVT (float, -0.5, int, __builtin_ifloorf (-0.5), 4)
+TEST_INIT_CVT (float, 0.1, int, __builtin_ifloorf (0.1), 5)
+TEST_INIT_CVT (float, -0.1, int, __builtin_ifloorf (-0.1), 6)
+TEST_INIT_CVT (float, 3.0, int, __builtin_ifloorf (3.0), 7)
+TEST_INIT_CVT (float, -3.0, int, __builtin_ifloorf (-3.0), 8)
+TEST_INIT_CVT (float, 8388607.5, int, __builtin_ifloorf (8388607.5), 9)
+TEST_INIT_CVT (float, 8388609.0, int, __builtin_ifloorf (8388609.0), 10)
+TEST_INIT_CVT (float, -8388607.5, int, __builtin_ifloorf (-8388607.5), 11)
+TEST_INIT_CVT (float, -8388609.0, int, __builtin_ifloorf (-8388609.0), 12)
+TEST_INIT_CVT (float, 0.0, int, __builtin_ifloorf (-0.0), 13)
+TEST_INIT_CVT (float, -0.0, int, __builtin_ifloorf (-0.0), 14)
+TEST_INIT_CVT (float, 2147483520.0, int, __builtin_ifloorf (2147483520.0), 15)
+TEST_INIT_CVT (float, 2147483648.0, int, 0x7fffffff, 16)
+TEST_INIT_CVT (float, -2147483648.0, int, __builtin_ifloorf (-2147483648.0), 
17)
+TEST_INIT_CVT (float, -2147483904.0, int, 0x80000000, 18)
+TEST_INIT_CVT (float, __builtin_inf (), int, __builtin_ifloorf (__builtin_inff 
()), 19)
+TEST_INIT_CVT (float, -__builtin_inf (), int, __builtin_ifloorf 
(-__builtin_inff ()), 20)
+TEST_INIT_CVT (float, __builtin_nanf (""), int, 0x7fffffff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (float, int, 1, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 2, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 3, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 4, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 5, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 6, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 7, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 8, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 9, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 10, __builtin_ifloorf, in, out, ref, ARRAY_SIZE);
+  RUN

Re: [PATCH v1] RISC-V: Add test for FP llfloor auto vectorization

2023-10-13 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 17:49
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test for FP llfloor auto vectorization
From: Pan Li 
 
The FP API below is already supported, because it shares the same
standard name and machine mode.
 
long long llfloor (double);
 
This patch adds test cases to ensure correctness.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llfloor-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-llfloor-0.c   | 20 ++
.../rvv/autovec/unop/math-llfloor-run-0.c | 64 +++
.../riscv/rvv/autovec/vls/math-llfloor-0.c| 30 +
3 files changed, 114 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llfloor-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c
new file mode 100644
index 000..4b10f966015
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+#include "test-math.h"
+
+/*
+** test_double_int64_t___builtin_llfloor:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+2
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ret
+*/
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llfloor)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c
new file mode 100644
index 000..22829132e96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c
@@ -0,0 +1,64 @@
+/* { dg-do run { target { riscv_v && rv64 } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include 
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+int64_t out[ARRAY_SIZE];
+int64_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llfloor)
+TEST_ASSERT (int64_t)
+
+TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llfloor (1.2), 1)
+TEST_INIT_CVT (double, -1.2, int64_t, __builtin_llfloor (-1.2), 2)
+TEST_INIT_CVT (double, 0.5, int64_t, __builtin_llfloor (0.5), 3)
+TEST_INIT_CVT (double, -0.5, int64_t, __builtin_llfloor (-0.5), 4)
+TEST_INIT_CVT (double, 0.1, int64_t, __builtin_llfloor (0.1), 5)
+TEST_INIT_CVT (double, -0.1, int64_t, __builtin_llfloor (-0.1), 6)
+TEST_INIT_CVT (double, 3.0, int64_t, __builtin_llfloor (3.0), 7)
+TEST_INIT_CVT (double, -3.0, int64_t, __builtin_llfloor (-3.0), 8)
+TEST_INIT_CVT (double, 4503599627370495.5, int64_t, __builtin_llfloor 
(4503599627370495.5), 9)
+TEST_INIT_CVT (double, 4503599627370497.0, int64_t, __builtin_llfloor 
(4503599627370497.0), 10)
+TEST_INIT_CVT (double, -4503599627370495.5, int64_t, __builtin_llfloor 
(-4503599627370495.5), 11)
+TEST_INIT_CVT (double, -4503599627370496.0, int64_t, __builtin_llfloor 
(-4503599627370496.0), 12)
+TEST_INIT_CVT (double, 0.0, int64_t, __builtin_llfloor (-0.0), 13)
+TEST_INIT_CVT (double, -0.0, int64_t, __builtin_llfloor (-0.0), 14)
+TEST_INIT_CVT (double, 9223372036854774784.0, int64_t, __builtin_llfloor 
(9223372036854774784.0), 15)
+TEST_INIT_CVT (double, 9223372036854775808.0, int64_t, 0x7fffffffffffffff, 16)
+TEST_INIT_CVT (double, -9223372036854775808.0, int64_t, __builtin_llfloor 
(-9223372036854775808.0), 17)
+TEST_INIT_CVT (double, -9223372036854777856.0, int64_t, 0x8000000000000000, 18)
+TEST_INIT_CVT (double, __builtin_inf (), int64_t, __builtin_llfloor 
(__builtin_inf ()), 19)
+TEST_INIT_CVT (double, -__builtin_inf (), int64_t, __builtin_llfloor 
(-__builtin_inf ()), 20)
+TEST_INIT_CVT (double, __builtin_nan (""), int64_t, 0x7fffffffffffffff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (double, int64_t, 1, __builtin_llfloor, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 2, __builtin_llfloor, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 3, __builtin_llfloor, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 4, __builtin_llfloor, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int64_t, 5, __builtin_llfloor, in, out, ref, 
ARRAY_SIZE);
+  RUN_TEST_CVT (double, int6

Re: Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

2023-10-16 Thread juzhe.zh...@rivai.ai

Thanks Robin.

Committed.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-16 17:12
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller 
than VLS mode elements.
Hi Juzhe,

this LGTM.  I was first concerned whether we would want to
stop e.g. at LMUL = 1 and only continue with a specific flag but
actually this should be done via the costs.  If an implementation
wants to penalize or incentivize some behavior it can always
adjust the costs which should be sufficient.

Regards
Robin

Re: [PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread juzhe.zh...@rivai.ai

V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633120.html
with some bug fixes.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-16 11:57
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model 
for non-adjacent load/store
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
    {
      sum1 += x[2*i] - a;
      sum1 += x[2*i+1] * b;
      sum2 += x[2*i] - b;
      sum2 += x[2*i+1] * a;
    }
  return sum1 + sum2;
}
 
Before this patch:
 
bar:
ble a3,zero,.L5
csrr t0,vlenb
csrr a6,vlenb
slli t1,t0,3
vsetvli a5,zero,e32,m4,ta,ma
sub sp,sp,t1
vid.v v20
vmv.v.x v12,a1
vand.vi v4,v20,1
vmv.v.x v16,a2
vmseq.vi v4,v4,1
slli t3,a6,2
vsetvli zero,a5,e32,m4,ta,ma
vmv1r.v v0,v4
viota.m v8,v4
add a7,t3,sp
vsetvli a5,zero,e32,m4,ta,mu
vand.vi v28,v20,-2
vadd.vi v4,v28,1
vs4r.v v20,0(a7)    <- spill
vrgather.vv v24,v12,v8
vrgather.vv v20,v16,v8
vrgather.vv v24,v16,v8,v0.t
vrgather.vv v20,v12,v8,v0.t
vs4r.v v4,0(sp)    <- spill
slli a3,a3,1
addi t4,a6,-1
neg t1,a6
vmv4r.v v0,v20
vmv.v.i v4,0
j .L4
.L13:
vsetvli a5,zero,e32,m4,ta,ma
.L4:
mv a7,a3
mv a4,a3
bleu a3,a6,.L3
csrr a4,vlenb
.L3:
vmv.v.x v8,t4
vl4re32.v v12,0(sp)    <- spill
vand.vv v20,v28,v8
vand.vv v8,v12,v8
vsetvli zero,a4,e32,m4,ta,ma
vle32.v v16,0(a0)
vsetvli a5,zero,e32,m4,ta,ma
add a3,a3,t1
vrgather.vv v12,v16,v20
add a0,a0,t3
vrgather.vv v20,v16,v8
vsub.vv v12,v12,v0
vsetvli zero,a4,e32,m4,tu,ma
vadd.vv v4,v4,v12
vmacc.vv v4,v24,v20
bgtu a7,a6,.L13
csrr a1,vlenb
slli a1,a1,2
add a1,a1,sp
li a4,-1
csrr t0,vlenb
vsetvli a5,zero,e32,m4,ta,ma
vl4re32.v v12,0(a1)    <- spill
vmv.v.i v8,0
vmul.vx v0,v12,a4
li a2,0
slli t1,t0,3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vv v0,v0,v8
vand.vi v12,v12,1
vmerge.vvm v16,v8,v4,v0
vmseq.vv v12,v12,v8
vmv.s.x v1,a2
vmv1r.v v0,v12
vredsum.vs v16,v16,v1
vmerge.vvm v8,v8,v4,v0
vmv.x.s a0,v16
vredsum.vs v8,v8,v1
vmv.x.s a5,v8
add sp,sp,t1
addw a0,a0,a5
jr ra
.L5:
li a0,0
ret
 
We can see that there are multiple horrible register spills.
The root cause of this issue is that, for a scalar IR load:
 
_5 = *_4;
 
we didn't check whether it is a contiguous load/store or a gather/scatter
load/store.
 
It will be translated into one of:
 
   1. MASK_LEN_GATHER_LOAD (..., perm indices).
   2. Contiguous load/store + VEC_PERM (..., perm indices)
 
Either way, we end up consuming one extra vector register group (for the
perm indices) that we didn't count before.
 
So in this case we pick LMUL = 4, which is the wrong choice for the dynamic
LMUL cost model.
 
The key of this patch is:
 
  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
      ...
    }
 
That is, count one more vector register group as consumed if it is not an adjacent load/store.
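
The effect of that extra register group can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the logic in GCC's dynamic LMUL cost model: it assumes every live vector value occupies one register group of LMUL registers, that there are 32 vector registers, and that the largest LMUL that fits is chosen. The function name pick_lmul and the live-value counts used below are hypothetical, not taken from a GCC dump.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_VREGS = 32 };  /* RVV provides 32 vector registers.  */

/* Toy model: return the largest LMUL in {8, 4, 2, 1} for which all
   live vector values fit in the register file without spilling.  */
int
pick_lmul (int live_values, bool non_adjacent_access)
{
  assert (live_values >= 0);

  /* A non-adjacent load/store needs one more register group to hold
     the permutation indices.  */
  int groups = live_values + (non_adjacent_access ? 1 : 0);

  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (groups * lmul <= NUM_VREGS)
      return lmul;
  return 1;
}
```

With eight live values, the model returns 4 when the index vector is ignored and 2 once it is counted, which mirrors the kind of change from m4 to m2 described here.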
 
After this patch, it picks LMUL = 2, which is optimal:
 
bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li a3,-1
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v6,v4,v2,v0
vredsum.vs v6,v6,v1
vmul.vx v0,v12,a3
vadd.vi v0,v0,-1
van

[PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-16 Thread juzhe.zh...@rivai.ai

Hi, Richard.

>> Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
>> it solve some of these problems?
Yes, IFN_COND_LEN makes sense for RVV. We have the vmerge instruction, which
depends on VL/AVL.

I should mention that my internal RVV GCC has IFN_LEN_VCOND_MASK, which simplifies

COND_LEN_ADD (mask, a, 0, b, len, bias) into LEN_VCOND_MASK (mask, a, b, len, 
bias)

I think upstream GCC could consider this approach.
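
A scalar model may help show why that simplification is plausible. This is a sketch under stated assumptions: COND_LEN_ADD is modelled as "active lanes get a + b, all other lanes get the else value", bias is taken to be 0, and LEN_VCOND_MASK (which exists only in the internal tree mentioned above) is assumed to select a for active lanes and the else value otherwise. The function names and the argument layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of COND_LEN_ADD (mask, a, b, els, len, bias) with bias == 0:
   lane i is active when i < len and mask[i] is set.  */
void
cond_len_add (int *dst, const bool *mask, const int *a, const int *b,
              const int *els, size_t n, size_t len)
{
  assert (len <= n);
  for (size_t i = 0; i < n; i++)
    dst[i] = (i < len && mask[i]) ? a[i] + b[i] : els[i];
}

/* Assumed model of LEN_VCOND_MASK (mask, a, els, len, bias) with
   bias == 0: a length-limited select.  */
void
len_vcond_mask (int *dst, const bool *mask, const int *a,
                const int *els, size_t n, size_t len)
{
  assert (len <= n);
  for (size_t i = 0; i < n; i++)
    dst[i] = (i < len && mask[i]) ? a[i] : els[i];
}
```

Under these assumptions, calling cond_len_add with an all-zero b gives the same result as len_vcond_mask for every lane, since a[i] + 0 is a[i] for integers.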

Thanks.


juzhe.zh...@rivai.ai

Re: Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-16 Thread juzhe.zh...@rivai.ai

Hi, Richard.

>> slp_op and mask_vectype are only initialised when mask_index >= 0.
>>Shouldn't this code be under mask_index >= 0 too?
 
>>Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
>>node has a known vectype by this point.  I think a comment would be useful.

I have addressed the comments, and I think we won't encounter mismatched mask_vectypes.

So I changed the code in V4 as follows:
+  if (mask_index >= 0 && slp_node)
+    {
+      bool match_p
+        = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
+      gcc_assert (match_p);
+    }

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633209.html 

We now assert that mask_vectype always matches.

Tested on RISC-V; bootstrap and regtest on x86 passed.

Could you confirm?


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-10-17 05:34
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Juzhe-Zhong  writes:
> This patch fixes the following FAILs in RISC-V regression testing:
>
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
>
> The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.
>
> There are two situations in which scalar code is recognized as
> MASK_LEN_GATHER_LOAD:
>
> 1. Conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero,
> conditional mask).
>
>    In this situation we just need to leverage the existing MASK_GATHER_LOAD
> handling, which can achieve SLP of MASK_LEN_GATHER_LOAD.
>
> 2. Unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale,
> zero, -1).
>
>    The current SLP check fails on the dummy mask -1, so we relax the check in
> tree-vect-slp.cc and allow it to be materialized.
> 
> Consider this following case:
>
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   y[i * 2] = x[indices[i * 2]] + 1;
>   y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> }
> }
>
> https://godbolt.org/z/WG3M3n7Mo
>
> GCC is unable to SLP, using VEC_LOAD_LANES/VEC_STORE_LANES instead:
>
> f:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e8,mf4,ta,ma
> vsetvli zero,a5,e32,m1,ta,ma
> vlseg2e32.v v6,(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v6
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v1,(a1),v2
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v7
> vsetvli zero,zero,e32,m1,ta,ma
> vadd.vi v4,v1,1
> vsetvli zero,zero,e64,m2,ta,ma
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> slli a6,a5,3
> vadd.vi v5,v2,2
> sub a3,a3,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vsseg2e32.v v4,(a0)
> add a2,a2,a6
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
>
> f:
> ble a3,zero,.L5
> li a5,1
> csrr t1,vlenb
> slli a5,a5,33
> srli a7,t1,2
> addi a5,a5,1
> slli a3,a3,1
> neg t3,a7
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.v.x v4,a5
> .L3:
> minu a5,a3,a7
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v1,0(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v1
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> mv a6,a3
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v2,0(a0)
> add a2,a2,t1
> add a0,a0,t1
> add a3,a3,t3
> bgtu a6,a7,.L3
> .L5:
> ret
>
> Note that I found we are missing a conditional-mask gather_load SLP test,
> so this patch adds one.
 
Yeah, we're missing a target-independent test.  I'm afraid I used
aarch64-specific tests for a lot of this stuff, since (a) I wanted
to check the quality of the asm output and (b) it's very hard to write
gcc.dg/vect tests that don't fail on some target or other.  Thanks for
picking this up.
 
>
> Tested on RISC-V and Bootstrap && Regression on X86 passed.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vect_get_operand_map): Add M

Re: [PATCH] RISC-V: Fix failed testcase when use -cmodel=medany

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 17:57
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Fix failed testcase when use -cmodel=medany
This little patch fixes a failing testcase when using -mcmodel=medany.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/cpymem-1.c: Split check.
 
---
gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c | 12 +++-
1 file changed, 11 insertions(+), 1 deletion(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
index 9bb4904e8e9..549d6648104 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
@@ -50,7 +50,7 @@ void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
Use extern here so that we get a known alignment, lest
DATA_ALIGNMENT force us to make the scan pattern accomodate
code for different alignments depending on word size.
-** f3:
+** f3: { target { any-opts "-mcmodel=medlow" } }
**lui\s+[ta][0-7],%hi\(a_a\)
**lui\s+[ta][0-7],%hi\(a_b\)
**addi\s+a4,[ta][0-7],%lo\(a_b\)
@@ -61,6 +61,16 @@ void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
**ret
*/
 
+/*
+** f3: { target { any-opts "-mcmodel=medany" } }
+**lla\s+[ta][0-7],a_b
+**vsetivli\s+zero,16,e32,m4,ta,ma
+**vle32.v\s+v\d+,0\([ta][0-7]\)
+**lla\s+[ta][0-7],a_a
+**vse32\.v\s+v\d+,0\([ta][0-7]\)
+**ret
+*/
+
extern struct { __INT32_TYPE__ a[16]; } a_a, a_b;
 
void f3 ()
--
2.36.3

Re: [PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]

2023-10-17 Thread juzhe.zh...@rivai.ai

Committed.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-17 15:30
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Enable more tests for dynamic LMUL and bug 
fix[PR111832]
Robin previously mentioned that dynamic LMUL causes an ICE in SPEC:
 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html
 
which is caused by an assertion failure.
 
When we enable more tests in rvv.exp with dynamic LMUL, the issue can be
reproduced; it is tracked in PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832
 
This patch enables more tests in rvv.exp and fixes the bug.
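 
A minimal sketch of one difference visible in the diff, with hypothetical
stand-ins: Mode replaces machine_mode and carries only a bit size, and each
statement is reduced to the list of modes of its lhs and operands. The old code
kept only the last mode seen in a statement and compared it once at the end;
the new code folds every mode into a running maximum through get_biggest_mode.
This illustrates the bookkeeping only and makes no claim about the exact
trigger of the ICE.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for machine_mode: only the bit size matters here.
struct Mode {
  unsigned bits;
};

// Same shape as the helper added by the patch: return the wider mode.
Mode get_biggest_mode(Mode mode1, Mode mode2) {
  return mode1.bits >= mode2.bits ? mode1 : mode2;
}

// Old scheme, simplified: each lhs/operand overwrites `mode`, and only the
// last one is compared against the running maximum.
Mode biggest_last_only(const std::vector<std::vector<Mode> >& stmts, Mode init) {
  Mode biggest = init;
  for (const std::vector<Mode>& stmt : stmts) {
    Mode mode = biggest;
    for (Mode m : stmt)
      mode = m;  // earlier, possibly wider, modes are forgotten
    if (mode.bits > biggest.bits)
      biggest = mode;
  }
  return biggest;
}

// New scheme: every lhs/operand mode is folded into the maximum immediately.
Mode biggest_running_max(const std::vector<std::vector<Mode> >& stmts, Mode init) {
  Mode biggest = init;
  for (const std::vector<Mode>& stmt : stmts)
    for (Mode m : stmt)
      biggest = get_biggest_mode(biggest, m);
  return biggest;
}
```

For a statement whose lhs is 64 bits wide but whose last operand is 8 bits
wide, the first function never records the 64-bit mode, while the second does.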
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.
 
---
gcc/config/riscv/riscv-vector-costs.cc | 19 +--
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 10 --
2 files changed, 21 insertions(+), 8 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 33061efb1d0..af87388a1e4 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -154,6 +154,14 @@ compute_local_program_points (
 }
}
+static machine_mode
+get_biggest_mode (machine_mode mode1, machine_mode mode2)
+{
+  unsigned int mode1_size = GET_MODE_BITSIZE (mode1).to_constant ();
+  unsigned int mode2_size = GET_MODE_BITSIZE (mode2).to_constant ();
+  return mode1_size >= mode2_size ? mode1 : mode2;
+}
+
/* Compute local live ranges of each vectorized variable.
Note that we only compute local live ranges (within a block) since
local live ranges information is accurate enough for us to determine
@@ -201,12 +209,12 @@ compute_local_live_ranges (
{
  unsigned int point = program_point.point;
  gimple *stmt = program_point.stmt;
-   machine_mode mode = biggest_mode;
  tree lhs = gimple_get_lhs (stmt);
  if (lhs != NULL_TREE && is_gimple_reg (lhs)
  && !POINTER_TYPE_P (TREE_TYPE (lhs)))
{
-   mode = TYPE_MODE (TREE_TYPE (lhs));
+   biggest_mode = get_biggest_mode (biggest_mode,
+TYPE_MODE (TREE_TYPE (lhs)));
  bool existed_p = false;
  pair &live_range
= live_ranges->get_or_insert (lhs, &existed_p);
@@ -225,7 +233,9 @@ compute_local_live_ranges (
 the future.  */
  if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
{
-   mode = TYPE_MODE (TREE_TYPE (var));
+   biggest_mode
+ = get_biggest_mode (biggest_mode,
+ TYPE_MODE (TREE_TYPE (var)));
  bool existed_p = false;
  pair &live_range
= live_ranges->get_or_insert (var, &existed_p);
@@ -238,9 +248,6 @@ compute_local_live_ranges (
live_range = pair (0, point);
}
}
-   if (GET_MODE_SIZE (mode).to_constant ()
-   > GET_MODE_SIZE (biggest_mode).to_constant ())
- biggest_mode = mode;
}
  if (dump_enabled_p ())
for (hash_map::iterator iter = live_ranges->begin ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index ff76e17d0e6..674ba0d72b4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -58,10 +58,12 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m8} \
+  {-ftree-vectorize -O3 --param riscv-autovec-lmul=dynamic} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} \
-  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} ]
+  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} \
+  {-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic} ]
foreach op $AUTOVEC_TEST_OPTS {
   dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/partial/*.\[cS\]]] \
 "" "$op"
@@ -104,18 +106,22 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param 
riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param 
riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param 
riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param 
riscv-autovec-lmul=dynamic -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param 
riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param 
riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=

Re: [PATCH V2 03/14] RISC-V: P3: Refactor vector_infos_manager

2023-10-17 Thread juzhe.zh...@rivai.ai

+  demand_system dem;
+  auto_vec vector_block_infos;
+
+  /* data for avl reaching defintion.  */
+  sbitmap avl_regs;
+  sbitmap *avl_def_in;
+  sbitmap *avl_def_out;
+  sbitmap *reg_def_loc;
+
+  /* data for vsetvl info reaching defintion.  */
+  vsetvl_info unknow_info;
+  auto_vec vsetvl_def_exprs;
+  sbitmap *vsetvl_def_in;
+  sbitmap *vsetvl_def_out;
+
+  /* data for lcm */
+  auto_vec exprs;
+  sbitmap *avloc;
+  sbitmap *avin;
+  sbitmap *avout;
+  sbitmap *kill;
+  sbitmap *antloc;
+  sbitmap *transp;
+  sbitmap *insert;
+  sbitmap *del;
+  struct edge_list *edges;
+
+  auto_vec delete_list;

Please add an "m_" prefix to all of them.

earliest_fusion_worthwhile_p -> successors_probability_equal_p

calculate_dominance_info (CDI_POST_DOMINATORS); -> remove
free_dominance_info (CDI_POST_DOMINATORS); -> remove



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 03/14] RISC-V: P3: Refactor vector_infos_manager
This sub-patch refactors vector_infos_manager into a pre_vsetvl class,
which is responsible for the entire lazy vsetvl job. There is no need
to introduce a separate vsetvl infos manager, because vsetvl infos are
modified by the optimization code.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (vector_infos_manager::vector_infos_manager): 
Removed.
(class pre_vsetvl): New class.
(vector_infos_manager::create_expr): Removed.
(vector_infos_manager::get_expr_id): Removed.
(vector_infos_manager::all_same_ratio_p): Removed.
(vector_infos_manager::all_avail_in_compatible_p): Removed.
(vector_infos_manager::all_same_avl_p): Removed.
(vector_infos_manager::expr_set_num): Removed.
(vector_infos_manager::release): Removed.
(vector_infos_manager::create_bitmap_vectors): Removed.
(vector_infos_manager::free_bitmap_vectors): Removed.
(vector_infos_manager::dump): Removed.
* config/riscv/riscv-vsetvl.h (class vector_infos_manager): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 632 +--
gcc/config/riscv/riscv-vsetvl.h  |  75 
2 files changed, 257 insertions(+), 450 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index be40b6fdf4c..c219ad178bb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2390,402 +2390,284 @@ public:
   }
};
 
-vector_infos_manager::vector_infos_manager ()
+class pre_vsetvl
{
-  vector_edge_list = nullptr;
-  vector_kill = nullptr;
-  vector_del = nullptr;
-  vector_insert = nullptr;
-  vector_antic = nullptr;
-  vector_transp = nullptr;
-  vector_comp = nullptr;
-  vector_avin = nullptr;
-  vector_avout = nullptr;
-  vector_antin = nullptr;
-  vector_antout = nullptr;
-  vector_earliest = nullptr;
-  vector_insn_infos.safe_grow_cleared (get_max_uid ());
-  vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
-  if (!optimize)
-{
-  basic_block cfg_bb;
-  rtx_insn *rinsn;
-  FOR_ALL_BB_FN (cfg_bb, cfun)
- {
-   vector_block_infos[cfg_bb->index].local_dem = vector_insn_info ();
-   vector_block_infos[cfg_bb->index].reaching_out = vector_insn_info ();
-   FOR_BB_INSNS (cfg_bb, rinsn)
- vector_insn_infos[INSN_UID (rinsn)].parse_insn (rinsn);
- }
-}
-  else
-{
-  for (const bb_info *bb : crtl->ssa->bbs ())
- {
-   vector_block_infos[bb->index ()].local_dem = vector_insn_info ();
-   vector_block_infos[bb->index ()].reaching_out = vector_insn_info ();
-   for (insn_info *insn : bb->real_insns ())
- vector_insn_infos[insn->uid ()].parse_insn (insn);
-   vector_block_infos[bb->index ()].probability = profile_probability ();
- }
-}
-}
-
-void
-vector_infos_manager::create_expr (vector_insn_info &info)
-{
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (*vector_exprs[i] == info)
-  return;
-  vector_exprs.safe_push (&info);
-}
-
-size_t
-vector_infos_manager::get_expr_id (const vector_insn_info &info) const
-{
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (*vector_exprs[i] == info)
-  return i;
-  gcc_unreachable ();
-}
-
-auto_vec
-vector_infos_manager::get_all_available_exprs (
-  const vector_insn_info &info) const
-{
-  auto_vec available_list;
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (info.available_p (*vector_exprs[i]))
-  available_list.safe_push (i);
-  return available_list;
-}
-
-bool
-vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
-{
-  if (bitmap_empty_p (bitdata))
-return false;
-
-  int ratio = -1;
-  unsigned int bb_index;
-  sbitmap_iterator sbi;
-
-  EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-{
-  if (ratio == -1)
- ratio = vector_exprs[bb_index]->get_ratio ();
-  else if (vector_exprs[bb_index]->get_ratio () != ratio)
- return false;
-}
-  return true;
-}
-
-/* Return TRUE if the incoming vector configurat

Re: [PATCH V2 04/14] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM for this patch.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 04/14] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
This sub-patch moves the methods that optimize vsetvl infos into
class pre_vsetvl.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info): Removed.
(pass_vsetvl::get_block_info): Removed.
(pass_vsetvl::update_vector_info): Removed.
(pass_vsetvl::update_block_info): Removed.
(pass_vsetvl::simple_vsetvl): Removed.
(pass_vsetvl::lazy_vsetvl): Removed.
(pass_vsetvl::execute): Removed.
(make_pass_vsetvl): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 228 ---
1 file changed, 87 insertions(+), 141 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index c219ad178bb..3f07fde782f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2684,54 +2684,8 @@ const pass_data pass_data_vsetvl = {
class pass_vsetvl : public rtl_opt_pass
{
private:
-  vector_infos_manager *m_vector_manager;
-
-  const vector_insn_info &get_vector_info (const rtx_insn *) const;
-  const vector_insn_info &get_vector_info (const insn_info *) const;
-  const vector_block_info &get_block_info (const basic_block) const;
-  const vector_block_info &get_block_info (const bb_info *) const;
-  vector_block_info &get_block_info (const basic_block);
-  vector_block_info &get_block_info (const bb_info *);
-  void update_vector_info (const insn_info *, const vector_insn_info &);
-  void update_block_info (int, profile_probability, const vector_insn_info &);
-
-  void simple_vsetvl (void) const;
-  void lazy_vsetvl (void);
-
-  /* Phase 1.  */
-  void compute_local_backward_infos (const bb_info *);
-
-  /* Phase 2.  */
-  bool need_vsetvl (const vector_insn_info &, const vector_insn_info &) const;
-  void transfer_before (vector_insn_info &, insn_info *) const;
-  void transfer_after (vector_insn_info &, insn_info *) const;
-  void emit_local_forward_vsetvls (const bb_info *);
-
-  /* Phase 3.  */
-  bool earliest_fusion (void);
-  void vsetvl_fusion (void);
-
-  /* Phase 4.  */
-  void prune_expressions (void);
-  void compute_local_properties (void);
-  bool can_refine_vsetvl_p (const basic_block, const vector_insn_info &) const;
-  void refine_vsetvls (void) const;
-  void cleanup_vsetvls (void);
-  bool commit_vsetvls (void);
-  void pre_vsetvl (void);
-
-  /* Phase 5.  */
-  rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) const;
-  void local_eliminate_vsetvl_insn (const bb_info *) const;
-  bool global_eliminate_vsetvl_insn (const bb_info *) const;
-  void ssa_post_optimization (void) const;
-
-  /* Phase 6.  */
-  void df_post_optimization (void) const;
-
-  void init (void);
-  void done (void);
-  void compute_probabilities (void);
+  void simple_vsetvl ();
+  void lazy_vsetvl ();
 
public:
   pass_vsetvl (gcc::context *ctxt) : rtl_opt_pass (pass_data_vsetvl, ctxt) {}
@@ -2741,69 +2695,11 @@ public:
   virtual unsigned int execute (function *) final override;
}; // class pass_vsetvl
 
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const rtx_insn *i) const
-{
-  return m_vector_manager->vector_insn_infos[INSN_UID (i)];
-}
-
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const insn_info *i) const
-{
-  return m_vector_manager->vector_insn_infos[i->uid ()];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-void
-pass_vsetvl::update_vector_info (const insn_info *i,
- const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
-}
-
void
-pass_vsetvl::update_block_info (int index, profile_probability prob,
- const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_block_infos[index].probability = prob;
-  if (m_vector_manager->vector_block_infos[index].local_dem
-  == m_vector_manager->vector_block_infos[index].reaching_out)
-m_vector_manager->vector_block_infos[index].local_dem = new_info;
-  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
-}
-
-/* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
-void
-pass_vsetvl::simple_vsetvl (void) const
+pass_vsetvl::simple_vsetvl ()
{

Re: [PATCH V2 11/14] RISC-V: P11: Adjust vector_block_info to vsetvl_block_info class

2023-10-17 Thread juzhe.zh...@rivai.ai

+  const vsetvl_info &get_header_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[0];
+  }

Rename this to get_entry_info, to be consistent with the naming in
mode-switching, which also uses LCM.

+  const vsetvl_info &get_footer_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }

Rename this to get_exit_info, to be consistent with the naming in
mode-switching, which also uses LCM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 11/14] RISC-V: P11: Adjust vector_block_info to 
vsetvl_block_info class
This sub-patch adjusts the vector_block_info code and renames it to
vsetvl_block_info.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (class vsetvl_block_info): New.
* config/riscv/riscv-vsetvl.h (struct vector_block_info): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 55 +++-
gcc/config/riscv/riscv-vsetvl.h  | 14 
2 files changed, 54 insertions(+), 15 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b5ed1ea774a..d91b0272d9f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -85,7 +85,6 @@ along with GCC; see the file COPYING3.  If not see
#include "predict.h"
#include "profile-count.h"
#include "gcse.h"
-#include "riscv-vsetvl.h"
 
using namespace rtl_ssa;
using namespace riscv_vector;
@@ -1218,6 +1217,60 @@ public:
   }
};
 
+class vsetvl_block_info
+{
+public:
+  /* The static execute probability of the demand info.  */
+  profile_probability probability;
+
+  auto_vec infos;
+  vsetvl_info m_info;
+  bb_info *m_bb;
+
+  bool full_available;
+
+  vsetvl_block_info () : m_bb (nullptr), full_available (false)
+  {
+infos.safe_grow_cleared (0);
+m_info.set_empty ();
+  }
+  vsetvl_block_info (const vsetvl_block_info &other)
+: probability (other.probability), infos (other.infos.copy ()),
+  m_info (other.m_info), m_bb (other.m_bb)
+  {}
+
+  vsetvl_info &get_header_info ()
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[0];
+  }
+  vsetvl_info &get_footer_info ()
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }
+  const vsetvl_info &get_header_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[0];
+  }
+  const vsetvl_info &get_footer_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }
+
+  bool empty_p () const { return infos.is_empty () && !has_info (); }
+  bool has_info () const { return !m_info.empty_p (); }
+  void set_info (const vsetvl_info &info)
+  {
+gcc_assert (infos.is_empty ());
+m_info = info;
+m_info.set_bb (m_bb);
+  }
+  void set_empty_info () { m_info.set_empty (); }
+};
+
class demand_system
{
private:
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 96e36403af7..16c84e0684b 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -55,19 +55,5 @@ enum def_type
   CLOBBER_DEF = 1 << 4
};
 
-struct vector_block_info
-{
-  /* The local_dem vector insn_info of the block.  */
-  vector_insn_info local_dem;
-
-  /* The reaching_out vector insn_info of the block.  */
-  vector_insn_info reaching_out;
-
-  /* The static execute probability of the demand info.  */
-  profile_probability probability;
-
-  vector_block_info () = default;
-};
-
} // namespace riscv_vector
#endif
--
2.36.3

Re: [PATCH V2 05/14] RISC-V: P5: combine phase 1 and 2

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM on the algorithm of the local analysis.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 05/14] RISC-V: P5: combine phase 1 and 2
This sub-patch combines phases 1 and 2 to use the new demand system and
delays the insertion of vsetvl insns to phase 4.
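 
The control flow of the local fusion can be sketched as follows. This is a toy
model under stated assumptions, not the demand_system from the patch: ToyInfo
and its two demanded fields are hypothetical, a value of 0 means "not
demanded", and available_with / compatible_with / merge_with are deliberately
simplified to field-by-field checks. Only the loop structure (drop, merge, or
start a new info) follows fuse_local_vsetvl_info.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, much-reduced vsetvl info.  0 means "field not demanded".
struct ToyInfo {
  bool unknown;  // true for an unknown state, false for a valid demand
  int sew;       // demanded element width
  int ratio;     // demanded SEW/LMUL ratio
};

// A field of prev satisfies curr if curr does not demand it or both agree.
static bool field_available(int prev, int curr) {
  return curr == 0 || prev == curr;
}

// Two fields do not conflict if either is undemanded or both agree.
static bool field_compatible(int prev, int curr) {
  return prev == 0 || curr == 0 || prev == curr;
}

bool available_with(const ToyInfo& prev, const ToyInfo& curr) {
  return field_available(prev.sew, curr.sew)
         && field_available(prev.ratio, curr.ratio);
}

bool compatible_with(const ToyInfo& prev, const ToyInfo& curr) {
  return field_compatible(prev.sew, curr.sew)
         && field_compatible(prev.ratio, curr.ratio);
}

// Fold the demands of curr into prev (only called when compatible).
void merge_with(ToyInfo& prev, const ToyInfo& curr) {
  if (prev.sew == 0) prev.sew = curr.sew;
  if (prev.ratio == 0) prev.ratio = curr.ratio;
}

// Walk the infos of one block in order and fuse neighbours where possible.
std::vector<ToyInfo> fuse_local(const std::vector<ToyInfo>& infos) {
  std::vector<ToyInfo> out;
  bool have_prev = false;
  ToyInfo prev = {false, 0, 0};
  for (const ToyInfo& curr : infos) {
    if (!have_prev) {
      prev = curr;
      have_prev = true;
    } else if (prev.unknown != curr.unknown) {
      out.push_back(prev);  // valid/unknown boundary: never fused
      prev = curr;
    } else if (!prev.unknown) {
      if (available_with(prev, curr)) {
        // prev already provides what curr needs: curr is dropped
      } else if (compatible_with(prev, curr)) {
        merge_with(prev, curr);
      } else {
        out.push_back(prev);  // conflicting demands: start a new info
        prev = curr;
      }
    }
    // two consecutive unknown infos: keep prev unchanged
  }
  if (have_prev) out.push_back(prev);
  return out;
}
```

In this model, a sequence demanding sew=32, sew=32, ratio=16, sew=64 fuses to
two infos: {sew=32, ratio=16} followed by {sew=64}.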
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): New.
(pass_vsetvl::compute_local_backward_infos): Removed.
(pass_vsetvl::need_vsetvl): Removed.
(pass_vsetvl::transfer_before): Removed.
(pass_vsetvl::transfer_after): Removed.
(pass_vsetvl::emit_local_forward_vsetvls): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 269 ++-
1 file changed, 123 insertions(+), 146 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 3f07fde782f..33bdcec04d8 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2669,6 +2669,129 @@ public:
   }
};
 
+void
+pre_vsetvl::fuse_local_vsetvl_info ()
+{
+  reg_def_loc
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), GP_REG_LAST + 1);
+  bitmap_vector_clear (reg_def_loc, last_basic_block_for_fn (cfun));
+  bitmap_ones (reg_def_loc[ENTRY_BLOCK_PTR_FOR_FN (cfun)->index]);
+
+  for (bb_info *bb : crtl->ssa->bbs ())
+{
+  auto &block_info = get_block_info (bb);
+  block_info.m_bb = bb;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "  Try fuse basic block %d\n", bb->index ());
+ }
+  auto_vec infos;
+  for (insn_info *insn : bb->real_nondebug_insns ())
+ {
+   vsetvl_info curr_info = vsetvl_info (insn);
+   if (curr_info.valid_p () || curr_info.unknown_p ())
+ infos.safe_push (curr_info);
+
+   /* Collecting GP registers modified by the current bb.  */
+   if (insn->is_real ())
+ for (def_info *def : insn->defs ())
+   if (def->is_reg () && GP_REG_P (def->regno ()))
+ bitmap_set_bit (reg_def_loc[bb->index ()], def->regno ());
+ }
+
+  vsetvl_info prev_info = vsetvl_info ();
+  prev_info.set_empty ();
+  for (auto &curr_info : infos)
+ {
+   if (prev_info.empty_p ())
+ prev_info = curr_info;
+   else if ((curr_info.unknown_p () && prev_info.valid_p ())
+|| (curr_info.valid_p () && prev_info.unknown_p ()))
+ {
+   block_info.infos.safe_push (prev_info);
+   prev_info = curr_info;
+ }
+   else if (curr_info.valid_p () && prev_info.valid_p ())
+ {
+   if (dem.available_with (prev_info, curr_info))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file,
+"Ignore curr info since prev info "
+"available with it:\n");
+   fprintf (dump_file, "  prev_info: ");
+   prev_info.dump (dump_file, "");
+   fprintf (dump_file, "  curr_info: ");
+   curr_info.dump (dump_file, "");
+   fprintf (dump_file, "\n");
+ }
+   if (!curr_info.use_by_non_rvv_insn_p ()
+   && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+ delete_list.safe_push (curr_info);
+
+   if (curr_info.get_read_vl_insn ())
+ prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
+ }
+   else if (dem.compatible_with (prev_info, curr_info))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "Fuse curr info since prev info "
+   "compatible with it:\n");
+   fprintf (dump_file, "  prev_info: ");
+   prev_info.dump (dump_file, "");
+   fprintf (dump_file, "  curr_info: ");
+   curr_info.dump (dump_file, "");
+ }
+   dem.merge_with (prev_info, curr_info);
+   if (curr_info.get_read_vl_insn ())
+ prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "  prev_info after fused: ");
+   prev_info.dump (dump_file, "");
+   fprintf (dump_file, "\n");
+ }
+ }
+   else
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file,
+"Cannot fuse uncompatible infos:\n");
+   fprintf (dump_file, "  prev_info: ");
+   prev_info.dump (dump_file, "   ");
+   fprintf (dump_file, "  curr_info: ");
+   curr_info.dump (dump_file, "   ");
+ }
+   block_info.infos.safe_push (prev_info);
+   prev_info = curr_info;
+ }
+ }
+ }
+
+  if (prev_info.valid_p () || prev_info.unknown_p ())
+ block_info.infos.safe_push (prev_info);
+}
+
+  avl_regs = sbitmap_alloc (GP_REG_LA

Re: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow

2023-10-17 Thread juzhe.zh...@rivai.ai


compute_vsetvl_lcm_data -> compute_lcm_local_properties




juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data 
flow
This sub-patch adds some helper functions for computing reaching-definition
data, and three computational functions for different objects. These three
functions are used by phases 2 and 3.
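 
The dataflow being solved can be sketched in a self-contained way as below.
The names ToyCfg, make_diamond_cfg and reaching_definition_sketch are
hypothetical, bit sets are plain unsigned words instead of sbitmaps, and the
worklist is a std::deque rather than the circular array used in the patch. The
equations are the usual forward ones: IN(b) is the union of OUT(p) over the
predecessors p of b, and OUT(b) = GEN(b) | (IN(b) & ~KILL(b)).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical CFG: block b has predecessors preds[b] and successors succs[b].
struct ToyCfg {
  std::vector<std::vector<int> > preds;
  std::vector<std::vector<int> > succs;
};

// Build the diamond 0 -> {1, 2} -> 3 used in the usage note.
ToyCfg make_diamond_cfg() {
  ToyCfg cfg;
  cfg.preds.resize(4);
  cfg.succs.resize(4);
  const int edges[4][2] = {{0, 1}, {0, 2}, {1, 3}, {2, 3}};
  for (int i = 0; i < 4; ++i) {
    cfg.succs[edges[i][0]].push_back(edges[i][1]);
    cfg.preds[edges[i][1]].push_back(edges[i][0]);
  }
  return cfg;
}

// Iterate to a fixed point, revisiting a block only when a predecessor's
// OUT set has changed.
void reaching_definition_sketch(const ToyCfg& cfg,
                                const std::vector<unsigned>& gen,
                                const std::vector<unsigned>& kill,
                                std::vector<unsigned>& in,
                                std::vector<unsigned>& out) {
  const std::size_t n = cfg.preds.size();
  assert(gen.size() == n && kill.size() == n && cfg.succs.size() == n);
  in.assign(n, 0u);
  out.assign(n, 0u);

  // Start with every block queued once.
  std::deque<int> worklist;
  std::vector<bool> queued(n, true);
  for (std::size_t b = 0; b < n; ++b)
    worklist.push_back(static_cast<int>(b));

  while (!worklist.empty()) {
    int b = worklist.front();
    worklist.pop_front();
    queued[b] = false;

    unsigned new_in = 0u;
    for (int p : cfg.preds[b])
      new_in |= out[p];
    in[b] = new_in;

    unsigned new_out = gen[b] | (in[b] & ~kill[b]);
    if (new_out != out[b]) {
      out[b] = new_out;
      for (int s : cfg.succs[b])
        if (!queued[s]) {
          queued[s] = true;
          worklist.push_back(s);
        }
    }
  }
}
```

On the diamond, with GEN = {1, 2, 0, 4} and KILL = {0, 1, 0, 0} as bit masks,
block 1 kills definition 0 while block 2 lets it through, so IN(3) is 0b011 and
OUT(3) is 0b111.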
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
(compute_reaching_defintion): New.
(pre_vsetvl::compute_avl_def_data): New.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pre_vsetvl::compute_vsetvl_lcm_data): New.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 468 +++
1 file changed, 468 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 33bdcec04d8..b1269e8cf4f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -103,6 +103,121 @@ along with GCC; see the file COPYING3.  If not see
using namespace rtl_ssa;
using namespace riscv_vector;
 
+/* Set the bitmap DST to the union of SRC of predecessors of
+   basic block B.
+   It's a bit different from bitmap_union_of_preds in cfganal.cc. This function
+   takes into account the case where pred is ENTRY basic block. The main reason
+   for this difference is to make it easier to insert some special value into
+   the ENTRY base block. For example, vsetvl_info with a status of UNKNOW.  */
+static void
+bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b)
+{
+  unsigned int set_size = dst->size;
+  edge e;
+  unsigned ix;
+
+  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
+{
+  e = EDGE_PRED (b, ix);
+  bitmap_copy (dst, src[e->src->index]);
+  break;
+}
+
+  if (ix == EDGE_COUNT (b->preds))
+bitmap_clear (dst);
+  else
+for (ix++; ix < EDGE_COUNT (b->preds); ix++)
+  {
+ unsigned int i;
+ SBITMAP_ELT_TYPE *p, *r;
+
+ e = EDGE_PRED (b, ix);
+ p = src[e->src->index]->elms;
+ r = dst->elms;
+ for (i = 0; i < set_size; i++)
+   *r++ |= *p++;
+  }
+}
+
+/* Compute the reaching defintion in and out based on the gen and KILL
+   informations in each Base Blocks.
+   This function references the compute_avaiable implementation in lcm.cc  */
+static void
+compute_reaching_defintion (sbitmap *gen, sbitmap *kill, sbitmap *in,
+ sbitmap *out)
+{
+  edge e;
+  basic_block *worklist, *qin, *qout, *qend, bb;
+  unsigned int qlen;
+  edge_iterator ei;
+
+  /* Allocate a worklist array/queue.  Entries are only added to the
+ list if they were not already on the list.  So the size is
+ bounded by the number of basic blocks.  */
+  qin = qout = worklist
+= XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+
+  /* Put every block on the worklist; this is necessary because of the
+ optimistic initialization of AVOUT above.  Use reverse postorder
+ to make the forward dataflow problem require less iterations.  */
+  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+  int n = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, false);
+  for (int i = 0; i < n; ++i)
+{
+  bb = BASIC_BLOCK_FOR_FN (cfun, rpo[i]);
+  *qin++ = bb;
+  bb->aux = bb;
+}
+  free (rpo);
+
+  qin = worklist;
+  qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS];
+  qlen = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS;
+
+  /* Mark blocks which are successors of the entry block so that we
+ can easily identify them below.  */
+  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
+e->dest->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  /* Iterate until the worklist is empty.  */
+  while (qlen)
+{
+  /* Take the first entry off the worklist.  */
+  bb = *qout++;
+  qlen--;
+
+  if (qout >= qend)
+ qout = worklist;
+
+  /* Do not clear the aux field for blocks which are successors of the
+ ENTRY block.  That way we never add then to the worklist again.  */
+  if (bb->aux != ENTRY_BLOCK_PTR_FOR_FN (cfun))
+ bb->aux = NULL;
+
+  bitmap_union_of_preds_with_entry (in[bb->index], out, bb);
+
+  if (bitmap_ior_and_compl (out[bb->index], gen[bb->index], in[bb->index],
+ kill[bb->index]))
+ /* If the out state of this block changed, then we need
+to add the successors of this block to the worklist
+if they are not already on the worklist.  */
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   if (!e->dest->aux && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
+ {
+   *qin++ = e->dest;
+   e->dest->aux = e;
+   qlen++;
+
+   if (qin >= qend)
+ qin = worklist;
+ }
+}
+
+  clear_aux_for_edges ();
+  clear_aux_for_blocks ();
+  free (worklist);
+}
+
stat

Re: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow

2023-10-17 Thread juzhe.zh...@rivai.ai

Please copy and paste the original comments:

-/* Compute the local properties of each recorded expression.
-
-   Local properties are those that are defined by the block, irrespective of
-   other blocks.
-
-   An expression is transparent in a block if its operands are not modified
-   in the block.
-
-   An expression is computed (locally available) in a block if it is computed
-   at least once and expression would contain the same value if the
-   computation was moved to the end of the block.
-
-   An expression is locally anticipatable in a block if it is computed at
-   least once and expression would contain the same value if the computation
-   was moved to the beginning of the block.  */
-void
-pass_vsetvl::compute_local_properties (void)
-{
-  /* -  If T is locally available at the end of a block, then T' must be
-   available at the end of the same block. Since some optimization has
-   occurred earlier, T' might not be locally available, however, it must
-   have been previously computed on all paths. As a formula, T at AVLOC(B)
-   implies that T' at AVOUT(B).
-   An "available occurrence" is one that is the last occurrence in the
-   basic block and the operands are not modified by following statements in
-   the basic block [including this insn].
-
- -  If T is locally anticipated at the beginning of a block, then either
-   T', is locally anticipated or it is already available from previous
-   blocks. As a formula, this means that T at ANTLOC(B) implies that T' at
-   ANTLOC(B) at AVIN(B).
-   An "anticipatable occurrence" is one that is the first occurrence in the
-   basic block, the operands are not modified in the basic block prior
-   to the occurrence and the output is not used between the start of
-   the block and the occurrence.  */



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data 
flow
This sub-patch adds some helper functions for computing reaching-definition data
and three computational functions for different objects. These three functions
are used by phases 2 and 3.
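
For readers unfamiliar with the reaching-definition problem, the fixed point that a worklist solver of this kind iterates towards can be sketched as below. This is a simplified, self-contained illustration and not the code from the patch: the `Block` struct, the 64-bit masks standing in for sbitmaps, and the names `solve_reaching_definitions` and `make_example_cfg` are all assumptions made for the example. It also leaves out the special handling of the ENTRY block's successors.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical miniature CFG node; one bit per definition.
struct Block
{
  std::vector<int> preds;
  std::vector<int> succs;
  std::uint64_t gen = 0;   // definitions created in this block
  std::uint64_t kill = 0;  // definitions overwritten by this block
  std::uint64_t in = 0;    // definitions reaching the block entry
  std::uint64_t out = 0;   // definitions reaching the block exit
};

// Forward worklist solver for
//   in[b]  = union of out[p] over all predecessors p
//   out[b] = gen[b] | (in[b] & ~kill[b])
void
solve_reaching_definitions (std::vector<Block> &cfg)
{
  std::deque<int> worklist;
  std::vector<bool> queued (cfg.size (), true);
  for (int i = 0; i < (int) cfg.size (); ++i)
    worklist.push_back (i);

  while (!worklist.empty ())
    {
      int b = worklist.front ();
      worklist.pop_front ();
      queued[b] = false;

      std::uint64_t in = 0;
      for (int p : cfg[b].preds)
        in |= cfg[p].out;
      cfg[b].in = in;

      std::uint64_t out = cfg[b].gen | (in & ~cfg[b].kill);
      if (out != cfg[b].out)
        {
          // The out set changed, so successors must be revisited
          // unless they are already waiting on the worklist.
          cfg[b].out = out;
          for (int s : cfg[b].succs)
            if (!queued[s])
              {
                queued[s] = true;
                worklist.push_back (s);
              }
        }
    }
}

// Example CFG: 0 -> 1 -> 2 with a back edge 2 -> 1.
std::vector<Block>
make_example_cfg ()
{
  std::vector<Block> cfg (3);
  cfg[0].succs = {1};
  cfg[1].preds = {0, 2};
  cfg[1].succs = {2};
  cfg[2].preds = {1};
  cfg[2].succs = {1};
  cfg[0].gen = 0x1;   // definition d0
  cfg[1].gen = 0x2;   // definition d1 ...
  cfg[1].kill = 0x1;  // ... which overwrites d0
  cfg[2].gen = 0x4;   // definition d2
  return cfg;
}
```

In this example d0 reaches the entry of block 1 but is killed there, while d1 and d2 circulate around the loop and reach the exit of both block 1 and block 2.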
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
(compute_reaching_defintion): New.
(pre_vsetvl::compute_avl_def_data): New.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pre_vsetvl::compute_vsetvl_lcm_data): New.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 468 +++
1 file changed, 468 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 33bdcec04d8..b1269e8cf4f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -103,6 +103,121 @@ along with GCC; see the file COPYING3.  If not see
using namespace rtl_ssa;
using namespace riscv_vector;
 
+/* Set the bitmap DST to the union of SRC of predecessors of
+   basic block B.
+   It's a bit different from bitmap_union_of_preds in cfganal.cc. This function
+   takes into account the case where pred is ENTRY basic block. The main reason
+   for this difference is to make it easier to insert some special value into
+   the ENTRY base block. For example, vsetvl_info with a status of UNKNOW.  */
+static void
+bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b)
+{
+  unsigned int set_size = dst->size;
+  edge e;
+  unsigned ix;
+
+  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
+{
+  e = EDGE_PRED (b, ix);
+  bitmap_copy (dst, src[e->src->index]);
+  break;
+}
+
+  if (ix == EDGE_COUNT (b->preds))
+bitmap_clear (dst);
+  else
+for (ix++; ix < EDGE_COUNT (b->preds); ix++)
+  {
+ unsigned int i;
+ SBITMAP_ELT_TYPE *p, *r;
+
+ e = EDGE_PRED (b, ix);
+ p = src[e->src->index]->elms;
+ r = dst->elms;
+ for (i = 0; i < set_size; i++)
+   *r++ |= *p++;
+  }
+}
+
+/* Compute the reaching defintion in and out based on the gen and KILL
+   informations in each Base Blocks.
+   This function references the compute_avaiable implementation in lcm.cc  */
+static void
+compute_reaching_defintion (sbitmap *gen, sbitmap *kill, sbitmap *in,
+ sbitmap *out)
+{
+  edge e;
+  basic_block *worklist, *qin, *qout, *qend, bb;
+  unsigned int qlen;
+  edge_iterator ei;
+
+  /* Allocate a worklist array/queue.  Entries are only added to the
+ list if they were not already on the list.  So the size is
+ bounded by the number of basic blocks.  */
+  qin = qout = worklist
+= XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+
+  /* Put every block on the worklist; this is necessary because of the
+ optimistic initialization of AVOUT above.  Use reverse postorder
+ to make the forward dataflow problem require less iterations.

Re: [PATCH V2 07/14] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 07/14] RISC-V: P7: Move earliest fuse and lcm code to 
pre_vsetvl class
This patch moves the phase 2 and 3 code from pass_vsetvl to the
pre_vsetvl class.
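
As background for the compute_available / compute_antinout_edge / compute_earliest sequence used in the patch below, the per-edge "earliest" set of lazy code motion can be sketched with the textbook formula EARLIEST(i,j) = ANTIN(j) & ~AVOUT(i) & (KILL(i) | ~ANTOUT(i)). The following is a minimal sketch under stated assumptions, not the implementation in lcm.cc: the `BlockSets` and `Edge` types, the `kEntry`/`kExit` markers and the function name are hypothetical, and 64-bit masks stand in for sbitmaps.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-block dataflow results, one bit per expression.
struct BlockSets
{
  std::uint64_t antin;   // anticipatable at block entry
  std::uint64_t antout;  // anticipatable at block exit
  std::uint64_t avout;   // available at block exit
  std::uint64_t kill;    // expressions the block invalidates
};

struct Edge
{
  int src;
  int dest;
};

// Hypothetical markers for the fixed entry and exit blocks.
constexpr int kEntry = -1;
constexpr int kExit = -2;

// For each edge, compute the expressions whose earliest safe
// insertion point is that edge.
std::vector<std::uint64_t>
compute_earliest_sketch (const std::vector<BlockSets> &bb,
                         const std::vector<Edge> &edges)
{
  std::vector<std::uint64_t> earliest;
  for (const Edge &e : edges)
    {
      if (e.dest == kExit)
        earliest.push_back (0);  // nothing is anticipated at the exit
      else if (e.src == kEntry)
        earliest.push_back (bb[e.dest].antin);
      else
        {
          const BlockSets &src = bb[e.src];
          std::uint64_t difference = bb[e.dest].antin & ~src.avout;
          earliest.push_back (difference & (src.kill | ~src.antout));
        }
    }
  return earliest;
}
```

With two blocks 0 -> 1 and one expression used in block 1, the expression is placed on the entry edge when block 0 is transparent for it, and on the edge 0 -> 1 when block 0 kills it.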
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): New.
(pre_vsetvl::pre_global_vsetvl_info): New.
(pass_vsetvl::prune_expressions): Removed.
(pass_vsetvl::compute_local_properties): Removed.
(pass_vsetvl::earliest_fusion): Removed.
(pass_vsetvl::vsetvl_fusion): Removed.
(pass_vsetvl::pre_vsetvl): Removed.
(pass_vsetvl::compute_probabilities): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 829 +++
1 file changed, 398 insertions(+), 431 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b1269e8cf4f..a112895a283 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3260,6 +3260,404 @@ pre_vsetvl::fuse_local_vsetvl_info ()
 }
}
 
+bool
+pre_vsetvl::earliest_fuse_vsetvl_info ()
+{
+  compute_avl_def_data ();
+  compute_vsetvl_def_data ();
+  compute_vsetvl_lcm_data ();
+
+  unsigned num_exprs = exprs.length ();
+  struct edge_list *edges = create_edge_list ();
+  unsigned num_edges = NUM_EDGES (edges);
+  sbitmap *antin
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+  sbitmap *antout
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+
+  sbitmap *earliest = sbitmap_vector_alloc (num_edges, num_exprs);
+
+  compute_available (avloc, kill, avout, avin);
+  compute_antinout_edge (antloc, transp, antin, antout);
+  compute_earliest (edges, num_exprs, antin, antout, avout, kill, earliest);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "Expression List (%u):\n", num_exprs);
+  for (unsigned i = 0; i < num_exprs; i++)
+ {
+   const auto &info = *exprs[i];
+   fprintf (dump_file, "  Expr[%u]: ", i);
+   info.dump (dump_file, "");
+ }
+  fprintf (dump_file, "\nbitmap data:\n");
+  for (const bb_info *bb : crtl->ssa->bbs ())
+ {
+   unsigned int i = bb->index ();
+   fprintf (dump_file, "  BB %u:\n", i);
+   fprintf (dump_file, "avloc: ");
+   dump_bitmap_file (dump_file, avloc[i]);
+   fprintf (dump_file, "kill: ");
+   dump_bitmap_file (dump_file, kill[i]);
+   fprintf (dump_file, "antloc: ");
+   dump_bitmap_file (dump_file, antloc[i]);
+   fprintf (dump_file, "transp: ");
+   dump_bitmap_file (dump_file, transp[i]);
+
+   fprintf (dump_file, "avin: ");
+   dump_bitmap_file (dump_file, avin[i]);
+   fprintf (dump_file, "avout: ");
+   dump_bitmap_file (dump_file, avout[i]);
+   fprintf (dump_file, "antin: ");
+   dump_bitmap_file (dump_file, antin[i]);
+   fprintf (dump_file, "antout: ");
+   dump_bitmap_file (dump_file, antout[i]);
+ }
+  fprintf (dump_file, "\n");
+  fprintf (dump_file, "  earliest:\n");
+  for (unsigned ed = 0; ed < num_edges; ed++)
+ {
+   edge eg = INDEX_EDGE (edges, ed);
+
+   if (bitmap_empty_p (earliest[ed]))
+ continue;
+   fprintf (dump_file, "Edge(bb %u -> bb %u): ", eg->src->index,
+eg->dest->index);
+   dump_bitmap_file (dump_file, earliest[ed]);
+ }
+  fprintf (dump_file, "\n");
+}
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Fused global info result:\n");
+}
+
+  bool changed = false;
+  for (unsigned ed = 0; ed < num_edges; ed++)
+{
+  sbitmap e = earliest[ed];
+  if (bitmap_empty_p (e))
+ continue;
+
+  unsigned int expr_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (e, 0, expr_index, sbi)
+ {
+   vsetvl_info &curr_info = *exprs[expr_index];
+   if (!curr_info.valid_p ())
+ continue;
+
+   edge eg = INDEX_EDGE (edges, ed);
+   if (eg->probability == profile_probability::never ())
+ continue;
+   if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
+   || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+ continue;
+
+   vsetvl_block_info &src_block_info = get_block_info (eg->src);
+   vsetvl_block_info &dest_block_info = get_block_info (eg->dest);
+
+   if (src_block_info.probability
+   == profile_probability::uninitialized ())
+ continue;
+
+   if (src_block_info.empty_p ())
+ {
+   vsetvl_info new_curr_info = curr_info;
+   new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
+   bool has_compatible_

Re: [PATCH V2 08/14] RISC-V: P8: Unified insert and delete of vsetvl insn into Phase 4

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 08/14] RISC-V: P8: Unified insert and delete of vsetvl insn 
into Phase 4
This sub-patch moves the code that modifies RTL from pass_vsetvl
into the pre_vsetvl class.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): New.
(pass_vsetvl::can_refine_vsetvl_p): Removed.
(pass_vsetvl::refine_vsetvls): Removed.
(pass_vsetvl::cleanup_vsetvls): Removed.
(pass_vsetvl::commit_vsetvls): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 389 +++
1 file changed, 134 insertions(+), 255 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index a112895a283..5d84d290e9e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3658,6 +3658,140 @@ pre_vsetvl::pre_global_vsetvl_info ()
 }
}
 
+void
+pre_vsetvl::emit_vsetvl ()
+{
+  bool need_commit = false;
+
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  for (const auto &curr_info : get_block_info (bb).infos)
+ {
+   insn_info *insn = curr_info.get_insn ();
+   if (curr_info.ignore_p ())
+ {
+   if (vsetvl_insn_p (insn->rtl ()))
+ eliminate_insn (insn->rtl ());
+   continue;
+ }
+   else if (curr_info.valid_p ())
+ {
+   if (vsetvl_insn_p (insn->rtl ()))
+ {
+   const vsetvl_info temp = vsetvl_info (insn);
+   if (!(curr_info == temp))
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file, "\n  Change vsetvl info from: ");
+   temp.dump (dump_file, "");
+   fprintf (dump_file, "  to: ");
+   curr_info.dump (dump_file, "");
+ }
+   change_vsetvl_insn (insn, curr_info);
+ }
+ }
+   else
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"\n  Insert vsetvl info before insn %d: ",
+insn->uid ());
+   curr_info.dump (dump_file, "");
+ }
+   insert_vsetvl (EMIT_BEFORE, insn->rtl (), curr_info);
+ }
+ }
+ }
+}
+
+  for (const vsetvl_info &item : delete_list)
+{
+  gcc_assert (vsetvl_insn_p (item.get_insn ()->rtl ()));
+  eliminate_insn (item.get_insn ()->rtl ());
+}
+
+  /* Insert vsetvl as LCM suggest. */
+  for (int ed = 0; ed < NUM_EDGES (edges); ed++)
+{
+  edge eg = INDEX_EDGE (edges, ed);
+  sbitmap i = insert[ed];
+  if (bitmap_count_bits (i) < 1)
+ continue;
+
+  if (bitmap_count_bits (i) > 1)
+ /* For code with infinite loop (e.g. pr61634.c), The data flow is
+completely wrong.  */
+ continue;
+
+  gcc_assert (bitmap_count_bits (i) == 1);
+  unsigned expr_index = bitmap_first_set_bit (i);
+  const vsetvl_info &info = *exprs[expr_index];
+  gcc_assert (info.valid_p ());
+  if (dump_file)
+ {
+   fprintf (dump_file,
+"\n  Insert vsetvl info at edge(bb %u -> bb %u): ",
+eg->src->index, eg->dest->index);
+   info.dump (dump_file, "");
+ }
+  rtl_profile_for_edge (eg);
+  start_sequence ();
+
+  insn_info *insn = info.get_insn ();
+  insert_vsetvl (EMIT_DIRECT, insn->rtl (), info);
+  rtx_insn *rinsn = get_insns ();
+  end_sequence ();
+  default_rtl_profile ();
+
+  /* We should not get an abnormal edge here.  */
+  gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+  need_commit = true;
+  insert_insn_on_edge (rinsn, eg);
+}
+
+  /* Insert vsetvl info that was not deleted after lift up.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  const vsetvl_block_info &block_info = get_block_info (bb);
+  if (!block_info.has_info ())
+ continue;
+
+  const vsetvl_info &footer_info = block_info.get_footer_info ();
+  insn_info *insn = footer_info.get_insn ();
+
+  if (footer_info.ignore_p ())
+ continue;
+
+  edge eg;
+  edge_iterator eg_iterator;
+  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
+ {
+   gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+   if (dump_file)
+ {
+   fprintf (
+ dump_file,
+ "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
+ eg->src->index, eg->dest->index);
+   footer_info.dump (dump_file, "");
+ }
+   start_sequence ();
+   insert_vsetvl (EMIT_DIRECT, insn->rtl (), footer_info);
+   rtx_insn *rinsn = get_insns ();
+   end_sequence ();
+   default_rtl_profile ();
+   insert_insn_on_edge (rinsn, eg);
+   need_commit = true;
+ }
+}
+
+  if (need_commit)
+commit_edge_insertions ();
+}
+
+
const pass_data pass_data_vsetvl = {
   RTL_PASS, /* type */
   "vsetvl", /* name */
@@ -3790,261 +3924,6 @@ make_pass_vsetvl (gcc::context *ctxt)
   return new pass_vsetvl (ctxt);
}
 
-
-/* Return true if VSETVL in the block can be refin

Re: [PATCH V2 09/14] RISC-V: P9: Cleanup post optimize phase

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 09/14] RISC-V: P9: Cleanup post optimize phase
This sub-patch deletes the part of the post-optimization code that is now
implemented in the main phase and moves the remaining cleanup code to the
pre_vsetvl class.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::cleaup): New.
(pre_vsetvl::remove_avl_operand): New.
(pre_vsetvl::remove_unused_dest_operand): New.
(pass_vsetvl::get_vsetvl_at_end): Removed.
(local_avl_compatible_p): Removed.
(pass_vsetvl::local_eliminate_vsetvl_insn): Removed.
(get_first_vsetvl_before_rvv_insns): Removed.
(pass_vsetvl::global_eliminate_vsetvl_insn): Removed.
(pass_vsetvl::ssa_post_optimization): Removed.
(has_no_uses): Removed.
(pass_vsetvl::df_post_optimization): Removed.
(pass_vsetvl::init): Removed.
(pass_vsetvl::done): Removed.
(pass_vsetvl::lazy_vsetvl): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 675 ---
1 file changed, 76 insertions(+), 599 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 5d84d290e9e..ac636623b3f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3791,6 +3791,82 @@ pre_vsetvl::emit_vsetvl ()
 commit_edge_insertions ();
}
 
+void
+pre_vsetvl::cleaup ()
+{
+  remove_avl_operand ();
+  remove_unused_dest_operand ();
+}
+
+void
+pre_vsetvl::remove_avl_operand ()
+{
+  for (const bb_info *bb : crtl->ssa->bbs ())
+for (insn_info *insn : bb->real_nondebug_insns ())
+  {
+ rtx_insn *rinsn = insn->rtl ();
+ /* Erase the AVL operand from the instruction.  */
+ if (!has_vl_op (rinsn) || !REG_P (get_vl (rinsn)))
+   continue;
+ rtx avl = get_vl (rinsn);
+ if (count_regno_occurrences (rinsn, REGNO (avl)) == 1)
+   {
+ /* Get the list of uses for the new instruction.  */
+ auto attempt = crtl->ssa->new_change_attempt ();
+ insn_change change (insn);
+ /* Remove the use of the substituted value.  */
+ access_array_builder uses_builder (attempt);
+ uses_builder.reserve (insn->num_uses () - 1);
+ for (use_info *use : insn->uses ())
+   if (use != find_access (insn->uses (), REGNO (avl)))
+ uses_builder.quick_push (use);
+ use_array new_uses = use_array (uses_builder.finish ());
+ change.new_uses = new_uses;
+ change.move_range = insn->ebb ()->insn_range ();
+ rtx pat;
+ if (fault_first_load_p (rinsn))
+   pat = simplify_replace_rtx (PATTERN (rinsn), avl, const0_rtx);
+ else
+   {
+ rtx set = single_set (rinsn);
+ rtx src = simplify_replace_rtx (SET_SRC (set), avl, const0_rtx);
+ pat = gen_rtx_SET (SET_DEST (set), src);
+   }
+ bool ok = change_insn (crtl->ssa, change, insn, pat);
+ gcc_assert (ok);
+   }
+  }
+}
+
+void
+pre_vsetvl::remove_unused_dest_operand ()
+{
+  df_analyze ();
+  hash_set to_delete;
+  basic_block cfg_bb;
+  rtx_insn *rinsn;
+  FOR_ALL_BB_FN (cfg_bb, cfun)
+{
+  FOR_BB_INSNS (cfg_bb, rinsn)
+ {
+   if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn))
+ {
+   rtx vl = get_vl (rinsn);
+   vsetvl_info info = vsetvl_info (rinsn);
+   if (has_no_uses (cfg_bb, rinsn, REGNO (vl)))
+ {
+   if (!info.has_vlmax_avl ())
+ {
+   rtx new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info,
+ NULL_RTX);
+   validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat,
+false);
+ }
+ }
+ }
+ }
+}
+}
 
const pass_data pass_data_vsetvl = {
   RTL_PASS, /* type */
@@ -3923,602 +3999,3 @@ make_pass_vsetvl (gcc::context *ctxt)
{
   return new pass_vsetvl (ctxt);
}
-
-/* Some instruction can not be accessed in RTL_SSA when we don't re-init
-   the new RTL_SSA framework but it is definetely at the END of the block.
-
-  Here we optimize the VSETVL is hoisted by LCM:
-
-   Before LCM:
- bb 1:
-   vsetvli a5,a2,e32,m1,ta,mu
- bb 2:
-   vsetvli zero,a5,e32,m1,ta,mu
-   ...
-
-   After LCM:
- bb 1:
-   vsetvli a5,a2,e32,m1,ta,mu
-   LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate
- bb 2:
-   ...
-   */
-rtx_insn *
-pass_vsetvl::get_vsetvl_at_end (const bb_info *bb, vector_insn_info *dem) const
-{
-  rtx_insn *end_vsetvl = BB_END (bb->cfg_bb ());
-  if (end_vsetvl && NONDEBUG_INSN_P (end_vsetvl))
-{
-  if (JUMP_P (end_vsetvl))
- end_vsetvl = PREV_INSN (end_vsetvl);
-
-  if (NONDEBUG_INSN_P (end_vsetvl)
-   && vsetvl_discard_result_insn_p (end_vsetvl))
- {
-   /* Only handle single succ. here, multiple succ. is much
-  more complicated.  */
-   if (single_succ_p (bb->cfg_bb ()))
- {
-   edge e = single_succ_edge (bb->cfg_bb ());
-   *dem = get_block_info (e->dest).local_dem;
-   return end_vsetvl;
- }
- }
-}
-  return nullptr;
-}
-
-/* This predicator should only used w

Re: [PATCH V2 12/14] RISC-V: P12: Delete riscv-vsetvl.h

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 12/14] RISC-V: P12: Delete riscv-vsetvl.h
This sub-patch deletes the unused header file riscv-vsetvl.h,
since we no longer need to export any functions.
 
gcc/ChangeLog:
 
* config/riscv/t-riscv: Removed riscv-vsetvl.h
* config/riscv/riscv-vsetvl.h: Removed.
 
---
gcc/config/riscv/riscv-vsetvl.h | 59 -
gcc/config/riscv/t-riscv|  2 +-
2 files changed, 1 insertion(+), 60 deletions(-)
delete mode 100644 gcc/config/riscv/riscv-vsetvl.h
 
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
deleted file mode 100644
index 16c84e0684b..000
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ /dev/null
@@ -1,59 +0,0 @@
-/* VSETVL pass header for RISC-V 'V' Extension for GNU compiler.
-   Copyright (C) 2022-2023 Free Software Foundation, Inc.
-   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 3, or(at your option)
-any later version.
-
-GCC is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
-<http://www.gnu.org/licenses/>.  */
-
-#ifndef GCC_RISCV_VSETVL_H
-#define GCC_RISCV_VSETVL_H
-
-namespace riscv_vector {
-
-/* Classification of vsetvl instruction.  */
-enum vsetvl_type
-{
-  VSETVL_NORMAL,
-  VSETVL_VTYPE_CHANGE_ONLY,
-  VSETVL_DISCARD_RESULT,
-  NUM_VSETVL_TYPE
-};
-
-enum emit_type
-{
-  /* emit_insn directly.  */
-  EMIT_DIRECT,
-  EMIT_BEFORE,
-  EMIT_AFTER,
-};
-
-enum def_type
-{
-  REAL_SET = 1 << 0,
-  PHI_SET = 1 << 1,
-  BB_HEAD_SET = 1 << 2,
-  BB_END_SET = 1 << 3,
-  /* ??? TODO: In RTL_SSA framework, we have REAL_SET,
- PHI_SET, BB_HEAD_SET, BB_END_SET and
- CLOBBER_DEF def_info types. Currently,
- we conservatively do not optimize clobber
- def since we don't see the case that we
- need to optimize it.  */
-  CLOBBER_DEF = 1 << 4
-};
-
-} // namespace riscv_vector
-#endif
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index f137e1f17ef..dd17056fe82 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -64,7 +64,7 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
   $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
   insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
-  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
+  predict.h profile-count.h \
   $(srcdir)/config/riscv/riscv-vsetvl.def
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
--
2.36.3

Re: [PATCH V2 13/14] RISC-V: P13: Reorganize functions used to modify RTL

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 13/14] RISC-V: P13: Reorganize functions used to modify RTL
This sub-patch reorganizes the functions used to modify RTL.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (has_no_uses): Moved.
(validate_change_or_fail): Moved.
(gen_vsetvl_pat): Removed.
(emit_vsetvl_insn): Removed.
(eliminate_insn): Removed.
(change_insn): Removed.
(change_vsetvl_insn): New.
(pre_vsetvl::emit_vsetvl): New.
(pre_vsetvl::remove_avl_operand): Adjust.
(pre_vsetvl::remove_unused_dest_operand): Adjust.
(pass_vsetvl::simple_vsetvl): Adjust.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 443 ---
1 file changed, 176 insertions(+), 267 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d91b0272d9f..78816cbee15 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -680,6 +680,30 @@ get_bb_index (unsigned expr_id, unsigned num_bb)
   return expr_id % num_bb;
}
 
+/* Return true if the SET result is not used by any instructions.  */
+static bool
+has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
+{
+  if (bitmap_bit_p (df_get_live_out (cfg_bb), regno))
+return false;
+
+  rtx_insn *iter;
+  for (iter = NEXT_INSN (rinsn); iter && iter != NEXT_INSN (BB_END (cfg_bb));
+   iter = NEXT_INSN (iter))
+if (df_find_use (iter, regno_reg_rtx[regno]))
+  return false;
+
+  return true;
+}
+
+/* Change insn and Assert the change always happens.  */
+static void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
/* This flags indicates the minimum demand of the vl and vtype values by the
RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
instruction only needs the SEW/LMUL ratio to remain the same, and does not
@@ -1126,6 +1150,28 @@ public:
   }
   }
 
+  /* Returns the corresponding vsetvl rtx pat.  */
+  rtx get_vsetvl_pat (bool ignore_vl = false) const
+  {
+rtx avl = get_avl ();
+/* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
+   set the value of avl to (const_int 0) so that VSETVL PASS will
+   insert vsetvl correctly.*/
+if (!get_avl ())
+  avl = GEN_INT (0);
+rtx sew = gen_int_mode (get_sew (), Pmode);
+rtx vlmul = gen_int_mode (get_vlmul (), Pmode);
+rtx ta = gen_int_mode (get_ta (), Pmode);
+rtx ma = gen_int_mode (get_ma (), Pmode);
+
+if (change_vtype_only_p ())
+  return gen_vsetvl_vtype_change_only (sew, vlmul, ta, ma);
+else if (has_reg_vl () && !ignore_vl)
+  return gen_vsetvl (Pmode, get_vl (), avl, sew, vlmul, ta, ma);
+else
+  return gen_vsetvl_discard_result (Pmode, avl, sew, vlmul, ta, ma);
+  }
+
   bool operator== (const vsetvl_info &other) const
   {
 gcc_assert (!uninit_p () && !other.uninit_p ()
@@ -1938,199 +1984,6 @@ public:
   }
};
 
-/* Emit vsetvl instruction.  */
-static rtx
-gen_vsetvl_pat (enum vsetvl_type insn_type, const vsetvl_info &info, rtx vl)
-{
-  rtx avl = info.get_avl ();
-  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
- set the value of avl to (const_int 0) so that VSETVL PASS will
- insert vsetvl correctly.*/
-  if (!info.get_avl ())
-avl = GEN_INT (0);
-  rtx sew = gen_int_mode (info.get_sew (), Pmode);
-  rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
-  rtx ta = gen_int_mode (info.get_ta (), Pmode);
-  rtx ma = gen_int_mode (info.get_ma (), Pmode);
-
-  if (insn_type == VSETVL_NORMAL)
-{
-  gcc_assert (vl != NULL_RTX);
-  return gen_vsetvl (Pmode, vl, avl, sew, vlmul, ta, ma);
-}
-  else if (insn_type == VSETVL_VTYPE_CHANGE_ONLY)
-return gen_vsetvl_vtype_change_only (sew, vlmul, ta, ma);
-  else
-return gen_vsetvl_discard_result (Pmode, avl, sew, vlmul, ta, ma);
-}
-
-static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vsetvl_info &info, rtx vl = NULL_RTX)
-{
-  rtx new_pat;
-  vsetvl_info new_info = info;
-  /* For vmv.x.s, use 0 for avl.  */
-  if (!info.get_avl ())
-{
-  new_info.set_avl (const0_rtx);
-  new_info.set_avl_def (nullptr);
-}
-
-  if (vl)
-new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
-  else
-{
-  if (vsetvl_insn_p (rinsn) && !info.change_vtype_only_p ())
- new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
-  else if (info.change_vtype_only_p ()
-|| INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
- new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
-  else
- new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, new_info, NULL_RTX);
-}
-  return new_pat;
-}
-
-static void
-emit_vsetvl_insn (enum vsetvl_type insn_type, enum emit_type em

Re: [PATCH V2 14/14] RISC-V: P14: Adjust and add testcases

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:35
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 14/14] RISC-V: P14: Adjust and add testcases
This sub-patch adjusts some testcases and adds some bugfix
testcases.
 
PR target/111037
PR target/111234
PR target/111725
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/scalar_move-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109773-1.c: Adjust.
* gcc.target/riscv/rvv/base/pr111037-1.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-1.c: ...here.
* gcc.target/riscv/rvv/base/pr111037-2.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-2.c: ...here.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-13.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-104.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-105.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-4.c: New test.
 
---
.../gcc.target/riscv/rvv/base/scalar_move-1.c |  2 +-
.../riscv/rvv/vsetvl/avl_single-104.c | 35 +++
.../riscv/rvv/vsetvl/avl_single-105.c | 23 
.../riscv/rvv/vsetvl/avl_single-23.c  |  7 ++--
.../riscv/rvv/vsetvl/avl_single-46.c  |  3 +-
.../riscv/rvv/vsetvl/avl_single-89.c  |  8 ++---
.../riscv/rvv/vsetvl/avl_single-95.c  |  2 +-
.../riscv/rvv/vsetvl/imm_bb_prop-1.c  |  7 ++--
.../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  |  2 +-
.../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  |  2 +-
.../riscv/rvv/{base => vsetvl}/pr111037-1.c   |  0
.../riscv/rvv/{base => vsetvl}/pr111037-2.c   |  0
.../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  | 16 +
.../gcc.target/riscv/rvv/vsetvl/pr111037-4.c  | 16 +
.../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 +++---
.../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 +++---
.../riscv/rvv/vsetvl/vlmax_conflict-12.c  |  1 -
.../riscv/rvv/vsetvl/vlmax_conflict-3.c   |  2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c   |  4 +--
.../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c   |  4 ++-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  2 +-
21 files changed, 125 insertions(+), 31 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c
rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-1.c (100%)
rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-2.c (100%)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
index 18349132a88..c833d8989e9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
@@ -46,8 +46,8 @@ int32_t foo3 (int32_t *base, size_t vl)
** vl1re32\.v\tv[0-9]+,0\([a-x0-9]+\)
** vsetvli\tzero,[a-x0-9]+,e32,m1,t[au],m[au]
** vadd.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
-** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
+** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
** vmv.v.x\tv[0-9]+,\s*[a-x0-9]+
** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
** ret
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
new file mode 100644
index 000..fb3577dcb98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns 
-fno-schedule-insns2 -fno-tree-vectorize" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int cond, int vl, int *in, int *out, int n)
+{
+  if (cond > 30)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 ((int32_t *) in, vl);
+  __riscv_vse32_v_i32m1 ((int32_t *) out, v, vl);
+}
+  else if (cond < 10)
+{
+  vint8mf4_t v = __riscv_vle8_v_i8mf4 ((int8_t *) in, vl);
+  v = __riscv_vle8_v_i8mf4_tu (v, (int8_t *) in + 10, vl);
+  __riscv_vse8_v_i8mf4 ((int8_t *) out, v, vl);
+}
+  else
+{
+  vl = vl * 2;
+}
+
+  for (int i = 0; i

Re: [PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-18 Thread juzhe.zh...@rivai.ai

Please disregard this patch.

The code example in the commit log is wrong; it is fixed in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633420.html

Thanks.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-18 18:21
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
I have confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

However, it generates heavy register spilling.

The root cause is that we didn't hoist the vmv.v.x out of the loop, which
increases the register pressure of the SLP loop.

So, change the CONST_VECTOR move to be split into a vec_duplicate, so that we can gain
better optimizations:

1. Better LICM.
2. More opportunities to transform 'vv' into 'vx' in the future.
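
The effect of hoisting can be pictured at the scalar level. The following is only a minimal sketch, not GCC code: `broadcast`, `and_mask_in_loop`, `and_mask_hoisted` and `LANES` are hypothetical names, with `broadcast` standing in for vmv.v.x. Both functions compute the same result, but the second performs the loop-invariant broadcast once, before the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count of the illustrative "vector".  */
enum { LANES = 8 };

/* Stand-in for a broadcast such as vmv.v.x: fill every lane with X.  */
static void
broadcast (uint16_t *vec, uint16_t x)
{
  for (size_t i = 0; i < LANES; i++)
    vec[i] = x;
}

/* The broadcast is recomputed on every iteration, so its result has to
   stay live next to everything else the loop body needs.  */
void
and_mask_in_loop (uint16_t *idx, size_t n, uint16_t mask)
{
  assert (n % LANES == 0);
  for (size_t i = 0; i < n; i += LANES)
    {
      uint16_t m[LANES];
      broadcast (m, mask);	/* Loop-invariant work inside the loop.  */
      for (size_t j = 0; j < LANES; j++)
	idx[i + j] &= m[j];
    }
}

/* The same computation with the invariant broadcast hoisted.  */
void
and_mask_hoisted (uint16_t *idx, size_t n, uint16_t mask)
{
  assert (n % LANES == 0);
  uint16_t m[LANES];
  broadcast (m, mask);		/* Done once, before the loop.  */
  for (size_t i = 0; i < n; i += LANES)
    for (size_t j = 0; j < LANES; j++)
      idx[i + j] &= m[j];
}
```

Calling both functions on copies of the same array should give identical contents; only the amount of work done inside the loop differs.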
 
Before this patch:
 
f3:
ble a4,zero,.L8
csrr t0,vlenb
slli t1,t0,4
csrr a6,vlenb
sub sp,sp,t1
csrr a5,vlenb
slli a6,a6,3
slli a5,a5,2
add a6,a6,sp
vsetvli a7,zero,e16,m8,ta,ma
slli a4,a4,3
vid.v   v8
addi t6,a5,-1
vand.vi v8,v8,-2
neg t5,a5
vs8r.v  v8,0(sp)
vadd.vi v8,v8,1
vs8r.v  v8,0(a6)
j   .L4
.L12:
vsetvli a7,zero,e16,m8,ta,ma
.L4:
csrr t0,vlenb
slli t0,t0,3
vl8re16.v   v16,0(sp)
add t0,t0,sp
vmv.v.x v8,t6
mv  t1,a4
vand.vv v24,v16,v8
mv  a6,a4
vl8re16.v   v16,0(t0)
vand.vv v8,v16,v8
bleu a4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v20,0(a2)
vle8.v  v16,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v24
vadd.vv v4,v16,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v20,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtu t1,a5,.L12
csrr t0,vlenb
slli t1,t0,4
add sp,sp,t1
jr  ra
.L8:
ret
 
After this patch:
 
bar:
ble a3,zero,.L5
csrr a5,vlenb
csrr t1,vlenb
srli a5,a5,1
srli a7,t1,1
addi a5,a5,-1
vsetvli a4,zero,e32,m2,ta,ma
slli a3,a3,1
vmv.v.x v2,a5
vid.v v18
vmv.v.x v6,a1
vand.vi v10,v18,-2
vand.vi v0,v18,1
vadd.vi v16,v10,1
vmseq.vi v0,v0,1
vand.vv v10,v10,v2
vand.vv v16,v16,v2
slli t1,t1,1
vsetvli zero,a4,e32,m2,ta,ma
neg t3,a7
viota.m v4,v0
vsetvli a4,zero,e32,m2,ta,mu
vmv.v.x v8,a2
vrgather.vv v14,v6,v4
vrgather.vv v12,v8,v4
vmv.v.i v2,0
vrgather.vv v14,v8,v4,v0.t
vrgather.vv v12,v6,v4,v0.t
.L4:
mv a2,a3
mv a5,a3
bleu a3,a7,.L3
mv a5,a7
.L3:
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v6,0(a0)
vsetvli a6,zero,e32,m2,ta,ma
add a3,a3,t3
vrgather.vv v4,v6,v10
vrgather.vv v8,v6,v16
vsub.vv v4,v4,v12
add a0,a0,t1
vsetvli zero,a5,e32,m2,tu,ma
vadd.vv v2,v2,v4
vmacc.vv v2,v14,v8
bgtu a2,a7,.L4
li a5,-1
vsetvli a6,zero,e32,m2,ta,ma
li a4,0
vmv.v.i v4,0
vmul.vx v0,v18,a5
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vv v0,v0,v4
vand.vi v18,v18,1
vmerge.vvm v6,v4,v2,v0
vmseq.vv v18,v18,v4
vmv.s.x v1,a4
vmv1r.v v0,v18
vredsum.vs v6,v6,v1
vmerge.vvm v4,v4,v2,v0
vmv.x.s a0,v6
vredsum.vs v4,v4,v1
vmv.x.s a5,v4
addw a0,a0,a5
ret
.L5:
li a0,0
ret
 
Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c 
execution test
 
These failures are all caused by bugs in the VSETVL pass:
 
   10dd4:   0c707057    vsetvli zero,zero,e8,mf2,ta,ma
   10dd8:   5e06b8d7    vmv.v.i v17,13
   10ddc:   9ed030d7

Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-18 Thread juzhe.zh...@rivai.ai

More details of VSETVL bug:

Loop:
   10ddc:   9ed030d7    vmv1r.v v1,v13
   10de0:   b21040d7    vncvt.x.x.w v1,v1
   10de4:   5e0785d7    vmv.v.v v11,v15
   10de8:   b700a5d7    vmacc.vv v11,v1,v16
   10dec:   a6e8a0d7    vmadd.vv v1,v17,v14
   10df0:   26b7b5d7    vand.vi v11,v11,15
   10df4:   0c75f7d7    vsetvli a5,a1,e8,mf2,ta,ma
   10df8:   0c707557    vsetvli a0,zero,e8,mf2,ta,ma
   10dfc:   2617b0d7    vand.vi v1,v1,15
   10e00:   0c75f057    vsetvli zero,a1,e8,mf2,ta,ma
   10e04:   8d9d        sub a1,a1,a5
   10e06:   020705a7    vse8.v  v11,(a4)
   10e0a:   0c77f057    vsetvli zero,a5,e8,mf2,ta,ma
   10e0e:   020685a7    vse8.v  v11,(a3)
   10e12:   020600a7    vse8.v  v1,(a2)
   10e16:   973e        add a4,a4,a5
   10e18:   0c807557    vsetvli a0,zero,e16,m1,ta,ma
   10e1c:   96be        add a3,a3,a5
   10e1e:   963e        add a2,a2,a5
   10e20:   02d606d7    vadd.vv v13,v13,v12
   10e24:   fdc5        bnez a1,10ddc

The vncvt.x.x.w consumes the e16,m1 VTYPE set by a vsetvli, but it shouldn't; it should use e8,mf2.
This issue is fixed by the recent refactoring patch.
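
To make the difference between the two VTYPEs concrete, here is a small sketch using the RVV relation VLMAX = VLEN / SEW * LMUL. The helper names `vlmax` and `narrowing_source_eew`, and the choice VLEN = 128 in the usage note, are assumptions made for illustration. As far as I read the RVV spec, a narrowing instruction such as vncvt.x.x.w treats SEW as the destination element width and reads source elements of 2 * SEW.

```c
#include <assert.h>

/* VLMAX = VLEN / SEW * LMUL, with LMUL given as the fraction
   LMUL_NUM / LMUL_DEN (e.g. mf2 is 1/2).  */
unsigned
vlmax (unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  assert (sew != 0 && lmul_den != 0);
  return vlen / sew * lmul_num / lmul_den;
}

/* Source element width read by a narrowing operation under SEW.  */
unsigned
narrowing_source_eew (unsigned sew)
{
  return 2 * sew;
}
```

With VLEN = 128, both e16,m1 and e8,mf2 give VLMAX = 8, so the element count is the same under either setting. Under e16,m1, however, the narrowing would read 32-bit elements and write 16-bit ones, whereas the intended e8,mf2 reads 16-bit elements and writes 8-bit ones.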


juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-18 18:25
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
I have confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

However, it generates heavy register spilling.

The root cause is that we didn't hoist the vmv.v.x out of the loop, which
increases the register pressure of the SLP loop.

So, change the CONST_VECTOR move to be split into a vec_duplicate, so that we can gain
better optimizations:

1. Better LICM.
2. More opportunities to transform 'vv' into 'vx' in the future.
 
Before this patch:
 
f3:
ble a4,zero,.L8
csrr t0,vlenb
slli t1,t0,4
csrr a6,vlenb
sub sp,sp,t1
csrr a5,vlenb
slli a6,a6,3
slli a5,a5,2
add a6,a6,sp
vsetvli a7,zero,e16,m8,ta,ma
slli a4,a4,3
vid.v   v8
addi t6,a5,-1
vand.vi v8,v8,-2
neg t5,a5
vs8r.v  v8,0(sp)
vadd.vi v8,v8,1
vs8r.v  v8,0(a6)
j   .L4
.L12:
vsetvli a7,zero,e16,m8,ta,ma
.L4:
csrr t0,vlenb
slli t0,t0,3
vl8re16.v   v16,0(sp)
add t0,t0,sp
vmv.v.x v8,t6
mv  t1,a4
vand.vv v24,v16,v8
mv  a6,a4
vl8re16.v   v16,0(t0)
vand.vv v8,v16,v8
bleu a4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v20,0(a2)
vle8.v  v16,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v24
vadd.vv v4,v16,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v20,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtu t1,a5,.L12
csrr t0,vlenb
slli t1,t0,4
add sp,sp,t1
jr  ra
.L8:
ret
 
After this patch:
 
f3:
ble a4,zero,.L6
csrr a6,vlenb
csrr a5,vlenb
slli a6,a6,2
slli a5,a5,2
addi a6,a6,-1
slli a4,a4,3
neg t5,a5
vsetvli t1,zero,e16,m8,ta,ma
vmv.v.x v24,a6
vid.v v8
vand.vi v8,v8,-2
vadd.vi v16,v8,1
vand.vv v8,v8,v24
vand.vv v16,v16,v24
.L4:
mv t1,a4
mv a6,a4
bleu a4,a5,.L3
mv a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v v28,0(a2)
vle8.v v24,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v28,v8
vadd.vv v4,v24,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v v4,0(a0)
vle8.v v28,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v28,v16
vadd.vv v4,v4,v24
vsetvli zero,a6,e8,m4,ta,ma
vse8.v v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtu t1,a5,.L4
.L6:
ret
 
Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_loa

[PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread juzhe.zh...@rivai.ai

LGTM popcount patch.



juzhe.zh...@rivai.ai

Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-18 Thread juzhe.zh...@rivai.ai

Hi, this patch fixes an issue in V4.

Richard S previously commented:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html 

slp_op and mask_vectype are only initialised when mask_index >= 0.
Shouldn't this code be under mask_index >= 0 too?
Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
node has a known vectype by this point.  I think a comment would be useful.

Since I didn't encounter a mismatched case in the RISC-V and x86 regression
tests, I fixed it in the V4 patch as follows:
+  if (mask_index >= 0 && slp_node)
+   {
+ bool match_p
+   = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
+ gcc_assert (match_p);
+   }
That is, V4 added an assertion here.

However, an ICE appeared today in the RISC-V regression tests:

FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, 
at tree-vect-stmts.cc:9885)
FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)

This happens when mask_vectype is a boolean type and the mask is an
external def. In that case vect_maybe_update_slp_op_vectype returns false.

So in V5 I changed this piece of code to:

+  if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+   {
+ /* We don't vectorize the boolean type external SLP mask.  */
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"incompatible vector types for invariants\n");
+ return false;
+   }

Bootstrap and regression tests on x86 passed.

Thanks.


juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-18 20:36
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes the following FAILs in the RISC-V regression tests:
 
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect 
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect 
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
stmts"
 
The root cause of these FAILs is that GCC fails to SLP MASK_LEN_GATHER_LOAD.

There are two situations in which scalar code is recognized as MASK_LEN_GATHER_LOAD:

1. Conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero,
conditional mask).

   In this situation we just need to leverage the existing MASK_GATHER_LOAD handling, which
can already SLP MASK_LEN_GATHER_LOAD.

2. Unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero,
-1).

   The current SLP check fails on the dummy mask -1, so we relax the check in
tree-vect-slp.cc and allow it to be materialized.
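
As a rough mental model of the two situations, a masked, length-limited gather can be written as a scalar loop. This is only an illustrative sketch, not the GCC internal function: `mask_len_gather_load_model` and its parameters are hypothetical, offsets are plain element indices (scale is ignored), inactive lanes are simply left untouched, and a NULL mask stands in for the all-ones dummy mask -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model of a gather load over NLANES lanes.  Only the
   first LEN lanes are considered; a lane is loaded when MASK is NULL
   (modelling the dummy mask -1) or when MASK[i] is nonzero.  */
void
mask_len_gather_load_model (int32_t *dst, const int32_t *base,
			    const int32_t *offset, size_t nlanes,
			    const uint8_t *mask, size_t len)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < len; i++)
    if (mask == NULL || mask[i])
      dst[i] = base[offset[i]];
}
```

Passing a real mask corresponds to situation 1 (conditional gather), and passing NULL corresponds to situation 2 (unconditional gather), where every lane below the length is loaded.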

Consider the following case:
 
void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
{
  y[i * 2] = x[indices[i * 2]] + 1;
  y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
}
}
 
https://godbolt.org/z/WG3M3n7Mo
 
Before this patch, GCC is unable to SLP and uses VEC_LOAD_LANES/VEC_STORE_LANES instead:
 
f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
 
After this patch:
 
f:
ble a3,zero,.L5
li a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret
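
One detail in the code above is how the two addends (+1 and +2) appear to end up in a single broadcast: "li a5,1; slli a5,a5,33; addi a5,a5,1" builds a 64-bit constant that vmv.v.x then splats with e64. The sketch below reproduces that arithmetic. `packed_addends` and `split_lanes` are hypothetical names, and the lane order assumes a little-endian host, matching little-endian RISC-V.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors "li a5,1; slli a5,a5,33; addi a5,a5,1".  */
uint64_t
packed_addends (void)
{
  uint64_t v = 1;
  v <<= 33;	/* 0x0000000200000000 */
  v += 1;	/* 0x0000000200000001 */
  return v;
}

/* View one 64-bit element as two adjacent 32-bit lanes
   (little-endian host assumed).  */
void
split_lanes (uint64_t v, int32_t lanes[2])
{
  assert (sizeof (uint64_t) == 2 * sizeof (int32_t));
  memcpy (lanes, &v, sizeof v);
}
```

On a little-endian machine the two lanes read back as 1 and 2, so a single e32 vadd.vv adds 1 to the even elements and 2 to the odd elements of the interleaved result.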
 
Note that I found we are missing a conditional-mask gather_load SLP test, so
this patch adds one.

Tested on RISC-V; bootstrap and regression tests on x86 passed.

OK for trunk?
 
gcc/ChangeLog:
 
* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.

Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

2023-05-28 Thread juzhe.zh...@rivai.ai

Ping. OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-26 19:35
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; kito.cheng; pan2.li; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
From: Juzhe-Zhong 
 
This patch supports the FMA auto-vectorization pattern.
1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md that generates incorrect information for the VSETVL
   pass when testing ternop-3.c.
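
For readers less familiar with the two instructions, the sketch below models them at the scalar level as I understand the RVV spec: vmacc overwrites the addend, while vmadd overwrites one multiplicand. The function names `vmacc_model` and `vmadd_model` are hypothetical, and unsigned elements are used only to keep wraparound well defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vmacc.vv vd, vs1, vs2: vd[i] = (vs1[i] * vs2[i]) + vd[i].
   The destination is also the addend.  */
void
vmacc_model (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2,
	     size_t vl)
{
  assert (vd != NULL && vs1 != NULL && vs2 != NULL);
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* vmadd.vv vd, vs1, vs2: vd[i] = (vs1[i] * vd[i]) + vs2[i].
   The destination is also a multiplicand.  */
void
vmadd_model (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2,
	     size_t vl)
{
  assert (vd != NULL && vs1 != NULL && vs2 != NULL);
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vd[i] + vs2[i];
}
```

Both compute a * b + c. Which one avoids an extra move depends on whether the register allocator places the result in the register of c (vmacc) or of a or b (vmadd), which is why the choice is left until after register allocation.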
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_vlmax_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
* config/riscv/vector.md: Fix vimuladd instruction bug.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add ternary tests
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.
 
---
gcc/config/riscv/autovec.md   |  65 +++
gcc/config/riscv/riscv-protos.h   |   2 +
gcc/config/riscv/riscv-v.cc   |  20 
gcc/config/riscv/vector.md|   2 +-
.../riscv/rvv/autovec/ternop/ternop-1.c   |  28 +
.../riscv/rvv/autovec/ternop/ternop-2.c   |  34 ++
.../riscv/rvv/autovec/ternop/ternop-3.c   |  33 ++
.../riscv/rvv/autovec/ternop/ternop_run-1.c   |  84 ++
.../riscv/rvv/autovec/ternop/ternop_run-2.c   | 104 ++
.../riscv/rvv/autovec/ternop/ternop_run-3.c   | 104 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
11 files changed, 477 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..04825df1210 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -373,3 +373,68 @@
 DONE;
   }
)
+
+;; =
+;; == Ternary arithmetic
+;; =
+
+;; -
+;;  [INT] VMACC and VMADD
+;; -
+;; Includes:
+;; - vmacc
+;; - vmadd
+;; -
+
+;; We can't expand FMA for the following reasons:
+;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
+;;The vmacc is the ideal instruction when operands[3] overlaps operands[0].
+;;The vmadd is the ideal instruction when operands[1|2] overlaps 
operands[0].
+;; 2. According to vector.md, the multiply-add patterns has 'merge' operand 
which
+;;is the operands[5]. Since operands[5] should overlap operands[0], this 
operand
+;;should be allocated the same regno as operands[1|2|3].
+;; 3. The 'merge' operand is always a real merge operand and we don't allow 
undefined
+;;operand.
+;; 4. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL 
operand.
+;;
+;; In this situation, we design the codegen of FMA as follows:
+;; 1. clobber a scratch in the expand pattern of FMA.
+;; 2. Let's RA decide which input operand (operands[1|2|3]) overlap 
operands[0].
+;; 3. Generate instructions (vmacc or vmadd) according to the register 
allocation
+;;result after reload_completed.
+(define_expand "fma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+   (plus:VI
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " vr")
+   (match_operand:VI 2 "register_operand" " vr"))
+ (match_operand:VI 3 "register_operand"   " vr")))
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+ (plus:VI
+   (mult:VI
+ (match_operand:VI 1 "register_operand

Re: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

2023-05-28 Thread juzhe.zh...@rivai.ai

This is an existing bug in GCC 13. I think I should split this into two patches.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-05-29 11:17
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; rdapp.gcc; jeffreyalaw; pan2.li
Subject: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
LGTM, but with one question.

On Fri, May 26, 2023 at 7:36 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch support FMA auto-vectorization pattern.
> 1. Let's RA decide vmacc or vmadd.
> 2. Fix bug of vector.md which generate incorrect information to VSETVL
>PASS when testing ternop-3.c.

Does this bug also appear in GCC 13, or is it a new bug introduced on trunk?

Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization

2023-05-28 Thread juzhe.zh...@rivai.ai

This patch fixes a VSETVL pass bug. OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-26 11:01
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li; 
Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization
From: Juzhe-Zhong 
 
Fix bug reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974
 
PR target/109974
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (source_equal_p): Fix ICE.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr109974.c: New test.
 
---
gcc/config/riscv/riscv-vsetvl.cc  | 30 ++-
.../gcc.target/riscv/rvv/vsetvl/pr109974.c| 17 +++
2 files changed, 46 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 9847d649d1d..fe55f4ccd30 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1138,7 +1138,35 @@ source_equal_p (insn_info *insn1, insn_info *insn2)
 return false;
   if (!rtx_equal_p (SET_SRC (single_set1), SET_SRC (single_set2)))
 return false;
-  gcc_assert (insn1->uses ().size () == insn2->uses ().size ());
+  /* RTL_SSA uses include REG_NOTE. Consider this following case:
+
+ insn1 RTL:
+ (insn 41 39 42 4 (set (reg:DI 26 s10 [orig:159 loop_len_46 ] [159])
+   (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201])
+ (reg:DI 14 a4 [276]))) 408 {*umindi3}
+ (expr_list:REG_EQUAL (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201])
+ (const_int 2 [0x2]))
+ (nil)))
+ The RTL_SSA uses of this instruction has 2 uses:
+ 1. (reg:DI 15 a5 [orig:201 _149 ] [201]) - twice.
+ 2. (reg:DI 14 a4 [276]) - once.
+
+ insn2 RTL:
+ (insn 38 353 351 4 (set (reg:DI 27 s11 [orig:160 loop_len_47 ] [160])
+   (umin:DI (reg:DI 15 a5 [orig:199 _146 ] [199])
+ (reg:DI 14 a4 [276]))) 408 {*umindi3}
+ (expr_list:REG_EQUAL (umin:DI (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200])
+ (const_int 2 [0x2]))
+ (nil)))
+  The RTL_SSA uses of this instruction has 3 uses:
+ 1. (reg:DI 15 a5 [orig:199 _146 ] [199]) - once
+ 2. (reg:DI 14 a4 [276]) - once
+ 3. (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200]) - once
+
+  Return false when insn1->uses ().size () != insn2->uses ().size ()
+  */
+  if (insn1->uses ().size () != insn2->uses ().size ())
+return false;
   for (size_t i = 0; i < insn1->uses ().size (); i++)
 if (insn1->uses ()[i] != insn2->uses ()[i])
   return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
new file mode 100644
index 000..06a8562ebab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zbb -mabi=ilp32d --param 
riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include 
+
+void
+func (int8_t *__restrict x, int64_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i++, j +=2 )
+  {
+x[i + 0] += 1;
+y[j + 0] += 1;
+y[j + 1] += 2;
+  }
+}
+
+/* { dg-final { scan-assembler {vsetvli} { target { no-opts "-O0" no-opts 
"-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } 
*/
-- 
2.36.3
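
The shape of the fix, replacing an assertion on equal use counts with an early "not equal" result, can be sketched outside of RTL-SSA as follows. This is a simplified illustration, not the riscv-vsetvl.cc code: `struct use_list` and `uses_equal_p` are hypothetical, and uses are reduced to plain register numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the list of register uses of one insn.  */
struct use_list
{
  const int *regs;
  size_t size;
};

/* Two use lists are equal only if they have the same length and the
   same entries.  A length mismatch is a normal "not equal" answer,
   not an internal error.  */
bool
uses_equal_p (const struct use_list *a, const struct use_list *b)
{
  assert (a != NULL && b != NULL);
  if (a->size != b->size)
    return false;
  for (size_t i = 0; i < a->size; i++)
    if (a->regs[i] != b->regs[i])
      return false;
  return true;
}
```

With the register numbers from the comment in the patch, insn1 uses {15, 14} and insn2 uses {15, 14, 28}, so the comparison returns false instead of tripping an assertion.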

Re: Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization

2023-05-28 Thread juzhe.zh...@rivai.ai

Yes.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-29 12:36
To: juzhe.zh...@rivai.ai
CC: Kito.cheng; Robin Dapp; gcc-patches; jeffreyalaw; palmer; palmer; pan2.li
Subject: Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization
OK. Just to make sure: this only appears on trunk, right?

juzhe.zh...@rivai.ai wrote on Mon, May 29, 2023 at 12:19:
This patch is fixing VSETVL PASS bug. Ok for trunk ?




Re: [PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support

2023-05-29 Thread juzhe.zh...@rivai.ai

Hi, this patch uses the same implementation as FMA, which has been merged.
OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-29 14:53
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support
From: Juzhe-Zhong 
 
Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (fnma4): New pattern.
(*fnma): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.
 
---
gcc/config/riscv/autovec.md   |  45 
.../riscv/rvv/autovec/ternop/ternop-4.c   |  28 +
.../riscv/rvv/autovec/ternop/ternop-5.c   |  34 ++
.../riscv/rvv/autovec/ternop/ternop-6.c   |  33 ++
.../riscv/rvv/autovec/ternop/ternop_run-4.c   |  84 ++
.../riscv/rvv/autovec/ternop/ternop_run-5.c   | 104 ++
.../riscv/rvv/autovec/ternop/ternop_run-6.c   | 104 ++
7 files changed, 432 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index eff3e484fb4..a1028d71467 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -606,3 +606,48 @@
   }
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [INT] VNMSAC and VNMSUB
+;; -
+;; Includes:
+;; - vnmsac
+;; - vnmsub
+;; -
+
+(define_expand "fnma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+   (minus:VI
+ (match_operand:VI 3 "register_operand"   " vr")
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " vr")
+   (match_operand:VI 2 "register_operand" " vr"
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fnma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+ (minus:VI
+   (match_operand:VI 3 "register_operand"   " vr,  0,   vr")
+   (mult:VI
+ (match_operand:VI 1 "register_operand" " %0, vr,   vr")
+ (match_operand:VI 2 "register_operand" " vr, vr,   vr"
+   (clobber (match_scratch:SI 4 "=r,r,r"))]
+  "TARGET_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+PUT_MODE (operands[4], Pmode);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+if (which_alternative == 2)
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
+riscv_vector::RVV_TERNOP, ops, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vimuladd")
+   (set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
new file mode 100644
index 000..22d11de89a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d 
--param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)
\
+  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,
\
+   TYPE *__restrict a,  \
+   TYPE *__restrict b, int n)   \
+  {
\
+for (int i = 0; i < n; i++)
\
+  dst[i] += -(a[i] *

Re: [PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

2023-05-29 Thread juzhe.zh...@rivai.ai

OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-29 12:35
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add floating-point to integer conversion RVV 
auto-vectorization support
From: Juzhe-Zhong 
 
We can't yet support floating-point operations that depend on FRM (for
example, vfadd support is blocked), since the RVV intrinsic doc has not been
updated and we can't support mode switching for them.

However, we can support floating-point to integer conversion now, since it
does not depend on FRM and needs no mode switching support ('rtz'
conversions are independent of FRM).
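
For reference, this is the kind of scalar loop such a pattern is meant to vectorize. The sketch below is only illustrative; `convert_f32_to_i32` is a hypothetical name and is not taken from the test files in this patch. A C cast from float to integer truncates toward zero, which matches the 'rtz' form of the conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar float-to-int conversion loop.  The cast discards the
   fractional part, i.e. it rounds toward zero.  */
void
convert_f32_to_i32 (int32_t *dst, const float *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = (int32_t) src[i];
}
```

For example, 2.7f converts to 2 and -2.7f converts to -2.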
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New 
test.
 
---
gcc/config/riscv/autovec.md   | 23 
gcc/config/riscv/iterators.md |  4 +-
gcc/config/riscv/vector-iterators.md  |  5 ++
.../rvv/autovec/conversions/vfcvt_rtz-run.c   | 52 +++
.../autovec/conversions/vfcvt_rtz-rv32gcv.c   |  6 +++
.../autovec/conversions/vfcvt_rtz-rv64gcv.c   |  6 +++
.../autovec/conversions/vfcvt_rtz-template.h  | 15 ++
7 files changed, 110 insertions(+), 1 deletion(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b24867ae4d0..3989ffb26ee 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -478,6 +478,29 @@
   DONE;
})
+;; =
+;; == Conversions
+;; =
+
+;; -
+;;  [INT<-FP] Conversions
+;; -
+;; Includes:
+;; - vfcvt.rtz.xu.f.v
+;; - vfcvt.rtz.x.f.v
+;; -
+
+(define_expand "2"
+  [(set (match_operand: 0 "register_operand")
+ (any_fix:
+   (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
;; =
;; == Unary arithmetic
;; =
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8afe98e4410..d374a10810c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -225,7 +225,9 @@
(ss_minus "sssub")
(us_minus "ussub")
(sign_extend "extend")
- (zero_extend "zero_extend")])
+ (zero_extend "zero_extend")
+ (fix "fix_trunc")
+ (unsigned_fix "fixuns_trunc")])
;;  code attributes
(define_code_attr or_optab [(ior "ior")
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 70fb5b80b1b..937ec3c7f67 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1208,6 +1208,11 @@
   (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") 
(VNx16DF "VNx16DI")
])
+(define_mode_attr vconvert [
+  (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") 
(VNx16SF "vnx16si") (VNx32SF "vnx32si")
+  (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") 
(VNx16DF "vnx16di")
+])
+
(define_mode_attr VNCONVERT [
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") 
(VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
   (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") 
(VNx16DI "VNx16SF")
diff --git 
a/gcc/testsuite/gcc.target/riscv/r

Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-29 Thread juzhe.zh...@rivai.ai


>> /* Return true if MODE is true VLS mode.  */
>> bool
>> vls_mode_p (machine_mode mode)
>> {
>>   switch (mode)
>> {
>> case E_V4SImode:
>> case E_V2DImode:
>> case E_V8HImode:
>> case E_V16QImode:
>>   return true;
>> default:
>>   return false;
>> }
>> }
To be consistent, you should put these into riscv-vector-switch.def.
That makes the function easier to extend. Rename it to
riscv_v_ext_vls_mode_p and change it like this:
bool
riscv_v_ext_vls_mode_p (machine_mode mode)
{
#define VLS_ENTRY(MODE, REQUIREMENT, ...)                                      \
  case MODE##mode:                                                             \
    return REQUIREMENT;
  switch (mode)
    {
#include "riscv-vector-switch.def"
    default:
      return false;
    }
  return false;
}
Then, in riscv-vector-switch.def:
VLS_ENTRY (V4SI...
VLS_ENTRY (V2DI..
...
In the future, we can add more VLS modes in riscv-vector-switch.def.
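
A minimal, self-contained sketch of the X-macro idea suggested above. It is not the actual GCC code: the enum, the target_min_vlen_ge_128 flag and the VLS_MODE_LIST macro are hypothetical stand-ins (the real code would use machine_mode and #include "riscv-vector-switch.def" instead of an in-file list).

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's machine_mode enumerators.
enum machine_mode
{
  E_V4SImode, E_V2DImode, E_V8HImode, E_V16QImode,
  E_V32QImode, // not listed below, so never a VLS mode here
  E_SImode     // a scalar mode
};

// Hypothetical stand-in for a target requirement such as a minimum VLEN.
static bool target_min_vlen_ge_128 = true;

// Stand-in for the contents of the .def file: one entry per VLS mode.
// Adding a mode only needs one more line in this list.
#define VLS_MODE_LIST(ENTRY)              \
  ENTRY (V4SI, target_min_vlen_ge_128)    \
  ENTRY (V2DI, target_min_vlen_ge_128)    \
  ENTRY (V8HI, target_min_vlen_ge_128)    \
  ENTRY (V16QI, target_min_vlen_ge_128)

// Return true if MODE is an enabled VLS mode.
bool
riscv_v_ext_vls_mode_p (machine_mode mode)
{
  switch (mode)
    {
// Each list entry expands to one case label returning its requirement.
#define VLS_ENTRY(MODE, REQUIREMENT) \
  case E_##MODE##mode:               \
    return REQUIREMENT;
      VLS_MODE_LIST (VLS_ENTRY)
#undef VLS_ENTRY
    default:
      return false;
    }
}
```

The point of the design is that the switch body never changes: the list of modes and their requirements lives in one place.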

>>(define_insn_and_split "3"
>>  [(set (match_operand:VLS 0 "register_operand" "=vr")
>>  (any_int_binop_no_shift:VLS
>>(match_operand:VLS 1 "register_operand" "vr")
>>(match_operand:VLS 2 "register_operand" "vr")))]
>>  "TARGET_VECTOR"
>>  "#"
>>  "reload_completed"
>>  [(const_int 0)]
>>+{
>>  machine_mode vla_mode = riscv_vector::minimal_vla_mode (mode);
>>  riscv_vector::vls_insn_expander (
>>code_for_pred (, vla_mode), riscv_vector::RVV_BINOP,
>>operands, mode, vla_mode);
>>  DONE;
>>})
This pattern works for the current VLS modes since they are within 0~31.
If we add more VLS modes such as V32QImode or V64QImode,
it will not work. I am OK with this, but I want to remind you early.
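
A small sketch of the 0~31 limit. It assumes (this is my reading, not stated in the mail) that the range refers to the 5-bit unsigned AVL immediate of vsetivli, so a fixed element count above 31 has to be passed in a GPR instead. The function name is hypothetical.

```cpp
#include <cassert>

// vsetivli encodes the AVL as a 5-bit unsigned immediate, so 31 is the
// largest element count that can be given without a scalar register.
constexpr unsigned VSETIVLI_MAX_AVL = 31;

// True if a VLS mode with NUNITS elements can use the immediate form.
bool
avl_fits_vsetivli_imm (unsigned nunits)
{
  assert (nunits > 0);
  return nunits <= VSETIVLI_MAX_AVL;
}
```

Under that assumption V16QImode (16 elements) fits, while V32QImode and V64QImode do not.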

>> # VLS test
>>gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vls/*.\[cS\]]] \
>>  "" $CFLAGS
Add tests with -march=rv64gcv_zvl256b to see whether your testcase can
generate an LMUL = mf2 vsetvli, and with -march=rv64gcv_zvl2048b to make sure
your testcase will not go into the VLS modes (2048 * 1 / 8 > 128).
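
The arithmetic behind these two test configurations can be sketched as follows. This is a simplified model under stated assumptions: it only looks for an LMUL whose register group is exactly as wide as the VLS mode at the minimum VLEN, and it ignores SEW/ELEN restrictions on fractional LMUL. The function name is hypothetical.

```cpp
#include <cassert>

// Return the LMUL, counted in eighths (1 = mf8, 2 = mf4, 4 = mf2, 8 = m1,
// ..., 64 = m8), for which MIN_VLEN_BITS * LMUL equals VLS_BITS.
// Return 0 if no LMUL gives an exact match.
unsigned
lmul_eighths_for_vls (unsigned vls_bits, unsigned min_vlen_bits)
{
  assert (min_vlen_bits >= 8 && min_vlen_bits % 8 == 0);
  for (unsigned eighths = 1; eighths <= 64; eighths *= 2)
    if (min_vlen_bits / 8 * eighths == vls_bits)
      return eighths;
  return 0;
}
```

With a 128-bit VLS mode, zvl256b gives 256 * 1/2 = 128, i.e. mf2. With a minimum VLEN of 2048, even mf8 is 2048 * 1/8 = 256 bits, which is already larger than 128, so no exact match exists.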
For the VSETVL part, I didn't see you define the sew/vlmul ...ratio attributes
for VLS modes. I wonder how these VLS modes emit the correct VSETVL?
For example, in vector.md:
(define_attr "sew" ""
  (cond [(eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI,\
VNx1BI,VNx2BI,VNx4BI,VNx8BI,VNx16BI,VNx32BI,VNx64BI,\
VNx128QI,VNx128BI,VNx2x64QI,VNx2x32QI,VNx3x32QI,VNx4x32QI,\
VNx2x16QI,VNx3x16QI,VNx4x16QI,VNx5x16QI,VNx6x16QI,VNx7x16QI,VNx8x16QI,\
VNx2x8QI,VNx3x8QI,VNx4x8QI,VNx5x8QI,VNx6x8QI,VNx7x8QI,VNx8x8QI,\
VNx2x4QI,VNx3x4QI,VNx4x4QI,VNx5x4QI,VNx6x4QI,VNx7x4QI,VNx8x4QI,\
VNx2x2QI,VNx3x2QI,VNx4x2QI,VNx5x2QI,VNx6x2QI,VNx7x2QI,VNx8x2QI,\
VNx2x1QI,VNx3x1QI,VNx4x1QI,VNx5x1QI,VNx6x1QI,VNx7x1QI,VNx8x1QI")
   (const_int 8)
   (eq_attr "mode" "VNx1HI,VNx2HI,VNx4HI,VNx8HI,VNx16HI,VNx32HI,VNx64HI,\
VNx2x32HI,VNx2x16HI,VNx3x16HI,VNx4x16HI,\
VNx2x8HI,VNx3x8HI,VNx4x8HI,VNx5x8HI,VNx6x8HI,VNx7x8HI,VNx8x8HI,\
VNx2x4HI,VNx3x4HI,VNx4x4HI,VNx5x4HI,VNx6x4HI,VNx7x4HI,VNx8x4HI,\
VNx2x2HI,VNx3x2HI,VNx4x2HI,VNx5x2HI,VNx6x2HI,VNx7x2HI,VNx8x2HI,\
VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI")
   (const_int 16)
   (eq_attr "mode" "VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,\
VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,\
VNx2x16SI,VNx2x8SI,VNx3x8SI,VNx4x8SI,\
VNx2x4SI,VNx3x4SI,VNx4x4SI,VNx5x4SI,VNx6x4SI,VNx7x4SI,VNx8x4SI,\
VNx2x2SI,VNx3x2SI,VNx4x2SI,VNx5x2SI,VNx6x2SI,VNx7x2SI,VNx8x2SI,\
VNx2x1SI,VNx3x1SI,VNx4x1SI,VNx5x1SI,VNx6x1SI,VNx7x1SI,VNx8x1SI,\
VNx2x16SF,VNx2x8SF,VNx3x8SF,VNx4x8SF,\
VNx2x4SF,VNx3x4SF,VNx4x4SF,VNx5x4SF,VNx6x4SF,VNx7x4SF,VNx8x4SF,\
VNx2x2SF,VNx3x2SF,VNx4x2SF,VNx5x2SF,VNx6x2SF,VNx7x2SF,VNx8x2SF,\
VNx2x1SF,VNx3x1SF,VNx4x1SF,VNx5x1SF,VNx6x1SF,VNx7x1SF,VNx8x1SF")
   (const_int 32)
   (eq_attr "mode" "VNx1DI,VNx2DI,VNx4DI,VNx8DI,VNx16DI,\
VNx1DF,VNx2DF,VNx4DF,VNx8DF,VNx16DF,\
    VNx2x8DI,VNx2x4DI,VNx3x4DI,VNx4x4DI,\
VNx2x2DI,VNx3x2DI,VNx4x2DI,VNx5x2DI,VNx6x2DI,VNx7x2DI,VNx8x2DI,\
VNx2x1DI,VNx3x1DI,VNx4x1DI,VNx5x1DI,VNx6x1DI,VNx7x1DI,VNx8x1DI,\
VNx2x8DF,VNx2x4DF,VNx3x4DF,VNx4x4DF,\
VNx2x2DF,VNx3x2DF,VNx4x2DF,VNx5x2DF,VNx6x2DF,VNx7x2DF,VNx8x2DF,\
VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF")
   (const_int 64)]
  (const_int INVALID_ATTRIBUTE)))




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-30 14:06
To: gcc-patches; palmer; kito.cheng; juzh

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai

Ok.  LGTM as long as you change the patch as I suggested.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-30 14:51
To: juzhe.zh...@rivai.ai
CC: gcc-patches; palmer; kito.cheng; jeffreyalaw; Robin Dapp; pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >> /* Return true if MODE is true VLS mode.  */
> >> bool
> >> vls_mode_p (machine_mode mode)
> >> {
> >>   switch (mode)
> >> {
> >> case E_V4SImode:
> >> case E_V2DImode:
> >> case E_V8HImode:
> >> case E_V16QImode:
> >>   return true;
> >> default:
> >>   return false;
> >> }
> >> }
>
> To be consistent, you should put these into riscv-vector-switching.def.
> It can make the function easier extend,change it like this:
> change name into riscv_v_ext_vls_mode_p
>
> bool
> riscv_v_ext_vls_mode_p (machine_mode mode)
> {
> #define VLS_ENTRY(MODE, REQUIREMENT, ...) 
>  \
>   case MODE##mode:
>  \
> return REQUIREMENT;
>   switch (mode)
> {
> #include "riscv-vector-switch.def"
> default:
>   return false;
> }
>   return false;
> }
>
> Then in riscv-vector-switch.def
> VLS_ENTRY (V4SI...
> VLS_ENTRY (V2DI..
> ...
> In the future, we extend more VLS modes in riscv-vector-switch.def
 
Good point, we should make this more consistent :)
 
> >>(define_insn_and_split "3"
> >>  [(set (match_operand:VLS 0 "register_operand" "=vr")
> >> (any_int_binop_no_shift:VLS
> >>  (match_operand:VLS 1 "register_operand" "vr")
> >>  (match_operand:VLS 2 "register_operand" "vr")))]
> >>  "TARGET_VECTOR"
> >>  "#"
> >>  "reload_completed"
> >>  [(const_int 0)]
> >>+{
> >>  machine_mode vla_mode = riscv_vector::minimal_vla_mode (mode);
> >>  riscv_vector::vls_insn_expander (
> >>code_for_pred (, vla_mode), riscv_vector::RVV_BINOP,
> >>operands, mode, vla_mode);
> >>  DONE;
> >>})
>
> This pattern can work for current VLS modes so far since they are within 
> 0~31, if we add more VLSmodes such as V32QImode, V64QImode,
> it can't work . I am ok with this, but I should remind you early.
 
Yeah, I know the problem. My thought is that we will have another set of
VLS patterns for those with NUNITS >= 32, which require one GPR clobber.
 
> Add tests with -march=rv64gcv_zvl256b to see whether your testcase can 
> generate LMUL = mf2 vsetvli
>
> and -march=rv64gcv_zvl2048 make sure your testcase will not go into the VLS 
> modes (2048 * 1 / 8 > 128)
 
I guess I should make a loop to test those combinations instead of
separate files with different options.
 
>
>
> For VSETVL part, I didn't see you define attribute sew/vlmul ...ratio for VLS 
> modes.
>
> I wonder how these VLS modes emit correct VSETVL?
 
That's the magic I made here: I split the pattern after RA but before
vsetvli, convert all operands to VLA mode, and use the VLA pattern, so
that we don't need to modify any line of the vsetvli stuff.

Re: [PATCH] VECT: Add SELECT_VL support

2023-05-30 Thread juzhe.zh...@rivai.ai

Hi, this patch passes bootstrap.

Ok for trunk ?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-25 23:26
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong 
 
This patch adds SELECT_VL middle-end support, which
allows a target to apply target-dependent optimization to the
length calculation.
 
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
 
SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:
 
1. Only applies to single-rgroup.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows processing a non-VF number of elements in a non-final iteration.
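
The properties above can be illustrated with a scalar simulation of the loop shape. This is only a sketch: select_vl below is a hypothetical model of .SELECT_VL (loosely following the RVV rule that vl may be ceil(AVL/2) when VLMAX < AVL < 2*VLMAX), and vvadd is a hypothetical name, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of .SELECT_VL: the target may return fewer than VF
// elements in a non-final iteration.  Here the last two iterations are
// balanced when REMAINING lies strictly between VF and 2 * VF.
std::size_t
select_vl (std::size_t remaining, std::size_t vf)
{
  if (remaining > vf && remaining < 2 * vf)
    return (remaining + 1) / 2;
  return std::min (remaining, vf);
}

// z[i] = x[i] + y[i], written in the decrementing-IV style.
// Returns the length chosen in each iteration.
std::vector<std::size_t>
vvadd (const std::vector<int> &x, const std::vector<int> &y,
       std::vector<int> &z, std::size_t vf)
{
  assert (vf > 0 && x.size () == y.size () && x.size () == z.size ());
  std::vector<std::size_t> lens;
  std::size_t remaining = x.size (); // loop control IV, counts down
  std::size_t offset = 0;            // data reference IV, in elements
  while (remaining > 0)
    {
      std::size_t len = select_vl (remaining, vf);
      // Stands in for LEN_LOAD / add / LEN_STORE on LEN elements.
      for (std::size_t i = 0; i < len; i++)
	z[offset + i] = x[offset + i] + y[offset + i];
      offset += len;    // data reference IV advances by the variable LEN
      remaining -= len; // loop control IV decreases by the same LEN
      lens.push_back (len);
    }
  return lens;
}
```

Both IVs are stepped by the value SELECT_VL returned, which is why the pointer increment cannot be a compile-time constant in this scheme.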
 
Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
-_36 = MIN_EXPR ;
+_36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -551,9 +551,14 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   /* Create decrement IV.  */
   create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
insert_after, &index_before_incr, &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
- index_before_incr,
- nitems_step));
+  tree len = NULL_TREE;
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+ len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+ index_before_incr, nitems_step);
+  else
+ len = gimple_build (header_seq, MIN_EXPR, iv_type, index_before_incr,
+ nitems_step);
+  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
   *iv_step = step;
   return index_after_incr;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..f67340976c8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,14 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
+  /* If we're using decrement IV and SELECT_VL is supported by the target.
+ Use output of SELECT_VL to adjust IV of loop control and data reference.
+ Note: We only use SELECT_VL on single-rgroup control.  */
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
+  && !slp)
+LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
+
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
  to be able to handle fewer than VF scalars, or needs to have a lower VF
  than the main loop.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 127b987cd62..8e8b0f71a4a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3147,6 +3147,61 @@ vect_get_data_ptr_increment (vec_info *vinfo,
   return iv_step;
}
+/* Prepare the pointer IVs which needs to be updated by a variable amount.
+   Such variable amount is the outcome of .SELECT_VL. In this case, we can
+   allow each iteration process the flexible number of elements as long as
+   the number <= vf elements.
+
+   Return data reference according to SELECT_VL.
+   If new statements are needed, insert them before GSI.  */
+
+static tree
+get_select_vl_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
+ tree aggr_type, class loop *at_loop, tree offset,
+ tree *dummy, gimple_stmt_iterator *gsi,
+ bool simd_lane_access_p, vec_loop_lens *loop_lens,
+ dr_vec_info *dr_info,
+ vect_memory_access_type memory_access_type)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  tree step = vect_dr_behavior (vinfo, dr_info)->step;
+
+  /* TODO: We don't support gather/scatter or load_lanes/store_lanes for 
pointer
+ IVs are updated by variable amount but we will support them in the future.
+   */
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+   && memory_access_type != VMAT_LOAD_STORE_LANES);
+
+  /* When we support SELECT_VL pattern, we dynamic adjust
+ the memory address by .SELECT_VL result.
+
+ The result of .SELECT_VL is the number of elements to
+ be processed of each iteration. So the memory address
+ adjustment operation should be:
+
+ bytesize = GET_MODE_SIZE (element_mode (aggr_type));
+ addr = addr + .SELECT_VL (ARG..) * bytesize;
+  */
+  gimple *ptr_incr;
+  tree loop_len
+= vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, aggr_type, 0, 0);
+  tree len_type = TREE_TYPE (loop_len);
+  poly_uint64 bytesize = GET_MODE_SIZE (

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai

>> why is the conversion after register allocation always
>> safe?
I worry about this issue too.
I just noticed:

+   case MEM:
+ operands[i] = change_address (operands[i], vla_mode, NULL_RTX);

I am not sure whether it is safe.

>> Couldn't we "lower" the fixed-length vectors to VLA at some point and
>> how does everything relate to fixed-vlmax?

I can explain why we need this patch (I call it fixed-vlmin).
You can take a look at this example:
https://godbolt.org/z/3jYqoM84h 

This is how LLVM works.
In this example, you can see that GCC needs --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv (the same as -mrvv-vector-bits=128).
However, LLVM doesn't need the vector length to be specified.

The benefits:
1. We don't need to specify the actual vector length in order to vectorize
this example.
2. GCC's codegen can only run on a CPU with vector length = 128. However, LLVM's
can run on any RVV CPU with vector length >= 128.
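
Why a fixed-vlmin sequence is portable across vector lengths can be sketched with the usual vl rule. This is a simplified model, assuming LMUL = 1 and the common case vl = min (AVL, VLMAX); vl_for is a hypothetical helper name.

```cpp
#include <cassert>

// Model of the vl produced for a requested AVL at LMUL = 1:
// VLMAX = VLEN / SEW, and vl = AVL whenever AVL <= VLMAX.
unsigned
vl_for (unsigned avl, unsigned vlen_bits, unsigned sew_bits)
{
  assert (sew_bits > 0 && vlen_bits % sew_bits == 0);
  unsigned vlmax = vlen_bits / sew_bits;
  return avl < vlmax ? avl : vlmax;
}
```

A 4 x int32 GNU vector asks for AVL = 4 at SEW = 32. On any CPU with VLEN >= 128, VLMAX is at least 4, so vl is exactly 4 and the same code operates on the same 128 bits. Code generated for a fixed VLMAX, by contrast, bakes the full register width into the program.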

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-30 15:27
To: Kito Cheng; gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; 
pan2.li
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
Hi Kito,
 
> GNU vector extensions is widly used around this world, and this patch
> enable that with RISC-V vector extensions, this can help people
> leverage existing code base with RVV, and also can write vector programs in a
> familiar way.
> 
> The idea of VLS code gen support is emulate VLS operation by VLA operation 
> with
> specific length.
> 
> Key design point is we defer the mode conversion (From VLS to VLA mode) after
> register allocation, it come with several advantages:
> - VLS pattern is much friendly for most optimization pass like combine.
> - Register allocator can spill/restore exact size of VLS type instead of
>   whole register.
> 
> This is compatible with VLA vectorization.
> 
> Only support move and binary part of operation patterns.
 
On a high-level:  Why do we need to do it this way and not any other way? :)
Some more comments/explanations would definitely help, i.e. prior art on
aarch64, what exactly is easier for combine and friends now (no undef and so
on) and, importantly, why is the conversion after register allocation always
safe?  Couldn't we "lower" the fixed-length vectors to VLA at some point and
how does everything relate to fixed-vlmax? Essentially this is a "separate"
backend similar to ARM NEON but we share most of the things and possibly grow
it in the future?
 
What would the alternative be?
 
That said, couldn't we reuse the existing binop tests?  If you don't like them,
change the existing ones as well and reuse them?
 
> +/* Return the minimal containable VLA mode for MODE.  */
> +
> +machine_mode
> +minimal_vla_mode (machine_mode mode)
> +{
> +  gcc_assert (GET_MODE_NUNITS (mode).is_constant ());
> +  unsigned type_size = GET_MODE_NUNITS (mode).to_constant ();
 
Couldn't you use .require () right away?  Same in some other hunks.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai

Hi, Richi.

>> but ideally the user would be able to specify -mrvv-size=32 for an
>> implementation with 32 byte vectors and then vector lowering would make use
>> of vectors up to 32 bytes?

Actually, we don't want to have to specify -mrvv-size=32 to enable
vectorization on GNU vectors.
You can take a look at this example:
https://godbolt.org/z/3jYqoM84h

GCC needs the mrvv size to be specified to enable GNU vectors, and the codegen
can only run on a CPU with vector-length = 128 bits.
However, LLVM doesn't need the vector length to be specified, and the codegen
can run on any CPU with RVV vector-length >= 128 bits.

This is what this patch wants to do.

Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-05-30 15:13
To: Kito Cheng
CC: gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; rdapp.gcc; 
pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 8:07 AM Kito Cheng via Gcc-patches
 wrote:
>
> GNU vector extensions is widly used around this world, and this patch
> enable that with RISC-V vector extensions, this can help people
> leverage existing code base with RVV, and also can write vector programs in a
> familiar way.
>
> The idea of VLS code gen support is emulate VLS operation by VLA operation 
> with
> specific length.

In the patch you added fixed 16 bytes vector modes, correct?  I've never
looked at how ARM deals with the GNU vector extensions but I suppose they
get mapped to NEON and not SVE so basically behave the same way here.

But I do wonder about the efficiency for RVV where there doesn't exist a
complementary fixed-length ISA.  Shouldn't vector lowering
(tree-vect-generic.cc) be enhanced to support lowering fixed-length vectors
to variable length ones with (variable) fixed length instead?  From your
patch I second-guess the RVV specification requires 16 byte vectors to be
available (or will your patch split the insns?) but ideally the user would
be able to specify -mrvv-size=32 for an implementation with 32 byte vectors
and then vector lowering would make use of vectors up to 32 bytes?

Also vector lowering will split smaller vectors not equal to the fixed size to
scalars unless you add all fixed length modes smaller than 16 bytes as well.

> Key design point is we defer the mode conversion (From VLS to VLA mode) after
> register allocation, it come with several advantages:
> - VLS pattern is much friendly for most optimization pass like combine.
> - Register allocator can spill/restore exact size of VLS type instead of
>   whole register.
>
> This is compatible with VLA vectorization.
>
> Only support move and binary part of operation patterns.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def: Introduce VLS modes.
> * config/riscv/riscv-protos.h (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::vls_mode_p): New.
> * config/riscv/riscv-v.cc (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_mode_p): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::update_vls_mode): New.
> * config/riscv/riscv.cc (riscv_v_ext_mode_p): New.
> (riscv_v_adjust_nunits): Handle VLS type.
> (riscv_hard_regno_nregs): Ditto.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_regmode_natural_size): Ditto.
> * config/riscv/vector-iterators.md (VLS): New.
> (VM): Handle VLS type.
> (vel): Ditto.
> * config/riscv/vector.md: Include vector-vls.md.
> * config/riscv/vector-vls.md: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add vls folder.
> * gcc.target/riscv/rvv/vls/binop-template.h: New test.
> * gcc.target/riscv/rvv/vls/binop-v.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/move-template.h: New test.
> * gcc.target/riscv/rvv/vls/move-v.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-template.h: New test.
> * gcc.target/riscv/rvv/vls/load-store-v.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/vls-types.h: New test.
> ---
>  gcc/config/riscv/riscv-modes.def  |  3 +
>  gcc/config/riscv/riscv-protos.h   |  4 ++
>  gcc/config/riscv/riscv-v.cc   | 67 +++
>  gcc/config/riscv/riscv.cc | 27 +++-
&g

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai

In the future, we will definitely mix VLA and VLS-vlmin together in
codegen, and it will not cause any issues.
For VLS-vlmin, I prefer that it is used in length-style auto-vectorization
(I am not sure, since my SELECT_VL patch is not finished; I will check
whether it can work while I am working on the SELECT_VL patch).

>> In general I don't have a good overview of which optimizations we gain by
>> such an approach or rather which ones are prevented by VLA altogether?
The VLS modes in these patches can help SLP auto-vectorization.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-30 17:05
To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
>>> but ideally the user would be able to specify -mrvv-size=32 for an
>>> implementation with 32 byte vectors and then vector lowering would make use
>>> of vectors up to 32 bytes?
> 
> Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on 
> GNU vectors.
> You can take a look this example:
> https://godbolt.org/z/3jYqoM84h
> 
> GCC need to specify the mrvv size to enable GNU vectors and the codegen only 
> can run on CPU with vector-length = 128bit.
> However, LLVM doesn't need to specify the vector length, and the codegen can 
> run on any CPU with RVV  vector-length >= 128 bits.
> 
> This is what this patch want to do.
> 
> Thanks.
I think Richard's question was rather if it wasn't better to do it more
generically and lower vectors to what either the current cpu or what the
user specified rather than just 16-byte vectors (i.e. indeed a fixed
vlmin and not a fixed vlmin == fixed vlmax).
 
This patch assumes everything is fixed for optimization purposes and then
switches over to variable-length when nothing can be changed anymore.  That
is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
We would need to make sure that no pass after reload makes use of VLA
properties at all.
 
In general I don't have a good overview of which optimizations we gain by
such an approach or rather which ones are prevented by VLA altogether?
What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
what we would have for pure VLA?
 
Regards
Robin

Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai

Ok.

It seems that for this condition:

+  /* If we're vectorizing a loop that uses length "controls" and
+ can iterate more than once, we apply decrementing IV approach
+ in loop control.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
+  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+   LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
+LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;

I should add direct_supported_p (SELECT_VL...) to this; is that right?

I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.

Let's wait for more comments from Richard.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:22
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decremnt IV patch create fails on PowerPC
On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi. Thanks for your analysis and helps.
> 
> >> We could simply retain the original
> >> incrementing IV for loop control and add the decrementing
> >> IV for computing LEN in addition to that and leave IVOPTs
> >> sorting out to eventually merge them (or not).
> 
> I am not sure how to do that. Could you give me more informations?
> 
> I somehow understand your concern is that variable amount of IV will make
> IVOPT fails. 
> 
> I have seen similar situation in LLVM (when apply variable IV,
> they failed to interleave the vectorize code). I am not sure whether they
> are the same reason for that.
> 
> For RVV, we not only want decrement IV style in vectorization but also
> we want to apply SELECT_VL in single-rgroup which is most happen cases (LLVM 
> also only apply get_vector_length in single vector length).
>
> >>You can do some testing with a cross compiler, alternatively
> >>there are powerpc machines in the GCC compile farm.
> 
> It seems that Power is ok with decrement IV since most cases are improved.
 
Well, but Power never will have SELECT_VL so at least for !SELECT_VL
targets you should avoid having an IV with variable decrement.  As
I said it should be easy to rewrite decrement IV to use a constant
increment (when not using SELECT_VL) and testing the pre-decrement
value in the exit test.
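 
A scalar sketch of the two loop-control shapes discussed here, for the case without SELECT_VL. Both function names are hypothetical, and this only models the IV arithmetic, not the vectorizer's actual output. Variant A decrements the IV by the variable length; variant B uses a constant step of VF and tests the pre-decrement value at the exit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variant A: the IV is decremented by LEN, which varies per iteration.
std::vector<std::size_t>
lens_variable_decrement (std::size_t n, std::size_t vf)
{
  assert (n > 0 && vf > 0);
  std::vector<std::size_t> lens;
  std::size_t iv = n;
  do
    {
      std::size_t len = iv < vf ? iv : vf; // MIN_EXPR <iv, vf>
      lens.push_back (len);
      iv -= len; // step depends on LEN
    }
  while (iv != 0);
  return lens;
}

// Variant B: the IV is decremented by the constant VF, and the exit test
// looks at the value the IV had before the decrement.
std::vector<std::size_t>
lens_constant_decrement (std::size_t n, std::size_t vf)
{
  assert (n > 0 && vf > 0);
  std::vector<std::size_t> lens;
  std::size_t iv = n;
  for (;;)
    {
      std::size_t before = iv;
      lens.push_back (before < vf ? before : vf); // same MIN_EXPR
      iv -= vf; // constant step; may wrap in the last iteration,
		// but the wrapped value is never read
      if (before <= vf)
	break;
    }
  return lens;
}
```

Both variants yield the same sequence of lengths; only variant B keeps the IV a simple affine function of the iteration count, which is the property IVOPTs relies on.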
 
Richard.
> I think Richard may help to explain decrement IV more clearly.
> 
> Thanks
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-26 14:46
> To: ???
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: decremnt IV patch create fails on PowerPC
> On Fri, 26 May 2023, ??? wrote:
>  
> > Yesterday's patch has been approved (decremnt IV support):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > 
> > However, it creates fails on PowerPC:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > 
> > I am really sorry for causing inconvinience.
> > 
> > I wonder as we disccussed:
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > This conditions can not disable decrement IV on PowerPC.
> > Should I add a target hook for it?
>  
> No.  I've put some analysis in the PR.  To me the question is
> why (without that SELECT_VL case) we need a decrementing IV
> _for the loop control_?  We could simply retain the original
> incrementing IV for loop control and add the decrementing
> IV for computing LEN in addition to that and leave IVOPTs
> sorting out to eventually merge them (or not).
>  
> Alternatively avoid the variable decrement as I wrote in the
> PR and do the exit test based on the previous IV value.
>  
> But as said all this won't work for the SELECT_VL case, but
> then it's availability is something to key off rather than a
> new target hook?
>  
> > The patch I can only do bootstrap and regression on X86.
> > I didn't have an environment to test PowerPC. I am really sorry.
>  
> You can do some testing with a cross compiler, alternatively
> there are powerpc machines in the GCC compile farm.
>  
> Richard.
>  
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai

>> For the future it would be then good to have the vectorizer
>>re-vectorize loops with
>>VLS vector uses to VLA style?
Not really. This patch just uses a trick to convert VLS vectors into VLA style,
since that avoids defining the RVV patterns with VLS modes and saves a lot of
work.

There is no benefit in converting VLS into VLA,
and I don't even consider it safe,

especially this code:
+   case MEM: 
+ operands[i] = change_address (operands[i], vla_mode, NULL_RTX); 

I feel it is unsafe code.

Actually, my original plan was to define new RVV patterns with the new VLS modes
(the patterns are the same as the VLA patterns; only the modes are different),
and then emit these VLS RVV patterns during codegen.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; 
pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai
 wrote:
>
> In the future, we will definitely mixing VLA and VLS-vlmin together in a 
> codegen and it will not cause any issues.
> For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am 
> not sure since my SELECT_VL patch is not
> finished, I will check if can work when I am working in SELECT_VL patch).
 
For the future it would be then good to have the vectorizer re-vectorize
loops with VLS vector uses to VLA style?  I think there's a PR with a draft
patch from a few years ago attached (from me) somewhere.  Currently the
vectorizer will give up when seeing vector operations in a loop but ideally
those should simply be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches VLS modes can help for SLP auto-vectorization.
>
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make 
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization 
> > on GNU vectors.
> > You can take a look this example:
> > https://godbolt.org/z/3jYqoM84h
> >
> > GCC need to specify the mrvv size to enable GNU vectors and the codegen 
> > only can run on CPU with vector-length = 128bit.
> > However, LLVM doesn't need to specify the vector length, and the codegen 
> > can run on any CPU with RVV  vector-length >= 128 bits.
> >
> > This is what this patch want to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
> Robin
>

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai

I think I would prefer to handle VLS modes like this.
These are the current VLA patterns:
(define_insn "@pred_"
  [(set (match_operand:VI 0 "register_operand"   "=vd, vd, vr, vr, vd, 
vd, vr, vr, vd, vd, vr, vr")
  (if_then_else:VI
(unspec:
  [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1, Wc1, vm, 
vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
   (match_operand 5 "vector_length_operand"" rK, rK, rK,  rK, rK, rK, 
rK, rK, rK, rK, rK, rK")
   (match_operand 6 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (match_operand 7 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (match_operand 8 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (reg:SI VL_REGNUM)
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
(any_int_binop:VI
  (match_operand:VI 3 "" "")
  (match_operand:VI 4 "" ""))
(match_operand:VI 2 "vector_merge_operand" 
"vu,0,vu,0,vu,0,vu,0,vu,0,vu,0")))]
  "TARGET_VECTOR"
  "@
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1"
  [(set_attr "type" "")
   (set_attr "mode" "")])

(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])

As you can see, there are no VLS modes in "VI". To support VLS, I think we 
should extend the "VI" iterator:
(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
V4SI V2DI V8HI V16QI
])

Then we can generate code directly for these VLS patterns without any conversion.
This is the safe way to deal with VLS patterns.
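
For context, the four modes appended to the iterator (V4SI, V2DI, V8HI, V16QI) are all 16 bytes wide, which is the size of a GNU vector type declared with vector_size (16). The following is only a minimal sketch of such source code, assuming a compiler with GNU vector extensions (GCC or Clang); the names v4si and add_v4si are made up for illustration, and whether a given RISC-V GCC maps the type to a V4SI mode depends on the target options discussed in this thread.

```c
#include <assert.h>

/* Hypothetical 16-byte GNU vector type holding four 32-bit ints.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise add on fixed-length vectors; this is the kind of
   operation that vector lowering or a VLS pattern would handle.  */
static v4si
add_v4si (v4si a, v4si b)
{
  return a + b;
}
```

Individual lanes can be read back with ordinary subscripts, e.g. add_v4si (a, b)[0].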

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; 
pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai wrote:
>
> In the future, we will definitely be mixing VLA and VLS-vlmin together in 
> codegen, and it will not cause any issues.
> For VLS-vlmin, I would prefer that it be used in length-style 
> auto-vectorization (I am not sure yet since my SELECT_VL patch is not
> finished; I will check whether it can work when I am working on the
> SELECT_VL patch).
 
For the future it would be then good to have the vectorizer re-vectorize
loops with VLS vector uses to VLA style?  I think there's a PR with a draft
patch from a few years ago attached (from me) somewhere.  Currently the
vectorizer will give up when seeing vector operations in a loop but ideally
those should simply be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> The VLS modes in these patches can help SLP auto-vectorization.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic V

Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai

>> No, since powerpc is fine with decrementing VL it should also use it.
>> Instead you should make sure to produce SCEV analyzable IVs when
>> possible (when SELECT_VL is not or cannot be used).
OK. Would you mind giving me some guidance on how to rewrite the decrement IV?
I am not familiar with SCEV, and I am not sure how to make the decrement IV 
analyzable by SCEV.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decremnt IV patch create fails on PowerPC
On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Ok.
> 
> It seems that for this condition:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> +     can iterate more than once, we apply decrementing IV approach
> +     in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +      && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +      && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +      && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +           && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> +                        LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> +    LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> I should add direct_supportted_p (SELECT_VL...) to this, is that right?
 
No, since powerpc is fine with decrementing VL it should also use it.
Instead you should make sure to produce SCEV analyzable IVs when
possible (when SELECT_VL is not or cannot be used).
 
Richard.
 
> I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.
> 
> Let's wait for more comments from Richard.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 17:22
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: Re: decremnt IV patch create fails on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi. Thanks for your analysis and help.
> > 
> > >> We could simply retain the original
> > >> incrementing IV for loop control and add the decrementing
> > >> IV for computing LEN in addition to that and leave IVOPTs
> > >> sorting out to eventually merge them (or not).
> > 
> > I am not sure how to do that. Could you give me more information?
> > 
> > I think I understand your concern: an IV with a variable step will make
> > IVOPTs fail.
> > 
> > I have seen a similar situation in LLVM (when applying a variable IV,
> > they failed to interleave the vectorized code). I am not sure whether the
> > cause is the same.
> > 
> > For RVV, we not only want the decrement IV style in vectorization but also
> > want to apply SELECT_VL for a single rgroup, which is the most common case
> > (LLVM also only applies get_vector_length for a single vector length).
> >
> > >>You can do some testing with a cross compiler, alternatively
> > >>there are powerpc machines in the GCC compile farm.
> > 
> > It seems that Power is OK with the decrement IV since most cases are improved.
>  
> Well, but Power never will have SELECT_VL so at least for !SELECT_VL
> targets you should avoid having an IV with variable decrement.  As
> I said it should be easy to rewrite decrement IV to use a constant
> increment (when not using SELECT_VL) and testing the pre-decrement
> value in the exit test.
>  
> Richard.
> > I think Richard may help to explain decrement IV more clearly.
> > 
> > Thanks
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-05-26 14:46
> > To: ???
> > CC: gcc-patches; richard.sandiford; linkw
> > Subject: Re: decremnt IV patch create fails on PowerPC
> > On Fri, 26 May 2023, ??? wrote:
> >  
> > > > Yesterday's patch has been approved (decrement IV support):
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > > > 
> > > > However, it causes failures on PowerPC:
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > > > 
> > > > I am really sorry for causing inconvenience.
> > > > 
> > > > I wonder, as we discussed:
> > > +  /* If we're vectorizing a loop that uses length "controls" and
> > > + can iterate more than once, we apply decrementing IV approach
> > > + in loop control.  */
> > > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > &

Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai

>> No, I said the current scheme does sth along

>> do {
>>    remain -= MIN (vf, remain);
>> } while (remain != 0);

>> and I suggest to instead do

>> do {
>>    old_remain = remain;
>>    len = MIN (vf, remain);
>>    remain -= vf;
>> } while (old_remain >= vf);

>> basically since only the last iteration will have len < vf we can
>> ignore that remain -= vf will underflow there if we appropriately
>> rewrite the exit test to use the pre-decrement value.

Oh, I understand now. I will definitely give it a try and send a patch.

Thank you so much.

By the way, could you take a look at the SELECT_VL patch?
I guess you want to defer it to Richard, and I will wait, but I still think your 
comments are very important.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 18:00
To: Kewen.Lin
CC: juzhe.zh...@rivai.ai; gcc-patches; richard.sandiford
Subject: Re: decremnt IV patch create fails on PowerPC
On Tue, 30 May 2023, Kewen.Lin wrote:
 
> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> > Ok.
> > 
> > It seems that for this condition:
> > 
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > +     can iterate more than once, we apply decrementing IV approach
> > +     in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +      && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +      && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +      && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +           && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > +                        LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> > +    LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > 
> > I should add direct_supportted_p (SELECT_VL...) to this, is that right?
> 
> I guess no, with this condition any targets without SELECT_VL are unable
> to leverage the new decrement scheme for lengths, and per your reply in
> PR109971 you didn't mean to disable it.  IIUC, what Richi suggested is to
> introduce one new IV just like the previous one which has non-variable step,
> then it's SCEV-ed and some analysis based on it can do a good job.
 
No, I said the current scheme does sth along
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
and I suggest to instead do
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
basically since only the last iteration will have len < vf we can
ignore that remain -= vf will underflow there if we appropriately
rewrite the exit test to use the pre-decrement value.
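
The two loop shapes above can be compared with a small standalone C sketch. This is only an illustration under assumptions, not vectorizer code: the function names are made up, `remain` and `vf` are modelled as plain unsigned integers with vf > 0, and each routine records the `len` it would use per iteration. Note that the sketch uses a strict `old_remain > vf` exit test, which matches the `bgtu` branch in the RVV assembly posted elsewhere in this thread; with `>=` as written in the pseudo-code, an input that is an exact multiple of vf would run one extra iteration with len == 0.

```c
#include <assert.h>
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Current scheme: the IV is decremented by MIN (vf, remain), so the
   step is not a loop invariant.  Records each len and returns the
   number of iterations.  */
static size_t
variable_step_lens (unsigned remain, unsigned vf, unsigned *lens, size_t max)
{
  size_t n = 0;
  do
    {
      unsigned len = MIN (vf, remain);
      if (n < max)
        lens[n] = len;
      n++;
      remain -= len;
    }
  while (remain != 0);
  return n;
}

/* Suggested scheme: the IV is decremented by the constant vf and the
   exit test looks at the pre-decrement value.  The final decrement may
   wrap around, which is well defined for unsigned and never observed.  */
static size_t
constant_step_lens (unsigned remain, unsigned vf, unsigned *lens, size_t max)
{
  size_t n = 0;
  unsigned old_remain;
  do
    {
      old_remain = remain;
      unsigned len = MIN (vf, remain);
      if (n < max)
        lens[n] = len;
      n++;
      remain -= vf;
    }
  while (old_remain > vf);
  return n;
}

/* Returns 1 if both schemes produce the same sequence of lengths.  */
static int
same_lens (unsigned remain, unsigned vf)
{
  unsigned a[64], b[64];
  size_t na = variable_step_lens (remain, vf, a, 64);
  size_t nb = constant_step_lens (remain, vf, b, 64);
  if (na != nb || na > 64)
    return 0;
  for (size_t i = 0; i < na; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

For example, same_lens (58, 16) compares the two schemes on the trip count used by the Power testcase discussed in this thread.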
 
> Since this is mainly for targets without SELECT_VL capability, I can follow
> up this if you don't mind.
> 
> BR,
> Kewen
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai

Hi, Richi.
I have sent a patch that follows your suggestion and changes the decrement IV 
flow as follows:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html 

It works well on RVV.

Could you take a look at it?
If it's OK, I will send the SELECT_VL patch based on this.

Thanks.


juzhe.zh...@rivai.ai
 

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai

Before this patch:
foo:
ble a2,zero,.L5
csrr a3,vlenb
srli a4,a3,2
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a0)
vsetvli t1,zero,e32,m1,ta,ma
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a3
add a0,a0,a3
sub a2,a2,a5
bne a2,zero,.L3
.L5:
ret

After this patch:

foo:
ble a2,zero,.L5
csrr a3,vlenb
srli a4,a3,2
neg a7,a4   -->>>additional instruction
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a0)
vsetvli t1,zero,e32,m1,ta,ma
mv a6,a2  -->>>additional instruction
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a3
add a0,a0,a3
add a2,a2,a7
bgtu a6,a4,.L3
.L5:
ret

There is 1 more instruction in the preheader and 1 more instruction in the loop.
But I think it's OK for RVV since we will definitely be using SELECT_VL, so this 
issue will go away,
as long as this flow is better for Power (SCEV).



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Following Richi's suggestion, I change the current decrement IV flow from:
>
> do {
>    remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>    old_remain = remain;
>    len = MIN (vf, remain);
>    remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> All RVV tests (decrement IV) pass.
 
How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.
 
Richard

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai

>> How does it affect RVV code quality?  I thought you specifically chose
>> the previous approach because code quality was better that way.
Yes, the previous way is better for RVV.  But as I said, we will definitely use 
SELECT_VL, and with SELECT_VL we will be using remain - step (where step is 
produced by SELECT_VL).



juzhe.zh...@rivai.ai
 

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai

Hi, all. I have posted several of my investigations:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 

It turns out that when "niters is a constant value and vf is a constant value",
this patch allows SCEV/IVOPTS to optimize a lot for RVV too (take the testcase 
from IBM's testsuite for example), and I think this patch can fix IBM's cunroll 
issue.
Even though it will produce a 'mv' instruction in some other cases for RVV, I 
think Gain > Pain overall.

Actually, for the current flow:

step = MIN ()
...
remain = remain - step.

I don't know how difficult it would be to extend SCEV/IVOPTS to fix this issue.
So, could you make a decision on this patch?

I wonder whether we should apply the approach of this patch (the code can be 
refined after review) or
whether we should extend SCEV/IVOPTS.

Thanks. 


juzhe.zh...@rivai.ai
 
From: 钟居哲
Date: 2023-05-30 23:05
To: rguenther
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
More information on Power's testcase:

Before this patch:
test_npeel_int16_t:
lui a4,%hi(.LANCHOR0+130)
lui a3,%hi(.LANCHOR1)
addi a3,a3,%lo(.LANCHOR1)
addi a4,a4,%lo(.LANCHOR0+130)
li a5,58
li a2,16
vsetivli zero,16,e16,m1,ta,ma
vl1re16.v v3,0(a3)
vid.v v1
.L5:
minu a3,a5,a2
vsetvli zero,a3,e16,m1,ta,ma
sub a5,a5,a3
vse16.v v1,0(a4)
vsetivli zero,16,e16,m1,ta,ma
addi a4,a4,32
vadd.vv v1,v1,v3
bne a5,zero,.L5
ret

After this patch:
test_npeel_int16_t:
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
li a1,16
vsetivli zero,16,e16,m1,ta,ma
addi a2,a5,130
vid.v v1
addi a3,a5,162
vadd.vx v4,v1,a1
addi a4,a5,194
li a1,32
vadd.vx v3,v1,a1
vse16.v v1,0(a2)
vse16.v v4,0(a3)
vse16.v v3,0(a4)
addi a5,a5,226
li a1,48
vadd.vx v2,v1,a1
vsetivli zero,10,e16,m1,ta,ma
vse16.v v2,0(a5)
ret

Clearly, Power's testcase could not be unrolled on the RVV side before, but 
after this patch it can be.
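
The numbers in the unrolled output can be reproduced with plain arithmetic once both the iteration count and the step are constants. The sketch below is only an illustration, not what cunroll or SCEV do internally; the struct and function names are made up, and the values 58 (niters), 16 (vf), 2 (bytes per int16_t element) and 130 (starting byte offset) are read off the assembly above.

```c
#include <assert.h>

/* One fully unrolled vector store: where it starts and how many
   elements it covers.  */
struct chunk
{
  unsigned byte_offset;
  unsigned len;
};

/* With a constant step vf the trip count is ceil (niters / vf), and
   every iteration except possibly the last uses len == vf.  Fills OUT
   with up to MAX chunks and returns the trip count.  */
static unsigned
plan_unrolled_stores (unsigned niters, unsigned vf, unsigned elt_size,
                      unsigned base_offset, struct chunk *out, unsigned max)
{
  unsigned trips = (niters + vf - 1) / vf;
  for (unsigned i = 0; i < trips && i < max; i++)
    {
      unsigned done = i * vf;
      out[i].byte_offset = base_offset + done * elt_size;
      out[i].len = niters - done < vf ? niters - done : vf;
    }
  return trips;
}
```

Calling plan_unrolled_stores (58, 16, 2, 130, ...) gives four chunks at byte offsets 130, 162, 194 and 226, the last one with len 10, which lines up with the four vse16.v stores and the final vsetivli zero,10 in the assembly.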


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate as many "mov" instructions inside the loop as there
> are rgroups. This is unacceptable. For example, if the number of rgroups is 3,
> there will be 3 more instructions in the loop. If this patch is necessary, I
> think I should find a way to fix it.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
 
> Replied Message
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zh...@rivai.ai"  writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> >   sub   a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in preheader and 1 more instruction in loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL so
> this issue will gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai


Hi, Richi.

>> Note with SELECT_VL all bets will be off since as I understand the
>> value it gives can vary from iteration to iteration (but we know
>> a lower and maybe an upper bound?)
Yes, on the RVV side the SELECT_VL output can be anywhere in the range 
[ceil(avl/2), vlmax], depending on the hardware implementation.
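
For reference, the range quoted above is the middle case of the constraints that, as I read the RVV specification, apply to the vl chosen by vsetvli: vl equals AVL when AVL <= VLMAX, vl equals VLMAX when AVL >= 2 * VLMAX, and in between the hardware may pick anything from ceil(AVL/2) to VLMAX. A minimal sketch of a checker for these rules follows; the function name is made up and it does not model any GCC internal function.

```c
#include <assert.h>

/* Returns 1 if VL is a value the hardware is allowed to choose for
   the given application vector length AVL and maximum VLMAX.  */
static int
rvv_vl_is_legal (unsigned long avl, unsigned long vlmax, unsigned long vl)
{
  if (avl <= vlmax)
    return vl == avl;              /* Everything fits in one vector.  */
  if (avl >= 2 * vlmax)
    return vl == vlmax;            /* Far from the tail: full vectors.  */
  /* VLMAX < AVL < 2 * VLMAX: the implementation-defined window.  */
  unsigned long lo = (avl + 1) / 2;
  return lo <= vl && vl <= vlmax;
}
```

With vlmax = 16 and avl = 20, both 10 and 16 are legal results, which is why an IV stepped by the SELECT_VL result cannot have a step known at compile time.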

>> So I think we should patch this up in the vectorizer itself like with
>> your patch.  I'm going to wait for Richards input though since he
>> seems to disagree.

According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971, 
Kewen is happy with this patch; it turns out that it fixes Power's issue.
So, let's wait for Richard's comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-31 14:41
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, all. I have posted my several investigations:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 
> 
> Turns out when "niters is a constant value and vf is a constant value"
> This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take tesecase 
> from IBM's testsuite for example) and I think this patch can fix IBM's 
> cunroll issue.
> Even though it will produce a 'mv' instruction in some ohter cases for RVV, I 
> think Gain > Pain overal.
> 
> Actually, for current flow:
> 
> step = MIN ()
> ...
> remain = remain - step.
> 
> I don't know how difficult to extend SCEV/IVOPTS to fix this issue.
> So, could you make a decision for this patch?
> 
> I wonder whether we should apply the approach of this patch (the codes can be 
> refined after well reviewed) or
> we should extend SCEV/IVOPTS ?
 
I don't think we can do anything in SCEV for this which means we'd
need to special-case this in niter analysis, in IVOPTs and any other
passes that might be affected (and not fixed by handling it in niter
analysis).  While improving niter analysis would be good (the user
could write this pattern as well) I do not have time to try
implementing that (I have no idea how ugly or robust it is going to be).
 
So I think we should patch this up in the vectorizer itself like with
your patch.  I'm going to wait for Richards input though since he
seems to disagree.
 
Note with SELECT_VL all bets will be off since as I understand the
value it gives can vary from iteration to iteration (but we know
a lower and maybe an upper bound?)
 
Thanks,
Richard.
 
> Thanks. 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: 钟居哲
> Date: 2023-05-30 23:05
> To: rguenther
> CC: richard.sandiford; gcc-patches; linkw
> Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
> More information of power's testcase:
> 
> Before this patch:
> test_npeel_int16_t:
> lui a4,%hi(.LANCHOR0+130)
> lui a3,%hi(.LANCHOR1)
> addi a3,a3,%lo(.LANCHOR1)
> addi a4,a4,%lo(.LANCHOR0+130)
> li a5,58
> li a2,16
> vsetivli zero,16,e16,m1,ta,ma
> vl1re16.v v3,0(a3)
> vid.v v1
> .L5:
> minu a3,a5,a2
> vsetvli zero,a3,e16,m1,ta,ma
> sub a5,a5,a3
> vse16.v v1,0(a4)
> vsetivli zero,16,e16,m1,ta,ma
> addi a4,a4,32
> vadd.vv v1,v1,v3
> bne a5,zero,.L5
> ret
> 
> After this patch:
> test_npeel_int16_t:
> lui a5,%hi(.LANCHOR0)
> addi a5,a5,%lo(.LANCHOR0)
> li a1,16
> vsetivli zero,16,e16,m1,ta,ma
> addi a2,a5,130
> vid.v v1
> addi a3,a5,162
> vadd.vx v4,v1,a1
> addi a4,a5,194
> li a1,32
> vadd.vx v3,v1,a1
> vse16.v v1,0(a2)
> vse16.v v4,0(a3)
> vse16.v v3,0(a4)
> addi a5,a5,226
> li a1,48
> vadd.vx v2,v1,a1
> vsetivli zero,10,e16,m1,ta,ma
> vse16.v v2,0(a5)
> ret
> 
> It's obvious, previously, power's testcase in RVV side can not unroll, but 
> after this patch, in RVV side, it can unroll now.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 20:33
> To: juzhe.zhong
> CC: Richard Sandiford; gcc-patches; linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> On Tue, 30 May 2023, juzhe.zhong wrote:
>  
> > This patch will generate as many "mov" instructions inside the loop as
> > there are rgroups. This is unacceptable. For example, if the number of
> > rgroups is 3, there will be 3 more instructions in the loop. If this patch
> > is necessary, I think I should find a way to fix it.
>  
> That's odd, you only need to adjust the IV which is used in the exit test,
> not all the others.
>  
> >  Replied Message 
> > From
>

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Hi, Richard.

>> I don't object though.  It just feels like we're giving up easily.
>> And that's a bit frustrating, since this potential problem was flagged
>> ahead of time.

I can take a look at it. Would you mind giving me some hints?
In which pass should I do this? The "ivopts" pass?
Is it right that we can enhance the analysis when we see a statement such as
remain = remain - step, where step comes from a MIN_EXPR (remain, vf)?
Then what do we need to do?
 
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-31 15:28
To: Richard Biener
CC: juzhe.zhong@rivai.ai; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
Richard Biener  writes:
> On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>
>> Hi, all. I have posted my several investigations:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 
>> 
>> Turns out when "niters is a constant value and vf is a constant value"
>> This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take tesecase 
>> from IBM's testsuite for example) and I think this patch can fix IBM's 
>> cunroll issue.
>> Even though it will produce a 'mv' instruction in some ohter cases for RVV, 
>> I think Gain > Pain overal.
>> 
>> Actually, for current flow:
>> 
>> step = MIN ()
>> ...
>> remain = remain - step.
>> 
>> I don't know how difficult to extend SCEV/IVOPTS to fix this issue.
>> So, could you make a decision for this patch?
>> 
>> I wonder whether we should apply the approach of this patch (the codes can 
>> be refined after well reviewed) or
>> we should extend SCEV/IVOPTS ?
>
> I don't think we can do anything in SCEV for this which means we'd
> need to special-case this in niter analysis, in IVOPTs and any other
> passes that might be affected (and not fixed by handling it in niter
> analysis).  While improving niter analysis would be good (the user
> could write this pattern as well) I do not have time to try
> implementing that (I have no idea how ugly or robust it is going to be).
>
> So I think we should patch this up in the vectorizer itself like with
> your patch.  I'm going to wait for Richards input though since he
> seems to disagree.
 
I think my main disagreement is that the IV phi can be analysed
as a SCEV with sufficient work (realising that the MIN result is
always VF when the latch is executed).  That SCEV might be useful
“as is” for things like IVOPTS, without specific work in those passes.
(Although perhaps not too useful, since most other IVs will be upcounting.)
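
The observation that the MIN result is always VF when the latch is executed can be checked with a small standalone C sketch. This is only an illustration of the property, assuming unsigned values and vf > 0; the function name is made up and nothing here reflects how SCEV would actually be taught about it.

```c
#include <assert.h>

/* Runs the variable-step loop and returns 1 if, on every iteration
   that takes the back edge (remain still nonzero after the decrement),
   the step was exactly vf.  Only the final, exiting iteration may use
   a smaller step.  */
static int
min_is_vf_on_latch (unsigned remain, unsigned vf)
{
  do
    {
      unsigned step = remain < vf ? remain : vf;
      remain -= step;
      /* The latch is executed only when remain != 0 here.  */
      if (remain != 0 && step != vf)
        return 0;
    }
  while (remain != 0);
  return 1;
}
```

The reasoning is short: if remain is still nonzero after the subtraction, then step was smaller than the old remain, so MIN (vf, remain) must have picked vf. Seen from the latch, the IV therefore evolves by a constant -vf each time.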
 
I don't object though.  It just feels like we're giving up easily.
And that's a bit frustrating, since this potential problem was flagged
ahead of time.
 
> Note with SELECT_VL all bets will be off since as I understand the
> value it gives can vary from iteration to iteration (but we know
> a lower and maybe an upper bound?)
 
Right.  All IVs will have a variable step for SELECT_VL.
 
Thanks,
Richard

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai


>> I'm just saying that to go forward the vectorizer change looks
>>more promising (also considering the pace RISC-V people are working at
>>...)

Yeah, RVV needs a lot of middle-end support:
SELECT_VL, LEN_MASK_LOAD/LEN_MASK_STORE, etc.

LEN_ADD for RVV reduction support, like COND_ADD for ARM SVE, etc.

SELECT_VL is still pending.

Without support in the middle-end, GCC cannot support powerful auto-vectorization 
(performance will be much worse than RVV LLVM).
And unfortunately, I am the only person working on middle-end support for RVV 
auto-vectorization. :)

I think we can merge this patch and record the SCEV enhancement in 
bugzilla to see whether we can improve that in the future.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-31 15:38
To: Richard Sandiford
CC: juzhe.zh...@rivai.ai; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Wed, 31 May 2023, Richard Sandiford wrote:
 
> Richard Biener  writes:
> > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> Hi, all. I have posted my several investigations:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 
> >> 
> >> It turns out that when "niters is a constant value and vf is a constant
> >> value", this patch allows SCEV/IVOPTS to optimize a lot for RVV too (take
> >> the testcase from IBM's testsuite for example), and I think this patch can
> >> fix IBM's cunroll issue.
> >> Even though it will produce a 'mv' instruction in some other cases for
> >> RVV, I think Gain > Pain overall.
> >> 
> >> Actually, for current flow:
> >> 
> >> step = MIN ()
> >> ...
> >> remain = remain - step.
> >> 
> >> I don't know how difficult it is to extend SCEV/IVOPTS to fix this issue.
> >> So, could you make a decision on this patch?
> >> 
> >> I wonder whether we should apply the approach of this patch (the code can
> >> be refined after review) or extend SCEV/IVOPTS?
> >
> > I don't think we can do anything in SCEV for this which means we'd
> > need to special-case this in niter analysis, in IVOPTs and any other
> > passes that might be affected (and not fixed by handling it in niter
> > analysis).  While improving niter analysis would be good (the user
> > could write this pattern as well) I do not have time to try
> > implementing that (I have no idea how ugly or robust it is going to be).
> >
> > So I think we should patch this up in the vectorizer itself like with
> > your patch.  I'm going to wait for Richard's input though since he
> > seems to disagree.
> 
> I think my main disagreement is that the IV phi can be analysed
> as a SCEV with sufficient work (realising that the MIN result is
> always VF when the latch is executed).  That SCEV might be useful
> “as is” for things like IVOPTS, without specific work in those passes.
> (Although perhaps not too useful, since most other IVs will be upcounting.)
 
I think we'd need another API for SCEV there then,
analyze_scalar_evolution_for_latch () so we can disregard the
value on the exit edges then.  That means we'd still need to touch
all users and decide whether it's safe to use that or not.
 
> I don't object though.  It just feels like we're giving up easily.
> And that's a bit frustrating, since this potential problem was flagged
> ahead of time.
 
Well, I expect that massaging SCEV and niter analysis will take
up quite some developer time while avoiding the situation in
the vectorizer is possible (and would fix the observed regressions).
We can always improve later here and I'd suggest to file an
enhancement bugreport with a simple C testcase using this kind of
iteration.
 
I'm just saying that to go forward the vectorizer change looks
more promising (also considering the pace RISC-V people are working at 
...)
 
Richard.
 
> > Note with SELECT_VL all bets will be off since as I understand the
> > value it gives can vary from iteration to iteration (but we know
> > a lower and maybe an upper bound?)
> 
> Right.  All IVs will have a variable step for SELECT_VL.
> 
> Thanks,
> Richard
>

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Oh, that is the correct fix. Thanks for catching this.




juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-05-31 15:38
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; rguenther
Subject: Re: [PATCH] VECT: Change flow of decrement IV
> Hi, Richi.
> 
>>> Note with SELECT_VL all bets will be off since as I understand the
>>> value it gives can vary from iteration to iteration (but we know
>>> a lower and maybe an upper bound?)
> Yes, on the RVV side, the SELECT_VL output can be in the range
> [ceil(avl/2), vlmax]; it can be any value in that range, depending on the
> hardware implementation.
> 
>>> So I think we should patch this up in the vectorizer itself like with
>>> your patch.  I'm going to wait for Richard's input though since he
>>> seems to disagree.
> 
> According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971,
> Kewen is happy with this patch; it turns out this patch can fix Power's issue.
 
Yeah, the exposed degradation and failures can be fixed by this patch.
I'd expect both approaches (this patch or extending niter analysis and
others) should work for the exposed issues.
 
A new finding is that my SPEC2017 rerun with this patch exposed some
verification failures.  I ran a regression test on Power10, and it showed
a few failures too (mainly from Fortran).  After looking into one of them
(case gfortran.dg/array_alloc_2.f90), I think the patch needs to choose
the comparison code according to exit_edge->flags, like:
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ef28711c58f..5d518460b6d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -892,8 +892,9 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
 {
   gcc_assert (compare_step);
-  cond_stmt = gimple_build_cond (GT_EXPR, test_ctrl, compare_step,
-  NULL_TREE, NULL_TREE);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 }
   else
 
I'm running regression testing again based on this adjustment, will see
if it can fix all exposed failures.
 
BR,
Kewen
 
> So, Let's wait for Richard's comments.
> 
> Thanks.
> ------
> juzhe.zh...@rivai.ai
> 
>  
> *From:* Richard Biener <mailto:rguent...@suse.de>
> *Date:* 2023-05-31 14:41
> *To:* juzhe.zh...@rivai.ai <mailto:juzhe.zh...@rivai.ai>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; gcc-patches 
> <mailto:gcc-patches@gcc.gnu.org>; linkw <mailto:li...@linux.ibm.com>
> *Subject:* Re: Re: [PATCH] VECT: Change flow of decrement IV
> On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, all. I have posted several investigations:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html
> >
> > It turns out that when "niters is a constant value and vf is a constant
> > value", this patch allows SCEV/IVOPTS to optimize a lot for RVV too (take
> > the testcase from IBM's testsuite for example), and I think this patch can
> > fix IBM's cunroll issue.
> > Even though it will produce a 'mv' instruction in some other cases for
> > RVV, I think Gain > Pain overall.
> >
> > Actually, for current flow:
> >
> > step = MIN ()
> > ...
> > remain = remain - step.
> >
> > I

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Thanks Richard.
It seems that this patch's approach is OK for trunk?
Maybe the only thing left to do is wait for Kewen's testing feedback, am I
right?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-31 17:01
To: Richard Biener via Gcc-patches
CC: Richard Biener; juzhe.zhong\@rivai.ai; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
Richard Biener via Gcc-patches  writes:
> On Wed, 31 May 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>> >
>> >> Hi, all. I have posted several investigations:
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 
>> >> 
>> >> It turns out that when "niters is a constant value and vf is a constant
>> >> value", this patch allows SCEV/IVOPTS to optimize a lot for RVV too (take
>> >> the testcase from IBM's testsuite for example), and I think this patch
>> >> can fix IBM's cunroll issue.
>> >> Even though it will produce a 'mv' instruction in some other cases for
>> >> RVV, I think Gain > Pain overall.
>> >> 
>> >> Actually, for current flow:
>> >> 
>> >> step = MIN ()
>> >> ...
>> >> remain = remain - step.
>> >> 
>> >> I don't know how difficult it is to extend SCEV/IVOPTS to fix this issue.
>> >> So, could you make a decision on this patch?
>> >> 
>> >> I wonder whether we should apply the approach of this patch (the code
>> >> can be refined after review) or extend SCEV/IVOPTS?
>> >
>> > I don't think we can do anything in SCEV for this which means we'd
>> > need to special-case this in niter analysis, in IVOPTs and any other
>> > passes that might be affected (and not fixed by handling it in niter
>> > analysis).  While improving niter analysis would be good (the user
>> > could write this pattern as well) I do not have time to try
>> > implementing that (I have no idea how ugly or robust it is going to be).
>> >
>> > So I think we should patch this up in the vectorizer itself like with
>> > your patch.  I'm going to wait for Richard's input though since he
>> > seems to disagree.
>> 
>> I think my main disagreement is that the IV phi can be analysed
>> as a SCEV with sufficient work (realising that the MIN result is
>> always VF when the latch is executed).  That SCEV might be useful
>> “as is” for things like IVOPTS, without specific work in those passes.
>> (Although perhaps not too useful, since most other IVs will be upcounting.)
>
> I think we'd need another API for SCEV there then,
> analyze_scalar_evolution_for_latch () so we can disregard the
> value on the exit edges then.  That means we'd still need to touch
> all users and decide whether it's safe to use that or not.
 
I'd expect the phi for the IV with the constant step to have the same
value as the phi for the IV with a MIN step.  I realise that the phi
isn't the thing that matters for niters, but I'd expect IVOPTS to
consider both the phi and the adjusted value to be candidates.  Only the
phi can be a candidate with the MIN step, but it feels like it should
still be a candidate, even with current interfaces.
 
You know this stuff much better than I do though, so I'm almost
certainly oversimplifying/overlooking things.
 
Like I say, I don't object to the vectoriser change, so please
don't go down a rabbit hole on my account. :)
 
Thanks,
Richard

Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Bootstrapped and regression-tested on x86 with no surprising differences.

Looking forward to Kewen's test report for this patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-31 23:08
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V2] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong 
 
Following Richi's suggestion, I changed the current decrement IV flow from:
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
into:
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
to enhance SCEV.
 
Includes fixes from Kewen.
 
 
This patch will need to wait for Kewen's test feedback.
 
Testing on x86 is ongoing.
 
Co-Authored by: Kewen Lin  
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
 
---
gcc/tree-vect-loop-manip.cc | 36 +---
1 file changed, 25 insertions(+), 11 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
gimple_stmt_iterator loop_cond_gsi,
rgroup_controls *rgc, tree niters,
tree niters_skip, bool might_wrap_p,
- tree *iv_step)
+ tree *iv_step, tree *compare_step)
{
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-ivtmp_35 = ivtmp_9 - _36;
+ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-if (ivtmp_35 != 0)
+if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
- insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+ &incr_gsi, insert_after, &index_before_incr,
+ &index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
&preheader_seq, &header_seq,
loop_cond_gsi, rgc, niters,
niters_skip, might_wrap_p,
- &iv_step);
+ &iv_step, &compare_step);
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
- NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+ = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3

Re: Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Thanks, Kewen.
I have sent the V3 patch. Could you comment on it?
I want to make sure you support that patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-06-01 12:32
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V2] VECT: Change flow of decrement IV
Hi Juzhe,
 
on 2023/6/1 08:31, juzhe.zh...@rivai.ai wrote:
> Bootstrapped and regression-tested on x86 with no surprising differences.
> 
> Looking forward to Kewen's test report for this patch.
> 
 
This patch can be bootstrapped and regress-tested on
powerpc64-linux-gnu P9 and powerpc64le-linux-gnu P9/P10.
 
Also SPEC2017 int/fp bmks build and run successfully
with it on powerpc64le-linux-gnu P10 (with an explicit
parameter --param=vect-partial-vector-usage=2).
 
It can fix the 510.parest_r -5% degradation, and it sped up
525.x264_r +1%, 521.wrf_r +2.03%, 544.nab_r +1.27% and
549.fotonik3d_r +3.22%, but it degraded 503.bwaves_r -4%.  We have
some heuristics on load and load pct. for 503.bwaves_r on Power, and
I suspect it's related.  Considering that vect-partial-vector-usage=2
isn't the default on Power, and that this patch fixes the exposed failures
and the parest_r degradation, I think the bwaves_r degradation should not
block this.  For the bwaves_r degradation, I'll have a further look later
and open a PR if it's an actual issue rather than just costing heuristics
having no effect.
 
btw, it would be better to add one PR marker line to associate
this with PR109971, something like:
 
PR tree-optimization/109971
 
Thanks!
 
BR,
Kewen
 
> Thanks.
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>
> *Date:* 2023-05-31 23:08
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; rguenther 
> <mailto:rguent...@suse.de>; linkw <mailto:li...@linux.ibm.com>; Ju-Zhe Zhong 
> <mailto:juzhe.zh...@rivai.ai>
> *Subject:* [PATCH V2] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Following Richi's suggestion, I changed the current decrement IV flow from:
>  
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
> gcc/ChangeLog:
>  
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B

Re: [PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

This patch is no different from V2.
It just adds PR tree-optimization/109971, as Kewen suggested.

Already bootstrapped and regression-tested on x86 with no differences.

OK for trunk?


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-01 12:36
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong 
 
Following Richi's suggestion, I changed the current decrement IV flow from:
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
into:
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
to enhance SCEV.
 
Includes fixes from Kewen.
 
 
This patch will need to wait for Kewen's test feedback.
 
Testing on x86 is ongoing.
 
Co-Authored by: Kewen Lin  
 
  PR tree-optimization/109971
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
 
---
gcc/tree-vect-loop-manip.cc | 36 +---
1 file changed, 25 insertions(+), 11 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
gimple_stmt_iterator loop_cond_gsi,
rgroup_controls *rgc, tree niters,
tree niters_skip, bool might_wrap_p,
- tree *iv_step)
+ tree *iv_step, tree *compare_step)
{
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-ivtmp_35 = ivtmp_9 - _36;
+ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-if (ivtmp_35 != 0)
+if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
- insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+ &incr_gsi, insert_after, &index_before_incr,
+ &index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
&preheader_seq, &header_seq,
loop_cond_gsi, rgc, niters,
niters_skip, might_wrap_p,
- &iv_step);
+ &iv_step, &compare_step);
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
- NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+ = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3

Re: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

2023-06-01 Thread juzhe.zh...@rivai.ai

I plan to implement BF16 vector support in GCC, but I am still waiting for the
ISA to be ratified, since GCC policy doesn't allow un-ratified ISAs.

Currently, we are working on INT8, INT16, INT32, INT64, FP16, FP32 and FP64
auto-vectorization.
It should be very simple to add BF16 to the current vector framework in GCC.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-06-01 14:57
To: juzhe.zh...@rivai.ai
Subject: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 
in riscv like x86_64 and arm.
FYI.
 
-Original Message-
From: Gcc-patches  On Behalf 
Of Jin Ma via Gcc-patches
Sent: Thursday, June 1, 2023 2:51 PM
To: gcc-patches@gcc.gnu.org
Cc: shi...@iscas.ac.cn; kito.ch...@gmail.com; Jin Ma 
Subject: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in 
riscv like x86_64 and arm.
 
hi, 
 
Are there any new developments on Zfb? Are there any plans to implement the
Zvfbfmin and Zvfbfwma extensions? I see that Zfb is being reviewed in LLVM;
maybe we should do the same in GCC.
 
Ref: https://reviews.llvm.org/D151313
 https://reviews.llvm.org/D150929

Re: [PATCH] RISC-V: Introduce vfloat16m{f}*_t and their machine mode.

2023-06-01 Thread juzhe.zh...@rivai.ai

LGTM. 

We are waiting for FP16 vector support to start floating-point auto-vectorization.

Thanks so much.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-01 15:17
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH] RISC-V: Introduce vfloat16m{f}*_t and their machine mode.
From: Pan Li 
 
This patch would like to introduce the built-in type vfloat16m{f}*_t, as
well as their machine mode VNx*HF. They depend on architecture zvfhmin
or zvfh.
 
When zvfhmin or zvfh is given, the macro TARGET_VECTOR_ELEN_FP_16 will
be true.
 
The underlying PATCH will implement the zvfhmin extension based on this.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
and zvfh.
* config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
(main): Disable FP16 tuple.
* config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
(TARGET_VECTOR_ELEN_FP_16): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add FP16.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16):
New macro.
* config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16
machine mode based on TARGET_VECTOR_ELEN_FP_16.
---
gcc/common/config/riscv/riscv-common.cc|  2 ++
gcc/config/riscv/genrvv-type-indexer.cc|  7 +--
gcc/config/riscv/riscv-opts.h  |  4 
gcc/config/riscv/riscv-vector-builtins.cc  |  2 ++
gcc/config/riscv/riscv-vector-builtins.def | 20 +++
gcc/config/riscv/riscv-vector-builtins.h   |  1 +
gcc/config/riscv/riscv-vector-switch.def   | 23 ++
7 files changed, 49 insertions(+), 10 deletions(-)
 
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index e6ed3df9ea6..3247d526c0a 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1248,6 +1248,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zve64x",   &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
   {"zve64f",   &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_32},
   {"zve64d",   &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_64},
+  {"zvfhmin",  &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_16},
+  {"zvfh", &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_16},
   {"zvl32b",&gcc_options::x_riscv_zvl_flags, MASK_ZVL32B},
   {"zvl64b",&gcc_options::x_riscv_zvl_flags, MASK_ZVL64B},
diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 18e1b375396..8fc93ceaab4 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -54,7 +54,7 @@ valid_type (unsigned sew, int lmul_log2, bool float_p)
 case 8:
   return lmul_log2 >= -3 && !float_p;
 case 16:
-  return lmul_log2 >= -2 && !float_p;
+  return lmul_log2 >= -2;
 case 32:
   return lmul_log2 >= -1;
 case 64:
@@ -73,6 +73,9 @@ valid_type (unsigned sew, int lmul_log2, unsigned nf, bool 
float_p)
   if (nf > 8 || nf < 1)
 return false;
+  if (sew == 16 && nf != 1 && float_p) // Disable FP16 tuple in temporarily.
+return false;
+
   switch (lmul_log2)
 {
 case 1:
@@ -342,7 +345,7 @@ main (int argc, const char **argv)
fprintf (fp, ")\n");
  }
   // Build for vfloat
-  for (unsigned sew : {32, 64})
+  for (unsigned sew : {16, 32, 64})
 for (int lmul_log2 : {-3, -2, -1, 0, 1, 2, 3})
   for (unsigned nf : {1, 2, 3, 4, 5, 6, 7, 8})
{
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 5f387d0e393..208a557b8ff 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -154,6 +154,8 @@ enum riscv_entity
#define MASK_VECTOR_ELEN_64(1 << 1)
#define MASK_VECTOR_ELEN_FP_32 (1 << 2)
#define MASK_VECTOR_ELEN_FP_64 (1 << 3)
+/* Align the bit index to riscv-vector-builtins.h.  */
+#define MASK_VECTOR_ELEN_FP_16 (1 << 6)
#define TARGET_VECTOR_ELEN_32 \
   ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_32) != 0)
@@ -163,6 +165,8 @@ enum riscv_entity
   ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_32) != 0)
#define TARGET_VECTOR_ELEN_FP_64 \
   ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_64) != 0)
+#define TARGET_VECTOR_ELEN_FP_16 \
+  ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_16) != 0)
#define MASK_ZVL32B(1 <<  0)
#define MASK_ZVL64B(1 <<  1)
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 9fea70709fd..43bf6d8f262 100644
--- a/gcc

Re: Re: [PATCH V3] VECT: Change flow of decrement IV

2023-06-01 Thread juzhe.zh...@rivai.ai

Thanks Kewen. Let's wait for Richard and Richi.



juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-06-01 13:24
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V3] VECT: Change flow of decrement IV
Hi,
 
on 2023/6/1 13:00, juzhe.zh...@rivai.ai wrote:
> This patch is no different from V2.
 
I support this patch based on the testing and SPEC2017 evaluation
results on Power (see my comments on patch v2).
 
> It just adds PR tree-optimization/109971, as Kewen suggested.
 
Thanks for adding that.  I was expecting you to add it when
committing, not really requesting a new version. :)  By the way,
the PR marker(s) will trigger scripts that post some commit info
(commit link, commit log) to the specified PR(s), so people can
easily find the connections between PRs and the commits that fix
them or move them forward.
 
BR,
Kewen
 
> 
> Already bootstrapped and regression-tested on x86 with no differences.
> 
> OK for trunk?
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>
> *Date:* 2023-06-01 12:36
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; rguenther 
> <mailto:rguent...@suse.de>; linkw <mailto:li...@linux.ibm.com>; Ju-Zhe Zhong 
> <mailto:juzhe.zh...@rivai.ai>
> *Subject:* [PATCH V3] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
>   PR tree-optimization/109971
>  
> gcc/ChangeLog:
>  
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
>...
> -ivtmp_35 = ivtmp_9 - _36;
> +ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
>...
> -if (ivtmp_35 != 0)
> +if (ivtmp_9 > POLY_INT_CST [4, 4])
>  goto ; [83.33%]
>else
>  goto ; [16.67%]
> @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>tree step = rgc->controls.length () == 1 ? rgc->controls[0]
>: make_ssa_name (iv

Re: [PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-01 Thread juzhe.zh...@rivai.ai

Hi, please forget about this patch.
Just go directly to the V2 patch with the same title.

That's the last patch I fine-tuned for integer widening auto-vectorization.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-01 15:31
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
From: Juzhe-Zhong 
 
This patch enhances the combine optimizations for vwmul.vv.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int16_t *__restrict dst4,
  int8_t *__restrict a, int8_t *__restrict b,
  int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] * (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
  dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
 
In such a complicated case, an operand is not single-use; it is used by multiple
statements.
The GCC combine pass iterates over the combinations of the operands.
 
First round -> combine one of the operands and change vsext + vmul into vwmul.wv
Second round -> combine the other operand and change vwmul.wv into vwmul.vv
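
The two rounds can be modelled on a single element in scalar C. This is a sketch only: the function names are hypothetical, and the code models the arithmetic of one lane, not the RTL patterns themselves. It checks that the intermediate "wide times narrow" form and the final "narrow times narrow" form give the same int16_t result as extending both operands first.

```c
#include <assert.h>
#include <stdint.h>

/* Before combine: extend both operands, then multiply at the wide
   width (models vsext + vsext + vmul).  */
static int16_t mul_after_two_exts (int8_t a, int8_t b)
{
  int16_t wa = (int16_t) a;
  int16_t wb = (int16_t) b;
  return (int16_t) (wa * wb);
}

/* After the first round: one extension is folded into the multiply,
   so one operand is already wide (models the pseudo vwmul.wv).  */
static int16_t mul_wide_narrow (int16_t wide, int8_t narrow)
{
  return (int16_t) (wide * narrow);
}

/* After the second round: both extensions are folded (models vwmul.vv).  */
static int16_t mul_narrow_narrow (int8_t a, int8_t b)
{
  return (int16_t) (a * b);
}

/* Exhaustively compare the three forms over all int8_t inputs.  */
static void check_all_forms_agree (void)
{
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      {
        int16_t r0 = mul_after_two_exts ((int8_t) a, (int8_t) b);
        int16_t r1 = mul_wide_narrow ((int16_t) a, (int8_t) b);
        int16_t r2 = mul_narrow_narrow ((int8_t) a, (int8_t) b);
        assert (r0 == r1 && r1 == r2);
      }
}
```

The product of two int8_t values always fits in int16_t, so none of the conversions above lose information.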
 
Note that adding the pseudo vwmul.wv pattern makes the vwmulsu.vv testcase fail,
since GCC prefers this pattern order:
 
(mul: (zero_extend)
  (sign_extend))
 
So change the operand order of the vwmulsu.vv instruction.
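
Swapping the operand order only changes how the pattern is written, not the value computed, because the multiply is commutative. A scalar sketch of one lane of a signed-by-unsigned widening multiply, with hypothetical function names, illustrates this:

```c
#include <assert.h>
#include <stdint.h>

/* sign_extend operand written first, zero_extend operand second.  */
static int16_t mulsu_sign_first (int8_t s, uint8_t u)
{
  return (int16_t) ((int16_t) s * (int16_t) u);
}

/* zero_extend operand written first, sign_extend operand second.  */
static int16_t mulsu_zero_first (int8_t s, uint8_t u)
{
  return (int16_t) ((int16_t) u * (int16_t) s);
}

/* Exhaustively check that both operand orders give the same result.  */
static void check_operand_order (void)
{
  for (int s = -128; s <= 127; s++)
    for (int u = 0; u <= 255; u++)
      assert (mulsu_sign_first ((int8_t) s, (uint8_t) u)
              == mulsu_zero_first ((int8_t) s, (uint8_t) u));
}
```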
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Shift zero_extend and sign_extend order.
* config/riscv/autovec-opt.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
 
---
gcc/config/riscv/autovec-opt.md   | 56 +++
gcc/config/riscv/vector.md|  9 +--
.../riscv/rvv/autovec/widen/widen-7.c | 27 +
.../rvv/autovec/widen/widen-complicate-3.c| 32 +++
.../riscv/rvv/autovec/widen/widen_run-7.c | 34 +++
5 files changed, 154 insertions(+), 4 deletions(-)
create mode 100644 gcc/config/riscv/autovec-opt.md
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 000..5b7dc9bef8c
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,56 @@
+;; Machine description for optimization of RVV auto-vectorization.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; We don't have vwmul.wv instruction like vwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "@pred_single_widen_mul"
+  [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
+  (match_operand 5 "vector_length_operand"  "   rK,   rK")
+  (match_operand 6 "const_int_operand"  "i,i")
+  (match_operand 7 "const_int_operand"  "i,i")
+  (match_operand 8 "const_int_operand"  "i,i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (mult:VWEXTI
+ (any_extend:VWEXTI
+   (match_operand: 4 "register_operand" "   vr,   vr"))
+ (match_operand:VWEXTI 3 "register_operand" "   vr,   vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+  "TARGET_VECTOR"
+  &quo

Re: Re: [PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe.zh...@rivai.ai

Oh. Yes. Thanks for catching this!
Will send V2 soon.



juzhe.zh...@rivai.ai
 
From: KuanLin Chen
Date: 2023-06-02 09:26
To: gcc-patches; juzhe.zhong
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && 
viota && vid
Hi Juzhe,
 
I think fault_load_def::get_name should remove "instance.pred ==
PRED_TYPE_mu", right?
 
 wrote on Fri, Jun 2, 2023 at 7:05 AM:
>
> From: Juzhe-Zhong 
>
> Based on these:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
>
> Add _mu C++ overloaded intrinsics for load && viota && vid.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded 
> intrinsics.
>
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index a8113f6602b..498c6ba042e 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -164,7 +164,7 @@ public:
>{
>  if (STORE_P || LST_TYPE == LST_INDEXED)
>return true;
> -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +return pred != PRED_TYPE_none;
>}
>
>rtx expand (function_expander &e) const override
> @@ -963,7 +963,7 @@ public:
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
>  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
> -  || pred == PRED_TYPE_tumu;
> +  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
>}
>
>rtx expand (function_expander &e) const override
> @@ -979,7 +979,7 @@ public:
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
>  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
> -  || pred == PRED_TYPE_tumu;
> +  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
>}
>
>rtx expand (function_expander &e) const override
> @@ -1749,7 +1749,7 @@ public:
>
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
> -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +return pred != PRED_TYPE_none;
>}
>
>rtx expand (function_expander &e) const override
> @@ -1794,7 +1794,7 @@ public:
>
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
> -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +return pred != PRED_TYPE_none;
>}
>
>rtx expand (function_expander &e) const override
> --
> 2.36.1
>
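
The effect of the quoted hunks on `can_be_overloaded_p` can be sketched in plain C. This is a reduced model with assumptions: the enum lists only the predication types named in the diff (the real enum has more), the function names are hypothetical, and the early `STORE_P || LST_TYPE == LST_INDEXED` return from the first hunk is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for the real enum: only the values the diff names.  */
enum predication_type_index
{
  PRED_TYPE_none,
  PRED_TYPE_tu,
  PRED_TYPE_tum,
  PRED_TYPE_tumu,
  PRED_TYPE_mu
};

/* First kind of hunk, before the patch: _mu is excluded.  */
static bool exclude_list_before (enum predication_type_index pred)
{
  return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
}

/* First kind of hunk, after the patch: only the unpredicated form
   is excluded.  */
static bool exclude_list_after (enum predication_type_index pred)
{
  return pred != PRED_TYPE_none;
}

/* Second kind of hunk, before the patch: explicit allow list.  */
static bool allow_list_before (enum predication_type_index pred)
{
  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
         || pred == PRED_TYPE_tumu;
}

/* Second kind of hunk, after the patch: _mu joins the allow list.  */
static bool allow_list_after (enum predication_type_index pred)
{
  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
         || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
}
```

In both kinds of hunk the only input whose answer changes is PRED_TYPE_mu.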

Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-02 Thread juzhe.zh...@rivai.ai

Hi, Robin.

>> I like the code examples in general but find them hard to read
>> at lengths > 5-10 or so.  Could we condense this a bit?
OK. Do I need to send a V2, or can the commit log be condensed when the patch is merged?


>> I'm a bit wary about getting the costs
>> right for combine patterns but we can deal with this later.

No, you don't need to worry about combining extensions, and I don't think we
need costs to adjust the combining of extensions.

For vmv.v.x + vadd.vv ==> vadd.vx, we can't claim that vadd.vx is better, since
it increases scalar register pressure.
So for that kind of combining, I would like to take another approach and combine
the pattern carefully, with an accurate register pressure calculation.

For this patch, however,

vext.vf2 + vext.vf2 + vadd ==> vwadd.vv is always better.
I don't think using vwadd.vv can ever be worse.

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-02 15:01
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
Hi Juzhe,
 
> ...
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v  v1,0(a4)
> vsetvli t3,zero,e16,m2,ta,ma
> vsext.vf2   v6,v1
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v  v1,0(a5)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a0,t4
> vzext.vf2   v4,v1
> vmul.vv v2,v4,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> vle8.v  v1,0(a6)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a1,t4
> vzext.vf2   v2,v1
> vmul.vv v4,v2,v4
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v4,0(t0)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a2,t4
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> add t0,a3,t4
> vle8.v  v1,0(a7)
> vsetvli t3,zero,e16,m2,ta,ma
> sub t6,t6,t1
> vsext.vf2   v2,v1
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> ...
> 
> After this patch:
> ...
> vsetvli zero,t1,e8,mf2,ta,ma
> vle8.v  v1,0(a4)
> vle8.v  v3,0(a5)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a0,t3
> vwmulsu.vv  v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> vle8.v  v2,0(a6)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a1,t3
> vwmulu.vv   v4,v3,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v4,0(t0)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a2,t3
> vwmulsu.vv  v3,v1,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v3,0(t0)
> add t0,a3,t3
> vle8.v  v3,0(a7)
> vsetvli t6,zero,e8,mf2,ta,ma
> sub t4,t4,t1
> vwmul.vv    v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> ...
 
I like the code examples in general but find them hard to read
at lengths > 5-10 or so.  Could we condense this a bit?
 
> +(include "autovec-opt.md")
ACK for this.  We discussed before not cluttering the regular
autovec.md with combine-targeted patterns too much, so I'm in favor
of the separate file.
 
Overall this looks good to me.  I'm a bit wary about getting the costs
right for combine patterns but we can deal with this later.
 
Regards
Robin

Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-02 Thread juzhe.zh...@rivai.ai

Thanks. I will wait for final approval from Jeff or Kito.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-02 15:18
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
>>> I like the code examples in general but find them hard to read
>>> at lengths > 5-10 or so.  Could we condense this a bit?
> OK. Do I need to send a V2, or can the commit log be condensed when the patch is merged?
 
Sure, just condense a bit. No need for V2.
 
Regards
Robin
