[PING][PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2018-07-11 Thread Michael Collison
Ping. Last patch here:

https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00735.html



Re: [PATCH 1/4] Clean up of new format of -falign-FOO.

2018-07-17 Thread Michael Collison
Hi Martin,

Your alignment patch breaks the arm port. In the file arm.c, function 
'get_label_padding' the code uses:

static HOST_WIDE_INT
get_label_padding (rtx label)
{
  HOST_WIDE_INT align, min_insn_size;

  align = 1 << label_to_alignment (label);
  min_insn_size = TARGET_THUMB ? 2 : 4;
  return align > min_insn_size ? align - min_insn_size : 0;
}

This breaks with your current change. I think the assignment needs to be modified to:

'align = 1 << label_to_alignment (label).levels[0].log'
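
For illustration, here is a minimal self-contained sketch of the adjusted computation. It assumes, as the line above suggests, that label_to_alignment now returns a structure whose levels[0].log field holds the log2 alignment. The types and the function name below are simplified stand-ins invented for this sketch, not GCC's real definitions.

```cpp
#include <cstdint>

// Stand-ins for GCC's alignment types (hypothetical simplification, not
// the real definitions).
struct align_level_sketch
{
  int log;      // log2 of the requested alignment in bytes
  int maxskip;  // maximum padding allowed; unused in this sketch
};

struct align_flags_sketch
{
  align_level_sketch levels[2];
};

// Largest number of padding bytes that can precede a label.
constexpr std::int64_t
label_padding_sketch (const align_flags_sketch &flags, bool thumb)
{
  // Read the log2 alignment from the first level, as proposed above.
  const std::int64_t align = std::int64_t (1) << flags.levels[0].log;
  const std::int64_t min_insn_size = thumb ? 2 : 4;
  return align > min_insn_size ? align - min_insn_size : 0;
}
```

With an 8-byte alignment (log = 3) this gives 4 bytes of padding for ARM and 6 for Thumb; alignments no larger than the minimum instruction size give 0.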

Regards,

Michael Collison



Re: [PATCH v2] RISC-V: Add autovec FP binary operations.

2023-06-15 Thread Michael Collison

Robin,

Why do we need '-ffast-math' with the tests?

On 6/15/23 11:10, Robin Dapp via Gcc-patches wrote:

Hi,

changes from V1:
  - Add VF_AUTO iterator and use it.
  - Ensured we don't ICE with -march=rv64gcv_zfhmin.

this implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.

As long as we do not have full middle-end support we need
-ffast-math for the tests.
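
The message does not show the test bodies, so the following is only a hypothetical sketch of the kind of element-wise loops that binary-operation expanders like these are meant to cover. The macro and function names are invented for illustration and are not taken from the testsuite; whether a given compiler vectorizes these loops depends on the target and flags.

```cpp
#include <cassert>
#include <cstddef>

// Define one element-wise loop per operation. These are plain scalar
// loops; nothing here is specific to RISC-V.
#define DEF_FP_BINOP(NAME, EXPR)                                  \
  template <typename T>                                           \
  void NAME (T *dst, const T *a, const T *b, std::size_t n)      \
  {                                                               \
    for (std::size_t i = 0; i < n; i++)                           \
      dst[i] = (EXPR);                                            \
  }

DEF_FP_BINOP (loop_add, a[i] + b[i])
DEF_FP_BINOP (loop_sub, a[i] - b[i])
DEF_FP_BINOP (loop_mul, a[i] * b[i])
DEF_FP_BINOP (loop_div, a[i] / b[i])
DEF_FP_BINOP (loop_max, a[i] > b[i] ? a[i] : b[i])
DEF_FP_BINOP (loop_min, a[i] < b[i] ? a[i] : b[i])

// Self-check using values that are exactly representable in float.
inline void check_fp_binop_loops ()
{
  const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float b[4] = {4.0f, 2.0f, 1.0f, 0.5f};
  float r[4];

  loop_add (r, a, b, 4);
  assert (r[0] == 5.0f && r[3] == 4.5f);
  loop_div (r, a, b, 4);
  assert (r[0] == 0.25f && r[3] == 8.0f);
  loop_max (r, a, b, 4);
  assert (r[0] == 4.0f && r[2] == 3.0f);
  loop_min (r, a, b, 4);
  assert (r[1] == 2.0f && r[3] == 0.5f);
}
```

Instantiating the templates with float or double gives the non-_Float16 flavor; a _Float16 flavor would need a target that provides that type.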

gcc/ChangeLog:

* config/riscv/autovec.md (3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
---
  gcc/config/riscv/autovec.md   | 36 +
  gcc/config/riscv/riscv-protos.h   |  5 +-
  gcc/config/riscv/riscv-v.cc   | 74 ++-
  gcc/config/riscv/riscv.cc | 27 +--
  gcc/config/riscv/vector-iterators.md  | 28 +++
  .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
  .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
  .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 ++
  .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
  .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
  .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
  .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 ++
  .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
  .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmax-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vmax-zvfh-run.c   | 38 ++
  .../riscv/rvv/autovec/binop/vmin-run.c| 10 ++-
  .../riscv/rvv/autovec/binop/vmin-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmin-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmin-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vmin-zvfh-run.c   | 37 ++
  .../riscv/rvv/autovec/binop/vmul-run.c|  8 +-
  .../riscv/rvv/autovec/bin

Re: [PATCH v2] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread Michael Collison

Hi Robin,

Looks good to me, except that this seems to depend on a new
function, emit_vlmax_fp_insn, which appears to be part of your autovec FP
binary operations patch. So that patch would need to be merged first, from
what I can see.


On 6/15/23 11:12, Robin Dapp via Gcc-patches wrote:

Hi,

changes from V1:
   - Use VF_AUTO iterator.
   - Don't mention vfsqrt7.

This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.

Similarly to the binop tests, there are flavors for zvfh now.

gcc/ChangeLog:

* config/riscv/autovec.md (2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
  gcc/config/riscv/autovec.md   | 36 ++-
  .../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
  .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
  .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
  .../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
  .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
  .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
  .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
  .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
  .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
  .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
  .../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
  .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
  .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
  .../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
  .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
  16 files changed, 241 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94452c932a4..5b84eaaf052 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -513,7 +513,7 @@ (define_expand "2"
  })
  
  ;; ---

-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
  ;; 
---
  
  (define_expand "abs2"

@@ -532,6 +532,40 @@ (define_expand "abs2"
DONE;
  })
  
+;; ---

+;;  [FP] Unary operations
+;; 
---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop_nofrm:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; 
---
+;; - [FP] Square root
+;; 
---
+;; Includes:
+;; - vfsqrt.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop:VF_AUTO
+ (match_operand:VF_AUTO 1

[PATCH] vect: Check that vector factor is a compile-time constant

2023-02-21 Thread Michael Collison
While working on autovectorizing for the RISCV port I encountered an
issue where vect_do_peeling assumes that the vectorization factor is a
compile-time constant. The vectorization factor is not a compile-time
constant on RISCV.


Tested on RISCV and x86_64-linux-gnu. Okay?

Michael

gcc/

    * tree-vect-loop-manip.cc (vect_do_peeling): Verify
    that vectorization factor is a compile-time constant.

---
 gcc/tree-vect-loop-manip.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6aa3d2ed0bf..1ad1961c788 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,

   niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
   /* It's guaranteed that vector loop bound before vectorization is at
  least VF, so set range information for newly generated var. */
-  if (new_var_p)
+  if (new_var_p && vf.is_constant ())
 {
   value_range vr (type,
           wi::to_wide (build_int_cst (type, vf)),
--
2.34.1



Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread Michael Collison

Richard, how would I check for a fully masked main vector loop?

On 2/22/23 03:20, Richard Biener wrote:

On Wed, Feb 22, 2023 at 12:03 AM Michael Collison  wrote:

While working on autovectorizing for the RISCV port I encountered an
issue where vect_do_peeling assumes that the vectorization factor is a
compile-time constant. The vectorization factor is not a compile-time
constant on RISCV.

Tested on RISCV and x86_64-linux-gnu. Okay?

I wonder how you arrive at prologue peeling with a non-constant VF?
In any case it would probably be better to use constant_lower_bound (vf)
here?  Also it looks wrong to apply this limit in case we are using
a fully masked main vector loop.  But as said, the specific case of
non-constant VF and prologue peeling probably wasn't supposed to happen,
instead the prologue usually is applied via an offset to a fully masked loop?

Richard?

Thanks,
Richard.


Michael

gcc/

  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
  that vectorization factor is a compile-time constant.

---
   gcc/tree-vect-loop-manip.cc | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6aa3d2ed0bf..1ad1961c788 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
niters, tree nitersm1,
 niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
 /* It's guaranteed that vector loop bound before vectorization is at
least VF, so set range information for newly generated var. */
-  if (new_var_p)
+  if (new_var_p && vf.is_constant ())
   {
 value_range vr (type,
 wi::to_wide (build_int_cst (type, vf)),
--
2.34.1



Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread Michael Collison

Juzhe,

I disagree with this comment. There are many stakeholders for 
autovectorization and waiting until GCC 14 is not a viable solution for 
us as well as other stakeholders ready to begin work on autovectorization.


As we discussed I have been moving forward with patches for 
autovectorization and am preparing to send them to gcc-patches. This 
assert is preventing code from compiling and needs to be addressed.


If you have a solution in either the RISCV backend or in this file can 
you please present it?


On 2/22/23 10:27, juzhe.zh...@rivai.ai wrote:

> gcc/
>
> * tree-vect-loop-manip.cc (vect_do_peeling): Verify
> that vectorization factor is a compile-time constant.
>
> ---
> gcc/tree-vect-loop-manip.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6aa3d2ed0bf..1ad1961c788 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
> /* It's guaranteed that vector loop bound before vectorization is at
> least VF, so set range information for newly generated var. */
> - if (new_var_p)
> + if (new_var_p && vf.is_constant ())
> {
> value_range vr (type,
> wi::to_wide (build_int_cst (type, vf)),

I don't think we need to apply this limit in the case of RVV
auto-vectorization.
I have talked with Kito and I have a full solution for supporting RVV
auto-vectorization.


We are going to support RVV auto-vectorization in 3 configurations
according to the RVV ISA spec:
1. -march=zve32* supports QI and HI auto-vectorization by VNx4QImode
and VNx2HImode
2. -march=zve64* supports QI, HI and SI auto-vectorization by
VNx8QImode, VNx4HImode and VNx2SImode
3. -march=v* supports QI, HI, SI and DI auto-vectorization by
VNx16QImode, VNx8HImode, VNx4SImode and VNx2DImode
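
One way to read the list above is as simple arithmetic on the minimum vector register length. The RVV ISA spec guarantees VLEN >= 32 for Zve32*, >= 64 for Zve64* and >= 128 for V, and each listed mode has (minimum VLEN / element width) units. The sketch below reproduces the list under that reading. The rule that only element widths giving at least two units are listed is an observation about the list, not a rationale stated in the message, and all names are made up for illustration.

```cpp
// Extension families from the list above (names are illustrative).
enum class rvv_ext_sketch { zve32, zve64, v };

// Minimum VLEN in bits guaranteed by each family, per the RVV ISA spec.
constexpr unsigned min_vlen_bits_sketch (rvv_ext_sketch ext)
{
  return ext == rvv_ext_sketch::zve32 ? 32u
         : ext == rvv_ext_sketch::zve64 ? 64u
         : 128u;
}

// N in VNx<N><elt>mode for an element of elt_bits bits, or 0 when the
// list above has no mode for that element width.
constexpr unsigned vnx_units_sketch (rvv_ext_sketch ext, unsigned elt_bits)
{
  return min_vlen_bits_sketch (ext) / elt_bits >= 2
         ? min_vlen_bits_sketch (ext) / elt_bits
         : 0u;
}
```

For example, zve32 with 8-bit elements gives 4 units (VNx4QImode), while zve32 with 32-bit elements would give a single unit and is absent from the list.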


I will support them in GCC 14. The current loop vectorizer works well for
us; there is no need to fix it.

Thanks.

juzhe.zh...@rivai.ai


Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread Michael Collison

Hi Jeff,

We do not have two independent implementations: my work is 100% based on 
the vector intrinsic foundation in upstream GCC. In fact I have only 
added two core patterns, vector add and subtract, that are based on the 
existing vector intrinsics implementation:


(define_expand "add3"
  [(match_operand:VI 0 "register_operand")
   (match_operand:VI 1 "register_operand")
   (match_operand:VI 2 "vector_arith_operand")]
  "TARGET_VECTOR"
{
  using namespace riscv_vector;

  rtx merge = gen_rtx_UNSPEC (mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);

  rtx vl = emit_vlmax_vsetvl (mode);
  rtx mask_policy = get_mask_policy_no_pred();
  rtx tail_policy = get_tail_policy_no_pred();
  rtx mask = CONSTM1_RTX(mode);
  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);

  emit_insn(gen_pred_add(operands[0], mask, merge, operands[1], operands[2],
            vl, tail_policy, mask_policy, vlmax_avl_p));

  DONE;
})

This pattern leverages the existing vector intrinsics framework. The 
bulk of the changes are the cost model, and target macros. The cost 
model is based on Juzhe's work.
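
To make the role of the operands concrete, here is a rough scalar model of what a predicated vector add computes. It is a sketch under simplifying assumptions, not a description of the real instruction patterns: lanes that are masked off or lie at or beyond vl simply keep the merge value, whereas the real tail and mask policies can allow other results. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a predicated add: lanes below vl whose mask bit is set
// receive a[i] + b[i]; every other lane keeps the value from merge.
std::vector<int>
pred_add_model (const std::vector<bool> &mask, const std::vector<int> &merge,
                const std::vector<int> &a, const std::vector<int> &b,
                std::size_t vl)
{
  assert (mask.size () == a.size () && merge.size () == a.size ());
  assert (b.size () == a.size () && vl <= a.size ());

  std::vector<int> dst (merge);
  for (std::size_t i = 0; i < vl; i++)
    if (mask[i])
      dst[i] = a[i] + b[i];
  return dst;
}

// What the expander above asks for, in this model: an all-ones mask and a
// vector length covering every lane, so the merge value never shows through.
std::vector<int>
autovec_add_model (const std::vector<int> &a, const std::vector<int> &b)
{
  const std::size_t vlmax = a.size ();
  return pred_add_model (std::vector<bool> (vlmax, true),
                         std::vector<int> (vlmax, 0), a, b, vlmax);
}
```

In this model the autovec case reduces to a plain element-wise add, which is why the merge operand can be left undefined in the expander.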


The point I am making is the auto-vectorization work is no more 
experimental than the intrinsics work which is still being merged.


On 2/22/23 23:01, Jeff Law wrote:



On 2/22/23 10:54, Michael Collison wrote:

Juzhe,

I disagree with this comment. There are many stakeholders for 
autovectorization and waiting until GCC 14 is not a viable solution 
for us as well as other stakeholders ready to begin work on 
autovectorization.


As we discussed I have been moving forward with patches for 
autovectorization and am preparing to send them to gcc-patches. This 
assert is preventing code from compiling and needs to be addressed.


If you have a solution in either the RISCV backend or in this file
can you please present it?

I don't necessarily think it means waiting for gcc-14, but it does
mean waiting for gcc-13 to branch and gcc-14 development to open. I 
would object to anyone trying to push forward an autovec 
implementation into gcc-13.  We're well past that point IMHO, even if 
the changes only affected the RISC-V backend.


Given that it looks like we have two independent implementations we're 
almost certainly going to have to sit down with both, evaluate both 
from a quality of code viewpoint and benchmark them both and 
ultimately choose one implementation or the other, or maybe even some 
mixing and matching.


I would strongly suggest that both groups have implementations we can 
start evaluating from a design/implementation standpoint relatively 
soon.  Ideally both groups would actually have branches in the repo 
that are regularly updated with their current implementation.


While I have a great interest in seeing an autovec implementation move 
forward as soon as possible after gcc-14 development opens, I have no 
opinions at this point about either of the two existing implementations.


Jeff


Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-03-01 Thread Michael Collison
Okay, there seems to be consensus on using constant_lower_bound (vf), but
I don't understand how that is a replacement for "vf.is_constant ()". In
one case we are checking whether "vf" is a constant; in the other we are
asking for its lower bound. For the crash in question,
"constant_lower_bound (vf)" returns the integer value two.
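
To make the distinction concrete, here is a toy model of a variable-length value of the form c0 + c1 * x, where x is a non-negative quantity known only at run time. It is a simplified stand-in written for this message, not GCC's poly_int. One possible reading of the suggestion is that c0 is a valid lower bound for every runtime value, so it can still be used as the low end of a range when the value is not constant.

```cpp
// Toy stand-in for a two-coefficient polynomial value c0 + c1 * x with
// x >= 0 unknown at compile time. Not GCC's real poly_int.
struct poly_sketch
{
  unsigned long c0;
  unsigned long c1;

  // True only when the value does not depend on x.
  constexpr bool is_constant () const { return c1 == 0; }

  // Smallest value the expression can take, reached at x == 0.
  constexpr unsigned long constant_lower_bound () const { return c0; }

  // The actual value once x is known.
  constexpr unsigned long value_at (unsigned long x) const
  {
    return c0 + c1 * x;
  }
};
```

With a value of 2 + 2x, is_constant () is false while constant_lower_bound () is 2, which matches the value reported for the crash above; every runtime value (2, 4, 6, ...) is at least that bound.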


On 2/27/23 09:51, Richard Sandiford wrote:

FWIW, this patch looks good to me.  I'd argue it's a regression fix
of kinds, in that the current code was correct before variable VF and
became incorrect after variable VF.  It might be possible to trigger
the problem on SVE too, with a sufficiently convoluted test case.
(Haven't tried though.)

Richard Biener  writes:

On Wed, Feb 22, 2023 at 12:03 AM Michael Collison  wrote:

While working on autovectorizing for the RISCV port I encountered an
issue where vect_do_peeling assumes that the vectorization factor is a
compile-time constant. The vectorization factor is not a compile-time
constant on RISCV.

Tested on RISCV and x86_64-linux-gnu. Okay?

I wonder how you arrive at prologue peeling with a non-constant VF?

Not sure about the RVV case, but I think it makes sense in principle.
E.g. if some ISA takes the LOAD_LEN rather than fully-predicated
approach, it can't easily use the first iteration of the vector loop
to do peeling for alignment.  (At least, the IV steps would then
no longer match VF for all iterations.)  I guess it could use a
*different* vector loop, but we don't support that yet.

There are also some corner cases for which we still don't support
predicated loops and instead fall back on an unpredicated VLA loop
followed by a scalar epilogue.  Peeling for alignment would then
require a scalar prologue too.


In any case it would probably be better to use constant_lower_bound (vf)
here?  Also it looks wrong to apply this limit in case we are using
a fully masked main vector loop.  But as said, the specific case of
non-constant VF and prologue peeling probably wasn't supposed to happen,
instead the prologue usually is applied via an offset to a fully masked loop?

Hmm, yeah, agree constant_lower_bound should work too.

Thanks,
Richard


Richard?

Thanks,
Richard.


Michael

gcc/

  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
  that vectorization factor is a compile-time constant.

---
   gcc/tree-vect-loop-manip.cc | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6aa3d2ed0bf..1ad1961c788 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
niters, tree nitersm1,
 niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
 /* It's guaranteed that vector loop bound before vectorization is at
least VF, so set range information for newly generated var. */
-  if (new_var_p)
+  if (new_var_p && vf.is_constant ())
   {
 value_range vr (type,
 wi::to_wide (build_int_cst (type, vf)),
--
2.34.1



[PATCH 00/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This series of patches adds foundational support for RISC-V
autovectorization. These patches are based on the current upstream RVV
vector intrinsic support and are not a new implementation. Most of the
implementation consists of adding the new vector cost model, the
autovectorization patterns themselves, and target hooks.


This implementation only provides support for integer addition and 
subtraction as a proof of concept.


As discussed on this list, if these patches are approved they will be
merged into an "auto-vectorization" branch once gcc-13 branches for release.


There are two known issues related to crashes (assert failures) 
associated with tree vectorization; one of which I have sent a patch for 
and have received feedback. I will be sending a patch for the second 
issue tomorrow.



 gcc/common/config/riscv/riscv-common.cc   |   2 +-
 gcc/config.gcc    |   2 +-
 gcc/config/riscv/predicates.md    |  13 +
 gcc/config/riscv/riscv-cores.def  |  14 +-
 gcc/config/riscv/riscv-opts.h |  40 ++
 gcc/config/riscv/riscv-protos.h   |  15 +
 gcc/config/riscv/riscv-v.cc   | 178 -
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   2 +
 gcc/config/riscv/riscv-vector-cost.cc | 620 ++
 gcc/config/riscv/riscv-vector-cost.h  | 400 +++
 gcc/config/riscv/riscv.cc | 321 -
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/riscv.opt    |  20 +
 gcc/config/riscv/t-riscv  |   5 +
 gcc/config/riscv/vector-auto.md   | 172 +
 gcc/config/riscv/vector-iterators.md  |   2 +
 gcc/config/riscv/vector.md    |   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  24 +
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  24 +
 .../riscv/rvv/autovec/loop-sub-rv32.c |  24 +
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  24 +
 22 files changed, 1893 insertions(+), 18 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-vector-cost.cc
 create mode 100644 gcc/config/riscv/riscv-vector-cost.h
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c



[PATCH 01/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison

This patch adds foundational support in the form of:

1. New predicates

2. New function prototypes

3. Exporting emit_vlmax_vsetvl to global scope

4. Add a new command line option -mriscv_vector_lmul
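
The patch also adds a vlmul_field_enum in riscv-opts.h whose comments give the LMUL for each 3-bit field value. As a reading aid, the sketch below decodes a field value into a fraction following those comments. The struct and function names are hypothetical and not part of the patch.

```cpp
// LMUL as a fraction num/den; reserved marks encodings with no LMUL.
struct lmul_sketch
{
  unsigned num;
  unsigned den;
  bool reserved;
};

// Decode a vlmul field value, following the enum comments in the patch:
// 0..3 mean LMUL = 1, 2, 4, 8; 4 is reserved; 5..7 mean 1/8, 1/4, 1/2.
// Values above 7 are treated as reserved in this sketch.
constexpr lmul_sketch decode_vlmul_sketch (unsigned field)
{
  return (field > 7 || field == 4) ? lmul_sketch{0, 0, true}
         : field <= 3 ? lmul_sketch{1u << field, 1, false}
         : lmul_sketch{1, 1u << (8 - field), false};
}
```

For instance, field value 3 decodes to 8/1 and field value 5 decodes to 1/8.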

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
    New external declaration.
    (riscv_vector_preferred_simd_mode): Ditto.
    (riscv_tuple_mode_p): Ditto.
    (riscv_vector_mask_mode_p): Ditto.
    (riscv_classify_nf): Ditto.
    (riscv_vlmul_regsize): Ditto.
    (riscv_vector_preferred_simd_mode): Ditto.
    (riscv_vector_get_mask_mode): Ditto.
    (emit_vlmax_vsetvl): Ditto.
    (get_mask_policy_no_pred): Ditto.
    (get_tail_policy_no_pred): Ditto.
    * config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
    (riscv_vector_lmul_enum): Ditto.
    (vlmul_field_enum): Ditto.
    * config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
    Remove static scope.
    * config/riscv/riscv.opt (riscv_vector_lmul):
    New option -mriscv_vector_lmul.
    * config/riscv/predicates.md (p_reg_or_const_csr_operand):
    New predicate.
    (vector_reg_or_const_dup_operand): Ditto.

---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 16 +
 gcc/config/riscv/riscv-v.cc |  2 +-
 gcc/config/riscv/riscv.opt  | 20 +
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 7bc7c0b4f4d..31517ae4606 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })

 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+    return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
    (match_operand 0 "const_csr_operand")))
@@ -287,6 +295,11 @@
   (ior (match_operand 0 "register_operand")
    (match_test "op == CONSTM1_RTX (GET_MODE (op))")))

+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+  && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
    (match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..2057a14e153 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL            /* global canary */
 };

+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1 */
+  VLMUL_FIELD_001, /* LMUL = 2 */
+  VLMUL_FIELD_010, /* LMUL = 4 */
+  VLMUL_FIELD_011, /* LMUL = 8 */
+  VLMUL_FIELD_100, /* RESERVED */
+  VLMUL_FIELD_101, /* LMUL = 1/8 */
+  VLMUL_FIELD_110, /* LMUL = 1/4 */
+  VLMUL_FIELD_111, /* LMUL = 1/2 */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 37c634eca1d..70c8dc4ce69 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -200,4 +200,19 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;

+/* Routines implemented in riscv-v.cc*/
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern int riscv_classify_nf (machine_mode);
+extern int riscv_vlmul_regsize(machine_mode);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx emit_vlmax_vsetvl (machine_mode vmode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 59c25c65cd5..58007cc16eb 100644
--- a/gcc/config/r

[PATCH 02/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds foundational support by making two functions that handle
predication policies globally visible.
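
As a reading aid for the two functions, here is a small stand-alone model of the selection logic visible in the diff below: the tail policy is "undisturbed" for the tu, tum and tumu predication types, the mask policy is "undisturbed" for tumu and mu, and everything else falls back to a default. The enums and function names are simplified stand-ins invented for this sketch; the real functions return an rtx and use the target's preferred default.

```cpp
// Simplified stand-ins for the predication types named in the diff.
enum pred_sketch { PRED_none, PRED_tu, PRED_tum, PRED_tumu, PRED_mu };

// Simplified result: either the explicit "undisturbed" policy or
// whatever the target prefers by default.
enum policy_sketch { POLICY_default, POLICY_undisturbed };

// Tail policy: TU when the predication type names it, else the default.
constexpr policy_sketch tail_policy_sketch (pred_sketch pred)
{
  return (pred == PRED_tu || pred == PRED_tum || pred == PRED_tumu)
         ? POLICY_undisturbed
         : POLICY_default;
}

// Mask policy: MU when the predication type names it, else the default.
constexpr policy_sketch mask_policy_sketch (pred_sketch pred)
{
  return (pred == PRED_tumu || pred == PRED_mu) ? POLICY_undisturbed
                                                : POLICY_default;
}
```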


gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
    Remove static declaration to make externally visible.
    (get_mask_policy_for_pred): Ditto.
    * config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
    New external declaration.
    (get_mask_policy_for_pred): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 2e92ece3b64..90fc73a5bcf 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -1850,7 +1850,7 @@ use_real_merge_p (enum predication_type_index pred)

 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -1860,7 +1860,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)

 /* Get MASK policy for predication. If predication indicates MU, return the MU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index ede08c6a480..135e2463b1e 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -433,6 +433,8 @@ extern const char *const operand_suffixes[NUM_OP_TYPES];
 extern const rvv_builtin_suffixes type_suffixes[NUM_VECTOR_TYPES + 1];
 extern const char *const predication_suffixes[NUM_PRED_TYPES];
 extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);

 inline bool
 function_instance::operator!= (const function_instance &other) const
--
2.34.1



[PATCH 03/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds two new files to support the vector cost model and
modifies the Makefile fragment to build the cost model C++ file. Due to
its large size, this patch is provided as an attachment.


gcc/ChangeLog:

    * config.gcc (riscv-vector-cost.o): New object file to build.
    * config/riscv/riscv-vector-cost.cc: New file for riscv vector cost
    model.
    * config/riscv/riscv-vector-cost.h: New header file for riscv vector
    cost model.
    * config/riscv/t-riscv: Add make rule for riscv-vector-cost.o.


From eb995818cd5f77f85e8df93b690b00ce1fd1aa35 Mon Sep 17 00:00:00 2001
From: Michael Collison 
Date: Thu, 2 Mar 2023 12:27:36 -0500
Subject: [PATCH] Autovectorization patch set 2

---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-vector-cost.cc | 620 ++
 gcc/config/riscv/riscv-vector-cost.h  | 400 +
 gcc/config/riscv/t-riscv  |   5 +
 4 files changed, 1026 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-vector-cost.cc
 create mode 100644 gcc/config/riscv/riscv-vector-cost.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c070e6ecd2e..a401187 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,7 @@ pru-*-*)
 riscv*)
 	cpu_type=riscv
 	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
-	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
+	extra_objs="${extra_objs} riscv-vector-cost.o riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	d_target_objs="riscv-d.o"
 	extra_headers="riscv_vector.h"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
diff --git a/gcc/config/riscv/riscv-vector-cost.cc b/gcc/config/riscv/riscv-vector-cost.cc
new file mode 100644
index 000..5a33b20843a
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-cost.cc
@@ -0,0 +1,620 @@
+/* Cost model implementation for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "rtlanal.h"
+#include "output.h"
+#include "alias.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "varasm.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "function.h"
+#include "explow.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "reload.h"
+#include "tm_p.h"
+#include "target.h"
+#include "basic-block.h"
+#include "expr.h"
+#include "optabs.h"
+#include "bitmap.h"
+#include "df.h"
+#include "diagnostic.h"
+#include "builtins.h"
+#include "predict.h"
+#include "tree-pass.h"
+#include "opts.h"
+#include "langhooks.h"
+#include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "tree-vectorizer.h"
+#include "tree-ssa-loop-niter.h"
+#include "riscv-vector-builtins.h"
+
+/* This file should be included last.  */
+#include "riscv-vector-cost.h"
+#include "target-def.h"
+
+bool vector_insn_cost_table::get_cost(rtx x, machine_mode mode, int *cost,
+  bool speed) const {
+  rtx op0, op1, op2;
+  enum rtx_code code = GET_CODE(x);
+  scalar_int_mode int_mode;
+
+  /* By default, assume that everything has equivalent cost to the
+ cheapest instruction.  Any additional costs 

[PATCH 04/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds support for functions used in implementing various 
portions of autovectorization support.
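
As an illustration of one of these functions, the integer cases of
riscv_vector_preferred_simd_mode in the diff below follow a regular
pattern: the chosen VNx<N> mode has N = (64 / element bits) * vf, and
any vf other than 1, 2 or 4 selects the largest mode. A minimal sketch
of that arithmetic, assuming a hypothetical helper name
(sketch_preferred_lanes is not part of the patch) and ignoring the
ELEN and floating-point checks:

```c
/* Hypothetical helper, not part of the patch: number of lanes N in the
   VNx<N> integer mode selected for an element of ELEM_BITS bits
   (8, 16, 32 or 64) and a vectorization factor VF.  */
unsigned
sketch_preferred_lanes (unsigned elem_bits, unsigned vf)
{
  /* Any vf other than 1, 2 or 4 falls through to the largest mode,
     mirroring the final arm of each conditional chain in the patch.  */
  unsigned factor = (vf == 1 || vf == 2 || vf == 4) ? vf : 8;

  return (64 / elem_bits) * factor;
}
```

For example, a 32-bit element with vf == 4 gives 8 lanes, matching the
VNx8SImode arm of the switch.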


gcc/ChangeLog:

    * config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
    New function.
    (riscv_vector_preferred_simd_mode): Ditto.
    (get_mask_policy_no_pred): Ditto.
    (get_tail_policy_no_pred): Ditto.
    (riscv_tuple_mode_p): Ditto.
    (riscv_classify_nf): Ditto.
    (riscv_vlmul_regsize): Ditto.
    (riscv_vector_mask_mode_p): Ditto.
    (riscv_vector_get_mask_mode): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 58007cc16eb..58f69e259c0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"

 using namespace riscv_vector;

@@ -108,6 +110,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
   && IN_RANGE (INTVAL (elt), minval, maxval));
 }

+/* Return the vlmul field for a specific machine mode. */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -mriscv-vector-bits.  */
+  switch (mode)
+    {
+    case E_VNx8BImode:
+  return VLMUL_FIELD_111;
+
+    case E_VNx4BImode:
+  return VLMUL_FIELD_110;
+
+    case E_VNx2BImode:
+  return VLMUL_FIELD_101;
+
+    case E_VNx16BImode:
+  return VLMUL_FIELD_000;
+
+    case E_VNx32BImode:
+  return VLMUL_FIELD_001;
+
+    case E_VNx64BImode:
+  return VLMUL_FIELD_010;
+
+    default:
+  break;
+    }
+
+  /* we don't care about VLMUL for Mask */
+  return VLMUL_FIELD_000;
+}
+
 rtx
 emit_vlmax_vsetvl (machine_mode vmode)
 {
@@ -162,6 +199,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }

+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+    return word_mode;
+
+  switch (mode)
+    {
+    case E_QImode:
+  return vf == 1   ? VNx8QImode
+     : vf == 2 ? VNx16QImode
+     : vf == 4 ? VNx32QImode
+           : VNx64QImode;
+  break;
+    case E_HImode:
+  return vf == 1   ? VNx4HImode
+     : vf == 2 ? VNx8HImode
+     : vf == 4 ? VNx16HImode
+           : VNx32HImode;
+  break;
+    case E_SImode:
+  return vf == 1   ? VNx2SImode
+     : vf == 2 ? VNx4SImode
+     : vf == 4 ? VNx8SImode
+           : VNx16SImode;
+  break;
+    case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+      && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+    return vf == 1     ? VNx1DImode
+       : vf == 2 ? VNx2DImode
+       : vf == 4 ? VNx4DImode
+             : VNx8DImode;
+  break;
+    case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+      && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+    return vf == 1     ? VNx2SFmode
+       : vf == 2 ? VNx4SFmode
+       : vf == 4 ? VNx8SFmode
+             : VNx16SFmode;
+  break;
+    case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+    return vf == 1     ? VNx1DFmode
+       : vf == 2 ? VNx2DFmode
+       : vf == 4 ? VNx4DFmode
+             : VNx8DFmode;
+  break;
+    default:
+  break;
+    }
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -374,6 +469,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }

+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred(PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred(PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode. */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode. */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+    {
+
+    default:
+  break;
+    }
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode. */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+    return 1;
+  switch (riscv_classify_vlmul_field (mode))
+    {
+    case VLMUL_FIELD_001:
+  return 2;
+    case VLMUL_FIELD_010:
+  return 4;
+    case VLMUL_FIELD_011:
+  return 8;
+    case VLMUL_FIELD_100:
+  gcc_unreachable ();
+    default:
+  return 1;
+    }
+}
+
+/* Return true if it is a RVV mask mode. */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)

[PATCH 05/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds support for registering target hooks for basic 
autovectorization support as well as basic tuning information for the 
vector extension.


gcc/ChangeLog:

    * config/riscv/riscv-cores.def (RISCV_TUNE):
    Add VECTOR_TUNE_INFO parameter.
    * common/config/riscv/riscv-common.cc (RISCV_TUNE):
    Add VECTOR_TUNE_INFO parameter.
    * config/riscv/riscv.cc (riscv_vector_tune_param):
    New struct for vector tuning information.
    (riscv_tune_info): add vector_tune_param.
    (vector_tune_param): New static variable.
    (riscv_vectorization_factor): New variable.
    (generic_rvv_insn_scale_table): New struct.
    (generic_rvv_stmt_scale_table): New struct.
    (generic_rvv_insn_cost_table): New vector insn cost table.
    (generic_rvv_stmt_cost_table): New vector statement cost table.
    (generic_rvv_tune_info): New rvv tuning table.
    (RISCV_TUNE): Add VECTOR_TUNE_INFO parameter.
    (riscv_rtx_costs): Return vector estimate if vector mode.
    (riscv_option_override): Set vector_tune_param.
    (riscv_option_override): Set riscv_vectorization_factor.
    (riscv_estimated_poly_value): Implement
    TARGET_ESTIMATED_POLY_VALUE.
    (riscv_preferred_simd_mode): Implement
    TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
    (riscv_autovectorize_vector_modes): Implement
    TARGET_AUTOVECTORIZE_VECTOR_MODES.
    (riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
    (riscv_empty_mask_is_expensive): Implement
    TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
    (riscv_builtin_vectorization_cost): Implement
    TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.
    (riscv_vectorize_create_costs): Implement
    TARGET_VECTORIZE_CREATE_COSTS.
    (TARGET_ESTIMATED_POLY_VALUE): Register target macro.
    (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Ditto.
    (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
    (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
    (TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
    (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
    (TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
    (TARGET_VECTORIZE_CREATE_COSTS): Ditto

---
 gcc/common/config/riscv/riscv-common.cc |   2 +-
 gcc/config/riscv/riscv-cores.def        |  14 +-
 gcc/config/riscv/riscv.cc               | 321 +++-
 3 files changed, 325 insertions(+), 12 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index ebc1ed7d7e4..6b8d92af986 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -246,7 +246,7 @@ static const riscv_cpu_info riscv_cpu_tables[] =

 static const char *riscv_tunes[] =
 {
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)    \
 TUNE_NAME,
 #include "../../../config/riscv/riscv-cores.def"
 NULL
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2a834cae21d..4feb0366222 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -30,15 +30,15 @@
    identifier, reference to riscv.cc.  */

 #ifndef RISCV_TUNE
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)
 #endif

-RISCV_TUNE("rocket", generic, rocket_tune_info)
-RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
-RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
-RISCV_TUNE("size", generic, optimize_size_tune_info)
+RISCV_TUNE("rocket", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-3-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-5-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("thead-c906", generic, thead_c906_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

 #undef RISCV_TUNE

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f11b7949a49..16b38ba4d76 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,16 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
+#include "riscv-vector-cost.h"

 /* This file should be included last.  */
 #include "target-def.h"
@@ -238,6 +248,12 @@ struct riscv_tune_param
   bool slow_unaligned_access;
 };

+/* Cost for vector insn classes.  */
+struct riscv_vector_tune_param {
+    const vector_insn_cost_table* rvv_insn_costs_table;

[PATCH 06/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds patterns that provide basic autovectorization support 
for integer adds and subtracts.
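
To make the intent of the expanders in this patch concrete, here is a
scalar model of what the cond_add and len_add patterns compute, written
as plain C. This is only a sketch under stated assumptions: the
function names model_cond_add and model_len_add are hypothetical, the
mask is modelled as one byte per lane, and lanes past the length in the
len_add model are simply left untouched, which is a simplification of
the undefined merge value used by the pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked add: active lanes receive a[i] + b[i],
   inactive lanes take their value from MERGE (operand 4 of the
   cond_add expander).  Unsigned arithmetic models lane wraparound.  */
void
model_cond_add (uint32_t *dst, const uint8_t *mask, const uint32_t *a,
		const uint32_t *b, const uint32_t *merge, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? a[i] + b[i] : merge[i];
}

/* Scalar model of a length-controlled add: only the first LEN lanes
   are computed.  Leaving the remaining lanes untouched is an
   assumption of this sketch, not a guarantee of the pattern.  */
void
model_len_add (uint32_t *dst, const uint32_t *a, const uint32_t *b,
	       size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = a[i] + b[i];
}
```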


gcc/ChangeLog:

    * config/riscv/riscv.md (riscv_classify_vlmul_field):
    New external declaration.
    (riscv_vector_preferred_simd_mode): Include
    vector-iterators.md.
    * config/riscv/vector-auto.md: New file containing
    autovectorization patterns.
    * config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
    New unspecs for autovectorization patterns.
    * config/riscv/vector.md: Remove include of vector-iterators.md
    and include vector-auto.md.

---
 gcc/config/riscv/riscv.md            |   1 +
 gcc/config/riscv/vector-auto.md      | 172 +++
 gcc/config/riscv/vector-iterators.md |   2 +
 gcc/config/riscv/vector.md           |   4 +-
 4 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 05924e9bbf1..c34124095f7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -131,6 +131,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")

 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..e5a19663d18
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,172 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
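+;; -------------------------------------------------------------------------
+;;  [INT] Addition
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -------------------------------------------------------------------------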
+
+(define_expand "add3"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_arith_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = emit_vlmax_vsetvl (mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[1], operands[2],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_add"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VI 2 "register_operand")
+   (match_operand:VI 3 "vector_reg_or_const_dup_operand")
+   (match_operand:VI 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = emit_vlmax_vsetvl (mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[2], operands[3],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+(define_expand "len_add"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_reg_or_const_dup_operand")
+   (match_operand 3 "p_reg_or_const_csr_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = operands[3];
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(ope

[PATCH 07/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison

This patch adds tests for autovectorization of integer add and subtract.
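
For reference, each TEST_TYPE instantiation in the add tests below
expands to an ordinary scalar loop; the dg-final directive then only
counts vadd.vv occurrences in the generated assembly. A sketch of the
expansion for int16_t, reproduced by hand from the macro in the diff:

```c
#include <stdint.h>

/* Hand expansion of TEST_TYPE(int16_t) from the add tests: a plain
   element-wise loop that the vectorizer is expected to turn into
   vector adds.  */
void
vadd_int16_t (int16_t *dst, int16_t *a, int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}
```

The subtract tests use the same shape with `-` in place of `+`.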

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec: New directory
    for autovectorization tests.
    * gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
    test to verify code generation of vector add on rv32.
    * gcc.target/riscv/rvv/autovec/loop-add.c: New
    test to verify code generation of vector add on rv64.
    * gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
    test to verify code generation of vector subtract on rv32.
    * gcc.target/riscv/rvv/autovec/loop-sub.c: New
    test to verify code generation of vector subtract on rv64.

---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] + b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] + b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] - b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] - b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-time

[PATCH v2 00/07] RISC-V: autovec: Add auto-vectorization support

2023-03-05 Thread Michael Collison
This series of patches adds foundational support for RISC-V
autovectorization. These patches are based on the current upstream RVV
vector intrinsic support and are not a new implementation. Most of the
implementation consists of adding the new vector cost model, the
autovectorization patterns themselves, and target hooks.

This implementation only provides support for integer addition and
subtraction as a proof of concept. This patch set should not be
construed to be feature complete. Based on conversations with the
community, these patches are intended to lay the groundwork for feature
completion and collaboration within the RISC-V community.

In version 1 of this patch submission I neglected to indicate that these
patches are largely based on the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch
at https://github.com/riscv-collab/riscv-gcc.git is the foundation of
this patch set. I want to publicly apologize to Juzhe and RiVAI for not
attributing their work visibly and publicly.

As discussed on this list, if these patches are approved they will be
merged into an "auto-vectorization" branch once gcc-13 branches for
release.

There are two known issues related to crashes (assert failures)
associated with tree vectorization; I have sent a patch for one of them
and have received feedback.


Changes in v2

- Updated ChangeLog entry to include RiVAI contributions
- Fixed ChangeLog email formatting
- Fixed GNU formatting issues in the code


[PATCH v2 01/07] RISC-V: autovec: Add new predicates and function prototypes

2023-03-05 Thread Michael Collison

This patch adds foundational support in the form of:

1. New predicates

2. New function prototypes

3. Exporting emit_vlmax_vsetvl to global scope

4. A new command line option, -mriscv_vector_lmul
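
Among the declarations added to riscv-opts.h, vlmul_field_enum mirrors
the 3-bit vlmul encoding, as the comments in the enum indicate (000 to
011 are LMUL 1 to 8, 100 is reserved, 101 to 111 are the fractional
values 1/8, 1/4 and 1/2). A small sketch of decoding that field,
assuming a hypothetical helper name (sketch_decode_vlmul is not part of
the patch):

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: decode a 3-bit vlmul
   field into LMUL = *NUM / *DEN.  Returns false for the reserved
   encoding 100 and for values that do not fit in three bits.  */
bool
sketch_decode_vlmul (unsigned field, unsigned *num, unsigned *den)
{
  if (field > 7 || field == 4)
    return false;

  if (field < 4)
    {
      /* 000..011: LMUL = 1, 2, 4, 8.  */
      *num = 1u << field;
      *den = 1;
    }
  else
    {
      /* 101..111: LMUL = 1/8, 1/4, 1/2.  */
      *num = 1;
      *den = 1u << (8 - field);
    }
  return true;
}
```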

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
            New external declaration.
            (riscv_vector_preferred_simd_mode): Ditto.
            (riscv_tuple_mode_p): Ditto.
            (riscv_vector_mask_mode_p): Ditto.
            (riscv_classify_nf): Ditto.
            (riscv_vlmul_regsize): Ditto.
            (riscv_vector_preferred_simd_mode): Ditto.
            (riscv_vector_get_mask_mode): Ditto.
            (emit_vlmax_vsetvl): Ditto.
            (get_mask_policy_no_pred): Ditto.
            (get_tail_policy_no_pred): Ditto.
            * config/riscv/riscv-opts.h (riscv_vector_bits_enum):
            New enum.
            (riscv_vector_lmul_enum): Ditto.
            (vlmul_field_enum): Ditto.
            * config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
            Remove static scope.
            * config/riscv/riscv.opt (riscv_vector_lmul):
            New option -mriscv_vector_lmul.
            * config/riscv/predicates.md (p_reg_or_const_csr_operand):
            New predicate.
            (vector_reg_or_const_dup_operand): Ditto.

---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 15 +
 gcc/config/riscv/riscv-v.cc     |  2 +-
 gcc/config/riscv/riscv.opt      | 20 +
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0d9d7701c7e..19aa5e12920 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })

 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+    return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
    (match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
    (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE (op)))")))

+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
    (match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..c6b6d84fce4 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL            /* global canary */
 };

+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 88a6bf5442f..6a486a1cd61 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -217,4 +217,19 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;

+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode,
+                          unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern 

[PATCH v2 02/07] RISC-V: autovec: Export policy functions to global scope

2023-03-05 Thread Michael Collison
This patch adds foundational support by making two functions that handle
predication policies globally visible.

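
The selection logic of the two exported functions, as far as it is
visible in the hunks below, can be modelled as follows. This is a
sketch with hypothetical names: the real functions take an
enum predication_type_index and return an rtx, and SK_DEFAULT here
merely stands in for the "prefer default configuration" case mentioned
in the comments.

```c
/* Hypothetical stand-ins for the predication kinds tested in the
   patch context (none, tu, tum, mu, tumu).  */
enum sketch_pred { SK_NONE, SK_TU, SK_TUM, SK_MU, SK_TUMU };

/* Hypothetical policy result: undisturbed (TU or MU) or the default.  */
enum sketch_policy { SK_DEFAULT, SK_UNDISTURBED };

/* Tail policy: undisturbed only for tu, tum and tumu.  */
enum sketch_policy
sketch_tail_policy (enum sketch_pred pred)
{
  if (pred == SK_TU || pred == SK_TUM || pred == SK_TUMU)
    return SK_UNDISTURBED;
  return SK_DEFAULT;
}

/* Mask policy: undisturbed only for tumu and mu.  */
enum sketch_policy
sketch_mask_policy (enum sketch_pred pred)
{
  if (pred == SK_TUMU || pred == SK_MU)
    return SK_UNDISTURBED;
  return SK_DEFAULT;
}
```

With no predication (SK_NONE in this sketch), both functions fall back
to the default, which is the case the autovectorization expanders use.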

gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
            Remove static declaration to make externally visible.
            (get_mask_policy_for_pred): Ditto.
            * config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
            New external declaration.
            (get_mask_policy_for_pred): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 2d57086262b..352ffd8867d 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2448,7 +2448,7 @@ use_real_merge_p (enum predication_type_index pred)

 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2458,7 +2458,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)

 /* Get MASK policy for predication. If predication indicates MU, return the MU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8464aa9b7e9..d62d2bdab54 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -456,6 +456,8 @@ extern const char *const operand_suffixes[NUM_OP_TYPES];
 extern const rvv_builtin_suffixes type_suffixes[NUM_VECTOR_TYPES + 1];
 extern const char *const predication_suffixes[NUM_PRED_TYPES];
 extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);

 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
--
2.34.1




[PATCH v2 03/07] RISC-V: autovec: Add vector cost model

2023-03-05 Thread Michael Collison
This patch adds two new files to support the vector cost model and
modifies the Makefile fragment to build the cost model C++ file. Due to
its large size, this patch is provided as an attachment.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * gcc/config.gcc (riscv-vector-cost.o): New object file to build.
            * config/riscv/riscv-vector-cost.cc: New file for riscv vector
            cost model.
            * config/riscv/riscv-vector-cost.h: New header file for riscv
            vector cost model.
            * config/riscv/t-riscv: Add make rule for riscv-vector-cost.o.

From c606f674114a362ba0299caf160b23a98f37c898 Mon Sep 17 00:00:00 2001
From: Michael Collison 
Date: Sun, 5 Mar 2023 17:53:42 -0500
Subject: [PATCH] RISC-V: Add vector cost model

---
 gcc/config.gcc                        |   2 +-
 gcc/config/riscv/riscv-vector-cost.cc | 689 ++
 gcc/config/riscv/riscv-vector-cost.h  | 481 ++
 gcc/config/riscv/t-riscv              |   5 +
 4 files changed, 1176 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-vector-cost.cc
 create mode 100644 gcc/config/riscv/riscv-vector-cost.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index da3a6d3ba1f..4a260572a3d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,7 @@ pru-*-*)
 riscv*)
 	cpu_type=riscv
 	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
-	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
+	extra_objs="${extra_objs} riscv-vector-cost.o riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	d_target_objs="riscv-d.o"
 	extra_headers="riscv_vector.h"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
diff --git a/gcc/config/riscv/riscv-vector-cost.cc b/gcc/config/riscv/riscv-vector-cost.cc
new file mode 100644
index 00000000000..4abd0e54da0
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-cost.cc
@@ -0,0 +1,689 @@
+/* Cost model implementation for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "rtlanal.h"
+#include "output.h"
+#include "alias.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "varasm.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "function.h"
+#include "explow.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "reload.h"
+#include "tm_p.h"
+#include "target.h"
+#include "basic-block.h"
+#include "expr.h"
+#include "optabs.h"
+#include "bitmap.h"
+#include "df.h"
+#include "diagnostic.h"
+#include "builtins.h"
+#include "predict.h"
+#include "tree-pass.h"
+#include "opts.h"
+#include "langhooks.h"
+#include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "tree-vectorizer.h"
+#include "tree-ssa-loop-niter.h"
+#include "riscv-vector-builtins.h"
+
+/* This file should be included last.  */
+#include "riscv-vector-cost.h"
+#include "target-def.h"
+
+bool
+vector_insn_cost_table::get_cost (rtx x, machine_mode mode, int *cost,
+  bool speed) const
+{
+  rtx op0, op1, op2;
+  enum rtx_code code = GET_CODE (x);
+  scalar_int_mode int_mode;
+
+  /* By default, a

[PATCH v2 04/07] RISC-V: autovec: Add auto-vectorization support functions

2023-03-05 Thread Michael Collison
This patch adds support for functions used in implementing various 
portions of autovectorization support.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
            New function.
            (riscv_vector_preferred_simd_mode): Ditto.
            (get_mask_policy_no_pred): Ditto.
            (get_tail_policy_no_pred): Ditto.
            (riscv_tuple_mode_p): Ditto.
            (riscv_classify_nf): Ditto.
            (riscv_vlmul_regsize): Ditto.
            (riscv_vector_mask_mode_p): Ditto.
            (riscv_vector_get_mask_mode): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2d2de6e4a6c..c9a0d6b4c06 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -38,10 +38,12 @@
 #include "memmodel.h"
 #include "emit-rtl.h"
 #include "tm_p.h"
+#include "targhooks.h"
 #include "target.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"

 using namespace riscv_vector;
@@ -109,6 +111,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
   && IN_RANGE (INTVAL (elt), minval, maxval));
 }

+/* Return the vlmul field for a specific machine mode.  */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+     properties, so that we keep the correct classification regardless
+     of -mriscv-vector-bits.  */
+  switch (mode)
+    {
+    case E_VNx8BImode:
+      return VLMUL_FIELD_111;
+
+    case E_VNx4BImode:
+      return VLMUL_FIELD_110;
+
+    case E_VNx2BImode:
+      return VLMUL_FIELD_101;
+
+    case E_VNx16BImode:
+      return VLMUL_FIELD_000;
+
+    case E_VNx32BImode:
+      return VLMUL_FIELD_001;
+
+    case E_VNx64BImode:
+      return VLMUL_FIELD_010;
+
+    default:
+      break;
+    }
+
+  /* We don't care about VLMUL for Mask.  */
+  return VLMUL_FIELD_000;
+}
+
 rtx
 emit_vlmax_vsetvl (machine_mode vmode)
 {
@@ -163,6 +200,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }

+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+    return word_mode;
+
+  switch (mode)
+    {
+    case E_QImode:
+      return vf == 1   ? VNx8QImode
+             : vf == 2 ? VNx16QImode
+             : vf == 4 ? VNx32QImode
+                       : VNx64QImode;
+      break;
+    case E_HImode:
+      return vf == 1   ? VNx4HImode
+             : vf == 2 ? VNx8HImode
+             : vf == 4 ? VNx16HImode
+                       : VNx32HImode;
+      break;
+    case E_SImode:
+      return vf == 1   ? VNx2SImode
+             : vf == 2 ? VNx4SImode
+             : vf == 4 ? VNx8SImode
+                       : VNx16SImode;
+      break;
+    case E_DImode:
+      if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+          && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+        return vf == 1   ? VNx1DImode
+               : vf == 2 ? VNx2DImode
+               : vf == 4 ? VNx4DImode
+                         : VNx8DImode;
+      break;
+    case E_SFmode:
+      if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+          && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+        return vf == 1   ? VNx2SFmode
+               : vf == 2 ? VNx4SFmode
+               : vf == 4 ? VNx8SFmode
+                         : VNx16SFmode;
+      break;
+    case E_DFmode:
+      if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+        return vf == 1   ? VNx1DFmode
+               : vf == 2 ? VNx2DFmode
+               : vf == 4 ? VNx4DFmode
+                         : VNx8DFmode;
+      break;
+    default:
+      break;
+    }
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -375,6 +470,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }

+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode.  */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode.  */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+    {
+
+    default:
+  break;
+    }
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode.  */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) 

[PATCH v2 05/07] RISC-V: autovec: Add tuning and target vectorization hooks

2023-03-05 Thread Michael Collison
This patch adds support for registering target hooks for basic 
autovectorization support as well as basic tuning information for the 
vector extension.
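
The RISCV_TUNE change in this patch relies on the usual X-macro idiom: every consumer of riscv-cores.def must accept the new fourth argument, even if it ignores it. Here is a minimal sketch of that idiom. `TUNE_TABLE` is a hypothetical stand-in for including the .def file, and the tune-info arguments are treated as opaque tokens rather than real objects.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for #include "riscv-cores.def"; three of the
// entries from the diff, with the new fourth argument.
#define TUNE_TABLE(X)                                                        \
  X("rocket", generic, rocket_tune_info, generic_rvv_tune_info)             \
  X("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info) \
  X("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

// Consumer 1: keeps only the name, as riscv-common.cc does. It still has
// to declare all four parameters or the expansion fails to compile.
#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO) \
  TUNE_NAME,
static const char *const tune_names[] = {TUNE_TABLE(RISCV_TUNE) nullptr};
#undef RISCV_TUNE

// Consumer 2: uses the new argument. It is stringized here so that the
// sketch needs no real tuning structures.
#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO) \
  #VECTOR_TUNE_INFO,
static const char *const vector_tune_names[] = {TUNE_TABLE(RISCV_TUNE) nullptr};
#undef RISCV_TUNE

// Count entries up to the terminating null pointer.
inline int count_tunes() {
  int n = 0;
  while (tune_names[n])
    ++n;
  return n;
}
```

This is why the one-line change to the macro signature has to be repeated in riscv-common.cc, riscv-cores.def and riscv.cc.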


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-cores.def (RISCV_TUNE):
            Add VECTOR_TUNE_INFO parameter.
            * common/config/riscv/riscv-common.cc (RISCV_TUNE):
            Add VECTOR_TUNE_INFO parameter.
            * config/riscv/riscv.cc (riscv_vector_tune_param):
            New struct for vector tuning information.
            (riscv_tune_info): Add vector_tune_param.
            (vector_tune_param): New static variable.
            (riscv_vectorization_factor): New variable.
            (generic_rvv_insn_scale_table): New struct.
            (generic_rvv_stmt_scale_table): New struct.
            (generic_rvv_insn_cost_table): New vector insn cost table.
            (generic_rvv_stmt_cost_table): New vector statement cost table.
            (generic_rvv_tune_info): New rvv tuning table.
            (RISCV_TUNE): Add VECTOR_TUNE_INFO parameter.
            (riscv_rtx_costs): Return vector estimate if vector mode.
            (riscv_option_override): Set vector_tune_param.
            (riscv_option_override): Set riscv_vectorization_factor.
            (riscv_estimated_poly_value): Implement
            TARGET_ESTIMATED_POLY_VALUE.
            (riscv_preferred_simd_mode): Implement
            TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
            (riscv_autovectorize_vector_modes): Implement
            TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.
            (riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
            (riscv_empty_mask_is_expensive): Implement
            TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
            (riscv_builtin_vectorization_cost): Implement
            TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.
            (riscv_vectorize_create_costs): Implement
            TARGET_VECTORIZE_CREATE_COSTS.
            (TARGET_ESTIMATED_POLY_VALUE): Register target macro.
            (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Ditto.
            (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
            (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
            (TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
            (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
            (TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
            (TARGET_VECTORIZE_CREATE_COSTS): Ditto.

---
 gcc/common/config/riscv/riscv-common.cc |   2 +-
 gcc/config/riscv/riscv-cores.def        |  14 +-
 gcc/config/riscv/riscv.cc               | 324 +++-
 3 files changed, 328 insertions(+), 12 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index ebc1ed7d7e4..6b8d92af986 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -246,7 +246,7 @@ static const riscv_cpu_info riscv_cpu_tables[] =

 static const char *riscv_tunes[] =
 {
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO) \
 TUNE_NAME,
 #include "../../../config/riscv/riscv-cores.def"
 NULL
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2a834cae21d..4feb0366222 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -30,15 +30,15 @@
    identifier, reference to riscv.cc.  */

 #ifndef RISCV_TUNE
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)
 #endif

-RISCV_TUNE("rocket", generic, rocket_tune_info)
-RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
-RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
-RISCV_TUNE("size", generic, optimize_size_tune_info)
+RISCV_TUNE("rocket", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-3-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-5-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("thead-c906", generic, thead_c906_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

 #undef RISCV_TUNE

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index befb9b498b7..44659062070 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,16 @@ along with GCC; see the file COPYING3.  If not see
 

[PATCH V2 06/07] RISC-V: autovec: Add autovectorization patterns for add & sub

2023-03-05 Thread Michael Collison
This patch adds patterns that provide basic autovectorization support 
for integer adds and subtracts.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

                * config/riscv/riscv.md (riscv_vector_preferred_simd_mode):
                Include vector-iterators.md.
                * config/riscv/vector-auto.md: New file containing
                autovectorization patterns.
                * config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
                New unspecs for autovectorization patterns.
                * config/riscv/vector.md: Remove include of vector-iterators.md
                and include vector-auto.md.

---
 gcc/config/riscv/riscv.md            |   1 +
 gcc/config/riscv/vector-auto.md      | 172 +++
 gcc/config/riscv/vector-iterators.md |   2 +
 gcc/config/riscv/vector.md           |   4 +-
 4 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6c3176042fb..a504ace72e5 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -131,6 +131,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")

 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 00000000000..e5a19663d18
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,172 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
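+;; -------------------------------------------------------------------------
+;; ---- [INT] Addition
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -------------------------------------------------------------------------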
+
+(define_expand "add<mode>3"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_arith_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (<MODE>mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(<VM>mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[1], operands[2],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_add<mode>"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:<VM> 1 "register_operand")
+   (match_operand:VI 2 "register_operand")
+   (match_operand:VI 3 "vector_reg_or_const_dup_operand")
+   (match_operand:VI 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[2], operands[3],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+(define_expand "len_add<mode>"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_reg_or_const_dup_operand")
+   (match_operand 3 "p_reg_or_const_csr_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (<MODE>mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = operands[3];
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred(

[PATCH v2 07/07] RISC-V: autovec: Add autovectorization patterns for add & sub

2023-03-05 Thread Michael Collison

This patch adds tests for autovectorization of integer add and subtract.
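
The tests themselves only scan the generated assembly for vadd.vv and vsub.vv. For anyone who wants to sanity-check the scalar semantics of the loops being vectorized, a minimal sketch follows. The TEST_TYPE macro has the same shape as in the tests; `check_vadd_int16` and `check_vadd_uint32_wrap` are hypothetical helpers, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the TEST_TYPE macro in the new tests: one add loop per
// element type, named vadd_<type>.
#define TEST_TYPE(TYPE)                                  \
  void vadd_##TYPE(TYPE *dst, TYPE *a, TYPE *b, int n) { \
    for (int i = 0; i < n; i++)                          \
      dst[i] = a[i] + b[i];                              \
  }

TEST_TYPE(int16_t)
TEST_TYPE(uint32_t)

// Hypothetical helper: run the generated int16_t loop on a small input and
// compare against element-wise addition.
inline bool check_vadd_int16() {
  int16_t a[5] = {1, 2, 3, 4, 5};
  int16_t b[5] = {10, 20, 30, 40, 50};
  int16_t dst[5] = {};
  vadd_int16_t(dst, a, b, 5);
  for (int i = 0; i < 5; i++)
    if (dst[i] != a[i] + b[i])
      return false;
  return true;
}

// Hypothetical helper: unsigned addition wraps modulo 2^32, which a
// vectorized loop must preserve.
inline bool check_vadd_uint32_wrap() {
  uint32_t a[1] = {0xFFFFFFFFu};
  uint32_t b[1] = {1u};
  uint32_t dst[1] = {42u};
  vadd_uint32_t(dst, a, b, 1);
  return dst[0] == 0u;
}
```

A compile-only dg test cannot catch wrong results, so a run-time check along these lines would be a natural follow-up.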

gcc/testsuite/ChangeLog:

2023-03-02  Michael Collison 
                Vineet Gupta 

                * gcc.target/riscv/rvv/autovec: New directory
                for autovectorization tests.
                * gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
                test to verify code generation of vector add on rv32.
                * gcc.target/riscv/rvv/autovec/loop-add.c: New
                test to verify code generation of vector add on rv64.
                * gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
                test to verify code generation of vector subtract on rv32.
                * gcc.target/riscv/rvv/autovec/loop-sub.c: New
                test to verify code generation of vector subtract on rv64.

---
 .../riscv/rvv/autovec/loop-add-rv32.c         | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c         | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 00000000000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                                    \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                                                        \
+    for (int i = 0; i < n; i++)                            \
+      dst[i] = a[i] + b[i];                                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 00000000000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                                    \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                                                        \
+    for (int i = 0; i < n; i++)                            \
+      dst[i] = a[i] + b[i];                                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 00000000000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                                    \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                                                        \
+    for (int i = 0; i < n; i++)                            \
+      dst[i] = a[i] - b[i];                                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 00000000000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                                    \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                                                        \
+    for (int i = 0; i < n; i++)                            \
+      dst[i] = a[i] - b[i];                                \
+  }
+
+/* *int8_t not autovec curren

Re: [PATCH v2 00/07] RISC-V: autovec: Add auto-vectorization support

2023-03-05 Thread Michael Collison
Thanks for the feedback, will try that next time.

Michael Collison


> On Mar 5, 2023, at 11:06 PM, Xi Ruoyao  wrote:
> 
> On Sun, 2023-03-05 at 22:13 -0500, Michael Collison wrote:
> 
> /* snip */
> 
>> - Fixed ChangeLog email formatting
> 
> Unfortunately it's not fixed.  We expect one tab, but now you have 16
> whitespaces.
> 
> To me it looks like your email client is being too smart and destroying
> the patch.  Try "git send-email", which is much easier to configure
> correctly.
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


[PATCH v3 0/6] RISC-V: autovec: Add auto-vectorization support

2023-03-07 Thread Michael Collison
This series of patches adds foundational support for RISC-V
auto-vectorization. These patches are based on the current upstream RVV vector
intrinsic support and are not a new implementation. Most of the implementation
consists of adding the new vector cost model, the autovectorization patterns
themselves, and target hooks. This implementation only provides support for
integer addition and subtraction as a proof of concept. This patch set should
not be construed to be feature complete. Based on conversations with the
community, these patches are intended to lay the groundwork for feature
completion and collaboration within the RISC-V community.

These patches are largely based on the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this patch
set.

As discussed on this list, if these patches are approved they will be merged
into an "auto-vectorization" branch once gcc-13 branches for release. There are
two known issues related to crashes (assert failures) associated with tree
vectorization; I have sent a patch for one of them and have received feedback.

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions
- Fixed ChangeLog email formatting
- Fixed gnu formatting issues in the code

Michael Collison (6):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V: autovec: Add auto-vectorization support functions
  RISC-V: autovec: Add target vectorization hooks
  RISC-V: autovec: Add autovectorization patterns for add & sub
  RISC-V: autovec: Add autovectorization tests for add & sub

 gcc/config/riscv/predicates.md                |  13 ++
 gcc/config/riscv/riscv-opts.h                 |  40 
 gcc/config/riscv/riscv-protos.h               |  15 ++
 gcc/config/riscv/riscv-v.cc                   | 178 +-
 gcc/config/riscv/riscv-vector-builtins.cc     |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h      |   2 +
 gcc/config/riscv/riscv.cc                     | 156 +++
 gcc/config/riscv/riscv.md                     |   1 +
 gcc/config/riscv/riscv.opt                    |  20 ++
 gcc/config/riscv/vector-auto.md               | 172 +
 gcc/config/riscv/vector-iterators.md          |   2 +
 gcc/config/riscv/vector.md                    |   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c         |  24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c         |  24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  24 +++
 16 files changed, 698 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

-- 
2.34.1



[PATCH v3 1/6] RISC-V: autovec: Add new predicates and function prototypes

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison
	    Juzhe Zhong

	* config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
	New external declaration.
	(riscv_vector_preferred_simd_mode): Ditto.
	(riscv_tuple_mode_p): Ditto.
	(riscv_vector_mask_mode_p): Ditto.
	(riscv_classify_nf): Ditto.
	(riscv_vlmul_regsize): Ditto.
	(riscv_vector_preferred_simd_mode): Ditto.
	(riscv_vector_get_mask_mode): Ditto.
	(emit_vlmax_vsetvl): Ditto.
	(get_mask_policy_no_pred): Ditto.
	(get_tail_policy_no_pred): Ditto.
	* config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
	(riscv_vector_lmul_enum): Ditto.
	(vlmul_field_enum): Ditto.
	* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
	Remove static scope.
	* config/riscv/riscv.opt (riscv_vector_lmul):
	New option -mriscv_vector_lmul.
	* config/riscv/predicates.md (p_reg_or_const_csr_operand):
	New predicate.
	(vector_reg_or_const_dup_operand): Ditto.
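
For reviewers, the comments on the new vlmul_field_enum amount to reading the 3-bit field as a signed power-of-two exponent, with 100 reserved. A minimal sketch of that decoding follows; `LmulFraction` and `decode_vlmul` are hypothetical names used only here and are not part of the patch.

```cpp
#include <cassert>

// Mirrors the enumerators this patch adds to riscv-opts.h.
enum vlmul_field_enum {
  VLMUL_FIELD_000, /* LMUL = 1.  */
  VLMUL_FIELD_001, /* LMUL = 2.  */
  VLMUL_FIELD_010, /* LMUL = 4.  */
  VLMUL_FIELD_011, /* LMUL = 8.  */
  VLMUL_FIELD_100, /* RESERVED.  */
  VLMUL_FIELD_101, /* LMUL = 1/8.  */
  VLMUL_FIELD_110, /* LMUL = 1/4.  */
  VLMUL_FIELD_111, /* LMUL = 1/2.  */
  MAX_VLMUL_FIELD
};

// Hypothetical helper type: LMUL expressed as num/den.
struct LmulFraction {
  unsigned num;
  unsigned den;
};

// Interpret the 3-bit field as a two's-complement exponent e in [-3, 3],
// so that LMUL = 2^e. Returns false for the reserved or out-of-range values.
inline bool decode_vlmul(unsigned field, LmulFraction *out) {
  if (field >= MAX_VLMUL_FIELD || field == VLMUL_FIELD_100)
    return false;
  int e = (field & 4u) ? static_cast<int>(field) - 8 : static_cast<int>(field);
  if (e >= 0)
    *out = LmulFraction{1u << e, 1u};
  else
    *out = LmulFraction{1u, 1u << -e};
  return true;
}
```

For instance, VLMUL_FIELD_101 is 5, which reads as -3 in three-bit two's complement and therefore decodes to LMUL = 1/8, in agreement with the enum comment.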
---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 15 +
 gcc/config/riscv/riscv-v.cc     |  2 +-
 gcc/config/riscv/riscv.opt      | 20 +
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0d9d7701c7e..19aa5e12920 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })
 
 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+    return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
        (match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
        (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE (op)))")))
 
+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_test "const_vec_duplicate_p (op)
+                    && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..c6b6d84fce4 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 88a6bf5442f..6a486a1cd61 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -217,4 +217,19 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
 
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode,
+ unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern int riscv_classify_nf (machine_mode);
+extern int riscv_vlmul_regsize (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx emit_vlmax_vsetvl (machine_mode vmode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/con

[PATCH v3 3/6] RISC-V: autovec: Add auto-vectorization support functions

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison
	    Juzhe Zhong

	* config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
	New function.
	(riscv_vector_preferred_simd_mode): Ditto.
	(get_mask_policy_no_pred): Ditto.
	(get_tail_policy_no_pred): Ditto.
	(riscv_tuple_mode_p): Ditto.
	(riscv_classify_nf): Ditto.
	(riscv_vlmul_regsize): Ditto.
	(riscv_vector_mask_mode_p): Ditto.
	(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2d2de6e4a6c..d21bde1bda6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -109,6 +111,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }
 
+/* Return the vlmul field for a specific machine mode.  */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -mriscv-vector-bits.  */
+  switch (mode)
+{
+case E_VNx8BImode:
+  return VLMUL_FIELD_111;
+
+case E_VNx4BImode:
+  return VLMUL_FIELD_110;
+
+case E_VNx2BImode:
+  return VLMUL_FIELD_101;
+
+case E_VNx16BImode:
+  return VLMUL_FIELD_000;
+
+case E_VNx32BImode:
+  return VLMUL_FIELD_001;
+
+case E_VNx64BImode:
+  return VLMUL_FIELD_010;
+
+default:
+  break;
+}
+
+  /* we don't care about VLMUL for Mask.  */
+  return VLMUL_FIELD_000;
+}
+
 rtx
 emit_vlmax_vsetvl (machine_mode vmode)
 {
@@ -163,6 +200,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+return word_mode;
+
+  switch (mode)
+{
+case E_QImode:
+  return vf == 1   ? VNx8QImode
+: vf == 2 ? VNx16QImode
+: vf == 4 ? VNx32QImode
+  : VNx64QImode;
+  break;
+case E_HImode:
+  return vf == 1   ? VNx4HImode
+: vf == 2 ? VNx8HImode
+: vf == 4 ? VNx16HImode
+  : VNx32HImode;
+  break;
+case E_SImode:
+  return vf == 1   ? VNx2SImode
+: vf == 2 ? VNx4SImode
+: vf == 4 ? VNx8SImode
+  : VNx16SImode;
+  break;
+case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+   return vf == 1   ? VNx1DImode
+  : vf == 2 ? VNx2DImode
+  : vf == 4 ? VNx4DImode
+: VNx8DImode;
+  break;
+case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+   return vf == 1   ? VNx2SFmode
+  : vf == 2 ? VNx4SFmode
+  : vf == 4 ? VNx8SFmode
+: VNx16SFmode;
+  break;
+case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+   return vf == 1   ? VNx1DFmode
+  : vf == 2 ? VNx2DFmode
+  : vf == 4 ? VNx4DFmode
+: VNx8DFmode;
+  break;
+default:
+  break;
+}
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -375,6 +470,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode.  */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode.  */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+{
+
+default:
+  break;
+}
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode.  */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+return 1;
+  switch (riscv_classify_vlmul_field (mode))
+{
+case VLMUL_FIELD_001:
+  return 2;
+case VLMUL_FIELD

[PATCH v3 2/6] RISC-V: autovec: Export policy functions to global scope

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 2d57086262b..352ffd8867d 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2448,7 +2448,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2458,7 +2458,7 @@ get_tail_policy_for_pred (enum predication_type_index 
pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h 
b/gcc/config/riscv/riscv-vector-builtins.h
index 8464aa9b7e9..d62d2bdab54 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -456,6 +456,8 @@ extern const char *const operand_suffixes[NUM_OP_TYPES];
 extern const rvv_builtin_suffixes type_suffixes[NUM_VECTOR_TYPES + 1];
 extern const char *const predication_suffixes[NUM_PRED_TYPES];
 extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
 
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
-- 
2.34.1



[PATCH v3 4/6] RISC-V: autovec: Add target vectorization hooks

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc (riscv_option_override):
Set riscv_vectorization_factor.
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
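
As a rough illustration of the arithmetic behind the
TARGET_ESTIMATED_POLY_VALUE hook listed above, here is a minimal C sketch.
The names poly2, estimate_kind and estimate_poly are hypothetical stand-ins,
not GCC's poly_int API, and the sketch assumes a known width is a single
multiple of 128 bits (the bitmask handling in the hook is left out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a two-coefficient poly_int64: the value is
   c0 + c1 * x, where x is only known at run time.  */
struct poly2 { int64_t c0, c1; };

enum estimate_kind { EST_MIN, EST_MAX, EST_LIKELY };

/* Estimate P.  WIDTH_BITS == 0 models a scalable (unknown) vector width;
   otherwise WIDTH_BITS is assumed to be a multiple of 128.  */
int64_t
estimate_poly (struct poly2 p, unsigned width_bits, enum estimate_kind kind)
{
  if (width_bits == 0)
    /* Min and likely assume 128-bit vectors (x == 0); max assumes the
       2048-bit architectural limit (x == 15).  */
    return kind == EST_MAX ? p.c0 + p.c1 * 15 : p.c0;

  assert (width_bits >= 128 && width_bits % 128 == 0);
  int64_t over_128 = (int64_t) width_bits - 128;
  return p.c0 + p.c1 * over_128 / 128;
}
```

With these assumptions a value of 4 + 4x is estimated as 4 for 128-bit
vectors and as 8 for 256-bit vectors.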
---
 gcc/config/riscv/riscv.cc | 156 ++
 1 file changed, 156 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index befb9b498b7..1ca9f3c7ae4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -275,6 +284,9 @@ poly_uint16 riscv_vector_chunks;
 /* The number of bytes in a vector chunk.  */
 unsigned riscv_bytes_per_vector_chunk;
 
+/* Prefer vf for auto-vectorizer.  */
+unsigned riscv_vectorization_factor;
+
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
@@ -6199,6 +6211,10 @@ riscv_option_override (void)
 
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
+
+  if (TARGET_VECTOR)
+riscv_vectorization_factor = riscv_vector_lmul;
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -6893,6 +6909,128 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, 
unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 2048 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  machine_mode vmode =
+riscv_vector::riscv_vector_preferred_simd_mode (mode,
+   riscv_vectorization_factor);
+  if (VECTOR_MODE_P (

[PATCH v3 5/6] RISC-V: autovec: Add autovectorization patterns for add & sub

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
New unspecs for autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md|   1 +
 gcc/config/riscv/vector-auto.md  | 172 +++
 gcc/config/riscv/vector-iterators.md |   2 +
 gcc/config/riscv/vector.md   |   4 +-
 4 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6c3176042fb..a504ace72e5 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -131,6 +131,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..5227a73d96d
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,172 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; -
+;;  [INT] Addition
+;; -
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -
+
+(define_expand "add3"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_arith_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = emit_vlmax_vsetvl (mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[1], 
operands[2],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_add"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VI 2 "register_operand")
+   (match_operand:VI 3 "vector_reg_or_const_dup_operand")
+   (match_operand:VI 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = emit_vlmax_vsetvl (mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[2], 
operands[3],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+(define_expand "len_add"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_reg_or_const_dup_operand")
+   (match_operand 3 "p_reg_or_const_csr_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = operands[3];
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[1], 
operands[2],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+
+;; 

[PATCH v3 6/6] RISC-V: autovec: Add autovectorization tests for add & sub

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)

[PATCH v2] vect: Check that vector factor is a compile-time constant

2023-03-08 Thread Michael Collison
2023-03-05  Michael Collison  

* tree-vect-loop-manip.cc (vect_do_peeling): Use
result of constant_lower_bound instead of vf in case
vf is not a compile time constant.
---
 gcc/tree-vect-loop-manip.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d88edafa018..f60fa50e8f4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2921,7 +2921,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
   if (new_var_p)
{
  value_range vr (type,
- wi::to_wide (build_int_cst (type, vf)),
+ wi::to_wide (build_int_cst (type, lowest_vf)),
  wi::to_wide (TYPE_MAX_VALUE (type)));
  set_range_info (niters, vr);
}
-- 
2.34.1



[PATCH] vect: Verify that GET_MODE_NUNITS is power-of-2

2023-03-10 Thread Michael Collison
While working on autovectorization for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
power of two. The RISC-V target has vector modes (e.g. VNx1DImode) whose
GET_MODE_NUNITS is not a power of two.
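
To make the check concrete, here is a small C sketch of a
coefficient-wise divisibility test in the spirit of the helper this patch
adds. It models a poly_int as a plain array of coefficients; the function
name all_coeffs_divisible_p and the coefficient values used for the two
modes are assumptions for illustration only. Note that it tests
divisibility by B, not a strict power-of-two property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a poly_int: the value is
   coeffs[0] + coeffs[1] * x + ..., with x known only at run time.
   Return true if every coefficient is a multiple of B, so that the
   value is a multiple of B for any x.  */
bool
all_coeffs_divisible_p (const long *coeffs, size_t n, long b)
{
  assert (b != 0);
  for (size_t i = 0; i < n; i++)
    if (coeffs[i] % b != 0)
      return false;
  return true;
}
```

Assuming a VNx1DImode-like unit count of 1 + 1x, the test with B = 2
fails, while a VNx2DImode-like count of 2 + 2x passes.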

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* poly-int.h (exact_div_p): New function to
verify that argument is a power of 2 poly_int.
* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a power of 2.
---
 gcc/poly-int.h   | 17 +
 gcc/tree-vect-slp.cc |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index 12571455081..d09632f341f 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -2219,6 +2219,23 @@ multiple_p (const poly_int_pod &a, const 
poly_int_pod &b,
   return constant_multiple_p (a, b, multiple);
 }
 
+/* Return true, if A is known to be a multiple of B.  */
+
+template
+inline bool
+exact_div_p (const poly_int_pod &a, Cb b)
+{
+  typedef POLY_CONST_COEFF (Ca, Cb) C;
+  poly_int r;
+  for (unsigned int i = 0; i < N; i++)
+{
+  if ((a.coeffs[i] % b) != 0)
+   return false;
+
+}
+  return true;
+}
+
 /* Return A / B, given that A is known to be a multiple of B.  */
 
 template
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9a4e000925e..6be2036a13a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -426,7 +426,8 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && exact_div_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)), 2))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
-- 
2.34.1



[PATCH] vect: Verify that GET_MODE_NUNITS is greater than one.

2023-03-14 Thread Michael Collison
While working on autovectorization for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.
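
As a sketch of what a "known greater than" test means for a
runtime-sized unit count, the following C fragment models the count as
c0 + c1 * x with x >= 0. The struct poly2 and the function known_gt_const
are hypothetical simplifications, not the poly-int.h implementation, and
the coefficient values in check_examples are assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-coefficient model: the value is c0 + c1 * x,
   where x >= 0 is only known at run time.  */
struct poly2 { long c0, c1; };

/* Return true if c0 + c1 * x > BOUND holds for every x >= 0.
   That requires the x == 0 case to hold and the value never to
   decrease as x grows.  */
bool
known_gt_const (struct poly2 p, long bound)
{
  return p.c1 >= 0 && p.c0 > bound;
}

/* Usage example with assumed coefficients.  */
void
check_examples (void)
{
  struct poly2 one_unit = { 1, 1 };   /* may be exactly 1 when x == 0 */
  struct poly2 two_units = { 2, 2 };  /* at least 2 for every x */
  assert (!known_gt_const (one_unit, 1));
  assert (known_gt_const (two_units, 1));
}
```

Under this model a mode with a single unit at x == 0 is rejected, which
is the case the patch guards against.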

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is greater than one.
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9a4e000925e..add58113fa8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -426,7 +426,8 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && known_gt (GET_MODE_NUNITS (TYPE_MODE (vector_type)), 1))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
-- 
2.34.1



Re: [PATCH, gcc7, aarch64] Add arithmetic overflow patterns

2016-01-28 Thread Michael Collison

Hi Richard,

Note that this patch appears to depend on your previous patch:

https://gcc.gnu.org/ml/gcc-patches/2016-01/txtDPaXOBMuOB.txt

for the definition of define_mode_attr DWI. I was looking at this patch 
as I was working on Bugzilla 68543 which this will address.


Regards,

Michael Collison



[PING][ARM] Re: Use vector wide add for mixed-mode adds

2016-02-03 Thread Michael Collison

Second Ping. Most recent patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01682.html

Regards,

Michael Collison

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [PATCH, gcc7, aarch64] Add arithmetic overflow patterns

2016-02-06 Thread Michael Collison

Richard,

One other question on the patch. I note that when you expand the addv 
and uaddv patterns you emit rtl using gen_add3_compareV and 
gen_add3_compareC respectively. These patterns use sign_extend and 
zero_extend respectively. Why do you not do the same thing for the subv 
and usubv patterns? The subv patterns expand into calls to 
gen_sub3_compare1 which does not emit sign or zero extends. Why 
the difference?
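
For readers unfamiliar with the sign_extend/zero_extend formulation, the
idea can be pictured in C roughly as follows. This is only a sketch of the
arithmetic (compare the widened sum with the extended narrow sum), not the
RTL itself; the function names are made up for illustration, and the
conversion back to int32_t relies on the wrapping behaviour that GCC
documents for out-of-range conversions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed case: overflow occurred if the sum computed in a wider type
   differs from the sign-extended narrow sum.  */
bool
add_overflows_signed (int32_t a, int32_t b)
{
  int64_t wide = (int64_t) a + (int64_t) b;
  /* Add in unsigned arithmetic to avoid undefined behaviour; the
     conversion back to int32_t is implementation-defined.  */
  int32_t narrow = (int32_t) ((uint32_t) a + (uint32_t) b);
  return wide != (int64_t) narrow;
}

/* Unsigned case: same comparison, using zero extension.  */
bool
add_overflows_unsigned (uint32_t a, uint32_t b)
{
  uint64_t wide = (uint64_t) a + (uint64_t) b;
  uint32_t narrow = a + b;
  return wide != (uint64_t) narrow;
}

/* Usage example.  */
void
check_examples (void)
{
  assert (add_overflows_signed (INT32_MAX, 1));
  assert (!add_overflows_signed (1, 2));
  assert (add_overflows_unsigned (UINT32_MAX, 1));
  assert (!add_overflows_unsigned (1, 2));
}
```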


On 01/28/2016 10:53 AM, Richard Henderson wrote:

On 01/28/2016 01:50 AM, Michael Collison wrote:

Hi Richard,

Note that this patch appears to depend on your previous patch:

https://gcc.gnu.org/ml/gcc-patches/2016-01/txtDPaXOBMuOB.txt

for the definition of define_mode_attr DWI. I was looking at this patch as I
was working on Bugzilla 68543 which this will address.

Yes, it did.  I've now committed a slightly modified patch for pr69305, which
includes some name changes that were present in this patch.  There may be minor
patch conflicts but nothing major.



r~



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [ARM] Use vector wide add for mixed-mode adds

2016-02-14 Thread Michael Collison

Hi Kyrill,

I made the following changes based on your comments:

1. I rebased the patch so that it applies cleanly on trunk
2. Fixed the dg-add-options in my new test cases as requested
3. Fixed the GNU style issues identified by ./contrib/check_GNU_style.sh

The failure you are seeing on slp-reduc-3.c is a known failure. The test 
case has an xfail with 'xfail { vect_widen_sum_hi_to_si_pattern' which I 
added in my patch. Richard Biener resolved some of these issues with PR 
68333, but 'slp-reduc-3.c' still fails. I will create a new PR.


I retested on the Linaro testing infrastructure with the latest trunk 
and the only failure is 'slp-reduc-3.c'. Okay for GCC 7?


2016-02-12 Michael Collison 

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
for new function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
arm_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon supports vector widen sum of HImode to SImode.

On 02/09/2016 09:27 AM, Kyrill Tkachov wrote:

Hi Michael,

On 17/12/15 00:02, Michael Collison wrote:

Kyrill,

I have attached a patch that addresses your comments. The only rename I 
would ask you to reconsider is the function 'bool 
aarch32_simd_check_vect_par_cnst_half'. This function was copied from 
the aarch64 port and I thought it was important to match the naming 
for maintenance purposes. I did rename the function to 'bool 
arm_simd_check_vect_par_cnst_half_p'. I changed 'aarch32' to 'arm' 
and added '_p' per your suggestions. Is this okay?




Ok, that's fine with me.


I implemented all your other change suggestions.



Thanks, sorry it took a long time to get back to this, I was busy with 
regression-fixing patches as we're in bug-fixing mode...


2015-12-16 Michael Collison 

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
for new function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
arm_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon supports vector widen sum of HImode to SImode.



I've tried this out and I have a few comments.
The arm.c hunk doesn't apply to current trunk anymore due to context.
Can you please rebase the patch?
I've fixed it up manually in my tree so I can build it.
With this patch I'm seeing two PASS->FAIL on arm-none-eabi:
FAIL: gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects 
scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "

[ARM] Add support for overflow add, sub, and neg operations

2016-02-24 Thread Michael Collison
This patch adds support for the builtin overflow operations for add, 
subtract and negate. It is targeted at GCC 7 stage 1 and was tested with no 
regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi
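
For context, the source-level operations in question are GCC's
overflow-checking builtins. The following C sketch shows how they are
typically used; the wrapper names are hypothetical, and expressing negation
as a checked subtraction from zero is an assumption about usage rather than
something stated in this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Each wrapper returns true if the operation overflowed; *RES then
   holds the wrapped result.  */
bool
checked_add (int a, int b, int *res)
{
  return __builtin_add_overflow (a, b, res);
}

bool
checked_usub (unsigned a, unsigned b, unsigned *res)
{
  return __builtin_sub_overflow (a, b, res);
}

/* Negation written as a checked subtraction from zero.  */
bool
checked_neg (int a, int *res)
{
  return __builtin_sub_overflow (0, a, res);
}

/* Usage example.  */
void
check_examples (void)
{
  int r;
  unsigned u;
  assert (checked_add (INT_MAX, 1, &r));
  assert (!checked_add (1, 2, &r) && r == 3);
  assert (checked_usub (0u, 1u, &u));
  assert (checked_neg (INT_MIN, &r));
}
```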

2016-02-24  Michael Collison  

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations.
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compare, negsi2_carryin_compare): New patterns
to support builtin overflow negate operations.


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 1819553..69231f2 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -59,6 +59,7 @@ CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);
 CC_MODE (CC_C);
 CC_MODE (CC_N);
+CC_MODE (CC_V);
 
 /* Vector modes.  */
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d8a2745..e0fbb6f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22854,6 +22854,8 @@ maybe_get_arm_condition_code (rtx comparison)
 	{
 	case LTU: return ARM_CS;
 	case GEU: return ARM_CC;
+	case NE: return ARM_CS;
+	case EQ: return ARM_CC;
 	default: return ARM_NV;
 	}
 
@@ -22879,6 +22881,15 @@ maybe_get_arm_condition_code (rtx comparison)
 	default: return ARM_NV;
 	}
 
+case CC_Vmode:
+  switch (comp_code)
+	{
+	case NE: return ARM_VS;
+	case EQ: return ARM_VC;
+	default: return ARM_NV;
+
+	}
+
 case CCmode:
   switch (comp_code)
 	{
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 64873a2..705fe0b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -539,6 +539,42 @@
(set_attr "type" "multiple")]
 )
 
+(define_expand "addv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareV (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Vmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+			gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+			pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+
+(define_expand "uaddv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+			gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+			pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+
 (define_expand "addsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 	(plus:SI (match_operand:SI 1 "s_register_operand" "")
@@ -616,6 +652,163 @@
  ]
 )
 
+(define_insn_and_split "adddi3_compareV"
+  [(set (reg:CC_V CC_REGNUM)
+	(ne:CC_V
+	  (plus:TI
+	(sign_extend:TI (match_operand:DI 1 "register_operand" "r"))
+	(sign_extend:TI (match_operand:DI 2 "register_operand" "r")))
+	  (sign_extend:TI (plus:DI (match_dup 1) (match_dup 2)))))
+   (set (match_operand:DI 0 "register_operand" "=r")
+	(plus:DI (match_dup 1) (match_dup 2)))]
+  "TARGET_ARM"
+  "#"
+  "TARGET_ARM && reload_completed"
+  [(parallel [(set (reg:CC_C CC_REGNUM)
+		   (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
+ (match_dup 1)))
+	  (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
+   (parallel [(set (reg:CC_V CC_REGNUM)
+		   (ne:CC_V
+		(plus:DI (plus:DI
+			  (sign_extend:DI (match_dup 4))
+			  (sign_extend:DI (match_dup 5)))
+			 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		(plus:DI (sign_extend:DI
+			  (plus:SI (match_dup 4) (match_dup 5)))
+			 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	 (set (match_dup 3) (plus:SI (plus:SI
+	  (match_dup 4) (matc

Re: [ARM] Add support for overflow add, sub, and neg operations

2016-02-26 Thread Michael Collison



On 02/25/2016 02:51 AM, Kyrill Tkachov wrote:

Hi Michael,

On 24/02/16 23:02, Michael Collison wrote:
This patch adds support for builtin overflow of add, subtract and 
negate. This patch is targeted for gcc 7 stage 1. It was tested with 
no regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi



I'll have a deeper look once we're closer to GCC 7 development.
I've got a few comments in the meantime.


2016-02-24 Michael Collison 

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations,
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compre, negsi2_carryin_compare): New 
patterns

to support builtin overflow negate operations.




Can you please summarise what sequences are generated for these 
operations, and how

they are better than the default fallback sequences.


Sure. For a simple test case such as:

int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

Current trunk at -O2 generates

fn3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r1, #0
mov r3, #0
add r1, r0, r1
blt .L4
cmp r1, r0
blt .L3
.L2:
str r3, [r2]
mov r0, r1
bx  lr
.L4:
cmp r1, r0
ble .L2
.L3:
mov r3, #1
b   .L2

With the overflow patch this now generates:

   adds    r0, r0, r1
   movvs   r3, #1
   movvc   r3, #0
   str r3, [r2]
   bx  lr
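
As a rough illustration of what the flag-setting add provides, the sketch below models the V and C flags of a 32-bit add in portable C and cross-checks them against the builtins. The helper names (adds_v_flag, adds_c_flag, flags_match_builtins) are made up for this example and are not part of the patch; it assumes a 32-bit int and a compiler that provides __builtin_add_overflow (GCC 5 or later).

```c
#include <stdint.h>

/* Hypothetical helper: V flag after a 32-bit flag-setting add of x and y.
   Signed overflow happens when both operands have the same sign and
   the sign of the wrapped sum differs from it.  */
static int
adds_v_flag (int32_t x, int32_t y)
{
  uint32_t ux = (uint32_t) x, uy = (uint32_t) y;
  uint32_t sum = ux + uy;	/* Wraps modulo 2^32, no undefined behaviour.  */
  return (int) ((~(ux ^ uy) & (ux ^ sum)) >> 31);
}

/* Hypothetical helper: C flag after a 32-bit flag-setting add.
   The unsigned add carried out exactly when the wrapped sum is
   smaller than the first operand.  */
static int
adds_c_flag (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum < x;
}

/* Cross-check both models against the overflow builtins.  */
static int
flags_match_builtins (int32_t x, int32_t y)
{
  int32_t sres;
  uint32_t ures;
  int v = __builtin_add_overflow (x, y, &sres);
  int c = __builtin_add_overflow ((uint32_t) x, (uint32_t) y, &ures);
  return v == adds_v_flag (x, y)
	 && c == adds_c_flag ((uint32_t) x, (uint32_t) y);
}
```

The carry model (sum compared against the first operand) has the same shape as the compare of the sum with operand 1 that the CC_C patterns in the patch use.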

Also, we'd need tests for each of these overflow operations, since 
these are pretty complex

patterns that are being added.


The patterns are tested now most notably by tests in:

c-c++-common/torture/builtin-arith-overflow*.c

I had a few failures I resolved so the builtin overflow arithmetic 
functions are definitely being exercised.


Also, you may want to consider splitting this into a patch series, 
each adding a single
overflow operation, together with its tests. That way it will be 
easier to keep track of
which pattern applies to which use case and they can go in 
independently of each other.


Let me know if you still feel the same way given the existing test cases.



+(define_expand "uaddv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+			gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+			pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+

I notice this and many other patterns in this patch are guarded on 
TARGET_ARM. Is there any reason why they
should be restricted to arm state and not be TARGET_32BIT?

I thought about this as well. I will test with TARGET_32BIT and get back 
to you.



Thanks,
Kyrill


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



[PATCH][ARM] PR target/70008

2016-02-28 Thread Michael Collison
This patch addresses PR 70008, where a reverse subtract with carry 
instruction can be generated in thumb2 mode. It was tested with no 
regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-02-28  Michael Collison  

PR target/70008
* config/arm/arm.md (*subsi3_carryin): Only match pattern if
TARGET_ARM due to 'rsc' instruction alternative.
* config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.
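
For readers less familiar with the two instructions, here is a minimal sketch of their arithmetic in C. The function names sbc and rsc are illustrative models only, not GCC code; carry stands for the C flag on entry (1 means no borrow). RSC exists only in ARM state, which is why the alternative that emits it must never be selected for Thumb-2.

```c
#include <stdint.h>

/* Model of SBC Rd, Rn, Op2: Rd = Rn - Op2 - NOT(C).  */
static uint32_t
sbc (uint32_t rn, uint32_t op2, int carry)
{
  return rn - op2 - (uint32_t) !carry;
}

/* Model of RSC Rd, Rn, Op2: Rd = Op2 - Rn - NOT(C).
   Available in ARM state only.  */
static uint32_t
rsc (uint32_t rn, uint32_t op2, int carry)
{
  return op2 - rn - (uint32_t) !carry;
}
```

Because only the second source operand of these instructions can be an immediate, a subtraction whose minuend is a constant has to be emitted as RSC with the operands swapped, which is what the second alternative of the pattern does.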


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e67239d..a008207 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -870,7 +870,7 @@
 (minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I")
 (match_operand:SI 2 "s_register_operand" "r,r"))
   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
-  "TARGET_32BIT"
+  "TARGET_ARM"
   "@
sbc%?\\t%0, %1, %2
rsc%?\\t%0, %2, %1"
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 9925365..79305c5 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -848,6 +848,20 @@
(set_attr "type" "multiple")]
 )
 
+(define_insn "*thumb2_subsi3_carryin"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+(minus:SI (minus:SI (match_operand:SI 1 "s_register_operand" "r")
+(match_operand:SI 2 "s_register_operand" "r"))
+  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+  "TARGET_THUMB2"
+  "@
+   sbc%?\\t%0, %1, %2"
+  [(set_attr "conds" "use")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "adc_reg")]
+)
+
 (define_insn "*thumb2_cond_sub"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
 (minus:SI (match_operand:SI 1 "s_register_operand" "0,?Ts")
-- 
1.9.1



Re: [PATCH][ARM] PR target/70008

2016-02-29 Thread Michael Collison



On 2/29/2016 4:06 AM, Kyrill Tkachov wrote:

Hi Michael,

On 29/02/16 04:47, Michael Collison wrote:
This patch addresses PR 70008, where a reverse subtract with carry 
instruction can be generated in thumb2 mode. It was tested with no 
regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-02-28  Michael Collison 

PR target/70008
* config/arm/arm.md (*subsi3_carryin): Only match pattern if
TARGET_ARM due to 'rsc' instruction alternative.
* config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.




The *subsi3_carryin pattern has the arch attribute:
   (set_attr "arch" "*,a")

That means that the second alternative that generates the RSC 
instruction is only enabled
for ARM mode. Do you have a testcase where this doesn't happen and 
this pattern generates

the second alternative for Thumb2?


No, I don't have a test case; I noticed the pattern when working on the 
overflow project. I did not realize
that an attribute could affect the matching of an alternative. I will 
close the bug.





Thanks,
Kyrill




Re: [ARM] Add support for overflow add, sub, and neg operations

2016-02-29 Thread Michael Collison



On 2/29/2016 4:13 AM, Kyrill Tkachov wrote:


On 26/02/16 10:32, Michael Collison wrote:



On 02/25/2016 02:51 AM, Kyrill Tkachov wrote:

Hi Michael,

On 24/02/16 23:02, Michael Collison wrote:
This patch adds support for builtin overflow of add, subtract and 
negate. This patch is targeted for gcc 7 stage 1. It was tested 
with no regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi



I'll have a deeper look once we're closer to GCC 7 development.
I've got a few comments in the meantime.


2016-02-24 Michael Collison 

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations,
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compre, negsi2_carryin_compare): New 
patterns

to support builtin overflow negate operations.




Can you please summarise what sequences are generated for these 
operations, and how

they are better than the default fallback sequences.


Sure. For a simple test case such as:

int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

Current trunk at -O2 generates

fn3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r1, #0
mov r3, #0
add r1, r0, r1
blt .L4
cmp r1, r0
blt .L3
.L2:
str r3, [r2]
mov r0, r1
bx  lr
.L4:
cmp r1, r0
ble .L2
.L3:
mov r3, #1
b   .L2

With the overflow patch this now generates:

   adds    r0, r0, r1
   movvs   r3, #1
   movvc   r3, #0
   str r3, [r2]
   bx  lr



Thanks! That looks much better.

Also, we'd need tests for each of these overflow operations, since 
these are pretty complex

patterns that are being added.


The patterns are tested now most notably by tests in:

c-c++-common/torture/builtin-arith-overflow*.c

I had a few failures I resolved so the builtin overflow arithmetic 
functions are definitely being exercised.


Great, that gives me more confidence on the correctness aspects but...


Not so fast. I went back and changed the TARGET_ARM conditions to 
TARGET_32BIT. When I did this some of the
test cases fail in thumb2 mode. I was a little surprised by this result 
since I generate the same rtl in both modes in almost

all cases. I am investigating.




Also, you may want to consider splitting this into a patch series, 
each adding a single
overflow operation, together with its tests. That way it will be 
easier to keep track of
which pattern applies to which use case and they can go in 
independently of each other.


Let me know if you still feel the same way given the existing test 
cases.




... I'd like us to still have scan-assembler tests. The torture tests 
exercise the correctness,
but we'd want tests to catch regressions where we stop generating the 
new patterns due to other

optimisation changes, which would lead to code quality regressions.
So I'd like us to have scan-assembler tests for these sequences to 
make sure we generate the right

instructions.

I will definitely write some scan-assembler tests. Thanks for the feedback.
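
A scan-assembler test for the signed add case might look roughly like the sketch below. It reuses the fn3 example from earlier in this thread; the scanned mnemonics are an assumption taken from the ARM-state sequence quoted above and would likely need adjusting for Thumb-2 or for other code generation changes.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm32 } */
/* { dg-options "-O2" } */

/* Signed add with overflow reporting; expected to use a flag-setting
   add followed by conditional moves on the V flag.  */
int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

/* Assumed patterns, based on the sequence quoted in this thread.  */
/* { dg-final { scan-assembler "adds" } } */
/* { dg-final { scan-assembler "movvs" } } */
```

The torture tests would still cover correctness; a test of this shape only guards against the compiler silently falling back to the longer compare-and-branch sequence.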



Thanks,
Kyrill



+(define_expand "uaddv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+			gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+			pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+

I notice this and many other patterns in this patch are guarded on 
TARGET_ARM. Is there any reason why they
should be restricted to arm state and not be TARGET_32BIT?

I thought about this as well. I will test with TARGET_32BIT and get 
back to you.



Thanks,
Kyrill








[PATCH][ARM] PR target/70014

2016-03-01 Thread Michael Collison
This patch addresses PR 70014, where the predicates and operand 
constraints do not match and could cause problems with the register 
allocator. Tested successfully on


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-03-01  Michael Collison  

PR target/70014
* config/arm/arm.md (*subsi3_carryin_const): Change predicate
for operand 1 to s_register_operand. Change predicate for operand
2 to arm_not_immediate_operand.

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e67239d..47171b9 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -883,8 +883,8 @@
 
 (define_insn "*subsi3_carryin_const"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
-(minus:SI (plus:SI (match_operand:SI 1 "reg_or_int_operand" "r")
-   (match_operand:SI 2 "arm_not_operand" "K"))
+(minus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "r")
+   (match_operand:SI 2 "arm_not_immediate_operand" "K"))
   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbc\\t%0, %1, #%B2"
-- 
1.9.1



Re: [PATCH][ARM] PR target/70008

2016-03-01 Thread Michael Collison

Hi Richard,

I think we could incorporate your feedback by changing the predicate on 
operand 1 to "arm_rhs_operand" which allows "s_register_operand" or 
"arm_immediate_operand". Everything else in my patch would stay the same 
including splitting the thumb2 pattern out into its own insn. I'm 
testing this change now. Let me know if this direction is okay with you.


On 02/29/2016 08:29 AM, Richard Earnshaw (lists) wrote:

On 29/02/16 11:21, Michael Collison wrote:


On 2/29/2016 4:06 AM, Kyrill Tkachov wrote:

Hi Michael,

On 29/02/16 04:47, Michael Collison wrote:

This patch addresses PR 70008, where a reverse subtract with carry
instruction can be generated in thumb2 mode. It was tested with no
regressions in arm and thumb modes on the following targets:

arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-02-28  Michael Collison 

 PR target/70008
 * config/arm/arm.md (*subsi3_carryin): Only match pattern if
 TARGET_ARM due to 'rsc' instruction alternative.
 * config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.



The *subsi3_carryin pattern has the arch attribute:
(set_attr "arch" "*,a")

That means that the second alternative that generates the RSC
instruction is only enabled
for ARM mode. Do you have a testcase where this doesn't happen and
this pattern generates
the second alternative for Thumb2?

No, I don't have a test case; I noticed the pattern when working on the
overflow project. I did not realize
that an attribute could affect the matching of an alternative. I will
close the bug.



Thanks,
Kyrill

This is all true, but there is a potential performance issue with this
pattern though, that could lead to sub-optimal code.

The predicate accepts reg-or-int, but in ARM state only simple
'const-ok-for-arm' immediates are permitted by the predicates, and in
thumb code, no immediates are permitted at all.  This could potentially
result in sub-optimal code due to late splitting of the pattern.  It
would be better if the predicate understood these limitations and
restricted immediates accordingly.

R.
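
To make the immediate restriction concrete: an ARM-state data-processing immediate is an 8-bit value rotated right by an even amount. The sketch below is a simplified stand-in for that check, written for illustration only. The name is_arm_state_immediate is hypothetical, and this is not GCC's implementation, which also has to deal with other cases such as Thumb-2 encodings.

```c
#include <stdint.h>

/* Hypothetical helper: return 1 if VALUE can be encoded as an ARM-state
   data-processing immediate, i.e. an 8-bit value rotated right by an
   even amount (0, 2, ..., 30).  */
static int
is_arm_state_immediate (uint32_t value)
{
  for (unsigned rot = 0; rot < 32; rot += 2)
    {
      /* Rotating left by ROT undoes a rotate-right by ROT.  */
      uint32_t undone = rot ? (value << rot) | (value >> (32 - rot)) : value;
      if (undone <= 0xffu)
	return 1;
    }
  return 0;
}
```

A predicate that accepts any integer lets constants such as 0x101 through until a late split, whereas a predicate that applies a check of this kind rejects them up front so they are loaded into a register early.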



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [PATCH][ARM] PR target/70008

2016-03-02 Thread Michael Collison
I have attached a new patch which hopefully addresses Richard's concerns. 
I modified the predicate on operand 1 to "arm_rhs_operand" to be 
consistent with the constraints. I retained the split into two patterns: 
one for arm and another for thumb2. I thought this was cleaner.


Okay for trunk?

2016-02-28  Michael Collison  

PR target/70008
* config/arm/arm.md (*subsi3_carryin): Change predicate to
arm_rhs_operand to be consistent with constraints.
Only allow pattern for TARGET_ARM.
* config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.

On 02/29/2016 08:29 AM, Richard Earnshaw (lists) wrote:

On 29/02/16 11:21, Michael Collison wrote:


On 2/29/2016 4:06 AM, Kyrill Tkachov wrote:

Hi Michael,

On 29/02/16 04:47, Michael Collison wrote:

This patch addresses PR 70008, where a reverse subtract with carry
instruction can be generated in thumb2 mode. It was tested with no
regressions in arm and thumb modes on the following targets:

arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-02-28  Michael Collison 

 PR target/70008
 * config/arm/arm.md (*subsi3_carryin): Only match pattern if
 TARGET_ARM due to 'rsc' instruction alternative.
 * config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.



The *subsi3_carryin pattern has the arch attribute:
(set_attr "arch" "*,a")

That means that the second alternative that generates the RSC
instruction is only enabled
for ARM mode. Do you have a testcase where this doesn't happen and
this pattern generates
the second alternative for Thumb2?

No, I don't have a test case; I noticed the pattern when working on the
overflow project. I did not realize
that an attribute could affect the matching of an alternative. I will
close the bug.



Thanks,
Kyrill

This is all true, but there is a potential performance issue with this
pattern though, that could lead to sub-optimal code.

The predicate accepts reg-or-int, but in ARM state only simple
'const-ok-for-arm' immediates are permitted by the predicates, and in
thumb code, no immediates are permitted at all.  This could potentially
result in sub-optimal code due to late splitting of the pattern.  It
would be better if the predicate understood these limitations and
restricted immediates accordingly.

R.



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e67239d..e6bcd7f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -867,15 +867,14 @@
 
 (define_insn "*subsi3_carryin"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r")
-(minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I")
+(minus:SI (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,I")
 (match_operand:SI 2 "s_register_operand" "r,r"))
   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
-  "TARGET_32BIT"
+  "TARGET_ARM"
   "@
sbc%?\\t%0, %1, %2
rsc%?\\t%0, %2, %1"
   [(set_attr "conds" "use")
-   (set_attr "arch" "*,a")
(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
(set_attr "type" "adc_reg,adc_imm")]
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 9925365..79305c5 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -848,6 +848,20 @@
(set_attr "type" "multiple")]
 )
 
+(define_insn "*thumb2_subsi3_carryin"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+(minus:SI (minus:SI (match_operand:SI 1 "s_register_operand" "r")
+(match_operand:SI 2 "s_register_operand" "r"))
+  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+  "TARGET_THUMB2"
+  "@
+   sbc%?\\t%0, %1, %2"
+  [(set_attr "conds" "use")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "adc_reg")]
+)
+
 (define_insn "*thumb2_cond_sub"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
 (minus:SI (match_operand:SI 1 "s_register_operand" "0,?Ts")
-- 
1.9.1



Re: [ARM] Optimize compare against smin/umin

2015-07-26 Thread Michael Collison

Here is an updated patch that addresses the issues you mentioned:

2015-07-24  Michael Collison  

  * gcc/config/arm/arm.md (*arm_smin_cmp): New pattern.
  (*arm_umin_cmp): Likewise.
  * gcc.target/arm/mincmp.c: Test min compare idiom.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0be70a8..361c292 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3455,6 +3455,44 @@
(set_attr "type" "multiple,multiple")]
 )

+;; t = (s/u)min (x, y)
+;; cc = cmp (t, z)
+;; is the same as
+;; cmp x, z
+;; cmpge(u) y, z
+
+(define_insn_and_split "*arm_smin_cmp"
+  [(set (reg:CC CC_REGNUM)
+(compare:CC
+ (smin:SI (match_operand:SI 0 "s_register_operand" "r")
+  (match_operand:SI 1 "s_register_operand" "r"))
+ (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+  (set (reg:CC CC_REGNUM)
+   (compare:CC (match_dup 1) (match_dup 2))))]
+)
+
+(define_insn_and_split "*arm_umin_cmp"
+  [(set (reg:CC CC_REGNUM)
+(compare:CC
+ (umin:SI (match_operand:SI 0 "s_register_operand" "r")
+  (match_operand:SI 1 "s_register_operand" "r"))
+ (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (geu:CC (reg:CC CC_REGNUM) (const_int 0))
+  (set (reg:CC CC_REGNUM)
+   (compare:CC (match_dup 1) (match_dup 2))))]
+)
+
 (define_expand "umaxsi3"
   [(parallel [
 (set (match_operand:SI 0 "s_register_operand" "")
diff --git a/gcc/testsuite/gcc.target/arm/mincmp.c b/gcc/testsuite/gcc.target/arm/mincmp.c
new file mode 100644
index 0000000..2a55c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mincmp.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm32 } */
+
+#define min(x, y) ((x) <= (y)) ? (x) : (y)
+
+unsigned int foo (unsigned int i, unsigned int x ,unsigned int y)
+{
+  return i < (min (x, y));
+}
+
+int bar (int i, int x, int y)
+{
+  return i < (min (x, y));
+}
+
+/* { dg-final { scan-assembler "cmpcs" } } */
+/* { dg-final { scan-assembler "cmpge" } } */
--
1.9.1
On 07/13/2015 04:27 AM, Ramana Radhakrishnan wrote:

On Thu, Jun 25, 2015 at 6:08 PM, Michael Collison
 wrote:

This patch is designed to optimize constructs such as:

#define min(x, y) ((x) <= (y)) ? (x) : (y)

unsigned int foo (unsigned int i, unsigned int x, unsigned int y)
{
  return i < (min (x, y));
}

int bar (int i, int x, int y)
{
  return i < (min (x, y));
}

Patch was tested on arm-linux-gnueabi, arm-linux-gnueabihf,
armeb-linux-gnueabihf. Okay for trunk?
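
The idea behind the compare-then-conditional-compare sequence can be checked with a small model in C. This is only a sketch under stated assumptions: the function names are invented for the example, and it models just the signed "less than" outcome of the final flags, not the full flag state.

```c
/* Hypothetical model of the two-instruction sequence
       cmp   x, z
       cmpge y, z
   Only the signed "less than" outcome of the final flags is modelled.  */
static int
lt_after_cmp_cmpge (int x, int y, int z)
{
  int lt = x < z;		/* cmp x, z */
  if (!lt)			/* GE holds, so the second compare executes.  */
    lt = y < z;			/* cmpge y, z */
  return lt;
}

/* Reference: compute the minimum first, then compare it with z.  */
static int
lt_via_min (int x, int y, int z)
{
  int t = x <= y ? x : y;
  return t < z;
}
```

Both functions agree for every input, which is why the explicit minimum does not need to be materialised in a register before the compare.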

Sorry about the slow review and I wanted someone else to look at it
given I had a hand in writing this patch up.

Please add a testcase.



2015-06-24  Michael Collison  

Please fix the Changelog formatting here.


 * gcc/config/arm/arm.md (*arm_smin_cmp): New pattern.
 (*arm_umin_cmp): Likewise.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 1ac8af0..994c95f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3455,6 +3455,28 @@
 (set_attr "type" "multiple,multiple")]
  )

+;; t = (s/u)min (x, y)
+;; cc = cmp (t, z)
+;; is the same as
+;; cmp x, z
+;; cmpge(u) y, z
+
+(define_insn_and_split "*arm_smin_cmp"
+  [(set (reg:CC CC_REGNUM)
+(compare:CC
+ (smin:SI (match_operand:SI 0 "s_register_operand" "r")
+  (match_operand:SI 1 "s_register_operand" "r"))
+ (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  ""
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+  (set (reg:CC CC_REGNUM)
+   (compare:CC (match_dup 1) (match_dup 2))))]
+)


IIUC it's not entirely safe to have cond_execs in the instruction
stream prior to reload - I think the consensus was that spilling and
filling with cond-exec style instructions could end up with
non-cond-exec style spills thus destroying registers in the non
cond-exec cases. So, let's just add a reload_completed to be safe here.

See https://patches.linaro.org/6469/ for more on this topic.


+
  (define_expand "umaxsi3"
[(parallel [
  (set (match_operand:SI 0 "s_register_operand" "")
@@ -3521,6 +3543,22 @@
 (

[PATCH] Optimize certain end of loop conditions into min/max operation

2015-07-26 Thread Michael Collison
This patch is designed to optimize end of loop conditions of the form
 i < x && i < y into i < min (x, y). Loop conditions involving '>' are 
handled similarly using max(x,y).

As an example:

#define N 1024

int a[N], b[N], c[N];

void add (unsigned int m, unsigned int n)
{
  unsigned int i, bound = (m < n) ? m : n;
  for (i = 0; i < m && i < n; ++i)
    a[i] = b[i] + c[i];
}


Performed bootstrap and make check on: x86_64-unknown-linux-gnu, 
arm-linux-gnueabihf, and aarch64-linux-gnu.
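
A quick way to convince oneself that the two loop forms are equivalent is to count iterations for both. The sketch below is illustrative only; count_and and count_min are names invented here, not code from the patch.

```c
/* Count iterations using the original two-comparison loop condition.  */
static unsigned int
count_and (unsigned int m, unsigned int n)
{
  unsigned int i, count = 0;
  for (i = 0; i < m && i < n; ++i)
    ++count;
  return count;
}

/* Count iterations with the bound folded into a single minimum.  */
static unsigned int
count_min (unsigned int m, unsigned int n)
{
  unsigned int i, count = 0;
  unsigned int bound = m < n ? m : n;
  for (i = 0; i < bound; ++i)
    ++count;
  return count;
}
```

With the minimum hoisted out of the loop, each iteration needs one comparison instead of two.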

Okay for trunk?

2015-07-24  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
 (convert (bit_and (op (convert:utype @0) (convert:utype @1))
   (convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))
--

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-07-29 Thread Michael Collison

Richard and Jeff,

Any conclusion to this discussion? Is this okay in match.pd or would you 
like to see it implemented elsewhere?


On 7/28/2015 12:41 AM, Richard Biener wrote:

On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law  wrote:

On 07/27/2015 03:25 AM, Richard Biener wrote:

On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
 wrote:

This patch is designed to optimize end of loop conditions of the form
   i < x && i < y into i < min (x, y). Loop conditions involving '>' are
handled similarly using max(x,y).
As an example:

#define N 1024

int a[N], b[N], c[N];

void add (unsigned int m, unsigned int n)
{
  unsigned int i, bound = (m < n) ? m : n;
  for (i = 0; i < m && i < n; ++i)
    a[i] = b[i] + c[i];
}


Performed bootstrap and make check on: x86_64-unknown-linux-gnu,
arm-linux-gnueabihf, and aarch64-linux-gnu.
Okay for trunk?


So this works only for && that has been lowered to non-CFG form
(I suppose phiopt would catch that?  If not, ifcombine would be the
place to implement it I guess).

phiopt is supposed to be generating MIN/MAX expressions for us.  If it isn't
it'd be good to see any testcases where it isn't.

I think that raises a general question though.  Does it make more sense to
capture MIN/MAX (and others) in phiopt or in the match.pd framework?

match.pd is good for pattern recognition - patterns of fixed size.  There are
cases that are done in fold-const.c for example that don't fit very well
and should be done as a separate pass, like for example figuring out whether
an expression can be easily negated or whether there are sign-changes that
can be stripped.  Basically all cases where fold currently recurses (unbound).

The above case is a corner case I think - the number of && you can change
into (multiple) MIN/MAX is unbound but we might only care about the case
where there will be one MIN/MAX operation.

Generally phiopt and other patterns that match the CFG are not yet well
supported by match.pd (though I outlined how matching PHI nodes when
facing (simplify (cond ...) ...) would be possible).

So while putting something into match.pd is easy I'd like people to
think if doing the same thing elsewhere is better - that is, if this is really
a pattern transform operation or if you are just implementing a special-case
of a general transform as a pattern.

Richard.


Jeff





Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-07-31 Thread Michael Collison

Hi Jeff,

Yes I will create a test case. I'm not quite sure what to check for even 
in the machine dependent test case. It's quite possible for the 
instructions that are generated to change over time.


On 7/31/2015 9:20 AM, Jeff Law wrote:

On 07/28/2015 01:41 AM, Richard Biener wrote:


The above case is a corner case I think - the number of && you can 
change

into (multiple) MIN/MAX is unbound but we might only care about the case
where there will be one MIN/MAX operation.

I suspect that's going to be the most important/common case.



Generally phiopt and other patterns that match the CFG are not yet well
supported by match.pd (though I outlined how matching PHI nodes when
facing (simplify (cond ...) ...) would be possible).

Right.  Though I thought that, after outlining it, we concluded it 
wasn't really feasible yet.





So while putting something into match.pd is easy I'd like people to
think if doing the same thing elsewhere is better - that is, if this 
is really
a pattern transform operation or if you are just implementing a 
special-case

of a general transform as a pattern.

So in this case we're taking something like:

  _6 = i_1 < m_4(D);
  _7 = i_1 < n_3(D);
  _8 = _6 & _7;
  if (_8 != 0)


And turning it into

_X = MIN (m_4, n_3)
if (i_1 < _X)

That seems to me like a good match for match.pd given its generality 
and the need to walk up the use-def chain.  It's certainly not a good 
fit for phi-opt since we're not looking at PHIs :-)




Michael -- can you take your sample code and turn it into a test for 
the testsuite.  I'd hazard a guess it'll need to be target specific 
because of its interactions with branch-costing.  Might as well make 4 
variants (lt -> MIN, le -> MIN, ge->MAX, gt->MAX).
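
The four variants can be written out as plain C predicates to confirm that each pair is equivalent. This is a sketch for illustration; every name below is invented for the example.

```c
static int min_of (int a, int b) { return a < b ? a : b; }
static int max_of (int a, int b) { return a > b ? a : b; }

/* lt -> MIN */
static int lt_and (int i, int x, int y) { return i < x && i < y; }
static int lt_one (int i, int x, int y) { return i < min_of (x, y); }

/* le -> MIN */
static int le_and (int i, int x, int y) { return i <= x && i <= y; }
static int le_one (int i, int x, int y) { return i <= min_of (x, y); }

/* gt -> MAX */
static int gt_and (int i, int x, int y) { return i > x && i > y; }
static int gt_one (int i, int x, int y) { return i > max_of (x, y); }

/* ge -> MAX */
static int ge_and (int i, int x, int y) { return i >= x && i >= y; }
static int ge_one (int i, int x, int y) { return i >= max_of (x, y); }
```

Each pair agrees for all inputs, including the boundary cases where i equals one of the bounds, which are the ones most worth covering in the testsuite variants.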


We're going to want that regardless of whether tackling this issue in 
match.pd (my current preference) or elsewhere.


jeff




Re: [PATCH, PR 57195] Allow mode iterators inside angle brackets

2015-09-16 Thread Michael Collison


On 09/14/2015 02:34 AM, Richard Sandiford wrote:

Michael Collison  writes:

Here is a modified patch that takes your comments into account. Breaking
on depth == 0 with '>' does not work due to the code looking for whitespace.

What goes wrong?  Just to make sure we're talking about the same thing,
I meant that in:

(match_operand:FOO> ...

the name should be "FOO" and you should get an error on ">" when parsing
the text after the name, just like you would for:

(match_operand:FOO] ...


When I try breaking on '>' with a nesting depth of 0 all examples of 
 fail.


It's not a big deal though, so...


2015-08-25  Michael Collison  

  PR other/57195
  * read-md.c (read_name): Allow mode iterators inside angle
  brackets in rtl expressions.

OK, thanks.

Meaning okay to check the patch in?


Richard



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



[PING] [Aarch64] Use vector wide add for mixed-mode adds

2015-09-16 Thread Michael Collison

Ping. Originally posted here:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00408.html

Regards,

Michael Collison

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-17 Thread Michael Collison

On 07/31/2015 11:27 AM, Jeff Law wrote:

On 07/31/2015 12:18 PM, Michael Collison wrote:

Hi Jeff,

Yes I will create a test case. I'm not quite sure what to check for even
in the machine dependent test case. It's quite possible for the
instructions that are generated to change over time.
I think we're going to want to look at the gimple IR and search for 
the MIN/MAX expressions rather than the instructions.  Given we don't 
know where the transformation is going to land (yet), you can probably 
start with -fdump-tree-optimized and scanning the .optimized dump.


We can still do that and have the test be target specific.

jeff


Jeff and Richard,

Here is the patch modified with test cases for MIN_EXPR and MAX_EXPR 
expressions. I need some assistance; this test case will fail on targets 
that don't have support for MIN/MAX such as 68k. Is there any way to 
remedy this short of enumerating whether a target supports MIN/MAX in 
testsuite/lib/target_support?


2015-07-24  Michael Collison 
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
 (convert (bit_and (op (convert:utype @0) (convert:utype @1))
   (convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)
+    a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
--
1.9.1

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-18 Thread Michael Collison

Marc,

Can you elaborate on merging the patterns using 'ext' as mentioned in 
your post? I don't see any documentation or examples.


On 09/18/2015 12:00 AM, Marc Glisse wrote:

On Thu, 17 Sep 2015, Michael Collison wrote:

Here is the patch modified with test cases for MIN_EXPR and 
MAX_EXPR expressions. I need some assistance; this test case will 
fail on targets that don't have support for MIN/MAX such as 68k. Is 
there any way to remedy this short of enumerating whether a target 
supports MIN/MAX in testsuite/lib/target_support?


2015-07-24  Michael Collison 
   Andrew Pinski 

   * match.pd ((x < y) && (x < z) -> x < min (y,z),
   (x > y) and (x > z) -> x > max (y,z))
   * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3. If not see
(convert (bit_and (op (convert:utype @0) (convert:utype @1))
  (convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify


You seem to be missing all indentation.


+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use 
op:s since this is mostly useful if it removes the 2 original 
comparisons.



+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


How did you choose this restriction? It seems safe enough, but the 
transformation could make sense in other cases as well. It can always 
be generalized later though.



+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)


Note that you could unify the patterns with something like:
(for op (lt le gt ge)
 ext (min min max max)
 (simplify ...


+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)


Maybe writing '&' instead of '&&' would make it depend less on the 
target. Also, both tests seem to be for GENERIC (i.e. I expect that 
you are already seeing the optimized version with -fdump-tree-original 
or -fdump-tree-gimple). Maybe something as simple as:

int f(long a, long b, long c) {
  int cmp1 = a < b;
  int cmp2 = a < c;
  return cmp1 & cmp2;
}


+a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */




--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [ARM] Use vector wide add for mixed-mode adds

2015-09-22 Thread Michael Collison
This is a modified version of the previous patch that removes the 
documentation and read-md.c fixes. These patches have been submitted 
separately and approved.


This patch is designed to address code that was not being vectorized due 
to missing widening patterns in the ARM backend. Code such as:


int t6(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}
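
To make the intent a bit more concrete, here is a scalar C sketch of what a widening add over the two halves of a vector amounts to: the low half of a vector of shorts is sign-extended and added into an int accumulator, then the high half is added into the same accumulator. The names (widen_add_half, sum_widening) and the choice of 8 lanes are assumptions made for illustration, and the sketch assumes len is a multiple of the lane count. It is not the code the compiler generates.

```c
#include <stdint.h>

enum { LANES = 8, HALF = LANES / 2 };

/* Scalar stand-in for one widening add: sign-extend HALF 16-bit lanes
   starting at lane 'first' and add them into the 32-bit accumulator.  */
void widen_add_half(int32_t acc[HALF], const int16_t v[LANES], int first)
{
  for (int j = 0; j < HALF; j++)
    acc[j] += (int32_t) v[first + j];
}

/* Sum 'len' shorts, LANES at a time; 'len' is assumed to be a
   multiple of LANES.  */
int sum_widening(const int16_t *x, int len)
{
  int32_t acc[HALF] = { 0 };

  for (int i = 0; i < len; i += LANES)
    {
      widen_add_half(acc, x + i, 0);     /* low half of the vector */
      widen_add_half(acc, x + i, HALF);  /* high half of the vector */
    }

  /* Final reduction of the accumulator lanes.  */
  int result = 0;
  for (int j = 0; j < HALF; j++)
    result += acc[j];
  return result;
}
```

The two calls per iteration mirror the idea of issuing one widening add for the low half and one for the high half into the same accumulator.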

Validated on arm-none-eabi, arm-none-linux-gnueabi, 
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.


2015-09-22  Michael Collison  

* config/arm/neon.md (widen_sum): New patterns
where mode is VQI to improve mixed mode add vectorization.

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..54623fe 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,57 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+	(plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+			   (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+int i;
+int half_elem = /2;
+rtvec v1 = rtvec_alloc (half_elem);
+rtvec v2 = rtvec_alloc (half_elem);
+rtx p1, p2;
+
+for (i = 0; i < half_elem; i++)
+  RTVEC_ELT (v1, i) = GEN_INT (i);
+p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+for (i = half_elem; i < ; i++)
+  RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+if (operands[0] != operands[2])
+  emit_move_insn (operands[0], operands[2]);
+
+emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0]));
+emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0]));
+DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+		   (match_operand:VQI 2 "vect_par_constant_low" "")))
+		(match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+		   (match_operand:VQI 2 "vect_par_constant_high" "")))
+		(match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)
+
 (define_insn "widen_ssum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
 	(plus: (sign_extend:
@@ -1184,4 +1235,55 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+	(plus: (zero_extend: (match_operand:VQI 1 "s_register_operand" ""))
+			   (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+int i;
+int half_elem = /2;
+rtvec v1 = rtvec_alloc (half_elem);
+rtvec v2 = rtvec_alloc (half_elem);
+rtx p1, p2;
+
+for (i = 0; i < half_elem; i++)
+  RTVEC_ELT (v1, i) = GEN_INT (i);
+p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+for (i = half_elem; i < ; i++)
+  RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+if (operands[0] != operands[2])
+  emit_move_insn (operands[0], operands[2]);
+
+emit_insn (gen_vec_sel_widen_usum_lo3 (operands[0], operands[1], p1, operands[0]));
+emit_insn (gen_vec_sel_widen_usum_hi3 (operands[0], operands[1], p2, operands[0]));
+DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+		   (match_operand:VQI 2 "vect_par_constant_low" "")))
+		(match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length"

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Michael Collison

Richard and Marc,

What is ':s'? I don't see any documentation for it. So you would like me 
to remove :c and add :s?



On 09/18/2015 02:23 AM, Richard Biener wrote:

On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse  wrote:

Just a couple extra points. We can end up with a mix of < and >, which might
prevent from matching:

   _3 = b_1(D) > a_2(D);
   _5 = a_2(D) < c_4(D);
   _8 = _3 & _5;
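
A small C sketch of why the mixed form is still the same test: b > a is just a < b with the operands swapped, so the expression above is equivalent to a < min (b, c). The function names are hypothetical and only serve this check.

```c
/* Mixed directions, as in the GIMPLE above.  */
int mixed_form(long a, long b, long c) { return (b > a) & (a < c); }

/* Same test with the first comparison swapped into '<' form.  */
int canonical_form(long a, long b, long c) { return (a < b) & (a < c); }

/* Single comparison against the minimum.  */
int min_form(long a, long b, long c)
{
  long m = b < c ? b : c;
  return a < m;
}

/* Returns 1 when all three forms agree for every triple in [-r, r]^3.  */
int mixed_agrees(long r)
{
  for (long a = -r; a <= r; a++)
    for (long b = -r; b <= r; b++)
      for (long c = -r; c <= r; c++)
        if (mixed_form(a, b, c) != canonical_form(a, b, c)
            || mixed_form(a, b, c) != min_form(a, b, c))
          return 0;
  return 1;
}
```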

Just like with &, we could also transform:
x < y | x < z  --->  x < max(y, z)

(but maybe wait to make sure reviewers are ok with the first transformation
before generalizing)
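
For reference, the | variant can be checked the same way. This is only a sketch of the integer identity x < y | x < z == x < max (y, z), with made-up function names, not something taken from the patch.

```c
/* Two comparisons combined with '|'.  */
int or_form(long x, long y, long z) { return (x < y) | (x < z); }

/* Single comparison against the maximum.  */
int max_form(long x, long y, long z)
{
  long m = y > z ? y : z;
  return x < m;
}

/* Returns 1 when both forms agree for every triple in [-r, r]^3.  */
int or_agrees(long r)
{
  for (long x = -r; x <= r; x++)
    for (long y = -r; y <= r; y++)
      for (long z = -r; z <= r; z++)
        if (or_form(x, y, z) != max_form(x, y, z))
          return 0;
  return 1;
}
```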

Please merge the patterns as suggested and do the :c/:s changes as well.

The issue with getting mixed < and > is indeed there - I've wanted to
extend :c to handle tcc_comparison in some way at some point but
didn't get to how best to implement that yet...

So to fix that currently you have to replicate the merged pattern
with swapped comparison operands.

Otherwise I'm fine with the general approach.

Richard.


On Fri, 18 Sep 2015, Marc Glisse wrote:


On Thu, 17 Sep 2015, Michael Collison wrote:


Here is the patch modified with test cases for MIN_EXPR and MAX_EXPR
expressions. I need some assistance; this test case will fail on targets
that don't have support for MIN/MAX such as 68k. Is there any way to remedy
this short of enumerating whether a target supports MIN/MAX in
testsuite/lib/target_support?

2015-07-24  Michael Collison 
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
 (convert (bit_and (op (convert:utype @0) (convert:utype @1))
   (convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify


You seem to be missing all indentation.


+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use op:s
since this is mostly useful if it removes the 2 original comparisons.


+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


How did you choose this restriction? It seems safe enough, but the
transformation could make sense in other cases as well. It can always be
generalized later though.


+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)


Note that you could unify the patterns with something like:
(for op (lt le gt ge)
 ext (min min max max)
(simplify ...


+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)


Maybe writing '&' instead of '&&' would make it depend less on the target.
Also, both tests seem to be for GENERIC (i.e. I expect that you are already
seeing the optimized version with -fdump-tree-original or
-fdump-tree-gimple). Maybe something as simple as:
int f(long a, long b, long c) {
  int cmp1 = a < b;
  int cmp2 = a < c;
  return cmp1 & cmp2;
}


+a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */




--
Marc Glisse


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Michael Collison


The current patch is attached.

2015-09-30  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.


On 09/30/2015 01:14 AM, Richard Biener wrote:

On Wed, Sep 30, 2015 at 9:29 AM, Michael Collison
 wrote:

Richard and Marc,

What is ':s'? I don't see any documentation for it. So you would like me to
remove :c and add :s?

There is documentation for both in the internals manual.

I don't have enough context to say whether you should remove "them" or
not.  What's the current patch?  If you made the suggested changes you
should be left with only required :s and :c.

Richard.



On 09/18/2015 02:23 AM, Richard Biener wrote:

On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse  wrote:

Just a couple extra points. We can end up with a mix of < and >, which
might prevent from matching:

_3 = b_1(D) > a_2(D);
_5 = a_2(D) < c_4(D);
_8 = _3 & _5;

Just like with &, we could also transform:
x < y | x < z  --->  x < max(y, z)

(but maybe wait to make sure reviewers are ok with the first
transformation before generalizing)

Please merge the patterns as suggested and do the :c/:s changes as well.

The issue with getting mixed < and > is indeed there - I've wanted to
extend :c to handle tcc_comparison in some way at some point but
didn't get to how best to implement that yet...

So to fix that currently you have to replicate the merged pattern
with swapped comparison operands.

Otherwise I'm fine with the general approach.

Richard.


On Fri, 18 Sep 2015, Marc Glisse wrote:


On Thu, 17 Sep 2015, Michael Collison wrote:


Here is the patch modified with test cases for MIN_EXPR and MAX_EXPR
expressions. I need some assistance; this test case will fail on targets
that don't have support for MIN/MAX such as 68k. Is there any way to
remedy this short of enumerating whether a target supports MIN/MAX in
testsuite/lib/target_support?

2015-07-24  Michael Collison 
 Andrew Pinski 

 * match.pd ((x < y) && (x < z) -> x < min (y,z),
 (x > y) and (x > z) -> x > max (y,z))
 * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
(convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify


You seem to be missing all indentation.


+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use
op:s since this is mostly useful if it removes the 2 original comparisons.


+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


How did you choose this restriction? It seems safe enough, but the
transformation could make sense in other cases as well. It can always be
generalized later though.


+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)


Note that you could unify the patterns with something like:
(for op (lt le gt ge)
  ext (min min max max)
(simplify ...


+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)


Maybe writing '&' instead of '&&' would make it depend less on the
target. Also, both tests seem to be for GENERIC (i.e. I expect that you
are already seeing the optimized version with -fdump-tree-original or
-fdump-tree-gimple). Maybe something as simple as:
int f(long a, long b, long c) {
   int cmp1 = a < b;
   int cmp2 = a < c;
   return cmp1 & cmp2;
}


+a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */




--
Marc Glisse


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --gi

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Michael Collison

Richard and Marc,

Latest patch attached which incorporates all comments.

2015-09-30  Michael Collison 
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

On 09/30/2015 12:30 PM, Marc Glisse wrote:

On Fri, 18 Sep 2015, Marc Glisse wrote:

+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to 
use op:s since this is mostly useful if it removes the 2 original 
comparisons.


As I was saying, :c is useless.
(x:c y z)
is replaced by two copies of the transformation, one with
(x y z)
and the other with
(x z y)
In your transformation, both versions would be equivalent, so the second
one is redundant.

Also, if you have:
a=x

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267..ef2e025 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2311,3 +2311,13 @@ along with GCC; see the file COPYING3.  If not see
 (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
 	   (convert:utype @4
+
+/* Transform (@0 < @1 and @0 < @2) to use min, 
+   (@0 > @1 and @0 > @2) to use max */
+(for op (lt le gt ge)
+ ext (min min max max)
+(simplify
+(bit_and (op:s @0 @1) (op:s @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (ext @1 @2)))))
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..dfe6120
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int min_test(long a, long b, long c) {
+  int cmp1 = a < b;
+  int cmp2 = a < c;
+  return cmp1 & cmp2;
+}
+
+int max_test (long a, long b, long c) {
+  int cmp1 = a > b;
+  int cmp2 = a > c;
+  return cmp1 & cmp2;
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
-- 
1.9.1



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-10-01 Thread Michael Collison


ChangeLog formatting and test case fixed.

On 09/30/2015 12:30 PM, Marc Glisse wrote:

On Fri, 18 Sep 2015, Marc Glisse wrote:

+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to 
use op:s since this is mostly useful if it removes the 2 original 
comparisons.


As I was saying, :c is useless.
(x:c y z)
is replaced by two copies of the transformation, one with
(x y z)
and the other with
(x z y)
In your transformation, both versions would be equivalent, so the second
one is redundant.

Also, if you have:
a=x

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

2015-09-30  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267..ef2e025 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2311,3 +2311,13 @@ along with GCC; see the file COPYING3.  If not see
 (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
 	   (convert:utype @4
+
+/* Transform (@0 < @1 and @0 < @2) to use min, 
+   (@0 > @1 and @0 > @2) to use max */
+(for op (lt le gt ge)
+ ext (min min max max)
+(simplify
+(bit_and (op:s @0 @1) (op:s @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (ext @1 @2)))))
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..2e4300c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int min_test(long a, long b, long c) {
+  int cmp1 = a < b;
+  int cmp2 = a < c;
+  return cmp1 & cmp2;
+}
+
+int max_test (long a, long b, long c) {
+  int cmp1 = a > b;
+  int cmp2 = a > c;
+  return cmp1 & cmp2;
+}
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
-- 
1.9.1



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-10-01 Thread Michael Collison

Marc,

Ah I did misunderstand you. Patch with match.pd formatting fix.

On 10/01/2015 01:05 AM, Marc Glisse wrote:

On Thu, 1 Oct 2015, Michael Collison wrote:


ChangeLog formatting and test case fixed.


Oups, sorry for the lack of precision, but I meant indenting the code 
in match.pd, I hadn't even looked at the ChangeLog.




--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

2015-09-30  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267..caf3c82 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2311,3 +2311,13 @@ along with GCC; see the file COPYING3.  If not see
 (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
 	   (convert:utype @4
+
+/* Transform (@0 < @1 and @0 < @2) to use min, 
+   (@0 > @1 and @0 > @2) to use max */
+(for op (lt le gt ge)
+ ext (min min max max)
+ (simplify
+  (bit_and (op:s @0 @1) (op:s @0 @2))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (op @0 (ext @1 @2)))))
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..2e4300c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int min_test(long a, long b, long c) {
+  int cmp1 = a < b;
+  int cmp2 = a < c;
+  return cmp1 & cmp2;
+}
+
+int max_test (long a, long b, long c) {
+  int cmp1 = a > b;
+  int cmp2 = a > c;
+  return cmp1 & cmp2;
+}
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
-- 
1.9.1



Re: [ARM] Use vector wide add for mixed-mode adds

2015-10-01 Thread Michael Collison

Kyrill,

I have modified the patch to address your comments. I also modified 
check_effective_target_vect_widen_sum_hi_to_si_pattern in 
target-supports.exp to indicate that arm neon supports vector widen sum 
of HImode to SImode. This resolved several test suite failures.

Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have 
four related execution test failures on armeb-none-linux-gnueabihf with 
-flto only.

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison  

* config/arm/neon.md (widen_sum): New patterns
where mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon supports vector widen sum of HImode to SImode.

On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized due
to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison 

  * config/arm/neon.md (widen_sum): New patterns
  where mode is VQI to improve mixed mode add vectorization.



Please list all the new define_expands and define_insns
in the changelog. Also, please add a ChangeLog entry for
the testsuite additions.

The approach looks ok to me with a few comments on some
parts of the patch itself.


+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+		   (match_operand:VQI 2 "vect_par_constant_high" "")))
+		(match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)


This is a single instruction, and it has a length of 4, so no need to 
override the length attribute.

Same with the other define_insns in this patch.


diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */

The arm_neon_hw check is usually used when you want to run the tests.
Since this is a compile-only test you just need arm_neon_ok.

 +/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+

Stray trailing newlines. Similar comments for the other testcases.

Thanks,
Kyrill



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..b3485f1 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,55 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+	(plus: (sign_extend: (match_operand:VQI 1 &

Re: [ARM] Use vector wide add for mixed-mode adds

2015-10-20 Thread Michael Collison

Hi Kyrill,

Since your email I have done the following:

1. Added the ENDIAN_LANE_N to the define_expand patterns for big endian 
targets. The big endian patches produced no change in the test results. 
I still have several execution failures when targeting big endian with 
lto enabled.


2. I diff'd the rtl dumps from a big endian compiler with lto enabled 
and disabled. I also examined the assembly language and there are no 
differences except for the .ascii directives.


I want to ask a question about existing patterns in neon.md that utilize 
the vec_select and all the lanes as my example does: Why are the 
following patterns not matched if the target is big endian?



(define_insn "neon_vec_unpack_lo_"
  [(set (match_operand: 0 "register_operand" "=w")
(SE: (vec_select:
  (match_operand:VU 1 "register_operand" "w")
  (match_operand:VU 2 "vect_par_constant_low" ""))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl. %q0, %e1"
  [(set_attr "type" "neon_shift_imm_long")]
)

(define_insn "neon_vec_unpack_hi_"
  [(set (match_operand: 0 "register_operand" "=w")
(SE: (vec_select:
  (match_operand:VU 1 "register_operand" "w")
  (match_operand:VU 2 "vect_par_constant_high" ""))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl. %q0, %f1"
  [(set_attr "type" "neon_shift_imm_long")]
)

These patterns are similar to the new patterns I am adding and I am 
wondering if my patterns should exclude BYTES_BIG_ENDIAN?


On 10/08/2015 04:02 AM, Kyrill Tkachov wrote:

Hi Michael,

On 01/10/15 11:05, Michael Collison wrote:

Kyrill,

I have modified the patch to address your comments. I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp to
indicate that arm neon supports vector widen sum of HImode to SImode.
This resolved
several test suite failures.

Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have
four related execution test failures
on armeb-none-linux-gnueabihf with -flto only.

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


We'd want to get to the bottom of these before committing.
Does codegen before and after the patch show anything?
When it comes to big-endian and NEON, the fiddly parts are
usually lane numbers. Do you need to select the proper lanes with
ENDIAN_LANE_N like Charles in his patch at:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html?

Thanks,
Kyrill



I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison 

  * config/arm/neon.md (widen_sum): New patterns
  where mode is VQI to improve mixed mode vectorization.
  * config/arm/neon.md (vec_sel_widen_ssum_lo3): New
  define_insn to match low half of signed vaddw.
  * config/arm/neon.md (vec_sel_widen_ssum_hi3): New
  define_insn to match high half of signed vaddw.
  * config/arm/neon.md (vec_sel_widen_usum_lo3): New
  define_insn to match low half of unsigned vaddw.
  * config/arm/neon.md (vec_sel_widen_usum_hi3): New
  define_insn to match high half of unsigned vaddw.
  * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
  * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
  * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
  * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
  * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
  * testsuite/lib/target-supports.exp
  (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
  that arm neon support vector widen sum of HImode TO SImode.


Note that the testsuite changes should have their own ChangeLog entry
with the paths there starting relative to gcc/testsuite/



On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized due
to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
 len = len & ~31;
 int result = 0;
 __asm volatile ("");
 for (int i = 0; i < len; i++)
   result += x[i];
 return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
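
As a rough illustration of what a widening add does for the t6 loop above, here is a scalar C model of the reduction. It is only a sketch: the names sum_scalar, widen_ssum_v8hi and sum_widening are made up for this note and are not part of the patch, and the shape (eight 16-bit lanes feeding four 32-bit accumulators, low half first and then high half) is an assumption chosen to mirror a 128-bit vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: the same reduction as the t6 loop.  */
int32_t sum_scalar (const int16_t *x, size_t len)
{
  int32_t result = 0;
  for (size_t i = 0; i < len; i++)
    result += x[i];
  return result;
}

/* Hypothetical model of one widening-sum step: sign-extend the low
   four 16-bit lanes and add them to the four 32-bit accumulators,
   then do the same with the high four lanes.  */
void widen_ssum_v8hi (int32_t acc[4], const int16_t in[8])
{
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t) in[i];       /* low half */
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t) in[4 + i];   /* high half */
}

/* Vectorized shape of the loop.  LEN must be a multiple of 8.  */
int32_t sum_widening (const int16_t *x, size_t len)
{
  int32_t acc[4] = { 0, 0, 0, 0 };
  for (size_t i = 0; i < len; i += 8)
    widen_ssum_v8hi (acc, x + i);
  /* Final horizontal reduction of the partial sums.  */
  return acc[0] + acc[1] + acc[2] + acc[3];
}

/* Small self-check comparing both forms on mixed-sign data.  */
void demo_widening_sum (void)
{
  int16_t data[32];
  for (int i = 0; i < 32; i++)
    data[i] = (int16_t) (i * 37 - 500);
  assert (sum_widening (data, 32) == sum_scalar (data, 32));
}
```

The partial sums are kept in the wider type throughout, which is why no separate unpack step is needed before the add.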

2015-09-22  Michael Collison 

   * config/arm/neon.md (widen_sum): New patterns

[ARM][PATCH, PR 68223] arm_[su]min_cmp pattern fails

2015-11-05 Thread Michael Collison
The arm_smin_cmp and arm_umin_cmp patterns fail if operand 0 
and operand 2 are equal and both are less than operand 1. The solution 
is to remove the two patterns.
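
The failing case can be reproduced with a small C model of the signed pattern. This is a sketch under stated assumptions: cmp3 stands in for the condition flags as a three-way result, and flags_reference and flags_split are names invented here. The split sequence follows the comment in the removed code (cmp x, z followed by a conditional cmpge y, z).

```c
#include <assert.h>

/* Three-way compare standing in for the condition flags:
   -1 (less), 0 (equal), 1 (greater).  */
int cmp3 (int a, int b)
{
  return (a > b) - (a < b);
}

/* What the combined pattern has to compute: cmp (smin (x, y), z).  */
int flags_reference (int x, int y, int z)
{
  int t = x < y ? x : y;
  return cmp3 (t, z);
}

/* What the split sequence computes: cmp x, z; then, only if the
   GE condition holds, cmp y, z.  */
int flags_split (int x, int y, int z)
{
  int flags = cmp3 (x, z);
  if (flags >= 0)
    flags = cmp3 (y, z);
  return flags;
}

/* x == z and both are less than y: the two models disagree.  */
void demo_min_cmp (void)
{
  assert (flags_reference (1, 5, 1) == 0);  /* min is 1, equal to z */
  assert (flags_split (1, 5, 1) == 1);      /* reports greater */
}
```

With x = 1, y = 5, z = 1 the first compare reports equality, so GE holds and the second compare overwrites the flags with the result for y, which is no longer the minimum.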


2015-11-06  Michael Collison  

PR target/68223
* gcc/config/arm/arm.md (*arm_smin_cmp): Remove pattern.
(*arm_umin_cmp): Likewise.
* gcc.target/arm/mincmp.c: Remove testcase.

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 02e147e..6ba1ec3 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3455,44 +3455,6 @@
(set_attr "type" "multiple,multiple")]
 )
 
-;; t = (s/u)min (x, y)
-;; cc = cmp (t, z)
-;; is the same as
-;; cmp x, z
-;; cmpge(u) y, z
-
-(define_insn_and_split "*arm_smin_cmp"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	 (smin:SI (match_operand:SI 0 "s_register_operand" "r")
-		  (match_operand:SI 1 "s_register_operand" "r"))
-	 (match_operand:SI 2 "s_register_operand" "r")))]
-  "TARGET_32BIT"
-  "#"
-  "&& reload_completed"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (match_dup 0) (match_dup 2)))
-   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
-	  (set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 1) (match_dup 2]
-)
-
-(define_insn_and_split "*arm_umin_cmp"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	 (umin:SI (match_operand:SI 0 "s_register_operand" "r")
-		  (match_operand:SI 1 "s_register_operand" "r"))
-	 (match_operand:SI 2 "s_register_operand" "r")))]
-  "TARGET_32BIT"
-  "#"
-  "&& reload_completed"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (match_dup 0) (match_dup 2)))
-   (cond_exec (geu:CC (reg:CC CC_REGNUM) (const_int 0))
-	  (set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 1) (match_dup 2]
-)
-
 (define_expand "umaxsi3"
   [(parallel [
 (set (match_operand:SI 0 "s_register_operand" "")
diff --git a/gcc/testsuite/gcc.target/arm/mincmp.c b/gcc/testsuite/gcc.target/arm/mincmp.c
deleted file mode 100644
index ade3bd9..000
--- a/gcc/testsuite/gcc.target/arm/mincmp.c
+++ /dev/null
@@ -1,20 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2" } */
-/* { dg-require-effective-target arm32 } */
-
-#define min(x, y) ((x) <= (y)) ? (x) : (y)
-
-unsigned int 
-foo (unsigned int i, unsigned int x, unsigned int y)
-{
-  return i < (min (x, y));
-}
-
-int 
-bar (int i, int x, int y)
-{
-  return i < (min (x, y));
-}
-
-/* { dg-final { scan-assembler "cmpcs" } } */
-/* { dg-final { scan-assembler "cmpge" } } */
-- 
1.9.1



Re: [Aarch64] Use vector wide add for mixed-mode adds

2015-11-08 Thread Michael Collison

This is a followup patch to my earlier patch here:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00408.html

and comments here:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01300.html

This patch fixes the failure in slp-reduc-3.c by adding aarch64 support in
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp. The remaining failures in slp-multitypes-[45].c and
vect-125.c appear to be deficiencies in the vectorizer, as the same
failures are seen on PowerPC and ia64. See here:

PowerPC: https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg03293.html
ia64: https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg03176.html

Thanks to James Greenhalgh at Arm for pointing this out. My patch
disables these tests for targets with widening adds that support V8HI to
V4SI. Tested on aarch64-none-elf, aarch64_be-none-elf, and
aarch64-none-linux-gnu.


2015-11-06  Michael Collison 
* config/aarch64/aarch64-simd.md (widen_ssum, widen_usum)
(aarch64_w_internal): New patterns
* config/aarch64/iterators.md (Vhalf, VDBLW): New mode attributes.
* gcc.target/aarch64/saddw-1.c: New test.
* gcc.target/aarch64/saddw-2.c: New test.
* gcc.target/aarch64/uaddw-1.c: New test.
* gcc.target/aarch64/uaddw-2.c: New test.
* gcc.target/aarch64/uaddw-3.c: New test.
* gcc.dg/vect/slp-multitypes-4.c: Disable test for
targets with widening adds from V8HI=>V4SI.
* gcc.dg/vect/slp-multitypes-5.c: Ditto.
* gcc.dg/vect/vect-125.c: Ditto.
* lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern):
Add aarch64 to list of supported targets.

Okay to commit?

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 65a2b6f..acb7cf0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2750,6 +2750,60 @@
 
 ;; w.
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (sign_extend: (match_operand:VQW 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_saddw_internal (temp, operands[2],
+		operands[1], p));
+emit_insn (gen_aarch64_saddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (sign_extend:
+		   (match_operand:VD_BHSI 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_saddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (zero_extend: (match_operand:VQW 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_uaddw_internal (temp, operands[2],
+		 operands[1], p));
+emit_insn (gen_aarch64_uaddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (zero_extend:
+		   (match_operand:VD_BHSI 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_uaddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
 (define_insn "aarch64_w"
   [(set (match_operand: 0 "register_operand" "=w")
 (ADDSUB: (match_operand: 1 "register_operand" "w")
@@ -2760,6 +2814,18 @@
   [(set_attr "type" "neon__widen")]
 )
 
+(define_insn "aarch64_w_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+(ADDSUB: (match_operand: 1 "register_operand" "w")
+			(ANY_EXTEND:
+			  (vec_select:
+			   (match_operand:VQW 2 "register_operand" "w")
+			   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")]
+  "TARGET_SIMD"
+  "w\\t%0., %1., %2."
+  [(set_attr "type" "neon__widen")]
+)
+
 (define_insn "aarch64_w2_internal"
   [(set (match_operand: 0 "register_operand" "=w")
 (ADDSUB: (match_operand: 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/iterators.md b/g

Re: [Aarch64] Use vector wide add for mixed-mode adds

2015-11-22 Thread Michael Collison



On 11/22/2015 8:48 AM, James Greenhalgh wrote:

On Sun, Nov 08, 2015 at 11:51:47PM -0700, Michael Collison wrote:

2015-11-06  Michael Collison 
 * config/aarch64/aarch64-simd.md (widen_ssum, widen_usum)
(aarch64_w_internal): New patterns
 * config/aarch64/iterators.md (Vhalf, VDBLW): New mode attributes.
 * gcc.target/aarch64/saddw-1.c: New test.
 * gcc.target/aarch64/saddw-2.c: New test.
 * gcc.target/aarch64/uaddw-1.c: New test.
 * gcc.target/aarch64/uaddw-2.c: New test.
 * gcc.target/aarch64/uaddw-3.c: New test.
 * lib/target-support.exp
 (check_effective_target_vect_widen_sum_hi_to_si_pattern):
 Add aarch64 to list of support targets.


These hunks are all OK (with the minor style comments below applied).


Okay I will update with your comments.


As we understand what's happening here, let's take the regressions below
for now and add AArch64 to the targets affected by pr68333.


 * gcc.dg/vect/slp-multitypes-4.c: Disable test for
 targets with widening adds from V8HI=>V4SI.
 * gcc.dg/vect/slp-multitypes-5.c: Ditto.
 * gcc.dg/vect/vect-125.c: Ditto.

Let's leave these for now, while we wait for pr68333.


To clarify you would like me to exclude these bits from the patch?




diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 65a2b6f..acb7cf0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2750,6 +2750,60 @@
  
  ;; w.
  
+(define_expand "widen_ssum3"

+  [(set (match_operand: 0 "register_operand" "")
+   (plus: (sign_extend: (match_operand:VQW 1 "register_operand" 
""))

Split this line (more than 80 characters).


+ (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_saddw_internal (temp, operands[2],
+   operands[1], p));
+emit_insn (gen_aarch64_saddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+   (plus: (sign_extend:
+  (match_operand:VD_BHSI 1 "register_operand" ""))
+ (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_saddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+   (plus: (zero_extend: (match_operand:VQW 1 "register_operand" 
""))

Split this line (more than 80 characters).


+ (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_uaddw_internal (temp, operands[2],
+operands[1], p));
+emit_insn (gen_aarch64_uaddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+   (plus: (zero_extend:
+  (match_operand:VD_BHSI 1 "register_operand" ""))
+ (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_uaddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
  (define_insn "aarch64_w"
[(set (match_operand: 0 "register_operand" "=w")
  (ADDSUB: (match_operand: 1 "register_operand" "w")
@@ -2760,6 +2814,18 @@
[(set_attr "type" "neon__widen")]
  )
  
+(define_insn "aarch64_w_internal"

+  [(set (match_operand: 0 "register_operand" "=w")
+(ADDSUB: (match_operand: 1 "register_operand" "w")
+   (ANY_EXTEND:
+ (vec_select:
+  (match_operand:VQW 2 "register_operand" "w")
+  (match_operand:VQW 3 "vect_par_cnst_lo_half" "")]
+  "TARGET_SIMD"
+  "w\\t%0., %1., %2."
+  [(set_attr "type" "neon__widen")]
+)
+
  (define_insn "aarch64_w2_internal"
[(set (match_operand: 0 "register_operand" "=w")
  (ADDSUB: (match_operand: 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/saddw-1.c 
b/gcc/testsuite/gcc.target/aarch64/saddw-1.c
new file mode 100644
index 0

Re: [Aarch64] Use vector wide add for mixed-mode adds

2015-11-24 Thread Michael Collison

This is a followup patch which addresses formatting comments posted here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02611.html

2015-11-24  Michael Collison 
* config/aarch64/aarch64-simd.md (widen_ssum, widen_usum)
(aarch64_w_internal): New patterns
* config/aarch64/iterators.md (Vhalf, VDBLW): New mode attributes.
* gcc.target/aarch64/saddw-1.c: New test.
* gcc.target/aarch64/saddw-2.c: New test.
* gcc.target/aarch64/uaddw-1.c: New test.
* gcc.target/aarch64/uaddw-2.c: New test.
* gcc.target/aarch64/uaddw-3.c: New test.
* lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern):
Add aarch64 to list of supported targets.

Okay to commit?

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3fa23b3..79be6be 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2777,6 +2777,62 @@
 
 ;; w.
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (sign_extend: 
+		(match_operand:VQW 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_saddw_internal (temp, operands[2],
+		operands[1], p));
+emit_insn (gen_aarch64_saddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (sign_extend:
+		(match_operand:VD_BHSI 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_saddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (zero_extend: 
+		(match_operand:VQW 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_uaddw_internal (temp, operands[2],
+		 operands[1], p));
+emit_insn (gen_aarch64_uaddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (zero_extend:
+		(match_operand:VD_BHSI 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_uaddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
 (define_insn "aarch64_w"
   [(set (match_operand: 0 "register_operand" "=w")
 (ADDSUB: (match_operand: 1 "register_operand" "w")
@@ -2787,6 +2843,18 @@
   [(set_attr "type" "neon__widen")]
 )
 
+(define_insn "aarch64_w_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+(ADDSUB: (match_operand: 1 "register_operand" "w")
+			(ANY_EXTEND:
+			  (vec_select:
+			   (match_operand:VQW 2 "register_operand" "w")
+			   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")]
+  "TARGET_SIMD"
+  "w\\t%0., %1., %2."
+  [(set_attr "type" "neon__widen")]
+)
+
 (define_insn "aarch64_w2_internal"
   [(set (match_operand: 0 "register_operand" "=w")
 (ADDSUB: (match_operand: 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c2eb7de..02e930b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -479,6 +479,13 @@
 			 (V4SF "V2SF")  (V4HF "V2HF")
 			 (V8HF "V4HF")  (V2DF  "DF")])
 
+;; Half modes of all vector modes, in lower-case.
+(define_mode_attr Vhalf [(V8QI "v4qi")  (V16QI "v8qi")
+			 (V4HI "v2hi")  (V8HI  "v4hi")
+			 (V2SI "si")(V4SI  "v2si")
+			 (V2DI "di")(V2SF  "sf")
+			 (V4SF "v2sf")  (V2DF  "df")])
+
 ;; Double modes of vector modes.
 (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
 			(V4HF "V8HF")
@@ -496,6 +503,11 @@
 			(SI   "v2si")  (DI   "v2di")
 			(DF   "v2df")])
 
+;; Modes with double-width elements.
+(define

Re: [ARM] Use vector wide add for mixed-mode adds

2015-11-29 Thread Michael Collison


This is a modified version of my previous patch that supports vector 
wide add. I added support for vaddw on big endian when generating the 
parallel operand for the vector select.
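
To make the big-endian handling concrete, here is a small stand-alone C model of how the lane numbers for the vector select are chosen. It is a sketch only: par_cnst_half is a hypothetical name, it fills a plain int array instead of building a PARALLEL rtx, and the endianness is passed as a parameter rather than read from BYTES_BIG_ENDIAN. The selection logic follows aarch32_simd_vect_par_cnst_half from the patch in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Fill MASK with the GCC lane numbers for one half of an NUNITS-lane
   vector.  HIGH selects the architectural high half.  On big-endian
   the architectural low half corresponds to the GCC high lanes, so
   the base is swapped.  MASK must have room for NUNITS / 2 entries.  */
void par_cnst_half (int *mask, int nunits, bool high, bool big_endian)
{
  int half = nunits / 2;
  int base;

  if (big_endian)
    base = high ? 0 : half;
  else
    base = high ? half : 0;

  for (int i = 0; i < half; i++)
    mask[i] = base + i;
}

/* Check the four-lane masks against the diagram in the patch comment.  */
void demo_par_cnst_half (void)
{
  int m[2];

  par_cnst_half (m, 4, false, false);   /* little-endian, low */
  assert (m[0] == 0 && m[1] == 1);
  par_cnst_half (m, 4, false, true);    /* big-endian, low */
  assert (m[0] == 2 && m[1] == 3);
}
```

For a four-lane vector this gives low masks { 0, 1 } on little-endian and { 2, 3 } on big-endian, with the high masks swapped accordingly.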


There are four failing test cases on arm big endian with similar code. 
They are:


gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


The failures occur without my patch and are related to a bug with vector 
loads using VUZP operations.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

Validated on arm-none-eabi, arm-none-linux-gnueabi, 
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.


2015-11-29  Michael Collison  

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
for new function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
aarch32_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.

Okay for trunk?

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f9b1276..26fe370 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
 			  ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+		   bool high);
 #ifdef RTX_CODE
 extern bool arm_vector_mode_supported_p (machine_mode);
 extern bool arm_small_register_classes_for_mode_p (machine_mode);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 61e2aa2..158c2e8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30111,4 +30111,80 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
   *pri = tmp;
   return;
 }
+
+/* Construct and return a PARALLEL RTX vector with elements numbering the
+   lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
+   the vector - from the perspective of the architecture.  This does not
+   line up with GCC's perspective on lane numbers, so we end up with
+   different masks depending on our target endian-ness.  The diagram
+   below may help.  We must draw the distinction when building masks
+   which select one half of the vector.  An instruction selecting
+   architectural low-lanes for a big-endian target, must be described using
+   a mask selecting GCC high-lanes.
+
+ Big-Endian Little-Endian
+
+GCC 0   1   2   3   3   2   1   0
+  | x | x | x | x |   | x | x | x | x |
+Architecture3   2   1   0   3   2   1   0
+
+Low Mask: { 2, 3 }{ 0, 1 }
+High Mask:{ 0, 1 }{ 2, 3 }
+*/
+
+rtx
+aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high)
+{
+  int nunits = GET_MODE_NUNITS (mode);
+  rtvec v = rtvec_alloc (nunits / 2);
+  int high_base = nunits / 2;
+  int low_base = 0;
+  int base;
+  rtx t1;
+  int i;
+
+  if (BYTES_BIG_ENDIAN)
+base = high ? low_base : high_base;
+  else
+base = high ? high_base : low_base;
+
+  for (i = 0; i < nunits / 2; i++)
+RTVEC_ELT (v, i) = GEN_INT (base + i);
+
+  t1 = gen_rtx_PARALLEL (mode, v);
+  return t1;
+}
+
+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspectiv

[PING] [ARM] Use vector wide add for mixed-mode adds

2015-12-07 Thread Michael Collison

Ping. Originally posted here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03440.html

Regards,

Michael Collison


[PATCH] match.pd: rewrite select to branchless expression

2022-11-08 Thread Michael Collison
This patch transforms (cond (and (x , 0x1) == 0), y, (z op y)) into 
(-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also 
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , 
0x1)) & z ) op y.
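
The identity behind the transformation can be checked in plain C. The sketch below is illustrative only; the function names are invented here and the code does not claim to show what GCC emits. For unsigned x, -(x & 1) is either 0 or all ones, so it acts as a mask that keeps or discards z.

```c
#include <assert.h>

/* Select form: y when the low bit of x is clear, otherwise z op y.  */
unsigned select_xor (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) == 0) ? y : (z ^ y);
}

unsigned select_ior (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) == 0) ? y : (z | y);
}

/* Branchless form: the negated low bit is 0 or ~0u and masks z.  */
unsigned branchless_xor (unsigned x, unsigned y, unsigned z)
{
  return (-(x & 1) & z) ^ y;
}

unsigned branchless_ior (unsigned x, unsigned y, unsigned z)
{
  return (-(x & 1) & z) | y;
}

/* Compare both forms over a few inputs.  */
void demo_branchless (void)
{
  for (unsigned x = 0; x < 4; x++)
    {
      assert (select_xor (x, 0x1234u, 0xa001u)
              == branchless_xor (x, 0x1234u, 0xa001u));
      assert (select_ior (x, 0x1234u, 0xa001u)
              == branchless_ior (x, 0x1234u, 0xa001u));
    }
}
```

When the low bit is clear the mask is 0, and both 0 ^ y and 0 | y reduce to y, which is why the same rewrite works for either operator.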


Matching these patterns allows GCC to generate branchless code for one of 
the functions in coremark.


Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-08  Michael Collison  

    * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
    -> (-(and (x , 0x1)) & z ) op y)

2022-11-08  Michael Collison  

    * gcc.dg/tree-ssa/branchless-cond.c: New test.

---
 gcc/match.pd  | 22 
 .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..722f517ac6d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))

+/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) ^ y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq (bit_and @0 integer_onep@1)
+            integer_zerop)
+        @2
+        (op:c @3 @2))
+  (if (INTEGRAL_TYPE_P (type)
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
+
+/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) ^ y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne (bit_and @0 integer_onep@1)
+            integer_zerop)
+        (op:c @3 @2)
+        @2)
+  (if (INTEGRAL_TYPE_P (type)
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
+
 /* Simplifications of shift and rotates.  */

 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1






Re: [PATCH] match.pd: rewrite select to branchless expression

2022-11-09 Thread Michael Collison

Richard,

Thanks for your feedback. I want to make sure I am following what you 
are recommending. Are you suggesting changing:


(for op (bit_xor bit_ior)
 (simplify
  (cond (eq (bit_and @0 integer_onep@1)
            integer_zerop)
        @2
        (op:c @3 @2))
  (if (INTEGRAL_TYPE_P (type)
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))


to

(for op (bit_xor bit_ior)
 (simplify
  (cond (eq zero_one_valued_p@0
            integer_zerop)
        @1
        (op:c @2 @1))
  (if (INTEGRAL_TYPE_P (type)
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (bit_and (negate (convert:type (bit_and @0 { build_one_cst (type); }))) @2) @1))))



On 11/9/22 02:41, Richard Biener wrote:

On Tue, Nov 8, 2022 at 9:02 PM Michael Collison  wrote:

This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into
(-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
0x1)) & z ) op y.

Matching this patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-08  Michael Collison  

  * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
  -> (-(and (x , 0x1)) & z ) op y)

2022-11-08  Michael Collison  

  * gcc.dg/tree-ssa/branchless-cond.c: New test.

---
   gcc/match.pd  | 22 
   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
   2 files changed, 48 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..722f517ac6d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
 (max @2 @1))

+/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z )
^ y */

Please write the match as a C expression in the comment, as present
it's a weird mix.  So x & 0x1 == 0 ? y : z <op> y -> (-(typeof(y))(x &
0x1) & z) <op> y


+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq (bit_and @0 integer_onep@1)
+integer_zerop)
+@2
+(op:c @3 @2))
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2

Since you are literally keeping (bit_and @0 @1) and not matching @0 with
anything I suspect you could instead use

  (simplify (cond (eq zero_one_valued_p@0 integer_zerop) ...

eventually extending that to cover bit_and with one.  Do you need to guard
this against 'type' being a signed/unsigned 1-bit precision integer?


+
+/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z )
^ y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne (bit_and @0 integer_onep@1)
+integer_zerop)
+(op:c @3 @2)
+@2)
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2
+
   /* Simplifications of shift and rotates.  */

   (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1






[PATCH v2] match.pd: rewrite select to branchless expression

2022-11-10 Thread Michael Collison
This patch transforms ((x & 0x1) == 0) ? y : z <op> y into 
(-(typeof(y))(x & 0x1) & z) <op> y, where op is a '^' or a '|'. It also 
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , 
0x1)) & z ) op y.


Matching these patterns allows GCC to generate branchless code for one of 
the functions in coremark.


Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-10  Michael Collison  

    * match.pd ((x & 0x1) == 0) ? y : z <op> y
    -> (-(typeof(y))(x & 0x1) & z) <op> y.

2022-11-10  Michael Collison 

    * gcc.dg/tree-ssa/branchless-cond.c: New test.

---

Changes in v2:

- Rewrite comment to use C syntax

- Guard against 1-bit types

- Simplify pattern by using zero_one_valued_p

 gcc/match.pd  | 24 +
 .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..258531e9046 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
+/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq zero_one_valued_p@0
+            integer_zerop)
+        @1
+        (op:c @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) > 1
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type @0)) @2) @1))))
+
+/* ((x & 0x1) == 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne zero_one_valued_p@0
+            integer_zerop)
+        (op:c @2 @1)
+        @1)
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) > 1
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type @0)) @2) @1))))
+
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1



Re: [PATCH v2] match.pd: rewrite select to branchless expression

2022-11-11 Thread Michael Collison

Hi Prathamesh,

It is my understanding that INTEGRAL_TYPE_P applies to the other integer
types you mentioned (char, short, long). In fact, the test function that
motivated this match has a mixture of char and short and does not
restrict matching.


On 11/11/22 02:44, Prathamesh Kulkarni wrote:

On Fri, 11 Nov 2022 at 07:58, Michael Collison  wrote:

This patch transforms ((x & 0x1) == 0) ? y : z <op> y into
(-(typeof(y))(x & 0x1) & z) <op> y, where op is a '^' or a '|'. It also
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
0x1)) & z ) op y.

Matching these patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-10  Michael Collison  

  * match.pd ((x & 0x1) == 0) ? y : z <op> y
  -> (-(typeof(y))(x & 0x1) & z) <op> y.

2022-11-10  Michael Collison 

  * gcc.dg/tree-ssa/branchless-cond.c: New test.

---

Changes in v2:

- Rewrite comment to use C syntax

- Guard against 1-bit types

- Simplify pattern by using zero_one_valued_p

   gcc/match.pd  | 24 +
   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
   2 files changed, 50 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..258531e9046 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
 (max @2 @1))

+/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq zero_one_valued_p@0
+integer_zerop)
+@1
+(op:c @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > 1
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type @0)) @2) @1))))
+
+/* ((x & 0x1) != 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne zero_one_valued_p@0
+integer_zerop)
+   (op:c @2 @1)
+@1)
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > 1
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type @0)) @2) @1))))
+
   /* Simplifications of shift and rotates.  */

   (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}

Sorry to nitpick -- Since the pattern gates on INTEGRAL_TYPE_P, would
it be a good idea
to have these tests for other integral types too besides int like
{char, short, long} ?

Thanks,
Prathamesh

+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1



Re: [PATCH v4 09/10] This patch adds a guard for VNx1 vectors that are present in ports like riscv.

2023-04-18 Thread Michael Collison

Thanks Kito I will look into this.


On 4/18/23 10:26, Kito Cheng wrote:

I would prefer to drop this patch from this patch series since I believe
https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
is the right fix for this issue.

On Tue, Apr 18, 2023 at 2:40 AM Michael Collison  wrote:

From: Kevin Lee 

Kevin Lee 
gcc/ChangeLog:

 * tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition
---
  gcc/tree-vect-data-refs.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..df393ba723d 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
HOST_WIDE_INT count)
   poly_uint64 nelt = GET_MODE_NUNITS (mode);

   /* The encoding has 2 interleaved stepped patterns.  */
+if(!multiple_p (nelt, 2))
+  return false;
   vec_perm_builder sel (nelt, 2, 3);
   sel.quick_grow (6);
   for (i = 0; i < 3; i++)
--
2.34.1



Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-18 Thread Michael Collison

Juzhe and Kito,

Thank you for the clarification.

On 4/18/23 18:48, juzhe.zh...@rivai.ai wrote:

Yes, as Kito said.
We won't enable VNx1DImode in auto-vectorization, so it's meaningless
to fix it here.
We dynamically adjust the minimum vector length for different '-march'
values according to the RVV ISA specification.

So we strongly suggest dropping this fix.

Thanks.

juzhe.zh...@rivai.ai

*From:* Kito Cheng <mailto:kito.ch...@gmail.com>
*Date:* 2023-04-19 02:21
*To:* Richard Biener <mailto:richard.guent...@gmail.com>; Jeff Law
<mailto:jeffreya...@gmail.com>; Palmer Dabbelt
<mailto:pal...@dabbelt.com>
*CC:* Michael Collison <mailto:colli...@rivosinc.com>; gcc-patches
<mailto:gcc-patches@gcc.gnu.org>; 钟居哲 <mailto:juzhe.zh...@rivai.ai>
*Subject:* Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS
is a multiple of 2.
A bit more background about RVV:
RISC-V provides different VLEN configurations through different ISA
extensions such as `zve32x`, `zve64x` and `v`:
zve32x guarantees only that the minimal VLEN is 32 bits,
zve64x guarantees that the minimal VLEN is 64 bits,
and v guarantees that the minimal VLEN is 128 bits.

Current status (without that patch):
Zve32x: the mode for one vector register is VNx1SImode, and VNx1DImode
is an invalid mode
- one vector register can hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI
Zve64x: the mode for one vector register is VNx1DImode or VNx2SImode
- one vector register can hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI
- one vector register can hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI

So what I want to say here is that, in theory, it is really NOT safe to
assume that VNx1DImode holds two or more DI.

However, the `v` extension guarantees that the minimal VLEN is 128 bits.
We are trying to introduce another type/mode mapping for this
configuration:
v: the mode for one vector register is VNx2DImode or VNx4SImode
- one vector register can hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
- one vector register can hold 4 + 4x SImode where x is 0~n, so it
will hold at least four SI

So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) if it is really possible, which is a more
precise way to model the vector extension for RISC-V.
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng 
wrote:
>
> Wait, VNx1DImode can be really evaluate to just one element if
> -march=rv64g_zve64x,
>
> I thinks this should be just fixed on backend by this patch:
>
>

https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison
 wrote:
> > >
> > > While working on autovectorizing for the RISCV port I
encountered an issue
> > > where can_duplicate_and_interleave_p assumes that
GET_MODE_NUNITS is a
> > > evenly divisible by two. The RISC-V target has vector modes
(e.g. VNx1DImode),
> > > where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09  Michael Collison 
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > >  gcc/tree-vect-slp.cc | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p
(vec_info *vinfo, unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > >   tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type,
count);
> > > + poly_int64 half_nelts;
> > >   if (vector_type
> > >   && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > >   && known_eq (GET_MODE_SIZE (TYPE_MODE
(vector_type)),
> > > -  GET_MODE_SIZE (base_vector_mode)))
> > > +  GET_MODE_SIZE (base_vector_mode))
> > > +  

Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations

2023-04-20 Thread Michael Collison

Hi Kito,

I will remove the unused UNSPECs, thank you for finding them.

I removed the include of "vector-iterators.md" because "riscv.md" 
already includes it and I was receiving multiple definition errors.


On 4/18/23 21:19, Kito Cheng wrote:

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 70ad85b661b..7fae87968d7 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -34,6 +34,8 @@
UNSPEC_VMULHU
UNSPEC_VMULHSU

+  UNSPEC_VADD
+  UNSPEC_VSUB

Defined but unused?


UNSPEC_VADC
UNSPEC_VSBC
UNSPEC_VMADC
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..2ac5b744503 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
  ;; - Auto-vectorization (TBD)
  ;; - Combine optimization (TBD)

-(include "vector-iterators.md")
-

Why remove this?


[PATCH v5 01/10] RISC-V: autovec: Add new predicates and function prototypes

2023-04-26 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/predicates.md (p_reg_or_const_csr_operand):
New predicate.
(vector_reg_or_const_dup_operand): Ditto.
* config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
(riscv_vector_lmul_enum): Ditto.
(vlmul_field_enum): Ditto.
---
 gcc/config/riscv/predicates.md  | 13 +
 gcc/config/riscv/riscv-opts.h   | 29 +
 gcc/config/riscv/riscv-protos.h |  9 +
 3 files changed, 51 insertions(+)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8654dbc5943..b3f2d622c7b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })
 
 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
(match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
 (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE (op)))")))
 
+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
(match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cf0cd669be4..af77df11430 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,35 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* RISC-V auto-vectorization preference.  */
+enum riscv_autovec_preference_enum {
+  NO_AUTOVEC,
+  RVV_SCALABLE,
+  RVV_FIXED_VLMAX
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..55056222e57 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -237,4 +237,13 @@ extern const char*
 th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
-- 
2.34.1



[PATCH v5 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer & len_load/store

2023-04-26 Thread Michael Collison
2023-04-25  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/vector-auto.md | 74 +
 gcc/config/riscv/vector.md  |  4 +-
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index bc384d9aedf..7f8f3a6cb18 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -135,6 +135,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..83d2ab6957a
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,74 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization support.
+;; We will replace them when len_maskload/len_maskstore is supported in loop vectorizer.
+(define_expand "len_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+(define_expand "len_store_"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+;; -
+;;  [INT] Vector binary patterns
+;; -
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+   (any_int_binop:VI (match_operand:VI 1 "")
+ (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred ();
+  rtx tail_policy = get_tail_policy_no_pred ();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx (NONVLMAX);
+
+  emit_insn (gen_pred_ (operands[0], mask, merge, operands[1], 
operands[2],
+vl, tail_policy, mask_policy, 
vlmax_avl_p));
+
+  DONE;
+})
+
+
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..2ac5b744503 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
 ;; - Auto-vectorization (TBD)
 ;; - Combine optimization (TBD)
 
-(include "vector-iterators.md")
-
 (define_constants [
(INVALID_ATTRIBUTE255)
(X0_REGNUM  0)
@@ -351,6 +349,8 @@
   (symbol_ref "INTVAL (operands[4])")]
(const_int INVALID_ATTRIBUTE)))
 
+(include "vector-auto.md")
+
 ;; -
 ;;  Miscellaneous Operations
 ;; -
-- 
2.34.1



[PATCH v5 06/10] RISC-V:autovec: Add autovectorization tests for add & sub

2023-04-26 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)

[PATCH v5 00/10] RISC-V: autovec: Add autovec support

2023-04-26 Thread Michael Collison
This series of patches adds foundational support for RISC-V
auto-vectorization. These patches are based on the current upstream RVV vector
intrinsic support and are not a new implementation. Most of the implementation
consists of adding the new vector cost model, the autovectorization patterns
themselves and target hooks. This implementation only provides support for
integer addition and subtraction as a proof of concept. This patch set should
not be construed to be feature complete. Based on conversations with the
community, these patches are intended to lay the groundwork for feature
completion and collaboration within the RISC-V community.

These patches are largely based on the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this patch
set.

As discussed on this list, if these patches are approved they will be merged
into an "auto-vectorization" branch once gcc-13 branches for release. There are
two known issues related to crashes (assert failures) associated with tree
vectorization; I have sent a patch for one of them and have received feedback.

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions 
- Fixed ChangeLog email formatting 
- Fixed gnu formatting issues in the code 


Kevin Lee (2):
  This patch adds a guard for VNx1 vectors that are present in ports
like riscv.
  This patch supports 8 bit auto-vectorization in riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer &
len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/predicates.md|  13 ++
 gcc/config/riscv/riscv-opts.h |  29 
 gcc/config/riscv/riscv-protos.h   |   9 ++
 gcc/config/riscv/riscv-v.cc   |  79 +++
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/riscv.cc | 130 ++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-auto.md   |  74 ++
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c  |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   3 +
 gcc/tree-vect-data-refs.cc|   2 +
 gcc/tree-vect-slp.cc  |   7 +-
 33 files changed, 864 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 

[PATCH v5 08/10] RISC-V:autovec: Add autovectorization tests for binary integer

2023-04-26 Thread Michael Collison
2023-04-05  Michael Collison  

* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: New
test to verify code generation of vector "and" on rv32.
* gcc.target/riscv/rvv/autovec/loop-and.c: New
test to verify code generation of vector "and" on rv64.
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: New
test to verify code generation of vector divide on rv32.
* gcc.target/riscv/rvv/autovec/loop-div.c: New
test to verify code generation of vector divide on rv64.
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: New
test to verify code generation of vector maximum on rv32.
* gcc.target/riscv/rvv/autovec/loop-max.c: New
test to verify code generation of vector maximum on rv64.
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: New
test to verify code generation of vector minimum on rv32.
* gcc.target/riscv/rvv/autovec/loop-min.c: New
test to verify code generation of vector minimum on rv64.
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: New
test to verify code generation of vector modulus on rv32.
* gcc.target/riscv/rvv/autovec/loop-mod.c: New
test to verify code generation of vector modulus on rv64.
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: New
test to verify code generation of vector multiply on rv32.
* gcc.target/riscv/rvv/autovec/loop-mul.c: New
test to verify code generation of vector multiply on rv64.
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: New
test to verify code generation of vector "or" on rv32.
* gcc.target/riscv/rvv/autovec/loop-or.c: New
test to verify code generation of vector "or" on rv64.
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: New
test to verify code generation of vector xor on rv32.
* gcc.target/riscv/rvv/autovec/loop-xor.c: New
test to verify code generation of vector xor on rv64.
---
 .../riscv/rvv/autovec/loop-and-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   | 24 ++
 .../riscv/rvv/autovec/loop-div-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   | 25 +++
 .../riscv/rvv/autovec/loop-max-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   | 25 +++
 .../riscv/rvv/autovec/loop-min-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   | 25 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   | 25 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   | 24 ++
 .../riscv/rvv/autovec/loop-or-rv32.c  | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-or.c| 24 ++
 .../riscv/rvv/autovec/loop-xor-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  3 +++
 17 files changed, 395 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
new file mode 100644
index 000..eb1ac5b44fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  vo

[PATCH v5 09/10] RISC-V: autovec: This patch adds a guard for VNx1 vectors that are present in ports like riscv.

2023-04-26 Thread Michael Collison
From: Kevin Lee 

Kevin Lee 
gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition
---
 gcc/tree-vect-data-refs.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..df393ba723d 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
  poly_uint64 nelt = GET_MODE_NUNITS (mode);
 
  /* The encoding has 2 interleaved stepped patterns.  */
+  if (!multiple_p (nelt, 2))
+    return false;
  vec_perm_builder sel (nelt, 2, 3);
  sel.quick_grow (6);
  for (i = 0; i < 3; i++)
-- 
2.34.1
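
The guard matters because the selector is encoded as two interleaved stepped patterns, which only expand cleanly when the element count is a whole number of rounds over both patterns. Below is a minimal standalone sketch of that expansion, assuming the usual rule that each pattern continues with the step between its last two encoded values. `expand_stepped` is a hypothetical helper written for this note, not a GCC function, and the selector values in the usage note are an illustrative choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expand an encoding made of `npatterns` interleaved
// patterns, each given by three leading values, into `full_nelts` elements.
// Returns an empty vector when `full_nelts` is not a multiple of `npatterns`.
std::vector<long> expand_stepped(const std::vector<long> &encoded,
                                 std::size_t npatterns,
                                 std::size_t full_nelts) {
  if (npatterns == 0 || encoded.size() != npatterns * 3
      || full_nelts % npatterns != 0)
    return {};

  std::vector<long> out(full_nelts);
  for (std::size_t i = 0; i < full_nelts; ++i) {
    std::size_t pattern = i % npatterns;  // which pattern this slot belongs to
    std::size_t count = i / npatterns;    // position within that pattern
    if (count < 3) {
      out[i] = encoded[count * npatterns + pattern];
      continue;
    }
    // Past the encoded prefix, keep stepping by (third - second).
    long second = encoded[1 * npatterns + pattern];
    long third = encoded[2 * npatterns + pattern];
    out[i] = third + static_cast<long>(count - 2) * (third - second);
  }
  return out;
}
```

With eight elements, the six encoded values {0, 8, 1, 9, 2, 10} expand to {0, 8, 1, 9, 2, 10, 3, 11}. An odd count such as 7 has no valid expansion, which is analogous to the case the new check rejects.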



[PATCH v5 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-04-26 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 79 +
 1 file changed, 79 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 392f5d02e17..ecd98680d64 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -176,6 +178,46 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)
+{
+  if (!TARGET_VECTOR)
+return word_mode;
+
+  switch (mode)
+{
+case E_QImode:
+  return VNx8QImode;
+  break;
+case E_HImode:
+  return VNx4HImode;
+  break;
+case E_SImode:
+  return VNx2SImode;
+  break;
+case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+   return VNx1DImode;
+  break;
+case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+   return VNx2SFmode;
+  break;
+case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+   return VNx1DFmode;
+  break;
+default:
+  break;
+}
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -421,6 +463,43 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV mask mode.  */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)
+{
+  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
+ || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
+ || mode == VNx64BImode);
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE for RVV.  */
+
+opt_machine_mode
+riscv_vector_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode;
+  int nf = 1;
+
+  FOR_EACH_MODE_IN_CLASS (mask_mode, MODE_VECTOR_BOOL)
+  if (GET_MODE_INNER (mask_mode) == BImode
+  && known_eq (GET_MODE_NUNITS (mask_mode) * nf, GET_MODE_NUNITS (mode))
+  && riscv_vector_mask_mode_p (mask_mode))
+return mask_mode;
+  return default_get_mask_mode (mode);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
This function is not only used by builtins, but also will be used by
auto-vectorization in the future.  */
-- 
2.34.1
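
Read as a table, the integer cases in riscv_vector_preferred_simd_mode each pick a mode carrying 64 bits of elements per VNx unit (8 x 8, 4 x 16, 2 x 32, 1 x 64), and riscv_vector_get_mask_mode looks for the BI mode with the same number of units as the data mode. Here is a rough sketch of that relationship using mode names as plain strings. The 64-bit figure is inferred from the table rather than stated in the patch, and `toy_units`, `toy_data_mode` and `toy_mask_mode` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Elements per "VNx" unit, assuming each preferred integer mode carries
// 64 bits of elements (8 x 8, 4 x 16, 2 x 32, 1 x 64 in the patch's table).
unsigned toy_units(unsigned elem_bits) { return 64 / elem_bits; }

// Hypothetical: spell the preferred data mode for an integer element width.
std::string toy_data_mode(unsigned elem_bits) {
  const char *suffix = "";
  switch (elem_bits) {
    case 8:  suffix = "QI"; break;
    case 16: suffix = "HI"; break;
    case 32: suffix = "SI"; break;
    case 64: suffix = "DI"; break;
    default: return "";  // not an integer width handled by the table
  }
  return "VNx" + std::to_string(toy_units(elem_bits)) + suffix;
}

// Hypothetical: the mask mode with the same number of units as the data mode.
std::string toy_mask_mode(unsigned elem_bits) {
  if (toy_data_mode(elem_bits).empty())
    return "";
  return "VNx" + std::to_string(toy_units(elem_bits)) + "BI";
}
```

The sketch ignores the ELEN and floating-point conditions that the real switch applies to the DI, SF and DF cases.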



[PATCH v5 10/10] RISC-V: autovec: This patch supports 8 bit auto-vectorization in riscv.

2023-04-26 Thread Michael Collison
From: Kevin Lee 

2023-04-14 Kevin Lee 
gcc/testsuite/ChangeLog:

* config/riscv/riscv.cc (riscv_autovectorize_vector_modes): Add
new vector mode
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: Support 8bit
type
* gcc.target/riscv/rvv/autovec/loop-add.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor.c: Ditto
---
 gcc/config/riscv/riscv.cc | 1 +
 .../gcc.target/riscv/rvv/autovec/loop-add-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-and-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-div-rv32.c  | 8 +---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/loop-max-rv32.c  | 7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c | 7 ---
 .../gcc.target/riscv/rvv/autovec/loop-min-rv32.c  | 7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c | 7 ---
 .../gcc.target/riscv/rvv/autovec/loop-mod-rv32.c  | 8 +---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/loop-mul-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c  | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-sub-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-xor-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c | 5 +++--
 21 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 77209b161f6..f293414acd1 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,7 @@ riscv_autovectorize_vector_modes (vector_modes *modes, bool)
   modes->safe_push (VNx8QImode);
   modes->safe_push (VNx4QImode);
   modes->safe_push (VNx2QImode);
+  modes->safe_push (VNx1QImode);
 }
 
   return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
index bdc3b6892e9..76f5a3a3ff5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
index d7f992c7d27..3d1e10bf4e1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
index eb1ac5b44fd..a4c7abfb0ad 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
+++ b/gcc/te

[PATCH v5 02/10] RISC-V: autovec: Export policy functions to global scope

2023-04-26 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 01cea23d3e6..1ed9e4acc40 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2493,7 +2493,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2503,7 +2503,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8ffb9d33e33..de3fd6ca290 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -483,6 +483,9 @@ extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
 extern function_instance get_read_vl_instance (void);
 extern tree get_read_vl_decl (void);
 
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
+
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
 {
-- 
2.34.1
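
For readers who do not have the builtins file open, the two exported functions can be summarized by the conditions visible in the hunks above: the tail policy is "undisturbed" for the tu, tum and tumu predications, and the mask policy is "undisturbed" for tumu and mu; everything else gets the preferred default. Below is a small standalone model of that decision. The `Pred` and `Policy` enums and the `toy_*` functions are hypothetical stand-ins, not the GCC types.

```cpp
#include <cassert>

// Hypothetical stand-ins for the predication kinds named in the patch.
enum class Pred { none, tu, tum, tumu, mu };
// Hypothetical policy values: "undisturbed" or the preferred default.
enum class Policy { undisturbed, preferred_default };

// Tail policy: undisturbed only for the tu, tum and tumu predications.
Policy toy_tail_policy(Pred pred) {
  if (pred == Pred::tu || pred == Pred::tum || pred == Pred::tumu)
    return Policy::undisturbed;
  return Policy::preferred_default;
}

// Mask policy: undisturbed only for the tumu and mu predications.
Policy toy_mask_policy(Pred pred) {
  if (pred == Pred::tumu || pred == Pred::mu)
    return Policy::undisturbed;
  return Policy::preferred_default;
}
```

The two functions disagree for tu, tum and mu, so callers need to pick the one that matches the operand they are filling in.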



[PATCH v5 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-26 Thread Michael Collison
While working on autovectorizing for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d73deaecce0..a64fe454e19 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
(GET_MODE_BITSIZE (int_mode), 1);
  tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+2, &half_nelts))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
  poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
  vec_perm_builder sel1 (nelts, 2, 3);
  vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
  for (unsigned int i = 0; i < 3; ++i)
{
  sel1.quick_push (i);
-- 
2.34.1
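
The difference between exact_div and the checked multiple_p form is easiest to see on a toy polynomial value c0 + c1 * x, where x is only known at run time. The value is a multiple of two for every x only when both coefficients are. The sketch below is a standalone model, not GCC's poly_int; `ToyPoly` and `toy_multiple_p` are hypothetical names, and the coefficients used for the two modes in the usage note are assumed for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a two-coefficient polynomial size: value = c0 + c1 * x,
// where x is a runtime quantity that is only known to be >= 0.
struct ToyPoly {
  std::uint64_t c0;
  std::uint64_t c1;
};

// True only if the value is a multiple of `factor` for every possible x,
// i.e. when both coefficients are. On success, *quotient = value / factor.
bool toy_multiple_p(ToyPoly value, std::uint64_t factor, ToyPoly *quotient) {
  if (factor == 0 || value.c0 % factor != 0 || value.c1 % factor != 0)
    return false;
  *quotient = {value.c0 / factor, value.c1 / factor};
  return true;
}
```

Taking {1, 1} for a VNx1DImode-like mode, the check fails and the interleave path is skipped; taking {2, 2} for a VNx2SImode-like mode, it succeeds and the half count {1, 1} comes back through the output argument, which is how half_nelts is obtained in the patched condition.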



[PATCH v5 04/10] RISC-V:autovec: Add target vectorization hooks

2023-04-26 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
---
 gcc/config/riscv/riscv.cc | 129 ++
 1 file changed, 129 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dc47434fac4..77209b161f6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -275,6 +284,9 @@ poly_uint16 riscv_vector_chunks;
 /* The number of bytes in a vector chunk.  */
 unsigned riscv_bytes_per_vector_chunk;
 
+/* Prefer vf for auto-vectorizer.  */
+unsigned riscv_vectorization_factor;
+
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
@@ -6363,6 +6375,9 @@ riscv_option_override (void)
 
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
+
+  if (TARGET_VECTOR)
+riscv_vectorization_factor = RVV_LMUL1;
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -7057,6 +7072,105 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 65536 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  if (TARGET_VECTOR)
+return riscv_vector::riscv_vector_preferred_simd_mode (mode);
+
+  return word_mode;
+}
+
+/* Implement TARGET_AUTOVECTORIZE_VECTOR_MODES for RVV.  */
+static unsigned int
+riscv_autovectorize_vector_modes (vector_modes *modes, bool)
+{
+  if (!TARGET_VECTOR)
+return 0;
+
+  if (riscv_vectorization_factor == RVV_LMUL1)
+{
+  modes->safe_push (VNx16QImode);
+  modes->safe_push (VNx8QImode);
+  modes->saf
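
The arithmetic in riscv_estimated_poly_value above can be modelled outside GCC. The sketch below mirrors the hunk as posted: with no known width, the minimum and likely estimates use only the constant coefficient and the maximum adds 15 times the variable coefficient; with a known width (possibly a bitmask of widths), the lowest or highest set bit is scaled relative to 128 bits. Representing "scalable" as a width of 0 is an assumption of this sketch, and all `toy_*` names and the `Kind` enum are hypothetical.

```cpp
#include <cassert>

enum class Kind { min, max, likely };

// Largest power of two that is <= w (w must be non-zero).
unsigned toy_highest_bit(unsigned w) {
  unsigned bit = 1;
  while (w >>= 1)
    bit <<= 1;
  return bit;
}

// Lowest set bit of w.
unsigned toy_lowest_bit(unsigned w) { return w & (~w + 1u); }

// Estimate c0 + c1 * x. `known_bits` is the vector width in bits, or 0 when
// the width is scalable and nothing core-specific is known.
long toy_estimate(long c0, long c1, Kind kind, unsigned known_bits) {
  if (known_bits == 0)
    return kind == Kind::max ? c0 + c1 * 15 : c0;

  // A bitmask of widths: the maximum uses the highest bit, the other
  // estimates treat the lowest bit as the likely width.
  unsigned width = kind == Kind::max ? toy_highest_bit(known_bits)
                                     : toy_lowest_bit(known_bits);
  long over_128 = static_cast<long>(width) - 128;
  return c0 + c1 * over_128 / 128;
}
```

For a value 2 + 2x, the scalable case gives 2 for the likely estimate and 32 for the maximum, while a known 256-bit width gives 4 for both.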

Re: [PATCH v4 05/10] RISC-V: autovec: Add autovectorization patterns for binary integer operations

2023-04-26 Thread Michael Collison

Hi Robin and Juzhe,

Just took a look and I like the approach.

On 4/26/23 19:43, juzhe.zhong wrote:

Yeah, Robin's stuff is what I want and makes perfect sense to me.
---- Replied Message ----
From: Robin Dapp
Date: 04/27/2023 02:15
To: juzhe.zh...@rivai.ai, collison, gcc-patches
Cc: jeffreyalaw, Kito.cheng, kito.cheng, palmer, palmer
Subject: Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations


Hi Michael,

I have the diff below for the binops in my tree locally.
Maybe something like this works for you? Untested but compiles and
the expander helpers would need to be fortified obviously.

Regards
Robin

--

gcc/ChangeLog:

   * config/riscv/autovec.md (3): New binops expander.
   * config/riscv/riscv-protos.h (emit_nonvlmax_binop): Define.
   * config/riscv/riscv-v.cc (emit_pred_binop): New function.
   (emit_nonvlmax_binop): New function.
   * config/riscv/vector-iterators.md: New iterator.
---
gcc/config/riscv/autovec.md  | 12 
gcc/config/riscv/riscv-protos.h  |  1 +
gcc/config/riscv/riscv-v.cc  | 89 
gcc/config/riscv/vector-iterators.md | 20 +++
4 files changed, 97 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5d46ff57ab..c21d241f426 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -47,3 +47,15 @@ (define_expand "len_store_"
                 operands[1], operands[2], mode);
  DONE;
})
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+    (any_int_binop:VI (match_operand:VI 1 "register_operand")
+              (match_operand:VI 2 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_binop (code_for_pred (, mode),
+                 operands[0], operands[1], operands[2],
+                 gen_reg_rtx (Pmode), mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f6ea6846736..5cca543c773 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -163,6 +163,7 @@ void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_nonvlmax_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode);
enum vlmul_type get_vlmul (machine_mode);
unsigned int get_ratio (machine_mode);
int get_ta (rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5e69427ac54..98ebc052340 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -52,7 +52,7 @@ namespace riscv_vector {
template <int MAX_OPERANDS> class insn_expander
{
public:
-  insn_expander () : m_opno (0) {}
+  insn_expander () : m_opno (0), has_dest(false) {}
  void add_output_operand (rtx x, machine_mode mode)
  {
create_output_operand (&m_ops[m_opno++], x, mode);
@@ -83,6 +83,44 @@ public:
add_input_operand (gen_int_mode (type, Pmode), Pmode);
  }

+  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+  {
+    dest_mode = GET_MODE (dest);
+    has_dest = true;
+
+    add_output_operand (dest, dest_mode);
+
+    if (mask)
+  add_input_operand (mask, GET_MODE (mask));
+    else
+  add_all_one_mask_operand (mask_mode);
+
+    add_vundef_operand (dest_mode);
+  }
+
+  void set_len_and_policy (rtx len, bool vlmax_p)
+    {
+  gcc_assert (has_dest);
+  gcc_assert (len || vlmax_p);
+
+  if (len)
+    add_input_operand (len, Pmode);
+  else
+    {
+      rtx vlmax = gen_reg_rtx (Pmode);
+      emit_vlmax_vsetvl (dest_mode, vlmax);
+      add_input_operand (vlmax, Pmode);
+    }
+
+  if (GET_MODE_CLASS (dest_mode) != MODE_VECTOR_BOOL)
+    add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+
+  if (vlmax_p)
+    add_avl_type_operand (avl_type::VLMAX);
+  else
+    add_avl_type_operand (avl_type::NONVLMAX);
+    }
+
  void expand (enum insn_code icode, bool temporary_volatile_p = false)
  {
if (temporary_volatile_p)
@@ -96,6 +134,8 @@ public:

private:
  int m_opno;
+  bool has_dest;
+  machine_mode dest_mode;
  expand_operand m_ops[MAX_OPERANDS];
};

@@ -183,37 +223,29 @@ emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
     machine_mode mask_mode, bool vlmax_p)
{
  insn_expander<8> e;
-  machine_mode mode = GET_MODE (dest);
+  e.set_dest_and_mask (mask, dest, mask_mode);

-  e.add_output_operand (dest, mode);
-
-  if (mask)
-    e.add_input_operand (mask, GET_MODE (mask));
-  else
-    e.add_all_one_mask_operand (mask_mode);
+  e.add_inpu

Re: [PATCH v5 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-05-03 Thread Michael Collison

Hi Kito,

I see there have been many comments on the 
"riscv_vector_preferred_simd_mode" hook, is there an updated version?


On 5/3/23 06:53, Kito Cheng wrote:

@@ -176,6 +178,46 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
return ratio;
  }

+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)

JuZhe's patch[1] has been implemented and his version handles
types/modes in the right way IMO,
so I would like to take his version for this hook.

[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230419164214.1032017-3-juzhe.zh...@rivai.ai/


[PATCH v6 0/9] RISC-V: autovec: Add autovec support

2023-05-05 Thread Michael Collison
This series of patches adds foundational support for RISC-V auto-vectorization. 
These patches are based on the current upstream RVV vector intrinsic support 
and are not a new implementation. Most of the implementation consists of 
adding the new vector cost model, the autovectorization patterns themselves and 
target hooks. This implementation only provides support for integer addition 
and subtraction as a proof of concept. This patch set should not be construed 
to be feature complete. Based on conversations with the community these patches 
are intended to lay the groundwork for feature completion and collaboration 
within the RISC-V community.
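
For orientation, the loops exercised by the autovec tests are simple element-wise operations instantiated per type through a macro. Here is a minimal sketch of that shape, assuming a function name of the form vadd_<TYPE> and a four-argument signature (both are guesses made for this sketch; only the loop body `dst[i] = a[i] + b[i]` and the TEST_TYPE/TEST_ALL macro names appear in the posted tests):

```cpp
#include <cassert>
#include <stdint.h>

// Element-wise addition over n elements, one function per element type.
// The name vadd_<TYPE> and the parameter list are assumptions.
#define TEST_TYPE(TYPE)                                      \
  void vadd_##TYPE(TYPE *dst, const TYPE *a, const TYPE *b,  \
                   int n) {                                  \
    for (int i = 0; i < n; i++)                              \
      dst[i] = a[i] + b[i];                                  \
  }

// Instantiate the loop for a few integer types.
#define TEST_ALL()    \
  TEST_TYPE(int16_t)  \
  TEST_TYPE(uint16_t) \
  TEST_TYPE(int32_t)  \
  TEST_TYPE(uint32_t)

TEST_ALL()
```

The posted tests compile such loops with options like -O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d and then count vadd.vv instructions in the assembly, one per instantiated type.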

These patches are largely based off the work of Juzhe Zhong 
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, 
the rvv-next branch at https://github.com/riscv-collab/riscv-gcc.git 
is the foundation of this patch set. 

As discussed on this list, if these patches are approved they will be merged 
into an "auto-vectorization" branch once gcc-13 branches for release. There are 
two known issues related to crashes (assert failures) associated with tree 
vectorization; one of which I have sent a patch for and have received feedback. 

Changes in v6:
- Incorporated upstream comments, added target hook for 
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions 
- Fixed ChangeLog email formatting 
- Fixed gnu formatting issues in the code 

Kevin Lee (1):
  RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer &
len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/riscv-opts.h |  10 ++
 gcc/config/riscv/riscv-protos.h   |   9 ++
 gcc/config/riscv/riscv-v.cc   |  91 
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/riscv.cc | 130 ++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-auto.md   |  74 ++
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c  |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   4 +
 gcc/tree-vect-slp.cc  |   7 +-
 31 files changed, 843 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 cr

[PATCH v6 1/9] RISC-V: autovec: Add new predicates and function prototypes

2023-05-05 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/riscv-opts.h (riscv_vector_lmul_enum): New enum.
---
 gcc/config/riscv/riscv-opts.h   | 10 ++
 gcc/config/riscv/riscv-protos.h |  9 +
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 4207db240ea..00c4ab222ae 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,7 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+
 /* RISC-V auto-vectorization preference.  */
 enum riscv_autovec_preference_enum {
   NO_AUTOVEC,
@@ -82,6 +83,15 @@ enum riscv_autovec_lmul_enum {
   RVV_M8 = 8
 };
 
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 33eb574aadc..fb39b856735 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -243,4 +243,13 @@ th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
 extern bool riscv_use_divmod_expander (void);
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
-- 
2.34.1



[PATCH v6 5/9] RISC-V:autovec: Add autovectorization patterns for binary integer & len_load/store

2023-05-05 Thread Michael Collison
2023-04-25  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/vector-auto.md | 74 +
 gcc/config/riscv/vector.md  |  4 +-
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c508ee3ad89..e9b49eda617 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -140,6 +140,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..83d2ab6957a
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,74 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization support.
+;; We will replace them when len_maskload/len_maskstore is supported in loop vectorizer.
+(define_expand "len_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+(define_expand "len_store_"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+;; -
+;;  [INT] Vector binary patterns
+;; -
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+   (any_int_binop:VI (match_operand:VI 1 "")
+ (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred ();
+  rtx tail_policy = get_tail_policy_no_pred ();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx (NONVLMAX);
+
+  emit_insn (gen_pred_ (operands[0], mask, merge, operands[1], operands[2],
+             vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 1642822d098..5c9252c281b 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
 ;; - Auto-vectorization (TBD)
 ;; - Combine optimization (TBD)
 
-(include "vector-iterators.md")
-
 (define_constants [
   (INVALID_ATTRIBUTE 255)
(X0_REGNUM  0)
@@ -368,6 +366,8 @@
   (symbol_ref "INTVAL (operands[4])")]
(const_int INVALID_ATTRIBUTE)))
 
+(include "vector-auto.md")
+
 ;; -
 ;;  Miscellaneous Operations
 ;; -
-- 
2.34.1



[PATCH v6 2/9] RISC-V: autovec: Export policy functions to global scope

2023-05-05 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 434bd8e157b..f0ebc095fa7 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2496,7 +2496,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2506,7 +2506,7 @@ get_mask_policy_for_pred (enum predication_type_index pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8ffb9d33e33..de3fd6ca290 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -483,6 +483,9 @@ extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
 extern function_instance get_read_vl_instance (void);
 extern tree get_read_vl_decl (void);
 
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
+
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
 {
-- 
2.34.1
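
For readers following the series, the selection logic of the two exported
functions can be sketched outside of GCC. This is only an illustration under
stated assumptions: the Pred and Policy enums and the preferred_default
parameter are hypothetical stand-ins for predication_type_index and the
"prefer default configuration" mentioned in the comments; the real functions
return an rtx.

```cpp
#include <cassert>

// Hypothetical stand-ins for the PRED_TYPE_* values tested in the patch.
enum class Pred { None, Tu, Mu, Tum, Tumu };

// Hypothetical policy values; the real functions return an rtx constant.
enum class Policy { Agnostic, Undisturbed };

// Tail policy: TU, TUM and TUMU request tail-undisturbed; anything else
// falls back to the caller-supplied preferred default (assumption).
Policy tail_policy_for (Pred pred, Policy preferred_default)
{
  if (pred == Pred::Tu || pred == Pred::Tum || pred == Pred::Tumu)
    return Policy::Undisturbed;
  return preferred_default;
}

// Mask policy: only TUMU and MU request mask-undisturbed.
Policy mask_policy_for (Pred pred, Policy preferred_default)
{
  if (pred == Pred::Tumu || pred == Pred::Mu)
    return Policy::Undisturbed;
  return preferred_default;
}
```

Note that a predication such as TUM affects only the tail policy, which is why
the two functions test different sets of values.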



[PATCH v6 3/9] RISC-V:autovec: Add auto-vectorization support functions

2023-05-05 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 91 +
 1 file changed, 91 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 99c414cc910..7faffb55046 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -176,6 +178,56 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* SCALABLE means that the vector-length is agnostic (run-time invariant and
+   compile-time unknown). FIXED means that the vector-length is specific
+   (compile-time known). Both RVV_SCALABLE and RVV_FIXED_VLMAX are doing
+   auto-vectorization using VLMAX vsetvl configuration.  */
+static bool
+autovec_use_vlmax_p (void)
+{
+  return riscv_autovec_preference == RVV_SCALABLE
+|| riscv_autovec_preference == RVV_FIXED_VLMAX;
+}
+
+/* Return the vectorization machine mode for RVV according to LMUL.  */
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)
+{
+  /* We only enable auto-vectorization when TARGET_MIN_VLEN >= 128 &&
+ riscv_autovec_lmul < RVV_M2, since the GCC loop vectorizer reports an ICE
+ in 'can_duplicate_and_interleave_p' of tree-vect-slp.cc when we enable
+ -march=rv64gc_zve32* and -march=rv32gc_zve64*. Since we have
+ VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
+ enabled in targetm.vector_mode_supported_p and the SLP vectorizer will try
+ to use them. Currently, we can support auto-vectorization in
+ -march=rv32_zve32x_zvl128b, whereas -march=rv32_zve32x_zvl32b and
+ -march=rv32_zve32x_zvl64b are disabled.
+ */
+  if (autovec_use_vlmax_p ())
+{
+  /* If TARGET_MIN_VLEN < 128, we don't allow LMUL < 2
+auto-vectorization since Loop Vectorizer may use VNx1SImode or
+VNx1DImode to vectorize which will create ICE in the
+'can_duplicate_and_interleave_p' of tree-vect-slp.cc.  */
+  if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
+   return word_mode;
+  /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
+riscv_autovec_lmul as multiply factor to calculate the NUNITS to
+get the auto-vectorization mode.  */
+  poly_uint64 nunits;
+  poly_uint64 vector_size
+   = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+  poly_uint64 scalar_size = GET_MODE_SIZE (mode);
+  gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
+  machine_mode rvv_mode;
+  if (get_vector_mode (mode, nunits).exists (&rvv_mode))
+   return rvv_mode;
+}
+  /* TODO: We will support minimum length VLS auto-vectorization in the future.
+   */
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -430,6 +482,45 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+/* Return the mask policy for no predication.  */
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return the tail policy for no predication.  */
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV mask mode.  */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)
+{
+  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
+ || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
+ || mode == VNx64BImode);
+}
+
+/* Return the appropriate mask mode for MODE.  */
+
+opt_machine_mode
+riscv_vector_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode;
+  int nf = 1;
+
+  FOR_EACH_MODE_IN_CLASS (mask_mode, MODE_VECTOR_BOOL)
+  if (GET_MODE_INNER (mask_mode) == BImode
+  && known_eq (GET_MODE_NUNITS (mask_mode) * nf, GET_MODE_NUNITS (mode))
+  && riscv_vector_mask_mode_p (mask_mode))
+return mask_mode;
+  return default_get_mask_mode (mode);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
This function is not only used by builtins, but also will be used by
auto-vectorization in the future.  */
-- 
2.34.1
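
As an illustration of the NUNITS computation in
riscv_vector_preferred_simd_mode, here is a minimal sketch using plain
integers instead of poly_uint64. The constant kBytesPerVector and the function
name preferred_nunits are hypothetical; the sketch assumes a fixed 128-bit
vector register, whereas the patch works with BYTES_PER_RISCV_VECTOR, which
may be a runtime-unknown polynomial.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BYTES_PER_RISCV_VECTOR with VLEN fixed at
// 128 bits (16 bytes). In GCC this can be a poly_uint64.
constexpr std::uint64_t kBytesPerVector = 16;

// Compute the element count of the preferred vector mode for a scalar of
// SCALAR_BYTES bytes at the given LMUL. Returns false when the scalar size
// does not evenly divide the vector size, mirroring the multiple_p check.
bool preferred_nunits (std::uint64_t scalar_bytes, std::uint64_t lmul,
                       std::uint64_t *nunits)
{
  if (scalar_bytes == 0)
    return false;
  std::uint64_t vector_bytes = kBytesPerVector * lmul;
  if (vector_bytes % scalar_bytes != 0)
    return false;
  *nunits = vector_bytes / scalar_bytes;
  return true;
}
```

With these assumptions a 4-byte element at LMUL = 1 gives 4 elements per
vector, and an 8-byte element at LMUL = 2 also gives 4.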



[PATCH v6 4/9] RISC-V:autovec: Add target vectorization hooks

2023-05-05 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(riscv_support_vector_misalignment): Implement
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto.
---
 gcc/config/riscv/riscv.cc | 130 ++
 1 file changed, 130 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1e328f6a801..1425f50d80a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7138,6 +7147,112 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 65536 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  if (TARGET_VECTOR)
+return riscv_vector::riscv_vector_preferred_simd_mode (mode);
+
+  return word_mode;
+}
+
+bool
+riscv_support_vector_misalignment (machine_mode mode,
+  const_tree type ATTRIBUTE_UNUSED,
+  int misalignment,
+  bool is_packed ATTRIBUTE_UNUSED)
+{
+  if (TARGET_VECTOR)
+{
+  if (STRICT_ALIGNMENT)
+   {
+ /* Return if movmisalign pattern is not supported for this mode.  */
+ if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
+   return false;
+
+ /* Misalignment factor is unknown at compile time.  */
+ if (misalignment == -1)
+   return false;
+   }
+  return true;
+}
+
+  return default_builtin_support_vector_misalignment (mode, type, misalignment,
+ is_packed);
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
+
+static opt_machine_mode
+riscv_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mo
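
The arithmetic in riscv_estimated_poly_value earlier in this patch can be
worked through with a small standalone sketch. It is a simplification under
stated assumptions: Poly, EstimateKind and estimate are hypothetical names, a
width of 0 stands in for the "scalable, no core-specific width" case, and the
bitmask handling (floor_log2 / least_bit_hwi) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for poly_value_estimate_kind.
enum class EstimateKind { Min, Max, Likely };

// A degree-1 polynomial coeff0 + coeff1 * x, where x is unknown at compile
// time; a simplified stand-in for poly_int64.
struct Poly
{
  std::int64_t coeff0;
  std::int64_t coeff1;
};

// width_bits == 0 models the scalable case (assumption of this sketch).
std::int64_t estimate (Poly val, EstimateKind kind, unsigned width_bits)
{
  if (width_bits == 0)
    {
      // Min and likely use the constant term only; max adds 15 extra chunks,
      // matching the "* 15" in the patch.
      if (kind == EstimateKind::Max)
        return val.coeff0 + val.coeff1 * 15;
      return val.coeff0;
    }

  // Known width: scale the variable term by the number of 128-bit chunks
  // beyond the first one.
  std::int64_t over_128 = static_cast<std::int64_t> (width_bits) - 128;
  return val.coeff0 + val.coeff1 * over_128 / 128;
}
```

For a value 4 + 4x, the scalable likely estimate is 4, the scalable maximum is
4 + 4 * 15 = 64, and a known 256-bit width gives 4 + 4 * 128 / 128 = 8.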

[PATCH v6 8/9] RISC-V:autovec: Add autovectorization tests for binary integer

2023-05-05 Thread Michael Collison
2023-04-05  Michael Collison  

* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: New
test to verify code generation of vector "and" on rv32.
* gcc.target/riscv/rvv/autovec/loop-and.c: New
test to verify code generation of vector "and" on rv64.
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: New
test to verify code generation of vector divide on rv32.
* gcc.target/riscv/rvv/autovec/loop-div.c: New
test to verify code generation of vector divide on rv64.
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: New
test to verify code generation of vector maximum on rv32.
* gcc.target/riscv/rvv/autovec/loop-max.c: New
test to verify code generation of vector maximum on rv64.
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: New
test to verify code generation of vector minimum on rv32.
* gcc.target/riscv/rvv/autovec/loop-min.c: New
test to verify code generation of vector minimum on rv64.
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: New
test to verify code generation of vector modulus on rv32.
* gcc.target/riscv/rvv/autovec/loop-mod.c: New
test to verify code generation of vector modulus on rv64.
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: New
test to verify code generation of vector multiply on rv32.
* gcc.target/riscv/rvv/autovec/loop-mul.c: New
test to verify code generation of vector multiply on rv64.
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: New
test to verify code generation of vector "or" on rv32.
* gcc.target/riscv/rvv/autovec/loop-or.c: New
test to verify code generation of vector "or" on rv64.
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: New
test to verify code generation of vector xor on rv32.
* gcc.target/riscv/rvv/autovec/loop-xor.c: New
test to verify code generation of vector xor on rv64.
---
 .../riscv/rvv/autovec/loop-and-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   | 24 ++
 .../riscv/rvv/autovec/loop-div-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   | 25 +++
 .../riscv/rvv/autovec/loop-max-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   | 25 +++
 .../riscv/rvv/autovec/loop-min-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   | 25 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   | 25 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   | 24 ++
 .../riscv/rvv/autovec/loop-or-rv32.c  | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-or.c| 24 ++
 .../riscv/rvv/autovec/loop-xor-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 +++
 17 files changed, 396 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
new file mode 100644
index 000..eb1ac5b44fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  vo

[PATCH v6 7/9] RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-05-05 Thread Michael Collison
While working on autovectorizing for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode)
where GET_MODE_NUNITS is equal to one.

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b299e209b5b..3b7a21724ec 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
(GET_MODE_BITSIZE (int_mode), 1);
  tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+2, &half_nelts))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
  poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
  vec_perm_builder sel1 (nelts, 2, 3);
  vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
  for (unsigned int i = 0; i < 3; ++i)
{
  sel1.quick_push (i);
-- 
2.34.1
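
To illustrate why the patch replaces exact_div with a multiple_p guard, here
is a minimal sketch on plain integers. The helper names multiple_of and
can_interleave are hypothetical; GCC's multiple_p operates on poly_int values,
and this sketch only mirrors its "return false or store the quotient" shape.

```cpp
#include <cassert>
#include <cstdint>

// Integer analogue of multiple_p (a, b, &quotient): succeeds and stores the
// quotient only when A is an exact multiple of B.
bool multiple_of (std::uint64_t a, std::uint64_t b, std::uint64_t *quotient)
{
  if (b == 0 || a % b != 0)
    return false;
  *quotient = a / b;
  return true;
}

// A mode with NELTS elements can only be split into two halves when NELTS
// is even. A single-element mode (such as VNx1DImode in the description
// above) is rejected here instead of reaching an exact division by two.
bool can_interleave (std::uint64_t nelts, std::uint64_t *half_nelts)
{
  return multiple_of (nelts, 2, half_nelts);
}
```

Checking divisibility as part of the condition lets the caller skip the
unsuitable mode and try the next candidate, rather than asserting later.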



[PATCH v6 9/9] RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.

2023-05-05 Thread Michael Collison
From: Kevin Lee 

2023-04-14 Kevin Lee 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: Support 8bit
type
* gcc.target/riscv/rvv/autovec/loop-add.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor.c: Ditto
---
 .../gcc.target/riscv/rvv/autovec/loop-add-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-and-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-div-rv32.c   | 10 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c  | 10 ++
 .../gcc.target/riscv/rvv/autovec/loop-max-rv32.c   |  9 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c  |  9 +
 .../gcc.target/riscv/rvv/autovec/loop-min-rv32.c   |  9 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c  |  9 +
 .../gcc.target/riscv/rvv/autovec/loop-mod-rv32.c   | 10 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c  | 10 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-or-rv32.c|  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c   |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-sub-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-xor-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c  |  7 ---
 20 files changed, 92 insertions(+), 68 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
index bdc3b6892e9..d2765e67d0d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -mno-strict-align" } */
 
 #include 
 
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
index d7f992c7d27..c43f6d3e8cb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -mno-strict-align" } */
 
 #include 
 
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
index eb1ac5b44fd..703f4843c2b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
+++ b/gcc/testsu

[PATCH v6 6/9] RISC-V:autovec: Add autovectorization tests for add & sub

2023-05-05 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)
