Passed both the X86 bootstrap and the regression tests.

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Friday, April 28, 2023 2:45 PM
To: Kito Cheng <kito.ch...@sifive.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
<yanzhang.w...@intel.com>
Subject: RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

Thanks, Kito.

Yes, you are right. I am investigating this right now from the RTL
simplification side, given that we had a similar case with VMORN previously.

Pan

-----Original Message-----
From: Kito Cheng <kito.ch...@sifive.com>
Sent: Friday, April 28, 2023 2:41 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
<yanzhang.w...@intel.com>
Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

LGTM

I thought it could optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl) too, but I
don't know why it does not evaluate

        (eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
           (reg/v:VNx128QI 137 [ v1 ]))

to true. Anyway, I guess that should be your next step to investigate :)
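
For reference, the expectation above can be checked with a scalar model: a
compare of a value against itself is always false for ne/lt/ltu/gt/gtu (the
VMCLR candidates in this patch) and always true for eq/le/leu/ge/geu. The
sketch below is only an illustration under that assumption; the enum and the
function names are hypothetical and are not GCC or RVV identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not GCC or RVV identifiers. */
enum cmp_op { CMP_EQ, CMP_NE, CMP_LT, CMP_LTU, CMP_LE,
              CMP_LEU, CMP_GT, CMP_GTU, CMP_GE, CMP_GEU };

/* Scalar model of one 8-bit element of an integer compare. */
bool compare(enum cmp_op op, int8_t a, int8_t b)
{
  uint8_t ua = (uint8_t) a, ub = (uint8_t) b;
  switch (op) {
    case CMP_EQ:  return a == b;
    case CMP_NE:  return a != b;
    case CMP_LT:  return a < b;
    case CMP_LTU: return ua < ub;
    case CMP_LE:  return a <= b;
    case CMP_LEU: return ua <= ub;
    case CMP_GT:  return a > b;
    case CMP_GTU: return ua > ub;
    case CMP_GE:  return a >= b;
    case CMP_GEU: return ua >= ub;
  }
  return false;
}

/* True when op(x, x) is false for every 8-bit x (an all-zero mask). */
bool always_false_on_self(enum cmp_op op)
{
  for (int v = INT8_MIN; v <= INT8_MAX; v++)
    if (compare(op, (int8_t) v, (int8_t) v))
      return false;
  return true;
}

/* True when op(x, x) is true for every 8-bit x (an all-one mask). */
bool always_true_on_self(enum cmp_op op)
{
  for (int v = INT8_MIN; v <= INT8_MAX; v++)
    if (!compare(op, (int8_t) v, (int8_t) v))
      return false;
  return true;
}

/* Exhaustive check over all ten operators. */
void check_reflexive_model(void)
{
  const enum cmp_op to_zero[] = { CMP_NE, CMP_LT, CMP_LTU, CMP_GT, CMP_GTU };
  const enum cmp_op to_one[]  = { CMP_EQ, CMP_LE, CMP_LEU, CMP_GE, CMP_GEU };
  for (int i = 0; i < 5; i++) {
    assert(always_false_on_self(to_zero[i]));
    assert(always_true_on_self(to_one[i]));
  }
}
```

Calling check_reflexive_model() from a small main() exercises every 8-bit
value. The model says nothing about masked or tail-undisturbed forms, which
the patch does not handle either.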

On Fri, Apr 28, 2023 at 10:46 AM <pan2...@intel.com> wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> When some RVV integer compare operators act on the same vector
> register without a mask, they can be simplified to VMCLR.
>
> This patch allows ne, lt, ltu, gt, and gtu to perform this kind of
> simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24                    <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As above, we may have one instruction eliminated and require fewer
> vector registers.
>
> gcc/ChangeLog:
>
>         * config/riscv/vector.md: Add new define split to perform
>           the simplification.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> Co-authored-by: kito-cheng <kito.ch...@sifive.com>
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++++++++++++++++++
>  2 files changed, 323 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
>    "vle<sew>ff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -----------------------------------------------------------------------------
> +;; ---- Integer Compare Instructions Simplification
> +;; -----------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -----------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB      0 "register_operand")
> +       (if_then_else:VB
> +         (unspec:VB
> +           [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +            (match_operand    4 "vector_length_operand")
> +            (match_operand    5 "const_int_operand")
> +            (match_operand    6 "const_int_operand")
> +            (reg:SI VL_REGNUM)
> +            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +         (match_operand:VB    3 "vector_move_operand")
> +         (match_operand:VB    2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                            RVV_VUNDEF (<MODE>mode), operands[3],
> +                            operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 00000000000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
> --
> 2.34.1
>
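
The counts in the dg-final directives above follow from the shape of the test
file: ten compare operators, each exercised at seven LMUL settings (m8 down
to mf8). The five operators listed in the vector.md comment fold to vmclr.m,
and the other five keep their compare instruction. A minimal sketch of that
arithmetic, where the table and the function names are hypothetical and only
mirror what the patch states:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table for illustration; the flags mirror the list in the
   vector.md comment (VMSNE, VMSLT, VMSLTU, VMSGT, VMSGTU fold to vmclr.m). */
struct compare_case {
  const char *mnemonic;
  bool folds_to_vmclr;
};

static const struct compare_case compare_cases[] = {
  { "vmseq",  false }, { "vmsne",  true  },
  { "vmslt",  true  }, { "vmsltu", true  },
  { "vmsle",  false }, { "vmsleu", false },
  { "vmsgt",  true  }, { "vmsgtu", true  },
  { "vmsge",  false }, { "vmsgeu", false },
};

enum { LMUL_VARIANTS = 7 }; /* m8, m4, m2, m1, mf2, mf4, mf8 */

/* Expected number of vmclr.m instructions across the whole test file. */
int expected_vmclr_count(void)
{
  int count = 0;
  size_t n = sizeof compare_cases / sizeof compare_cases[0];
  for (size_t i = 0; i < n; i++)
    if (compare_cases[i].folds_to_vmclr)
      count += LMUL_VARIANTS;
  return count;
}

/* Expected number of surviving compare instructions for one mnemonic,
   or -1 when the mnemonic is not in the table. */
int expected_compare_count(const char *mnemonic)
{
  size_t n = sizeof compare_cases / sizeof compare_cases[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp(compare_cases[i].mnemonic, mnemonic) == 0)
      return compare_cases[i].folds_to_vmclr ? 0 : LMUL_VARIANTS;
  return -1;
}

/* The values the scan-assembler-times directives rely on. */
void check_expected_counts(void)
{
  assert(expected_vmclr_count() == 5 * LMUL_VARIANTS);
  assert(expected_compare_count("vmseq") == LMUL_VARIANTS);
  assert(expected_compare_count("vmslt") == 0);
}
```

If the always-true operators (eq, le, leu, ge, geu) are folded by a later
change, these expected counts would need to change with it.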
