https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743

--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kito Cheng <k...@gcc.gnu.org>:

https://gcc.gnu.org/g:c919d059fcb67747d3c0bd539c7044e874b03fb7

commit r14-789-gc919d059fcb67747d3c0bd539c7044e874b03fb7
Author: Kito Cheng <kito.ch...@sifive.com>
Date:   Fri May 12 10:26:06 2023 +0800

    RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743]

    Rebase to trunk and send V3 patch for:
    https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617821.html

    This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743.

    This issue happens because we are currently very conservative in the
    optimization of user vsetvli.

    Consider this following case:

    bb 1:
      vsetvli a5,a4... (demand AVL = a4).
    bb 2:
      RVV insn use a5 (demand AVL = a5).

    LCM will hoist vsetvl of bb 2 into bb 1.
    We don't do AVL propagation for this situation since it is complicated:
    we would have to analyze the code sequence between the vsetvli in bb 1
    and the RVV insn in bb 2, and they are not necessarily consecutive
    blocks.
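
    For example (a hypothetical variation on the testcase below, not code
    from the PR), the RVV insn's AVL vl could not be rewritten back to n
    if n is redefined somewhere in between:

    #include <stddef.h>
    #include <stdint.h>
    #include <riscv_vector.h>

    void
    bar (int32_t *a, int32_t *b, int n)
    {
      size_t vl = __riscv_vsetvl_e32m1 (n);  /* bb 1: demands AVL = n.  */
      n = n - 1;                  /* n is redefined between the blocks,  */
                                  /* so propagating n into the implicit  */
                                  /* vsetvl of the load would be wrong.  */
      vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);  /* bb 2: AVL = vl. */
      __riscv_vse32_v_i32m1 (b, v, vl);
    }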

    This patch does the optimization after LCM: we check the vsetvli on
    each LCM-inserted edge and eliminate it if it is redundant.  Such an
    approach is much simpler and safe.
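
    As a rough illustration of the check (a minimal standalone sketch, not
    the actual implementation in riscv-vsetvl.cc; the struct layout and
    helper name below are hypothetical), the vsetvli LCM inserted on the
    edge is redundant when the vsetvli reaching the end of the predecessor
    block already defines the register the inserted one reads as its AVL,
    and the two demand a compatible vtype:

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified model of what a vsetvli demands and defines.  */
    struct vsetvl_info
    {
      int avl_regno;  /* Register read as the requested AVL.  */
      int vl_regno;   /* Register the resulting VL is written to (-1 = x0).  */
      int vtype;      /* Encoded SEW/LMUL/tail/mask policy.  */
    };

    /* The inserted vsetvli is redundant if the predecessor's final vsetvli
       already produced its AVL and the vtypes are compatible.  (The real
       pass compares demanded fields only, so e.g. ta,mu vs. ta,ma can
       still be compatible when the mask policy is not demanded.)  */
    static bool
    inserted_vsetvl_redundant_p (const struct vsetvl_info *pred_end,
                                 const struct vsetvl_info *inserted)
    {
      return pred_end->vl_regno == inserted->avl_regno
             && pred_end->vtype == inserted->vtype;
    }

    int
    main (void)
    {
      /* Mirrors foo2 below: bb 1 ends with "vsetvli a5,a4,e32,m1" and LCM
         inserted "vsetvli zero,a5,e32,m1" on the edge into bb 2
         (a4 = x14, a5 = x15; the vtype encoding here is arbitrary).  */
      struct vsetvl_info pred_end = { 14, 15, 0x10 };
      struct vsetvl_info inserted = { 15, -1, 0x10 };
      printf ("redundant: %s\n",
              inserted_vsetvl_redundant_p (&pred_end, &inserted)
              ? "yes" : "no");
      return 0;
    }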

    code:
    void
    foo2 (int32_t *a, int32_t *b, int n)
    {
      if (n <= 0)
          return;
      int i = n;
      size_t vl = __riscv_vsetvl_e32m1 (i);

      for (; i >= 0; i--)
      {
        vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
        __riscv_vse32_v_i32m1 (b, v, vl);

        if (i >= vl)
          continue;

        if (i == 0)
          return;

        vl = __riscv_vsetvl_e32m1 (i);
      }
    }

    Before this patch:
    foo2:
    .LFB2:
            .cfi_startproc
            ble     a2,zero,.L1
            mv      a4,a2
            li      a3,-1
            vsetvli a5,a2,e32,m1,ta,mu
            vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
    .L5:
            vle32.v v1,0(a0)
            vse32.v v1,0(a1)
            bgeu    a4,a5,.L3
    .L10:
            beq     a2,zero,.L1
            vsetvli a5,a4,e32,m1,ta,mu
            addi    a4,a4,-1
            vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
            vle32.v v1,0(a0)
            vse32.v v1,0(a1)
            addiw   a2,a2,-1
            bltu    a4,a5,.L10
    .L3:
            addiw   a2,a2,-1
            addi    a4,a4,-1
            bne     a2,a3,.L5
    .L1:
            ret

    After this patch:
    foo2:
            ble     a2,zero,.L1
            mv      a4,a2
            li      a3,-1
            vsetvli a5,a2,e32,m1,ta,ma
    .L5:
            vle32.v v1,0(a0)
            vse32.v v1,0(a1)
            bgeu    a4,a5,.L3
    .L10:
            beq     a2,zero,.L1
            vsetvli a5,a4,e32,m1,ta,ma
            addi    a4,a4,-1
            vle32.v v1,0(a0)
            vse32.v v1,0(a1)
            addiw   a2,a2,-1
            bltu    a4,a5,.L10
    .L3:
            addiw   a2,a2,-1
            addi    a4,a4,-1
            bne     a2,a3,.L5
    .L1:
            ret

            PR target/109743

    gcc/ChangeLog:

            * config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vsetvl_at_end):
            New.
            (local_avl_compatible_p): New.
            (pass_vsetvl::local_eliminate_vsetvl_insn): Enhance local
            optimizations for LCM, rewrite as a backward algorithm.
            (pass_vsetvl::cleanup_insns): Use new local_eliminate_vsetvl_insn
            interface, handle a BB at once.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test.
            * gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test.
            * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test.
            * gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test.

    Co-authored-by: Juzhe-Zhong <juzhe.zh...@rivai.ai>
