https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #7 from Maciej W. Rozycki <macro at orcam dot me.uk> ---
Thank you for all your explanations.  I think I'm still missing something
here, so I'll write it differently (and let's ignore the tail-agnostic vs
tail-undisturbed choice for the purpose of this consideration).

Let me paste the whole assembly code produced here (sans decorations):

        beq     a5,zero,.L2
        vsetvli zero,a6,e32,m1,tu,ma
.L3:
        beq     a4,zero,.L7
        li      a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi    a5,a5,1
        bne     a4,a5,.L5
.L7:
        ret
.L2:
        vsetvli zero,a6,e32,m1,tu,ma
        j       .L3

This seems to me to correspond to this source code:

  if (cond)
    __riscv_vsetvl_e32m1(avl);
  else
    __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, avl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, avl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, avl);
    __riscv_vse32_v_i32m1(out, c, avl);
  }

In that case I'd expect the conditional, along with the intrinsics, to be
optimised away, as its result is ignored and it does not affect the code
actually executed beyond taking a different execution path, i.e.:

        beq     a4,zero,.L7
        vsetvli zero,a6,e32,m1,tu,ma
        li      a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi    a5,a5,1
        bne     a4,a5,.L5
.L7:
        ret

However, the actual source code is as follows:

  size_t vl;
  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
    __riscv_vse32_v_i32m1(out, c, vl);
  }

Based on what you wrote, I'd expect code like this instead:

        beq     a5,zero,.L2
        vsetvli a6,a6,e32,m1,ta,ma
.L3:
        beq     a4,zero,.L7
        vsetvli zero,a6,e32,m1,tu,ma
        li      a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi    a5,a5,1
        bne     a4,a5,.L5
.L7:
        ret
.L2:
        vsetvli a6,a6,e16,mf2,ta,ma
        j       .L3

which is roughly what you say LLVM produces.

Why is the `vl' value ignored, given that the hardware determines it from
`avl' at the explicit request (!) of the programmer who inserted the vsetvl
intrinsics?  Is the compiler able to prove that using `avl' in place of
`vl' does not affect the operation of the VLE32.V and VSE32.V instructions
in any way?  What is the purpose of these intrinsics if they can be freely
ignored?
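
To make the second question concrete, here is a toy C model of how `vl'
could be derived from `avl'.  It is only a sketch under stated assumptions,
not the real intrinsics or GCC's actual reasoning: the helper names
(model_vlmax, model_vsetvl, avl_matches_vl) are hypothetical, VLEN is a
parameter picked by the caller, and vsetvl is simplified to
vl = min(avl, VLMAX), whereas the RVV specification permits other results
when avl lies between VLMAX and 2*VLMAX.

```c
#include <stddef.h>

/* Hypothetical model: VLMAX = VLEN * LMUL / SEW, with LMUL passed as the
   fraction lmul_num / lmul_den so that mf2 can be written as 1/2. */
static size_t model_vlmax(size_t vlen, size_t sew,
                          size_t lmul_num, size_t lmul_den)
{
    return vlen * lmul_num / (sew * lmul_den);
}

/* Simplifying assumption: vl = min(avl, VLMAX).  Real hardware may pick
   a different vl when VLMAX < avl < 2 * VLMAX. */
static size_t model_vsetvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Returns 1 if, for every avl up to max_avl, configuring e32,m1 directly
   from avl gives the same vl as first taking vl from either branch of the
   conditional and then configuring e32,m1 from that vl; 0 otherwise. */
static int avl_matches_vl(size_t vlen, size_t max_avl)
{
    size_t max_e32m1 = model_vlmax(vlen, 32, 1, 1);
    size_t max_e16mf2 = model_vlmax(vlen, 16, 1, 2);

    for (size_t avl = 0; avl <= max_avl; avl++) {
        size_t vl_then = model_vsetvl(avl, max_e32m1);   /* if (cond) */
        size_t vl_else = model_vsetvl(avl, max_e16mf2);  /* else */
        size_t direct = model_vsetvl(avl, max_e32m1);    /* avl used as is */

        if (model_vsetvl(vl_then, max_e32m1) != direct ||
            model_vsetvl(vl_else, max_e32m1) != direct)
            return 0;
    }
    return 1;
}
```

Under this model e32,m1 and e16,mf2 share the same SEW/LMUL ratio and hence
the same VLMAX, so both branches yield the same `vl', and feeding `avl'
straight into the VSETVLI in the loop preheader yields that value too.
Whether this is the argument GCC relies on is what I'm asking about.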

Please forgive me if my questions seem obvious or irrelevant to you; I'm
still rather new to this RVV stuff.
