https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945

            Bug ID: 118945
           Summary: RISC-V: VSETL pass: Don't promote Vectors ops from
                    Tail agnostic to Tail Undisturbed
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: vineetg at gcc dot gnu.org
          Reporter: vineetg at gcc dot gnu.org
                CC: jeffreyalaw at gmail dot com, juzhe.zhong at rivai dot ai,
                    rdapp at gcc dot gnu.org
  Target Milestone: ---

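The VSETVL pass currently merges/fuses VSETVL insns even when they have different
tail policies. For out-of-order (OoO) cores at least, TU (tail undisturbed) is more
expensive than TA (tail agnostic), because TU creates an implicit dependency on the
prior value of the destination vector register.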

Consider the following simple test:

-O3 -march=rv64gcv_zvl256b_zba -mabi=lp64d -mrvv-max-lmul=m2
-mrvv-vector-bits=scalable

int test(int* in, int n)
{
  int accum = 0;
  for (int i = 0; i < n; i++)
        accum += in[i];

  return accum;
}

test:
        ble     a1,zero,.L4
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
.L3:
        vsetvli a5,a1,e32,m2,tu,ma #70
        vle32.v v4,0(a0)           #17 <-- doesn't need to be TU
        sub     a1,a1,a5
        sh2add  a0,a5,a0
        vadd.vv v2,v2,v4           #18
...


With --param=vsetvl-strategy=simple, we can see that the vector load can simply
be TA.

test:
        ble     a1,zero,.L4
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
.L3:
        vsetvli a5,a1,e8,mf2,ta,ma     #70
        vsetvli zero,a5,e32,m2,ta,ma   #72  <-- TA for Vector load
        vle32.v v4,0(a0)               #17
        sub     a1,a1,a5
        vsetvli zero,a5,e32,m2,tu,ma   #73
        sh2add  a0,a5,a0
        vadd.vv v2,v2,v4               #18

So the solution is to emit a separate VSETV?LI for such cases, rather than
promoting the TA op to TU.
