[Bug target/122846] New: risc-v rvv widening operations would perform better with a wider LMUL

bergner at oss dot tenstorrent.com via Gcc-bugs Tue, 25 Nov 2025 11:52:27 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122846


            Bug ID: 122846
           Summary: risc-v rvv widening operations would perform better
                    with a wider LMUL
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bergner at oss dot tenstorrent.com
  Target Milestone: ---

The following simplified test case, taken from a benchmark, shows a case where
using a larger LMUL value would improve performance.  The problem is that, for
a chain of widening operations, the final (widest) result is given the maximum
LMUL value (here the default, LMUL=1), so the operations that feed into that
result must use progressively smaller LMUL values.  In this case, we end up
using mf4 for our vector loads!  It would be better to use LMUL=1 for the loads
and then use larger LMUL values for the widening operations.

linux~$ cat test.c
int
foo (const char *x, const char *y)
{
  int sum = 0;
  for (int i = 0; i < 1024; i++)
    sum += x[i] * y[i];
  return sum;
}

linux~$ gcc -S -O2 -march=rv64imv test.c
linux~$ cat test.s
[snip]
foo:
.LFB0:
        .cfi_startproc
        vsetivli        zero,4,e32,m1,ta,ma
        vmv.v.i v1,0
        addi    a5,a0,1024
.L2:
        vsetvli zero,zero,e8,mf4,ta,ma
        vle8.v  v3,0(a1)
        vle8.v  v4,0(a0)
        addi    a0,a0,4
        addi    a1,a1,4
        vwmul.vv        v2,v4,v3
        vmv1r.v v3,v1
        vsetvli zero,zero,e16,mf2,ta,ma
        vwadd.wv        v1,v3,v2
        bne     a5,a0,.L2
        vsetvli zero,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
