https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122846
Bug ID: 122846
Summary: risc-v rvv widening operations would perform better with a wider LMUL
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bergner at oss dot tenstorrent.com
Target Milestone: ---
The following simplified test case, taken from a benchmark, shows an example
where using a larger LMUL value would improve performance. The problem is
that for any chain of widening operations, the final result gets the maximum
LMUL value (here the default, LMUL=1), so the operations that feed into that
result must use smaller, fractional LMUL values. In this case, we end up using
mf4 for our vector loads! It would be better to use LMUL=1 for the loads and
then use larger LMUL values for the widening operations.
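To put rough numbers on this, here is a small sketch of the LMUL arithmetic. It assumes VLEN=128, which is consistent with the 4-element e32/m1 configuration in the assembly but is still an assumption. The helper names vlmax, iterations and print_lmul_comparison are made up for illustration and are not part of GCC.

```c
#include <stdio.h>

/* Maximum number of elements per vector operation:
   VLMAX = VLEN * LMUL / SEW, with LMUL given as the fraction num/den
   so that fractional settings such as mf4 (1/4) can be expressed.  */
unsigned
vlmax (unsigned vlen_bits, unsigned sew_bits,
       unsigned lmul_num, unsigned lmul_den)
{
  return vlen_bits * lmul_num / (sew_bits * lmul_den);
}

/* Number of loop iterations needed to cover TRIP elements when each
   iteration handles ELEMS elements (rounded up).  */
unsigned
iterations (unsigned trip, unsigned elems)
{
  return (trip + elems - 1) / elems;
}

/* Compare the current LMUL choice (result capped at e32/m1, so loads
   run at e8/mf4) with the proposed one (loads at e8/m1, widening to
   e16/m2 and then e32/m4).  VLEN_BITS is an assumed hardware value.  */
void
print_lmul_comparison (unsigned vlen_bits, unsigned trip)
{
  unsigned cur = vlmax (vlen_bits, 8, 1, 4);   /* e8, mf4 */
  unsigned prop = vlmax (vlen_bits, 8, 1, 1);  /* e8, m1 */

  printf ("current:  %u elements/iteration, %u iterations\n",
          cur, iterations (trip, cur));
  printf ("proposed: %u elements/iteration, %u iterations\n",
          prop, iterations (trip, prop));
}
```

Under that VLEN=128 assumption, calling print_lmul_comparison (128, 1024) reports 4 elements per iteration (256 iterations) for the current code and 16 elements per iteration (64 iterations) for the proposed LMUL choice. The e16/m2 and e32/m4 steps of the proposed chain also hold 16 elements each, so no step would need a fractional LMUL.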
linux~$ cat test.c
int
foo (const char *x, const char *y)
{
  int sum = 0;
  for (int i = 0; i < 1024; i++)
    sum += x[i] * y[i];
  return sum;
}
linux~$ gcc -S -O2 -march=rv64imv test.c
linux~$ cat test.s
[snip]
foo:
.LFB0:
	.cfi_startproc
	vsetivli zero,4,e32,m1,ta,ma
	vmv.v.i v1,0
	addi a5,a0,1024
.L2:
	vsetvli zero,zero,e8,mf4,ta,ma
	vle8.v v3,0(a1)
	vle8.v v4,0(a0)
	addi a0,a0,4
	addi a1,a1,4
	vwmul.vv v2,v4,v3
	vmv1r.v v3,v1
	vsetvli zero,zero,e16,mf2,ta,ma
	vwadd.wv v1,v3,v2
	bne a5,a0,.L2
	vsetvli zero,zero,e32,m1,ta,ma
	vmv.s.x v2,zero
	vredsum.vs v1,v1,v2
	vmv.x.s a0,v1
	ret