https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121910
Bug ID: 121910
Summary: RISC-V: dynamic lmul choosing wrong vector mode
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: chenzhongyao.hit at gmail dot com
CC: juzhe.zhong at rivai dot ai, rdapp at gcc dot gnu.org
Target Milestone: ---
Target: riscv
https://godbolt.org/z/T79ozo5jc
The following code from x264 (SPEC2017) spills vector registers to the stack
when compiled with:
-march=rv64gcv_zvl128b -O3 -mrvv-max-lmul=dynamic -mrvv-vector-bits=zvl
-fdump-tree-vect-details
#include <stdint.h>
/* full chroma mc (ie until 1/8 pixel)*/
void mc_chroma(uint8_t* dst, int i_dst_stride, uint8_t* src, int i_src_stride,
               int mvx, int mvy, int i_width, int i_height) {
  uint8_t* srcp;
  int d8x = mvx & 0x07;
  int d8y = mvy & 0x07;
  int cA = (8 - d8x) * (8 - d8y);
  int cB = d8x * (8 - d8y);
  int cC = (8 - d8x) * d8y;
  int cD = d8x * d8y;
  src += (mvy >> 3) * i_src_stride + (mvx >> 3);
  srcp = &src[i_src_stride];
  for (int y = 0; y < i_height; y++) {
    for (int x = 0; x < i_width; x++)
      dst[x] = (cA * src[x] + cB * src[x + 1] + cC * srcp[x] +
                cD * srcp[x + 1] + 32) >>
               6;
    dst += i_dst_stride;
    src = srcp;
    srcp += i_src_stride;
  }
}
According to the tree vectorizer dump (-fdump-tree-vect-details):
/app/example.c:19:27: note: Maximum lmul = 4, At most 20 number of live V_REG
......
/app/example.c:19:27: note: ***** Analysis succeeded with vector mode RVVM4QI
......
/app/example.c:19:27: note: Maximum lmul = 8, At most 40 number of live V_REG
......
/app/example.c:19:27: note: ***** Analysis succeeded with vector mode RVVM2QI
......
/app/example.c:19:27: note: ***** Choosing vector mode RVVM4QI
If register spilling already occurs with the RVVM2QI mode, then RVVM4QI, which
requires even more registers, should be even more likely to spill. Therefore,
choosing RVVM4QI as the final vector mode may not be optimal in this scenario.
If I use -mrvv-max-lmul=m2 to limit the maximum lmul, the spilling issue does
not occur. However, for this x264 case, restricting max-lmul is not an ideal
solution, since other parts of the code may benefit from using a larger lmul.
Please help address this bug when -mrvv-max-lmul=dynamic is used. I am
currently trying to fix it myself but haven’t found a good solution yet.