https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113162
Bug ID: 113162
Summary: RISC-V: Unexpected register spills in vectorized code and in intrinsic code that has subregs
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---

The following case:

int f[12][100];

void foo (int v)
{
  for (int r = 0; r < 100; r += 4)
    {
      int i = r + 1;
      f[0][r] = f[1][r] * (f[2][r] + v) - f[1][i] * (f[2][i]);
      f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r] + v);
      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v) - f[1][i+2] * (f[2][i+2]);
      f[0][i+2] = f[1][r+2] * (f[2][i+2]) + f[1][i+2] * (f[2][r+2] + v);
    }
}

compiled with dynamic LMUL, where GCC chooses LMUL = 2, generates this vectorized code:

        lui a5,%hi(f)
        addi a5,a5,%lo(f)
        addi a3,a5,800
        addi a4,a5,400
        vsetivli zero,8,e32,m2,ta,ma
        addi sp,sp,-32
        vlseg4e32.v v8,(a4)
        vlseg4e32.v v16,(a3)
        vmv.v.x v2,a0
        vadd.vv v6,v2,v16
        vmul.vv v24,v6,v10
        vmul.vv v6,v6,v8
        vs2r.v v24,0(sp)
        addi a3,a5,928
        vmv.v.v v24,v18
        vnmsub.vv v24,v10,v6
        addi a4,a5,528
        vl2re32.v v6,0(sp)
        vmacc.vv v6,v18,v8
        vadd.vv v4,v2,v20
        vmv2r.v v26,v6
        vmul.vv v0,v4,v12
        vmul.vv v4,v4,v14
        vmv.v.v v28,v22
        vnmsub.vv v28,v14,v0
        vmv.v.v v30,v4
        vmacc.vv v30,v22,v12
        vsseg4e32.v v24,(a5)
        vlseg4e32.v v8,(a4)
        vlseg4e32.v v16,(a3)
        vadd.vv v6,v2,v16
        vmul.vv v24,v6,v10
        vmul.vv v6,v6,v8
        vs2r.v v24,0(sp)
        addi a6,a5,128
        vmv.v.v v24,v18
        vnmsub.vv v24,v10,v6
        addi a0,a5,1056
        vl2re32.v v6,0(sp)
        addi a1,a5,656
        vmacc.vv v6,v18,v8
        vadd.vv v4,v2,v20
        vmv2r.v v26,v6
        vmul.vv v0,v4,v12
        vmul.vv v4,v4,v14
        vmv.v.v v28,v22
        vnmsub.vv v28,v14,v0
        vmv.v.v v30,v4
        vmacc.vv v30,v22,v12
        vsseg4e32.v v24,(a6)
        vlseg4e32.v v8,(a1)
        vlseg4e32.v v16,(a0)
        vadd.vv v6,v2,v16
        vmul.vv v24,v6,v10
        vmul.vv v6,v6,v8
        vs2r.v v24,0(sp)
        vadd.vv v4,v2,v20
        vmv.v.v v24,v18
        vnmsub.vv v24,v10,v6
        vmul.vv v0,v4,v12
        vl2re32.v v6,0(sp)
        vmv.v.v v28,v22
        vnmsub.vv v28,v14,v0
        vmacc.vv v6,v18,v8
        vmul.vv v4,v4,v14
        vmv2r.v v26,v6
        vmv.v.v v30,v4
        vmacc.vv v30,v22,v12
        addi a2,a5,256
        addi a3,a5,1184
        addi a4,a5,784
        addi a5,a5,384
        vsseg4e32.v v24,(a2)
        vsetivli zero,1,e32,m2,ta,ma
        vlseg4e32.v v8,(a4)
        vlseg4e32.v v16,(a3)
        vadd.vv v4,v2,v16
        vadd.vv v2,v2,v20
        vmul.vv v0,v4,v10
        vmul.vv v6,v2,v12
        vmul.vv v4,v4,v8
        vmul.vv v2,v2,v14
        vmv.v.v v24,v10
        vnmsub.vv v24,v18,v4
        vmv.v.v v26,v0
        vmacc.vv v26,v8,v18
        vmv.v.v v28,v14
        vnmsub.vv v28,v22,v6
        vmv.v.v v30,v2
        vmacc.vv v30,v12,v22
        vsseg4e32.v v24,(a5)
        addi sp,sp,32
        jr ra

There are redundant spills (vs2r.v and vl2re32.v) that cause worse performance on real hardware compared with the default LMUL (LMUL = 1). After investigation, I find this is not a dynamic LMUL cost model issue. Actually, the dynamic LMUL cost model works well and chooses the perfect LMUL = 2 for this case. The spills are redundant because we lack subreg liveness tracking in IRA/LRA, so the register allocator sees many false allocno conflicts in this situation. Confirmed with Lehua's subreg patch series (https://patchwork.ozlabs.org/project/gcc/list/?series=381823), which fixes this issue perfectly:

        vsetivli zero,8,e32,m2,ta,ma
        vmv.v.x v2,a0
        lui a5,%hi(f)
        addi a5,a5,%lo(f)
        addi a4,a5,400
        vlseg4e32.v v8,(a4)
        addi a4,a5,800
        vlseg4e32.v v16,(a4)
        vadd.vv v4,v2,v16
        vmul.vv v6,v4,v8
        vmul.vv v16,v4,v10
        vadd.vv v4,v2,v20
        vmul.vv v20,v4,v12
        vmul.vv v4,v4,v14
        vmv.v.v v24,v18
        vnmsub.vv v24,v10,v6
        vmv.v.v v26,v16
        vmacc.vv v26,v18,v8
        vmv.v.v v28,v22
        vnmsub.vv v28,v14,v20
        vmv.v.v v30,v4
        vmacc.vv v30,v22,v12
        vsseg4e32.v v24,(a5)
        addi a4,a5,528
        vlseg4e32.v v8,(a4)
        addi a4,a5,928
        vlseg4e32.v v16,(a4)
        vadd.vv v4,v2,v16
        vmul.vv v6,v4,v8
        vmul.vv v16,v4,v10
        vadd.vv v4,v2,v20
        vmul.vv v20,v4,v12
        vmul.vv v4,v4,v14
        vmv.v.v v24,v18
        vnmsub.vv v24,v10,v6
        vmv.v.v v26,v16
        vmacc.vv v26,v18,v8
        vmv.v.v v28,v22
        vnmsub.vv v28,v14,v20
        vmv.v.v v30,v4
        vmacc.vv v30,v22,v12
        addi a4,a5,128
        vsseg4e32.v v24,(a4)
        addi a4,a5,656
        vlseg4e32.v v8,(a4)
        addi a4,a5,1056
        vlseg4e32.v v16,(a4)
        vadd.vv v4,v2,v16
        vmul.vv v6,v4,v8
        vmul.vv v16,v4,v10
        vadd.vv v4,v2,v20
        vmul.vv v20,v4,v12
        vmul.vv v4,v4,v14
        vmv.v.v v24,v18
        vnmsub.vv v24,v10,v6
        vmv.v.v v26,v16
        vmacc.vv v26,v18,v8
        vmv.v.v v28,v22
        vnmsub.vv v28,v14,v20
        vmv.v.v v30,v4
        vmacc.vv v30,v22,v12
        addi a4,a5,256
        vsseg4e32.v v24,(a4)
        addi a4,a5,784
        vsetivli zero,1,e32,m2,ta,ma
        vlseg4e32.v v8,(a4)
        addi a4,a5,1184
        vlseg4e32.v v16,(a4)
        vadd.vv v4,v2,v16
        vmul.vv v6,v4,v8
        vmul.vv v4,v4,v10
        vadd.vv v2,v2,v20
        vmul.vv v16,v2,v12
        vmul.vv v2,v2,v14
        vmv.v.v v24,v10
        vnmsub.vv v24,v18,v6
        vmv.v.v v26,v4
        vmacc.vv v26,v8,v18
        vmv.v.v v28,v14
        vnmsub.vv v28,v22,v16
        vmv.v.v v30,v2
        vmacc.vv v30,v12,v22
        addi a5,a5,384
        vsseg4e32.v v24,(a5)
        ret