https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945
            Bug ID: 118945
           Summary: RISC-V: VSETL pass: Don't promote Vectors ops from Tail
                    agnostic to Tail Undisturbed
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: vineetg at gcc dot gnu.org
          Reporter: vineetg at gcc dot gnu.org
                CC: jeffreyalaw at gmail dot com, juzhe.zhong at rivai dot ai,
                    rdapp at gcc dot gnu.org
  Target Milestone: ---

The VSETVL pass currently merges/fuses VSETVL insns even when they have
different tail policies. For OoO cores at least, TU is more expensive than TA
due to the implicit dependency on the prior value of the destination vector
register.

Consider the following simple test:

-O3 -march=rv64gcv_zvl256b_zba -mabi=lp64d -mrvv-max-lmul=m2 -mrvv-vector-bits=scalable

int test(int* in, int n)
{
  int accum = 0;
  for (int i = 0; i < n; i++)
    accum += in[i];
  return accum;
}

test:
        ble     a1,zero,.L4
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
.L3:
        vsetvli a5,a1,e32,m2,tu,ma      #70
        vle32.v v4,0(a0)                #17 <-- doesn't need to be TU
        sub     a1,a1,a5
        sh2add  a0,a5,a0
        vadd.vv v2,v2,v4                #18
        ...

Using --param=vsetvl-strategy=simple we can see that the vector load can just
be TA.

test:
        ble     a1,zero,.L4
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
.L3:
        vsetvli a5,a1,e8,mf2,ta,ma      #70
        vsetvli zero,a5,e32,m2,ta,ma    #72 <-- TA for Vector load
        vle32.v v4,0(a0)                #17
        sub     a1,a1,a5
        vsetvli zero,a5,e32,m2,tu,ma    #73
        sh2add  a0,a5,a0
        vadd.vv v2,v2,v4                #18

So the solution is to have a separate VSETVL/VSETVLI for such cases, rather
than fusing a TA op into a TU one.
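
To make the TU/TA distinction concrete, here is a rough scalar model of the loop in plain C. It is only a sketch and not GCC code. VLMAX, the helper names, and the full-width reduction at the end are assumptions for illustration (the epilogue is elided in the asm above). The model picks "overwrite with all ones" for TA tails, which is one of the behaviors the RVV spec permits. Under those assumptions the load's tail policy has no effect on the result, while the accumulating vadd.vv does need TU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count of one register group; not taken from the report. */
#define VLMAX 8

typedef enum { TAIL_UNDISTURBED, TAIL_AGNOSTIC } tail_policy;

/* Apply the tail policy to lanes [vl, VLMAX) of dst.
   TU keeps the old lanes, so the result depends on the prior dst value.
   TA may keep them or overwrite them with all ones; this model picks
   all ones. */
static void apply_tail(int32_t dst[VLMAX], size_t vl, tail_policy tp)
{
    if (tp == TAIL_AGNOSTIC)
        for (size_t i = vl; i < VLMAX; i++)
            dst[i] = -1;
}

/* Model of vle32.v: load vl elements from memory into dst. */
static void model_vle32(int32_t dst[VLMAX], const int32_t *src,
                        size_t vl, tail_policy tp)
{
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++)
        dst[i] = src[i];
    apply_tail(dst, vl, tp);
}

/* Model of vadd.vv dst,a,b over the first vl lanes (dst may alias a). */
static void model_vadd(int32_t dst[VLMAX], const int32_t a[VLMAX],
                       const int32_t b[VLMAX], size_t vl, tail_policy tp)
{
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++)
        dst[i] = a[i] + b[i];
    apply_tail(dst, vl, tp);
}

/* Model of the loop in test(), followed by an assumed reduction over
   all VLMAX lanes of the accumulator. */
static int32_t model_sum(const int32_t *in, size_t n,
                         tail_policy load_tp, tail_policy add_tp)
{
    int32_t v2[VLMAX] = {0};   /* vmv.v.i v2,0 at VLMAX */
    int32_t v4[VLMAX] = {0};

    while (n > 0) {
        /* Simplified vsetvli a5,a1,...: vl = min(n, VLMAX). */
        size_t vl = n < VLMAX ? n : VLMAX;
        model_vle32(v4, in, vl, load_tp);    /* insn #17 */
        model_vadd(v2, v2, v4, vl, add_tp);  /* insn #18 */
        in += vl;
        n -= vl;
    }

    int32_t sum = 0;
    for (size_t i = 0; i < VLMAX; i++)
        sum += v2[i];
    return sum;
}
```

In this model, model_sum(in, n, TAIL_AGNOSTIC, TAIL_UNDISTURBED) matches the scalar sum for any n, because vadd.vv only reads the first vl lanes of v4. Switching the add to TAIL_AGNOSTIC can clobber accumulator lanes in the final partial iteration, which is why only insn #18 carries a real TU demand.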