https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I now see

  _99 = .SELECT_VL (ivtmp_97, POLY_INT_CST [4, 4]);
  ivtmp_44 = _99 * 32;
  vect_array.16 = .MASK_LEN_LOAD_LANES (vectp_in.14_43, 32B, { -1, ... }, _99, 0);
  vect__2.17_71 = vect_array.16[0];
  vect__2.18_72 = vect_array.16[1];
  vect__2.19_73 = vect_array.16[2];
  vect__2.20_74 = vect_array.16[3];
  vect__2.21_75 = vect_array.16[4];
  vect__2.22_76 = vect_array.16[5];
  vect__2.23_77 = vect_array.16[6];
  vect__2.24_78 = vect_array.16[7];
  vect_array.27[0] = vect__2.17_71;
  vect_array.27[1] = vect__2.18_72;
  vect_array.27[2] = vect__2.19_73;
  vect_array.27[3] = vect__2.20_74;
  vect_array.27[4] = vect__2.21_75;
  vect_array.27[5] = vect__2.22_76;
  vect_array.27[6] = vect__2.23_77;
  vect_array.27[7] = vect__2.24_78;
  .MASK_LEN_STORE_LANES (vectp_out.25_80, 32B, { -1, ... }, _99, 0, vect_array.27);
  ivtmp_93 = _99 * 4;
  .MASK_LEN_STORE (vectp_ia.28_94, 32B, { -1, ... }, _99, 0, vect__2.19_73);

which I think is perfect and what I expected.  It doesn't show the previous
issue anymore.

There seems to be some confusion with RA though:

.L8:
        vsetvli a5,a3,e8,mf4,ta,ma
        vlseg8e32.v     v8,(a2)
        slli    a0,a5,5
        slli    a7,a5,2
        sub     a3,a3,a5
        add     a2,a2,a0
        vmv1r.v v16,v8
        vmv1r.v v17,v9
        vmv1r.v v18,v10
        vmv1r.v v19,v11
        vmv1r.v v20,v12
        vmv1r.v v21,v13
        vmv1r.v v22,v14
        vmv1r.v v23,v15
        vsseg8e32.v     v16,(a1)
        add     a1,a1,a0
        vse32.v v10,0(a4)
        add     a4,a4,a7
        bne     a3,zero,.L8

why do we copy the register group v8-15 to v16-23?  Is this because of the
v10 use?  Then a single move of v10 to v18 should have sufficed?
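
For reference, a scalar C sketch of what the GIMPLE above appears to compute.
It is reconstructed from the dump only and is not the testcase from this PR:
the function name, the parameter names and the int32_t element type are
assumptions (the dump only shows 32-bit elements, eight lanes per group, and
lane 2 being stored a second time through vectp_ia).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar equivalent of the vectorized loop; names and the
   int32_t element type are assumptions, not taken from the PR.  */
void copy_groups(const int32_t *in, int32_t *out, int32_t *ia, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Group of 8 interleaved elements: .MASK_LEN_LOAD_LANES followed by
           .MASK_LEN_STORE_LANES (pointer step _99 * 32 bytes).  */
        for (size_t k = 0; k < 8; k++)
            out[8 * i + k] = in[8 * i + k];

        /* Lane 2 is stored a second time: .MASK_LEN_STORE of
           vect_array.16[2] (pointer step _99 * 4 bytes).  */
        ia[i] = in[8 * i + 2];
    }
}
```

In this sketch lane 2 is the only value with two uses, which corresponds to
v10 being read by both vsseg8e32.v (as v18 after the copy) and vse32.v in the
assembly above.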