https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115895
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                              |2024-07-12
     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
  <bb 3> [local count: 505088130]:
  # vectp_sp.25_67 = PHI <vectp_sp.25_68(5), sp_10(D)(2)>
  # vectp_mprr_2.31_75 = PHI <vectp_mprr_2.31_76(5), &mprr_2(2)>
  # ivtmp_79 = PHI <ivtmp_80(5), 15(2)>
  # loop_mask_69 = PHI <_82(5), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
  # loop_mask_77 = PHI <_84(5), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
  vect__5.27_70 = .MASK_LOAD (vectp_sp.25_67, 32B, loop_mask_69);
  vectp_sp.25_71 = vectp_sp.25_67 + 18446744073709551568;
  vect__5.28_72 = VEC_PERM_EXPR <vect__5.27_70, vect__5.27_70, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
  vect__5.29_73 = VEC_PERM_EXPR <vect__5.27_70, vect__5.27_70, { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect__6.30_74 = VEC_PACK_TRUNC_EXPR <vect__5.28_72, vect__5.29_73>;
  .MASK_STORE (vectp_mprr_2.31_75, 512B, loop_mask_77, vect__6.30_74);

So indeed here the .MASK_LOAD overreads.  We are still peeling for gaps,
but not enough.  In theory we should be able to adjust the mask - that
might be cheaper than shortening the access, which is what we do without
masking, since masking off the gap would also allow us to avoid peeling
for gaps(?)

  <bb 3> [local count: 441952113]:
  # vectp_sp.25_70 = PHI <vectp_sp.25_71(5), sp_10(D)(2)>
  # vectp_mprr_2.31_79 = PHI <vectp_mprr_2.31_80(5), &mprr_2(2)>
  # ivtmp_82 = PHI <ivtmp_83(5), 0(2)>
  _72 = MEM <uint128_t> [(int *)vectp_sp.25_70];
  _73 = {_72, 0, 0, 0};
  vect__5.27_74 = VIEW_CONVERT_EXPR<vector(16) int>(_73);

I'll leave this open for a bit - the easiest fix is to disable partial
vectors when we do the gap optimization and not perform it for
variable-length vectors (I'm not sure all the may/must checks are on
the safe side).  I think we want to use smaller vectors for the load in
a more conscious way; possibly the SLP work of lowering vectors will help
here.
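
For readers who want the two options side by side, here is a minimal scalar
model in C.  It is only a sketch and not vectorizer code: masked_load,
mask_off_gap, shortened_load, lanes_touched, backward_step_bytes and gap_demo
are hypothetical names, the mask is modelled as one byte per lane, and the
count of 4 needed lanes is an assumption chosen to match the uint128_t access
in the second dump.  It shows that clearing the mask lanes that cover the gap
gives the same vector as the shortened, zero-padded load while touching only
4 of the 16 lanes, and that the pointer increment in the first dump is a
48-byte step backwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* lanes of the vector(16) int seen in the dumps */

/* Scalar model of a masked load: only active lanes read memory,
   inactive lanes are assumed to yield zero.  */
static void masked_load(int32_t dst[LANES], const int32_t *src,
                        const uint8_t mask[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        dst[i] = mask[i] ? src[i] : 0;
}

/* Hypothetical helper: keep the loop mask only for the first `needed`
   lanes, masking off the lanes that fall into the gap.  */
static void mask_off_gap(uint8_t out[LANES], const uint8_t loop_mask[LANES],
                         size_t needed)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (i < needed) ? loop_mask[i] : 0;
}

/* Model of the unmasked variant: read 128 bits and zero-pad the rest,
   like `{_72, 0, 0, 0}` followed by the VIEW_CONVERT_EXPR.  */
static void shortened_load(int32_t dst[LANES], const int32_t *src)
{
    memset(dst, 0, LANES * sizeof dst[0]);
    memcpy(dst, src, 16);
}

/* Number of leading lanes a masked access may touch (last active lane + 1).  */
static size_t lanes_touched(const uint8_t mask[LANES])
{
    size_t n = 0;
    for (size_t i = 0; i < LANES; i++)
        if (mask[i])
            n = i + 1;
    return n;
}

/* The dump adds 18446744073709551568 to the pointer; modulo 2^64 that is
   a step of this many bytes backwards.  */
static uint64_t backward_step_bytes(void)
{
    return UINT64_C(0) - UINT64_C(18446744073709551568);
}

/* Returns 1 if the gap-masked load matches the shortened load and touches
   only 4 lanes, while the all-ones loop mask touches all 16.  */
static int gap_demo(void)
{
    int32_t src[LANES], a[LANES], b[LANES];
    uint8_t loop_mask[LANES], gap_mask[LANES];

    for (size_t i = 0; i < LANES; i++)
        src[i] = 100 + (int32_t)i;
    memset(loop_mask, 1, sizeof loop_mask); /* all lanes active, as in the PHI */

    mask_off_gap(gap_mask, loop_mask, 4);
    masked_load(a, src, gap_mask);
    shortened_load(b, src);

    return memcmp(a, b, sizeof a) == 0
           && lanes_touched(gap_mask) == 4
           && lanes_touched(loop_mask) == LANES;
}
```

In this model the all-ones loop mask is what lets the load run past the
elements the permutes actually use (lanes 0 and 2); either narrowing the mask
or narrowing the access keeps the read inside the first 16 bytes.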