https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116760
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So on x86 the cost model difference 14.2 vs. trunk is

-(*co_271(D))[_95] 1 times vec_construct costs 792 in body
+(*co_271(D))[_95] 1 times vec_construct costs 88 in body

and similarly for

-_103 1 times vec_to_scalar costs 72 in body
+_103 1 times vec_to_scalar costs 8 in body

r15-5565-gdbc38dd9e96a99 doesn't seem to fix this yet. The reason is that the
cost hook for non-SLP considers VMAT_ELEMENTWISE with variable stride
separately, but does not do so for VMAT_STRIDED_SLP with SLP. With SLP we
don't get all the info we'd like (how we use lvectype/ltype vs. vectype).

For GCC 15 I'm going to emulate GCC 14 behavior here by special-casing
single-lane SLP. For the future we want to let the backend know how many
and what kind of loads we do for VMAT_STRIDED_SLP; that's something the cost
hook doesn't get us yet.
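
For illustration only (this is not the testcase from this PR): a minimal C
sketch of a loop whose load has a stride known only at run time, the kind of
access the vectorizer may handle by assembling a vector from scalar loads and
costing that as vec_construct. The function name and signature are
hypothetical. No claim is made that this loop reproduces the numbers above or
which VMAT_* kind the vectorizer picks for it.

```c
#include <stddef.h>

/* Hypothetical example: read every `stride`-th element of `co`, scale it,
   and store the results contiguously.  `stride` is only known at run time,
   so the load cannot be a single contiguous vector load.  */
void scale_strided(double *restrict out, const double *restrict co,
                   size_t n, size_t stride)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = co[i * stride] * 2.0;
}
```

Cost lines of the form quoted above are printed in the vectorizer dump, e.g.
when compiling with -O3 -fdump-tree-vect-details.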