https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So when the vectorizer has the need to use strided stores, it would be
cheapest to spill the vector and do N element loads and stores?  I guess we
can easily get bottlenecked by the load/store op bandwidth here?

That is, the vectorizer needs

  for (lane)
    dest[stride * lane] = vector[lane];

thus store a specific (constant) lane of a vector to memory, for each
vector lane.  (We could use a scatter store here, but only AVX512 has that,
and building the index vector could be tricky and not supported for all
element types.)
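
For illustration only, the "spill the vector and do N element loads and stores" idea can be sketched in plain C using the GCC vector extension. This is not the vectorizer's actual code generation. The helper name `strided_store_via_spill`, the `v4si` typedef, and the fixed lane count of 4 are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints in one 16-byte vector.
   The type name and the lane count are assumptions for this sketch. */
typedef int v4si __attribute__((vector_size(16)));

enum { LANES = 4 };

/* Hypothetical helper: writes lane i of `vec` to dest[stride * i]. */
static void strided_store_via_spill(int *dest, ptrdiff_t stride, v4si vec)
{
    int spill[LANES];

    assert(dest != NULL);

    /* Spill the whole vector to a stack slot with one copy. */
    memcpy(spill, &vec, sizeof spill);

    /* One scalar load from the spill slot and one scalar store
       per lane, i.e. N element loads and N element stores. */
    for (int lane = 0; lane < LANES; ++lane)
        dest[stride * lane] = spill[lane];
}
```

Written this way, each lane costs one load and one store, which is where the load/store bandwidth concern in the comment comes from.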