https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120639
--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> We could use scatter stores, building the index vector somehow cleverly with
> i_width contiguous indexes interspaced by i_dst_stride. In fact this vector
> could be built as inductions when building the i_height number of vectors
> to store and concatenated the same way?

Interesting, so you mean having a strided index vector
[0, 1, ..., vector_size, vector_size + 1, ..., i_width, 0 + stride, 1 + stride, ...]?

What about something like i_width = 12 and a 64-bit strided element (that
doesn't cover all of i_width but would require another 32-bit strided element)?
Wouldn't we still need a mechanism to "fill" up to i_width?
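For illustration only, here is a scalar sketch of the index vector being discussed, assuming offsets are counted in elements and that rows do not overlap (i_width <= i_dst_stride). The function names build_scatter_index and scatter_store_u8 are hypothetical and are not taken from GCC or from the test case; the second function only models what a scatter store would do, one element at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill idx[] with i_height runs of i_width contiguous
   offsets, each run starting i_dst_stride elements after the previous one.
   The caller must provide room for i_width * i_height entries.
   Returns the number of indices written.  */
size_t
build_scatter_index (uint32_t *idx, size_t i_width, size_t i_height,
                     size_t i_dst_stride)
{
  size_t n = 0;
  for (size_t row = 0; row < i_height; row++)
    for (size_t col = 0; col < i_width; col++)
      idx[n++] = (uint32_t) (row * i_dst_stride + col);
  return n;
}

/* Scalar model of a scatter store: dst[idx[i]] = src[i] for each i.  */
void
scatter_store_u8 (uint8_t *dst, const uint8_t *src, const uint32_t *idx,
                  size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[idx[i]] = src[i];
}
```

With i_width = 12, i_height = 2 and i_dst_stride = 32 this yields the offsets 0..11 followed by 32..43, so every element of a row is covered by the index vector itself and no separate "fill" step is needed in this scalar model. Whether a vectorized form can be generated as cheaply is the open question above.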