https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120639
--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
> On 20.06.2025 at 16:17, rdapp at gcc dot gnu.org
> <gcc-bugzi...@gcc.gnu.org> wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120639
>
> --- Comment #5 from Robin Dapp <rdapp at gcc dot gnu.org> ---
>> Well, consider the desired index vector being a real induction (just
>> store it somewhere).  If we can handle that, we should be able to
>> handle the scatter.  If not, we can't handle the scatter.
>
> Hmm, I think I misunderstood.  You are arguing that we could build an
> induction variable based on the i_height loop, right?  So roughly like
>
>   vect_vec_iv = {0, 1, ..., i_width};
>   for (... i_height)
>     {
>       ...
>       idxs = "[vect_vec_iv, vect_vec_iv + {i_dst_stride, ...}, ...]"
>       IFN_SCATTER_STORE (dst, idxs);
>       vect_vec_iv += {i_dst_stride, i_dst_stride, ...};
>     }?
>
> I guess this can always be implemented as a scatter one way or another?
>
> But my objective is actually two-fold in that I want to use the full vector
> size and also conflate as many elements as possible into a single one (i.e. 8
> chars into one uint64_t).  The second part helps gather/scatter as well as
> strided loads/stores independently as it reduces the number of individual
> elements (thus reducing the scatter/gather latency).
>
> So I think in order to make full use of the vector size the induction approach
> can work as we construct the index vector appropriately.
>
> For conflating/reinterpreting a subset of dynamic indices we IMHO need static
> code that is dynamically dispatched as described in my previous message.
>
> I.e. a loop over i_width:
>   while (rem > 0)
>     {
>       if (rem == 8)
>         "scatter/strided store with 64-bit elements"
>       if (rem == 4)
>         "scatter/strided store with 32-bit elements"
>       rem -= elsz;
>     }
>
> I realize that's not something we do at all right now, hence my initial
> question.  Irrespective of how/if something like that could be implemented (I
> can only imagine virtual/composition modes right now), is it even desirable in
> any way?  I know that it would help our uarch at least.

It would be possible to devise a versioning scheme plus eventually an
in-loop dispatch for this.  We currently cannot version for multiple
vector variants, but we need to ensure rem is handled?
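
As a scalar illustration of the dispatch loop quoted above, the following C
sketch copies an i_width x i_height byte block by walking the row width in
chunks and picking the widest element size that still fits the remainder.
The inner loop over rows stands in for a single strided store with
elsz-byte elements.  The names pick_elem_size and copy_block_chunked are
hypothetical and not part of GCC.  The sketch also tests rem >= 8 rather
than rem == 8 so that every remainder is consumed, which is an assumption
about the intent of the pseudocode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widest element size (in bytes) that fits in rem.  */
size_t pick_elem_size(size_t rem)
{
    if (rem >= 8) return 8;   /* would map to 64-bit elements */
    if (rem >= 4) return 4;   /* would map to 32-bit elements */
    if (rem >= 2) return 2;   /* would map to 16-bit elements */
    return 1;                 /* plain byte elements */
}

/* Hypothetical sketch: copy a width x height byte block.  The outer loop
   is the dynamic dispatch over the remaining row width; the inner loop
   over rows models one strided access with elsz-byte elements and a
   stride of dst_stride (resp. src_stride) bytes.  */
void copy_block_chunked(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride,
                        size_t width, size_t height)
{
    size_t x = 0;        /* byte offset within the row */
    size_t rem = width;  /* bytes of the row still to be handled */

    while (rem > 0) {
        size_t elsz = pick_elem_size(rem);

        for (size_t y = 0; y < height; y++) {
            /* memcpy avoids alignment and aliasing problems that a
               direct uint64_t/uint32_t access could have.  */
            memcpy(dst + (ptrdiff_t)y * dst_stride + x,
                   src + (ptrdiff_t)y * src_stride + x,
                   elsz);
        }

        x += elsz;
        rem -= elsz;
    }
}
```

With i_width = 13 this handles each row as one 8-byte, one 4-byte and one
1-byte element instead of 13 byte elements, so rem is always fully handled.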