On Wed, 12 Jul 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> Thanks Richard.
> >>
> >> Is it correct that the better way is to add optabs
> >> (len_strided_load/len_strided_store),
> >> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to
> >> len_strided_load/len_strided_store optab (if it is strided load/store) in
> >> expand_gather_load_optab_fn
> >> expand_scatter_store_optab_fn
> >>
> >> of internal-fn.cc
> >>
> >> Am I right? Thanks.
> >
> > Yes.
> >
> > In principle the vectorizer can also directly take advantage of this
> > and code generate an internal .LEN_STRIDED_LOAD ifn.
>
> Yeah, in particular, having a strided load should relax some
> of the restrictions around the relationship of the vector offset
> type to the loaded/stored data.  E.g. a "gather" of N bytes with a
> 64-bit stride would in principle be possible without needing an
> Nx64-bit vector offset type.

And it can be used to do the VMAT_ELEMENTWISE/VMAT_STRIDED_SLP
in a more efficient way as well.  We never got around to using
gather/scatter for these (because in practice those tend to be
slower than what we do now there).

Richard.
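
For illustration only, here is a scalar C sketch of the point about the
offset type.  The function names and signatures are hypothetical models,
not GCC optabs or internal functions, and the exact operand layout of a
len_strided_load pattern is not settled in this thread.  The sketch only
shows that a strided access needs one scalar stride, whereas a gather
needs one offset per element.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a length-controlled strided load:
   copy LEN elements of ELEM_SIZE bytes each, starting at BASE and
   stepping STRIDE bytes between consecutive elements.  The stride is
   a single scalar, so no vector of per-element offsets is needed.  */
static void
model_len_strided_load (void *dst, const void *base, ptrdiff_t stride,
                        size_t elem_size, size_t len)
{
  const unsigned char *src = base;
  unsigned char *out = dst;
  for (size_t i = 0; i < len; i++)
    memcpy (out + i * elem_size, src + (ptrdiff_t) i * stride, elem_size);
}

/* Hypothetical scalar model of a length-controlled gather load:
   the same copy, but each element has its own byte offset.  With
   64-bit offsets, loading N bytes needs N 64-bit offset values.  */
static void
model_len_gather_load (void *dst, const void *base, const int64_t *offsets,
                       size_t elem_size, size_t len)
{
  const unsigned char *src = base;
  unsigned char *out = dst;
  for (size_t i = 0; i < len; i++)
    memcpy (out + i * elem_size, src + offsets[i], elem_size);
}
```

With byte elements and a stride of 8, the gather model needs the offsets
{0, 8, 16, 24} spelled out as four 64-bit values to load four bytes,
while the strided model gets the same result from the scalar 8 alone.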