On Wed, 12 Jul 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> Thanks Richard.
> >> 
> >> Is it correct that the better way is to add optabs 
> >> (len_strided_load/len_strided_store),
> >> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
> >> len_strided_load/len_strided_store optab (if it is strided load/store) in
> >> expand_gather_load_optab_fn and
> >> expand_scatter_store_optab_fn
> >> 
> >> of internal-fn.cc
> >> 
> >> Am I right? Thanks.
> >
> > Yes.
> >
> > In principle the vectorizer can also directly take advantage of this
> > and code generate an internal .LEN_STRIDED_LOAD ifn.
> 
> Yeah, in particular, having a strided load should relax some
> of the restrictions around the relationship of the vector offset
> type to the loaded/stored data.  E.g. a "gather" of N bytes with a
> 64-bit stride would in principle be possible without needing an
> Nx64-bit vector offset type.

And it can be used to handle the VMAT_ELEMENTWISE/VMAT_STRIDED_SLP
cases in a more efficient way as well.  We never got around to using
gather/scatter for these (because in practice those tend to be slower
than what we do now there).
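
To make the offset-type point concrete, here is a rough scalar sketch in
C of the two access shapes.  It is only an illustration of the
semantics, not what the vectorizer or any optab actually emits; the
function names gather_u8 and strided_load_u8 are made up for this
example.  The gather needs one 64-bit offset per lane (the Nx64-bit
vector offset mentioned above), while the strided form needs only a
single scalar stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Gather (hypothetical helper): every lane has its own offset, so a
   gather of n bytes with 64-bit offsets needs an n x 64-bit offset
   vector.  */
static void
gather_u8 (uint8_t *dst, const uint8_t *base,
           const int64_t *offsets, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = base[offsets[i]];
}

/* Strided load (hypothetical helper): lane i reads base + i * stride,
   so the only offset operand is one scalar 64-bit stride, independent
   of the element width.  LEN plays the role of the active length.  */
static void
strided_load_u8 (uint8_t *dst, const uint8_t *base,
                 int64_t stride, size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = base[(int64_t) i * stride];
}
```

With offsets {0, s, 2*s, ...} the gather reads the same elements as the
strided load with stride s, which is why a strided access can always be
expressed as a gather but does not need the vector offset operand.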

Richard.
