https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 3 Jun 2024, ams at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
> 
> --- Comment #9 from Andrew Stubbs <ams at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #6)
> > The best strategy for GCN would be to gather V4QImode aka SImode into the
> > V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
> > doing consecutive loads isn't a good strategy here.
> 
> I don't fully understand what you're trying to say here, so apologies if you
> knew all this already and I missed the point.....
> 
> In general, on GCN V4QImode is not in any way equivalent to SImode (when the
> values are in registers). The vector registers are not one single string of
> re-interpretable bits.
> 
> For the same reason, you can't load a value as V64QImode and then try to
> interpret it as V16SImode. GCN vector registers just don't work like
> SSE/Neon/etc.
> 
> When you load a V64QImode vector, each lane is extended to 32 bits, so what
> you actually get in hardware is a V64SImode vector.
> 
> Likewise, when you load a V4QImode vector the hardware representation is
> actually V4SImode (which in itself is just V64SImode with undefined values in
> the unused lanes).

I see.  I wonder if there's not one or two latent wrong-code bugs because
of this and the vectorizer's assumptions ;)  I suppose modes_tieable_p
will tell us whether a VIEW_CONVERT_EXPR will do the right thing?
Is GET_MODE_SIZE (V64QImode) == GET_MODE_SIZE (V64SImode) btw?
And is V64QImode really V64PSImode?

Still, for a V64QImode load of { c[0], c[1], c[2], c[3], c[32], c[33],
c[34], c[35], ... } it's probably best on GCN to use a single V64QImode
gather rather than four "consecutive" V64QImode loads followed by
element swizzling.
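
To make the access pattern concrete, here is a rough scalar model of such
a gather.  It is only a sketch, not GCN code and not taken from the GCC
sources: the names gather_offset and gather_v64qi are made up for
illustration, and zero extension of each byte lane is an assumption (the
discussion above only says each lane is extended to 32 bits).

```c
#include <stddef.h>
#include <stdint.h>

/* 64 lanes, read as groups of 4 consecutive bytes that start 32 bytes
   apart (a gap of 28 bytes between groups).  */
enum { LANES = 64, GROUP = 4, STRIDE = 32 };

/* Byte offset read by a given lane: 0, 1, 2, 3, 32, 33, 34, 35, ...  */
static size_t
gather_offset (size_t lane)
{
  return (lane / GROUP) * STRIDE + (lane % GROUP);
}

/* Scalar model of one 64-lane byte gather.  Each byte lands in its own
   32-bit lane; zero extension is assumed here for illustration.  */
static void
gather_v64qi (const uint8_t *c, uint32_t out[LANES])
{
  for (size_t lane = 0; lane < LANES; lane++)
    out[lane] = c[gather_offset (lane)];
}
```

The last lane reads offset 15 * 32 + 3 = 483, so the source must cover at
least 484 bytes, of which only 64 are actually used.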
