https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #9 from Andrew Stubbs <ams at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> The best strategy for GCN would be to gather V4QImode aka SImode into the
> V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
> doing consecutive loads isn't a good strategy here.

I don't fully understand what you're trying to say here, so apologies if you
knew all this already and I missed the point.

In general, on GCN V4QImode is not in any way equivalent to SImode (when the
values are in registers). The vector registers are not one single string of
re-interpretable bits.

For the same reason, you can't load a value as V64QImode and then try to
interpret it as V16SImode. GCN vector registers just don't work like
SSE/Neon/etc.

When you load a V64QImode vector, each lane is extended to 32 bits, so what you
actually get in hardware is a V64SImode vector.

Likewise, when you load a V4QImode vector the hardware representation is
actually V4SImode (which in itself is just V64SImode with undefined values in
the unused lanes).
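
To make the lane layout concrete, here is a toy model in plain C. It is only a
sketch for illustration, not GCN or GCC code: the names vreg_model, load_v64qi
and packed_si are hypothetical, and zero-extension is an assumption (the
extension could equally be signed, depending on the load used).

```c
#include <stdint.h>

#define LANES 64

/* Hypothetical model of one vector register: 64 lanes of 32 bits each. */
typedef struct {
    uint32_t lane[LANES];
} vreg_model;

/* Model of a V64QImode load: each byte lands in its own lane and is
   extended to 32 bits (zero-extension assumed here), so the register
   effectively holds a V64SImode value. */
static vreg_model load_v64qi(const uint8_t *mem)
{
    vreg_model r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = mem[i];
    return r;
}

/* For contrast: the SImode element i that an SSE/Neon-style register
   would expose if the same 64 bytes were reinterpreted as V16SImode
   (little-endian packing of four adjacent bytes). */
static uint32_t packed_si(const uint8_t *mem, int i)
{
    const uint8_t *p = mem + 4 * i;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With memory holding the bytes 1, 2, 3, 4, ..., lane 0 of the modelled register
holds just 1, whereas packed_si(mem, 0) yields 0x04030201. The four bytes of
one SImode element end up spread across four separate lanes, which is why the
loaded value cannot simply be viewed as V16SImode.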
