https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119286

--- Comment #3 from Andrew Stubbs <ams at gcc dot gnu.org> ---
The RDNA consumer devices, such as gfx1100, support permute for V32 and
smaller, but not V64. Gather/scatter should be able to load from arbitrary
addresses, but synthesising a vector with those addresses may run into the
permutation issue. If that's not the issue then I don't know why it wouldn't
work.
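
For illustration only, here is a minimal sketch of the kind of loop meant
above: a gather whose index vector is itself a permutation of the identity
indices. The helper names reverse_indices and gather are hypothetical and do
not come from the bug report, and whether the vectorizer actually needs a
permute to build such an index vector on a given target is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill idx with n-1, n-2, ..., 0, which is a
   permutation (reversal) of the identity indices 0..n-1.  */
void reverse_indices(int *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    idx[i] = (int)(n - 1 - i);
}

/* Hypothetical helper: gather out[i] = base[idx[i]] for i in 0..n-1.
   Each load address depends on idx[i], so a vectorized version needs
   either a gather instruction or elementwise loads.  */
void gather(int *out, const int *base, size_t base_len,
            const int *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Guard against out-of-range indices in this sketch.  */
      assert(idx[i] >= 0 && (size_t)idx[i] < base_len);
      out[i] = base[idx[i]];
    }
}
```

With n = 64 the index vector would be a V64 value, which is the case that
RDNA cannot permute according to the comment above.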

The GFX9/CDNA devices support V64 properly.

As for the scalar loads, we never actually implemented any costs. In general,
loading a vector elementwise is no worse than running the scalar algorithm
(which is very slow on a GPU), so we have focused on making the vector code
work. The exception is RDNA, which has the hardware limitations described
above.
