https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
We could try the same solution as for atomics: predicate each ld/st to execute
only in lane 0, and propagate the ld result to the other lanes.
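
For illustration only, here is a source-level CUDA sketch of the lane-0 idea.
It is not the PTX the nvptx backend would emit and not an actual patch. The
helper names (lane0_load, lane0_store) are hypothetical, and the sketch
assumes a single, fully active warp of 32 threads in a 1-D block.

```cuda
#include <cstdio>

// Hypothetical helper: only lane 0 performs the load; the loaded value is
// then broadcast to every lane of the warp with a shuffle.
__device__ int lane0_load(const int *p)
{
    const unsigned full_mask = 0xffffffffu;  // assumption: all 32 lanes active
    int v = 0;
    if ((threadIdx.x % warpSize) == 0)
        v = *p;                              // ld executed by lane 0 only
    return __shfl_sync(full_mask, v, 0);     // propagate lane 0's result
}

// Hypothetical helper: only lane 0 performs the store.
__device__ void lane0_store(int *p, int v)
{
    if ((threadIdx.x % warpSize) == 0)
        *p = v;                              // st executed by lane 0 only
}

__global__ void kernel(const int *in, int *out, int *per_lane)
{
    int v = lane0_load(in);
    per_lane[threadIdx.x] = v;               // each lane records what it received
    lane0_store(out, v + 1);
}

int main()
{
    int *in, *out, *per_lane;
    cudaMallocManaged(&in, sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    cudaMallocManaged(&per_lane, 32 * sizeof(int));
    *in = 41;
    *out = 0;

    kernel<<<1, 32>>>(in, out, per_lane);    // one warp
    cudaDeviceSynchronize();

    printf("lane 31 received %d, stored value is %d\n", per_lane[31], *out);

    cudaFree(in);
    cudaFree(out);
    cudaFree(per_lane);
    return 0;
}
```

The shuffle is what makes the load result available in lanes that skipped the
ld; a store needs no such propagation step.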

Another solution might be to wrap each ld/st in two bar.warp.sync
instructions, one before and one after.
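
As a rough source-level illustration of this alternative (again CUDA, not
backend output), each access can be bracketed with __syncwarp(), which to my
understanding nvcc lowers to bar.warp.sync. The helper names (synced_load,
synced_store) are hypothetical, and the default full-warp mask assumes all 32
lanes reach the barrier.

```cuda
// Hypothetical helper: every lane performs the load, but the warp is
// reconverged immediately before and after it.
__device__ int synced_load(const int *p)
{
    __syncwarp();      // barrier before the ld (default mask: all lanes)
    int v = *p;
    __syncwarp();      // barrier after the ld
    return v;
}

// Hypothetical helper: every lane performs the store between two barriers.
__device__ void synced_store(int *p, int v)
{
    __syncwarp();      // barrier before the st
    *p = v;
    __syncwarp();      // barrier after the st
}

// Minimal kernel so the helpers are exercised when compiled with nvcc.
__global__ void kernel(const int *in, int *out)
{
    int v = synced_load(in);
    synced_store(out, v + 1);   // all lanes write the same value
}
```

Unlike the lane-0 variant, every lane still issues the access here; the
barriers only constrain how far the lanes can drift apart around it.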
