https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654
Tom de Vries <vries at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #11 from Tom de Vries <vries at gcc dot gnu.org> ---
So, at this point we know that duplicating the BB containing VOTE_ANY causes
problems at execution time. But AFAIU, we do not know why.
Is VOTE_ANY not supposed to be duplicated by design? If so, is there any
documentation that explains that design?
At the nvptx level, VOTE_ANY translates to vote.ballot.b32, which does
cross-lane communication, but has defined behaviour in divergent mode AFAICT.
From that perspective at least, there's no problem with duplicating VOTE_ANY.
My guess at this point is that duplicating the block with VOTE_ANY has the
effect that the JIT compiler doesn't recognize the control flow divergence
before XCHG_IDX, and so fails to insert the proper barrier.
And XCHG_IDX translates to shfl.idx.b32, which has undefined behaviour in
divergent mode.
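
To make the asymmetry between the two instructions concrete, here is a toy
model in Python. It is only a sketch, not real PTX semantics: the names
vote_ballot, shfl_idx, WARP_SIZE and UNDEFINED are made up for illustration,
and it assumes that inactive lanes contribute 0 to the ballot and that a
shuffle reading from an inactive lane yields an undefined value.

```python
# Toy model of one 32-lane warp. All names here are hypothetical; this
# illustrates the reasoning above and is not how the hardware is implemented.
WARP_SIZE = 32
UNDEFINED = "undefined"  # stand-in for a value the model says nothing about


def vote_ballot(pred, active):
    """Model of vote.ballot.b32: bit i is set iff lane i is active and
    pred[i] is true. Assumption: inactive lanes contribute 0, so the
    result is well defined even when the warp has diverged."""
    return sum(1 << lane for lane in range(WARP_SIZE)
               if active[lane] and pred[lane])


def shfl_idx(values, src_lane, active):
    """Model of shfl.idx.b32: every active lane reads values[src_lane].
    Assumption: if the source lane is inactive, the result is undefined."""
    return {lane: values[src_lane] if active[src_lane] else UNDEFINED
            for lane in range(WARP_SIZE) if active[lane]}


values = list(range(100, 100 + WARP_SIZE))
pred = [True] * WARP_SIZE
converged = [True] * WARP_SIZE                         # all lanes active
diverged = [lane >= 16 for lane in range(WARP_SIZE)]   # lanes 0..15 inactive

print(hex(vote_ballot(pred, converged)))   # ballot over the full warp
print(hex(vote_ballot(pred, diverged)))    # still defined: upper lanes only
print(shfl_idx(values, 0, converged)[5])   # lane 5 reads lane 0's value
print(shfl_idx(values, 0, diverged)[20])   # source lane 0 is inactive
```

In this model the ballot gives a usable result in both the converged and the
diverged case, while the shuffle only does so when the source lane is active,
which is why a missing reconvergence barrier before XCHG_IDX would matter and
one before VOTE_ANY would not.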